Kmeans.py: a Python implementation of K-Means

Size: 10 KB

File type: .py

Published: 2021-11-17
Language: Python
Tags: Kmeans, python

Resource overview

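K-Means is a classic distance-based clustering algorithm. The k is the number of clusters, and "means" refers to the mean of the data objects in each cluster, which serves as a description of the cluster center; hence the name k-means. It is a partition-based clustering method that uses distance as the measure of similarity between data objects: the smaller the distance between two objects, the more similar they are and the more likely they belong to the same cluster. Distance can be computed in many ways; K-Means usually uses Euclidean distance. The algorithm treats a cluster as a group of objects that lie close together, so its goal is to produce clusters that are compact and well separated.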
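To make the distance-based notion of similarity concrete, here is a minimal sketch using only the standard library. The helper names (`euclidean`, `nearest_center`, `within_cluster_ss`) and the sample centers are assumptions made for illustration and do not come from Kmeans.py.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points given as coordinate sequences."""
    return math.dist(a, b)

def nearest_center(point, centers):
    """Index of the center closest to `point` (smaller distance = more similar)."""
    return min(range(len(centers)), key=lambda i: euclidean(point, centers[i]))

def within_cluster_ss(points, center):
    """Sum of squared distances to the center; a smaller value means a more compact cluster."""
    return sum(euclidean(p, center) ** 2 for p in points)

# Hypothetical centers, for illustration only
centers = [(1.0, 1.0), (2.0, 2.0)]
print(nearest_center((1.2, 0.9), centers))
print(within_cluster_ss([(0.0, 0.0), (2.0, 0.0)], (1.0, 0.0)))
```

The sum of squared distances is one common way to quantify how "compact" a cluster is; the description above does not say which measure the original file uses, if any.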

Code snippet and file information

import numpy as np
import matplotlib.pyplot as plt

# Generate a cloud of points around each of four true centers
real_center = [(1, 1), (1, 2), (2, 2), (2, 1)]
point_number = 50

points_x = []
points_y = []

for center in real_center:
    # Gaussian offsets: standard deviation 0.3 along x and 0.25 along y
    offset_x, offset_y = np.random.randn(point_number) * 0.3, np.random.randn(point_number) * 0.25
    x_val, y_val = center[0] + offset_x, center[1] + offset_y

    points_x.append(x_val)
    points_y.append(y_val)

points_x = np.concatenate(points_x)  # join the per-center arrays into one 1-D array
points_y = np.concatenate(points_y)

# Plot the points
plt.scatter(points_x, points_y, color='green', marker='+')
# Plot the true centers
center_x, center_y = zip(*real_center)
plt.scatter(center_x, center_y, color='red', marker='^')
plt.xlim(0, 3)
plt.ylim(0, 3)
plt.show()

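#=================================================================================
# Step 1: randomly pick K points as the initial centroids
K = 4
p_list = np.stack([points_x, points_y], axis=1)  # pair x and y into (x, y) coordinates; p_list.shape == (200, 2)
# Pick K distinct row indices; replace=False avoids choosing the same point twice,
# which would leave one cluster empty in the later steps
index = np.random.choice(len(p_list), size=K, replace=False)
centeroid = p_list[index]  # coordinates of the chosen points (index-array selection returns a copy)

# Plot the initial centroids
for p in centeroid:
    plt.scatter(p[0], p[1], marker='^')

plt.xlim(0, 3)
plt.ylim(0, 3)
plt.show()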

#===============================================================================
# Step 2: for every point p, add p to the set of its nearest centroid
points_set = {key: [] for key in range(K)}
# print("before", points_set)
for p in p_list:
    # print("center:", centeroid)
    # print("current p:", p)
    # Find the centroid closest to this point, then append the point to that centroid's set
    nearest_index = np.argmin(np.sum((centeroid - p) ** 2, axis=1) ** 0.5)  # np.argmin returns the index of the smallest distance; axis=1 sums across each row
    points_set[nearest_index].append(p)
    # points_set = {0: [array([x1, y1]), array([x2, y2]), ...], 1: [], ...}

# Plot each cluster in its own color
for k_index, p_set in points_set.items():
    p_xs = [p[0] for p in p_set]
    p_ys = [p[1] for p in p_set]
    plt.scatter(p_xs, p_ys, color='C{}'.format(k_index))

for ix, p in enumerate(centeroid):
    plt.scatter(p[0], p[1], color='C{}'.format(ix), marker='^', edgecolor='black', s=128)

plt.xlim(0, 3)
plt.ylim(0, 3)
plt.show()

#===============================================================================
# Step 3: for each point set, compute the new centroid as the mean of its points
for k_index, p_set in points_set.items():
    # print(k_index, p_set)
    if not p_set:
        continue  # an empty cluster keeps its previous centroid (avoids dividing by zero)
    p_xs = [p[0] for p in p_set]
    p_ys = [p[1] for p in p_set]
    # print(p_xs)
    # print(p_ys)
    centeroid[k_index, 0] = sum(p_xs) / len(p_set)
    centeroid[k_index, 1] = sum(p_ys) / len(p_set)

# Sample output of print(k_index, p_set), truncated in the source:
# 0 [array([1.6062368, 0.7195094]),
# array([1.5209016, 0.79879955]),
# array([1.58217716, 0.6752188]),
# array([1.47984399, 0.77579545]),
# array([1.56584005, 0.72633556]),
# array([1.36814733, 0.85458767]),
# array([1.47112209, 1.26962527]),
# array([1.39860841, 0.88106941]),
# array([1.42538269, 1.00176483]),
# array([1.8285598, 0.63840008]),
# array([1.59278399, 0.87566845]),
# array([1.75997827, 1.58673337]),
# array([2.40024409, 1.43842766]),
# array([1.63783343, 1.04777846]),
# array([2.27202361
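The snippet above is cut off after a single assignment pass and a single centroid update. K-Means normally repeats those two steps until the centroids stop moving. The following is a minimal vectorized sketch of that loop, not the remainder of Kmeans.py: the function name `kmeans` and the parameters `max_iter`, `tol`, and `seed` are assumptions chosen for illustration.

```python
import numpy as np

def kmeans(points, k, max_iter=100, tol=1e-6, seed=0):
    """Hypothetical helper: repeat the assign and update steps until convergence."""
    rng = np.random.default_rng(seed)

    def assign(centroids):
        # Distance from every point to every centroid, shape (n, k)
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)

    # Step 1: pick k distinct points as the initial centroids
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(max_iter):
        labels = assign(centroids)                 # Step 2: nearest centroid per point
        new_centroids = centroids.copy()
        for j in range(k):                         # Step 3: mean of each cluster
            members = points[labels == j]
            if len(members) > 0:                   # an empty cluster keeps its old centroid
                new_centroids[j] = members.mean(axis=0)
        shift = np.linalg.norm(new_centroids - centroids)
        centroids = new_centroids
        if shift < tol:                            # centroids barely moved: stop
            break
    return centroids, assign(centroids)

# Usage on synthetic data shaped like the example above (four centers, 50 points each)
rng = np.random.default_rng(1)
true_centers = np.array([(1, 1), (1, 2), (2, 2), (2, 1)], dtype=float)
points = np.concatenate([c + rng.normal(scale=0.1, size=(50, 2)) for c in true_centers])
centroids, labels = kmeans(points, k=4)
print(centroids.shape, labels.shape)
```

Because the result depends on the random initial centroids, the loop can settle in a local optimum; running it with several seeds and keeping the most compact result is a common remedy.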
