K-Means Clustering of the Diabetes Dataset Using the SciPy Library


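The Pima Indians Diabetes dataset used here originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Based on the following diagnostic factors, the dataset can be used to place each patient in either a diabetic cluster or a non-diabetic cluster: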

  • Pregnancies

  • Glucose

  • Blood pressure

  • Skin thickness

  • Insulin

  • BMI

  • Diabetes pedigree function

  • Age

You can download this dataset in .CSV format from the Kaggle website.
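The example below reads the file with np.loadtxt, which fails if the first line is a header. Some copies of the CSV (possibly including the one on Kaggle) begin with a header row of column names, while others contain only numbers. The following is a minimal sketch, assuming nine comma-separated columns with the outcome last; the helper name load_pima and the COLUMNS list are hypothetical and not part of the original tutorial.

```python
import io
import numpy as np

# Assumed column names (as in a header-bearing copy of the CSV); the tutorial itself
# only relies on columns 0-7 being features and column 8 being the outcome.
COLUMNS = ["Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
           "Insulin", "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]


def load_pima(source):
    """Hypothetical helper: load the CSV whether or not it has a header row.

    `source` is a file path or a text file-like object.
    Returns (features, outcome): an (n, 8) float array and an (n,) int array.
    """
    if hasattr(source, "read"):
        text = source.read()
    else:
        # utf-8-sig strips a byte-order mark if the file has one
        with open(source, encoding="utf-8-sig") as f:
            text = f.read()
    # A header row starts with a column name instead of a number.
    first_field = text.splitlines()[0].split(",")[0]
    try:
        float(first_field)
        skip = 0
    except ValueError:
        skip = 1
    data = np.atleast_2d(np.loadtxt(io.StringIO(text), delimiter=",", skiprows=skip))
    if data.shape[1] != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {data.shape[1]}")
    return data[:, 0:8], data[:, 8].astype(int)
```

With a local copy of the file, a call such as `features, outcome = load_pima(r"{your path}\pima-indians-diabetes.csv")` returns the eight feature columns and the Outcome column separately.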

Example

The following example uses the SciPy library to form two clusters, diabetic and non-diabetic, from the Pima Indians Diabetes dataset.

# Importing the required Python libraries:
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Loading the dataset (if your copy of the CSV starts with a header row, add skiprows=1):
dataset = np.loadtxt(r"{your path}\pima-indians-diabetes.csv", delimiter=",")

# Excluding the outcome column (the 9th column) and printing the data
dataset = dataset[:, 0:8]
print("Data :\n", dataset, "\n")

# Normalizing the data: each feature column is divided by its standard deviation
dataset = whiten(dataset)

# Generating the code book by computing K-Means with K = 2
# (2 clusters, i.e., diabetic and non-diabetic clusters)
centroids, mean_dist = kmeans(dataset, 2)
print("Code book :\n", centroids, "\n")

# Assigning every patient to the nearest centroid
clusters, dist = vq(dataset, centroids)
print("Clusters :\n", clusters, "\n")

# K-Means numbers its clusters arbitrarily; this example treats
# cluster 0 as non-diabetic and cluster 1 as diabetic
non_diabetic = list(clusters).count(0)
diabetic = list(clusters).count(1)

# Plotting the pie chart of the two clusters
x_axis = [diabetic, non_diabetic]
colors = ['red', 'green']
print("Total number of diabetic patients : " + str(x_axis[0]) + "\nTotal number non-diabetic patients : " + str(x_axis[1]))
y = ['diabetic', 'non-diabetic']
plt.pie(x_axis, labels=y, colors=colors, shadow=False)
plt.show()
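The whiten step scales each feature column by its standard deviation so that large-valued features such as glucose or insulin do not dominate the Euclidean distances used by K-Means. Note that it does not subtract the mean, so the data is rescaled but not centered. A minimal sketch that checks this on synthetic, made-up data (manual_whiten is a hypothetical helper, and it ignores columns whose standard deviation is zero):

```python
import numpy as np
from scipy.cluster.vq import whiten


def manual_whiten(obs):
    """Divide every column by its population standard deviation (ddof=0), no centering."""
    return obs / obs.std(axis=0)


rng = np.random.default_rng(0)
# Synthetic stand-in for the 8 feature columns (not the real dataset)
demo = rng.normal(loc=100.0, scale=[1, 10, 20, 5, 50, 7, 0.3, 12], size=(50, 8))
print(np.allclose(whiten(demo), manual_whiten(demo)))  # True
```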

Output

Data :
[[ 6. 148. 72. ... 33.6 0.627 50. ]
[ 1. 85. 66. ... 26.6 0.351 31. ]
[ 8. 183. 64. ... 23.3 0.672 32. ]
...
[ 5. 121. 72. ... 26.2 0.245 30. ]
[ 1. 126. 60. ... 30.1 0.349 47. ]
[ 1. 93. 70. ... 30.4 0.315 23. ]]

Code book :
[[2.08198148 4.17698255 3.96280983 1.04984582 0.56968574 4.13266474
1.40143319 3.86427413]
[0.6114727 3.56175537 3.35245694 1.42268776 0.76239717 4.01974705
1.43848683 2.24399453]]
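The code book rows are the two centroids expressed in whitened units, which are hard to read directly. Because whiten only divides each column by its standard deviation, multiplying by the same standard deviations maps the centroids back to the original units of each feature. The example overwrites `dataset`, so this sketch assumes a copy of the raw feature matrix (called `raw` here, a hypothetical name) is kept before whitening; the demo uses synthetic, made-up data.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans


def centroids_in_original_units(raw, centroids):
    """Undo whiten() for the centroids by multiplying each column by the raw column's std."""
    return centroids * raw.std(axis=0)


# Synthetic, made-up data: two groups of two features each
rng = np.random.default_rng(1)
raw = np.vstack([rng.normal([100.0, 30.0], [10.0, 5.0], size=(40, 2)),
                 rng.normal([160.0, 50.0], [10.0, 5.0], size=(40, 2))])
centroids, _ = kmeans(whiten(raw), 2)
print(centroids_in_original_units(raw, centroids))
```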

Clusters :
[0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0
0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1
1 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1
0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0
0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0
1 0 1 1 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1
0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1
1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1
0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0
1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 0
1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0
1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 0 0 1
1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0
0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0 1 1
0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1
0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 0
1 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1
0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 1 1 1 0 0
1 1 1 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1
0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 1]
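Each entry in the Clusters array is the index of the code book row closest to that patient, which is what vq computes. A minimal NumPy sketch of the same nearest-centroid assignment, using a hypothetical helper nearest_centroid and made-up data, shows the idea and how np.bincount gives the same cluster sizes as list(clusters).count(k):

```python
import numpy as np
from scipy.cluster.vq import vq


def nearest_centroid(obs, code_book):
    """Return (labels, distances) like vq(): index of, and Euclidean distance to, the closest code."""
    # Pairwise differences with shape (n_obs, n_codes, n_features)
    diff = obs[:, np.newaxis, :] - code_book[np.newaxis, :, :]
    dists = np.sqrt((diff ** 2).sum(axis=2))
    labels = dists.argmin(axis=1)
    return labels, dists[np.arange(len(obs)), labels]


rng = np.random.default_rng(2)
obs = rng.random((30, 8))            # made-up observations
code_book = rng.random((2, 8))       # made-up code book with two centroids
labels, distances = nearest_centroid(obs, code_book)
print(np.bincount(labels, minlength=2))  # cluster sizes, like list(clusters).count(k)
```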

Total number of diabetic patients : 492
Total number non-diabetic patients : 276
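K-Means numbers its clusters arbitrarily, so nothing guarantees that cluster 1 is the diabetic one, as the example code assumes. One way to name the clusters, sketched here under the assumption that the cluster with the higher Glucose centroid (column index 1) is the diabetic one, is to inspect the code book; agreement with the dataset's Outcome column (the 9th column the example drops) can then be measured. Both helpers are hypothetical. Applied to the code book printed above, this rule names cluster 0 diabetic, the opposite of the naming used in the example.

```python
import numpy as np

GLUCOSE = 1  # index of the Glucose column among the 8 feature columns


def name_clusters_by_glucose(centroids):
    """Hypothetical rule: the cluster whose centroid has the higher glucose value is 'diabetic'."""
    diabetic_label = int(np.argmax(centroids[:, GLUCOSE]))
    return {diabetic_label: "diabetic", 1 - diabetic_label: "non-diabetic"}


def agreement_with_outcome(clusters, outcome, diabetic_label):
    """Fraction of patients whose cluster agrees with the Outcome column (1 = diabetic)."""
    predicted = (np.asarray(clusters) == diabetic_label).astype(int)
    return float(np.mean(predicted == np.asarray(outcome)))


# Code book values copied from the output above
code_book = np.array([
    [2.08198148, 4.17698255, 3.96280983, 1.04984582, 0.56968574, 4.13266474, 1.40143319, 3.86427413],
    [0.6114727, 3.56175537, 3.35245694, 1.42268776, 0.76239717, 4.01974705, 1.43848683, 2.24399453],
])
print(name_clusters_by_glucose(code_book))  # {0: 'diabetic', 1: 'non-diabetic'}
```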

Updated on: 2021-12-14
