K-Means Clustering of the Diabetes Dataset with the SciPy Library
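The Pima Indians Diabetes dataset used here originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Based on the following diagnostic factors, the dataset can be used to place patients in a diabetic cluster or a non-diabetic cluster: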
Pregnancies
Glucose
Blood pressure
Skin thickness
Insulin
BMI
Diabetes pedigree function
Age
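You can download this dataset in .CSV format from the Kaggle website. Besides the eight factors above, each row has a ninth column, Outcome, which records whether the patient was diagnosed with diabetes (1) or not (0).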
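K-means, which the example relies on, alternates two steps until the centroids stop moving: assign each observation to its nearest centroid, then move each centroid to the mean of the observations assigned to it. The following is a minimal NumPy sketch of that idea (plain Lloyd's iteration with random initialization). The function names `lloyd_kmeans` and `_assign` and the synthetic two-blob data are illustrative assumptions, not part of SciPy or of the original example. SciPy's `kmeans` follows the same assign-and-update idea but, according to its documentation, repeats the run `iter` times (20 by default) and keeps the codebook with the lowest distortion.

```python
import numpy as np


def _assign(X, centroids):
    # Squared Euclidean distance from every point to every centroid, shape (n, k)
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Illustrative Lloyd's k-means (hypothetical helper, not a SciPy API)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct observations chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        labels = _assign(X, centroids)  # assignment step
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid)
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, _assign(X, centroids)


# Synthetic data: two well-separated blobs of 50 points each (not the Pima data)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),
               rng.normal(5.0, 0.5, size=(50, 2))])
centroids, labels = lloyd_kmeans(X, k=2)
print(centroids)            # one row near (0, 0), the other near (5, 5)
print(np.bincount(labels))  # 50 points in each cluster
```

Which blob ends up as cluster 0 depends on the random initialization, which is why cluster numbers by themselves carry no meaning.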
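Example
The following example uses the SciPy library to split the Pima Indians Diabetes dataset into two clusters, intended to represent diabetic and non-diabetic patients.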
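# Import the required Python libraries
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq
# Load the dataset (replace {your path} with the folder that holds the CSV file).
# If your copy of the CSV has a header row, add skiprows=1 to np.loadtxt.
dataset = np.loadtxt(r"{your path}\pima-indians-diabetes.csv", delimiter=",")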
# Keep only the eight feature columns (drop the ninth, Outcome, column) and print them
dataset = dataset[:, 0:8]
print("Data :\n", dataset, "\n")
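# Rescale the data: whiten() divides each feature column by its standard deviation
dataset = whiten(dataset)
# Generate the code book by running k-means with K = 2
# (two clusters, intended to represent diabetic and non-diabetic patients).
# kmeans() starts from randomly chosen observations, so results can vary between runs.
centroids, mean_dist = kmeans(dataset, 2)
print("Code book :\n", centroids, "\n")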
# Assign every patient to the nearest centroid (cluster label 0 or 1)
clusters, dist = vq(dataset, centroids)
print("Clusters :\n", clusters, "\n")
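# NOTE: k-means numbers its clusters arbitrarily; reading cluster 0 as non-diabetic
# and cluster 1 as diabetic is a convention of this example, not a guarantee
# Count the patients in the non-diabetic cluster (label 0)
non_diabetic = list(clusters).count(0)
# Count the patients in the diabetic cluster (label 1)
diabetic = list(clusters).count(1)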
# Plot the cluster sizes as a pie chart
x_axis = [diabetic, non_diabetic]
colors = ['red', 'green']
print("Total number of diabetic patients : " + str(x_axis[0]) + "\nTotal number of non-diabetic patients : " + str(x_axis[1]))
y = ['diabetic', 'non-diabetic']
# shadow expects a boolean; the string 'false' would be truthy and draw a shadow
plt.pie(x_axis, labels=y, colors=colors, shadow=False)
plt.show()
Output
Data :
[[ 6. 148. 72. ... 33.6 0.627 50. ]
 [ 1. 85. 66. ... 26.6 0.351 31. ]
 [ 8. 183. 64. ... 23.3 0.672 32. ]
 ...
 [ 5. 121. 72. ... 26.2 0.245 30. ]
 [ 1. 126. 60. ... 30.1 0.349 47. ]
 [ 1. 93. 70. ... 30.4 0.315 23. ]]
Code book :
[[2.08198148 4.17698255 3.96280983 1.04984582 0.56968574 4.13266474 1.40143319 3.86427413]
 [0.6114727 3.56175537 3.35245694 1.42268776 0.76239717 4.01974705 1.43848683 2.24399453]]
Clusters :
[0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 1]
Total number of diabetic patients : 492
Total number of non-diabetic patients : 276
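The example chains three functions from `scipy.cluster.vq`: `whiten` rescales each feature column, `kmeans` computes the code book (the centroids), and `vq` assigns each observation to its nearest centroid. To see what each call returns, here is a minimal sketch on a made-up six-row dataset; the values are hypothetical, chosen only so that the two groups are obvious, and do not come from the Pima data.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Made-up toy data: 6 observations x 2 features, forming two obvious groups
obs = np.array([
    [1.0, 80.0], [2.0, 85.0], [1.5, 90.0],
    [8.0, 180.0], [9.0, 175.0], [8.5, 185.0],
])

# whiten(): divide each column by its (population) standard deviation;
# note that it does not subtract the mean
white = whiten(obs)
print(np.allclose(white, obs / obs.std(axis=0)))  # True

# kmeans(): returns the code book (one centroid per row) and the distortion,
# i.e. the mean distance from each observation to its nearest centroid.
# The initial centroids are picked at random.
codebook, distortion = kmeans(white, 2)

# vq(): label every observation with the index of its nearest centroid and
# also return the distance to that centroid
codes, dists = vq(white, codebook)
print(codes)  # the first three rows share one label, the last three the other
```

Because the labels can come out as `[0 0 0 1 1 1]` or `[1 1 1 0 0 0]`, deciding which cluster means what is a separate step.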

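Because k-means is unsupervised, the labels 0 and 1 that `vq()` returns have no meaning of their own; the example simply treats cluster 1 as diabetic. The Outcome column, dropped before clustering, records the actual diagnosis, so one way to decide which cluster to call "diabetic" is to compare both possible readings against it. For reference, the widely distributed version of this dataset has 268 rows with Outcome 1 and 500 with Outcome 0, so the 492/276 split reported above is worth checking this way before naming the clusters. The helper `align_clusters_with_outcome` below is a hypothetical name written for illustration, and the eight toy labels are made up.

```python
import numpy as np


def align_clusters_with_outcome(clusters, outcome):
    """Hypothetical helper: decide which k-means label (0 or 1) best matches Outcome == 1.

    Returns (diabetic_label, agreement), where agreement is the fraction of rows
    on which the relabelled clusters equal the recorded outcome.
    """
    clusters = np.asarray(clusters)
    outcome = np.asarray(outcome)
    agree_if_1 = np.mean(clusters == outcome)        # reading cluster 1 as diabetic
    agree_if_0 = np.mean((1 - clusters) == outcome)  # reading cluster 0 as diabetic
    if agree_if_1 >= agree_if_0:
        return 1, agree_if_1
    return 0, agree_if_0


# Made-up labels: here the clustering numbered the groups "the other way round"
clusters = np.array([0, 0, 0, 1, 1, 1, 1, 0])
outcome = np.array([1, 1, 1, 0, 0, 0, 0, 0])
diabetic_label, agreement = align_clusters_with_outcome(clusters, outcome)
print(diabetic_label, agreement)  # 0 0.875

# With the real data, the Outcome column could be read from the same CSV, e.g.:
# outcome = np.loadtxt(r"{your path}\pima-indians-diabetes.csv", delimiter=",")[:, 8]
```

The agreement figure is only a sanity check on the cluster naming; it is not a substitute for evaluating a proper classifier.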