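How to build a Naive Bayes classifier using Python Scikit-learn?
Naive Bayes classification, which is based on Bayes' theorem, is the process of predicting the class of previously unseen data. Scikit-learn provides several Naive Bayes models, including:
- Gaussian Naive Bayes
- Bernoulli Naive Bayes
- Multinomial Naive Bayes
In this tutorial, we will learn how to build Gaussian Naive Bayes and Bernoulli Naive Bayes classifiers using Python Scikit-learn (Sklearn).
Gaussian Naive Bayes classifier
The Gaussian Naive Bayes classifier assumes that the continuous features of each class follow a normal distribution, which is characterized by a mean and a variance.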
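To make the role of the mean and variance concrete, here is a minimal NumPy sketch of the idea, not Scikit-learn's actual implementation. The function names `fit_gaussian_nb` and `predict_gaussian_nb`, the toy data, and the small variance offset are all illustrative assumptions.

```python
import numpy as np

def fit_gaussian_nb(X, y):
    """Estimate per-class priors, feature means and feature variances."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == c) for c in classes])
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    # A small constant keeps every variance strictly positive (illustrative choice).
    variances = np.array([X[y == c].var(axis=0) for c in classes]) + 1e-9
    return classes, priors, means, variances

def predict_gaussian_nb(X, classes, priors, means, variances):
    """Pick the class with the highest log posterior for each row of X."""
    # Broadcasting gives arrays of shape (n_samples, n_classes, n_features).
    diff = X[:, None, :] - means[None, :, :]
    log_pdf = -0.5 * (np.log(2 * np.pi * variances)[None, :, :]
                      + diff ** 2 / variances[None, :, :])
    # "Naive" assumption: features are independent, so log densities add up.
    log_posterior = np.log(priors)[None, :] + log_pdf.sum(axis=2)
    return classes[np.argmax(log_posterior, axis=1)]

# Made-up toy data: class 0 is centered near 1.0, class 1 near 5.0.
X_toy = np.array([[1.0, 1.2], [0.8, 1.0], [1.2, 0.9],
                  [5.0, 5.1], [4.8, 5.3], [5.2, 4.9]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

params = fit_gaussian_nb(X_toy, y_toy)
toy_preds = predict_gaussian_nb(np.array([[1.1, 1.0], [5.1, 5.0]]), *params)
print(toy_preds)
```

Each class is summarized by only one mean and one variance per feature, which is why the model is fast to train and needs few parameters.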
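Let us look at an example of how to build a Gaussian Naive Bayes classifier with the Scikit-Learn Python ML library.
In this example, we will use the Gaussian Naive Bayes model, which assumes that the data for each label is drawn from a simple normal distribution. The dataset we will use is the Wisconsin Breast Cancer Diagnostic database.
Example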
# Import the necessary packages
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# Load the dataset and organize the data
DataSet = load_breast_cancer()
labelnames = DataSet['target_names']
labels = DataSet['target']
featurenames = DataSet['feature_names']
features = DataSet['data']

# Split the dataset into training and testing sets
# by using the train_test_split() function
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.30, random_state=300)

# Initialize the Gaussian Naive Bayes model
NBclassifier = GaussianNB()

# Train the model
NBmodel = NBclassifier.fit(train, train_labels)

# Make predictions by using the predict() method
NBpreds = NBclassifier.predict(test)
print("The predictions are:\n", NBpreds[:15])

# Compute the accuracy of our Naive Bayes classifier (as a percentage)
print("Accuracy of our classifier is:", accuracy_score(test_labels, NBpreds) * 100)
Output
It will produce the following output:
The predictions are:
 [0 0 1 1 0 0 0 1 1 1 1 1 0 1 0]
Accuracy of our classifier is: 93.56725146198829
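Bernoulli Naive Bayes classifier
The Bernoulli Naive Bayes classifier works on binary features. It is very useful when we need to check whether a feature is present or absent.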
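As a small illustration of presence/absence features, the sketch below trains `BernoulliNB` on a made-up dataset in which each column records whether a word occurs in a message. The word names, labels, and variable names are hypothetical and serve only to show the idea.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: each column records whether a word is present (1) or absent (0).
# Columns: ["free", "meeting"]; labels: 1 = spam, 0 = not spam.
X_words = np.array([[1, 0], [1, 0], [1, 1],
                    [0, 1], [0, 1], [0, 0]])
y_spam = np.array([1, 1, 1, 0, 0, 0])

# binarize=None tells BernoulliNB that the input is already binary.
clf = BernoulliNB(binarize=None)
clf.fit(X_words, y_spam)

# Classify a message containing only "free" and one containing only "meeting".
word_preds = clf.predict(np.array([[1, 0], [0, 1]]))
print(word_preds)
```

Unlike a count-based model, the Bernoulli model also uses the absence of a feature as evidence, so a message without "free" counts against the spam class here.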
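Let us look at an example of how to build a Bernoulli Naive Bayes classifier with the Scikit-Learn Python ML library.
Example
In the example below, we will use the scikit-learn Python library to apply the Bernoulli Naive Bayes algorithm to a dummy dataset.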
# Import libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Create a classification dataset with two informative features
# and no redundant features
nb_samples = 300
X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0)

# Plot the dataset, coloring each point by its class label
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.subplot(111)
plt.scatter(X[:, 0], X[:, 1], marker="o", c=Y, s=40, edgecolor="k")
plt.show()
Output
This displays a scatter plot of the dummy dataset, with each point colored by its class label.

Example
Now, let us build a Bernoulli Naive Bayes classifier on this dummy dataset:
# Import libraries
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Create a classification dataset with two informative features
# and no redundant features
nb_samples = 300
X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0)

# Split the dataset into training and testing sets
# by using the train_test_split() function
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30)

# Initialize the model; features greater than 0.0 are mapped to 1, all others to 0
B_NaiveBayes = BernoulliNB(binarize=0.0)

# Train the model
B_NaiveBayes.fit(X_train, Y_train)

# Make predictions for the four possible binary inputs by using the predict() method
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Preds = B_NaiveBayes.predict(data)
print(Preds)
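Output
It will produce output similar to the following. The exact predictions can vary between runs, because the dataset and the train/test split are generated randomly:
[0 0 1 1]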
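The `binarize=0.0` argument used above thresholds each feature before the model sees it: values greater than 0.0 become 1 and all other values become 0. The sketch below, with a fixed `random_state` and variable names chosen only for illustration, checks that thresholding the data manually with `sklearn.preprocessing.binarize` gives the same predictions as letting `BernoulliNB` do it internally.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB
from sklearn.preprocessing import binarize

# random_state is fixed here only to make the sketch reproducible.
X_demo, y_demo = make_classification(n_samples=300, n_features=2, n_informative=2,
                                     n_redundant=0, random_state=0)

# Option 1: let the model threshold the features at 0.0 internally.
auto_model = BernoulliNB(binarize=0.0).fit(X_demo, y_demo)

# Option 2: threshold the features ourselves, then tell the model
# that the input is already binary.
X_binary = binarize(X_demo, threshold=0.0)
manual_model = BernoulliNB(binarize=None).fit(X_binary, y_demo)

# Both routes should lead to identical predictions.
same_predictions = np.array_equal(auto_model.predict(X_demo),
                                  manual_model.predict(X_binary))
print(same_predictions)
```

This also explains why the four query points in the example cover every case the model can distinguish: after thresholding, each of the two features is either 0 or 1.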