How to Build a Naive Bayes Classifier Using Python Scikit-learn?


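Naive Bayes classification applies Bayes' theorem, under the assumption that features are conditionally independent given the class, to predict the class of previously unseen samples. Scikit-learn provides several naive Bayes models, including the following three:

Gaussian Naive Bayes
Bernoulli Naive Bayes
Multinomial Naive Bayes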
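As a quick orientation, the three models listed above correspond to three classes in the sklearn.naive_bayes module. The sketch below only instantiates them; the mapping from feature type to model is a rough rule of thumb added here for illustration, not a strict rule.

```python
from sklearn.naive_bayes import BernoulliNB, GaussianNB, MultinomialNB

# Rule-of-thumb pairing of feature type and model (illustrative assumption).
models = {
    "continuous features": GaussianNB(),
    "binary (present/absent) features": BernoulliNB(),
    "count features": MultinomialNB(),
}

for kind, model in models.items():
    print(f"{kind}: {type(model).__name__}")
```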

In this tutorial, we will build Gaussian and Bernoulli Naive Bayes classifiers using Python Scikit-learn (Sklearn).

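Gaussian Naive Bayes Classifier

The Gaussian Naive Bayes classifier models each continuous feature, within each class, as a normal distribution characterized by its mean and variance.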
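To make the role of the per-class mean concrete, here is a minimal sketch on a tiny made-up dataset (the values are hypothetical and chosen only for illustration). It computes the class means by hand with NumPy and compares them with the theta_ attribute of a fitted GaussianNB; the model also estimates a per-class variance in the same way.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Tiny made-up dataset: one continuous feature, two classes.
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)

# Per-class means computed by hand.
manual_means = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])

# Per-class means stored by the fitted model.
print("Manual means:", manual_means.ravel())
print("Model means: ", clf.theta_.ravel())

# A new point is assigned to the class whose distribution explains it best.
new_points = np.array([[2.5], [8.5]])
predicted = clf.predict(new_points)
print("Predicted classes:", predicted)
```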

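Let's walk through an example of building a Gaussian Naive Bayes classifier with the Scikit-learn Python ML library.

In this example, we use the Gaussian Naive Bayes model, which assumes that the data for each label is drawn from a simple normal distribution. The dataset is the Breast Cancer Wisconsin (Diagnostic) dataset.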
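Before training, it can help to look at what the dataset contains. The short sketch below loads it with load_breast_cancer() and prints its shape, class names, and the first few feature names.

```python
from sklearn.datasets import load_breast_cancer

dataset = load_breast_cancer()

# The data matrix has one row per sample and one column per feature.
print("Shape:", dataset["data"].shape)
print("Classes:", dataset["target_names"])
print("First features:", dataset["feature_names"][:3])
```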

Example

# Import the necessary packages
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Load the dataset and organize the data
DataSet = load_breast_cancer()
labelnames = DataSet['target_names']
labels = DataSet['target']
featurenames = DataSet['feature_names']
features = DataSet['data']

# Split the dataset into training (70%) and testing (30%) sets
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.30, random_state=300
)

# Initialize the Gaussian Naive Bayes model
NBclassifier = GaussianNB()

# Train the model
NBmodel = NBclassifier.fit(train, train_labels)

# Make predictions with the predict() method
NBpreds = NBclassifier.predict(test)
print("The predictions are:\n", NBpreds[:15])

# Compute the accuracy of the classifier as a percentage
print("Accuracy of our classifier is:", accuracy_score(test_labels, NBpreds) * 100)

Output

It will produce the following output:

The predictions are:
[0 0 1 1 0 0 0 1 1 1 1 1 0 1 0]
Accuracy of our classifier is: 93.56725146198829
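A single accuracy figure hides which class the errors fall in. As a possible follow-up, the sketch below repeats the same split and model and prints a confusion matrix with confusion_matrix from sklearn.metrics. The variable names preds and cm are introduced here for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Same data, split, and seed as the example above.
features, labels = load_breast_cancer(return_X_y=True)
train, test, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.30, random_state=300
)

preds = GaussianNB().fit(train, train_labels).predict(test)

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(test_labels, preds)
print(cm)
```

The diagonal entries count correct predictions; the off-diagonal entries count the two kinds of misclassification.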

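Bernoulli Naive Bayes Classifier

The Bernoulli Naive Bayes classifier works on binary features. It is useful when what matters is whether a feature is present or absent.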
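A minimal sketch of the presence/absence idea, using a hypothetical toy dataset in which each column records whether a feature is present (1) or absent (0). In this made-up data the first feature is present in every class-1 sample and absent in every class-0 sample.

```python
import numpy as np
from sklearn.naive_bayes import BernoulliNB

# Hypothetical toy data: 1 = feature present, 0 = feature absent.
X = np.array([[1, 0], [1, 1], [1, 0], [0, 1], [0, 0], [0, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

clf = BernoulliNB().fit(X, y)

# One sample with only the first feature present, one with only the second.
preds = clf.predict(np.array([[1, 0], [0, 1]]))
print(preds)
```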

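Let's walk through an example of building a Bernoulli Naive Bayes classifier with the Scikit-learn Python ML library.

Example

In the example below, we use the scikit-learn Python library to apply the Bernoulli Naive Bayes algorithm to a dummy dataset.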

# Import libraries
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

# Create a classification dataset with two informative features and no redundant features
nb_samples = 300
X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0)

# Plot the dataset, coloring each point by its class
plt.figure(figsize=(7.50, 3.50))
plt.subplots_adjust(bottom=0.05, top=0.9, left=0.05, right=0.95)
plt.subplot(111)
plt.scatter(X[:, 0], X[:, 1], marker="o", c=Y, s=40, edgecolor="k")
plt.show()

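Output

Running the code displays a scatter plot of the dummy dataset, with each point colored by its class.

Example

Now, let's build a Bernoulli Naive Bayes classifier on a dummy dataset generated in the same way: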

# Import libraries
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import BernoulliNB

# Create a classification dataset with two informative features and no redundant features
nb_samples = 300
X, Y = make_classification(n_samples=nb_samples, n_features=2, n_informative=2, n_redundant=0)

# Split the dataset into training (70%) and testing (30%) sets
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.30)

# Initialize the model; binarize=0.0 maps feature values above 0 to 1 and all others to 0
B_NaiveBayes = BernoulliNB(binarize=0.0)

# Train the model
B_NaiveBayes.fit(X_train, Y_train)

# Make predictions with the predict() method
data = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
Preds = B_NaiveBayes.predict(data)
print(Preds)

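Output

Because make_classification() and train_test_split() are called without a random_state, the predictions can vary between runs. One run produces the following output:

[0 0 1 1]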
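The binarize parameter is what lets BernoulliNB accept the continuous features that make_classification() produces. The sketch below illustrates this under a few assumptions: random_state=0 is an arbitrary seed added for repeatability, and the names auto, manual, and X_binary are introduced here. It checks that thresholding inside the model at 0.0 gives the same predictions as thresholding the data by hand and passing binarize=None.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import BernoulliNB

# random_state=0 is an arbitrary choice that makes the run repeatable.
X, Y = make_classification(
    n_samples=300, n_features=2, n_informative=2, n_redundant=0, random_state=0
)

# Option 1: let the model threshold the features at 0.0.
auto = BernoulliNB(binarize=0.0).fit(X, Y)

# Option 2: threshold by hand and tell the model the input is already binary.
X_binary = (X > 0.0).astype(int)
manual = BernoulliNB(binarize=None).fit(X_binary, Y)

same = np.array_equal(auto.predict(X), manual.predict(X_binary))
print("Predictions match:", same)
```

Setting random_state in both make_classification() and train_test_split() is also one way to make the earlier example print the same predictions on every run.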

Gaurav Leekha

Updated on: October 4, 2022
