How to Build a Random Forest Classifier Using Python Scikit-learn

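Random forest is a supervised machine learning algorithm used for classification, regression, and other tasks. It builds a set of decision trees, each on a random sample of the data. Once the trees are built, the random forest classifier collects the prediction of every tree and chooses the final answer by voting.

One of the biggest advantages of a random forest classifier is that combining the results of many trees reduces overfitting. This is why it usually gives better results than a single decision tree.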
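To make the voting step concrete, here is a minimal sketch that is not part of the original example. It relies on the fact that scikit-learn's RandomForestClassifier combines its trees by averaging their predicted class probabilities, rather than counting one hard vote per tree. The 25-tree forest, the random_state values, and the names per_tree_proba, avg_proba, and manual_pred are choices made here for illustration only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.40, random_state=0)  # seed chosen for illustration

forest = RandomForestClassifier(n_estimators=25, random_state=0)
forest.fit(X_train, y_train)

# Every fitted tree is exposed through forest.estimators_.
# Collect each tree's class probabilities for the test samples.
per_tree_proba = np.array(
    [tree.predict_proba(X_test) for tree in forest.estimators_])

# Average over the trees, then pick the most probable class per sample.
avg_proba = per_tree_proba.mean(axis=0)
manual_pred = forest.classes_[avg_proba.argmax(axis=1)]

# Compare the hand-made aggregate with the forest's own prediction.
print(np.array_equal(manual_pred, forest.predict(X_test)))
```

When the trees are fully grown, most leaves are pure, so each tree's probabilities are close to 0 or 1 and the average behaves much like a majority vote.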
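One way to check this claim on your own data is to compare a single decision tree with a forest under cross-validation. The sketch below assumes 5-fold cross-validation and fixed seeds, neither of which comes from the original example. On a dataset as small and clean as Iris the gap may be small or even absent, so treat the numbers as a sanity check rather than proof.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same data, two models: one tree versus a forest of 100 trees.
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# cross_val_score returns one accuracy value per fold.
tree_scores = cross_val_score(tree, X, y, cv=5)
forest_scores = cross_val_score(forest, X, y, cv=5)

print("Single tree   mean accuracy:", tree_scores.mean())
print("Random forest mean accuracy:", forest_scores.mean())
```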

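Steps to Build a Random Forest Classifier

We can follow the steps below to build a random forest classifier using Python Scikit-learn:

Step 1 - Import the required libraries.

Step 2 - Load the dataset.

Step 3 - Split the dataset into a training set and a test set.

Step 4 - Import the random forest classifier from the sklearn.ensemble module.

Step 5 - Create a DataFrame of the dataset.

Step 6 - Create the random forest classifier and train the model using the fit() function.

Step 7 - Make predictions on the test set.

Step 8 - Import metrics to compute the accuracy of the classifier.

Step 9 - Print the accuracy of the random forest classifier.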

Example

In the example below, we will use the Iris plant dataset to build a random forest classifier.

# Import required libraries
import sklearn
import pandas as pd
from sklearn import datasets

# Load the iris dataset from sklearn
iris_clf = datasets.load_iris()
print(iris_clf.target_names)
print(iris_clf.feature_names)

# Divide the dataset into a training set and a test set
from sklearn.model_selection import train_test_split
X, y = datasets.load_iris(return_X_y=True)

# 60% training set and 40% test set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.40)

# Import the random forest classifier from the sklearn.ensemble module
from sklearn.ensemble import RandomForestClassifier

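# Create a DataFrame of the dataset (useful for inspection;
# the model below is trained on X_train, not on this DataFrame)
data = pd.DataFrame({'sepallength': iris_clf.data[:, 0],
                     'sepalwidth': iris_clf.data[:, 1],
                     'petallength': iris_clf.data[:, 2],
                     'petalwidth': iris_clf.data[:, 3],
                     'species': iris_clf.target})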

# Create a Random Forest classifier
RForest_clf = RandomForestClassifier(n_estimators = 100)

# Train the model on the training dataset by using fit() function
RForest_clf.fit(X_train, y_train)

# Predict from the test dataset
y_pred = RForest_clf.predict(X_test)

# Import metrics for accuracy calculation
from sklearn import metrics
print("\nAccuracy of our Random Forest Classifier is:",
      metrics.accuracy_score(y_test, y_pred) * 100)

Output

It will produce output like the following. The exact accuracy varies from run to run because neither the train/test split nor the forest is seeded:

['setosa' 'versicolor' 'virginica']
['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']

Accuracy of our Random Forest Classifier is: 95.0
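Because train_test_split shuffles the data and the forest samples rows and features at random, a script like the one above can report a slightly different accuracy on each run. The following sketch shows one way to make the result repeatable. The helper name run_once, the seed 42, and the stratify=y option (which keeps class proportions equal in both splits) are assumptions added here, not part of the original example.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def run_once(seed):
    """Split, train, and score with every source of randomness seeded."""
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.40, random_state=seed, stratify=y)
    clf = RandomForestClassifier(n_estimators=100, random_state=seed)
    clf.fit(X_train, y_train)
    return accuracy_score(y_test, clf.predict(X_test)) * 100


# Two runs with the same seed should report the same accuracy.
first = run_once(42)
second = run_once(42)
print(first, second, first == second)
```

Fixing the seed makes a result reproducible, but it is still a single split. Cross-validation gives a steadier estimate when the exact number matters.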

Let us use our classifier to predict the type of a flower:

# Predicting the type of flower
RForest_clf.predict([[5, 4, 3, 1]])

Output

It will produce the following output:

array([1])

Here array([1]) represents the versicolor type.

# Predicting the type of flower
RForest_clf.predict([[5, 4, 5, 2]])

Output

It will produce the following output:

array([2])

Here array([2]) represents the virginica type.
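Reading the class index off the array by hand is easy to get wrong, so it can help to map it back to a species name and to look at how confident the forest is. The sketch below is one way to do that. As a simplification it trains on the full Iris dataset with a fixed seed, and the names clf, samples, names, and proba are introduced here only for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()

# Simplification for this sketch: fit on all 150 samples with a fixed seed.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(iris.data, iris.target)

# The two measurements used above: sepal length, sepal width,
# petal length, petal width (all in cm).
samples = [[5, 4, 3, 1], [5, 4, 5, 2]]

pred = clf.predict(samples)       # class indices
names = iris.target_names[pred]   # index -> species name
proba = clf.predict_proba(samples)  # one probability per class, per sample

for sample, name, p in zip(samples, names, proba):
    print(sample, "->", name, p)
```

The columns of predict_proba follow the order of clf.classes_, which for Iris matches the order of iris.target_names.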

Gaurav Leekha

Updated on: October 4, 2022
