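XGBoost - Classification

Classification is one of the most common uses of XGBoost: it predicts a discrete class label from the input features. In the Python package, classification is done with the XGBClassifier class, which is designed specifically for classification tasks.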

XGBClassifier Syntax

To improve performance, we can tune the hyperparameters of the XGBClassifier class. The basic syntax for building an XGBoost classifier is shown below:

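import xgboost as xgb

# The names on the right-hand side (num_classes, max_depth, ...) are
# placeholders for values you choose yourself.
model = xgb.XGBClassifier(
    objective='multi:softprob',
    num_class=num_classes,
    max_depth=max_depth,
    learning_rate=learning_rate,
    subsample=subsample,
    colsample_bytree=colsample,
    n_estimators=num_estimators
)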

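The hyperparameters used in the XGBClassifier syntax are described below:

objective='multi:softprob' - Optional. Sets the learning objective. 'multi:softprob' is the multi-class objective that returns a probability score for each class. For binary classification, the default is 'binary:logistic'.
num_class=num_classes - The number of classes in the dataset. The multi-class objectives need this value, but XGBClassifier normally infers it from the training labels, so it can usually be omitted.
max_depth=max_depth - Optional. The maximum depth of each decision tree.
learning_rate=learning_rate - Optional. The step-size shrinkage applied to each tree's contribution, which helps prevent overfitting.
subsample=subsample - Optional. The fraction of training samples (rows) used to build each tree.
colsample_bytree=colsample - Optional. The fraction of features (columns) used to build each tree.
n_estimators=num_estimators - Optional. The number of boosting rounds (trees), which controls the overall complexity of the model.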

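XGBoost Classification Example

The Iris dataset is a very popular dataset in machine learning. It contains 150 iris samples, each with four measurements, and the task is to classify each sample into one of three iris species.

Let's use the Iris dataset to demonstrate classification with the XGBoost library.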

import xgboost as xgb
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report

# Load the Iris dataset
data = load_iris()
X, y = data.data, data.target

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create an XGBoost classifier
model = xgb.XGBClassifier()

# Train the model on the training data
model.fit(X_train, y_train)

# Make predictions on the test set
predictions = model.predict(X_test)

# Calculate accuracy
accuracy = accuracy_score(y_test, predictions)

print("Model's Accuracy is:", accuracy)
print("\nModel's Classification Report is:")
print(classification_report(y_test, predictions, target_names=data.target_names))

Output

Running the code above produces the following result:

Model's Accuracy is: 1.0

Model's Classification Report is:
              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

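Summary

XGBoost is a powerful machine learning tool that is especially well suited to classification tasks. It works well in many situations because it is fast and includes features that help prevent overfitting, such as learning-rate shrinkage and row and column subsampling. In the example above, XGBoost classified iris flowers into their three species with an accuracy of 1.0 on the 30-sample test set. Its flexibility and efficiency make XGBoost a strong choice for many real-world classification problems.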
