LightGBM - Early Stopping Training
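Early stopping is a technique that halts training when the evaluation metric, computed on an evaluation dataset, has not improved for a specified number of consecutive boosting rounds. In LightGBM versions before 4.0, both the train() function and the fit() method of the sklearn-style estimators accept an argument named early_stopping_rounds. It takes an integer: the number of rounds without improvement after which training is terminated. LightGBM 4.0 removed this argument from train() and fit(); on those versions, use the early_stopping() callback described later in this chapter.
Keep in mind that early stopping requires an evaluation dataset, because the decision depends on the metric computed on that dataset.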
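To make the stopping rule concrete, here is a minimal pure-Python sketch of the patience logic. It is not LightGBM's actual implementation: the function name find_stopping_point and the validation RMSE values are hypothetical, chosen only for illustration, and a lower score is assumed to be better.

```python
def find_stopping_point(scores, patience):
    """Return (best_round, last_round), both 1-based.

    Hypothetical helper: `scores` holds one validation score per boosting
    round, lower is better. Training stops once `patience` consecutive
    rounds fail to beat the best score seen so far.
    """
    best_score = float("inf")
    best_round = 0
    for round_no, score in enumerate(scores, start=1):
        if score < best_score:
            best_score, best_round = score, round_no
        elif round_no - best_round >= patience:
            return best_round, round_no  # early stop
    return best_round, len(scores)  # ran out of rounds without stopping


# Made-up validation RMSE values, one per round.
val_rmse = [9.1, 8.4, 7.9, 7.6, 7.7, 7.65, 7.8, 7.5]
best_round, last_round = find_stopping_point(val_rmse, patience=3)
print(best_round, last_round)
```

With these values the best score occurs at round 4 and the loop halts at round 7, three rounds later. The better score at round 8 is never seen, which is the trade-off a small patience value makes.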
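Example
We first import the necessary libraries and then load the Boston housing dataset. Note that load_boston() was removed from scikit-learn in version 1.2, so this example requires an older scikit-learn release.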
import lightgbm as lgb
from sklearn.datasets import load_boston  # requires scikit-learn < 1.2
from sklearn.model_selection import train_test_split

boston = load_boston()
X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target)
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

train_dataset = lgb.Dataset(X_train, Y_train, feature_name=boston.feature_names.tolist())
test_dataset = lgb.Dataset(X_test, Y_test, feature_name=boston.feature_names.tolist())

# Stop if the validation RMSE has not improved for 5 consecutive rounds.
# The early_stopping_rounds keyword of train() exists only in LightGBM < 4.0.
booster = lgb.train({"objective": "regression", "verbosity": -1, "metric": "rmse"},
                    train_set=train_dataset, valid_sets=(test_dataset,),
                    early_stopping_rounds=5,
                    num_boost_round=100)

from sklearn.metrics import r2_score
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# Display the R2 scores in the console
print("\nR2 Score on Test Set : %.2f"%r2_score(Y_test, test_preds))
print("R2 Score on Train Set : %.2f"%r2_score(Y_train, train_preds))
Output
This produces the following result:
Sizes of Train or Test Datasets: (404, 13) (102, 13) (404,) (102,)
[1] valid_0's rmse: 9.10722
Training until validation scores don't improve for 5 rounds
[2] valid_0's rmse: 8.46389
[3] valid_0's rmse: 7.93394
[4] valid_0's rmse: 7.43812
[5] valid_0's rmse: 7.01845
[6] valid_0's rmse: 6.68186
[7] valid_0's rmse: 6.43834
[8] valid_0's rmse: 6.17357
[9] valid_0's rmse: 5.96725
[10] valid_0's rmse: 5.74169
[11] valid_0's rmse: 5.55389
[12] valid_0's rmse: 5.38595
[13] valid_0's rmse: 5.24832
[14] valid_0's rmse: 5.13373
[15] valid_0's rmse: 5.0457
[16] valid_0's rmse: 4.96688
[17] valid_0's rmse: 4.87874
[18] valid_0's rmse: 4.8246
[19] valid_0's rmse: 4.75342
[20] valid_0's rmse: 4.69854
Did not meet early stopping. Best iteration is:
[20] valid_0's rmse: 4.69854

R2 Score on Test Set: 0.81
R2 Score on Train Set: 0.97
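The next program splits the breast cancer dataset into training and test sets. It trains a LightGBM model to classify tumors as malignant or benign, and stops early if the AUC on the evaluation set does not improve for 3 consecutive rounds. Finally, it predicts on the test and training sets and computes the model's accuracy.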
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")
# Stop if the validation AUC has not improved for 3 consecutive rounds.
# The early_stopping_rounds keyword of fit() exists only in LightGBM < 4.0.
booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),],
            early_stopping_rounds=3)

from sklearn.metrics import accuracy_score
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# With the binary objective, predict() returns probabilities; threshold them at 0.5
test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

# Display the accuracy results
print("\nAccuracy Score on Test Set : %.2f"%accuracy_score(Y_test, test_preds))
print("Accuracy Score on Train Set : %.2f"%accuracy_score(Y_train, train_preds))
Output
This produces the following result:
Sizes of Train or Test Datasets : (426, 30) (143, 30) (426,) (143,)
[1] valid_0's auc: 0.986129
Training until validation scores don't improve for 3 rounds
[2] valid_0's auc: 0.989355
[3] valid_0's auc: 0.988925
[4] valid_0's auc: 0.987097
[5] valid_0's auc: 0.990108
[6] valid_0's auc: 0.993011
[7] valid_0's auc: 0.993011
[8] valid_0's auc: 0.993441
[9] valid_0's auc: 0.993441
[10] valid_0's auc: 0.994194
[11] valid_0's auc: 0.994194
[12] valid_0's auc: 0.994194
[13] valid_0's auc: 0.994409
[14] valid_0's auc: 0.995914
[15] valid_0's auc: 0.996129
[16] valid_0's auc: 0.996989
[17] valid_0's auc: 0.996989
[18] valid_0's auc: 0.996344
[19] valid_0's auc: 0.997204
[20] valid_0's auc: 0.997419
[21] valid_0's auc: 0.997849
[22] valid_0's auc: 0.998065
[23] valid_0's auc: 0.997849
[24] valid_0's auc: 0.998065
[25] valid_0's auc: 0.997634
Early stopping, best iteration is:
[22] valid_0's auc: 0.998065

Accuracy Score on Test Set : 0.97
Accuracy Score on Train Set : 0.98
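As a check on the log above, the following sketch replays the logged AUC values through a simple patience rule. It assumes that only a strictly higher AUC counts as an improvement, which is consistent with this log. The helper name replay_early_stopping is hypothetical, and this is not LightGBM's own code.

```python
def replay_early_stopping(scores, patience):
    """Replay higher-is-better scores; return (best_round, stop_round), 1-based."""
    best_round = 1
    for round_no in range(2, len(scores) + 1):
        if scores[round_no - 1] > scores[best_round - 1]:
            best_round = round_no  # strict improvement resets the counter
        elif round_no - best_round >= patience:
            return best_round, round_no
    return best_round, len(scores)


# valid_0 AUC values copied from the training log above, rounds 1 to 25.
logged_auc = [
    0.986129, 0.989355, 0.988925, 0.987097, 0.990108,
    0.993011, 0.993011, 0.993441, 0.993441, 0.994194,
    0.994194, 0.994194, 0.994409, 0.995914, 0.996129,
    0.996989, 0.996989, 0.996344, 0.997204, 0.997419,
    0.997849, 0.998065, 0.997849, 0.998065, 0.997634,
]
best_round, stop_round = replay_early_stopping(logged_auc, patience=3)
print(f"best round: {best_round}, stopped at round: {stop_round}")
```

Round 24 ties the best AUC from round 22 but does not exceed it, so it does not reset the counter. Rounds 23, 24 and 25 therefore count as three rounds without improvement, and the replay stops at round 25 with round 22 as the best, matching the log.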
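How to stop training early with the "early_stopping()" callback?
LightGBM also supports early stopping through its early_stopping() callback. We call early_stopping() with the number of rounds and pass the resulting callback in the callbacks list of train() or fit(). The callback is used as follows: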
import lightgbm as lgb
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

breast_cancer = load_breast_cancer()
X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target)
print("Sizes of Train or Test Datasets : ", X_train.shape, X_test.shape, Y_train.shape, Y_test.shape)

booster = lgb.LGBMModel(objective="binary", n_estimators=100, metric="auc")
# Stop if the validation AUC has not improved for 3 consecutive rounds
booster.fit(X_train, Y_train,
            eval_set=[(X_test, Y_test),],
            callbacks=[lgb.early_stopping(3)]
            )

from sklearn.metrics import accuracy_score
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)

# With the binary objective, predict() returns probabilities; threshold them at 0.5
test_preds = [1 if pred > 0.5 else 0 for pred in test_preds]
train_preds = [1 if pred > 0.5 else 0 for pred in train_preds]

# Display the accuracy results
print("\nAccuracy Score on Test Set : %.2f"%accuracy_score(Y_test, test_preds))
print("Accuracy Score on Train Set : %.2f"%accuracy_score(Y_train, train_preds))
Output
This produces the following result:
Sizes of Train or Test Datasets : (426, 30) (143, 30) (426,) (143,)
[1] valid_0's auc: 0.954328
Training until validation scores don't improve for 3 rounds
[2] valid_0's auc: 0.959322
[3] valid_0's auc: 0.982938
[4] valid_0's auc: 0.988244
[5] valid_0's auc: 0.987203
[6] valid_0's auc: 0.98762
[7] valid_0's auc: 0.98814
Early stopping, best iteration is:
[4] valid_0's auc: 0.988244

Accuracy Score on Test Set : 0.94
Accuracy Score on Train Set : 0.95