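SageMaker - Training Machine Learning Models

You can train machine learning models with Amazon SageMaker's fully managed training service.

To train a machine learning model, you can use one of SageMaker's built-in algorithms or bring your own model. In either case, SageMaker lets you run the training job efficiently.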

How to Train Models Using Amazon SageMaker?

Let's walk through training a model with SageMaker using the Python program below:

Step 1: Prepare Your Data

First, prepare your data and store it in Amazon S3 in CSV format or another suitable format. Amazon SageMaker reads the data for the training job from S3.
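As a rough illustration of the preparation step: when the built-in XGBoost algorithm is given CSV input, it expects files with no header row and the target value in the first column. The sketch below renders a few in-memory records in that layout. The helper name to_xgboost_csv, the column names, and the sample values are hypothetical and only serve the illustration.

```python
import csv
import io


def to_xgboost_csv(rows, label_key):
    """Render dict rows as headerless CSV with the label in the first column."""
    if not rows:
        return ""
    # Keep the feature columns in a stable, sorted order.
    feature_keys = sorted(k for k in rows[0] if k != label_key)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        # Label first, then the features; no header row is written.
        writer.writerow([row[label_key]] + [row[k] for k in feature_keys])
    return buffer.getvalue()


# Hypothetical sample records, for illustration only.
sample = [
    {"age": 34, "income": 52000, "churned": 0},
    {"age": 51, "income": 87000, "churned": 1},
]
csv_text = to_xgboost_csv(sample, label_key="churned")
print(csv_text)
```

Once the text is saved to a local file, it can be uploaded to S3, for example with the upload_data method of a sagemaker.Session object or with any other S3 client.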

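Step 2: Define the Estimator

Next, define an estimator. The Estimator object configures the training job. In this example, we train the model with the built-in XGBoost algorithm, as shown below: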

import sagemaker
from sagemaker import get_execution_role
from sagemaker.inputs import TrainingInput

# Define your SageMaker session and role
session = sagemaker.Session()
role = get_execution_role()

# Define the XGBoost estimator
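xgboost = sagemaker.estimator.Estimator(
    # The built-in XGBoost image needs an explicit version;
    # "1.5-1" is an assumed choice, pick one available in your region
    image_uri=sagemaker.image_uris.retrieve(
        "xgboost", session.boto_region_name, version="1.5-1"
    ),
    role=role,
    instance_count=1,
    instance_type="ml.m4.xlarge",
    # Replace "your-bucket" with the name of your own S3 bucket
    output_path="s3://your-bucket/output",
    sagemaker_session=session,
)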

# Set hyperparameters
xgboost.set_hyperparameters(objective="binary:logistic", num_round=100)

Step 3: Specify Training Data

Next, tell the training job where its data lives. Use the TrainingInput class to specify the S3 location of each dataset, as shown below:

# Specify training and validation data in S3
train_input = TrainingInput(
    s3_data="s3://your-bucket/train", content_type="csv"
)
validation_input = TrainingInput(
    s3_data="s3://your-bucket/validation", content_type="csv"
)

Step 4: Train the Model

Finally, start the training job by calling the fit method, as shown below:

# Train the model
xgboost.fit({"train": train_input, "validation": validation_input})

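When fit is called, SageMaker automatically provisions the resources, runs the training job, and saves the model output to the specified S3 location.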
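To make the output location concrete: by default the model archive is typically written under the output path as job-name/output/model.tar.gz. The sketch below builds that URI from its parts. The helper name expected_model_uri and the job name are hypothetical; in a real session the authoritative value is the estimator's model_data attribute after fit returns.

```python
def expected_model_uri(output_path, job_name):
    """Build the S3 URI where a training job's model archive is usually written."""
    # Assumes the default layout: <output_path>/<job_name>/output/model.tar.gz
    return f"{output_path.rstrip('/')}/{job_name}/output/model.tar.gz"


# Hypothetical job name, for illustration only.
uri = expected_model_uri("s3://your-bucket/output", "xgboost-2024-01-01-00-00-00-000")
print(uri)
```

After training, printing xgboost.model_data shows the actual location, which is the value to rely on when deploying or downloading the model.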

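Distributed Training with SageMaker

Amazon SageMaker supports distributed training, which lets you scale training across multiple instances. This is useful when you work with large datasets or deep learning models. SageMaker supports frameworks that can train in a distributed fashion, such as TensorFlow and PyTorch.

To enable distributed training, increase the instance_count parameter of the Estimator object. Note that the training script, or the framework's distribution settings, must also be set up to split the work across the instances.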
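Adding instances does not by itself decide which instance sees which data. By default each instance receives the full dataset, while TrainingInput also accepts distribution="ShardedByS3Key" to give each instance only a subset of the S3 objects. The sketch below only illustrates the idea of sharding with a simple round-robin split. The round-robin rule, the helper name shard_keys, and the object keys are assumptions for illustration and not a description of how SageMaker assigns objects internally.

```python
def shard_keys(keys, instance_count):
    """Split object keys across instances in round-robin order (illustrative only)."""
    shards = [[] for _ in range(instance_count)]
    for index, key in enumerate(sorted(keys)):
        # Assumed rule: the i-th key goes to instance i modulo instance_count.
        shards[index % instance_count].append(key)
    return shards


# Hypothetical object keys under the training prefix.
keys = [f"train/part-{i:02d}.csv" for i in range(5)]
shards = shard_keys(keys, instance_count=2)
for instance, shard in enumerate(shards):
    print(instance, shard)
```

Sharding of this kind works best when the training data is stored as several files of similar size, so that each instance receives a comparable share.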

Example

Here is an example using TensorFlow:

from sagemaker.tensorflow import TensorFlow

# Define the TensorFlow estimator with distributed training
tensorflow_estimator = TensorFlow(
    entry_point="train.py",
    role=role,
    instance_count=2,
    instance_type="ml.p3.2xlarge",
    framework_version="2.3",
    py_version="py37",
)

# Train the model on multiple instances
tensorflow_estimator.fit({"train": train_input, "validation": validation_input})

In this example, SageMaker uses two ml.p3.2xlarge instances for distributed training, which can reduce the training time for large models.
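The train.py entry point itself is not shown here. As a minimal sketch of how such a script might discover its place in the cluster, the code below reads the environment variables that SageMaker framework containers set for a training job (SM_HOSTS, SM_CURRENT_HOST, SM_CHANNEL_TRAIN, SM_MODEL_DIR). The helper name read_cluster_info, the fallback defaults, and the idea of treating the first host as the chief are assumptions for illustration; the actual model code would follow.

```python
import json
import os


def read_cluster_info(environ=None):
    """Collect cluster and path settings from SageMaker-style environment variables."""
    env = os.environ if environ is None else environ
    # SM_HOSTS is a JSON list of host names, e.g. '["algo-1", "algo-2"]'.
    hosts = json.loads(env.get("SM_HOSTS", '["algo-1"]'))
    current = env.get("SM_CURRENT_HOST", hosts[0])
    return {
        "hosts": hosts,
        "current_host": current,
        "rank": hosts.index(current),
        # Assumption: treat the first listed host as the one that saves the model.
        "is_chief": current == hosts[0],
        "train_dir": env.get("SM_CHANNEL_TRAIN", "/opt/ml/input/data/train"),
        "model_dir": env.get("SM_MODEL_DIR", "/opt/ml/model"),
    }


# Simulated environment for the second of two instances.
info = read_cluster_info({
    "SM_HOSTS": '["algo-1", "algo-2"]',
    "SM_CURRENT_HOST": "algo-2",
})
print(info["rank"], info["is_chief"])
```

Inside a real training job the function would be called without arguments, so that it reads the values SageMaker provides to the container.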
