Chainer - Creating Neural Networks

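**Creating neural networks** in Chainer is a flexible and intuitive process built on the **define-by-run** approach: the computational graph is constructed dynamically as data flows through the network, so developers can build and modify it at run time. Chainer supports a range of architectures, from simple feed-forward networks to more complex structures such as recurrent and convolutional neural networks.

Because the graph is built dynamically, it is easier to experiment with different network designs, debug problems, and implement advanced models tailored to specific tasks. This flexibility is especially valuable in research and development, where rapid prototyping and iteration are essential.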

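Steps to Create a Neural Network in Chainer

Let's walk through building, training, and testing a simple feed-forward neural network with Chainer. The steps show how the define-by-run approach keeps experimentation with different architectures and training methods simple.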

Installing Chainer

Before building a network, make sure **Chainer** is installed in the working environment. It can be installed with pip:

pip install chainer

Importing the Required Libraries

With Chainer installed, import the components we need: Chain, Variable, optimizers, and the modules that provide the activation and loss functions.

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Chain, optimizers, Variable
import numpy as np

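Defining the Neural Network

In this step we define a network with two hidden layers. Each hidden layer uses the ReLU activation function, and the output layer uses a sigmoid because this is a binary classification task.

The following code defines the network: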

class SimpleNN(Chain):
   def __init__(self):
      super(SimpleNN, self).__init__()
      with self.init_scope():
         self.l1 = L.Linear(None, 10)  # Input to hidden layer 1
         self.l2 = L.Linear(10, 10)    # Hidden layer 1 to hidden layer 2
         self.l3 = L.Linear(10, 1)     # Hidden layer 2 to output layer

   def forward(self, x):
      h1 = F.relu(self.l1(x))
      h2 = F.relu(self.l2(h1))
      y = F.sigmoid(self.l3(h2))  # Sigmoid activation for binary classification
      return y
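
To make the shapes concrete, here is a plain-NumPy sketch of the same forward computation. It is only an illustration: the weights are small random stand-ins for learned parameters, the names W1, b1 and so on are hypothetical (they are not Chainer attributes), and the weights are stored as (inputs, outputs) purely for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # Element-wise max(z, 0)
    return np.maximum(z, 0.0)

def sigmoid(z):
    # Squashes any real value into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative stand-ins for learned parameters (not Chainer attributes)
W1, b1 = 0.1 * rng.normal(size=(5, 10)), np.zeros(10)
W2, b2 = 0.1 * rng.normal(size=(10, 10)), np.zeros(10)
W3, b3 = 0.1 * rng.normal(size=(10, 1)), np.zeros(1)

x = rng.random((4, 5))        # batch of 4 samples, 5 features each
h1 = relu(x @ W1 + b1)        # hidden layer 1: shape (4, 10)
h2 = relu(h1 @ W2 + b2)       # hidden layer 2: shape (4, 10)
y = sigmoid(h2 @ W3 + b3)     # output probabilities: shape (4, 1)

print(h1.shape, h2.shape, y.shape)
```

Each row of y is a single value between 0 and 1, which is why one output unit is enough for a two-class problem.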

Creating the Model and Optimizer

Next, we instantiate the model and choose an optimizer. Here we use the **Adam optimizer**:

# Instantiate the model and optimizer
model = SimpleNN()
optimizer = optimizers.Adam()
optimizer.setup(model)

Preparing the Data

For illustration we generate some dummy data. In practice, this is where we would load a real dataset.

# Generate example data
X_train = np.random.rand(100, 5).astype(np.float32)  # 100 samples, 5 features
y_train = np.random.randint(0, 2, size=(100, 1)).astype(np.int32)  # 100 binary labels (integers)

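Training the Network

In this step we run a manual training loop in Chainer. It iterates over the dataset in mini-batches, runs a forward pass to make predictions, computes the loss, and then uses backpropagation to update the model's weights. The loop runs for a fixed number of epochs and, at the end of each epoch, prints the loss of the last mini-batch so we can track training progress.

We train the network with a simple loop. For each mini-batch in every epoch we perform these steps:

Forward pass
Compute the loss
Backward pass (gradient computation)
Update the weights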
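
To see the four steps in isolation, here is a minimal NumPy sketch that trains a single-layer logistic model by hand with plain gradient descent. It is not how Chainer is used in practice, since Chainer computes the gradients and applies the update for us. The toy labelling rule, the learning rate, and the number of steps are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 5)).astype(np.float32)
# Toy labels (assumption): class 1 when the first feature exceeds 0.5
t = (X[:, :1] > 0.5).astype(np.float32)

w = np.zeros((5, 1), dtype=np.float32)
b = np.zeros(1, dtype=np.float32)
lr = 0.5          # learning rate, chosen arbitrarily
losses = []

for step in range(200):
    # 1. Forward pass: linear score followed by a sigmoid
    z = X @ w + b
    p = 1.0 / (1.0 + np.exp(-z))

    # 2. Compute the loss: mean binary cross-entropy
    loss = -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))
    losses.append(float(loss))

    # 3. Backward pass: the gradient of the mean loss w.r.t. z is (p - t) / N
    grad_z = (p - t) / len(X)
    grad_w = X.T @ grad_z
    grad_b = grad_z.sum(axis=0)

    # 4. Update the weights: one gradient-descent step
    w -= lr * grad_w
    b -= lr * grad_b

print(f"first loss: {losses[0]:.4f}, last loss: {losses[-1]:.4f}")
```

The loss starts at log(2), the value for a model that predicts 0.5 everywhere, and decreases as the weights are updated.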

The following code runs a simple manual training loop in Chainer:

n_epochs = 10
batch_size = 10

for epoch in range(n_epochs):
   for i in range(0, len(X_train), batch_size):
      x_batch = Variable(X_train[i:i+batch_size])
      y_batch = Variable(y_train[i:i+batch_size])

      # Forward pass (calling the model invokes forward())
      y_pred = model(x_batch)

      # Debugging: Print shapes and types
      print(f"x_batch shape: {x_batch.shape}, type: {x_batch.dtype}")
      print(f"y_batch shape: {y_batch.shape}, type: {y_batch.dtype}")
      print(f"y_pred shape: {y_pred.shape}, type: {y_pred.dtype}")

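      # Ensure y_pred and y_batch have the same shape
      if y_pred.shape != y_batch.shape:
         y_pred = F.reshape(y_pred, y_batch.shape)

      # Compute loss. F.sigmoid_cross_entropy expects pre-sigmoid scores,
      # but forward() already applies the sigmoid, so compute binary
      # cross-entropy directly from the predicted probabilities.
      t = y_batch.array.astype(np.float32)
      p = F.clip(y_pred, 1e-6, 1.0 - 1e-6)  # keep log() finite
      loss = -F.mean(F.log(p) * t + F.log(1 - p) * (1 - t))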

      # Backward pass and weight update
      model.cleargrads()
      loss.backward()
      optimizer.update()

   print(f'Epoch {epoch+1}, Loss: {loss.array}')
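
The loss in this loop is binary cross-entropy. It can be computed either from probabilities (after the sigmoid) or from raw scores before the sigmoid, often called logits; Chainer's F.sigmoid_cross_entropy is defined on the latter. The NumPy sketch below shows that both forms give the same value when each is fed the right kind of input. The helper names and sample values are illustrative assumptions.

```python
import numpy as np

def bce_from_probs(p, t, eps=1e-7):
    # Binary cross-entropy from probabilities; clipping keeps log() finite
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(t * np.log(p) + (1 - t) * np.log(1 - p)))

def bce_from_logits(z, t):
    # Equivalent form on pre-sigmoid scores:
    # max(z, 0) - z*t + log(1 + exp(-|z|)) avoids overflow for large |z|
    return float(np.mean(np.maximum(z, 0) - z * t + np.log1p(np.exp(-np.abs(z)))))

z = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])   # illustrative logits
t = np.array([0.0, 0.0, 1.0, 1.0, 1.0])     # illustrative binary targets
p = 1.0 / (1.0 + np.exp(-z))                # probabilities from the logits

print(bce_from_probs(p, t), bce_from_logits(z, t))
```

Feeding probabilities into a function that expects logits applies the sigmoid twice, which compresses the predictions and distorts the gradients, so it is worth checking which form a loss function expects.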

Testing the Model

After training, we test the model on new data. The following code shows how:

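# Test the model
X_test = np.random.rand(10, 5).astype(np.float32)  # 10 samples, 5 features
# Switch to inference mode and skip building the backward graph
with chainer.using_config('train', False), chainer.no_backprop_mode():
   y_test = model(Variable(X_test))
print("Predictions:", y_test.array)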
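
The values printed here are probabilities between 0 and 1. The original does not say how to turn them into class labels; a common convention, assumed here, is to threshold at 0.5. The probabilities below are hypothetical and are not results from the model above.

```python
import numpy as np

# Hypothetical sigmoid outputs for five samples (not results from the model)
probs = np.array([[0.38], [0.62], [0.50], [0.91], [0.07]], dtype=np.float32)

threshold = 0.5  # assumed decision threshold
labels = (probs >= threshold).astype(np.int32)

print(labels.ravel())  # [0 1 1 1 0]
```

A different threshold may be preferable when the two classes are imbalanced or when false positives and false negatives have different costs.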

Saving and Loading the Model

We can save the trained model to a file and load it later for inference, as shown here:

# Save the model
chainer.serializers.save_npz('simple_nn.model', model)

# Load the saved parameters into a fresh instance of the same architecture
loaded_model = SimpleNN()
chainer.serializers.load_npz('simple_nn.model', loaded_model)

Now let's combine all of the steps above into a single script and look at the output of the network built in Chainer:

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import Chain, optimizers, Variable
import numpy as np

class SimpleNN(Chain):
   def __init__(self):
      super(SimpleNN, self).__init__()
      with self.init_scope():
         self.l1 = L.Linear(None, 10)  # Input to hidden layer 1
         self.l2 = L.Linear(10, 10)   # Hidden layer 1 to hidden layer 2
         self.l3 = L.Linear(10, 1)    # Hidden layer 2 to output layer

   def forward(self, x):
      h1 = F.relu(self.l1(x))
      h2 = F.relu(self.l2(h1))
      y = F.sigmoid(self.l3(h2))  # Sigmoid activation for binary classification
      return y

# Instantiate the model and optimizer
model = SimpleNN()
optimizer = optimizers.Adam()
optimizer.setup(model)

# Generate example data
X_train = np.random.rand(100, 5).astype(np.float32)  # 100 samples, 5 features
y_train = np.random.randint(0, 2, size=(100, 1)).astype(np.int32)  # 100 binary labels (integers)

n_epochs = 10
batch_size = 10

for epoch in range(n_epochs):
   for i in range(0, len(X_train), batch_size):
      x_batch = Variable(X_train[i:i+batch_size])
      y_batch = Variable(y_train[i:i+batch_size])

      # Forward pass (calling the model invokes forward())
      y_pred = model(x_batch)

      # Debugging: Print shapes and types
      print(f"x_batch shape: {x_batch.shape}, type: {x_batch.dtype}")
      print(f"y_batch shape: {y_batch.shape}, type: {y_batch.dtype}")
      print(f"y_pred shape: {y_pred.shape}, type: {y_pred.dtype}")

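      # Ensure y_pred and y_batch have the same shape
      if y_pred.shape != y_batch.shape:
         y_pred = F.reshape(y_pred, y_batch.shape)

      # Compute loss. F.sigmoid_cross_entropy expects pre-sigmoid scores,
      # but forward() already applies the sigmoid, so compute binary
      # cross-entropy directly from the predicted probabilities.
      t = y_batch.array.astype(np.float32)
      p = F.clip(y_pred, 1e-6, 1.0 - 1e-6)  # keep log() finite
      loss = -F.mean(F.log(p) * t + F.log(1 - p) * (1 - t))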

      # Backward pass and weight update
      model.cleargrads()
      loss.backward()
      optimizer.update()

   print(f'Epoch {epoch+1}, Loss: {loss.array}')

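# Test the model
X_test = np.random.rand(10, 5).astype(np.float32)  # 10 samples, 5 features
# Switch to inference mode and skip building the backward graph
with chainer.using_config('train', False), chainer.no_backprop_mode():
   y_test = model(Variable(X_test))
print("Predictions:", y_test.array)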

# Save the model
chainer.serializers.save_npz('simple_nn.model', model)

# Load the saved parameters into a fresh instance of the same architecture
loaded_model = SimpleNN()
chainer.serializers.load_npz('simple_nn.model', loaded_model)

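A run of the script produces output like the following (exact values differ between runs because the data and the initial weights are random):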

x_batch shape: (10, 5), type: float32
y_batch shape: (10, 1), type: int32
y_pred shape: (10, 1), type: float32
x_batch shape: (10, 5), type: float32
y_batch shape: (10, 1), type: int32
y_pred shape: (10, 1), type: float32
x_batch shape: (10, 5), type: float32
y_batch shape: (10, 1), type: int32
y_pred shape: (10, 1), type: float32
x_batch shape: (10, 5), type: float32
y_batch shape: (10, 1), type: int32
------------------------------------
------------------------------------
------------------------------------
y_pred shape: (10, 1), type: float32
Epoch 10, Loss: 0.6381329298019409
Predictions: [[0.380848  ]
 [0.40808532]
 [0.35226226]
 [0.42560062]
 [0.3757095 ]
 [0.35753834]
 [0.38465175]
 [0.35967904]
 [0.37653774]
 [0.4149222 ]]

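In this chapter we demonstrated the basic workflow for building and training a neural network with Chainer. From here, we can experiment with different architectures, optimizers, and hyperparameters to see how they affect performance.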
