How to apply a 2D convolution operation in PyTorch?

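We can use the **torch.nn.Conv2d()** module to apply a 2D convolution to an input image composed of several input planes. It is implemented as a layer in a convolutional neural network (CNN). The input to a 2D convolution layer is expected to be of size **[N,C,H,W]**, where **N** is the batch size, **C** is the number of channels, and **H** and **W** are the height and width of the input tensor.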
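As a minimal sketch of the **[N,C,H,W]** layout, the snippet below passes a random batch through a convolution layer. The sizes (4 images, 3 channels, 32x32 pixels, 8 output channels) are arbitrary values chosen for illustration only. A single image stored as [C,H,W] is given a batch dimension with `unsqueeze(0)` first.

```python
import torch
import torch.nn as nn

# Illustrative sizes only: 3 input channels, 8 output channels, 3x3 kernel.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3)

# A batch of 4 random "images": N=4, C=3, H=32, W=32.
batch = torch.rand(4, 3, 32, 32)
out = conv(batch)
print(out.shape)  # torch.Size([4, 8, 30, 30])

# A single [C, H, W] image gets a batch dimension of size 1 first.
single = torch.rand(3, 32, 32)
out_single = conv(single.unsqueeze(0))
print(out_single.shape)  # torch.Size([1, 8, 30, 30])
```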

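Typically, 2D convolution is applied to image tensors. For an RGB image, the number of channels is 3. The main characteristics of a convolution operation are the filter (kernel) size and the stride. This module supports **TensorFloat32**.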
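TensorFloat32 (TF32) is a reduced-precision mode available on NVIDIA Ampere and newer GPUs. As a rough sketch, assuming a PyTorch build that exposes the `torch.backends.cudnn.allow_tf32` flag, we can inspect it and switch it off when full FP32 precision is preferred. On a CPU-only run the flag changes nothing in the results.

```python
import torch

# Flag controlling whether cuDNN convolutions may use TF32
# (only relevant on GPUs that support it).
tf32_before = torch.backends.cudnn.allow_tf32
print("TF32 allowed for cuDNN convolutions:", tf32_before)

# Disable TF32 to request full FP32 precision in convolutions.
torch.backends.cudnn.allow_tf32 = False
tf32_after = torch.backends.cudnn.allow_tf32
print("After disabling:", tf32_after)
```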

Syntax

torch.nn.Conv2d(in_channels, out_channels, kernel_size)

Parameters

**in_channels** – Number of channels in the input image.
**out_channels** – Number of channels produced by the convolution.
**kernel_size** – Size of the convolving kernel.
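To see how these three parameters determine the learnable tensors of the layer, here is a small sketch that inspects the weight and bias shapes. The values 2, 3 and 2 are illustrative choices.

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=2, out_channels=3, kernel_size=2)

# Weight layout: [out_channels, in_channels, kernel_height, kernel_width]
print(conv.weight.shape)  # torch.Size([3, 2, 2, 2])

# One bias value per output channel
print(conv.bias.shape)    # torch.Size([3])

# Total learnable parameters: 3*2*2*2 weights + 3 biases
n_params = sum(p.numel() for p in conv.parameters())
print(n_params)           # 27
```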

Besides these three parameters, there are optional parameters such as **stride, padding, dilation**, etc. We cover them in the examples below.
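The effect of **stride**, **padding** and **dilation** on the output size follows the formula given in the PyTorch documentation: out = floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride) + 1, applied separately to height and width. The helper `conv2d_out_size` below is a hypothetical name introduced only for this sketch. It is checked against what the layer actually produces.

```python
import torch
import torch.nn as nn

def conv2d_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension (hypothetical helper)."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

x = torch.rand(1, 2, 4, 4)  # N=1, C=2, H=4, W=4
conv = nn.Conv2d(2, 3, kernel_size=(2, 3), stride=(2, 1),
                 padding=(2, 1), dilation=(2, 1))

# First tuple entry applies to height, second to width.
expected_h = conv2d_out_size(4, kernel=2, stride=2, padding=2, dilation=2)
expected_w = conv2d_out_size(4, kernel=3, stride=1, padding=1, dilation=1)
out = conv(x)

print(expected_h, expected_w)  # 3 4
print(out.shape)               # torch.Size([1, 3, 3, 4])
```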

Steps

You can apply a 2D convolution operation with the following steps:

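Import the required libraries. In all the examples below, the required Python library is **torch**. Make sure you have installed it. To apply 2D convolution to an image, we also need **torchvision** and **Pillow**.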

import torch
import torchvision
from PIL import Image

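Define the **input** tensor or read the input image. If the input is an image, first convert it to a torch tensor.
Define **in_channels, out_channels, kernel_size** and any other parameters.
Next, define the convolution operation conv by passing the parameters defined above to **torch.nn.Conv2d()**.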

conv = torch.nn.Conv2d(in_channels, out_channels, kernel_size)

Apply the convolution operation conv to the input tensor or image tensor.

output = conv(input)

Next, print the tensor obtained after the convolution. If the input was an image tensor, first convert the resulting tensor to a PIL image and then display it.

Let's look at a couple of examples for a better understanding.

Input Image

We will use the following image (dogncat.jpg) as the input file in Example 2.

Example 1

In the Python example below, we perform a 2D convolution on an input tensor. We apply different combinations of **kernel_size, stride, padding** and **dilation**.

# Python 3 program to perform 2D convolution operation
import torch
import torch.nn as nn

'''torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0)
'''
in_channels = 2
out_channels = 3
kernel_size = 2
conv = nn.Conv2d(in_channels, out_channels, kernel_size)

# conv = nn.Conv2d(2, 3, 2)

'''input of size [N,C,H, W]
N==>batch size,
C==> number of channels,
H==> height of input planes in pixels,
W==> width in pixels.
'''

# define the input with below info
N=2
C=2
H=4
W=4
input = torch.empty(N,C,H,W).random_(256)
print("Input Tensor:\n", input)
print("Input Size:",input.size())

# Perform convolution operation
output = conv(input)
print("Output Tensor:\n", output)
print("Output Size:",output.size())

# With square kernels (2,2) and equal stride
conv = nn.Conv2d(2, 3, 2, stride=2)
output = conv(input)
print("Output Size:",output.size())

# non-square kernels and unequal stride and with padding
conv = nn.Conv2d(2, 3, (2, 3), stride=(2, 1), padding=(2, 1))
output = conv(input)
print("Output Size:",output.size())

# non-square kernels and unequal stride and with padding and dilation
conv = nn.Conv2d(2, 3, (2, 3), stride=(2, 1), padding=(2, 1),
                 dilation=(2, 1))
output = conv(input)
print("Output Size:",output.size())

Output

Input Tensor:
   tensor([[[[218., 190., 62., 113.],
      [244., 63., 207., 220.],
      [238., 110., 29., 131.],
      [ 65., 249., 183., 188.]],

      [[122., 250., 28., 126.],
      [ 10., 42., 4., 145.],
      [ 1., 122., 165., 189.],
      [ 59., 100., 1., 187.]]],

      [[[213., 18., 186., 162.],
      [121., 10., 107., 123.],
      [ 32., 129., 5., 227.],
      [ 76., 4., 196., 246.]],

      [[ 41., 191., 64., 195.],
      [146., 163., 39., 177.],
      [121., 84., 223., 144.],
      [ 44., 182., 25., 15.]]]])
Input Size: torch.Size([2, 2, 4, 4])
Output Tensor:
   tensor([[[[ 200.8638, 67.4519, 109.4424],
      [ 100.6047, 58.4399, 95.3557],
      [ 89.4536, 105.6236, 138.5873]],

      [[ -71.7612, -69.3269, 14.8537],
      [ -48.7640, -111.0042, -163.9681],
      [ -60.4490, 0.4771, -34.4785]],

      [[ -74.8413, -156.2264, -51.3553],
      [ -47.2120, -25.1986, -65.1617],
      [-109.8461, -68.7073, -47.6045]]],

      [[[ 90.5058, 51.1314, 138.2387],
      [ 62.8581, 62.5389, 56.5713],
      [ 78.0566, 57.6294, 143.0357]],

      [[-154.6399, -100.9079, -108.6138],
      [ -99.6024, -120.7665, -112.6453],
      [-107.5664, -76.9361, 17.8084]],

      [[ 23.9299, -95.5887, -51.7418],
      [ -46.8106, 15.3651, -66.4384],
      [ 2.1374, -65.6986, -144.9656]]]],
   grad_fn=<MkldnnConvolutionBackward>)
Output Size: torch.Size([2, 3, 3, 3])
Output Size: torch.Size([2, 3, 2, 2])
Output Size: torch.Size([2, 3, 4, 4])
Output Size: torch.Size([2, 3, 3, 4])

Example 2

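In the Python example below, we perform a 2D convolution on an input image. To apply the convolution, we first convert the image to a torch tensor, and after the convolution we convert it back to a PIL image for visualization.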

# Python program to perform 2D convolution operation on an image
# Import the required libraries
import torch
import torchvision
from PIL import Image
import torchvision.transforms as T

# Read input image
img = Image.open('dogncat.jpg')

# convert the input image to torch tensor
img = T.ToTensor()(img)
print("Input image size:", img.size()) # size = [3, 525, 700]

# unsqueeze the image to make it 4D tensor
img = img.unsqueeze(0) # image size = [1, 3, 525, 700]
# define convolution layer
# conv = nn.Conv2d(in_channels, out_channels, kernel_size)
conv = torch.nn.Conv2d(3, 3, 2)

# apply convolution operation on image
img = conv(img)

# squeeze image to make it 3D
img = img.squeeze(0) # size is now [3, 524, 699] (2x2 kernel, no padding)
print("Output image size:", img.size())

# convert image to PIL image
img = T.ToPILImage()(img)

# display the image after convolution
img.show()

**Note** - Because the weights are initialized randomly and may differ between runs, you may get a different output image after the convolution.

Output

Input image size: torch.Size([3, 525, 700])
Output image size: torch.Size([3, 524, 699])

Note that because the **weights** and **bias** are randomly initialized, the resulting image may change somewhat from one run to the next.
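If repeatable results are wanted, one common approach is to seed PyTorch's random number generator before creating the layer. The sketch below uses the seed value 0 and a small random input, both arbitrary choices. It assumes a CPU run, and shows two layers built after the same seed receiving identical weights and producing identical outputs.

```python
import torch
import torch.nn as nn

# Seeding before construction fixes the random initialization.
torch.manual_seed(0)
conv_a = nn.Conv2d(3, 3, 2)

torch.manual_seed(0)
conv_b = nn.Conv2d(3, 3, 2)

x = torch.rand(1, 3, 8, 8)

same_weights = torch.equal(conv_a.weight, conv_b.weight)
same_output = torch.equal(conv_a(x), conv_b(x))
print(same_weights, same_output)  # True True
```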

Shahid Akhtar Khan

Updated on: 25 January 2022
