How to apply a 2D transposed convolution operation in PyTorch?


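We can use the **torch.nn.ConvTranspose2d()** module to apply a 2D transposed convolution to an input image composed of several input planes. This module can be seen as the gradient of **Conv2d** with respect to its input.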
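
To make the "gradient of Conv2d" remark concrete, here is a minimal sketch with arbitrary, illustrative shapes and random values. It uses autograd to compute the gradient of a stride-2 `F.conv2d` with respect to its input, then compares that gradient with `F.conv_transpose2d` applied to the upstream gradient using the same weight tensor.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Illustrative shapes: a 2-channel 5x5 input and a 3x3 kernel mapping 2 -> 3 channels
x = torch.randn(1, 2, 5, 5, requires_grad=True)
w = torch.randn(3, 2, 3, 3)  # Conv2d weight layout: (out_channels, in_channels, kH, kW)

# Forward convolution: 5x5 input with stride 2 -> 2x2 output with 3 channels
y = F.conv2d(x, w, stride=2)
g = torch.randn_like(y)  # an arbitrary upstream gradient dL/dy

# Gradient of the convolution with respect to its input, computed by autograd
(grad_x,) = torch.autograd.grad(y, x, grad_outputs=g)

# The same weight read as a transposed-convolution weight: (in=3, out=2, kH, kW)
manual = F.conv_transpose2d(g, w, stride=2)

print(y.shape, manual.shape)  # torch.Size([1, 3, 2, 2]) torch.Size([1, 2, 5, 5])
print(torch.allclose(grad_x, manual, atol=1e-5))  # True
```
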

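The input to a 2D transposed convolution layer must have size **[N,C,H,W]**, where **N** is the batch size, **C** is the number of channels, and **H** and **W** are the height and width of the input image. (Recent PyTorch versions also accept an unbatched **[C,H,W]** input.)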
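
A minimal sketch of this layout, with sizes chosen only for illustration: a batch of two 3-channel 32×32 images passes through a layer whose `in_channels` matches C, and a single image tensor of shape [C,H,W] gets a batch dimension via `unsqueeze(0)`.

```python
import torch
import torch.nn as nn

# A batch of N=2 RGB images (C=3) with height H=32 and width W=32
batch = torch.randn(2, 3, 32, 32)

# in_channels must equal C of the input
convt = nn.ConvTranspose2d(in_channels=3, out_channels=8, kernel_size=3)
out = convt(batch)
print(out.shape)  # torch.Size([2, 8, 34, 34])

# A single image stored as [C, H, W] gets a batch dimension of size 1
single = torch.randn(3, 32, 32)
single_4d = single.unsqueeze(0)
print(single_4d.shape)  # torch.Size([1, 3, 32, 32])
```
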

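A 2D transposed convolution is typically applied to image tensors; for an RGB image, the number of channels is 3. Its main characteristics are the filter (kernel) size and the stride. This module supports **TensorFloat32** (TF32) on GPUs that provide it.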
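
Kernel size and stride together determine how much the spatial size grows. In this illustrative sketch with random data, a 2×2 kernel with stride 2 exactly doubles the height and width, which is why that configuration is common for learned upsampling. With stride 1, each dimension grows only by `kernel_size - 1` pixels.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 16, 16)  # one 3-channel 16x16 feature map

# kernel_size=2, stride=1: each spatial dimension grows by kernel_size - 1
up_s1 = nn.ConvTranspose2d(3, 3, kernel_size=2, stride=1)
# kernel_size=2, stride=2: each spatial dimension is doubled
up_s2 = nn.ConvTranspose2d(3, 3, kernel_size=2, stride=2)

print(up_s1(x).shape)  # torch.Size([1, 3, 17, 17])
print(up_s2(x).shape)  # torch.Size([1, 3, 32, 32])
```
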

Syntax

torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size)

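Parameters

  • **in_channels** – Number of channels in the input image.

  • **out_channels** – Number of channels produced by the transposed convolution.

  • **kernel_size** – Size of the convolving kernel.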

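Besides these three required parameters, ConvTranspose2d accepts optional parameters such as **stride, padding, output_padding, dilation, groups** and **bias**. The Python examples below use several of them.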
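
These parameters determine the output height and width. According to the PyTorch documentation, each spatial dimension follows H_out = (H_in − 1) × stride − 2 × padding + dilation × (kernel_size − 1) + output_padding + 1. The sketch below implements that formula in a helper, `convt_out_size` (a hypothetical name, not a PyTorch function), and compares it with the actual layer for the configurations used in Example 1.

```python
import torch
import torch.nn as nn

def convt_out_size(size, kernel_size, stride=1, padding=0, dilation=1, output_padding=0):
    """Hypothetical helper: output length of one spatial dimension of ConvTranspose2d."""
    return (size - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1

x = torch.randn(1, 2, 4, 4)  # same input shape as Example 1

# Per-configuration (kernel_size, stride, padding, dilation), each given as an (H, W) pair
configs = [
    ((2, 2), (1, 1), (0, 0), (1, 1)),
    ((3, 3), (2, 2), (0, 0), (1, 1)),
    ((3, 5), (2, 1), (4, 2), (1, 1)),
    ((3, 5), (2, 1), (4, 2), (3, 1)),
]

results = []
for k, s, p, d in configs:
    layer = nn.ConvTranspose2d(2, 3, k, stride=s, padding=p, dilation=d)
    actual = tuple(layer(x).shape[2:])
    predicted = tuple(convt_out_size(4, k[i], s[i], p[i], d[i]) for i in range(2))
    results.append((actual, predicted))
    print(actual, predicted)
```

Each printed pair should match, giving (5, 5), (9, 9), (1, 4) and (5, 4), the same sizes reported in Example 1's output.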

Steps

You can apply a 2D transposed convolution with the following steps:

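  • Import the required libraries. In all the following examples, the required Python library is **torch**; make sure you have already installed it. To apply a 2D transposed convolution to an image, we also need **torchvision** and **Pillow**.

import torch
import torch.nn as nn
import torchvision
from PIL import Image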
  • Define the **input** tensor or read the input image. If the input is an image, first convert it to a torch tensor.

  • Define **in_channels, out_channels, kernel_size** and any other parameters.

  • Next, define the transposed convolution operation convt by passing the parameters defined above to **torch.nn.ConvTranspose2d()**.

convt = nn.ConvTranspose2d(in_channels, out_channels, kernel_size)
  • Apply the transposed convolution convt to the input tensor or image tensor.

output = convt(input)
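  • Finally, print the tensor obtained after the transposed convolution. If the input was an image tensor, convert the resulting tensor back to a PIL image to visualize it.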
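
Putting the steps together gives a minimal end-to-end sketch on a random tensor. The shape [1, 3, 8, 8] and the layer settings are arbitrary illustrative choices. For an image, you would replace the random tensor with the image converted by `torchvision.transforms.ToTensor()` and then passed through `unsqueeze(0)`.

```python
import torch
import torch.nn as nn

# Define the input tensor of shape [N, C, H, W]
x = torch.rand(1, 3, 8, 8)

# Define the layer parameters
in_channels, out_channels, kernel_size = 3, 4, 3

# Define the transposed convolution
convt = nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=2, padding=1)

# Apply it to the input
output = convt(x)

# Inspect the result: (8 - 1) * 2 - 2 * 1 + (3 - 1) + 1 = 15
print("Output Size:", output.size())  # Output Size: torch.Size([1, 4, 15, 15])
```
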

Let's look at some examples to understand this more clearly.

Input Image

We will use the following image (car.jpg) as the input file in Example 2.

Example 1

In the following Python example, we perform a 2D transposed convolution on an input tensor, using different combinations of **kernel_size, stride, padding** and **dilation**.

# Python 3 program to perform 2D transpose convolution operation
import torch
import torch.nn as nn

'''torch.nn.ConvTranspose2d(in_channels, out_channels, kernel_size, stride=1, padding=0)

'''

in_channels = 2
out_channels = 3
kernel_size = 2

convt = nn.ConvTranspose2d(in_channels, out_channels, kernel_size)

# conv = nn.ConvTranspose2d(3, 6, 2)

'''input of size [N,C,H, W]
N==>batch size,
C==> number of channels,
H==> height of input planes in pixels,
W==> width in pixels.
'''

# define the input with below info
N=1
C=2
H=4
W=4
input = torch.empty(N,C,H,W).random_(256)
# input = torch.randn(2,3,32,64)
print("Input Tensor:\n", input)
print("Input Size:", input.size())

# Perform transpose convolution operation
output = convt(input)
print("Output Tensor:\n", output)
print("Output Size:", output.size())

# With square kernels (3,3) and equal stride
convt = nn.ConvTranspose2d(2, 3, 3, stride=2)
output = convt(input)
print("Output Size:", output.size())

# non-square kernels and unequal stride and with padding
convt = nn.ConvTranspose2d(2, 3, (3, 5), stride=(2, 1), padding=(4, 2))
output = convt(input)
print("Output Size:", output.size())

# non-square kernels and unequal stride and with padding and dilation
convt = nn.ConvTranspose2d(2, 3, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
output = convt(input)
print("Output Size:", output.size())

Output

Input Tensor:
   tensor([[[[115., 76., 102., 6.],
      [221., 173., 23., 205.],
      [123., 23., 112., 18.],
      [189., 178., 167., 143.]],

      [[239., 180., 226., 88.],
      [224., 30., 196., 224.],
      [ 57., 222., 47., 84.],
      [ 25., 255., 201., 114.]]]])
Input Size: torch.Size([1, 2, 4, 4])
Output Tensor:
   tensor([[[[ 48.1156, 64.6112, 64.9630, 47.2604, 3.9925],
      [74.9169, 80.7055, 138.8992, 82.8471, 54.3722],
      [20.0938, 49.5610, 30.2914, 93.3563, 3.1597],
      [-27.1410, 118.8138, 92.8670, 50.6170, 37.5564],
      [-27.7676, 6.5762, 33.6408, 6.7176, -8.8372]],
      [[ -18.2188, -56.5362, -49.8063, -43.3336, -16.8645],
      [ -23.4012, -6.1607, 40.5064, -17.4547, -25.1738],
      [ -5.7752, 53.6838, -27.9412, 36.7660, 44.0866],
      [ -23.5205, 1.1443, -29.0826, -34.7213, -4.1535],
      [ 5.6746, 38.4026, 72.8414, 59.2990, 34.9241]],
      [[ -35.0380, -31.4031, -38.0059, -19.3247, -5.6272],
      [-109.2401, -12.9763, -62.2776, -31.0825, 19.2766],
      [ -93.6596, -18.5403, -67.5457, -61.8533, 32.3005],
      [ -27.7020, -71.3938, -18.9532, -26.8304, 20.0184],
      [ -29.2334, -85.8179, -35.4292, -16.4065, 19.0788]]]],
   grad_fn=<SlowConvTranspose2DBackward>)
Output Size: torch.Size([1, 3, 5, 5])
Output Size: torch.Size([1, 3, 9, 9])
Output Size: torch.Size([1, 3, 1, 4])
Output Size: torch.Size([1, 3, 5, 4])
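
The last two sizes follow from the output-size formula in the PyTorch documentation, H_out = (H_in − 1)·stride − 2·padding + dilation·(kernel_size − 1) + output_padding + 1. Applying it per dimension with H_in = W_in = 4 and output_padding = 0 gives:

```latex
\begin{aligned}
\text{padding only:}\quad
H_{out} &= (4-1)\cdot 2 - 2\cdot 4 + 1\cdot(3-1) + 0 + 1 = 1, &
W_{out} &= (4-1)\cdot 1 - 2\cdot 2 + 1\cdot(5-1) + 0 + 1 = 4,\\
\text{with dilation:}\quad
H_{out} &= (4-1)\cdot 2 - 2\cdot 4 + 3\cdot(3-1) + 0 + 1 = 5, &
W_{out} &= (4-1)\cdot 1 - 2\cdot 2 + 1\cdot(5-1) + 0 + 1 = 4.
\end{aligned}
```

The large padding is what shrinks the height to 1 in the third case, and the dilation of 3 in the fourth case enlarges the effective kernel height and restores it to 5.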

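Example 2

In the following Python example, we perform a 2D transposed convolution on an input image. We first convert the image to a torch tensor, and after the transposed convolution we convert the result back to a PIL image for visualization.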

# Python program to perform 2D transpose convolution operation
# Import the required libraries
import torch
import torchvision
from PIL import Image
import torchvision.transforms as T

# Read input image
img = Image.open('car.jpg').convert('RGB') # ensure exactly 3 channels

# convert the input image to torch tensor
img = T.ToTensor()(img)
print("Input image size:", img.size()) # size = [3, 464, 700]

# unsqueeze the image to make it 4D tensor
img = img.unsqueeze(0) # image size = [1, 3, 464, 700]

# define transpose convolution layer
# convt = nn.ConvTranspose2d(in_channels, out_channels, kernel_size)
convt = torch.nn.ConvTranspose2d(3, 3, 2)

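# apply transpose convolution operation on image
# (no_grad: we only need the result, not gradients)
with torch.no_grad():
    img = convt(img)

# squeeze image to make it 3D
img = img.squeeze(0) # now image is again 3D
print("Output image size:", img.size())

# clamp values to [0, 1] so ToPILImage maps them correctly to 0-255
img = img.clamp(0, 1)

# convert image to PIL image
img = T.ToPILImage()(img)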

# display the image after convolution
img.show()

'''
Note: You may get different output image after the convolution operation
because the weights initialized may be different at different runs.
'''

Output

Input image size: torch.Size([3, 464, 700])
Output image size: torch.Size([3, 465, 701])
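
With `kernel_size=2` and the default `stride=1`, each spatial dimension grows by `kernel_size - 1 = 1`, which explains 464×700 → 465×701. If you instead want a transposed convolution to exactly double the image size, one option is to pass the target size through the `output_size` argument of the layer's forward call, which lets PyTorch infer the required `output_padding`. The sketch below uses a random tensor of the same shape in place of car.jpg, and the kernel 3, stride 2 and padding 1 settings are illustrative choices.

```python
import torch
import torch.nn as nn

# Random stand-in for the 1 x 3 x 464 x 700 image tensor from Example 2
img = torch.rand(1, 3, 464, 700)

convt = nn.ConvTranspose2d(3, 3, kernel_size=3, stride=2, padding=1)

with torch.no_grad():
    # Without output_size: (464 - 1) * 2 - 2 + 2 + 1 = 927 rows, 1399 columns
    default = convt(img)
    # Requesting exactly 2x the input size makes PyTorch use output_padding = 1
    doubled = convt(img, output_size=[928, 1400])

print(default.shape)  # torch.Size([1, 3, 927, 1399])
print(doubled.shape)  # torch.Size([1, 3, 928, 1400])
```
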

Note that because the **weights** and **bias** are randomly initialized, the resulting image may vary from run to run.
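
If you need identical results across runs (for example, while debugging), one option is to fix PyTorch's random seed before creating the layer, so the randomly initialized weights and bias are the same every time. Below is a minimal sketch using a hypothetical helper, `make_layer`, and an arbitrary seed of 0.

```python
import torch
import torch.nn as nn

def make_layer(seed):
    """Hypothetical helper: create the Example 2 layer with a fixed seed."""
    torch.manual_seed(seed)  # makes the random weight/bias initialization repeatable
    return nn.ConvTranspose2d(3, 3, 2)

a = make_layer(0)
b = make_layer(0)

print(torch.equal(a.weight, b.weight), torch.equal(a.bias, b.bias))  # True True
```
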

Updated on: 25-Jan-2022
