How to draw bounding boxes on an image in PyTorch?

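The torchvision.utils package provides the draw_bounding_boxes() function for drawing bounding boxes on an image. It accepts images as torch tensors of shape (C x H x W), where C is the number of channels and H and W are the image height and width.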

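If we read the image with Pillow or OpenCV, we first have to convert it to a torch tensor. We can draw one or more bounding boxes on an image. The function returns an image tensor of dtype uint8 with the bounding boxes drawn on it.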
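The conversion mentioned above can be sketched as follows. This is a minimal illustration, assuming the image is already available as a NumPy array in (H, W, C) layout, which is what OpenCV returns and what np.asarray() gives for a Pillow RGB image. The zero-filled array and the variable names are placeholders, not part of the original tutorial. Note that OpenCV loads channels in BGR order, so the channels may need to be reversed first.

```python
import numpy as np
import torch

# Stand-in for an image loaded with Pillow or OpenCV (assumption):
# height 4, width 6, 3 channels, dtype uint8.
hwc_array = np.zeros((4, 6, 3), dtype=np.uint8)

# Reorder the axes from (H, W, C) to (C, H, W) and make the memory contiguous.
chw_tensor = torch.from_numpy(hwc_array).permute(2, 0, 1).contiguous()

print(chw_tensor.shape)  # torch.Size([3, 4, 6])
print(chw_tensor.dtype)  # torch.uint8
```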

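The bounding boxes should be a torch tensor of size [N,4], where N is the number of bounding boxes to draw. Each bounding box holds four coordinates in (xmin, ymin, xmax, ymax) format, where (xmin, ymin) is the top-left corner and (xmax, ymax) is the bottom-right corner. In other words: 0 ≤ xmin < xmax < W and 0 ≤ ymin < ymax < H.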
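The coordinate constraints above can be checked before drawing. The helper below is a hypothetical sketch (boxes_are_valid is not a torchvision function) that tests the two inequalities for every row of an [N,4] tensor.

```python
import torch


def boxes_are_valid(boxes: torch.Tensor, height: int, width: int) -> bool:
    """Hypothetical helper: True if every box satisfies
    0 <= xmin < xmax < W and 0 <= ymin < ymax < H."""
    if boxes.ndim != 2 or boxes.shape[1] != 4:
        return False
    # Split the [N, 4] tensor into four length-N columns.
    xmin, ymin, xmax, ymax = boxes.unbind(dim=1)
    x_ok = (xmin >= 0) & (xmin < xmax) & (xmax < width)
    y_ok = (ymin >= 0) & (ymin < ymax) & (ymax < height)
    return bool((x_ok & y_ok).all())


good = torch.tensor([[30, 45, 330, 450]])
bad = torch.tensor([[330, 45, 30, 450]])  # xmin is larger than xmax

print(boxes_are_valid(good, height=500, width=700))  # True
print(boxes_are_valid(bad, height=500, width=700))   # False
```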

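We can also add labels to the bounding boxes and adjust their color and line width. In addition, the area inside each bounding box can be filled with the specified color.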

Syntax

torchvision.utils.draw_bounding_boxes(image, boxes)

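Parameters

image - an image tensor of shape (C x H x W).
boxes - a tensor of size [N,4] containing the bounding box coordinates in (xmin, ymin, xmax, ymax) format.

It also accepts further optional parameters such as labels, colors, fill, width, and so on.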

Output

It returns an image tensor of size [C,H,W] with the bounding boxes drawn on it.

Steps

Import the required libraries. In all the following examples, the required Python libraries are torch and torchvision. Make sure you have installed them.

import torch
import torchvision
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

Read a JPEG or PNG image with the read_image() function. Specify the full image path including the file extension (.jpg or .png). The output of this function is a torch tensor of size [image_channels, image_height, image_width].

img = read_image('cat.png')

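Define the bounding box as a torch tensor. The dtype of the bounding box tensor should be torch.int. If only a single bounding box is to be drawn, unsqueeze the tensor so that its size becomes [1,4].

bbox = [290, 115, 405, 385]
bbox = torch.tensor(bbox, dtype=torch.int)
bbox = bbox.unsqueeze(0)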

Draw the bounding box on the image with the draw_bounding_boxes() function. Optionally, assign the image with the drawn bounding box to a new variable.

img = draw_bounding_boxes(img, bbox, width=3, colors=(255, 255, 0))

Convert the image tensor with the drawn bounding box to a PIL image and display it.

img = torchvision.transforms.ToPILImage()(img)
img.show()

Input images

The following examples use the images cat.png and catndog.png as input files.

Example 1

The following program demonstrates how to draw a bounding box on an image.

# Import the required libraries
import torch
import torchvision
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

# read input image
img = read_image('cat.png')

# bounding box in (xmin, ymin, xmax, ymax) format
# top-left point=(xmin, ymin), bottom-right point = (xmax, ymax)
bbox = [290, 115, 405, 385]
bbox = torch.tensor(bbox, dtype=torch.int)
print(bbox)
print(bbox.size())
bbox = bbox.unsqueeze(0)
print(bbox.size())

# draw bounding box on the input image
img=draw_bounding_boxes(img, bbox, width=3, colors=(255,255,0))

# transform it to PIL image and display
img = torchvision.transforms.ToPILImage()(img)
img.show()

Output

tensor([290, 115, 405, 385], dtype=torch.int32)
torch.Size([4])
torch.Size([1, 4])

Example 2

The following program demonstrates how to draw multiple bounding boxes on an image.

import torch
import torchvision
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes

img = read_image('catndog.png')

# bounding box in (xmin, ymin, xmax, ymax) format
bbox1 = [30, 45, 330, 450]
bbox2 = [320, 150, 690, 460]
bbox = [bbox1, bbox2]
bbox = torch.tensor(bbox, dtype=torch.int)
print(bbox)
print(bbox.size())

# draw bounding boxes on the input image
img = draw_bounding_boxes(img, bbox, width=3,
                          colors=[(255, 0, 0), (0, 255, 0)])
img = torchvision.transforms.ToPILImage()(img)
img.show()

Output

tensor([[ 30,  45, 330, 450],
        [320, 150, 690, 460]], dtype=torch.int32)
torch.Size([2, 4])

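Example 3

The following program demonstrates how to draw multiple bounding boxes on an image, label them, and fill them with color.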

import torch
import torchvision
from torchvision.io import read_image
from torchvision.utils import draw_bounding_boxes
img = read_image('catndog.png')

# bounding box in (xmin, ymin, xmax, ymax) format
bbox1 = [30, 45, 330, 450]
bbox2 = [320, 150, 690, 460]
bbox = [bbox1, bbox2]
labels = ['Cat', 'Dog']
bbox = torch.tensor(bbox, dtype=torch.int)
print(bbox)
print(bbox.size())

# draw bounding boxes with fill color
img = draw_bounding_boxes(img, bbox, width=3, labels=labels,
                          colors=[(255, 0, 0), (0, 255, 0)],
                          fill=True, font_size=20)
img = torchvision.transforms.ToPILImage()(img)
img.show()

Output

tensor([[ 30,  45, 330, 450],
        [320, 150, 690, 460]], dtype=torch.int32)
torch.Size([2, 4])

Shahid Akhtar Khan

Updated on: 20 January 2022