Multiclass Image Classification Using Transfer Learning

Introduction

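Image classification is one of the most common deep learning tasks on image data. The development of new high-performance machine learning frameworks has made it an increasingly active research area. Classification can be binary, with two classes of images, or multiclass, with more than two. In this article, we explore multiclass image classification using transfer learning.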

Multiclass Image Classification

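Advances in artificial neural networks, and the development of convolutional neural networks (CNNs) in particular, have made complex operations on images easier and have driven the growth of tasks such as multiclass image classification, image segmentation, and object detection.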

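Multiclass image classification is one of the most basic yet powerful computer vision tasks that a CNN can perform. The dataset contains images from more than two classes, each labeled with its class (for example, CIFAR or Fashion MNIST).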

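For classification, we can prepare our own labeled dataset or download an existing image dataset such as CIFAR-10. If each class has only a few images, preprocessing can include image augmentation to increase the variety in the dataset.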
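To make the idea of augmentation concrete, here is a minimal NumPy sketch that randomly mirrors and shifts a batch of images. The function name `augment_batch` and the `pad` parameter are hypothetical names introduced for illustration, and the random input stands in for real images; this is not the augmentation pipeline used later in the article.

```python
import numpy as np

def augment_batch(images, rng, pad=2):
    """Randomly flip and shift a batch of images shaped (N, H, W, C)."""
    n, h, w, _ = images.shape
    out = np.empty_like(images)
    # Reflect-pad height and width so shifted crops stay filled with image content.
    padded = np.pad(images, ((0, 0), (pad, pad), (pad, pad), (0, 0)), mode="reflect")
    for i in range(n):
        # Pick a random crop offset; an offset equal to `pad` means no shift.
        top = rng.integers(0, 2 * pad + 1)
        left = rng.integers(0, 2 * pad + 1)
        crop = padded[i, top:top + h, left:left + w]
        # Mirror left-to-right with probability 0.5.
        if rng.random() < 0.5:
            crop = crop[:, ::-1]
        out[i] = crop
    return out

rng = np.random.default_rng(0)
# Random pixels stand in for a batch of eight 32x32 RGB images.
batch = rng.integers(0, 256, size=(8, 32, 32, 3), dtype=np.uint8)
augmented = augment_batch(batch, rng)
print(augmented.shape, augmented.dtype)  # (8, 32, 32, 3) uint8
```

Each call produces a slightly different version of the same images, which is how augmentation adds variety without collecting new data.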

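To train a model, we can build the architecture from scratch in any deep learning framework (such as TensorFlow or PyTorch), or use a ready-made backbone architecture (such as VGG16 or ResNet). The advantage of the latter is that we do not have to design the architecture ourselves; we only need to fine-tune the model or replace the last one or two layers for our use case. This is where transfer learning comes in, an intuitive technique for training image models.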

What Is Transfer Learning and Why Is It Important?

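Transfer learning is a research problem in machine learning. It stores the knowledge gained while solving one problem and applies it to a different but related problem. For example, knowledge gained while learning to recognize cats can be applied when trying to recognize cheetahs. In deep learning, transfer learning is a technique in which a neural network is first trained on a problem similar to the one being solved. It shortens training time and can lead to lower generalization error. We can take a model pretrained on another dataset (such as ImageNet) and modify the last layer to suit our task. This saves the time, effort, and resources needed to build and train a model from scratch. Such pretrained models encode a large number of image patterns learned through extensive training on the images they were trained on.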
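One common way to apply this idea in Keras is to freeze the pretrained convolutional base and train only a small new classification head. The sketch below is an illustration under stated assumptions, not the article's configuration: the helper name `build_transfer_model`, the head sizes, and the choice of the Adam optimizer are all hypothetical.

```python
from keras import Input, Sequential
from keras.applications import VGG19
from keras.layers import Dense, Dropout, Flatten

def build_transfer_model(num_classes, input_shape=(32, 32, 3), weights="imagenet"):
    """Return a compiled classifier with a frozen VGG19 base and a new head."""
    # Load the convolutional base without VGG19's original classifier layers.
    base = VGG19(include_top=False, weights=weights, input_shape=input_shape)
    # Freeze the base so its pretrained filters are not updated during training.
    base.trainable = False
    model = Sequential([
        Input(shape=input_shape),
        base,
        Flatten(),
        Dense(256, activation="relu"),             # assumed head size
        Dropout(0.3),
        Dense(num_classes, activation="softmax"),  # one output per class
    ])
    # Compile after freezing so the trainable setting takes effect.
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Calling `build_transfer_model(num_classes=10)` downloads the ImageNet weights on first use. A subsequent `model.fit(...)` on one-hot labels would then update only the new dense layers; the number of epochs and the batch size are left open here.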

Code Implementation

In this example, we use the CIFAR-10 dataset for multiclass classification and adapt the VGG19 network for transfer learning.

Dataset Used

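The dataset is CIFAR-10. It comes from the Canadian Institute For Advanced Research (CIFAR). It contains 60,000 32×32 color images in 10 classes, with 6,000 images per class. The 10 classes are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset has 50,000 training images and 10,000 test images, and it can be imported from Keras.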
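Keras returns the CIFAR-10 labels as integers from 0 to 9 in an array of shape (N, 1), and a softmax classifier is usually trained on one-hot vectors instead. The following NumPy sketch shows what that conversion does. The helper `one_hot` and the list `CIFAR10_CLASSES` are names introduced here for illustration; the list follows the standard CIFAR-10 class order.

```python
import numpy as np

# Standard CIFAR-10 class order: index i is the name for integer label i.
CIFAR10_CLASSES = ["airplane", "automobile", "bird", "cat", "deer",
                   "dog", "frog", "horse", "ship", "truck"]

def one_hot(labels, num_classes):
    """Convert integer labels of shape (N,) or (N, 1) to shape (N, num_classes)."""
    labels = np.asarray(labels).reshape(-1)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    # Set a single 1.0 per row at the column given by the label.
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

labels = np.array([[3], [0], [9]])  # same (N, 1) layout as the Keras labels
encoded = one_hot(labels, len(CIFAR10_CLASSES))
print(encoded.shape)  # (3, 10)

# argmax reverses the encoding and recovers the class index.
decoded = [CIFAR10_CLASSES[i] for i in encoded.argmax(axis=1)]
print(decoded)  # ['cat', 'airplane', 'truck']
```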

Implementation Using the Keras API

Example

import numpy as np
import pandas as pd
from sklearn.utils.multiclass import unique_labels
import os
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import itertools
from keras.datasets import cifar10
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from keras import Sequential
from keras.applications import VGG19 #For Transfer Learning
from keras.preprocessing.image import ImageDataGenerator
from keras.optimizers import SGD,Adam
from keras.callbacks import ReduceLROnPlateau
from keras.layers import Flatten,Dense,BatchNormalization,Activation,Dropout
from keras.utils import to_categorical

# Download the CIFAR dataset
(x_train,y_train),(x_test,y_test) = cifar10.load_data()

#Holding out 30% of the training images as a validation set
x_train,x_val,y_train,y_val=train_test_split(x_train,y_train,test_size=.3)

#Dimension of the dataset
print((x_train.shape,y_train.shape))
print((x_val.shape,y_val.shape))
print((x_test.shape,y_test.shape))

#One Hot Encoding
y_train=to_categorical(y_train)
y_val=to_categorical(y_val)
y_test=to_categorical(y_test)

#Verifying the dimension after one hot encoding
print((x_train.shape,y_train.shape))
print((x_val.shape,y_val.shape))
print((x_test.shape,y_test.shape))

#Image Data Augmentation (these generators are defined but not used later in this listing)
train_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True, zoom_range=.1)
val_generator = ImageDataGenerator(rotation_range=2, horizontal_flip=True, zoom_range=.1)
test_generator = ImageDataGenerator(rotation_range=2, horizontal_flip= True, zoom_range=.1)

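#fit() only computes statistics for the featurewise options (centering, std normalization, ZCA),
#none of which are enabled above, so these calls are optional here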
train_generator.fit(x_train)
val_generator.fit(x_val)
test_generator.fit(x_test)

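#Learning Rate Annealer (not used later in this listing; 'val_accuracy' is the name
#Keras logs when the model is compiled with metrics=['accuracy'])
lrr= ReduceLROnPlateau(monitor='val_accuracy', factor=.01, patience=3, min_lr=1e-5)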

#Defining the VGG19 convolutional base with ImageNet weights; include_top=False drops
#the original classifier, so the 'classes' argument has no effect and is omitted
base_model = VGG19(include_top = False, weights = 'imagenet', input_shape = (32,32,3))

#Adding the final layers to the above base models where the actual classification is done in the dense layers
model= Sequential()
model.add(base_model)
model.add(Flatten())

#Model summary
model.summary()

#Adding the Dense layers with ReLU activations and dropout
#(the input size of 512 is inferred from the Flatten layer)
model.add(Dense(1024,activation=('relu')))
model.add(Dense(512,activation=('relu')))
model.add(Dense(256,activation=('relu')))
model.add(Dropout(.3))
model.add(Dense(128,activation=('relu')))

#model.add(Dropout(.2))
model.add(Dense(10,activation=('softmax')))

#Checking the final model summary
model.summary()

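#Making prediction
#Note: the model is never compiled or trained in this listing, so the new dense layers
#still have their initial random weights and these predictions are not meaningful yet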
predict_y = model.predict(x_test)
y_pred=np.argmax(predict_y,axis=1)
y_true=np.argmax(y_test,axis=1)

Output

Downloading data from https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz
170498071/170498071 [==============================] - 2s 0us/step
((35000, 32, 32, 3), (35000, 1))
((15000, 32, 32, 3), (15000, 1))
((10000, 32, 32, 3), (10000, 1))
((35000, 32, 32, 3), (35000, 10))
((15000, 32, 32, 3), (15000, 10))
((10000, 32, 32, 3), (10000, 10))
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg19/vgg19_weights_tf_dim_ordering_tf_kernels_notop.h5
80134624/80134624 [==============================] - 1s 0us/step
Model: "sequential"
_________________________________________________________________
Layer (type)                Output Shape              Param #
=================================================================
vgg19 (Functional)          (None, 1, 1, 512)         20024384
flatten (Flatten)           (None, 512)               0
=================================================================
Total params: 20,024,384
Trainable params: 20,024,384
Non-trainable params: 0
_________________________________________________________________
Model: "sequential"
_________________________________________________________________
Layer (type)                Output Shape              Param #
=================================================================
vgg19 (Functional)          (None, 1, 1, 512)         20024384
flatten (Flatten)           (None, 512)               0
dense (Dense)               (None, 1024)              525312
dense_1 (Dense)             (None, 512)               524800
dense_2 (Dense)             (None, 256)               131328
dropout (Dropout)           (None, 256)               0
dense_3 (Dense)             (None, 128)               32896
dense_4 (Dense)             (None, 10)                1290
=================================================================
Total params: 21,240,010
Trainable params: 21,240,010
Non-trainable params: 0
_________________________________________________________________
313/313 [==============================] - 158s 503ms/step
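The listing ends by computing `y_pred` and `y_true` but does not compare them. A minimal NumPy sketch of how the two arrays could be turned into an accuracy score and a confusion matrix is shown below. The helper name `confusion_counts` is hypothetical, and the small hand-made arrays stand in for the real predictions, so the numbers are not results from the model.

```python
import numpy as np

def confusion_counts(y_true, y_pred, num_classes):
    """Count predictions: rows are true classes, columns are predicted classes."""
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # np.add.at accumulates correctly even when the same cell appears repeatedly.
    np.add.at(matrix, (y_true, y_pred), 1)
    return matrix

# Toy labels for three classes, used only to show the mechanics.
y_true = np.array([0, 1, 2, 2, 1, 0])
y_pred = np.array([0, 2, 2, 2, 1, 1])

matrix = confusion_counts(y_true, y_pred, num_classes=3)
# Correct predictions lie on the diagonal of the matrix.
accuracy = np.trace(matrix) / matrix.sum()
print(matrix)
print(accuracy)
```

With the real arrays, `num_classes` would be 10. Because the model in the listing is never trained, its accuracy would likely be close to chance level.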

Conclusion

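Multiclass image classification has proven very useful to the deep learning community. As one of the most important basic tasks in computer vision, it is widely used across the AI industry as a foundation, even for complex computer vision applications such as image segmentation, detection, and visual recognition.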

Mithilesh Pradhan

Updated on: 1 December 2022
