PyTorch CNN Example (Convolutional Neural Network): Collected Answers

This page collects material on the topic “pytorch cnn 예제” (PyTorch CNN example, convolutional neural network), gathered by you.tfvp.org (category: you.tfvp.org/blog). The featured video, by Aladdin Persson, has 42,614 views and 632 likes.


Details on the video “Pytorch CNN example (Convolutional Neural Network)”

A walkthrough of how to code a convolutional neural network (CNN) in the Pytorch-framework using MNIST dataset. Explaining it step by step and building the basic architecture of the CNN.
GitHub Repository:
https://github.com/aladdinpersson/Machine-Learning-Collection
PyTorch Playlist:
https://www.youtube.com/playlist?list=PLhhyoLH6IjfxeoooqP9rhU3HJIAVAJ3Vz

More on the topic “pytorch cnn 예제” (PyTorch CNN example):

Implementing a CNN in PyTorch – JustKode
Source: justkode.kr
Date Published: 11/12/2021

Implementing a CNN (Convolutional Neural Network) in PyTorch – 데하
Source: data-science-hi.tistory.com
Date Published: 12/23/2022

[Following along with PyTorch 5] Implementing a convolutional neural network (CNN)
Source: limitsinx.tistory.com
Date Published: 5/12/2021

Training a Classifier - PyTorch Tutorials 1.12.1+cu102 …
Source: pytorch.org
Date Published: 6/10/2022

PyTorch: Training your first Convolutional Neural Network (CNN)
We randomly sample a total of 10 images from this dataset on Lines 28 and 29 using the Subset (which creates a smaller “view” of the full …
Source: pyimagesearch.com
Date Published: 4/8/2021

Deep learning with PyTorch: CNN – Medium
Starting from a CNN, for which the well-known MNIST dataset makes it easy to build a batch iterator, this post looks at how to put the model together in PyTorch.
Source: medium.com
Date Published: 10/10/2021

[Pytorch] Implementing a sentence classification model with a CNN
Running the code above shows a sample of the preprocessed data. #Step 3. Implementing the model. Now we define the model class in earnest and …
Source: kaya-dev.tistory.com
Date Published: 4/19/2022

[Pytorch basics lecture] 4. CNN, with outstanding image-processing ability (Simple …
This example uses a CNN to improve classification performance on the fashion items used earlier. import torch import torch.nn as nn import …
Source: yjs-program.tistory.com
Date Published: 5/19/2022


Video details for pytorch cnn 예제

Author: Aladdin Persson
Views: 42,614
Likes: 632
Date Published: 2020. 4. 4.
Video Url link: https://www.youtube.com/watch?v=wnK3uWv_WkU

Implementing a CNN in PyTorch

CNN In Pytorch

PyTorch provides APIs for building CNNs: multi-channel convolution layers, max pooling, average pooling, and more. This post walks through several of these APIs and then trains a model on the MNIST data.

MNIST Example

Convolution Layers

The layers for the convolution operation are:

Conv1d (widely used in Text-CNN)

Conv2d (widely used in image classification)

Conv3d

All three APIs work on the same principle. This post focuses on the most commonly used one, Conv2d.

Parameters

The parameters of Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros') are as follows.

in_channels: the number of input channels. It is usually 1 for a grayscale image and 3 for an RGB image.

out_channels: the number of output channels.

kernel_size: the kernel size. It can be an int or a tuple.

stride: the stride size. It can be an int or a tuple. The default is 1.

padding: the padding size. It can be an int or a tuple. The default is 0.

padding_mode: the padding mode. The default is 'zeros'. The original post says only zero padding is supported; current PyTorch releases also accept 'reflect', 'replicate', and 'circular'.

dilation: the spacing between kernel elements. The original post links to an illustration.

groups: the number of groups the input channels are split into. The output channels are split into the same number of groups, each input group is paired with one output group, and the convolution is computed only within each pair.

bias: whether to add a learnable bias. The default is True.
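To make the effect of groups concrete, here is a minimal sketch in plain Python (no PyTorch needed). It assumes the weight layout PyTorch documents for Conv2d, (out_channels, in_channels / groups, kernel height, kernel width); the helper names conv2d_weight_shape and num_weights are hypothetical and exist only for this illustration.

```python
def conv2d_weight_shape(in_channels, out_channels, kernel_size, groups=1):
    """Weight shape of a square-kernel Conv2d for a given number of groups."""
    if in_channels % groups != 0 or out_channels % groups != 0:
        raise ValueError("in_channels and out_channels must be divisible by groups")
    # Each output channel only sees the input channels of its own group.
    return (out_channels, in_channels // groups, kernel_size, kernel_size)


def num_weights(shape):
    """Number of scalar weights in a tensor of the given shape."""
    total = 1
    for dim in shape:
        total *= dim
    return total


for g in (1, 2, 4):
    shape = conv2d_weight_shape(in_channels=4, out_channels=8, kernel_size=3, groups=g)
    print(g, shape, num_weights(shape))
```

With 4 input channels, 8 output channels, and a 3×3 kernel, the weight count falls from 288 to 144 to 72 as groups goes from 1 to 2 to 4, because each output channel is connected to fewer input channels.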

Shape

The shapes of the input tensor $(N, C_{in}, H_{in}, W_{in})$ and the output tensor $(N, C_{out}, H_{out}, W_{out})$ are as follows.

Input Tensor $(N, C_{in}, H_{in}, W_{in})$

$N$: batch size

$C_{in}$: must match the value passed as in_channels.

$H_{in}$: height of the 2D input tensor

$W_{in}$: width of the 2D input tensor

Output Tensor $(N, C_{out}, H_{out}, W_{out})$

$N$: batch size

$C_{out}$: matches the value passed as out_channels.

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding[0] - dilation[0] \times (kernel\_size[0] - 1) - 1}{stride[0]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times padding[1] - dilation[1] \times (kernel\_size[1] - 1) - 1}{stride[1]} + 1 \right\rfloor$$
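The output-size formula can be checked without running PyTorch. The following is a minimal sketch for one spatial axis; the function name conv2d_out_size is hypothetical, and the example values (a 20-pixel input and 5×5 kernels) are chosen to match the small network used in the code example of this section.

```python
def conv2d_out_size(size_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a convolution.

    Implements floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1).
    """
    return (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


after_conv1 = conv2d_out_size(20, kernel_size=5)           # 20 -> 16
after_conv2 = conv2d_out_size(after_conv1, kernel_size=5)  # 16 -> 12
print(after_conv1, after_conv2)

# A 3x3 kernel with padding=1 and stride=1 keeps the size unchanged.
print(conv2d_out_size(28, kernel_size=3, padding=1))
```

Because H and W use the same formula with index 0 and 1 of each parameter, one helper covers both axes.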

Code Example

In

import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=3, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(10 * 12 * 12, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        print("before", x.size())
        x = F.relu(self.conv1(x))
        print("after conv1", x.size())
        x = F.relu(self.conv2(x))
        print("after conv2", x.size())
        x = x.view(-1, 10 * 12 * 12)  # flatten each sample to one vector
        print("after flatten", x.size())
        x = F.relu(self.fc1(x))
        print("after fc1", x.size())
        x = self.fc2(x)
        print("after fc2", x.size())
        return x


cnn = CNN()
output = cnn(torch.randn(10, 1, 20, 20))

Out

before torch.Size([10, 1, 20, 20])
after conv1 torch.Size([10, 3, 16, 16])
after conv2 torch.Size([10, 10, 12, 12])
after flatten torch.Size([10, 1440])
after fc1 torch.Size([10, 50])
after fc2 torch.Size([10, 10])

Pooling Layers

The layers for the pooling operation are:

MaxPool1d

MaxPool2d

MaxPool3d

AvgPool1d

AvgPool2d

AvgPool3d

These six APIs are identical apart from the number of dimensions and the pooling operation (max or average). MaxPool2d is described here as the representative case.

Parameters

The parameters of MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False) are as follows.

kernel_size: the kernel (window) size. It can be an int or a tuple.

stride: the stride size. It can be an int or a tuple. The default is None, which means the stride equals kernel_size.

padding: the size of the implicit padding added on both sides. It can be an int or a tuple. The default is 0. For max pooling the padded positions are treated as negative infinity, so they never win the maximum.

dilation: the spacing between window elements. The original post links to an illustration.

return_indices: if True, the indices of the maximum values are returned along with the outputs.

ceil_mode: if True, the output size is computed with the ceiling function instead of the floor function.

Shape

The shapes of the input tensor $(N, C, H_{in}, W_{in})$ and the output tensor $(N, C, H_{out}, W_{out})$ are as follows. Pooling does not change the number of channels.

Input Tensor $(N, C, H_{in}, W_{in})$

$N$: batch size

$C$: number of channels

$H_{in}$: height of the 2D input tensor

$W_{in}$: width of the 2D input tensor

Output Tensor $(N, C, H_{out}, W_{out})$

$N$: batch size

$C$: number of channels

$$H_{out} = \left\lfloor \frac{H_{in} + 2 \times padding[0] - dilation[0] \times (kernel\_size[0] - 1) - 1}{stride[0]} + 1 \right\rfloor$$

$$W_{out} = \left\lfloor \frac{W_{in} + 2 \times padding[1] - dilation[1] \times (kernel\_size[1] - 1) - 1}{stride[1]} + 1 \right\rfloor$$
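As a rough check of the pooling formula, the sketch below computes the output length along one axis in plain Python. The function name pool2d_out_size is hypothetical. It assumes stride falls back to kernel_size when left as None, and it simplifies ceil_mode to a plain ceiling: PyTorch additionally ignores a window that would start entirely inside the right padding, which this sketch does not model.

```python
import math


def pool2d_out_size(size_in, kernel_size, stride=None, padding=0,
                    dilation=1, ceil_mode=False):
    """Output length along one spatial axis of a 2D pooling layer."""
    if stride is None:
        stride = kernel_size  # pooling windows do not overlap by default
    value = (size_in + 2 * padding - dilation * (kernel_size - 1) - 1) / stride + 1
    return math.ceil(value) if ceil_mode else math.floor(value)


print(pool2d_out_size(20, kernel_size=2))                  # two poolings: 20 -> 10
print(pool2d_out_size(10, kernel_size=2))                  # 10 -> 5
print(pool2d_out_size(21, kernel_size=2))                  # floor drops the odd last column
print(pool2d_out_size(21, kernel_size=2, ceil_mode=True))  # ceiling keeps a partial window
```

The last two calls show the only case where ceil_mode matters: when the input length is not a multiple of the stride.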

Code Example

In

import torch
import torch.nn as nn
import torch.nn.functional as F


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.max_pool1 = nn.MaxPool2d(kernel_size=2)
        self.max_pool2 = nn.MaxPool2d(kernel_size=2)
        # Pooling keeps the channel count, so each sample is 1 x 5 x 5 here.
        self.fc1 = nn.Linear(1 * 5 * 5, 50)
        self.fc2 = nn.Linear(50, 10)

    def forward(self, x):
        print("before", x.size())
        x = F.relu(self.max_pool1(x))
        print("after max_pool1", x.size())
        x = F.relu(self.max_pool2(x))
        print("after max_pool2", x.size())
        x = x.view(-1, 1 * 5 * 5)  # flatten each sample, keeping the batch of 10
        print("after flatten", x.size())
        x = F.relu(self.fc1(x))
        print("after fc1", x.size())
        x = self.fc2(x)
        print("after fc2", x.size())
        return x


cnn = CNN()
output = cnn(torch.randn(10, 1, 20, 20))

Out

before torch.Size([10, 1, 20, 20])
after max_pool1 torch.Size([10, 1, 10, 10])
after max_pool2 torch.Size([10, 1, 5, 5])
after flatten torch.Size([10, 25])
after fc1 torch.Size([10, 50])
after fc2 torch.Size([10, 10])

Training a Model on MNIST

To load the MNIST data, torchvision must be installed first.

pip install torchvision

After installing torchvision, import the required libraries.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms

datasets is used to fetch the MNIST data and transforms is used to turn it into Tensor objects. With Compose, the images are converted to Tensors and then normalized. DataLoader is used to feed the MNIST data in batches.

train_data = datasets.MNIST('./data/', train=True, download=True,
                            transform=transforms.Compose([
                                transforms.ToTensor(),
                                transforms.Normalize((0.1307,), (0.3081,))
                            ]))
train_loader = torch.utils.data.DataLoader(dataset=train_data, batch_size=50, shuffle=True)

test_data = datasets.MNIST('./data/', train=False,
                           transform=transforms.Compose([
                               transforms.ToTensor(),
                               transforms.Normalize((0.1307,), (0.3081,))
                           ]))
test_loader = torch.utils.data.DataLoader(dataset=test_data, batch_size=50, shuffle=True)

Declare the CNN class.

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=20, kernel_size=5, stride=1)
        self.conv2 = nn.Conv2d(in_channels=20, out_channels=50, kernel_size=5, stride=1)
        self.fc1 = nn.Linear(4 * 4 * 50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))                      # 28 -> 24
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # 24 -> 12
        x = F.relu(self.conv2(x))                      # 12 -> 8
        x = F.max_pool2d(x, kernel_size=2, stride=2)   # 8 -> 4
        x = x.view(-1, 4 * 4 * 50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

Create the CNN object, the optimizer, and the loss function.

cnn = CNN()
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(cnn.parameters(), lr=0.01)

Run the training code. Each batch of data has size (50, 1, 28, 28) and each target has size (50).

cnn.train()
for epoch in range(10):
    for index, (data, target) in enumerate(train_loader):
        optimizer.zero_grad()
        output = cnn(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if index % 100 == 0:
            print("loss of {} epoch, {} index : {}".format(epoch, index, loss.item()))

Check the result.

cnn.eval()
test_loss = 0
correct = 0
with torch.no_grad():
    for data, target in test_loader:
        output = cnn(data)
        test_loss += criterion(output, target).item()  # mean loss of this batch
        pred = output.argmax(dim=1, keepdim=True)
        correct += pred.eq(target.view_as(pred)).sum().item()

test_loss /= len(test_loader)  # average over batches
print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
    test_loss, correct, len(test_loader.dataset),
    100. * correct / len(test_loader.dataset)))

Result reported in the original post, whose code printed the sum of the per-batch losses without dividing by the number of batches: Test set: Average loss: 297.5230, Accuracy: 9761/10000 (98%)

Implementing a CNN (Convolutional Neural Network) in PyTorch

This code implements a CNN in PyTorch. CNNs are powerful for image processing. The computation runs on a GPU, which is well suited to parallel work, and the environment is Google Colab.

First, import the libraries.

import torch
import torchvision
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt

Mount the Google Drive directory in Colab beforehand.

from google.colab import drive
drive.mount('/content/gdrive')

Set the working directory as well.

cd /content/gdrive/My Drive/deeplearningbro/pytorch

This example uses the CIFAR10 dataset, which contains images from 10 classes. Each image is a 3D tensor.

The classes are 'plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', and 'truck'.

Preprocessing is set up with Compose. Here it only converts the data to tensors and normalizes it; more steps can be added.

The arguments of Normalize are the means and the standard deviations. There are three of each because the data are color images with three channels (channel × width × height). The value 0.5 is an arbitrary choice; computing the actual mean and standard deviation of the data and using those may work better.

PyTorch provides the CIFAR10 dataset, so it is easy to download. The transforms are applied when the data is loaded.

DataLoader then groups the data into batches.

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))]
)

trainset = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=8, shuffle=True)

testset = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=8, shuffle=False)
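Normalize applies (value - mean) / std to each channel. The sketch below reproduces that arithmetic for a single pixel value in plain Python, to show why a mean and standard deviation of 0.5 map the [0, 1] range produced by ToTensor onto [-1, 1]. The function name normalize is hypothetical and stands in for the per-channel operation only.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization of one pixel value: (value - mean) / std."""
    return (value - mean) / std


for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

The darkest value 0.0 becomes -1.0, mid-gray 0.5 becomes 0.0, and the brightest value 1.0 becomes 1.0.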

Set the device so that CUDA can be used for GPU computation. If it prints cpu in Colab, change the runtime type to GPU in the Runtime menu.

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(f'{device} is available')

Now let us build the CNN model, using convolution and pooling layers.

In nn.Conv2d(3, 6, 5) the number of input channels is 3, because the images are color images. The number of output channels is up to the user.

Here the output has 6 channels. The window is 5×5 and slides over the image. The stride, the step by which the window moves, defaults to 1. Padding is not used here but can also be set.

After the convolution comes max pooling. nn.MaxPool2d(2, 2) uses a 2×2 window with stride 2.

After the convolution and pooling stages the feature maps are flattened and passed to nn.Linear(16 * 5 * 5, 120). The flattened vector has 16 * 5 * 5 nodes and is the input to a hidden layer with 120 hidden nodes.

Then nn.Linear(120, 10) maps the 120 hidden nodes to 10 output nodes, because the task is to distinguish 10 classes.

The operations in forward are conv1 -> relu -> pooling -> conv2 -> relu -> pooling -> linear -> output.

.view flattens the feature maps into one row per sample; the -1 lets PyTorch infer the batch dimension.

.to(device) places the model on the GPU when one is available.

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)        # convolution (in channels: 3, out channels: 6, kernel: 5×5, stride=1 (default))
        self.pool1 = nn.MaxPool2d(2, 2)        # max pooling (window 2×2, stride=2)
        self.conv2 = nn.Conv2d(6, 16, 5)       # convolution (in channels: 6, out channels: 16, kernel: 5×5, stride=1 (default))
        self.pool2 = nn.MaxPool2d(2, 2)        # max pooling (window 2×2, stride=2)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)  # 16 feature maps of 5×5 flatten to 16*5*5 nodes
        self.fc2 = nn.Linear(120, 10)

    def forward(self, x):
        x = self.pool1(F.relu(self.conv1(x)))  # conv1 -> ReLU -> pool1
        x = self.pool2(F.relu(self.conv2(x)))  # conv2 -> ReLU -> pool2
        x = x.view(-1, 16 * 5 * 5)             # flatten the 16 feature maps of 5×5
        x = F.relu(self.fc1(x))
        x = self.fc2(x)                        # raw class scores; CrossEntropyLoss applies log-softmax itself
        return x


net = Net().to(device)  # create the model

print(net)
# The feature map size changes as follows: 32 -> 28 -> 14 -> 10 -> 5
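The feature-map sizes of this network can be worked out layer by layer. The sketch below does so in plain Python, assuming 32×32 CIFAR10 inputs, no padding, and the kernel and stride values used in Net; the helper out_size and the layers list are hypothetical names introduced for the illustration.

```python
def out_size(size_in, kernel_size, stride, padding=0):
    """Output length along one axis for a conv or pooling layer (dilation 1)."""
    return (size_in + 2 * padding - (kernel_size - 1) - 1) // stride + 1


# (name, kernel_size, stride) for the spatial layers of Net, in forward order.
layers = [("conv1", 5, 1), ("pool1", 2, 2), ("conv2", 5, 1), ("pool2", 2, 2)]

size = 32  # CIFAR10 images are 32 x 32
trace = [size]
for name, kernel_size, stride in layers:
    size = out_size(size, kernel_size, stride)
    trace.append(size)

print(" -> ".join(str(s) for s in trace))
flat_features = 16 * size * size  # 16 channels after conv2
print(flat_features)
```

The trace is 32 -> 28 -> 14 -> 10 -> 5, and the 16 final feature maps of 5×5 give the 400 inputs that nn.Linear(16 * 5 * 5, 120) expects.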

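Because this is a classification problem, cross entropy is used as the loss function.

The parameters of the Net model defined above are passed to the optimizer through net.parameters(). The optimizer is SGD with momentum.

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=1e-3, momentum=0.9)

Training proceeds the same way as in the earlier MLP example.

GPU computation needs both the model and the data on the GPU. That is why .to(device) is appended to the data, converting it to tensors that can be used on the GPU.

labels is a vector of class indices in [0, 1, …, 9], while outputs holds a vector of 10 scores per image. If you one-hot encode labels while customizing the code, criterion(outputs, labels) can raise an error, so take care.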
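To show why the label is a class index rather than a one-hot vector, here is a minimal plain-Python sketch of the cross-entropy loss for one sample: the negative log of the softmax probability of the true class. The function name cross_entropy and the example scores are hypothetical; this illustrates the arithmetic and is not PyTorch's implementation.

```python
import math


def cross_entropy(scores, label):
    """Cross-entropy for one sample.

    scores: raw class scores (logits), one per class.
    label:  integer index of the true class, not a one-hot vector.
    """
    largest = max(scores)  # subtract the max for numerical stability
    log_sum_exp = largest + math.log(sum(math.exp(s - largest) for s in scores))
    return log_sum_exp - scores[label]


uniform = [0.0] * 10             # the model has no preference among 10 classes
confident = [0.0] * 10
confident[3] = 5.0               # the model strongly prefers class 3

print(cross_entropy(uniform, 3))    # equals log(10)
print(cross_entropy(confident, 3))  # small loss: the preferred class is correct
print(cross_entropy(confident, 7))  # large loss: the preferred class is wrong
```

The label only selects which score enters the loss, which is why an integer index per image is enough.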

loss_ = []             # list for storing the loss
n = len(trainloader)   # number of batches

for epoch in range(10):  # repeat 10 times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        inputs, labels = data[0].to(device), data[1].to(device)  # batch data
        optimizer.zero_grad()              # reset the gradients for every batch
        outputs = net(inputs)              # predictions with 10 scores per image
        loss = criterion(outputs, labels)  # cross-entropy loss
        loss.backward()                    # backpropagation
        optimizer.step()                   # update the weights
        running_loss += loss.item()

    loss_.append(running_loss / n)
    print('[%d] loss: %.3f' % (epoch + 1, running_loss / len(trainloader)))

Plotting the loss shows that training went well. The model is simple rather than complex, so the point here is to practice building a model and training it.

plt.plot(loss_)

plt.title('loss')

plt.xlabel('epoch')

plt.show()

Let us look at saving the model. Choose a path and call torch.save. net.state_dict() holds the parameter values.

PATH = './cifar_net.pth'            # path for the saved model
torch.save(net.state_dict(), PATH)  # save the model

Now load the saved model again. Strictly speaking, loading a model means loading its parameters: first create the model skeleton, then load the parameters into it to obtain the pretrained model.

Because the model was built for the GPU, create the skeleton on the GPU and then overwrite its parameters.

net = Net().to(device)                # create the model
net.load_state_dict(torch.load(PATH)) # load the model parameters

Now let us compute the accuracy of the model.

correct = 0
total = 0
with torch.no_grad():  # no parameter updates here, so gradients are not needed
    # net.eval()  # not called because no batch normalization or dropout is used; always keep this in mind
    for data in testloader:
        images, labels = data[0].to(device), data[1].to(device)
        outputs = net(images)
        _, predicted = torch.max(outputs.data, 1)      # the class with the highest score is the predicted label
        total += labels.size(0)                        # number of test images
        correct += (predicted == labels).sum().item()  # count 1 for each correct prediction, 0 otherwise

print(f'accuracy of 10000 test images: {100*correct/total}%')

outputs.data  # scores for the 10 classes for each of the 8 images in one batch (the batch size is 8 here)

predicted  # predicted classes for one batch

predicted gives the predicted class for each of the 8 images in a batch, for example 3, 5, 6, 5, 3, 5, 4, 7.
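The prediction step, picking the index of the highest score and comparing it with the label, can be illustrated in plain Python. The scores and labels below are made-up values for a batch of four samples with three classes, and the helper names argmax and accuracy are hypothetical.

```python
def argmax(scores):
    """Index of the largest score (first one wins on ties)."""
    return max(range(len(scores)), key=scores.__getitem__)


def accuracy(batch_scores, labels):
    """Return the predicted classes and the percentage of correct predictions."""
    predicted = [argmax(scores) for scores in batch_scores]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return predicted, 100 * correct / len(labels)


# Hypothetical scores: one row per sample, one column per class.
batch_scores = [
    [0.1, 2.0, -1.0],
    [1.5, 0.2, 0.3],
    [0.0, 0.1, 0.9],
    [0.3, 0.2, 0.1],
]
labels = [1, 0, 1, 0]

predicted, percent = accuracy(batch_scores, labels)
print(predicted, percent)
```

Three of the four predictions match their labels, so the accuracy for this batch is 75%. The evaluation loop does the same thing per batch and accumulates correct and total over the whole test set.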

ref)

www.analyticsvidhya.com/blog/2019/10/building-image-classification-models-cnn-pytorch/

deeplearningbro


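[Following along with PyTorch 5] Implementing a convolutional neural network (CNN)

[Following along with PyTorch 1] Setting up PyTorch on Google Colab https://limitsinx.tistory.com/136
[Following along with PyTorch 2] Creating tensors and backward https://limitsinx.tistory.com/137
[Following along with PyTorch 3] Linear regression with gradient descent https://limitsinx.tistory.com/138
[Following along with PyTorch 4] Implementing an artificial neural network (ANN) https://limitsinx.tistory.com/139

※ Code and syntax covered in the earlier posts are not explained again, so please refer to them.

CNNs are a very basic technique for deep learning on images!

To train on images you need a library that handles images, and the data itself, right?

Enter the following in the console, and once it finishes, run the code below.

!pip3 install torchvision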

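What is a CNN?

This time, at last, I will write up the famous CNN.

CNNs are not how I apply deep learning myself. I mostly work with real-valued data,

and what I do is following rather than classification.

In other words, I am not working in the genre that usually comes to mind when people say “deep learning!” (classification),

which is why I mostly use RNNs. :)

Still, there is a big difference between understanding CNNs before moving on and skipping them because you do not use them.

Let us begin.

CNNs are fundamentally a method for learning from images, so they contain a good deal of what is taught in image processing courses in electrical engineering.

First you need to know what convolution is.

For undergraduates, convolution alone takes about two weeks of lectures.

It has no bearing on running the code, so if you have no background in electronics or engineering mathematics you can skip this part.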

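★ An image is converted to numbers for training, typically as three RGB values per pixel or as three hue/brightness/saturation values.

“Convolution??”

Figure: convolution. Source: Wikipedia

Figure: convolution formula. Source: https://pinkwink.kr/156

To convolve two images, one of them is mirrored about the Y (vertical) axis and the two are then multiplied step by step.

That is, suppose the blue and red functions in the first figure are images.

The red function is mirrored about the Y axis and moved to the right one step at a time toward the blue one, and the products at each position are accumulated.

Defined mathematically, this gives the formula in the second figure.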
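The flip-and-slide description corresponds to the discrete convolution sum, (f * g)[n] = Σ_k f[k] · g[n − k]. Below is a minimal one-dimensional sketch in plain Python; the function name convolve and the example sequences are hypothetical. Note that convolution layers in deep learning libraries such as PyTorch typically skip the flip (they compute a cross-correlation), which makes no practical difference because the kernel values are learned.

```python
def convolve(f, g):
    """Full discrete convolution of two sequences: out[n] = sum_k f[k] * g[n - k]."""
    out = [0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for k in range(len(f)):
            # g[n - k] is the mirrored copy of g shifted to position n.
            if 0 <= n - k < len(g):
                out[n] += f[k] * g[n - k]
    return out


print(convolve([1, 2, 3], [1, 1]))
print(convolve([1, 1], [1, 2, 3]))  # convolution is commutative
```

Each output value accumulates the products of the overlapping samples at one shift, which is the "multiply and accumulate while sliding" step described above.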

Applying this principle to image learning in a neural network gives the “CNN”.

For an image it works out as follows.

Source: https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-11-0-cnn_basics.ipynb

Suppose the leftmost picture is the image and there is a 2×2 filter [1 1, 1 1].

The filter is multiplied with the values at each position and the products are summed, producing the rightmost picture.

This shrinks the original 3×3 image to 2×2, right?

So some information is lost, and at the same time the features are lumped together and stored as fewer values.

A technique called “zero padding” exists to reduce this loss of information.

Zero padding is very common in image processing.

Source: https://github.com/hunkim/DeepLearningZeroToAll/blob/master/lab-11-0-cnn_basics.ipynb

The border of the training image is filled (padded) with zeros so that the image keeps its 3×3 size after passing through the filter.
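The shrinking effect of a filter and the effect of zero padding can be reproduced with a few lines of plain Python. The 3×3 image values below are made up (the original figure is not available here), and the helpers filter2d and zero_pad are hypothetical names. The padding example uses a 3×3 filter with one pixel of padding, an assumed setup in which the output keeps the 3×3 size.

```python
def filter2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def zero_pad(image, pad):
    """Surround the image with `pad` rows and columns of zeros."""
    width = len(image[0]) + 2 * pad
    rows = [[0] * width for _ in range(pad)]
    for row in image:
        rows.append([0] * pad + list(row) + [0] * pad)
    rows.extend([0] * width for _ in range(pad))
    return rows


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

ones_2x2 = [[1, 1], [1, 1]]
small = filter2d(image, ones_2x2)            # 3x3 shrinks to 2x2
print(small)

ones_3x3 = [[1] * 3 for _ in range(3)]
same = filter2d(zero_pad(image, 1), ones_3x3)  # padded 5x5 -> 3x3
print(same)
```

Without padding the 2×2 filter produces a 2×2 result; with one pixel of zero padding the 3×3 filter returns a 3×3 result, so the spatial size is preserved.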

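Next, among the values that come out of the filter, some will be the largest, right?

A large value means the feature shows up most clearly there.

“Max pooling” keeps only the largest value in each window of the filtered output and discards the rest.

Source: Andrej Karpathy's CNN lecture

Max pooling loses a tremendous amount of information!

In the image above, 16 points are reduced to 4 and the other 12 are thrown away.

But because the strongest features are kept, this loss has to be accepted to make the later image computations feasible.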
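A 2×2 max pooling step that reduces 16 values to 4 can be sketched in plain Python as follows. The 4×4 feature map values are illustrative and the function name max_pool2d is hypothetical; the window size and stride are both 2, so the windows do not overlap.

```python
def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling: keep the largest value in each size x size window."""
    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [feature_map[i + a][j + b] for a in range(size) for b in range(size)]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [[1, 1, 2, 4],
               [5, 6, 7, 8],
               [3, 2, 1, 0],
               [1, 2, 3, 4]]

pooled = max_pool2d(feature_map)
print(pooled)
```

Each 2×2 block contributes only its maximum, so the 4×4 map becomes 2×2 and three of every four values are discarded.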

Repeat the convolution filtering and max pooling described above a few times,

and only a handful of truly important features remain while everything else disappears.

A CNN takes these refined features and finally classifies what the image is.

The big picture looks like this.

Source: https://www.researchgate.net/figure/The-overall-architecture-of-the-Convolutional-Neural-Network-CNN-includes-an-input_fig4_331540139
Source: https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53

Looking at every raw value of an image, a 1024 × 860 image with 3 RGB channels has 1024 × 860 × 3 = 2,641,920 values.

So a single image has about 2.6 million features, and training on tens of thousands to millions of such photos is practically impossible.

Even if it were possible, it is doubtful that recognition would work well while keeping all 2.6 million features of one photo.

So, to sum up a CNN in one line:

“Repeat convolution filtering and max pooling to refine the truly important features, then classify.”

I usually avoid explaining concepts, but the code only makes sense once you know what a CNN is, so I gave a brief explanation. Now for the code.

Code

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.init as init
import matplotlib.pyplot as plt
import torchvision.datasets as dset
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# torchvision must be installed beforehand to download the MNIST digit dataset.
# DataLoader handles shuffling, batching, and similar data preparation.

batch_size = 256
learning_rate = 0.0002
num_epoch = 10

# batch_size is the number of images processed in one training step. MNIST has 60,000 training images;
# they are processed in groups of 256 rather than one at a time (any other size works too).
# With 60,000 images, a fairly low learning rate reduces the chance of divergence.
# Because the dataset is large, only 10 epochs are run.

mnist_train = dset.MNIST("./", train=True, transform=transforms.ToTensor(), target_transform=None, download=True)
mnist_test = dset.MNIST("./", train=False, transform=transforms.ToTensor(), target_transform=None, download=True)

# Fetch the MNIST data through torchvision.datasets.

train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=2, drop_last=True)
test_loader = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=2, drop_last=True)

# Split the downloaded data into batches for training.
# batch_size: number of samples per batch; shuffle: whether to shuffle the data randomly
# num_workers: number of worker processes used to load the data
# drop_last: whether to discard the leftover samples that do not fill a batch
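The effect of batch_size and drop_last can be checked with simple arithmetic. The sketch below is plain Python; num_batches is a hypothetical helper, and the dataset sizes are the standard MNIST split of 60,000 training and 10,000 test images.

```python
import math


def num_batches(num_samples, batch_size, drop_last):
    """Number of batches a loader yields per epoch."""
    if drop_last:
        return num_samples // batch_size          # incomplete last batch is discarded
    return math.ceil(num_samples / batch_size)    # incomplete last batch is kept


train_batches = num_batches(60000, 256, drop_last=True)
test_batches = num_batches(10000, 256, drop_last=True)

print(train_batches, 60000 - train_batches * 256)  # batches, dropped training images
print(test_batches, 10000 - test_batches * 256)    # batches, dropped test images
print(num_batches(10000, 256, drop_last=False))
```

With these settings each epoch has 234 training batches, and the test loader yields 39 batches, so 16 test images are never evaluated. Setting drop_last=False on the test loader would include them. A logging condition such as j % 1000 == 0 fires only at j = 0 when there are 234 batches per epoch.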

class CNN(nn.Module):
    # Class declaration (Python is object-oriented, like C++)
    def __init__(self):
        super(CNN, self).__init__()  # initialize the parent class nn.Module

        self.layer = nn.Sequential(
            nn.Conv2d(1, 16, 5),
            nn.ReLU(),
            nn.Conv2d(16, 32, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(32, 64, 5),
            nn.ReLU(),
            nn.MaxPool2d(2, 2)
        )
        # Conv2d processes the image by convolution filtering, a signal-processing method.
        # nn.Conv2d(1, 16, 5) takes a 1-channel input (a 28×28 grayscale image)
        # and produces 16 output channels using kernels of size 5.
        # A basic understanding of signal/image processing helps with CNNs.
        # With kernel size 5, the convolution removes 4 pixels per dimension, so a (28×28) input becomes (24×24).
        # A CNN shrinks the image this way while keeping only the strong features.
        # Max pooling layers in between pick out features even more aggressively than convolution.

        self.fc_layer = nn.Sequential(
            nn.Linear(64 * 3 * 3, 100),
            nn.ReLU(),
            nn.Linear(100, 10)
        )
        # The output of self.layer has shape [batch_size, 64, 3, 3]:
        # for each of the 256 images in a batch, 64 channels of (3×3) remain, i.e. 64*3*3 values per image.
        # nn.Linear(64*3*3, 100) followed by nn.Linear(100, 10) reduces these to 10 values,
        # one score per digit 0 to 9; the largest score marks the predicted digit.

    def forward(self, x):
        out = self.layer(x)
        out = out.view(out.size(0), -1)  # flatten per sample, using the actual batch size
        out = self.fc_layer(out)
        return out

# Overall picture of the CNN: convolution layers -> fully connected layers -> class scores.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# This falls back to the CPU when no GPU is available.
# On a CPU-only machine, model = CNN() alone would also work.

model = CNN().to(device)
loss_func = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Cross-entropy loss function, Adam optimizer

loss_arr = []
for i in range(num_epoch):
    for j, [image, label] in enumerate(train_loader):
        x = image.to(device)    # MNIST training images (28×28)
        y_ = label.to(device)   # the digit (0 to 9) for each image

        optimizer.zero_grad()          # reset the gradients
        output = model(x)              # forward pass through the CNN
        loss = loss_func(output, y_)   # compare the predictions with the true labels
        loss.backward()                # backpropagate the error
        optimizer.step()               # Adam update after every backward pass

        if j % 1000 == 0:
            print(loss)
            loss_arr.append(loss.cpu().detach().numpy())

correct = 0
total = 0

with torch.no_grad():
    for image, label in test_loader:
        x = image.to(device)
        y_ = label.to(device)

        output = model(x)
        _, output_index = torch.max(output, 1)

        total += label.size(0)
        correct += (output_index == y_).sum().float()

print("Accuracy of Test Data : {}".format(100 * correct / total))

Result

After training on the training data and validating on the test dataset,

the model identifies the digit in an image with about 98.66% accuracy.

Training a Classifier — PyTorch Tutorials 1.12.1+cu102 documentation

We will do the following steps in order:

Load and normalize the CIFAR10 training and test datasets using torchvision

Define a Convolutional Neural Network

Define a loss function

Train the network on the training data

Test the network on the test data

The output of torchvision datasets are PILImage images of range [0, 1]. We transform them to Tensors of normalized range [-1, 1].

If running on Windows and you get a BrokenPipeError, try setting the num_workers of torch.utils.data.DataLoader() to 0.

Let us show some of the training images, for fun.

Copy the neural network from the Neural Networks section before and modify it to take 3-channel images (instead of 1-channel images as it was defined).

Let’s use a Classification Cross-Entropy loss and SGD with momentum.

This is when things start to get interesting. We simply have to loop over our data iterator, and feed the inputs to the network and optimize.

See here for more details on saving PyTorch models.

5. Test the network on the test data¶

We have trained the network for 2 passes over the training dataset. But we need to check if the network has learnt anything at all.

We will check this by predicting the class label that the neural network outputs, and checking it against the ground-truth. If the prediction is correct, we add the sample to the list of correct predictions.

Okay, first step. Let us display an image from the test set to get familiar.

dataiter = iter(testloader)
images, labels = next(dataiter)

# print images
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join(f'{classes[labels[j]]:5s}' for j in range(4)))

Out:

GroundTruth: cat ship ship plane

Next, let’s load back in our saved model (note: saving and re-loading the model wasn’t necessary here, we only did it to illustrate how to do so):

net = Net()
net.load_state_dict(torch.load(PATH))

Okay, now let us see what the neural network thinks these examples above are:

outputs = net ( images )

The outputs are energies for the 10 classes. The higher the energy for a class, the more the network thinks that the image is of the particular class. So, let’s get the index of the highest energy:

_, predicted = torch.max(outputs, 1)

print('Predicted: ', ' '.join(f'{classes[predicted[j]]:5s}' for j in range(4)))

Out:

Predicted: cat car truck ship

The results seem pretty good.

Let us look at how the network performs on the whole dataset.

correct = 0
total = 0
# since we're not training, we don't need to calculate the gradients for our outputs
with torch.no_grad():
    for data in testloader:
        images, labels = data
        # calculate outputs by running images through the network
        outputs = net(images)
        # the class with the highest energy is what we choose as prediction
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy of the network on the 10000 test images: {100 * correct // total} %')

Out:

Accuracy of the network on the 10000 test images: 56 %

That looks way better than chance, which is 10% accuracy (randomly picking a class out of 10 classes). Seems like the network learnt something.

Hmmm, what are the classes that performed well, and the classes that did not perform well:

# prepare to count predictions for each class
correct_pred = {classname: 0 for classname in classes}
total_pred = {classname: 0 for classname in classes}

# again no gradients needed
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predictions = torch.max(outputs, 1)
        # collect the correct predictions for each class
        for label, prediction in zip(labels, predictions):
            if label == prediction:
                correct_pred[classes[label]] += 1
            total_pred[classes[label]] += 1

# print accuracy for each class
for classname, correct_count in correct_pred.items():
    accuracy = 100 * float(correct_count) / total_pred[classname]
    print(f'Accuracy for class: {classname:5s} is {accuracy:.1f} %')

Out:

Accuracy for class: plane is 59.1 %
Accuracy for class: car   is 78.2 %
Accuracy for class: bird  is 36.7 %
Accuracy for class: cat   is 54.3 %
Accuracy for class: deer  is 42.8 %
Accuracy for class: dog   is 45.9 %
Accuracy for class: frog  is 67.1 %
Accuracy for class: horse is 60.8 %
Accuracy for class: ship  is 70.3 %
Accuracy for class: truck is 53.7 %

Okay, so what next?

How do we run these neural networks on the GPU?

PyTorch: Training your first Convolutional Neural Network (CNN)

In this tutorial, you will receive a gentle introduction to training your first Convolutional Neural Network (CNN) using the PyTorch deep learning library. This network will be able to recognize handwritten Hiragana characters.

Today’s tutorial is part three in our five part series on PyTorch fundamentals:

What is PyTorch?

Intro to PyTorch: Training your first neural network using PyTorch

PyTorch: Training your first Convolutional Neural Network (today’s tutorial)

PyTorch image classification with pre-trained networks (next week’s tutorial)

PyTorch object detection with pre-trained networks

Last week you learned how to train a very basic feedforward neural network using the PyTorch library. That tutorial focused on simple numerical data.

Today, we will take the next step and learn how to train a CNN to recognize handwritten Hiragana characters using the Kuzushiji-MNIST (KMNIST) dataset.

As you’ll see, training a CNN on an image dataset isn’t all that different from training a basic multi-layer perceptron (MLP) on numerical data. We still need to:

Define our model architecture

Load our dataset from disk

Loop over our epochs and batches

Make predictions and compute our loss

Properly zero our gradient, perform backpropagation, and update our model parameters

Furthermore, this post will also give you some experience with PyTorch’s DataLoader implementation which makes it super easy to work with datasets — becoming proficient with PyTorch’s DataLoader is a critical skill you’ll want to develop as a deep learning practitioner (and it’s a topic that I’ve dedicated an entire course to inside PyImageSearch University).

To learn how to train your first CNN with PyTorch, just keep reading.


PyTorch: Training your first Convolutional Neural Network (CNN)

Throughout the remainder of this tutorial, you will learn how to train your first CNN using the PyTorch framework.

We’ll start by configuring our development environment to install both torch and torchvision, followed by reviewing our project directory structure.

I’ll then show you the KMNIST dataset (a drop-in replacement for the MNIST digits dataset) that contains Hiragana characters. Later in this tutorial, you’ll learn how to train a CNN to recognize each of the Hiragana characters in the KMNIST dataset.

We’ll then implement three Python scripts with PyTorch, including our CNN architecture, training script, and a final script used to make predictions on input images.

By the end of this tutorial, you’ll be comfortable with the steps required to train a CNN with PyTorch.

Let’s get started!

Configuring your development environment

To follow this guide, you need to have PyTorch, OpenCV, and scikit-learn installed on your system.

Luckily, all three are extremely easy to install using pip:

    $ pip install torch torchvision
    $ pip install opencv-contrib-python
    $ pip install scikit-learn

If you need help configuring your development environment for PyTorch, I highly recommend that you read the PyTorch documentation — PyTorch’s documentation is comprehensive and will have you up and running quickly.

And if you need help installing OpenCV, be sure to refer to my pip install OpenCV tutorial.

Having problems configuring your development environment?

Figure 1: Having trouble configuring your dev environment? Want access to pre-configured Jupyter Notebooks running on Google Colab? Be sure to join PyImageSearch University — you’ll be up and running with this tutorial in a matter of minutes.


The KMNIST dataset

Figure 2: The KMNIST dataset is a drop-in replacement for the standard MNIST dataset. The KMNIST dataset contains examples of handwritten Hiragana characters (image source).

The dataset we are using today is the Kuzushiji-MNIST dataset, or KMNIST, for short. This dataset is meant to be a drop-in replacement for the standard MNIST digits recognition dataset.

The KMNIST dataset consists of 70,000 images and their corresponding labels (60,000 for training and 10,000 for testing).

There are a total of 10 classes (meaning 10 Hiragana characters) in the KMNIST dataset, each equally distributed and represented. Our goal is to train a CNN that can accurately classify each of these 10 characters.

And lucky for us, the KMNIST dataset is built into PyTorch, making it super easy for us to work with!

Project structure

Before we start implementing any PyTorch code, let’s first review our project directory structure.

Start by accessing the “Downloads” section of this tutorial to retrieve the source code and pre-trained model.

You’ll then be presented with the following directory structure:

    $ tree . --dirsfirst
    .
    ├── output
    │   ├── model.pth
    │   └── plot.png
    ├── pyimagesearch
    │   ├── __init__.py
    │   └── lenet.py
    ├── predict.py
    └── train.py

    2 directories, 6 files

We have three Python scripts to review today:

lenet.py: Our PyTorch implementation of the famous LeNet architecture

train.py: Trains LeNet on the KMNIST dataset using PyTorch, then serializes the trained model to disk (i.e., model.pth)

predict.py: Loads our trained model from disk, makes predictions on testing images, and displays the results on our screen

The output directory will be populated with plot.png (a plot of our training/validation loss and accuracy) and model.pth (our trained model file) once we run train.py.

With our project directory structure reviewed, we can move on to implementing our CNN with PyTorch.

Implementing a Convolutional Neural Network (CNN) with PyTorch

Figure 3: The LeNet architecture. We’ll be implementing LeNet with PyTorch (image source).

The Convolutional Neural Network (CNN) we are implementing here with PyTorch is the seminal LeNet architecture, first proposed by one of the grandfathers of deep learning, Yann LeCun.

By today’s standards, LeNet is a very shallow neural network, consisting of the following layers:

(CONV => RELU => POOL) * 2 => FC => RELU => FC => SOFTMAX

As you’ll see, we’ll be able to implement LeNet with PyTorch in only 60 lines of code (including comments).

The best way to learn about CNNs with PyTorch is to implement one, so with that said, open the lenet.py file in the pyimagesearch module, and let’s get to work:

    # import the necessary packages
    from torch.nn import Module
    from torch.nn import Conv2d
    from torch.nn import Linear
    from torch.nn import MaxPool2d
    from torch.nn import ReLU
    from torch.nn import LogSoftmax
    from torch import flatten

Lines 2-8 import our required packages. Let’s break each of them down:

Module: Rather than using the Sequential PyTorch class to implement LeNet, we’ll instead subclass the Module object so you can see how PyTorch implements neural networks using classes

Conv2d: PyTorch’s implementation of convolutional layers

Linear: Fully connected layers

MaxPool2d: Applies 2D max-pooling to reduce the spatial dimensions of the input volume

ReLU: Our ReLU activation function

LogSoftmax: Used when building our softmax classifier to return the log-probabilities of each class

flatten: Flattens the output of a multi-dimensional volume (e.g., a CONV or POOL layer) such that we can apply fully connected layers to it

With our imports taken care of, we can implement our LeNet class using PyTorch:

    class LeNet(Module):
        def __init__(self, numChannels, classes):
            # call the parent constructor
            super(LeNet, self).__init__()

            # initialize first set of CONV => RELU => POOL layers
            self.conv1 = Conv2d(in_channels=numChannels, out_channels=20,
                kernel_size=(5, 5))
            self.relu1 = ReLU()
            self.maxpool1 = MaxPool2d(kernel_size=(2, 2), stride=(2, 2))

            # initialize second set of CONV => RELU => POOL layers
            self.conv2 = Conv2d(in_channels=20, out_channels=50,
                kernel_size=(5, 5))
            self.relu2 = ReLU()
            self.maxpool2 = MaxPool2d(kernel_size=(2, 2), stride=(2, 2))

            # initialize first (and only) set of FC => RELU layers
            self.fc1 = Linear(in_features=800, out_features=500)
            self.relu3 = ReLU()

            # initialize our softmax classifier
            self.fc2 = Linear(in_features=500, out_features=classes)
            self.logSoftmax = LogSoftmax(dim=1)

Line 10 defines the LeNet class. Notice how we are subclassing the Module object — by building our model as a class we can easily:

Reuse variables

Implement custom functions to generate subnetworks/components (used very often when implementing more complex networks, such as ResNet, Inception, etc.)

Define our own forward pass function

Best of all, when defined correctly, PyTorch can automatically apply its autograd module to perform automatic differentiation — backpropagation is taken care of for us by virtue of the PyTorch library!

The constructor to LeNet accepts two variables:

numChannels: The number of channels in the input images (1 for grayscale or 3 for RGB)

classes: Total number of unique class labels in our dataset

Line 13 calls the parent constructor (i.e., Module ) which performs a number of PyTorch-specific operations.

From there, we start defining the actual LeNet architecture.

Lines 16-19 initialize our first set of CONV => RELU => POOL layers. Our first CONV layer learns a total of 20 filters, each of which is 5×5. A ReLU activation function is then applied, followed by a 2×2 max-pooling layer with a 2×2 stride to reduce the spatial dimensions of our input image.

We then have a second set of CONV => RELU => POOL layers on Lines 22-25. We increase the number of filters learned in the CONV layer to 50, but maintain the 5×5 kernel size. Again, a ReLU activation is applied, followed by max-pooling.

Next comes our first and only set of fully connected layers (Lines 28 and 29). We define the number of inputs to the layer (800) along with our desired number of output nodes (500). A ReLU activation follows the FC layer.
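The value 800 follows from the layer sizes above. The sketch below traces the spatial size through the network, assuming 28×28 input images (the KMNIST image size) and the default stride of 1 with no padding in the convolutions. The helper conv_out is a hypothetical name used only for this illustration.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # output size of a convolution or pooling layer along one spatial dimension
    return (size + 2 * padding - kernel) // stride + 1

size = 28                                   # assumed input: 28x28 KMNIST images
size = conv_out(size, kernel=5)             # conv1: 28 -> 24
size = conv_out(size, kernel=2, stride=2)   # maxpool1: 24 -> 12
size = conv_out(size, kernel=5)             # conv2: 12 -> 8
size = conv_out(size, kernel=2, stride=2)   # maxpool2: 8 -> 4

# conv2 produces 50 channels, each of spatial size 4x4
flat_features = 50 * size * size
print(flat_features)
```

With a different input resolution the flattened size changes, so in_features of the first fully connected layer would need to be recomputed the same way.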

Finally, we apply our softmax classifier (Lines 32 and 33). The number of in_features is set to 500, which is the output dimensionality from the previous layer. We then apply LogSoftmax so that the network outputs log-probabilities for each class, from which predicted probabilities can be recovered during evaluation.

It’s important to understand that at this point all we have done is initialized variables. These variables are essentially placeholders. PyTorch has absolutely no idea what the network architecture is, just that some variables exist inside the LeNet class definition.

To build the network architecture itself (i.e., what layer is input to some other layer), we need to override the forward method of the Module class.

The forward function serves a number of purposes:

It connects layers/subnetworks together from variables defined in the constructor (i.e., __init__) of the class

It defines the network architecture itself

It allows the forward pass of the model to be performed, resulting in our output predictions

And, thanks to PyTorch’s autograd module, it allows us to perform automatic differentiation and update our model weights

Let’s inspect the forward function now:

        def forward(self, x):
            # pass the input through our first set of CONV => RELU =>
            # POOL layers
            x = self.conv1(x)
            x = self.relu1(x)
            x = self.maxpool1(x)

            # pass the output from the previous layer through the second
            # set of CONV => RELU => POOL layers
            x = self.conv2(x)
            x = self.relu2(x)
            x = self.maxpool2(x)

            # flatten the output from the previous layer and pass it
            # through our only set of FC => RELU layers
            x = flatten(x, 1)
            x = self.fc1(x)
            x = self.relu3(x)

            # pass the output to our softmax classifier to get our output
            # predictions
            x = self.fc2(x)
            output = self.logSoftmax(x)

            # return the output predictions
            return output

The forward method accepts a single parameter, x, which is the batch of input data to the network.

We then connect our conv1, relu1, and maxpool1 layers together to form the first CONV => RELU => POOL layer of the network (Lines 38-40).

A similar operation is performed on Lines 44-46, this time building the second set of CONV => RELU => POOL layers.

At this point, the variable x is a multi-dimensional tensor; however, in order to create our fully connected layers, we need to “flatten” this tensor into what essentially amounts to a 1D list of values — the flatten function on Line 50 takes care of this operation for us.
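As a small illustration of what flatten(x, 1) does, the sketch below uses a dummy tensor shaped like the output of the second POOL layer. The batch size of 64 and the zero-filled values are assumptions made for the example. Passing 1 as the start dimension keeps the batch dimension and merges the remaining three.

```python
import torch
from torch import flatten

# dummy batch shaped like the output of the second POOL layer:
# (batch size, channels, height, width)
x = torch.zeros(64, 50, 4, 4)

# start_dim=1 keeps the batch dimension and merges the rest
flat = flatten(x, 1)
print(flat.shape)
```

Each sample therefore becomes one row of 50 * 4 * 4 = 800 values, which matches the in_features of the first fully connected layer.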

From there, we connect the fc1 and relu3 layers to the network architecture (Lines 51 and 52), followed by attaching the final fc2 and logSoftmax (Lines 56 and 57).

The output of the network is then returned to the calling function.

Again, I want to reiterate the importance of initializing variables in the constructor versus building the network itself in the forward function:

The constructor to your Module only initializes your layer types. PyTorch keeps track of these variables, but it has no idea how the layers connect to each other.

For PyTorch to understand the network architecture you’re building, you define the forward function.

Inside the forward function you take the variables initialized in your constructor and connect them.

PyTorch can then make predictions using your network and perform automatic backpropagation, thanks to the autograd module

Congrats on implementing your first CNN with PyTorch!

Creating our CNN training script with PyTorch

With our CNN architecture implemented, we can move on to creating our training script with PyTorch.

Open the train.py file in your project directory structure, and let’s get to work:

    # set the matplotlib backend so figures can be saved in the background
    import matplotlib
    matplotlib.use("Agg")

    # import the necessary packages
    from pyimagesearch.lenet import LeNet
    from sklearn.metrics import classification_report
    from torch.utils.data import random_split
    from torch.utils.data import DataLoader
    from torchvision.transforms import ToTensor
    from torchvision.datasets import KMNIST
    from torch.optim import Adam
    from torch import nn
    import matplotlib.pyplot as plt
    import numpy as np
    import argparse
    import torch
    import time

Lines 2 and 3 import matplotlib and set the appropriate background engine.

From there, we import a number of notable packages:

LeNet: Our PyTorch implementation of the LeNet CNN from the previous section

classification_report: Used to display a detailed classification report on our testing set

random_split: Constructs a random training/testing split from an input set of data

DataLoader: PyTorch’s awesome data loading utility that allows us to effortlessly build data pipelines to train our CNN

ToTensor: A preprocessing function that converts input data into a PyTorch tensor for us automatically

KMNIST: The Kuzushiji-MNIST dataset loader built into the PyTorch library

Adam: The optimizer we’ll use to train our neural network

nn: PyTorch’s neural network implementations

Let’s now parse our command line arguments:

    # construct the argument parser and parse the arguments
    ap = argparse.ArgumentParser()
    ap.add_argument("-m", "--model", type=str, required=True,
        help="path to output trained model")
    ap.add_argument("-p", "--plot", type=str, required=True,
        help="path to output loss/accuracy plot")
    args = vars(ap.parse_args())

We have two command line arguments that need parsing:

--model: The path to our output serialized model after training (we save this model to disk so we can use it to make predictions in our predict.py script)

--plot: The path to our output training history plot
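To see what the parsed arguments look like without running the script from a terminal, the sketch below passes an explicit argument list to parse_args. The two paths are the ones used later in this tutorial; supplying them as a list instead of via the command line is only for illustration.

```python
import argparse

ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, required=True,
    help="path to output trained model")
ap.add_argument("-p", "--plot", type=str, required=True,
    help="path to output loss/accuracy plot")

# pass an explicit argument list instead of reading sys.argv
args = vars(ap.parse_args(["--model", "output/model.pth",
    "--plot", "output/plot.png"]))
print(args["model"], args["plot"])
```

Because of the vars call, args is a plain dictionary keyed by the long option names, which is why the rest of the script can use args["model"] and args["plot"].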

Moving on, we now have some important initializations to take care of:

    # define training hyperparameters
    INIT_LR = 1e-3
    BATCH_SIZE = 64
    EPOCHS = 10

    # define the train and val splits
    TRAIN_SPLIT = 0.75
    VAL_SPLIT = 1 - TRAIN_SPLIT

    # set the device we will be using to train the model
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

Lines 29-31 set our initial learning rate, batch size, and number of epochs to train for, while Lines 34 and 35 define our training and validation split size (75% for training, 25% for validation).

Line 38 then determines our device (i.e., whether we’ll be using our CPU or GPU).

Let’s start preparing our dataset:

    # load the KMNIST dataset
    print("[INFO] loading the KMNIST dataset...")
    trainData = KMNIST(root="data", train=True, download=True,
        transform=ToTensor())
    testData = KMNIST(root="data", train=False, download=True,
        transform=ToTensor())

    # calculate the train/validation split
    print("[INFO] generating the train/validation split...")
    numTrainSamples = int(len(trainData) * TRAIN_SPLIT)
    numValSamples = int(len(trainData) * VAL_SPLIT)
    (trainData, valData) = random_split(trainData,
        [numTrainSamples, numValSamples],
        generator=torch.Generator().manual_seed(42))

Lines 42-45 load the KMNIST dataset using PyTorch’s built-in KMNIST class.

For our trainData, we set train=True while our testData is loaded with train=False. These Booleans come in handy when working with datasets built into the PyTorch library.

The download=True flag indicates that PyTorch will automatically download and cache the KMNIST dataset to disk for us if we had not previously downloaded it.

Also take note of the transform parameter. Here we can apply a number of data transformations (outside the scope of this tutorial but will be covered soon). The only transform we need is ToTensor, which converts each loaded image into a PyTorch tensor.

With our training and testing set loaded, we derive our training and validation set on Lines 49-53. Using PyTorch’s random_split function, we can easily split our data.
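The split can be tried on a small stand-in dataset. The sketch below uses a toy TensorDataset of 100 samples in place of the 60,000 KMNIST training images, and it computes the validation size as a remainder rather than with int(len * VAL_SPLIT). That change is my own assumption-level variation, chosen so the two sizes always add up to the dataset length even when the split fraction does not divide evenly.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# toy dataset of 100 samples standing in for the KMNIST training images
data = TensorDataset(torch.arange(100))

TRAIN_SPLIT = 0.75
numTrain = int(len(data) * TRAIN_SPLIT)
numVal = len(data) - numTrain  # remainder, so the sizes sum to len(data)

# a seeded generator makes the random split reproducible
(trainData, valData) = random_split(data, [numTrain, numVal],
    generator=torch.Generator().manual_seed(42))
print(len(trainData), len(valData))
```

The two subsets are disjoint views onto the same underlying dataset, so no sample appears in both training and validation.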

We now have three sets of data:

Training

Validation

Testing

The next step is to create a DataLoader for each one:

    # initialize the train, validation, and test data loaders
    trainDataLoader = DataLoader(trainData, shuffle=True,
        batch_size=BATCH_SIZE)
    valDataLoader = DataLoader(valData, batch_size=BATCH_SIZE)
    testDataLoader = DataLoader(testData, batch_size=BATCH_SIZE)

    # calculate steps per epoch for training and validation set
    trainSteps = len(trainDataLoader.dataset) // BATCH_SIZE
    valSteps = len(valDataLoader.dataset) // BATCH_SIZE

Building the DataLoader objects is accomplished on Lines 56-59. We set shuffle=True only for our trainDataLoader since our validation and testing sets do not require shuffling.

We also derive the number of training steps and validation steps per epoch (Lines 62 and 63).
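A quick arithmetic check of these step counts, assuming the 45,000/15,000 split of the 60,000 KMNIST training images described above. Floor division gives the number of full batches, while a DataLoader with drop_last left at its default of False also yields one final, smaller batch.

```python
import math

BATCH_SIZE = 64
numTrainSamples = 45000  # 75% of the 60,000 KMNIST training images
numValSamples = 15000    # remaining 25%

# floor division, as in the training script
trainSteps = numTrainSamples // BATCH_SIZE
valSteps = numValSamples // BATCH_SIZE

# number of batches a DataLoader yields when drop_last is left at False
trainBatches = math.ceil(numTrainSamples / BATCH_SIZE)
valBatches = math.ceil(numValSamples / BATCH_SIZE)

print(trainSteps, trainBatches)
print(valSteps, valBatches)
```

Since the loop visits one more batch than trainSteps counts, dividing the summed loss by trainSteps slightly overstates the per-batch average. The effect is small here, but using len(trainDataLoader) as the divisor would be one way to avoid it.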

At this point our data is ready for training; however, we don’t have a model to train yet!

Let’s initialize LeNet now:

    # initialize the LeNet model
    print("[INFO] initializing the LeNet model...")
    model = LeNet(
        numChannels=1,
        classes=len(trainData.dataset.classes)).to(device)

    # initialize our optimizer and loss function
    opt = Adam(model.parameters(), lr=INIT_LR)
    lossFn = nn.NLLLoss()

    # initialize a dictionary to store training history
    H = {
        "train_loss": [],
        "train_acc": [],
        "val_loss": [],
        "val_acc": []
    }

    # measure how long training is going to take
    print("[INFO] training the network...")
    startTime = time.time()

Lines 67-69 initialize our model. Since the KMNIST dataset is grayscale, we set numChannels=1. We can easily set the number of classes by reading the dataset.classes attribute of our trainData.

We also call to(device) to move the model to either our CPU or GPU.

Lines 72 and 73 initialize our optimizer and loss function. We’ll use the Adam optimizer for training and the negative log-likelihood for our loss function.

When we combine the nn.NLLLoss class with LogSoftmax in our model definition, we arrive at categorical cross-entropy loss (which is equivalent to training a model with an output Linear layer and an nn.CrossEntropyLoss loss). Basically, PyTorch allows you to implement categorical cross-entropy in two separate ways.
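The equivalence of the two formulations can be checked numerically. The sketch below uses random made-up logits and labels (not KMNIST data) and compares LogSoftmax followed by nn.NLLLoss against nn.CrossEntropyLoss applied to the raw logits.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw scores: 4 samples, 10 classes
targets = torch.tensor([1, 0, 9, 3])   # made-up class labels

# option 1: LogSoftmax in the model + NLLLoss
lossA = nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), targets)

# option 2: raw logits + CrossEntropyLoss
lossB = nn.CrossEntropyLoss()(logits, targets)

print(torch.allclose(lossA, lossB))
```

The practical consequence is that a model ending in LogSoftmax should be paired with NLLLoss, while a model ending in a plain Linear layer should be paired with CrossEntropyLoss. Mixing them would apply the log-softmax twice or not at all.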

Get used to seeing both methods as some deep learning practitioners (almost arbitrarily) prefer one over the other.

We then initialize H, our training history dictionary (Lines 76-81). After every epoch we’ll update this dictionary with our training loss, training accuracy, validation loss, and validation accuracy for the given epoch.

Finally, we start a timer to measure how long training takes (Line 85).

At this point, all of our initializations are complete, so it’s time to train our model.

Note: Be sure you’ve read the previous tutorial in this series, Intro to PyTorch: Training your first neural network using PyTorch, as we’ll be building on concepts learned in that guide.

Below follows our training loop:

    # loop over our epochs
    for e in range(0, EPOCHS):
        # set the model in training mode
        model.train()

        # initialize the total training and validation loss
        totalTrainLoss = 0
        totalValLoss = 0

        # initialize the number of correct predictions in the training
        # and validation step
        trainCorrect = 0
        valCorrect = 0

        # loop over the training set
        for (x, y) in trainDataLoader:
            # send the input to the device
            (x, y) = (x.to(device), y.to(device))

            # perform a forward pass and calculate the training loss
            pred = model(x)
            loss = lossFn(pred, y)

            # zero out the gradients, perform the backpropagation step,
            # and update the weights
            opt.zero_grad()
            loss.backward()
            opt.step()

            # add the loss to the total training loss so far and
            # calculate the number of correct predictions
            totalTrainLoss += loss
            trainCorrect += (pred.argmax(1) == y).type(
                torch.float).sum().item()

On Line 88, we loop over our desired number of epochs.

We then proceed to:

Put the model in train() mode

Initialize our training loss and validation loss for the current epoch

Initialize our number of correct training and validation predictions for the current epoch

Line 102 shows the benefit of using PyTorch’s DataLoader class — all we have to do is start a for loop over the DataLoader object. PyTorch automatically yields a batch of training data. Under the hood, the DataLoader is also shuffling our training data (and if we were doing any additional preprocessing or data augmentation, it would happen here as well).
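A minimal sketch of that batching behavior, using a toy TensorDataset of ten zero-filled 28×28 images rather than KMNIST. The dataset size and batch size of 4 are arbitrary values picked for the illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# toy dataset: 10 single-channel 28x28 "images" with integer labels
images = torch.zeros(10, 1, 28, 28)
labels = torch.arange(10)
loader = DataLoader(TensorDataset(images, labels), shuffle=True, batch_size=4)

batchSizes = []
for (x, y) in loader:
    # x has shape (batch, 1, 28, 28); the final batch holds the leftover samples
    batchSizes.append(x.shape[0])

print(batchSizes)
```

The loader yields two full batches and one batch with the remaining two samples. Shuffling changes which samples land in each batch, not the batch sizes.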

For each batch of data (Line 104) we perform a forward pass, obtain our predictions, and compute the loss (Lines 107 and 108).

Next comes the all important step of:

Zeroing our gradient

Performing backpropagation

Updating the weights of our model

Seriously, don’t forget this step! Failure to do those three steps in that exact order will lead to erroneous training results. Whenever you write a training loop with PyTorch, I highly recommend you insert those three lines of code before you do anything else so that you are reminded to ensure they are in the proper place.
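The reason zeroing matters is that PyTorch adds each new gradient to whatever is already stored in .grad. The sketch below shows this with a single made-up scalar weight w instead of the LeNet model, and it resets the gradient by hand with w.grad.zero_() where the training script would call opt.zero_grad().

```python
import torch

# single weight with loss = 3 * w, so d(loss)/dw is always 3
w = torch.tensor(1.0, requires_grad=True)

(3 * w).backward()
firstGrad = w.grad.item()

# a second backward pass without zeroing adds to the stored gradient
(3 * w).backward()
accumulatedGrad = w.grad.item()

# resetting the gradient before backward restores the expected value
w.grad.zero_()
(3 * w).backward()
freshGrad = w.grad.item()

print(firstGrad, accumulatedGrad, freshGrad)
```

Without the reset, the optimizer would step along the sum of gradients from the current and all previous batches, which is not the update the loss for the current batch calls for.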

We wrap up the code block by updating our totalTrainLoss and trainCorrect bookkeeping variables.

At this point, we’ve looped over all batches of data in our training set for the current epoch — now we can evaluate our model on the validation set:

        # switch off autograd for evaluation
        with torch.no_grad():
            # set the model in evaluation mode
            model.eval()

            # loop over the validation set
            for (x, y) in valDataLoader:
                # send the input to the device
                (x, y) = (x.to(device), y.to(device))

                # make the predictions and calculate the validation loss
                pred = model(x)
                totalValLoss += lossFn(pred, y)

                # calculate the number of correct predictions
                valCorrect += (pred.argmax(1) == y).type(
                    torch.float).sum().item()

When evaluating a PyTorch model on a validation or testing set, you need to first:

Use the torch.no_grad() context to turn off gradient tracking and computation

Put the model in eval() mode

From there, you loop over all batches in the validation DataLoader (Line 128), move the data to the correct device (Line 130), and use the data to make predictions (Line 133) and compute your loss (Line 134).

You can then derive your total number of correct predictions (Lines 137 and 138).
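The two evaluation switches do different jobs, which the sketch below tries to show on a small stand-in model (a Linear layer followed by Dropout, not LeNet). torch.no_grad() stops autograd from recording operations, while eval() changes the behavior of layers such as Dropout.

```python
import torch
from torch import nn

# small stand-in model; Dropout behaves differently in train and eval mode
model = nn.Sequential(nn.Linear(4, 3), nn.Dropout(p=0.5))
x = torch.ones(2, 4)

model.train()
trainOut = model(x)           # gradient tracking is on

with torch.no_grad():
    model.eval()
    evalOut = model(x)        # no graph is recorded inside this block

print(trainOut.requires_grad, evalOut.requires_grad)
print(model.training)
```

Note that eval() stays in effect after the with block ends, which is why the training loop above calls model.train() again at the start of every epoch.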

We round out our training loop by computing a number of statistics:

        # calculate the average training and validation loss
        avgTrainLoss = totalTrainLoss / trainSteps
        avgValLoss = totalValLoss / valSteps

        # calculate the training and validation accuracy
        trainCorrect = trainCorrect / len(trainDataLoader.dataset)
        valCorrect = valCorrect / len(valDataLoader.dataset)

        # update our training history
        H["train_loss"].append(avgTrainLoss.cpu().detach().numpy())
        H["train_acc"].append(trainCorrect)
        H["val_loss"].append(avgValLoss.cpu().detach().numpy())
        H["val_acc"].append(valCorrect)

        # print the model training and validation information
        print("[INFO] EPOCH: {}/{}".format(e + 1, EPOCHS))
        print("Train loss: {:.6f}, Train accuracy: {:.4f}".format(
            avgTrainLoss, trainCorrect))
        print("Val loss: {:.6f}, Val accuracy: {:.4f}\n".format(
            avgValLoss, valCorrect))

Lines 141 and 142 compute our average training and validation loss. Lines 145 and 146 do the same thing, but for our training and validation accuracy.

We then take these values and update our training history dictionary (Lines 149-152).

Finally, we display the training loss, training accuracy, validation loss, and validation accuracy on our terminal (Lines 155-159).

We’re almost there!

Now that training is complete, we need to evaluate our model on the testing set (previously we’ve only used the training and validation sets):

    # finish measuring how long training took
    endTime = time.time()
    print("[INFO] total time taken to train the model: {:.2f}s".format(
        endTime - startTime))

    # we can now evaluate the network on the test set
    print("[INFO] evaluating network...")

    # turn off autograd for testing evaluation
    with torch.no_grad():
        # set the model in evaluation mode
        model.eval()

        # initialize a list to store our predictions
        preds = []

        # loop over the test set
        for (x, y) in testDataLoader:
            # send the input to the device
            x = x.to(device)

            # make the predictions and add them to the list
            pred = model(x)
            preds.extend(pred.argmax(axis=1).cpu().numpy())

    # generate a classification report
    print(classification_report(testData.targets.cpu().numpy(),
        np.array(preds), target_names=testData.classes))

Lines 162-164 stop our training timer and show how long training took.

We then set up another torch.no_grad() context and put our model in eval() mode (Lines 170 and 172).

Evaluation is performed by:

Initializing a list to store our predictions (Line 175)

Looping over our testDataLoader (Line 178)

Sending the current batch of data to the appropriate device (Line 180)

Making predictions on the current batch of data (Line 183)

Updating our preds list with the top predictions from the model (Line 184)

Finally, we display a detailed classification_report.

The last step we’ll do here is plot our training and validation history, followed by serializing our model weights to disk:

    # plot the training loss and accuracy
    plt.style.use("ggplot")
    plt.figure()
    plt.plot(H["train_loss"], label="train_loss")
    plt.plot(H["val_loss"], label="val_loss")
    plt.plot(H["train_acc"], label="train_acc")
    plt.plot(H["val_acc"], label="val_acc")
    plt.title("Training Loss and Accuracy on Dataset")
    plt.xlabel("Epoch #")
    plt.ylabel("Loss/Accuracy")
    plt.legend(loc="lower left")
    plt.savefig(args["plot"])

    # serialize the model to disk
    torch.save(model, args["model"])

Lines 191-201 generate a matplotlib figure for our training history.

We then call torch.save to save our PyTorch model to disk so that we can load it from disk and make predictions from a separate Python script.
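The call torch.save(model, args["model"]) in the script serializes the whole model object. A commonly used alternative, which this tutorial does not use, is to save only the learned parameters through state_dict() and load them into a freshly constructed model. The sketch below shows that pattern under some assumptions: it uses a stand-in nn.Linear layer instead of LeNet, and an in-memory buffer instead of a file path.

```python
import io
import torch
from torch import nn

# small stand-in network (not the LeNet class from this tutorial)
model = nn.Linear(4, 2)

# save only the learned parameters to an in-memory buffer
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# rebuild the same architecture, then load the parameters into it
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

print(torch.equal(model.weight, restored.weight))
```

The trade-off is that the state_dict approach needs the model class to be importable and constructed by the loading script, whereas saving the whole object ties the file to the exact class definition and module path used at save time.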

As a whole, reviewing this script shows you how much more control PyTorch gives you over the training loop — this is both a good and a bad thing:

It’s good if you want full control over the training loop and need to implement custom procedures

It’s bad when your training loop is simple and a Keras/TensorFlow equivalent to model.fit would suffice

As I mentioned in part one of this series, What is PyTorch, neither PyTorch nor Keras/TensorFlow is better than the other; there are just different caveats and use cases for each library.

Training our CNN with PyTorch

We are now ready to train our CNN using PyTorch.

Be sure to access the “Downloads” section of this tutorial to retrieve the source code to this guide.

From there, you can train your PyTorch CNN by executing the following command:

    $ python train.py --model output/model.pth --plot output/plot.png
    [INFO] loading the KMNIST dataset...
    [INFO] generating the train-val split...
    [INFO] initializing the LeNet model...
    [INFO] training the network...
    [INFO] EPOCH: 1/10
    Train loss: 0.362849, Train accuracy: 0.8874
    Val loss: 0.135508, Val accuracy: 0.9605

    [INFO] EPOCH: 2/10
    Train loss: 0.095483, Train accuracy: 0.9707
    Val loss: 0.091975, Val accuracy: 0.9733

    [INFO] EPOCH: 3/10
    Train loss: 0.055557, Train accuracy: 0.9827
    Val loss: 0.087181, Val accuracy: 0.9755

    [INFO] EPOCH: 4/10
    Train loss: 0.037384, Train accuracy: 0.9882
    Val loss: 0.070911, Val accuracy: 0.9806

    [INFO] EPOCH: 5/10
    Train loss: 0.023890, Train accuracy: 0.9930
    Val loss: 0.068049, Val accuracy: 0.9812

    [INFO] EPOCH: 6/10
    Train loss: 0.022484, Train accuracy: 0.9930
    Val loss: 0.075622, Val accuracy: 0.9816

    [INFO] EPOCH: 7/10
    Train loss: 0.013171, Train accuracy: 0.9960
    Val loss: 0.077187, Val accuracy: 0.9822

    [INFO] EPOCH: 8/10
    Train loss: 0.010805, Train accuracy: 0.9966
    Val loss: 0.107378, Val accuracy: 0.9764

    [INFO] EPOCH: 9/10
    Train loss: 0.011510, Train accuracy: 0.9960
    Val loss: 0.076585, Val accuracy: 0.9829

    [INFO] EPOCH: 10/10
    Train loss: 0.009648, Train accuracy: 0.9967
    Val loss: 0.082116, Val accuracy: 0.9823

    [INFO] total time taken to train the model: 159.99s
    [INFO] evaluating network...
                  precision    recall  f1-score   support

               o       0.93      0.98      0.95      1000
              ki       0.96      0.95      0.96      1000
              su       0.96      0.90      0.93      1000
             tsu       0.95      0.97      0.96      1000
              na       0.94      0.94      0.94      1000
              ha       0.97      0.95      0.96      1000
              ma       0.94      0.96      0.95      1000
              ya       0.98      0.95      0.97      1000
              re       0.95      0.97      0.96      1000
              wo       0.97      0.96      0.97      1000

        accuracy                           0.95     10000
       macro avg       0.95      0.95      0.95     10000
    weighted avg       0.95      0.95      0.95     10000

Figure 4: Plotting our training history with PyTorch.

Training our CNN took ≈160 seconds on my CPU. Using my GPU, training time drops to ≈82 seconds.

At the end of the final epoch we have obtained 99.67% training accuracy and 98.23% validation accuracy.

When we evaluate on our testing set we reach ≈95% accuracy, which is quite good given the complexity of the Hiragana characters and the simplicity of our shallow network architecture (using a deeper network such as a VGG-inspired model or ResNet-like would allow us to obtain even higher accuracy, but those models are more complex for an introduction to CNNs with PyTorch).

Furthermore, as Figure 4 shows, our training history plot is smooth, demonstrating there is little/no overfitting happening.

Before moving to the next section, take a look at your output directory:

$ ls output/
model.pth plot.png

Note the model.pth file — this is our trained PyTorch model saved to disk. We will load this model from disk and use it to make predictions in the following section.

Implementing our PyTorch prediction script

The final script we are reviewing here will show you how to make predictions with a PyTorch model that has been saved to disk.

Open the predict.py file in your project directory structure, and we’ll get started:

# set the numpy seed for better reproducibility
import numpy as np
np.random.seed(42)

# import the necessary packages
from torch.utils.data import DataLoader
from torch.utils.data import Subset
from torchvision.transforms import ToTensor
from torchvision.datasets import KMNIST
import argparse
import imutils
import torch
import cv2

Lines 2-13 import our required Python packages. We set the NumPy random seed at the top of the script for better reproducibility across machines.

We then import:

DataLoader: Used to load our KMNIST testing data

Subset: Builds a subset of the testing data

ToTensor: Converts our input data to a PyTorch tensor data type

KMNIST: The Kuzushiji-MNIST dataset loader built into the PyTorch library

cv2: Our OpenCV bindings which we’ll use for basic drawing and displaying output images on our screen

Next comes our command line arguments:

# construct the argument parser and parse the arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", type=str, required=True,
    help="path to the trained PyTorch model")
args = vars(ap.parse_args())

We only need a single argument here, --model, the path to our trained PyTorch model saved to disk. Presumably, this switch will point to output/model.pth.

Moving on, let’s set our device:

# set the device we will be using to test the model
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# load the KMNIST dataset and randomly grab 10 data points
print("[INFO] loading the KMNIST test dataset...")
testData = KMNIST(root="data", train=False, download=True,
    transform=ToTensor())
idxs = np.random.choice(range(0, len(testData)), size=(10,))
testData = Subset(testData, idxs)

# initialize the test data loader
testDataLoader = DataLoader(testData, batch_size=1)

# load the model and set it to evaluation mode
model = torch.load(args["model"]).to(device)
model.eval()

Line 22 determines if we will be performing inference on our CPU or GPU.

We then load the testing data from the KMNIST dataset on Lines 26 and 27. We randomly sample a total of 10 images from this dataset on Lines 28 and 29 using the Subset class (which creates a smaller “view” of the full testing data).

A DataLoader is created to pass our subset of testing data through the model on Line 32.

We then load our serialized PyTorch model from disk on Line 35, passing it to the appropriate device.

Finally, the model is placed into evaluation mode (Line 36).
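A note on the loading step: calling torch.load on a file written with torch.save(model, ...) unpickles the whole model object, so the LeNet class must be importable when predict.py runs. To my knowledge, PyTorch 2.6 and later also default to weights_only=True in torch.load, which rejects fully pickled models unless weights_only=False is passed explicitly. A common alternative is to save only the state_dict. The sketch below shows that round trip; TinyNet and the temporary file path are hypothetical stand-ins, not the tutorial's LeNet or output/model.pth.

```python
import os
import tempfile

import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the tutorial's LeNet."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(1, 4, kernel_size=5)
        self.fc = nn.Linear(4 * 24 * 24, num_classes)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        return self.fc(torch.flatten(x, 1))


torch.manual_seed(0)
trained = TinyNet()

# save only the learned parameters, not the pickled Python object
path = os.path.join(tempfile.mkdtemp(), "model_state.pth")
torch.save(trained.state_dict(), path)

# at prediction time: rebuild the architecture, then load the weights
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
restored = TinyNet()
restored.load_state_dict(torch.load(path, map_location=device))
restored = restored.to(device)
restored.eval()

# both copies should give the same output for the same input
x = torch.rand(1, 1, 28, 28)
with torch.no_grad():
    same = torch.allclose(trained.eval()(x), restored.cpu()(x))
print(same)
```

The trade-off is that the prediction script has to construct the model itself before loading the weights, but the saved file no longer depends on how the class was pickled.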

Let’s now make predictions on a sample of our testing set:

# switch off autograd
with torch.no_grad():
    # loop over the test set
    for (image, label) in testDataLoader:
        # grab the original image and ground truth label
        origImage = image.numpy().squeeze(axis=(0, 1))
        gtLabel = testData.dataset.classes[label.numpy()[0]]

        # send the input to the device and make predictions on it
        image = image.to(device)
        pred = model(image)

        # find the class label index with the largest corresponding
        # probability
        idx = pred.argmax(axis=1).cpu().numpy()[0]
        predLabel = testData.dataset.classes[idx]

Line 39 turns off gradient tracking, while Line 41 loops over all images in our subset of the test set.

For each image, we:

1. Grab the current image and turn it into a NumPy array (so we can draw on it later with OpenCV)
2. Extract the ground-truth class label
3. Send the image to the appropriate device
4. Use our trained LeNet model to make predictions on the current image
5. Extract the class label with the top predicted probability

All that’s left is a bit of visualization:

        # convert the image from grayscale to RGB (so we can draw on
        # it) and resize it (so we can more easily see it on our
        # screen)
        origImage = np.dstack([origImage] * 3)
        origImage = imutils.resize(origImage, width=128)

        # draw the predicted class label on it
        color = (0, 255, 0) if gtLabel == predLabel else (0, 0, 255)
        cv2.putText(origImage, gtLabel, (2, 25),
            cv2.FONT_HERSHEY_SIMPLEX, 0.95, color, 2)

        # display the result in terminal and show the input image
        print("[INFO] ground truth label: {}, predicted label: {}".format(
            gtLabel, predLabel))
        cv2.imshow("image", origImage)
        cv2.waitKey(0)

Each image in the KMNIST dataset is a single channel grayscale image; however, we want to use OpenCV’s cv2.putText function to draw the predicted class label and ground-truth label on the image.

To draw RGB colors on a grayscale image, we first need to create an RGB representation of the grayscale image by stacking the grayscale image depth-wise a total of three times (Line 58).
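The depth-wise stacking step can be checked in isolation. This is a minimal sketch with a randomly generated array standing in for a real KMNIST image (the array `gray` is an assumption for illustration):

```python
import numpy as np

# hypothetical 28x28 grayscale image with values in [0, 1],
# like the array produced by ToTensor() followed by .numpy()
gray = np.random.rand(28, 28).astype("float32")

# stack the single channel three times along the depth axis
rgb = np.dstack([gray] * 3)

print(gray.shape)  # (28, 28)
print(rgb.shape)   # (28, 28, 3)

# all three channels are identical copies of the grayscale values
channels_equal = bool((rgb[..., 0] == gray).all() and (rgb[..., 2] == gray).all())
print(channels_equal)
```

Because every channel holds the same values, the image still looks gray; the extra channels only make room for colored text.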

Additionally, we resize the origImage so that we can more easily see it on our screen (by default, KMNIST images are only 28×28 pixels, which can be hard to see, especially on a high resolution monitor).

From there, we determine the text color and draw the label on the output image.

We wrap up the script by displaying the output origImage on our screen.

Making predictions with our trained PyTorch model

We are now ready to make predictions using our trained PyTorch model!

Be sure to access the “Downloads” section of this tutorial to retrieve the source code and pre-trained PyTorch model.

From there, you can execute the predict.py script:

$ python predict.py --model output/model.pth
[INFO] loading the KMNIST test dataset…
[INFO] Ground truth label: ki, Predicted label: ki
[INFO] Ground truth label: ki, Predicted label: ki
[INFO] Ground truth label: ki, Predicted label: ki
[INFO] Ground truth label: ha, Predicted label: ha
[INFO] Ground truth label: tsu, Predicted label: tsu
[INFO] Ground truth label: ya, Predicted label: ya
[INFO] Ground truth label: tsu, Predicted label: tsu
[INFO] Ground truth label: na, Predicted label: na
[INFO] Ground truth label: ki, Predicted label: ki
[INFO] Ground truth label: tsu, Predicted label: tsu

Figure 5: Making predictions on handwritten characters using PyTorch and our trained CNN.

As our output demonstrates, we have been able to successfully recognize each of the Hiragana characters using our PyTorch model.

Summary

In this tutorial, you learned how to train your first Convolutional Neural Network (CNN) using the PyTorch deep learning library.

You also learned how to:

1. Save our trained PyTorch model to disk
2. Load it from disk in a separate Python script
3. Use the PyTorch model to make predictions on images

This sequence of saving a model after training, and then loading it and using the model to make predictions, is a process you should become comfortable with — you’ll be doing it often as a PyTorch deep learning practitioner.

Speaking of loading saved PyTorch models from disk, next week you will learn how to use pre-trained PyTorch models to recognize 1,000 image classes that you often encounter in everyday life. These models can save you a bunch of time and hassle: they are highly accurate and don’t require you to manually train them.

[Pytorch] Implementing a sentence classification model with a CNN

Building on the CNN described in the previous post, this time I would like to go all the way from data preprocessing to sentence classification with a CNN!

The data is the well-known Korean dataset "Naver movie reviews".

<Note!!>

The environment and library versions I used are as follows.

OS : Linux Ubuntu 20.04 LTS

python = 3.8.5

pytorch = 1.8.0

torchtext = 0.9.0

#Step 1. Download the data

The 'Naver movie review' data can be downloaded from https://github.com/e9t/nsmc/

Naver movie review data on GitHub

After downloading, you will find 'ratings_train.txt', used to train the model, and 'ratings_test.txt', used to test it.

Now let's preprocess the downloaded files!

#Step 2-1. Data preprocessing

preprocessing.py, shown below, performs some simple preprocessing:

1. Remove duplicate rows
2. Remove NaN rows
3. Keep Hangul only

####################
##preprocessing.py##
####################
import re

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from tqdm import tqdm

# Read the training data and the test data separately.
train_data = pd.read_table("ratings_train.txt")
test_data = pd.read_table("ratings_test.txt")

# Number of rows before preprocessing
before_train = len(train_data)
before_test = len(test_data)


def cleaning(text):
    # Replace every character that is not Hangul with a space.
    text = re.sub("[^가-힣ㄱ-ㅎㅏ-ㅣ]", " ", text)
    # This leaves runs of spaces between words ("가나...다!" -> "가나   다 "),
    # so split and re-join to keep exactly one space between words.
    text = " ".join(text.split())
    return text


def preprocess(df, desc):
    # Remove duplicate texts and rows containing NaN.
    df = df.drop_duplicates(subset=["document"]).dropna(axis=0).copy()
    # Apply cleaning() so that only Hangul remains.
    df["document"] = [cleaning(t) for t in tqdm(df["document"], desc=desc)]
    # Reviews made up only of emoticons or digits are now empty strings.
    # An empty string is not NaN, so mark it as NaN explicitly, then
    # remove the duplicates that cleaning may have created and drop NaN again.
    df["document"] = df["document"].replace("", np.nan)
    df = df.drop_duplicates(subset=["document"]).dropna(axis=0)
    return df


print("[train]preprocessing...")
train_data = preprocess(train_data, "[train]cleaning")
print("[train]done!")

# The test data goes through the same steps as the training data.
test_data = preprocess(test_data, "[test]cleaning")

# Number of rows after preprocessing
after_train = len(train_data)
after_test = len(test_data)

# Show how much usable data remains (before/after comparison).
print("=== Before preprocessing ===")
print("train rows : %d | test rows : %d" % (before_train, before_test))
print("=== After preprocessing ===")
print("train rows : %d | test rows : %d" % (after_train, after_test))

# Save the preprocessed result to files (handy to reload later).
print("Save...")
train, valid = train_test_split(train_data)
train.to_csv("new_ratings_train.txt")      # train data
valid.to_csv("new_ratings_valid.txt")      # validation data
test_data.to_csv("new_ratings_test.txt")   # test data
print("Done!")

Running the code above prints the number of rows before and after preprocessing.
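To see what the Hangul filter does to individual reviews, the cleaning step can be run on a few strings. The sample sentences below are made up for illustration. One thing to keep in mind: an empty string is not NaN to pandas, so reviews that become empty have to be converted or filtered explicitly before dropna can remove them.

```python
import re


def cleaning(text):
    # keep Hangul syllables and jamo only; everything else becomes a space
    text = re.sub("[^가-힣ㄱ-ㅎㅏ-ㅣ]", " ", text)
    # collapse the runs of spaces left behind
    return " ".join(text.split())


# hypothetical sample reviews
samples = ["가나...다!", "Great 영화!! ㅋㅋ", "12345 :)"]
cleaned = [cleaning(s) for s in samples]
print(cleaned)  # ['가나 다', '영화 ㅋㅋ', '']

# empty strings are not NaN, so filter them explicitly
kept = [c for c in cleaned if c != ""]
print(kept)     # ['가나 다', '영화 ㅋㅋ']
```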

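#Step 2-2. Data preprocessing

This step loads the files saved earlier: 'new_ratings_train.txt', 'new_ratings_valid.txt', and 'new_ratings_test.txt'.

We now run a morphological analyzer for further preprocessing and turn the result into data that can be fed to the model.

There are many ways to build data in PyTorch, but in this post we define torchtext Fields.

Note!!!

If the code below raises an error related to data.Field, it is caused by a version difference. As far as I know, the Field API lives under torchtext.legacy in torchtext 0.9 through 0.11 and was removed in 0.12.

In that case, change the import as follows.

(Before)

from torchtext import data

(After)

from torchtext.legacy import data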

################
##load_data.py##
################
import pandas as pd
from konlpy.tag import Mecab
from torchtext import data
import torch
import torchtext

# Build the training and test data with torchtext Fields.
# Before the data is split into batches, the preprocessed text
# goes through morphological analysis.

# Part 1. Tokenize
# Mecab object used as the tokenizer (another analyzer such as Okt also works).
tokenizer = Mecab()

# Stopwords; add more as needed.
stopwords = ['의', '가', '이', '은', '들', '는', '좀', '잘', '걍', '과',
             '도', '를', '으로', '자', '에', '와', '한', '하다']


def preprocess(text):
    # Processing applied after morphological analysis: remove stopwords.
    word = [t for t in text if t not in stopwords]
    return word


# Part 2. Define Field
print("+" * 50)
print("load data...")
print("+" * 50)

# Not used by the model
IDX = data.Field(sequential=False, use_vocab=False)
ID = data.Field(sequential=False, use_vocab=False)

# Used by the model: morphological analysis plus the extra processing above
TEXT = data.Field(fix_length=20, sequential=True, batch_first=True,
                  is_target=False, use_vocab=True,
                  tokenize=tokenizer.morphs, preprocessing=preprocess)
LABEL = data.Field(sequential=False, batch_first=True, is_target=True,
                   use_vocab=False, dtype=torch.float32)

# Field definition (column order of the saved CSV files)
field = [("idx", IDX), ('id', ID), ('document', TEXT), ('label', LABEL)]

# Load the files saved earlier and turn them into datasets.
train_data, valid_data, test_data = data.TabularDataset.splits(
    path='.',  # required
    train='new_ratings_train.txt',
    validation="new_ratings_valid.txt",
    test='new_ratings_test.txt',
    format='csv',
    fields=field,
    skip_header=True
)
print("Done!")

print("+" * 50)
print("Samples...")
print("+" * 50)
for i in range(5):
    print(vars(train_data[i]))
print("+" * 50)

# Part 3. Make data
# Split the data into batches and attach embeddings so it can be fed
# to the model. The embeddings are pretrained fastText vectors, which
# can be downloaded from https://fasttext.cc/docs/en/crawl-vectors.html
#   - bin file  : the fastText model together with the embeddings
#   - text file : the embeddings only
# Other models work too, and trying TF-IDF is also worthwhile.
vector = torchtext.vocab.Vectors(name='cc.ko.300.vec')
TEXT.build_vocab(train_data, vectors=vector)

device = 'cuda' if torch.cuda.is_available() else 'cpu'

# Build the iterators with a batch size of 64.
train_batch = data.BucketIterator(
    dataset=train_data, sort=False, batch_size=64,
    repeat=False, device=device)
valid_batch = data.BucketIterator(
    dataset=valid_data, sort=False, batch_size=64,
    train=False, device=device)
test_batch = data.BucketIterator(
    dataset=test_data, sort=False, batch_size=64,
    train=False, device=device)
print("load data... Done!!")
print("+" * 50)

Running the code above prints samples of the preprocessed data.

#Step 3. Implement the model

Now let's define the model class and get ready to train it.

################
####model.py####
################
from load_data import *
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch import optim
from tqdm import tqdm
from sklearn.metrics import accuracy_score


class CNN_network(nn.Module):
    def __init__(self, embedding_size, seq_length):
        super(CNN_network, self).__init__()
        # sequence length (the fix_length value of the TEXT field)
        self.seq_length = seq_length
        self.embedding_size = embedding_size
        self.kernel = [2, 3, 4]
        self.output_size = 128

        # Embedding layer built from the fastText vectors attached to
        # the TEXT field defined in load_data.py
        self.embedding = nn.Embedding.from_pretrained(TEXT.vocab.vectors)

        # Convolution layers, one per kernel size
        self.conv1 = nn.Conv1d(in_channels=self.embedding_size, out_channels=self.output_size,
                               kernel_size=self.kernel[0], stride=1)
        self.conv2 = nn.Conv1d(in_channels=self.embedding_size, out_channels=self.output_size,
                               kernel_size=self.kernel[1], stride=1)
        self.conv3 = nn.Conv1d(in_channels=self.embedding_size, out_channels=self.output_size,
                               kernel_size=self.kernel[2], stride=1)

        # Pooling layers
        self.pool1 = nn.MaxPool1d(self.kernel[0], stride=1)
        self.pool2 = nn.MaxPool1d(self.kernel[1], stride=1)
        self.pool3 = nn.MaxPool1d(self.kernel[2], stride=1)

        # Dropout & FC layers
        self.dropout = nn.Dropout(0.25)
        self.linear1 = nn.Linear(self._calculate_features(), 1024)
        self.linear2 = nn.Linear(1024, 128)
        self.linear3 = nn.Linear(128, 1)

    def _calculate_features(self):
        # Input size of the first FC layer.
        # Conv1d and MaxPool1d both produce an output length of
        #   ((L_in + 2 * padding - dilation * (kernel - 1) - 1) / stride) + 1
        # where L_in is the sequence length. With padding 0, dilation 1
        # and stride 1 this reduces to L_in - (kernel - 1).
        # Source : https://pytorch.org/docs/stable/generated/torch.nn.Conv1d.html
        total = 0
        for k in self.kernel:
            out_conv = self.seq_length - (k - 1)
            out_pool = out_conv - (k - 1)
            total += out_pool
        # final size after torch.cat and flatten
        return total * self.output_size

    def forward(self, input_sentence, size):
        # size is accepted because the callers pass it, but it is not used
        x = self.embedding(input_sentence)  # (batch, seq_length, embedding_size)
        x = x.transpose(1, 2)               # (batch, embedding_size, seq_length)

        x1 = self.pool1(torch.sigmoid(self.conv1(x)))
        x2 = self.pool2(torch.sigmoid(self.conv2(x)))
        x3 = self.pool3(torch.sigmoid(self.conv3(x)))

        # Concatenate along dimension 2 (the length). With seq_length = 20:
        # (batch,128,18) + (batch,128,16) + (batch,128,14) -> (batch,128,48)
        x_concat = torch.cat((x1, x2, x3), 2)
        # Flatten everything except the batch dimension for the FC layers:
        # (batch,128,48) -> (batch,6144)
        x_concat = torch.flatten(x_concat, 1)

        out = self.linear1(x_concat)
        out = self.dropout(out)
        out = self.linear2(out)
        out = self.dropout(out)
        out = self.linear3(out)
        out = torch.sigmoid(out)
        return out.squeeze()

The code above implements a model that runs convolutions with three filters of different sizes, concatenates the results, and passes them through fully connected layers.
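The input size of the first fully connected layer follows from the output-length formula in the torch.nn.Conv1d documentation, which nn.MaxPool1d shares. The sketch below redoes that arithmetic outside the model for the settings used in this post (sequence length 20, kernel sizes 2, 3 and 4, 128 output channels); the helper name conv1d_out_len is my own.

```python
import math


def conv1d_out_len(length, kernel, stride=1, padding=0, dilation=1):
    # output-length formula from the torch.nn.Conv1d documentation;
    # nn.MaxPool1d uses the same formula
    return math.floor((length + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1)


seq_length = 20      # fix_length of the TEXT field
out_channels = 128   # output_size in CNN_network
kernels = [2, 3, 4]

pooled_lengths = []
for k in kernels:
    conv_len = conv1d_out_len(seq_length, k)   # Conv1d, stride 1
    pool_len = conv1d_out_len(conv_len, k)     # MaxPool1d(k, stride=1)
    pooled_lengths.append(pool_len)

features = sum(pooled_lengths) * out_channels
print(pooled_lengths)  # [18, 16, 14]
print(features)        # 6144
```

The formula is applied to the sequence length, not the embedding size: the embedding dimension becomes the channel axis after the transpose in forward.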

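Text classification with a 1D-CNN generally proceeds as follows.

Text Classification Using 1D-CNN

The figure above uses three filter sizes with two filters of each size, whereas my code simply defines one convolution layer for each of the three sizes. To match the figure, try defining additional filters of the same size yourself!

#Step 4. Train the model

We now use the model implemented above for training and testing.

First, the train function.

The train function takes the model, train_data, and valid_data as input.

train_data is used for the actual training of the model, and valid_data is used to print the accuracy after each epoch.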

def train(model, train_data, valid_data):
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    # define the optimizer
    optimizer = optim.AdamW(model.parameters(), lr=0.0001)
    # loss
    loss_fn = torch.nn.BCELoss()
    epochs = 10
    for epoch in range(epochs):
        # Switch to training mode at the start of every epoch,
        # because test() below leaves the model in evaluation mode.
        model.train()
        t_loss = 0
        for batch_idx, batch in tqdm(enumerate(train_data)):
            text = batch.document
            label = batch.label.to(device=device, dtype=torch.float)
            size = len(text)
            out = model(text, size)
            loss = loss_fn(out, label)
            # update parameters
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            t_loss += loss.detach().item()
        print(f"Epoch : {epoch + 1} / {epochs} \t Train Loss : {t_loss/len(train_data) : .3f}")
        test(model, valid_data)
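One detail worth checking in a loop like this: test() calls model.eval(), and that mode stays in effect after test() returns. Unless model.train() is called again at the start of each epoch, every epoch after the first would run with dropout disabled. The sketch below shows the mode switch on a hypothetical two-layer model, not the CNN_network above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# hypothetical tiny model with a dropout layer
model = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.25))
x = torch.ones(4, 8)

model.eval()                      # what test() does
with torch.no_grad():
    a = model(x)
    b = model(x)
# in evaluation mode dropout is a no-op, so repeated calls agree
eval_is_deterministic = torch.equal(a, b)
still_eval = not model.training   # the mode persists after evaluation

model.train()                     # must be called again before training resumes
print(eval_is_deterministic, still_eval, model.training)
```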

#Step 5. Test the model

Next is the test function.

The test function has the model predict on the given data and reports the result as accuracy.

I kept the function itself simple.

def test(model, data):
    # switch to evaluation mode
    model.eval()
    # prediction
    predictions = []
    labels = []
    with torch.no_grad():
        for batch in data:
            text = batch.document
            label = batch.label
            size = len(text)
            y_pred = model(text, size)
            for i in y_pred:
                if i >= 0.5:
                    predictions.append(1)
                else:
                    predictions.append(0)
            for j in label:
                labels.append(j.cpu())
    print(f"Accuracy : {accuracy_score(labels,predictions) : .3f}")
    print("sample pred : ", predictions[:10])
    print('sample labels : ', labels[:10])
    print("=" * 100)

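#Step 6. Classify a sentence typed by the user

Finally, I wrote a predict function: the user types a sentence and the model predicts whether it is positive or negative.

The model already judges the reviews in the test data, but aren't you curious how it judges a review you wrote yourself?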

def predict(model, sentence):
    model.eval()
    with torch.no_grad():
        sent = tokenizer.morphs(sentence)
        sent = torch.tensor([TEXT.vocab.stoi[i] for i in sent])
        # pad to the fixed length of 20 with index 1
        # (the <pad> index in the default torchtext vocab)
        sent = F.pad(sent, pad=(1, 20 - len(sent) - 1), value=1)
        sent = sent.unsqueeze(dim=0)  # add the batch dimension
        output = model(sent, len(sent))
    return output.item()

#Step 7. Run it and check the results

All the functions and classes we need are now implemented.

Let's run everything and check!

if __name__ == "__main__":
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    model = CNN_network(embedding_size=300, seq_length=20).to(device)

    # train
    train(model, train_batch, valid_batch)

    # test
    print("===TEST===")
    test(model, test_batch)

    # classify sentences typed by the user (enter 0 to quit)
    while True:
        user = input("Enter a review to test : ")
        if user == '0':
            break
        model = model.to('cpu')
        pred = predict(model, user)
        if pred >= 0.5:
            print(f">>>Positive review. ({pred : .2f})")
        else:
            print(f">>>Negative review. ({pred : .2f})")

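Running the code above shows that training proceeds well.

In my runs, the best accuracy on the test data was 0.82 and the worst was 0.78.

I hope those who use my code get a higher accuracy than I did, for example by adding layers or filters.

Lastly, the CNN model also worked well when judging sentences I typed in myself.

Positive/negative judgment of a user-entered sentence 1; positive/negative judgment of a user-entered sentence 2

CNNs are generally used to classify image data such as MNIST, but this confirms that they also perform reasonably well on sentence classification.

Thank you for reading this long post.

Please leave errors and questions in the comments and I will reply!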

[Pytorch Basics Lecture] 4. CNNs, which excel at image processing (Simple CNN, Deep CNN, ResNet, VGG, Batch Normalization)

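※ Most of the material in this post comes from the book <펭귄브로의 3분 딥러닝-파이토치맛>, with some of my own opinions and interpretations added. Some figures are redrawn from the book or modified as I saw fit. I am posting this to study as well, so please let me know in the comments if anything is wrong.

– Simple CNN code review

– Deep CNN

– ResNet

– Batch Normalization

– VGG

Simple CNN code review

We analyze a simple piece of code that implements the basic CNN structure mentioned earlier.

– Code used

In this example, we use a CNN to raise classification performance on the fashion items used earlier. (Note that the data-loading code shown below calls datasets.MNIST.)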

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torchvision import transforms, datasets
# library added to view the model structure at a glance;
# torchsummary lists the number of parameters and the layer structure
from torchsummary import summary

As before, the setup code imports nearly the same essential libraries. In addition, torchsummary is imported to get a quick look at the number of parameters and the structure of the network.

# Check whether a GPU is available
# DEVICE should end up as cuda (gpu)
USE_CUDA = torch.cuda.is_available()
DEVICE = torch.device("cuda" if USE_CUDA else "cpu")

EPOCHS = 40
BATCH_SIZE = 64

As before, we assign DEVICE so the model can run on the GPU and define essential parameters such as the number of epochs and the batch size.

Loading the data is the same as before, so we move on.

# DataLoader : a class that splits the dataset into batches and feeds them to the model during training.
train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./.data',
                   train=True,     # training split
                   download=True,  # download if missing
                   transform=transforms.Compose([  # torchvision.transforms = input transformation library
                       transforms.ToTensor(),  # image to Tensor
                       transforms.Normalize((0.1307,), (0.3081,))  # normalize the image
                   ])),
    batch_size=BATCH_SIZE, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('./.data',
                   train=False,    # test split
                   download=True,  # download if missing
                   transform=transforms.Compose([
                       transforms.ToTensor(),
                       transforms.Normalize((0.1307,), (0.3081,))
                   ])),
    batch_size=BATCH_SIZE, shuffle=True)

Now let's implement the CNN model itself.

It is broadly split into the __init__ function, which sets up the CNN class, and the forward function, the path the data actually travels.

As mentioned earlier, the CNN model follows the pattern Conv - ReLU (activation function) - Dropout (to prevent overfitting).

class Net(nn.Module):
    # CNN network class
    # kernel size 5x5, 2 convolutional layers
    def __init__(self):
        super(Net, self).__init__()
        # input channels : 1 (grayscale image) / output volume size : 10
        # (number of filters == 10 feature maps) / kernel_size (filter size) = 5x5
        # A single kernel size N is treated as NxN; pass (N, M) for NxM.
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        # input channels : 10 (output of conv1) / output volume size : 20
        # (20 feature maps) / kernel_size (filter size) = 5x5
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        # apply dropout
        self.conv2_drop = nn.Dropout2d()
        # The fully connected layers shrink the previous output size of 320
        # to 50 and then 10. The middle value 50 is chosen arbitrarily;
        # 10 is the number of classes to predict (number of Fashion MNIST classes).
        self.fc1 = nn.Linear(320, 50)
        self.fc2 = nn.Linear(50, 10)

    # forward function of the network: think of it as paving the road
    # the data will travel along
    def forward(self, x):
        # each layer treats conv - pooling - relu as one bundle
        x = F.relu(F.max_pool2d(self.conv1(x), 2))  # max pooling with a (2x2) kernel
        x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
        # flatten the feature maps so they can enter the FC layer
        x = x.view(-1, 320)
        # ReLU activation function + dropout
        x = F.relu(self.fc1(x))
        x = F.dropout(x, training=self.training)
        # produce the output for the 10 classes
        x = self.fc2(x)
        return x

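The network consists of two CNN layers (conv1, conv2) and two FC layers (fc1, fc2), with activation functions and (max) pooling in between. In effect, the two CNN layers extract features from the image, and fc1 and fc2 carry out the actual classification. As in the earlier example, fc2 is the layer that emits the final prediction, so its output size equals the number of classes in the data.

Each line is commented, so you can simply follow along. The number 320 in the middle is the number of features the CNN part outputs. It can be computed by hand with a formula, but a slightly smarter way is to use the pdb library and print the shape partway through. I actually use pdb more than anything else when implementing DNNs. I covered pdb in an earlier post; use it as shown below, and see the next post for the detailed pdb commands.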

import pdb; pdb.set_trace()
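As an alternative to stopping in the debugger, the value 320 can also be confirmed by pushing a dummy tensor through the two convolution and pooling stages. This is a minimal sketch that rebuilds only those two layers with the same settings as Net; the dummy input is an assumption standing in for one 28×28 grayscale image.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(10, 20, kernel_size=5)

x = torch.zeros(1, 1, 28, 28)          # one dummy 28x28 grayscale image
with torch.no_grad():
    x = F.max_pool2d(conv1(x), 2)      # 28 -> 24 (conv) -> 12 (pool)
    print(x.shape)                     # torch.Size([1, 10, 12, 12])
    x = F.max_pool2d(conv2(x), 2)      # 12 -> 8 (conv) -> 4 (pool)
    print(x.shape)                     # torch.Size([1, 20, 4, 4])

n_features = x.flatten(1).shape[1]
print(n_features)                      # 20 * 4 * 4 = 320
```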

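Once the model is designed, actually running the code is the same as before, so it is not repeated here.

model = Net().to(DEVICE)
summary(model, (1, 28, 28))  # inserted to inspect the model structure
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
print(model)

Printing the output of the summary function shows the number of parameters and the layers.

Finally, training and testing show far higher performance than the earlier fully connected network.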

Deep CNN

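So far we worked with datasets of simple, small grayscale images, so a relatively simple model was enough. That only holds for simple data, however. As large-scale benchmark datasets, that is, more complex data, have grown, deeply stacked models (deep neural networks) have a clear advantage in performance.

+) ImageNet : ImageNet is a large-scale dataset of more than 10 million images; the ILSVRC classification subset discussed here has 1,000 classes. See the ImageNet website for details on the dataset.

The figure below shows how deep neural networks developed over several years.

The left image shows network size and performance (error) in ILSVRC (ImageNet Large Scale Visual Recognition Challenge), the international competition based on the ImageNet dataset. Up to 2013, 8-layer networks brought the error down to about 11 percent; in 2014 networks reached 22 layers, and in 2015 ResNet appeared, growing to 152 layers and cutting the error sharply.

However, recklessly adding layers never guarantees better performance. The right-hand figure shows that a 56-layer DNN has a higher error rate than a 20-layer DNN. The ResNet paper calls this the degradation problem; it is commonly explained in terms of vanishing/exploding gradients that keep training from proceeding properly, and it shows that a special technique is needed to stack many layers and still get good performance.

Now let's take a brief look at VGG and ResNet, two representative deep CNNs among the models above.

VGG Net was developed at the University of Oxford in 2014 and was the runner-up in the ILSVRC mentioned above (the winner was GoogLeNet). Models with 16 and 19 layers were introduced, and it is no exaggeration to say that CNNs became much deeper from 2014, when VGG Net appeared.

+) VGG is still widely used and is an important topic, but the book (3분 딥러닝 파이토치맛) does not mention it at all, so I only summarize it briefly here.

The core of VGG is an investigation of how much network depth affects performance. For that, the network itself looks quite simple, as even a read of the abstract shows.

The key point is that the kernel filters of the CNN are set to (3 x 3), which let the depth grow to 16 to 19 layers. Because the filter size is kept as small as possible while the image is reduced gradually, the network could be extended to 19 layers.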

The figure below shows the actual VGG 16 architecture.

– Input : 224 x 224 x 3 RGB image (for the ImageNet dataset)

– 13 convolution layers + 3 FC layers

: a VGG made of 16 layers (pooling and dropout layers are excluded because they have no trainable weights)

– Conv layer = (3 x 3 filter, stride = 1, padding = 1)

: padding is applied to keep the image size, and the experiments varied the number of conv layers

– Pooling layer (2 x 2 filter, stride = 2)

: max pooling, which halves the width and height of the feature map (1/4 of the area)

– FC layer (4096 > 4096 > 1000) : the final 1000 is the number of classes

– Padding : "same" padding is used so that the image size does not shrink during the convolutions; the image size is halved only in the max pooling steps.

+) VGG 19 is VGG 16 with three more convolution layers added

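The most important point is the (3×3) filter mentioned above. Other models (GoogLeNet, AlexNet) reduced the feature map size with (5×5), (7×7), and (11×11) filters, whereas VGG applies small filters several times to get the same effect as one large filter while actually reducing the number of parameters to learn.

The figure below makes this easier to understand: two convolutions with a 3 x 3 filter produce a feature map of the same size as one convolution with a 5 x 5 filter. Likewise, three 3 x 3 filters cover the same region as one 7 x 7 filter, and comparing the weights per input/output channel pair gives 1) three 3 x 3 filters = 3 x (3 x 3) = 27 vs 2) one 7×7 filter = 7 x 7 = 49, so the number of parameters to learn goes down even though the number of filters goes up.

This brings faster training and easier optimization, and because stacking small layers passes the data through nonlinear functions more often, the ability to learn complex features also increases.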
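The parameter comparison can be written out with channels included. The sketch below counts convolution weights (biases ignored) for stacks that keep the same number of input and output channels, and checks that the stacked 3×3 layers cover the same receptive field as the single larger filter. The channel count C = 64 and both helper names are assumptions for illustration.

```python
def conv_weights(kernel, channels, layers=1):
    # weights of `layers` stacked kernel x kernel convolutions that keep
    # `channels` input and output channels (biases ignored)
    return layers * kernel * kernel * channels * channels


def receptive_field(kernel, layers):
    # receptive field of `layers` stacked stride-1 convolutions
    return layers * (kernel - 1) + 1


C = 64  # hypothetical channel count
print(receptive_field(3, 2), receptive_field(5, 1))  # 5 5
print(receptive_field(3, 3), receptive_field(7, 1))  # 7 7
print(conv_weights(3, C, layers=3))  # 110592 = 27 * C^2
print(conv_weights(7, C, layers=1))  # 200704 = 49 * C^2
```

The 27 to 49 ratio is independent of C, which is why the per-filter count in the text is enough to make the point.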

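From this alone VGG looks very simple and effective, but attention shifted to the residual block when ResNet appeared in 2015.

ResNet came out in 2015, took first place in ILSVRC, and drew attention by proposing a model of enormous size, 152 layers. It also offered a solution to the problem that simply stacking a neural network deeper can make performance worse, and it influenced the models that followed.

The core of ResNet is the residual block, and the figure below shows its key idea.

If the left side is a conventional network, the key to the residual block on the right is the new line added around the layers that compute F(x). Put simply, the difference is that the earlier input is added to the output and passed on (a skip connection).

The ResNet paper emphasizes residual learning and identity mapping (the identity function) here. These concepts can be confusing, so they are covered in more detail below.

A conventional CNN, as on the left, focuses on obtaining H(x) so that the input x maps to the target value y. That is, H(x) = y should hold, so H(x) - y = 0, and the CNN learns how to extract features in the direction that minimizes H(x) - y.

ResNet tells a slightly different story: rather than searching for the optimal function H(x) that predicts y from the input x, wouldn't it be easier to carry over the information learned so far (x) and learn only the additional part F(x)? In other words, set F(x) + x = H(x) and let F(x) become smaller and smaller so that eventually H(x) = x, that is, H(x) acts as an identity function of its input. Then F(x) = H(x) - x, so training learns the difference between H(x) and x (the residual). Because the derivative of F(x) + x with respect to x is F'(x) + 1, the shortcut always contributes a gradient term of 1 no matter how many times differentiation is applied during training, which eases the gradient vanishing problem.

This may look a bit complicated so far. Put really simply, the idea is just to add the earlier value back in while learning. Doing so addresses gradient vanishing, a weakness of deep networks.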
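The gradient argument can be written out for a single residual block. This is a sketch under the notation used above, with H(x) = F(x) + x; the loss symbol 𝓛 and the identity matrix I are my additions.

```latex
H(x) = F(x) + x
\quad\Longrightarrow\quad
\frac{\partial \mathcal{L}}{\partial x}
  = \frac{\partial \mathcal{L}}{\partial H}
    \left( \frac{\partial F(x)}{\partial x} + I \right)
  = \frac{\partial \mathcal{L}}{\partial H}\,\frac{\partial F(x)}{\partial x}
    + \frac{\partial \mathcal{L}}{\partial H}
```

The second term passes the upstream gradient through unchanged, so even when the gradient through F(x) becomes very small, the block still hands a signal back to earlier layers.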

이로써, ResNet 연구팀은 18, 34, 50, 101, 152개의 레이어를 쌓아가면서 성능 개선을 이룰 수 있었고, 본 교재 (3분 딥러닝 파이토치맛)에서 Deep CNN으로 이 ResNet을 예제로 들었으므로, 아래 코드를 분석해보면서 Deep CNN의 성능을 확인해보자.

ResNet 코드 리뷰

– 사용 코드

앞선 코드들과 마찬가지로 하이퍼파라미터, 주요 라이브러리 임포트 과정은 동일하다.

import torch import torch.nn as nn import torch.optim as optim import torch.nn.functional as F from torchvision import transforms, datasets, models # GPU 사용여부 확인 코드 # DEVICE 안에 cuda(gpu)가 할당되어야 함 USE_CUDA = torch.cuda.is_available() DEVICE = torch.device(“cuda” if USE_CUDA else “cpu”) EPOCHS = 300 BATCH_SIZE = 128

여기에 CIFAR data loader를 불러오도록 한다.

# Dataloader : 데이터셋을 batch 단위로 쪼개서 학습할 때 모델의 입력으로 주는 클래스. #CIFAR10 : 10개의 클래스, (32 * 32 *3) RGB 이미지 # Train data loader train_loader = torch.utils.data.DataLoader( datasets.CIFAR10(‘./.data’, train=True, download=True, transform=transforms.Compose([ # 과적합 방지를 위해 노이즈 추가(RandomCrop, RandomHorizontalFlip) transforms.RandomCrop(32, padding=4), transforms.RandomHorizontalFlip(), transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])), batch_size=BATCH_SIZE, shuffle=True) # Test data loader test_loader = torch.utils.data.DataLoader( datasets.CIFAR10(‘./.data’, train=False, transform=transforms.Compose([ transforms.ToTensor(), transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])), batch_size=BATCH_SIZE, shuffle=True)

바로 이어서 모델에 대한 본격적인 설계에 들어가도록 하자. 우선, 코드에 들어가기 전에 본 논문에서 구성한 모델 사진은 아래와 같다. 보이는 것처럼 크게 3개의 layer로 구성되어 있으며, layer 1 -> layer 2 , layer 2-> layer 3으로 넘어갈때마다 shortcut이 생성되는 것을 알 수 있다. 이 과정에 앞에서 말한 residual 과정이라 보면 된다.

Each layer is built from BasicBlock modules, and the code below defines the BasicBlock class.

class BasicBlock(nn.Module):
    # Basic building block of ResNet
    # Conv + Batch Normalization + ReLU
    def __init__(self, in_planes, planes, stride=1):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_planes, planes, kernel_size=3,
                               stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(planes)  # Batch normalization (stabilizes training)
        self.conv2 = nn.Conv2d(planes, planes, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(planes)

        # nn.Sequential bundles several modules into one.
        # An empty nn.Sequential returns its input unchanged.
        self.shortcut = nn.Sequential()
        # shortcut = Conv + BN when the output shape differs from the input shape
        if stride != 1 or in_planes != planes:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, planes, kernel_size=1,
                          stride=stride, bias=False),
                nn.BatchNorm2d(planes)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out += self.shortcut(x)  # residual connection
        out = F.relu(out)
        return out

As shown, the block defines two convolution layers, each followed by batch normalization to stabilize training. The shortcut at the end is given its own convolution only when the stride is not 1 or when the number of output channels differs from the number of input channels, a case that comes up later. The forward function chains the conv and batch norm layers defined above and adds the shortcut output to the result. When neither condition holds, the shortcut is an empty nn.Sequential module that simply passes its input through unchanged.
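
The reason the shortcut needs its own convolution when the stride is not 1 can be checked with the standard output-size formula for a convolution. The sketch below assumes a dilation of 1 and a 32 x 32 input; `conv_out_size` is a hypothetical helper, not a PyTorch function.

```python
def conv_out_size(size, kernel, stride, padding):
    """Hypothetical helper: spatial output size of a convolution with dilation 1."""
    return (size + 2 * padding - kernel) // stride + 1


# Main path of a stride-2 BasicBlock on a 32x32 feature map.
main_path = conv_out_size(32, kernel=3, stride=2, padding=1)         # conv1
main_path = conv_out_size(main_path, kernel=3, stride=1, padding=1)  # conv2

# Shortcut path: 1x1 convolution with the same stride and no padding.
shortcut = conv_out_size(32, kernel=1, stride=2, padding=0)

print(main_path, shortcut)  # both paths must match for `out += self.shortcut(x)`
```

An identity shortcut would leave the input at 32 x 32 while the main path has shrunk, so the in-place addition in forward would fail on mismatched shapes. The strided 1x1 convolution brings the shortcut to the same size and channel count.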

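+) Batch normalization (a YouTube lecture on it is recommended) is a technique that stabilizes training by preventing gradients from vanishing or exploding when the learning rate is set too high. It normalizes the input to each layer using the mean and variance computed during training, which makes learning more efficient. Because the layer performs its own normalization, it also has a regularizing effect similar to dropout.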
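
As a rough illustration of the normalization described above, the following plain-Python sketch normalizes one batch of scalar activations. It assumes training-mode behavior (batch statistics with biased variance) and the eps of 1e-5 seen in the printed model; `batch_norm`, `gamma`, and `beta` are illustrative names, with gamma and beta standing in for the learnable scale and shift.

```python
import math


def batch_norm(values, gamma=1.0, beta=0.0, eps=1e-5):
    """Hypothetical helper: normalize a batch to zero mean and unit variance."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
    return [gamma * (v - mean) / math.sqrt(var + eps) + beta for v in values]


activations = [2.0, 4.0, 6.0, 8.0]
normalized = batch_norm(activations)
print(normalized)
```

nn.BatchNorm2d applies the same idea per channel, computing the statistics over the batch and spatial dimensions.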

Now let's look at the actual ResNet class.

class ResNet(nn.Module):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()
        self.in_planes = 16

        self.conv1 = nn.Conv2d(3, 16, kernel_size=3,
                               stride=1, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(16)  # batch normalization
        self.layer1 = self._make_layer(16, 2, stride=1)
        self.layer2 = self._make_layer(32, 2, stride=2)
        self.layer3 = self._make_layer(64, 2, stride=2)
        # layer1: two BasicBlocks, each taking 16 channels and producing 16 channels
        # layer2: one BasicBlock taking 16 channels and producing 32 channels,
        #         then one BasicBlock from 32 channels to 32 channels
        # layer3: one BasicBlock taking 32 channels and producing 64 channels,
        #         then one BasicBlock from 64 channels to 64 channels
        # The blocks that expand 16->32 and 32->64 carry a convolutional shortcut.
        self.linear = nn.Linear(64, num_classes)

    def _make_layer(self, planes, num_blocks, stride):
        # Bundle several BasicBlocks into a single module with nn.Sequential
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(BasicBlock(self.in_planes, planes, stride))
            self.in_planes = planes
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 8)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out

In brief, each of the three layers is created by the _make_layer function, which bundles several of the BasicBlock modules defined earlier into a single module.

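Within __init__, layer 1 keeps 16 channels throughout, while layers 2 and 3 each consist of one block that expands the channels (16->32 and 32->64) followed by one block that keeps them (32->32 and 64->64). The condition mentioned earlier (the stride is not 1, or the channel count differs from the input) applies to exactly two blocks: the one expanding 16->32 and the one expanding 32->64. In those two blocks the shortcut contains a Conv2d and a BatchNorm2d, and this happens each time the network moves to a new layer.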
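
The block layout described above can be reproduced without PyTorch by replaying the stride list that _make_layer builds. This is a sketch only: `plan_blocks` is a hypothetical helper, and the stage configuration is copied from the three _make_layer calls in __init__.

```python
def plan_blocks(stage_config, in_planes=16, num_blocks=2):
    """Hypothetical helper: list (in_planes, planes, stride, needs_conv_shortcut) per block."""
    plan = []
    for planes, stride in stage_config:
        # Same rule as _make_layer: only the first block of a stage uses the stage stride.
        strides = [stride] + [1] * (num_blocks - 1)
        for s in strides:
            needs_conv_shortcut = s != 1 or in_planes != planes
            plan.append((in_planes, planes, s, needs_conv_shortcut))
            in_planes = planes
    return plan


plan = plan_blocks([(16, 1), (32, 2), (64, 2)])
for row in plan:
    print(row)
```

Only the first block of layer 2 and the first block of layer 3 come out with the shortcut flag set, which matches the two cases named in the text.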

Finally, the forward function takes the output of the three layers, flattens the feature map for the final output with out = out.view(out.size(0), -1), and passes it through the linear layer.
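
To see why the flattened tensor has exactly the 64 features that nn.Linear(64, num_classes) expects, the spatial sizes can be traced by hand. The sketch below assumes an even input size so that each stride-2 stage halves the feature map, and a pooling window of 8 as in forward; `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(size=32):
    """Hypothetical helper: (name, channels, spatial size) after each stage, plus flat length."""
    shapes = [("conv1", 16, size)]
    for name, channels, stride in [("layer1", 16, 1), ("layer2", 32, 2), ("layer3", 64, 2)]:
        size = size // stride  # a stride-2 stage halves an even-sized feature map
        shapes.append((name, channels, size))
    pooled = size // 8         # F.avg_pool2d(out, 8) with its default stride of 8
    shapes.append(("avg_pool2d", 64, pooled))
    flat = 64 * pooled * pooled
    return shapes, flat


shapes, flat = trace_shapes(32)
for row in shapes:
    print(row)
print("flattened features:", flat)
```

Under the same assumptions a 64 x 64 input would flatten to 256 features, so the linear layer as written fits 32 x 32 images such as CIFAR-10 only.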

Then we run the code. Unlike before, we apply learning rate decay to the optimizer so that optimization becomes more fine-grained as training proceeds. This is done with the optim.lr_scheduler.StepLR class.

model = ResNet().to(DEVICE)
optimizer = optim.SGD(model.parameters(), lr=0.1,
                      momentum=0.9, weight_decay=0.0005)
# optim.lr_scheduler.StepLR: the scheduler is stepped once per epoch.
# With step_size=50, the learning rate is multiplied by gamma=0.1 every 50 epochs.
# The learning rate starts at 0.1 and decays to 0.01, 0.001, ...
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)
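
The decay schedule configured above follows a simple closed form, sketched here in plain Python. `step_lr` is a hypothetical helper, and `epoch` is assumed to count how many times the scheduler has been stepped so far.

```python
def step_lr(epoch, base_lr=0.1, step_size=50, gamma=0.1):
    """Hypothetical helper: learning rate after `epoch` scheduler steps."""
    return base_lr * gamma ** (epoch // step_size)


# Print the learning rate at the start of each 50-epoch interval.
for epoch in (0, 50, 100, 150, 200, 250):
    print(epoch, step_lr(epoch))
```

Under these settings the rate falls to about 1e-6 for the last 50 of the 300 epochs, so most of the learning is likely to happen in the earlier intervals.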

Printing the model with print(model) gives the following.

ResNet(
  (conv1): Conv2d(3, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
  (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (layer1): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
    (1): BasicBlock(
      (conv1): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(16, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(16, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer2): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(16, 32, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(16, 32, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (layer3): Sequential(
    (0): BasicBlock(
      (conv1): Conv2d(32, 64, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential(
        (0): Conv2d(32, 64, kernel_size=(1, 1), stride=(2, 2), bias=False)
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      )
    )
    (1): BasicBlock(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (shortcut): Sequential()
    )
  )
  (linear): Linear(in_features=64, out_features=10, bias=True)
)
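
As a sanity check on the printed structure, the number of learnable parameters can be tallied by hand. The sketch below assumes bias-free convolutions and affine batch norm (one weight and one bias per channel), as shown in the printout; all function names are hypothetical, and the running statistics of batch norm are not counted because they are buffers rather than parameters.

```python
def conv_params(c_in, c_out, k):
    """Weights of a k x k convolution with bias=False."""
    return c_in * c_out * k * k


def bn_params(channels):
    """Learnable scale and shift of an affine BatchNorm2d."""
    return 2 * channels


def block_params(c_in, c_out, stride):
    """Parameters of one BasicBlock, including a conv shortcut when it exists."""
    total = conv_params(c_in, c_out, 3) + bn_params(c_out)
    total += conv_params(c_out, c_out, 3) + bn_params(c_out)
    if stride != 1 or c_in != c_out:
        total += conv_params(c_in, c_out, 1) + bn_params(c_out)
    return total


def resnet_params(num_classes=10):
    """Total learnable parameters of the small ResNet printed above."""
    total = conv_params(3, 16, 3) + bn_params(16)
    c_in = 16
    for c_out, stride in [(16, 1), (32, 2), (64, 2)]:
        for s in (stride, 1):  # two blocks per layer
            total += block_params(c_in, c_out, s)
            c_in = c_out
    total += 64 * num_classes + num_classes  # final linear layer with bias
    return total


print(resnet_params())
```

In PyTorch the same figure can be cross-checked with sum(p.numel() for p in model.parameters()).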

The training and evaluation functions are almost the same as before.

def train(model, train_loader, optimizer, epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(DEVICE), target.to(DEVICE)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()


def evaluate(model, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(DEVICE), target.to(DEVICE)
            output = model(data)

            # Sum the loss over the batch
            test_loss += F.cross_entropy(output, target,
                                         reduction='sum').item()

            # The index with the highest score is the predicted class
            pred = output.max(1, keepdim=True)[1]
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)
    test_accuracy = 100. * correct / len(test_loader.dataset)
    return test_loss, test_accuracy


for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer, epoch)
    # Learning rate decay step mentioned above; since PyTorch 1.1 it should be
    # called after the optimizer updates of the epoch, not before.
    scheduler.step()
    test_loss, test_accuracy = evaluate(model, test_loader)

    print('[{}] Test Loss: {:.4f}, Accuracy: {:.2f}%'.format(
        epoch, test_loss, test_accuracy))
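
The accuracy bookkeeping inside evaluate amounts to an argmax over the class scores followed by a comparison with the labels. A minimal plain-Python sketch of that logic, using made-up scores and hypothetical helper names, looks like this:

```python
def argmax(row):
    """Index of the largest score in one row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy(logits, targets):
    """Percentage of rows whose highest-scoring class equals the target."""
    correct = sum(1 for row, t in zip(logits, targets) if argmax(row) == t)
    return 100.0 * correct / len(targets)


# Illustrative scores for 4 samples and 3 classes (not real model output).
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
targets = [1, 0, 2, 1]
print(accuracy(logits, targets))
```

The last sample is predicted as class 0 while its label is 1, so three of the four predictions count as correct.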

At epoch 300 the model reaches roughly 91 percent accuracy.

This example built the residual block by hand to show the idea behind ResNet. In practice, you will usually use a model pretrained on another benchmark dataset.

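In that case, instead of the ResNet class, load a pretrained model from torchvision with code like the following (the ResNet architecture won ILSVRC), then replace only the last layer to match your classes. (In some cases you may also need to change the first input layer.)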

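# ================================================================== #
#                        6. Pretrained model                         #
# ================================================================== #

# ResNet won ILSVRC 2015; ResNet-18 is a smaller member of that family.
# `models` comes from `from torchvision import transforms, datasets, models` above.
# Newer torchvision releases replace pretrained=True with the weights= argument.
model = models.resnet18(pretrained=True)
# Replace the final fully connected layer for fine-tuning
num_ftrs = model.fc.in_features
model.fc = nn.Linear(num_ftrs, 10)  # 10 = number of CIFAR-10 classes (example)
model = model.to(DEVICE)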

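That said, don't expect a pretrained model to always perform well. If the data you train on is too simple, or differs in format from the data the model was pretrained on, performance can actually end up lower than before.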
