
[Machine Learning] Convolutional Neural Network (CNN) (Part 2/2)

J.S.Y 2022. 2. 18. 12:03



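A conventional fully connected model accepts only one-dimensional input. Any 2-D (or higher-dimensional) data must therefore be flattened into a single row first.
Flattening destroys part of the data's structure.
For an image, the model loses track of which pixels are each pixel's up, down, left, and right neighbors.
The CNN was devised to solve this problem.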
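Here is a minimal sketch of what flattening does to a pixel's neighborhood. It assumes row-major flattening, which is what reshape(-1, 784) or view(-1, 784) produce for contiguous data. After flattening, a pixel's vertical neighbors sit 28 positions away, and the last pixel of one row becomes adjacent to the first pixel of the next. The helper names flat_index and neighbor_offsets are made up for this illustration.

```python
# Sketch: where a pixel's neighbors end up after flattening a 28x28 image
# into a 784-long vector (row-major order, assumed here).
H, W = 28, 28

def flat_index(row, col, width=W):
    """Position of pixel (row, col) in the flattened, row-major vector."""
    return row * width + col

def neighbor_offsets(row, col):
    """Distance in the flat vector between (row, col) and its up/down/left/right neighbors."""
    center = flat_index(row, col)
    offsets = {}
    for name, (dr, dc) in {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}.items():
        r, c = row + dr, col + dc
        if 0 <= r < H and 0 <= c < W:
            offsets[name] = flat_index(r, c) - center
    return offsets

print(neighbor_offsets(10, 10))   # {'up': -28, 'down': 28, 'left': -1, 'right': 1}

# The last pixel of row 0 and the first pixel of row 1 are far apart in the image,
# yet they are direct neighbors in the flattened vector.
print(flat_index(1, 0) - flat_index(0, 27))   # 1
```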

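Advantages of CNNs

It has fewer weights to train than a plain fully connected network.

Training and inference are fast and efficient.

It is the standard choice for processing image and video data.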
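The "fewer weights" point can be made concrete with a rough count under stated assumptions. The first model is a 3x3 convolution with 8 output channels, the same shape as conv1 in the CNN defined later. The second is a hypothetical fully connected layer that maps a 28x28 image to an output of the same size (8 x 28 x 28). The convolution reuses the same small kernels at every position (weight sharing), so its count does not depend on the image size. The helper names are illustrative.

```python
# Sketch: parameter counts (weights + biases) of a 3x3 convolution vs. a
# hypothetical fully connected layer producing the same output size.

def conv2d_params(in_ch, out_ch, k):
    # one k x k kernel per (input channel, output channel) pair, plus one bias per output channel
    return out_ch * in_ch * k * k + out_ch

def linear_params(in_features, out_features):
    # one weight per (input, output) pair, plus one bias per output
    return in_features * out_features + out_features

conv = conv2d_params(1, 8, 3)                      # the same 8 kernels are reused at every pixel
dense = linear_params(1 * 28 * 28, 8 * 28 * 28)    # a separate weight for every input/output pair
print(conv)    # 80
print(dense)   # 4923520
```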

The CNN approach

Image representation => matrix (a grayscale image is a height x width grid of pixel intensities)
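Once an image is a matrix, a convolution slides a small kernel over it and takes a weighted sum of each pixel and its neighbors. Below is a plain-Python sketch for a single channel with stride 1. Like nn.Conv2d, it actually computes a cross-correlation, so the kernel is not flipped. The function name conv2d and the example kernel are illustrative only. With a 3x3 kernel, padding=1 keeps the output the same size as the input, which is why the conv layers later in this post use padding = 1.

```python
# Sketch: single-channel 2-D convolution (cross-correlation), stride 1, zero padding.

def conv2d(image, kernel, padding=0):
    h, w = len(image), len(image[0])
    k = len(kernel)                      # assumes a square k x k kernel
    # zero-pad the image on all four sides
    padded = [[0] * (w + 2 * padding) for _ in range(h + 2 * padding)]
    for r in range(h):
        for c in range(w):
            padded[r + padding][c + padding] = image[r][c]
    out_h = h + 2 * padding - k + 1
    out_w = w + 2 * padding - k + 1
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(out_h):
        for c in range(out_w):
            # weighted sum over the k x k window whose top-left corner is (r, c)
            out[r][c] = sum(kernel[i][j] * padded[r + i][c + j]
                            for i in range(k) for j in range(k))
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[0, 1, 0],
          [1, 1, 1],
          [0, 1, 0]]   # sums each pixel with its up/down/left/right neighbors

print(conv2d(image, kernel))             # [[25]]  (no padding: 3x3 -> 1x1)
print(conv2d(image, kernel, padding=1))  # [[7, 11, 11], [17, 25, 23], [19, 29, 23]]
```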

The data and code (.ipynb) used in this exercise are available at the link below.

Importing packages

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader
from torchsummary import summary as summary_


from tqdm import tqdm, notebook

Exploring the data

Fashion MNIST Dataset

# Class names for labels 0-9
label_tags = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat", "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

train_dataset = pd.read_csv("Data/fashion-mnist_train.csv")
test_dataset = pd.read_csv("Data/fashion-mnist_test.csv")

# Split into images (pixel columns) and labels
train_images = (train_dataset.iloc[:, 1:].values).astype("float32")
train_labels = train_dataset["label"].values
test_images = (test_dataset.iloc[:, 1:].values).astype("float32")
test_labels = test_dataset["label"].values

Read the Fashion-MNIST CSV files with pandas.

Train: 60,000 images, Test: 10,000 images

Then split each set into features (the 784 pixel columns) and labels.
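The slicing above relies on a specific CSV layout: each row holds a label column first, followed by 784 pixel columns. A tiny NumPy sketch with two made-up rows shows the same split without reading the real files.

```python
import numpy as np

# Sketch: assumed CSV layout of one row per image, the label in the first column
# and 784 pixel columns after it. These two rows are made up for illustration.
rows = np.array([
    [2] + [0] * 783 + [255],   # label 2, last pixel 255
    [9] + [17] * 784,          # label 9, all pixels 17
])

images = rows[:, 1:].astype("float32")   # same slicing as train_dataset.iloc[:, 1:]
labels = rows[:, 0]                      # same column as train_dataset["label"]

print(images.shape)   # (2, 784)
print(labels)         # [2 9]
```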

# Split into Train, Valid Dataset
from sklearn.model_selection import train_test_split
train_images, valid_images, train_labels, valid_labels = train_test_split(train_images, 
                                                                          train_labels, 
                                                                          stratify = train_labels, 
                                                                          random_state = 42, 
                                                                          test_size = 0.2)

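Use scikit-learn's train_test_split to divide the 60,000 training samples into a 48,000-sample training set and a 12,000-sample validation set (test_size = 0.2; stratify keeps the class proportions the same in both sets).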
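Conceptually, stratify=train_labels makes the split preserve the class proportions of the original data. The hand-rolled stratified_split below is a hypothetical helper, not scikit-learn's implementation, and its shuffling will not reproduce random_state=42. It sketches the idea: shuffle each class's indices, then move the same fraction of every class into the validation set. Fashion-MNIST's training set is balanced, with 6,000 images per class.

```python
import random
from collections import Counter

# Sketch of a stratified split: take the same fraction from every class.
def stratified_split(labels, test_size=0.2, seed=42):
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    train_idx, valid_idx = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_valid = round(len(idxs) * test_size)
        valid_idx += idxs[:n_valid]
        train_idx += idxs[n_valid:]
    return train_idx, valid_idx

# 10 classes x 6,000 images, like the Fashion-MNIST training set
labels = [c for c in range(10) for _ in range(6000)]
train_idx, valid_idx = stratified_split(labels)
print(len(train_idx), len(valid_idx))             # 48000 12000
print(Counter(labels[i] for i in valid_idx)[0])   # 1200 images of class 0 in validation
```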

# Reshape each flat pixel vector back into a 2-D image
# (size, 784) => (size, 28, 28)
train_images = train_images.reshape(train_images.shape[0], 28, 28)
valid_images = valid_images.reshape(valid_images.shape[0], 28, 28)
test_images = test_images.reshape(test_images.shape[0], 28, 28)

Reshape each image into 2-D data (784 -> 28x28).

# Check Train, Valid, Test Image's Shape
print("The Shape of Train Images: ", train_images.shape)
print("The Shape of Valid Images: ", valid_images.shape)
print("The Shape of Test Images: ", test_images.shape)

# Check Train, Valid Label's Shape
print("The Shape of Train Labels: ", train_labels.shape)
print("The Shape of Valid Labels: ", valid_labels.shape)
print("The Shape of Test Labels: ", test_labels.shape)

The Shape of Train Images:  (48000, 28, 28)
The Shape of Valid Images:  (12000, 28, 28)
The Shape of Test Images:  (10000, 28, 28)
The Shape of Train Labels:  (48000,)
The Shape of Valid Labels:  (12000,)
The Shape of Test Labels:  (10000,)

Let's visualize one sample.

# Visualize one training sample
img = train_images[20]
label = train_labels[20]

print("Label :",label_tags[label])
plt.imshow(img, cmap='gray'); plt.show()

Label : Pullover

[Figure: training image at index 20, labeled Pullover]

Defining the Dataset

class MyDataset(Dataset):
    def __init__(self, feature_data, label_data, num_classes = 10):
        self.x_data = feature_data
        self.y_data = label_data
        self.num_classes = num_classes

    def __len__(self):
        return len(self.x_data)

    def __getitem__(self, idx):
        # image
        img = self.x_data[idx] / 255.       # scale intensities to [0, 1]
        img = torch.FloatTensor(img)        # convert to a tensor
        img = img.view(1, 28, 28)           # (channel, height, width)

        # label
        label = torch.tensor(self.y_data[idx])
        label = F.one_hot(label, num_classes = self.num_classes)        # one-hot encoding
        label = label.float()

        return img, label

label_tags = ['T-Shirt', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt','Sneaker', 'Bag', 'Ankle Boot']

train_dataset = MyDataset(train_images, train_labels)
valid_dataset = MyDataset(valid_images, valid_labels)
test_dataset = MyDataset(test_images, test_labels)

train_loader = DataLoader(train_dataset, batch_size = 32, shuffle = True)
valid_loader = DataLoader(valid_dataset, batch_size = 32)
test_loader = DataLoader(test_dataset, batch_size = 32)

print(train_loader)
print(valid_loader)
print(test_loader)

Define a Dataset that fits the task and wrap each split in a DataLoader.

Each image is reshaped from 28x28 to 1x28x28 (channel, height, width).
The labels 0-9 (10 classes) are one-hot encoded.
The batch size is 32, because my GPU is not very powerful.
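The two per-sample transforms inside MyDataset.__getitem__ can be sketched in plain Python. Here normalize and one_hot are illustrative helpers, not part of the post's code. Pixel values 0-255 are scaled into [0, 1], and a class index becomes a 10-element vector with a single 1. The view(1, 28, 28) call then only adds the channel dimension that nn.Conv2d expects.

```python
# Sketch of MyDataset's per-sample transforms in plain Python.

def normalize(pixels):
    # scale 0-255 intensities into [0, 1], like `self.x_data[idx] / 255.`
    return [p / 255. for p in pixels]

def one_hot(label, num_classes=10):
    # same values as F.one_hot(label, num_classes=10).float() for a single label
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec

print(normalize([0, 51, 255]))   # [0.0, 0.2, 1.0]
print(one_hot(2))                # [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```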

Defining the training and evaluation functions

# One-hot float targets are treated as class probabilities (supported since PyTorch 1.10)
loss_fn = nn.CrossEntropyLoss()

def calc_acc(X, Y):
    x_val, x_idx = torch.max(X, dim=1)
    y_val, y_idx = torch.max(Y, dim=1)
    return (x_idx == y_idx).sum().item()

def train(EPOCHS, model, train_loader, opt):
    train_loss_history = []
    valid_loss_history = []
    train_acc_history = []
    valid_acc_history = []
    for epoch in range(1, EPOCHS+1):
        model.train()
        train_acc = 0
        print("<<< EPOCH {} >>>".format(epoch))
        for batch_idx, (img,label) in enumerate(notebook.tqdm(train_loader)):
            img, label = img.to(DEVICE), label.to(DEVICE)

            output = model(img)                 # forward pass
            loss = loss_fn(output, label)       # compute the loss

            opt.zero_grad()                     # reset accumulated gradients
            loss.backward()                     # backpropagate the error
            opt.step()                          # update the weights

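            train_acc += calc_acc(output, label)    # running count of correct predictions
            # The printed Acc divides this running count by the full dataset size,
            # so it only reaches the epoch accuracy at the end of the epoch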
            if batch_idx % 100 == 0 and batch_idx != 0:
                print("Training : [{}/{} ({:.0f}%)]\tLoss: {:.6f}\t Acc : {:.3f}".format(
                    batch_idx * len(img), 
                    len(train_loader.dataset), 
                    100. * batch_idx / len(train_loader), 
                    loss.item(),
                    train_acc / len(train_loader.dataset)))
        t_loss, t_acc = evaluate(model, valid_loader)
        print("[{}] valid Loss : {:.4f}\t accuracy: {:.2f}%\n\n".format(epoch, t_loss, t_acc*100.))

        train_loss_history.append(loss.item())     # loss of the epoch's last mini-batch, not an epoch average
        train_acc_history.append(train_acc / len(train_loader.dataset))

        valid_loss_history.append(t_loss.item())
        valid_acc_history.append(t_acc)

    return train_loss_history, train_acc_history, valid_loss_history, valid_acc_history

def evaluate(model, valid_loader):
    model.eval()
    t_loss = 0
    correct = 0

    with torch.no_grad():
        for img, label in notebook.tqdm(valid_loader):
            img, label = img.to(DEVICE), label.to(DEVICE)

            output = model(img)
            t_loss += loss_fn(output, label)

            correct += calc_acc(output, label)

    t_loss /= len(valid_loader)
    t_acc = correct / len(valid_loader.dataset)
    return t_loss, t_acc

def predict(model, lower=0, upper=10):
    model.eval()
    with torch.no_grad():
        for idx in range(lower, upper):
            img, _ = test_dataset[idx]

            # add a batch dimension and move the input to the model's device
            output = model(img.view(1, 1, 28, 28).to(DEVICE))

            o_val, o_idx = torch.max(output, dim=1)

            print("Label :", label_tags[o_idx.item()])
            plt.imshow(img.view(28, 28), cmap='gray')
            plt.show()
            print()

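Define the functions used for training, evaluation, and inference.

train(): training => train_loader (it also evaluates on valid_loader after every epoch)
evaluate(): evaluation => valid_loader or test_loader
predict(): inference => test_dataset (not used in this post)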
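Both train() and evaluate() use calc_acc and loss_fn. The plain-Python sketch below is not PyTorch's implementation. In it, calc_acc counts the rows whose highest score sits at the same index as the 1 in the one-hot target. Cross-entropy with a one-hot target reduces to the negative log of the true class's softmax probability, averaged over the batch (the default reduction='mean').

```python
import math

# Sketch of calc_acc and cross-entropy for logits and one-hot targets, in plain Python.

def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])

def calc_acc(logits, one_hot_targets):
    # number of rows where the predicted class matches the target class
    return sum(argmax(x) == argmax(y) for x, y in zip(logits, one_hot_targets))

def cross_entropy(logits, one_hot_targets):
    # mean over the batch of -sum_k y_k * log(softmax(x)_k)
    total = 0.0
    for x, y in zip(logits, one_hot_targets):
        m = max(x)                                   # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(v - m) for v in x))
        total += -sum(t * (v - log_sum) for v, t in zip(x, y))
    return total / len(logits)

logits = [[2.0, 0.5, 0.1],    # predicts class 0
          [0.2, 0.3, 1.5]]    # predicts class 2
targets = [[1.0, 0.0, 0.0],   # true class 0 -> correct
           [0.0, 1.0, 0.0]]   # true class 1 -> wrong
print(calc_acc(logits, targets))   # 1
print(cross_entropy(logits, targets))
```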

Defining the models

Two models are defined:

LinearNet, which uses only linear (fully connected) layers
CNN, a classifier built on convolutional layers

Linear Net

Three linear layers, with LeakyReLU activations between them.

class LinearNet(nn.Module):
    def __init__(self):
        super(LinearNet, self).__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 64)
        self.fc3 = nn.Linear(64, 10)

        self.act_fn = nn.LeakyReLU()

    def forward(self, x):
        x = x.view(-1, 1*28*28)

        x = self.fc1(x)
        x = self.act_fn(x)

        x = self.fc2(x)
        x = self.act_fn(x)

        x = self.fc3(x)
        return x

CNN

The CNN class stacks four convolution + max-pooling blocks (each followed by LeakyReLU), then a two-layer linear classifier.
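The shape comments in forward() follow from the usual output-size formula for convolution and pooling, out = floor((n + 2p - k) / s) + 1. The short sketch below applies it to the four blocks (out_size and max_pool2d are illustrative names). A 3x3 convolution with padding 1 keeps the size. MaxPool2d(2, 2) halves it and rounds down, so 7 becomes 3 and 3 becomes 1.

```python
# Sketch: output sizes through the CNN's four conv (k=3, p=1, s=1) + max-pool (k=2, s=2) blocks.

def out_size(n, kernel, stride=1, padding=0):
    return (n + 2 * padding - kernel) // stride + 1

sizes = [28]
for _ in range(4):
    n = out_size(sizes[-1], kernel=3, stride=1, padding=1)   # Conv2d keeps the size
    n = out_size(n, kernel=2, stride=2)                       # MaxPool2d(2, 2) halves it, rounding down
    sizes.append(n)
print(sizes)   # [28, 14, 7, 3, 1]

def max_pool2d(x, k=2):
    # non-overlapping k x k max pooling; leftover rows/columns are dropped
    h, w = len(x) // k, len(x[0]) // k
    return [[max(x[r * k + i][c * k + j] for i in range(k) for j in range(k))
             for c in range(w)] for r in range(h)]

x = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
print(max_pool2d(x))   # [[5]]  (3x3 -> 1x1; the last row and column are discarded)
```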

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size = 3, padding = 1)
        self.conv2 = nn.Conv2d(8 , 16, kernel_size = 3, padding = 1)
        self.conv3 = nn.Conv2d(16, 32, kernel_size = 3, padding = 1)
        self.conv4 = nn.Conv2d(32, 64, kernel_size = 3, padding = 1)

        self.pooling = nn.MaxPool2d(2, 2)
        self.flatten = nn.AdaptiveAvgPool2d(1)

        self.fc1 = nn.Linear(64, 24)
        self.fc2 = nn.Linear(24, 10)

        self.act_fn = nn.LeakyReLU()

    def forward(self, x):
        x = self.conv1(x)           # (batch, 1, 28, 28) -> (batch, 8, 28, 28)
        x = self.pooling(x)         # (batch, 8, 28, 28) -> (batch, 8, 14, 14)
        x = self.act_fn(x)

        x = self.conv2(x)           # (batch, 8, 14, 14) -> (batch, 16, 14, 14)
        x = self.pooling(x)         # (batch, 16, 14, 14) -> (batch, 16, 7, 7)
        x = self.act_fn(x)

        x = self.conv3(x)           # (batch, 16, 7, 7) -> (batch, 32, 7, 7)
        x = self.pooling(x)         # (batch, 32, 7, 7) -> (batch, 32, 3, 3)
        x = self.act_fn(x)

        x = self.conv4(x)           # (batch, 32, 3, 3) -> (batch, 64, 3, 3)
        x = self.pooling(x)         # (batch, 64, 3, 3) -> (batch, 64, 1, 1)
        x = self.act_fn(x)        

        x = self.flatten(x)         # global average pooling: (batch, 64, 1, 1) -> (batch, 64, 1, 1)
        x = x.view(-1, 64*1*1)

        x = self.fc1(x)
        x = self.act_fn(x)

        x = self.fc2(x)
        return x

Training and evaluation

Everything is now ready.
Let's start by training and validating LinearNet.

Linear Net

USE_CUDA = torch.cuda.is_available()
DEVICE = "cuda" if USE_CUDA else "cpu"

model = LinearNet().to(DEVICE)
opt = optim.Adam(model.parameters())

print("Device :", DEVICE)
summary_(model,(1,28,28), device=DEVICE)

Device : cuda
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Linear-1                  [-1, 256]         200,960
         LeakyReLU-2                  [-1, 256]               0
            Linear-3                   [-1, 64]          16,448
         LeakyReLU-4                   [-1, 64]               0
            Linear-5                   [-1, 10]             650
================================================================
Total params: 218,058
Trainable params: 218,058
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.00
Params size (MB): 0.83
Estimated Total Size (MB): 0.84
----------------------------------------------------------------

LinearNet is a simple three-layer network.

It uses 218,058 parameters (weights and biases).

# Start training #
t_loss_his, t_acc_his, v_loss_his, v_acc_his = train(EPOCHS = 10, model = model, train_loader = train_loader, opt = opt)

<<< EPOCH 1 >>>
Training : [3200/48000 (7%)]    Loss: 0.735302     Acc : 0.041
... (omitted) ...
Training : [44800/48000 (93%)]    Loss: 0.350932     Acc : 0.752
[1] valid Loss : 0.3948     accuracy: 84.87%

<<< EPOCH 2 >>>
Training : [3200/48000 (7%)]    Loss: 0.448148     Acc : 0.057
... (omitted) ...
Training : [44800/48000 (93%)]    Loss: 0.507022     Acc : 0.803
[2] valid Loss : 0.3696     accuracy: 86.38%

...
(epochs 3-9 omitted)
...

<<< EPOCH 10 >>>
Training : [3200/48000 (7%)]    Loss: 0.036901     Acc : 0.061
... (omitted) ...
Training : [44800/48000 (93%)]    Loss: 0.253712     Acc : 0.851
[10] valid Loss : 0.3202     accuracy: 88.84%

After 10 epochs, the validation loss and accuracy are 0.3202 and 88.84%, respectively.
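The numbers in the progress lines can be reproduced by hand. This also explains why the printed training Acc is so low early in an epoch: train() divides a running count of correct predictions by all 48,000 samples. The sketch below assumes batch_size=32 and a log line every 100 batches, as in the code above.

```python
import math

# Sketch: reproducing train()'s progress numbers for 48,000 samples and batch_size=32.
n_samples, batch_size = 48000, 32
n_batches = math.ceil(n_samples / batch_size)
print(n_batches)   # 1500

for batch_idx in (100, 1400):
    shown = batch_idx * batch_size                  # batch_idx * len(img) in the log
    percent = round(100. * batch_idx / n_batches)   # what "{:.0f}%" displays
    seen = (batch_idx + 1) * batch_size             # batches 0..batch_idx have been processed
    max_acc = seen / n_samples                      # upper bound on the printed "Acc"
    print(shown, percent, round(max_acc, 3))
# 3200 7 0.067
# 44800 93 0.934
```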

plt.plot(t_loss_his, label="train")
plt.plot(v_loss_his, label="valid")
plt.legend()
plt.show()

[Figure: LinearNet training vs. validation loss per epoch]

plt.plot(t_acc_his, label="train")
plt.plot(v_acc_his, label="valid")
plt.legend()
plt.show()

[Figure: LinearNet training vs. validation accuracy per epoch]

The results on the test set are as follows.

v_loss, v_acc = evaluate(model, test_loader)
print("Test Loss : {:.4f}\t accuracy: {:.2f}%\n".format(v_loss, v_acc*100.))

Test Loss : 0.3141     accuracy: 88.88%

CNN

USE_CUDA = torch.cuda.is_available()
DEVICE = "cuda" if USE_CUDA else "cpu"

model = CNN().to(DEVICE)
opt = optim.Adam(model.parameters())

print("Device :", DEVICE)
summary_(model,(1,28,28), device=DEVICE)

Device : cuda
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1            [-1, 8, 28, 28]              80
         MaxPool2d-2            [-1, 8, 14, 14]               0
         LeakyReLU-3            [-1, 8, 14, 14]               0
            Conv2d-4           [-1, 16, 14, 14]           1,168
         MaxPool2d-5             [-1, 16, 7, 7]               0
         LeakyReLU-6             [-1, 16, 7, 7]               0
            Conv2d-7             [-1, 32, 7, 7]           4,640
         MaxPool2d-8             [-1, 32, 3, 3]               0
         LeakyReLU-9             [-1, 32, 3, 3]               0
           Conv2d-10             [-1, 64, 3, 3]          18,496
        MaxPool2d-11             [-1, 64, 1, 1]               0
        LeakyReLU-12             [-1, 64, 1, 1]               0
AdaptiveAvgPool2d-13             [-1, 64, 1, 1]               0
           Linear-14                   [-1, 24]           1,560
        LeakyReLU-15                   [-1, 24]               0
           Linear-16                   [-1, 10]             250
================================================================
Total params: 26,194
Trainable params: 26,194
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.00
Forward/backward pass size (MB): 0.13
Params size (MB): 0.10
Estimated Total Size (MB): 0.23
----------------------------------------------------------------

The CNN has a more complex structure than LinearNet,
but it uses only 26,194 parameters.
That is roughly 8 times fewer than LinearNet (218,058 / 26,194 ≈ 8.3).
So how well can the CNN perform with so few weights?
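Both totals can be checked by hand, counting weights plus biases for every layer of the two models as defined above (the helper names are illustrative). The ratio comes out at about 8.3.

```python
# Sketch: recomputing the torchsummary totals (weights + biases) for LinearNet and CNN.

def linear_params(n_in, n_out):
    return n_in * n_out + n_out

def conv2d_params(c_in, c_out, k):
    return c_out * c_in * k * k + c_out

linear_net = (linear_params(784, 256)
              + linear_params(256, 64)
              + linear_params(64, 10))

cnn = (conv2d_params(1, 8, 3) + conv2d_params(8, 16, 3)
       + conv2d_params(16, 32, 3) + conv2d_params(32, 64, 3)
       + linear_params(64, 24) + linear_params(24, 10))

print(linear_net)                   # 218058
print(cnn)                          # 26194
print(round(linear_net / cnn, 2))   # 8.32
```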

# Start training #
t_loss_his, t_acc_his, v_loss_his, v_acc_his = train(EPOCHS = 10, model = model, train_loader = train_loader, opt = opt)

<<< EPOCH 1 >>>
Training : [3200/48000 (7%)]    Loss: 1.139337     Acc : 0.024
... (omitted) ...
Training : [44800/48000 (93%)]    Loss: 0.460232     Acc : 0.673
[1] valid Loss : 0.5410     accuracy: 79.25%
<<< EPOCH 2 >>>
Training : [3200/48000 (7%)]    Loss: 0.342676     Acc : 0.055
... (omitted) ...
Training : [44800/48000 (93%)]    Loss: 0.312424     Acc : 0.781

...
(epochs 3-9 omitted)
...

<<< EPOCH 10 >>>
Training : [3200/48000 (7%)]    Loss: 0.082751     Acc : 0.061
... (omitted) ...
Training : [44800/48000 (93%)]    Loss: 0.157439     Acc : 0.847
[10] valid Loss : 0.2796     accuracy: 89.85%

After 10 epochs, the validation loss and accuracy are 0.2796 and 89.85%, respectively.

plt.plot(t_loss_his, label="train")
plt.plot(v_loss_his, label="valid")
plt.legend()
plt.show()

[Figure: CNN training vs. validation loss per epoch]

plt.plot(t_acc_his, label="train")
plt.plot(v_acc_his, label="valid")
plt.legend()
plt.show()

[Figure: CNN training vs. validation accuracy per epoch]

v_loss, v_acc = evaluate(model, test_loader)
print("Test Loss : {:.4f}\t accuracy: {:.2f}%\n".format(v_loss, v_acc*100.))

Test Loss : 0.2746     accuracy: 89.97%

On the test data, the CNN scores about 1 percentage point higher than LinearNet (89.97% vs. 88.88%).
Is this 1% gain actually meaningful?
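One way to put the gap in perspective is to convert both test accuracies into image counts. The accuracies come from the single run shown above. With 10,000 test images, each 0.01 percentage point is exactly one image.

```python
# Sketch: converting the two test accuracies from this post into image counts.
n_test = 10000
linear_correct = round(0.8888 * n_test)   # LinearNet: 88.88%
cnn_correct = round(0.8997 * n_test)      # CNN: 89.97%

print(linear_correct, cnn_correct)        # 8888 8997
print(cnn_correct - linear_correct)       # 109 more test images classified correctly

# relative reduction in the number of misclassified test images
linear_errors, cnn_errors = n_test - linear_correct, n_test - cnn_correct
print(round((linear_errors - cnn_errors) / linear_errors, 3))   # 0.098
```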

I would side with the CNN: it outperforms LinearNet while using roughly 8 times fewer weights.
The real performance gap between LinearNet and a CNN shows up with color images.

In the next post, I will do a hands-on coding exercise with the CIFAR-10 dataset.
