Build a VGG16 CNN in PyTorch From Scratch
Interested in deep learning and computer vision? This article guides you through building the VGG16 convolutional neural network (CNN) from scratch using PyTorch. Dive into the architecture of VGG16, understand its key components, and learn how to implement it for image classification.
What is VGG16?
- Foundation: VGG, building on AlexNet, emphasizes the importance of network depth in CNNs.
- Creator: Developed by Karen Simonyan and Andrew Zisserman of the Visual Geometry Group (VGG) at the University of Oxford.
- Deep Structure: VGG16 has 16 weight layers: 13 convolutional layers and 3 fully connected layers. A deeper version, VGG19, goes up to 19 weight layers.
- Key Feature: All convolutional layers use small 3x3 filters.
For in-depth details, refer to the original paper, "Very Deep Convolutional Networks for Large-Scale Image Recognition."
Loading the CIFAR-100 Dataset for Image Classification
Before building the model, you need to load and preprocess the data. Here, we'll use the CIFAR-100 dataset.
- What is CIFAR-100? An image dataset with 100 classes, each containing 600 images (500 for training and 100 for testing).
- Labels: Each image has both "fine" (specific class) and "coarse" (superclass) labels.
- Classes: The 100 classes are grouped into 20 superclasses.
Import Necessary Libraries
Import the required libraries:
- torch: Building and training the model.
- torchvision: Loading and preprocessing data.
- numpy: Mathematical calculations.
Also, it's a good idea to define the device to use the GPU if it’s available.
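For example:

```python
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```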
Efficient Data Loading with PyTorch
Use the following function with torchvision to load and preprocess your dataset for the model.
Here's a breakdown:
- Normalization: Normalize the data using per-channel means and standard deviations with the transforms.Normalize function.
- Transforms: Resize, convert to tensors, and normalize the data to prepare it for the model.
- Data Splitting: Divide the dataset into training and validation sets.
- Data Loaders: Use torch.utils.data.DataLoader to efficiently load data in batches, improving performance, especially with large datasets.
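Here is a sketch of such a function pair, building on the imports above. The helper names (get_train_valid_loader, get_test_loader) and the 10% validation split are illustrative choices, and the normalization statistics are commonly cited CIFAR-100 channel values; recompute them from the training split if you need exact figures.

```python
from torch.utils.data import DataLoader, SubsetRandomSampler

# Commonly cited per-channel mean/std for CIFAR-100 (assumed, not recomputed here)
normalize = transforms.Normalize(mean=[0.5071, 0.4867, 0.4408],
                                 std=[0.2675, 0.2565, 0.2761])

# VGG16 was designed for 224x224 inputs, so upscale the 32x32 CIFAR images
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    normalize,
])

def get_train_valid_loader(data_dir, batch_size, valid_size=0.1, shuffle=True):
    train_dataset = torchvision.datasets.CIFAR100(
        root=data_dir, train=True, download=True, transform=transform)
    valid_dataset = torchvision.datasets.CIFAR100(
        root=data_dir, train=True, download=True, transform=transform)

    # Split the 50,000 training images into training and validation indices
    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))
    if shuffle:
        np.random.shuffle(indices)
    train_idx, valid_idx = indices[split:], indices[:split]

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(valid_dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader

def get_test_loader(data_dir, batch_size):
    test_dataset = torchvision.datasets.CIFAR100(
        root=data_dir, train=False, download=True, transform=transform)
    return DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
```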
Implementing the VGG16 Architecture with PyTorch
To build a custom model in PyTorch, you need to inherit from nn.Module.
- nn.Module: Provides the fundamental structure for building neural networks in PyTorch.
- __init__: Defines the individual layers of your network.
- forward: Specifies how the data flows through these layers.
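As a minimal illustration of this pattern (a toy network for this example, not VGG16 itself):

```python
class TinyNet(nn.Module):
    def __init__(self, num_classes):
        super(TinyNet, self).__init__()
        # __init__ declares the layers the network will use
        self.conv = nn.Conv2d(3, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()
        self.fc = nn.Linear(16 * 32 * 32, num_classes)

    def forward(self, x):
        # forward defines how a batch of 3x32x32 images flows through them
        out = self.relu(self.conv(x))
        out = out.reshape(out.size(0), -1)
        return self.fc(out)
```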
Essential Layers for your VGG16 model:
- nn.Conv2d: Performs convolutional operations.
- nn.BatchNorm2d: Applies batch normalization to stabilize training.
- nn.ReLU: Implements the ReLU activation function.
- nn.MaxPool2d: Performs max pooling for downsampling.
- nn.Dropout: Applies dropout to prevent overfitting.
- nn.Linear: Implements fully connected layers.
- nn.Sequential: Bundles multiple operations into a single layer.
Here's an implementation of VGG16 in PyTorch:
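The sketch below is one possible version. A configuration list keeps the thirteen convolutional blocks compact, and batch normalization follows each convolution, as the layer list above suggests (the original paper does not use batch norm).

```python
class VGG16(nn.Module):
    # 13 convolutional layers; 'M' marks a 2x2 max-pooling step
    cfg = [64, 64, 'M', 128, 128, 'M', 256, 256, 256, 'M',
           512, 512, 512, 'M', 512, 512, 512, 'M']

    def __init__(self, num_classes=100):
        super(VGG16, self).__init__()
        layers = []
        in_channels = 3
        for v in self.cfg:
            if v == 'M':
                layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
            else:
                layers += [nn.Conv2d(in_channels, v, kernel_size=3, padding=1),
                           nn.BatchNorm2d(v),
                           nn.ReLU(inplace=True)]
                in_channels = v
        self.features = nn.Sequential(*layers)

        # Three fully connected layers; after five poolings, a 224x224
        # input yields a 512 x 7 x 7 feature map
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes))

    def forward(self, x):
        out = self.features(x)
        out = out.reshape(out.size(0), -1)  # flatten for the linear layers
        return self.classifier(out)
```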
Setting Hyperparameters for Optimal Training
Hyperparameters significantly impact model performance. Define these before training:
- num_classes: Number of classes in your dataset (e.g., 100 for CIFAR-100).
- num_epochs: Number of times the training data is passed through the model.
- batch_size: Number of samples processed in each iteration.
- learning_rate: Controls the step size during optimization.
- Loss Function: Measures the difference between predictions and actual values.
- Optimizer: Updates the model's weights to minimize the loss function.
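For example (the values below are illustrative starting points rather than tuned optima; SGD with momentum and weight decay is one common choice for this setup):

```python
num_classes = 100      # CIFAR-100 fine labels
num_epochs = 20
batch_size = 64
learning_rate = 0.005

model = VGG16(num_classes).to(device)

# Cross-entropy is the standard loss for multi-class classification
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            weight_decay=0.005, momentum=0.9)

# Loaders from the data-loading section above
train_loader, valid_loader = get_train_valid_loader('./data', batch_size)
test_loader = get_test_loader('./data', batch_size)
```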
Training the VGG16 Model in PyTorch
This is the core of the process. Here's how training works in PyTorch:
- Data Iteration: Loop through images and labels from the train_loader.
- Device Transfer: Move data to the GPU (if available) for faster computation.
- Forward Pass: Feed images to the model to generate predictions.
- Loss Calculation: Calculate the loss between the model's predictions and the true labels.
- Backpropagation: Compute gradients of the loss with respect to the model's parameters.
- Weight Update: Adjust the model's weights using the optimizer to minimize the loss (remember to reset gradients before each update).
- Validation: After each epoch, assess the model's accuracy on the validation set. Use torch.no_grad() during validation to disable gradient calculations and speed up the process.
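Putting those steps together, a training loop might look like this (it assumes the model, criterion, optimizer, and loaders defined above):

```python
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        # Move the batch to the same device as the model
        images, labels = images.to(device), labels.to(device)

        # Forward pass and loss
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Reset gradients, backpropagate, and update the weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch + 1, num_epochs, loss.item()))

    # Validation: gradients are unnecessary here, so disable them
    model.eval()
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in valid_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Validation accuracy: {:.2f} %'.format(100 * correct / total))
```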
Evaluating the Model on the Test Set
After training, evaluate the model's generalization ability on unseen data:
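The loop mirrors the validation step and again runs under torch.no_grad(), reusing the test_loader built earlier:

```python
model.eval()
with torch.no_grad():
    correct, total = 0, 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Test accuracy: {:.2f} %'.format(100 * correct / total))
```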
By training for 20 epochs on the CIFAR-100 dataset, you can achieve a test accuracy of around 75%.
Taking Your VGG16 Model Further
This article provides a solid foundation, but here's how to expand your knowledge:
- Experiment with Datasets: Try CIFAR-10 or a subset of the ImageNet dataset.
- Tune Hyperparameters: Find the optimal combination of learning rate, batch size, etc.
- Modify the Architecture: Add or remove layers to see the impact on performance. Try implementing VGG-19.
Additional Resources for Deep Learning with PyTorch
- Original VGG Paper: Very Deep Convolutional Networks for Large-Scale Image Recognition
- PyTorch Documentation: PyTorch nn.Module
- Writing CNNs from Scratch in PyTorch