Build AlexNet from Scratch with PyTorch: A Step-by-Step Guide

Want to understand the inner workings of modern computer vision models? Building a network yourself is one of the most effective ways to learn how they work. This tutorial walks through implementing AlexNet, the convolutional neural network that won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC), from scratch in PyTorch and training it on CIFAR-10. You'll learn by doing, solidifying your understanding of CNNs and PyTorch along the way.

Updated on September 16, 2024

What You'll Learn

AlexNet architecture: Understand its key components and design principles.
Data pre-processing: Learn how to prepare your image data for optimal training.
PyTorch implementation: Build AlexNet layer by layer using PyTorch.
Training and evaluation: Train your model on the CIFAR-10 dataset and evaluate its performance.

Prerequisites

Before diving in, ensure you have a basic understanding of:

Neural Networks: Input, hidden, and output layers; activation functions; loss functions; and optimization algorithms such as gradient descent and its variants.
Convolutional Neural Networks (CNNs): Convolutional and pooling layers, their role in feature extraction, and parameters such as padding, stride, and kernel size.
Python & PyTorch: Familiarity with Python syntax and the PyTorch library.

Understanding AlexNet Architecture

AlexNet, created by Alex Krizhevsky and colleagues in 2012, revolutionized image classification. Its key features include:

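Input Size: Processes 3-channel (RGB) images. The paper states 224x224, but with an 11x11 kernel, stride 4, and no padding, the 55x55 output it reports for the first convolution requires a 227x227 input, which is the size this tutorial uses.
Key building blocks: Five convolutional layers with 11x11, 5x5, and 3x3 kernels, ReLU activations, overlapping max pooling (3x3 windows, stride 2) for subsampling, and three fully connected layers.
Regularization: Dropout in the first two fully connected layers. The original also used local response normalization, which this tutorial replaces with batch normalization.
Output: Classifies images into 1000 categories in its original ImageNet application (10 categories here, for CIFAR-10).
Speed: Was trained on two GPUs, with the network split across them to fit in GPU memory and shorten training time.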
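The spatial sizes in AlexNet follow from the standard output-size formula for convolution and pooling layers, out = floor((W + 2P - K) / S) + 1, where W is the input size, K the kernel size, P the padding, and S the stride. A minimal sketch that traces one spatial dimension through the layer configuration used in this tutorial (the helper names are hypothetical):

```python
def conv_out(size, kernel, stride=1, padding=0):
    # Floor formula used by Conv2d and MaxPool2d (dilation=1, ceil_mode=False)
    return (size + 2 * padding - kernel) // stride + 1


def trace_alexnet(size):
    """Return (stage, spatial size) after each stage of this tutorial's AlexNet."""
    stages = []
    size = conv_out(size, 11, stride=4)          # conv1: 11x11, stride 4, no padding
    stages.append(("conv1", size))
    size = conv_out(size, 3, stride=2)           # max pool 3x3, stride 2
    stages.append(("pool1", size))
    size = conv_out(size, 5, padding=2)          # conv2: 5x5, padding 2 keeps the size
    stages.append(("conv2", size))
    size = conv_out(size, 3, stride=2)           # max pool 3x3, stride 2
    stages.append(("pool2", size))
    for name in ("conv3", "conv4", "conv5"):
        size = conv_out(size, 3, padding=1)      # 3x3, padding 1 keeps the size
        stages.append((name, size))
    size = conv_out(size, 3, stride=2)           # final max pool
    stages.append(("pool5", size))
    return stages


for name, s in trace_alexnet(227):
    print(f"{name}: {s}x{s}")

final = trace_alexnet(227)[-1][1]
print("flattened features:", 256 * final * final)  # conv5 has 256 output channels
print("conv1 size for a 224 input:", conv_out(224, 11, stride=4))
```

For a 227x227 input this gives 55, 27, 27, 13, 13, 13, 13, and 6, so the last feature map is 256x6x6 = 9216 values, exactly the input size of the first fully connected layer defined in Step 3. A 224x224 input would give 54x54 after the first convolution instead.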

Loading and Preparing the CIFAR-10 Dataset

We'll use the CIFAR-10 dataset, consisting of 60,000 32x32 color images across 10 classes, with 6,000 images per class.

50,000 images for training
10,000 images for testing
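The tutorial does not train on all 50,000 training images: Step 2 holds out 10% of them as a validation set. This sketch replays that index split on its own (NumPy only, no download) to show the resulting sizes and how many batches the training loader yields per epoch with a batch size of 64:

```python
import math
import numpy as np

num_train = 50_000      # CIFAR-10 training images
valid_size = 0.1        # fraction held out, as in get_train_valid_loader
batch_size = 64

indices = list(range(num_train))
split = int(np.floor(valid_size * num_train))
np.random.seed(1)       # the tutorial passes random_seed=1
np.random.shuffle(indices)
train_idx, valid_idx = indices[split:], indices[:split]

print(len(train_idx), len(valid_idx))            # 45000 5000
print(set(train_idx).isdisjoint(valid_idx))      # True: no image is in both sets
print(math.ceil(len(train_idx) / batch_size))    # 704 batches per epoch
```

The last batch holds only the 8 leftover images, since the DataLoader keeps incomplete batches by default.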

Step 1: Import Essential Libraries

Import the necessary PyTorch libraries and NumPy:

import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
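Step 2 seeds NumPy so the train/validation split is repeatable, but the model's initial weights and the samplers' shuffling come from PyTorch's own random generator. If you also want those to be repeatable, one option is a small helper like the hypothetical `seed_everything` below, called before building the loaders and the model. Even with seeding, some GPU (cuDNN) kernels are nondeterministic, so results on CUDA may still vary slightly.

```python
import random

import numpy as np
import torch
import torch.nn as nn


def seed_everything(seed):
    # Hypothetical helper: seed Python, NumPy, and PyTorch (CPU and all GPUs)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call even without a GPU


seed_everything(42)
a = nn.Linear(4, 2).weight.detach().clone()
seed_everything(42)
b = nn.Linear(4, 2).weight.detach().clone()
print(torch.equal(a, b))  # True: same seed, same initial weights
```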

Step 2: Load and Pre-process CIFAR-10 Data

Use torchvision to load and pre-process the data:

def get_train_valid_loader(data_dir, batch_size, augment, random_seed, valid_size=0.1, shuffle=True):

    normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010])

    valid_transform = transforms.Compose([
        transforms.Resize((227,227)),
        transforms.ToTensor(),
        normalize,
    ])

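    if augment:
        # Augment at CIFAR-10's native 32x32 resolution, then upscale for AlexNet
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.Resize((227,227)),
            transforms.ToTensor(),
            normalize,
        ])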
    else:
        train_transform = transforms.Compose([
            transforms.Resize((227,227)),
            transforms.ToTensor(),
            normalize,
        ])

    train_dataset = datasets.CIFAR10(root=data_dir, train=True, download=True, transform=train_transform)
    valid_dataset = datasets.CIFAR10(root=data_dir, train=True, download=True, transform=valid_transform)

    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=train_sampler)
    valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=batch_size, sampler=valid_sampler)

    return (train_loader, valid_loader)


def get_test_loader(data_dir, batch_size, shuffle=False):

    # Use the same CIFAR-10 statistics as the training and validation sets
    normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010])

    transform = transforms.Compose([
        transforms.Resize((227,227)),
        transforms.ToTensor(),
        normalize,
    ])

    dataset = datasets.CIFAR10(root=data_dir, train=False, download=True, transform=transform)

    data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

    return data_loader


train_loader, valid_loader = get_train_valid_loader(data_dir='./data', batch_size=64, augment=False, random_seed=1)
test_loader = get_test_loader(data_dir='./data', batch_size=64)

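Code Breakdown:

get_train_valid_loader and get_test_loader: Load the training/validation and test sets. The validation set is a random 10% (valid_size=0.1) of the 50,000 training images, selected with SubsetRandomSampler; valid_dataset reloads the same training images with valid_transform, so validation images are never augmented.
Resize: Upscales the 32x32 CIFAR-10 images to 227x227, the input size the AlexNet layers in Step 3 expect.
normalize: Normalizes pixel values using pre-computed per-channel means and standard deviations of the CIFAR-10 training set.
Data Augmentation: With augment=True, training images are randomly cropped (after 4-pixel padding) and horizontally flipped to increase sample diversity. The call above uses augment=False.
Data Loaders: Load the data in batches of 64 for training and evaluation.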

Step 3: Define the AlexNet Model in PyTorch

Now, define the AlexNet architecture using nn.Module:

class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU())
        self.layer5 = nn.Sequential(
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2))
        self.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(9216, 4096),
            nn.ReLU())
        self.fc1 = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU())
        self.fc2= nn.Sequential(
            nn.Linear(4096, num_classes))

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

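Key Points:

__init__: Defines five convolutional blocks (convolution, batch normalization, ReLU, and, in three of them, max pooling) followed by three fully connected layers, with dropout before the first two.
forward: Passes the input through the convolutional blocks, flattens the resulting 256x6x6 feature map into a 9,216-element vector, and feeds it through the fully connected layers. It returns raw class scores (logits); no softmax is applied because nn.CrossEntropyLoss expects logits.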
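A quick way to confirm that `nn.Linear(9216, 4096)` matches the convolutional output is to push a dummy batch through the feature layers and print each shape. The sketch below rebuilds the same five stages with a small hypothetical helper, `block`, so it runs on its own; with the `AlexNet` class in scope you could instead call `model.layer1` through `model.layer5` in turn.

```python
import torch
import torch.nn as nn


def block(c_in, c_out, k, s, p, pool):
    # Conv -> BatchNorm -> ReLU, optionally followed by 3x3 / stride-2 max pooling
    layers = [nn.Conv2d(c_in, c_out, kernel_size=k, stride=s, padding=p),
              nn.BatchNorm2d(c_out), nn.ReLU()]
    if pool:
        layers.append(nn.MaxPool2d(kernel_size=3, stride=2))
    return nn.Sequential(*layers)


# Same five convolutional stages as the AlexNet class
features = nn.Sequential(
    block(3, 96, 11, 4, 0, pool=True),
    block(96, 256, 5, 1, 2, pool=True),
    block(256, 384, 3, 1, 1, pool=False),
    block(384, 384, 3, 1, 1, pool=False),
    block(384, 256, 3, 1, 1, pool=True),
)

x = torch.randn(2, 3, 227, 227)  # dummy batch of two images
with torch.no_grad():
    for i, stage in enumerate(features, start=1):
        x = stage(x)
        print(f"layer{i}: {tuple(x.shape)}")

flat = x.reshape(x.size(0), -1)
print("flattened:", tuple(flat.shape))  # (2, 9216)
```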

Step 4: Set Hyperparameters and Initialize the Model

Configure training parameters:

num_classes = 10
num_epochs = 20
batch_size = 64        # for reference only: the loaders in Step 2 were built with batch_size=64
learning_rate = 0.005

model = AlexNet(num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay=0.005, momentum=0.9)

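Hyperparameters: Set the number of training epochs, the batch size, and the learning rate for the optimizer.
Loss Function: CrossEntropyLoss combines log-softmax and negative log-likelihood, the standard loss for multi-class classification.
Optimizer: Stochastic Gradient Descent (SGD) with momentum 0.9 and weight decay (L2 regularization) of 0.005 updates the model weights.
Device Configuration: .to(device) moves the model's parameters to the GPU if one is available, otherwise to the CPU; each batch is moved the same way inside the training loop.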
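The model's last layer is a plain `nn.Linear` with no softmax, because `nn.CrossEntropyLoss` expects raw logits and applies log-softmax itself. This sketch checks that on a tiny stand-in layer (not AlexNet) and then reproduces one SGD step by hand, showing how `weight_decay` and `momentum` enter the update. On the first step the momentum buffer simply equals the (weight-decayed) gradient; on later steps it also accumulates past gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
lr, momentum, weight_decay = 0.005, 0.9, 0.005  # values from Step 4

layer = nn.Linear(4, 3)                # tiny stand-in for AlexNet
x = torch.randn(5, 4)                  # 5 samples, 4 features
y = torch.tensor([0, 2, 1, 2, 0])      # integer class labels, like CIFAR-10's

logits = layer(x)
loss = nn.CrossEntropyLoss()(logits, y)
# CrossEntropyLoss on raw logits = log-softmax followed by negative log-likelihood
manual = F.nll_loss(F.log_softmax(logits, dim=1), y)
print(torch.allclose(loss, manual))    # True

optimizer = torch.optim.SGD(layer.parameters(), lr=lr, momentum=momentum,
                            weight_decay=weight_decay)
w_before = layer.weight.detach().clone()
optimizer.zero_grad()
loss.backward()
grad = layer.weight.grad.detach().clone()
optimizer.step()

# First step: weight decay adds weight_decay * w to the gradient, then w -= lr * that
expected = w_before - lr * (grad + weight_decay * w_before)
print(torch.allclose(layer.weight.detach(), expected, atol=1e-6))  # True
```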

Step 5: Train the AlexNet Model

Implement the training loop:

total_step = len(train_loader)

for epoch in range(num_epochs):
    model.train()  # enable dropout; BatchNorm uses per-batch statistics
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Log every 100 steps (and the last one) instead of every batch
        if (i + 1) % 100 == 0 or (i + 1) == total_step:
            print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                  .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

    # Validation
    model.eval()  # disable dropout; BatchNorm uses its running statistics
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            predicted = outputs.argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

        print('Accuracy of the network on the {} validation images: {:.2f} %'.format(total, 100 * correct / total))

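Inside the Training Loop:

Forward Pass: The input images are passed through the model, and the loss is computed from the model's output and the true labels.
Backward Pass: optimizer.zero_grad() clears the gradients left over from the previous step, then loss.backward() computes the gradients of the loss with respect to the model parameters.
Optimization: optimizer.step() updates the model parameters using the computed gradients.
Validation: After each epoch, the model's accuracy is measured on the 5,000 held-out validation images, with gradient tracking disabled by torch.no_grad().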
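Because this AlexNet contains `BatchNorm2d` and `Dropout` layers, the module's mode flag matters: `model.train()` enables dropout and makes batch normalization use the current batch's statistics, while `model.eval()` disables dropout and switches batch normalization to its running averages. That is why evaluation code should call `model.eval()` first and training code `model.train()`. A small illustration on a toy network (not the tutorial's model):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Toy network with the same kinds of mode-dependent layers AlexNet uses
net = nn.Sequential(nn.Linear(8, 8), nn.BatchNorm1d(8), nn.ReLU(),
                    nn.Dropout(0.5), nn.Linear(8, 2))
x = torch.randn(4, 8)

net.train()                  # dropout active, BatchNorm uses batch statistics
with torch.no_grad():
    a, b = net(x), net(x)
print(torch.equal(a, b))     # usually False: dropout draws a new mask on each call

net.eval()                   # dropout off, BatchNorm uses running statistics
with torch.no_grad():
    c, d = net(x), net(x)
print(torch.equal(c, d))     # True: eval mode is deterministic
```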

Step 6: Evaluate the Model on the Test Set

Assess the trained model's performance on unseen data:

model.eval()  # dropout off, BatchNorm uses its running statistics
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        predicted = outputs.argmax(dim=1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the {} test images: {:.2f} %'.format(total, 100 * correct / total))

This loop mirrors the validation loop: it runs the trained model over all 10,000 test images, which were never used for training or model selection, and reports the percentage it classifies correctly.
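A single overall accuracy can hide classes the model handles poorly. As an optional extension, here is a minimal sketch of per-class accuracy, assuming you collect the `predicted` and `labels` tensors from every test batch (for example with `torch.cat`). The helper name `per_class_accuracy` is hypothetical, and the toy tensors below stand in for real predictions.

```python
import torch


def per_class_accuracy(predicted, labels, num_classes=10):
    """Fraction of correct predictions per class (NaN for classes absent from labels)."""
    correct = torch.bincount(labels[predicted == labels], minlength=num_classes).float()
    total = torch.bincount(labels, minlength=num_classes).float()
    return correct / total


# Toy example with 3 classes, two samples each
labels = torch.tensor([0, 0, 1, 1, 2, 2])
predicted = torch.tensor([0, 1, 1, 1, 2, 0])
print(per_class_accuracy(predicted, labels, num_classes=3))  # tensor([0.5000, 1.0000, 0.5000])
```

For CIFAR-10, `test_loader.dataset.classes` lists the class names in label order, so the ten values can be paired with names such as "airplane" and "cat".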

Conclusion

You've successfully built and trained AlexNet from scratch using PyTorch! This hands-on experience will deepen your understanding of CNNs and PyTorch. Use this foundation to explore more complex architectures and tackle challenging computer vision problems.
