Build AlexNet with PyTorch: A Step-by-Step Guide

Dive into the world of computer vision by building AlexNet from scratch using PyTorch. This tutorial will guide you through each step, from understanding the architecture to training and evaluating your model.

Updated on September 16, 2024

Why Build AlexNet?

AlexNet was a groundbreaking deep convolutional neural network that achieved state-of-the-art results on the ImageNet LSVRC-2010 data and went on to win the ILSVRC-2012 competition. Building it from scratch offers valuable insight into CNN principles.

Prerequisites

Before diving in, ensure you have a basic understanding of:

Neural networks (layers, activation functions, optimization algorithms, loss functions).
Python syntax and the PyTorch library.
Convolutional Neural Networks (CNNs), including convolutional and pooling layers.

Understanding AlexNet Architecture

AlexNet's key features are:

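Input image size: 224x224x3 as stated in the original paper. The code in this guide resizes images to 227x227, the size at which the first convolution (11x11 kernel, stride 4, no padding) produces a 55x55 feature map.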
ReLU activations.
Max pooling for subsampling.
Convolutional kernel sizes: 11x11, 5x5, and 3x3.
Max pooling kernel size: 3x3.
Classification into 1000 classes.
Multi-GPU utilization.
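
The kernel sizes and strides listed above determine how the spatial size shrinks through the network. As a rough sketch, the snippet below applies the standard output-size formula, floor((n + 2p - k) / s) + 1, to a 227x227 input using the layer settings from the model defined later in this guide. The helper names conv_out and trace_sizes are illustrative assumptions, not part of PyTorch.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

# (name, kernel, stride, padding), taken from the model in Step 3
layers = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1), ("pool5", 3, 2, 0),
]

def trace_sizes(size):
    """Return the spatial size after each layer for a square input."""
    sizes = []
    for name, kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append((name, size))
    return sizes

sizes = trace_sizes(227)
for name, size in sizes:
    print(name, size)

# The last conv block has 256 channels, so the flattened vector has 256 * size^2 entries
flat_features = 256 * sizes[-1][1] ** 2
print("flattened features:", flat_features)
```

For a 227x227 input the trace ends at 6x6, giving 256 * 6 * 6 = 9216 features, which matches the input size of the first fully connected layer. With a 224x224 input the same trace ends at 5x5, which would not fit that layer.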

Data Preparation: Loading and Preprocessing CIFAR-10

We'll use the CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes.

50,000 training images.
10,000 test images.

CIFAR-10 Classes:

Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck

Step 1: Import Necessary Libraries

Import essential libraries for building and training your AlexNet model.

import numpy as np
import torch
import torch.nn as nn
from torchvision import datasets
from torchvision import transforms
from torch.utils.data.sampler import SubsetRandomSampler

# Device configuration
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

This selects the GPU when one is available and falls back to the CPU otherwise, so the same code runs in both environments.

Step 2: Load and Preprocess the Dataset

Utilize torchvision and helper functions to load and pre-process the CIFAR-10 data.

def get_train_valid_loader(data_dir, batch_size, augment, random_seed, valid_size=0.1, shuffle=True):
    normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010])

    valid_transform = transforms.Compose([
        transforms.Resize((227,227)),
        transforms.ToTensor(),
        normalize,
    ])
    if augment:
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            # Resize after augmenting so the model still receives 227x227 inputs
            transforms.Resize((227,227)),
            transforms.ToTensor(),
            normalize,
        ])
    else:
        train_transform = transforms.Compose([
            transforms.Resize((227,227)),
            transforms.ToTensor(),
            normalize,
        ])

    train_dataset = datasets.CIFAR10(root=data_dir, train=True, download=True, transform=train_transform)
    valid_dataset = datasets.CIFAR10(root=data_dir, train=True, download=True, transform=valid_transform)

    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))

    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)

    train_idx, valid_idx = indices[split:], indices[:split]
    train_sampler = SubsetRandomSampler(train_idx)
    valid_sampler = SubsetRandomSampler(valid_idx)

    train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, sampler=train_sampler)
    valid_loader = torch.utils.data.DataLoader(valid_dataset, batch_size=batch_size, sampler=valid_sampler)

    return (train_loader, valid_loader)

def get_test_loader(data_dir, batch_size, shuffle=True):
    # Use the same CIFAR-10 channel statistics as the training and validation loaders
    normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], std=[0.2023, 0.1994, 0.2010])

    transform = transforms.Compose([
        transforms.Resize((227,227)),
        transforms.ToTensor(),
        normalize,
    ])

    dataset = datasets.CIFAR10(root=data_dir, train=False, download=True, transform=transform)

    data_loader = torch.utils.data.DataLoader(dataset, batch_size=batch_size, shuffle=shuffle)

    return data_loader

# CIFAR10 dataset
train_loader, valid_loader = get_train_valid_loader(data_dir='./data', batch_size=64, augment=False, random_seed=1)
test_loader = get_test_loader(data_dir='./data', batch_size=64)

Code Breakdown:

Two functions, get_train_valid_loader and get_test_loader, load the train/validation and test sets respectively.
normalize is defined using the mean and standard deviations of each channel (red, green, and blue) in the dataset.
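Data augmentation (random crop and horizontal flip) is applied to the training subset only, and only when augment=True. The call above passes augment=False, so no augmentation is used in this run.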
The training dataset is split into training and validation sets (90:10 ratio).
Data loaders iterate through the data in batches for optimal memory usage.
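
The 90:10 split described above can be reproduced in isolation, without downloading the dataset. The sketch below repeats the index logic from get_train_valid_loader using only NumPy; the hard-coded num_train value of 50,000 is an assumption that stands in for len(train_dataset).

```python
import numpy as np

num_train = 50000   # stands in for len(train_dataset) on the CIFAR-10 training set
valid_size = 0.1
random_seed = 1

indices = list(range(num_train))
split = int(np.floor(valid_size * num_train))

# Seeding makes the shuffle, and therefore the split, reproducible
np.random.seed(random_seed)
np.random.shuffle(indices)

# The first `split` shuffled indices form the validation set, the rest the training set
train_idx, valid_idx = indices[split:], indices[:split]
print(len(train_idx), len(valid_idx))
```

This yields 45,000 training indices and 5,000 validation indices with no overlap. Because both loaders draw from the same underlying training set, keeping the two index lists disjoint is what prevents validation images from leaking into training.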

Step 3: Define the AlexNet Model from Scratch

Create the AlexNet architecture using PyTorch's nn.Module.

class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        self.layer1 = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
            nn.BatchNorm2d(96),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2))
        self.layer3 = nn.Sequential(
            nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU())
        self.layer4 = nn.Sequential(
            nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU())
        self.layer5 = nn.Sequential(
            nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size = 3, stride = 2))
        self.fc = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(9216, 4096),
            nn.ReLU())
        self.fc1 = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU())
        self.fc2= nn.Sequential(
            nn.Linear(4096, num_classes))

    def forward(self, x):
        out = self.layer1(x)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        out = self.layer5(out)
        out = out.reshape(out.size(0), -1)
        out = self.fc(out)
        out = self.fc1(out)
        out = self.fc2(out)
        return out

Key Points:

Inherit from nn.Module to define the neural network.
Initialize layers in __init__.
Define the sequence of layers in the forward function.
Use nn.Conv2d for convolutional layers and nn.MaxPool2d for max pooling.
Follow each convolution with nn.BatchNorm2d, a modern substitute for the local response normalization used in the original paper.
Combine layers, activation functions, and max pooling using nn.Sequential for better organization.
Define fully connected layers using nn.Linear and nn.Dropout.
The last layer outputs 10 neurons, one for each of CIFAR-10's 10 classes.
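
One quick way to sanity-check the layer definitions is to push a dummy tensor through the convolutional part and inspect the output shape. The sketch below rebuilds only layer1 through layer5 as a single nn.Sequential (the name features is an assumption made for this example) and feeds it a zero-filled 227x227 image.

```python
import torch
import torch.nn as nn

# Convolutional part only, mirroring layer1 to layer5 of the model above
features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=0),
    nn.BatchNorm2d(96), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2),
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(384), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(384), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1),
    nn.BatchNorm2d(256), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
)

# Evaluation mode so batch norm uses its running statistics on a single image
features.eval()
with torch.no_grad():
    out = features(torch.zeros(1, 3, 227, 227))

print(out.shape)
# Flatten exactly as forward() does before the fully connected layers
flat = out.reshape(out.size(0), -1)
print(flat.shape)
```

The flattened width is what the first nn.Linear layer must accept, so a mismatch here is the usual cause of a shape error at the start of the classifier.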

Step 4: Set Hyperparameters

Define hyperparameters like the loss function, optimizer, batch size, learning rate, and number of epochs.

num_classes = 10
num_epochs = 20
batch_size = 64
learning_rate = 0.005

model = AlexNet(num_classes).to(device)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate, weight_decay = 0.005, momentum = 0.9)

# Train the model
total_step = len(train_loader)

Parameters Defined:

num_classes: Set to 10 for CIFAR-10.
num_epochs: Number of full passes over the training data.
batch_size: Number of images in each batch (the data loaders above were created with the same value, 64).
learning_rate: Controls the step size during optimization.
Loss function: Cross-entropy loss.
Optimizer: SGD with momentum 0.9 and weight decay 0.005.
total_step: Number of batches per epoch, used in the progress log.
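
To make the optimizer settings more concrete, the sketch below applies one scalar update by hand. It follows the update rule described in the torch.optim.SGD documentation for the plain case (dampening 0, Nesterov disabled): weight decay is added to the gradient, the momentum buffer accumulates it, and the parameter moves against the buffer. The function sgd_step is a hypothetical helper written for illustration, not a PyTorch API, and the gradient value 0.5 is made up.

```python
def sgd_step(param, grad, buf, lr=0.005, momentum=0.9, weight_decay=0.005):
    """One SGD update for a single scalar parameter."""
    # Weight decay (L2 penalty) is folded into the gradient
    grad = grad + weight_decay * param
    # The momentum buffer starts as the first gradient, then accumulates
    buf = grad if buf is None else momentum * buf + grad
    # Step against the accumulated direction
    param = param - lr * buf
    return param, buf

param, buf = 1.0, None
for step in range(2):
    param, buf = sgd_step(param, grad=0.5, buf=buf)
    print(step, param, buf)
```

With a constant gradient the buffer grows from step to step, which is why momentum takes larger effective steps along directions where the gradient keeps the same sign.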

Step 5: Train the AlexNet Model on CIFAR-10

Train the model using the training data and validate it using the validation data.

total_step = len(train_loader)

for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Move tensors to the configured device
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Log the loss of the last batch once per epoch
    print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
           .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

    # Validation, run once per epoch with dropout and batch norm in evaluation mode
    model.eval()
    with torch.no_grad():
        correct = 0
        total = 0
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
            del images, labels, outputs

        print('Accuracy of the network on the {} validation images: {} %'.format(total, 100 * correct / total))
    # Switch back to training mode for the next epoch
    model.train()

Code Explanation:

Iterate through the number of epochs and batches in the training data.
Move images and labels to the appropriate device (GPU or CPU).
Perform the forward pass to get the model's predictions.
Calculate the loss using the predictions and actual labels.
Perform the backward pass to update the model's weights.
Zero the gradients before each update using optimizer.zero_grad().
Calculate new gradients using loss.backward().
Update weights with optimizer.step().
Validate the model's accuracy at the end of each epoch using the validation set.
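
The loss and prediction steps above can be illustrated on a single example without PyTorch. The sketch below computes the cross-entropy of one set of logits by hand (log-sum-exp of the logits minus the logit of the true class) and picks the predicted class as the index of the largest logit. The function names and the logit values are assumptions chosen for illustration; nn.CrossEntropyLoss applies the same formula per example and, by default, averages over the batch.

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy for one example, given raw scores and the true class index."""
    # Subtracting the max keeps exp() numerically stable
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def predict(logits):
    """Index of the largest logit, the same choice torch.max(outputs, 1) makes per row."""
    return max(range(len(logits)), key=lambda i: logits[i])

logits = [2.0, 0.5, -1.0]
print(cross_entropy(logits, target=0))  # true class has the highest score: small loss
print(cross_entropy(logits, target=2))  # true class has the lowest score: large loss
print(predict(logits))
```

The loss is low when the true class already has the highest score and grows as that score falls behind the others, which is the signal loss.backward() turns into gradients.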

Step 6: Evaluate the Model on the Test Dataset

Assess the model's performance on unseen data.

# Evaluation mode disables dropout and makes batch norm use its running statistics
model.eval()
with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        del images, labels, outputs

    print('Accuracy of the network on the {} test images: {} %'.format(10000, 100 * correct / total))

This loop mirrors the validation loop, with test_loader in place of valid_loader.
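
Since the validation and test loops share the same logic, one option is to factor it into a helper. The sketch below is one possible shape for such a function; the name evaluate and its signature are assumptions, not part of the original code. It accepts any iterable of (images, labels) batches, so it can be tried on a toy model before running it on the real loaders.

```python
import torch
import torch.nn as nn

def evaluate(model, loader, device):
    """Return accuracy in percent over all batches yielded by loader."""
    was_training = model.training
    model.eval()  # disable dropout, use running batch-norm statistics
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    model.train(was_training)  # restore the previous mode
    return 100.0 * correct / total

# Toy check: an identity "model" whose inputs already look like logits
toy_model = nn.Identity()
toy_loader = [(torch.tensor([[1.0, 0.0], [0.0, 1.0]]), torch.tensor([0, 0]))]
print(evaluate(toy_model, toy_loader, torch.device('cpu')))
```

With the tutorial's objects, the same helper could be called as evaluate(model, valid_loader, device) after each epoch and as evaluate(model, test_loader, device) at the end.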

Key Takeaways

Detailed walk-through of AlexNet architecture.
Hands-on experience in building a CNN in PyTorch.
Data loading and preprocessing techniques.
Training and validation methodologies.
Testing model accuracy on unseen data (CIFAR 10 test set).
