Build Your Own AlexNet: A Practical PyTorch Tutorial for Image Recognition
Want to dive into computer vision? This tutorial provides a step-by-step guide to building AlexNet in PyTorch from scratch. We'll use the CIFAR-10 dataset to train and test our model, giving you hands-on experience with convolutional neural networks (CNNs). By following along, you'll gain a deeper understanding of the AlexNet architecture and its implementation.
Why Build AlexNet from Scratch? Understand the Fundamentals
While pre-trained models are readily available, building AlexNet from scratch offers several key benefits:
- Deepen Understanding: You'll truly grasp how CNNs work by implementing each layer.
- Customization: You can easily modify and adapt the architecture for specific tasks.
- Troubleshooting: You'll be better equipped to diagnose and fix issues in your models.
Prerequisites: Essential Knowledge for AlexNet Implementation
Before we start coding, ensure a basic understanding of these concepts:
- Neural Networks: Familiarity with layers, activation functions (like ReLU), optimization algorithms (like gradient descent), and loss functions.
- CNNs: Knowledge of convolutional layers, pooling layers, stride, padding, and kernel size.
- Python & PyTorch: Familiarity with Python syntax and the PyTorch library is essential for following the code.
AlexNet Architecture: Key Takeaways Before Coding
AlexNet, a groundbreaking CNN, won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) by a wide margin. Here's what defined it:
- Input: 3-channel (RGB) images; the paper quotes 224x224x3, though 227x227x3 is the size that makes the first layer's arithmetic work out (and the size we'll resize CIFAR-10 images to below).
- Layers: Five convolutional layers interleaved with three max-pooling layers, followed by three fully connected layers.
- Key Features: ReLU activations, overlapping max pooling for subsampling, dropout for regularization, and training split across two GPUs.
- Kernels: Convolutional kernels of sizes 11x11, 5x5, and 3x3; max-pooling kernels of size 3x3.
CIFAR-10 Dataset: Our Training Ground for AlexNet
We'll use CIFAR-10, a popular dataset for image classification. It consists of 60,000 32x32 color images in 10 classes:
- Classes: Airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck.
- Data Split: 50,000 training images and 10,000 test images.
Step 1: Importing Libraries and Setting Up Device
First, import the necessary libraries and configure the device (CPU or GPU):
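A minimal setup looks like this:

```python
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Run on the GPU when one is available; otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```

Every tensor and the model itself will be moved to this device, so the same script runs on either kind of hardware.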
Step 2: Loading and Preprocessing the CIFAR-10 Data
Use torchvision to load and preprocess the CIFAR-10 dataset (a sketch follows the list below).
- Data Normalization: Normalize the data using the mean and standard deviation of each color channel.
- Data Augmentation: Augment training data with random crops and horizontal flips (optional) to improve model robustness.
- Train/Validation Split: Split the training data into training and validation sets.
- Data Loaders: Use data loaders to efficiently load data in batches.
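Here's one way to put those pieces together. The 227x227 resize, the specific CIFAR-10 channel statistics, the batch size of 64, and the 90/10 train/validation split are conventional choices, not the only valid ones:

```python
from torch.utils.data import DataLoader, Subset

batch_size = 64  # mini-batch size used by all three loaders

# Per-channel mean and standard deviation commonly quoted for CIFAR-10.
normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                                 std=[0.2470, 0.2435, 0.2616])

# Upsample the 32x32 images to 227x227 so they fit AlexNet's first layer.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),  # optional augmentation
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
    normalize,
])
eval_transform = transforms.Compose([
    transforms.Resize((227, 227)),
    transforms.ToTensor(),
    normalize,
])

train_data = torchvision.datasets.CIFAR10(root='./data', train=True,
                                          download=True, transform=train_transform)
valid_data = torchvision.datasets.CIFAR10(root='./data', train=True,
                                          download=True, transform=eval_transform)
test_data = torchvision.datasets.CIFAR10(root='./data', train=False,
                                         download=True, transform=eval_transform)

# Hold out the last 10% of the training images for validation.
split = int(0.9 * len(train_data))
train_loader = DataLoader(Subset(train_data, range(split)),
                          batch_size=batch_size, shuffle=True)
valid_loader = DataLoader(Subset(valid_data, range(split, len(valid_data))),
                          batch_size=batch_size, shuffle=False)
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=False)
```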
Step 3: Defining the AlexNet Model in PyTorch
Now, define the AlexNet architecture in PyTorch:
- __init__: Defines the layers of the network, including convolutional layers (nn.Conv2d), batch normalization (nn.BatchNorm2d), ReLU activations (nn.ReLU), max pooling (nn.MaxPool2d), dropout (nn.Dropout), and fully connected layers (nn.Linear).
- forward: Defines the flow of data through the network, as shown in the sketch below.
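A sketch of the full model, following the kernel sizes listed earlier. The channel counts (96, 256, 384, 384, 256) mirror the original paper; batch normalization stands in for the paper's local response normalization, a common modern substitution:

```python
class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # 227 -> 55
            nn.BatchNorm2d(96),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 55 -> 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),  # 27 -> 27
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 27 -> 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.BatchNorm2d(384),
            nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten all dimensions except the batch
        return self.classifier(x)
```

The comments track the spatial size of the feature map, which is how we know the first fully connected layer needs 256 * 6 * 6 = 9216 inputs.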
Step 4: Setting Hyperparameters, Loss Function, and Optimizer
Configure the training process:
- Hyperparameters: Define num_epochs, batch_size, and learning_rate.
- Loss Function: Use nn.CrossEntropyLoss for multi-class classification.
- Optimizer: Use torch.optim.SGD (stochastic gradient descent) to update the model weights. Other optimizers, such as Adam, could also be used (see the sketch after this list).
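Putting that together; the specific values here (20 epochs, learning rate 0.005, momentum 0.9, weight decay 0.005) are reasonable starting points rather than tuned optima:

```python
num_epochs = 20
learning_rate = 0.005
# batch_size (64) was already set when the data loaders were built in Step 2.

model = AlexNet(num_classes=10).to(device)

# Cross-entropy loss for 10-way classification.
criterion = nn.CrossEntropyLoss()

# SGD with momentum and weight decay; swapping in torch.optim.Adam also works.
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            momentum=0.9, weight_decay=0.005)
```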
Step 5: Training the AlexNet Model
Train the model using the training data:
- Forward Pass: Calculate the output of the model and the loss.
- Backward Pass: Calculate the gradients of the loss with respect to the model parameters.
- Optimization: Update the model parameters using the optimizer.
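The loop below implements exactly these three steps, and also reports validation accuracy after each epoch:

```python
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)

        # Forward pass: model output and loss.
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass: gradients of the loss w.r.t. the parameters.
        optimizer.zero_grad()
        loss.backward()

        # Optimization: update the weights.
        optimizer.step()

    # Check validation accuracy at the end of each epoch.
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in valid_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    print(f'Epoch [{epoch + 1}/{num_epochs}], loss: {loss.item():.4f}, '
          f'val accuracy: {100 * correct / total:.2f}%')
```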
Step 6: Testing the Trained AlexNet Model
Evaluate the model's performance on the test set:
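Evaluation mirrors the validation pass, just over the held-out test set:

```python
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f'Test accuracy on 10,000 images: {100 * correct / total:.2f}%')
```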
Key Takeaways and Next Steps
Congratulations! You've successfully implemented AlexNet in PyTorch. Here's what you've accomplished:
- Built AlexNet architecture from scratch using PyTorch.
- Loaded and preprocessed the CIFAR-10 dataset.
- Trained the model and evaluated its performance.
To experiment further, try the following:
- Hyperparameter Tuning: Experiment with different learning rates, batch sizes, and optimizers.
- Data Augmentation: Try different augmentation techniques to improve model robustness.
- Architectural Changes: Modify the AlexNet architecture by adding or removing layers.
This tutorial equips you with a solid foundation for exploring more advanced CNN architectures and computer vision tasks.