Build Your Own AlexNet: A Practical PyTorch Tutorial for Image Classification
Want to master convolutional neural networks (CNNs)? This step-by-step guide will walk you through writing AlexNet from scratch in PyTorch. Learn how to build and train this groundbreaking architecture for image classification, even with limited experience.
Updated for 2024, this tutorial provides practical code examples and clear explanations to elevate your AI skills. You’ll be classifying images in no time!
Why Build AlexNet from Scratch?
Understanding the inner workings of neural networks empowers you to:
- Customize models: Adapt pre-existing architectures to your specific needs.
- Troubleshoot effectively: Identify and resolve issues during the development process.
- Deepen your knowledge: Gain a comprehensive understanding of CNN principles.
Prerequisites: Setting the Stage for Success
Before we dive into the code, ensure you have a basic understanding of:
- Neural Networks: Layers, activation functions (like ReLU), optimization algorithms, and loss functions.
- CNNs: Convolutional layers, pooling layers, stride, padding, and kernel size.
- Python & PyTorch: Familiarity with Python syntax and the PyTorch library is crucial for writing AlexNet in PyTorch.
Demystifying AlexNet: Architecture at a Glance
AlexNet, a deep CNN, revolutionized image classification by demonstrating state-of-the-art results in the ImageNet competition. Key features include:
- Input: Accepts 3-channel (RGB) images, nominally 224x224 (implementations commonly use 227x227 so the layer dimensions work out).
- Key Components: Utilizes ReLU activations and max pooling for subsampling.
- Convolutional Kernels: Employs 11x11, 5x5, and 3x3 convolutional kernels; max pooling uses 3x3 kernels.
- Output: Classifies images into 1000 categories.
- Hardware: Originally designed to leverage multiple GPUs.
Dataset Preparation: Loading and Preprocessing CIFAR-10
We'll use the CIFAR-10 dataset, containing 60,000 32x32 color images across 10 classes (6,000 images per class). It's split into 50,000 training images and 10,000 test images, making it a compact, manageable dataset for training AlexNet from scratch in PyTorch.
CIFAR-10 Classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.
Code Implementation: Building AlexNet in PyTorch Step-by-Step
Let's put theory into practice.
1. Importing Libraries
This code imports the essential libraries and sets the `device` variable to utilize the GPU if available.
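A minimal sketch of this step, assuming the standard PyTorch/torchvision stack (the original code isn't shown, so the exact import list is an assumption):

```python
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler  # used below for the train/valid split

# Prefer the GPU when one is available; otherwise fall back to the CPU.
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```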
2. Loading and Preprocessing the Dataset
Key steps:
- Data Loaders: `get_train_valid_loader` and `get_test_loader` handle the training/validation and test sets respectively (both are sketched after this list).
- Normalization: Normalizes the dataset using pre-calculated channel means and standard deviations.
- Data Augmentation: Randomly crops and flips training images to improve model robustness.
- Train/Validation Split: Divides the training data into training and validation subsets (90:10).
- Batching and Shuffling: Loads the data in batches and shuffles it so each epoch sees a different ordering.
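Here is one way these loaders might look. The names `get_train_valid_loader` and `get_test_loader` come from the text; their bodies, the CIFAR-10 channel statistics, and the resize to 227x227 (so images match the AlexNet layer arithmetic) are assumptions of this sketch.

```python
def get_train_valid_loader(data_dir, batch_size, augment, random_seed,
                           valid_size=0.1, shuffle=True):
    # Widely used CIFAR-10 channel statistics (assumed, not given in the text).
    normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                                     std=[0.2023, 0.1994, 0.2010])

    # Validation images are only resized and normalized.
    valid_transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize,
    ])

    # Training images optionally get random crops and horizontal flips first.
    if augment:
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.Resize((227, 227)),
            transforms.ToTensor(),
            normalize,
        ])
    else:
        train_transform = valid_transform

    train_dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=train_transform)
    valid_dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=valid_transform)

    # 90:10 train/validation split over the 50,000 training images.
    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))
    if shuffle:
        np.random.seed(random_seed)
        np.random.shuffle(indices)
    train_idx, valid_idx = indices[split:], indices[:split]

    train_loader = torch.utils.data.DataLoader(
        train_dataset, batch_size=batch_size,
        sampler=SubsetRandomSampler(train_idx))
    valid_loader = torch.utils.data.DataLoader(
        valid_dataset, batch_size=batch_size,
        sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader


def get_test_loader(data_dir, batch_size, shuffle=True):
    normalize = transforms.Normalize(mean=[0.4914, 0.4822, 0.4465],
                                     std=[0.2023, 0.1994, 0.2010])
    transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize,
    ])
    test_dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=transform)
    return torch.utils.data.DataLoader(test_dataset,
                                       batch_size=batch_size, shuffle=shuffle)
```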
3. Defining the AlexNet Model
Explanation:
- `nn.Module` Inheritance: Defines an `AlexNet` class that inherits from `nn.Module`.
- `__init__`: Initializes the layers: convolutional, max pooling, batch normalization, and fully connected. `nn.Sequential` groups related layers into blocks.
- `forward`: Defines the order in which the layers process the input image.
- Convolutional Layers: `nn.Conv2d` with appropriate kernel sizes and input/output channels.
- Max Pooling: `nn.MaxPool2d` for downsampling.
- Fully Connected Layers: `nn.Linear` with dropout (`nn.Dropout`) and ReLU activation.
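A sketch of the class, following the description above. The channel counts and the placement of batch normalization are one common choice for a CIFAR-10 AlexNet rather than the only one; it assumes 227x227 inputs, which makes the flattened feature size 256 * 6 * 6 = 9216.

```python
class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super(AlexNet, self).__init__()
        # Five convolutional stages: 11x11, 5x5, then three 3x3 kernels, each
        # followed by batch norm and ReLU; stages 1, 2, and 5 also downsample
        # with 3x3 max pooling.
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # -> 96 x 55 x 55
            nn.BatchNorm2d(96), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 96 x 27 x 27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),   # -> 256 x 27 x 27
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 256 x 13 x 13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.BatchNorm2d(384), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.BatchNorm2d(384), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),          # -> 256 x 6 x 6
        )
        # Three fully connected layers with dropout, ending in num_classes logits.
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        out = self.features(x)
        out = out.reshape(out.size(0), -1)  # flatten to (batch, 9216)
        return self.classifier(out)
```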
4. Setting Hyperparameters
Hyperparameter Definitions:
- Epochs, Batch Size, Learning Rate: Set the fundamental training parameters.
- Model Initialization: Creates an instance of the `AlexNet` class and moves it to the device (CPU or GPU).
- Loss Function: `nn.CrossEntropyLoss`, suitable for multi-class classification.
- Optimizer: `torch.optim.SGD` (stochastic gradient descent) to update the model weights.
- Total Steps: Calculates the number of steps per epoch.
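A possible setup, continuing from the definitions above. The text fixes 6 epochs; the batch size, learning rate, momentum, and weight decay values here are illustrative assumptions.

```python
num_classes = 10
num_epochs = 6          # the text reports ~78.8% validation accuracy at 6 epochs
batch_size = 64         # assumed value
learning_rate = 0.005   # assumed value

model = AlexNet(num_classes).to(device)

# Loss and optimizer for multi-class classification.
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            weight_decay=0.005, momentum=0.9)  # assumed values

train_loader, valid_loader = get_train_valid_loader(
    data_dir='./data', batch_size=batch_size, augment=True, random_seed=1)
test_loader = get_test_loader(data_dir='./data', batch_size=batch_size)

# Steps per epoch, used for progress logging during training.
total_step = len(train_loader)
```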
5. Training the Model
Code Breakdown:
- Epoch Iteration: Loops through the training data for the specified number of epochs.
- Batch Iteration: Loads images and labels in batches.
- Device Transfer: Moves data to the specified device (CPU or GPU).
- Forward Pass: Passes images through the model to obtain predictions.
- Loss Calculation: Calculates the loss between predictions and actual labels.
- Backward Pass: Computes gradients of the loss with respect to model parameters.
- Optimization: Updates model weights using the optimizer.
- Gradient Reset: Resets gradients to zero before each update.
- Validation: Calculates accuracy on the validation set after each epoch.
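The loop below follows that breakdown step for step; it's a sketch that assumes the `model`, `criterion`, `optimizer`, and data loaders defined earlier.

```python
for epoch in range(num_epochs):
    model.train()  # enable dropout and batch-norm updates
    for i, (images, labels) in enumerate(train_loader):
        # Move the batch to the same device as the model.
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass and loss calculation.
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Reset old gradients, backpropagate, then update the weights.
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
          .format(epoch + 1, num_epochs, i + 1, total_step, loss.item()))

    # Validation accuracy after each epoch; no gradients needed here.
    model.eval()
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in valid_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Validation accuracy: {:.2f} %'.format(100 * correct / total))
```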
6. Testing the Model
This code evaluates the trained model on the unseen test dataset to assess its generalization performance. After only 6 epochs of training, the model reaches about 78.8% accuracy on the validation set.
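A sketch of the test pass, mirroring the validation loop above and built on the earlier definitions:

```python
# Evaluate generalization on the held-out 10,000 test images.
model.eval()
with torch.no_grad():
    correct, total = 0, 0
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Test accuracy: {:.2f} %'.format(100 * correct / total))
```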
Conclusion: Congratulations, You Built AlexNet!
You've successfully built AlexNet from scratch using PyTorch. This tutorial provided a detailed explanation of the architecture, data preprocessing, and training process. Experiment with different hyperparameters and datasets to deepen your understanding, and keep practicing writing networks from scratch in PyTorch.