Master AlexNet with PyTorch: A Step-by-Step Guide to Building from Scratch
Updated on September 16, 2024
Ready to build your own image recognition system? This tutorial provides a practical, in-depth guide to crafting AlexNet, a groundbreaking convolutional neural network, from the ground up using PyTorch. We'll cover everything from data loading to model evaluation, ensuring you grasp the core concepts and implementation details.
Why Build AlexNet from Scratch?
- Deeper Understanding: Gain a strong grasp of CNN architecture and implementation.
- Customization: Tailor AlexNet to your specific image classification tasks.
- Strong Foundation: Develop a solid base for exploring more advanced deep learning models.
Prerequisites: Your Launchpad into Deep Learning
Before diving in, ensure you have a foundational understanding of the following:
- Neural Networks: Familiarity with layers, activation functions (like ReLU), optimization algorithms (like gradient descent), and loss functions is key.
- CNN Fundamentals: Understand convolutional layers, pooling layers, stride, padding, and kernel size.
- Python & PyTorch: A working knowledge of Python syntax and the PyTorch library is essential.
AlexNet: A Revolutionary Architecture Explained
AlexNet, introduced by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012, achieved state-of-the-art results on the ImageNet LSVRC-2010 benchmark and went on to win the ILSVRC-2012 competition by a wide margin. Here's what made it special:
- Deep Convolutional Network: Multiple layers for complex feature extraction.
- ReLU Activation: Faster training and improved performance.
- Max Pooling: Effective subsampling to reduce dimensionality.
- Multiple GPUs: Parallel processing for accelerated training.
AlexNet takes 3-channel color images of size 224x224x3 as input. It pairs ReLU activations with max pooling for subsampling. The convolution kernels are 11x11, 5x5, or 3x3 in size, while the max-pooling kernels are all 3x3.
Preparing Your Data: Loading and Preprocessing CIFAR-10
We'll use the CIFAR-10 dataset, consisting of 60,000 32x32 color images across 10 classes. Each class contains 6,000 images, and the dataset is split into 50,000 training and 10,000 test images. Its small size makes CIFAR-10 ideal for learning image classification with AlexNet and other convolutional neural networks.
Step-by-Step: Setting Up Your Environment and Importing Libraries
First, let's import necessary libraries and configure our device to use a GPU if available:
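Here is a minimal sketch of that setup (the exact imports depend on what the rest of your script uses):

```python
import numpy as np
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as transforms

# Run on a GPU when one is available, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
```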
This code snippet imports essential libraries like NumPy for numerical operations, PyTorch for neural network functionality, and torchvision for easy dataset loading and preprocessing.
Loading and Preprocessing the CIFAR-10 Dataset Using PyTorch
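Here is one way the loading code might look. Treat it as a sketch: the helper names `get_train_valid_loader` and `get_test_loader` match the description below, the normalization statistics are commonly quoted CIFAR-10 channel means and standard deviations, and the 32x32 images are resized to 227x227 so they fit the AlexNet layer arithmetic used later.

```python
from torch.utils.data import DataLoader
from torch.utils.data.sampler import SubsetRandomSampler

# Commonly quoted per-channel mean/std for CIFAR-10
normalize = transforms.Normalize(
    mean=[0.4914, 0.4822, 0.4465],
    std=[0.2470, 0.2435, 0.2616],
)

def get_train_valid_loader(data_dir, batch_size, augment, valid_size=0.1, shuffle=True):
    # Validation images are only resized and normalized
    valid_transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize,
    ])
    # Training images can additionally be augmented with random crops and flips
    if augment:
        train_transform = transforms.Compose([
            transforms.RandomCrop(32, padding=4),
            transforms.RandomHorizontalFlip(),
            transforms.Resize((227, 227)),
            transforms.ToTensor(),
            normalize,
        ])
    else:
        train_transform = valid_transform

    train_dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=train_transform)
    valid_dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=True, download=True, transform=valid_transform)

    # 90:10 split of the training data into training and validation indices
    num_train = len(train_dataset)
    indices = list(range(num_train))
    split = int(np.floor(valid_size * num_train))
    if shuffle:
        np.random.shuffle(indices)
    train_idx, valid_idx = indices[split:], indices[:split]

    train_loader = DataLoader(train_dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(train_idx))
    valid_loader = DataLoader(valid_dataset, batch_size=batch_size,
                              sampler=SubsetRandomSampler(valid_idx))
    return train_loader, valid_loader

def get_test_loader(data_dir, batch_size):
    test_transform = transforms.Compose([
        transforms.Resize((227, 227)),
        transforms.ToTensor(),
        normalize,
    ])
    test_dataset = torchvision.datasets.CIFAR10(
        root=data_dir, train=False, download=True, transform=test_transform)
    return DataLoader(test_dataset, batch_size=batch_size, shuffle=False)
```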
Key takeaways from the code:
- Two functions, `get_train_valid_loader` and `get_test_loader`, manage the loading of the training/validation and test sets.
- Normalization is applied using pre-calculated means and standard deviations for each color channel (red, green, blue).
- Augmentation techniques can be enabled for the training dataset to improve model robustness.
- The training dataset is divided into training and validation sets (90:10 ratio).
- `torch.utils.data.DataLoader` is used to load the datasets in mini-batches.
Building AlexNet: Defining the Architecture in PyTorch
Now, let's define the AlexNet architecture using PyTorch:
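A sketch of the class, assuming the 227x227 inputs produced by the loaders above (with that input size, the 11x11, 5x5, and 3x3 kernels yield the spatial dimensions noted in the comments; the dropout rate of 0.5 follows the original paper):

```python
class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Feature extractor: five convolutional layers with ReLU,
        # three of them followed by 3x3 max pooling
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4),    # 227x227 -> 55x55
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 55x55 -> 27x27
            nn.Conv2d(96, 256, kernel_size=5, padding=2),  # stays 27x27
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 27x27 -> 13x13
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),         # 13x13 -> 6x6
        )
        # Classifier: three fully connected layers with dropout
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # flatten to (batch, 256*6*6)
        return self.classifier(x)
```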
- `__init__`: Initializes the layers of the CNN, using `nn.Sequential` to group them into organized blocks. The `nn.Conv2d` module defines the convolutional layers with their kernel sizes and input/output channel counts; max pooling is done with the `nn.MaxPool2d` module. Fully connected layers are defined with the `nn.Linear` and `nn.Dropout` modules.
- `forward`: Defines the sequence in which the layers process images.
Training the AlexNet Model: Hyperparameter Setup and Optimization
Before training, we need to set our hyperparameters (epochs, batch size, learning rate), define the loss function, and choose our optimization algorithm:
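One plausible configuration; the specific values here (20 epochs, batch size 64, learning rate 0.005, momentum 0.9) are illustrative starting points rather than tuned results:

```python
num_classes = 10
num_epochs = 20
batch_size = 64
learning_rate = 0.005

model = AlexNet(num_classes).to(device)

# Cross-entropy loss for multi-class classification, optimized with SGD
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            weight_decay=0.005, momentum=0.9)

# Build the data loaders defined earlier
train_loader, valid_loader = get_train_valid_loader(
    data_dir='./data', batch_size=batch_size, augment=True)
test_loader = get_test_loader(data_dir='./data', batch_size=batch_size)
```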
This configures the training process, selecting the `CrossEntropyLoss` function (suitable for classification) and the `SGD` optimizer. The AlexNet PyTorch implementation benefits from careful hyperparameter tuning for optimal performance.
The Training Loop: Optimizing Our Network
Here's the core training loop:
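A sketch of that loop, continuing from the setup above:

```python
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        images = images.to(device)
        labels = labels.to(device)

        # Forward pass: compute predictions and the loss
        outputs = model(images)
        loss = criterion(outputs, labels)

        # Backward pass: reset gradients, backpropagate, update weights
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

    # Validation after each epoch (no gradients needed)
    model.eval()
    with torch.no_grad():
        correct, total = 0, 0
        for images, labels in valid_loader:
            images = images.to(device)
            labels = labels.to(device)
            outputs = model(images)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
        print(f'Validation accuracy: {100 * correct / total:.2f} %')
```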
- Iterate through epochs and batches: This loop feeds the training data to the model.
- Forward Pass: Calculates the output of the model and the loss.
- Backward Pass: Computes gradients via backpropagation, then updates the model's weights with the optimizer.
- Validation: Evaluates the model's performance on the validation set after each epoch.
Evaluating Performance: Testing Your AlexNet Model
After training, it's crucial to evaluate your model on unseen data:
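A minimal evaluation sketch, reusing the `test_loader` built earlier:

```python
model.eval()
with torch.no_grad():
    correct, total = 0, 0
    for images, labels in test_loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # The predicted class is the index with the highest score
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print(f'Accuracy on the 10,000 test images: {100 * correct / total:.2f} %')
```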
This code calculates the accuracy of the AlexNet model trained on CIFAR-10 by comparing the model's predictions to the actual labels in the test dataset.
Conclusion
You've successfully built and trained AlexNet from scratch using PyTorch! This is a significant step towards mastering CNNs and deep learning. Experiment with different hyperparameters, datasets, and architectures to further enhance your skills.