Build AlexNet with PyTorch: A Step-by-Step Guide
Dive into the world of computer vision by building AlexNet from scratch using PyTorch. This tutorial will guide you through each step, from understanding the architecture to training and evaluating your model.
Updated on September 16, 2024
Why Build AlexNet?
AlexNet is a groundbreaking deep convolutional neural network that achieved state-of-the-art results on the ImageNet LSVRC-2010 data and went on to win the ILSVRC-2012 competition. Building it from scratch offers valuable insight into CNN principles.
Prerequisites
Before diving in, ensure you have a basic understanding of:
- Neural networks (layers, activation functions, optimization algorithms, loss functions).
- Python syntax and the PyTorch library.
- Convolutional Neural Networks (CNNs), including convolutional and pooling layers.
Understanding AlexNet Architecture
AlexNet's key features are:
- Input image size: 224x224x3.
- ReLU activations.
- Max pooling for subsampling.
- Convolutional kernel sizes: 11x11, 5x5, and 3x3.
- Max pooling kernel size: 3x3.
- Classification into 1000 classes.
- Multi-GPU utilization.
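To see how these kernel sizes shrink the image, the spatial size after each layer can be traced with the standard output-size formula, floor((n + 2p - k) / s) + 1. The sketch below is only illustrative: the strides and paddings are not given in this guide and are assumptions borrowed from common AlexNet implementations, and the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding); strides and paddings are assumed values
LAYERS = [
    ("conv1", 11, 4, 2),
    ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2),
    ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1),
    ("conv4", 3, 1, 1),
    ("conv5", 3, 1, 1),
    ("pool3", 3, 2, 0),
]


def trace_sizes(input_size=224):
    """Return a dict mapping each layer name to its output height/width."""
    sizes = {}
    size = input_size
    for name, kernel, stride, padding in LAYERS:
        size = conv_output_size(size, kernel, stride, padding)
        sizes[name] = size
    return sizes


if __name__ == "__main__":
    for name, size in trace_sizes().items():
        print(f"{name}: {size}x{size}")
```

Under these assumptions the final feature map is 6x6, so a last convolutional layer with 256 channels would feed 256 * 6 * 6 = 9216 values into the first fully connected layer.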
Data Preparation: Loading and Preprocessing CIFAR-10
We'll use the CIFAR-10 dataset, which contains 60,000 32x32 color images in 10 classes.
- 50,000 training images.
- 10,000 test images.
CIFAR-10 Classes:
- Airplane
- Automobile
- Bird
- Cat
- Deer
- Dog
- Frog
- Horse
- Ship
- Truck
Step 1: Import Necessary Libraries
Import essential libraries for building and training your AlexNet model.
Then select the GPU if one is available, for faster training.
Step 2: Load and Preprocess the Dataset
Utilize torchvision and helper functions to load and pre-process the CIFAR-10 data.
Code Breakdown:
- Two functions, get_train_valid_loader and get_test_loader, load the train/validation and test sets respectively.
- normalize is defined using the mean and standard deviation of each channel (red, green, and blue) in the dataset.
- Data augmentation is applied to the training subset only.
- The training dataset is split into training and validation sets (90:10 ratio).
- Data loaders iterate through the data in batches for optimal memory usage.
Step 3: Define the AlexNet Model from Scratch
Create the AlexNet architecture using PyTorch's nn.Module.
Key Points:
- Inherit from nn.Module to define the neural network.
- Initialize layers in __init__.
- Define the sequence of layers in the forward function.
- Use nn.Conv2d for convolutional layers and nn.MaxPool2d for max pooling.
- Combine layers, activation functions, and max pooling using nn.Sequential for better organization.
- Define fully connected layers using nn.Linear and nn.Dropout.
- The last layer outputs 10 neurons, one for each of CIFAR-10's 10 classes.
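Putting these points together, one possible model definition is sketched below. The channel counts (96, 256, 384, 384, 256) and the 4096-unit fully connected layers follow the original AlexNet design. The strides, paddings, and dropout rate are assumptions chosen so that a 224x224 input yields a 6x6 feature map, and normalization layers are left out for brevity.

```python
import torch
import torch.nn as nn


class AlexNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Convolutional feature extractor: conv + ReLU, with max pooling
        # after the first, second, and fifth convolutions
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 384, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(384, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        # Fully connected classifier with dropout
        self.classifier = nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(256 * 6 * 6, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)  # keep the batch dimension, flatten the rest
        return self.classifier(x)
```

Passing a batch of shape (N, 3, 224, 224) through this model returns logits of shape (N, 10).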
Step 4: Set Hyperparameters
Define hyperparameters like the loss function, optimizer, batch size, learning rate, and number of epochs.
Parameters Defined:
- num_classes: Set to 10 for CIFAR-10.
- num_epochs: Number of training cycles.
- batch_size: Number of images in each batch.
- learning_rate: Controls the step size during optimization.
- Loss Function: Cross entropy loss.
- Optimizer: SGD is used for this example.
- total_step: Keeps track of training steps.
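As a rough illustration, the settings above might be written as follows. The numeric values, momentum, and weight decay are illustrative assumptions, not tuned recommendations. total_step is assumed here to be the number of batches per epoch. A tiny stand-in model and synthetic data are used only so the snippet runs on its own; in practice they would be replaced by the AlexNet model and the CIFAR-10 training loader.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Illustrative values (assumptions)
num_classes = 10
num_epochs = 20
batch_size = 64
learning_rate = 0.005

# Stand-ins so the snippet is self-contained
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, num_classes))
train_loader = DataLoader(
    TensorDataset(torch.randn(256, 3, 32, 32),
                  torch.randint(0, num_classes, (256,))),
    batch_size=batch_size,
)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate,
                            momentum=0.9, weight_decay=0.005)

# Number of batches per epoch, useful for progress logging
total_step = len(train_loader)
```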
Step 5: Train the AlexNet Model on CIFAR-10
Train the model using the training data and validate it using the validation data.
Code Explanation:
- Iterate through the number of epochs and batches in the training data.
- Move images and labels to the appropriate device (GPU or CPU).
- Perform the forward pass to get the model's predictions.
- Calculate the loss using the predictions and actual labels.
- Perform the backward pass to update the model's weights:
  - Zero the gradients before each update using optimizer.zero_grad().
  - Calculate new gradients using loss.backward().
  - Update weights with optimizer.step().
- Validate the model's accuracy at the end of each epoch using the validation set.
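The steps above can be sketched as a single function. The name train and its return value are hypothetical choices for this illustration, and the demo at the bottom uses a small stand-in model with random data rather than AlexNet and CIFAR-10.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, train_loader, valid_loader, criterion, optimizer,
          device, num_epochs):
    """Train and validate; return a list of (avg_loss, val_accuracy)."""
    history = []
    model.to(device)
    for epoch in range(num_epochs):
        model.train()
        running_loss = 0.0
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)

            outputs = model(images)            # forward pass
            loss = criterion(outputs, labels)  # compare with true labels

            optimizer.zero_grad()              # clear old gradients
            loss.backward()                    # compute new gradients
            optimizer.step()                   # update the weights
            running_loss += loss.item()

        # Validation: no gradients needed, dropout disabled
        model.eval()
        correct, total = 0, 0
        with torch.no_grad():
            for images, labels in valid_loader:
                images, labels = images.to(device), labels.to(device)
                predicted = model(images).argmax(dim=1)
                total += labels.size(0)
                correct += (predicted == labels).sum().item()

        avg_loss = running_loss / len(train_loader)
        accuracy = 100.0 * correct / total
        history.append((avg_loss, accuracy))
        print(f"Epoch [{epoch + 1}/{num_epochs}] "
              f"loss: {avg_loss:.4f}, val accuracy: {accuracy:.2f}%")
    return history


if __name__ == "__main__":
    # Stand-in model and random data, only to show the call
    data = TensorDataset(torch.randn(64, 3, 8, 8),
                         torch.randint(0, 10, (64,)))
    loader = DataLoader(data, batch_size=16)
    net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 8 * 8, 10))
    opt = torch.optim.SGD(net.parameters(), lr=0.01)
    train(net, loader, loader, nn.CrossEntropyLoss(), opt,
          torch.device("cpu"), num_epochs=2)
```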
Step 6: Evaluate the Model on the Test Dataset
Assess the model's performance on unseen data.
The evaluation code is the same as the validation loop, run on the test loader instead of the validation loader.
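A minimal sketch of such an evaluation routine is shown below. The function name evaluate and the AlwaysZero stand-in model are hypothetical and serve only to make the example self-contained.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, device):
    """Return classification accuracy (in percent) over a data loader."""
    model.to(device)
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for images, labels in loader:
            images, labels = images.to(device), labels.to(device)
            predicted = model(images).argmax(dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    return 100.0 * correct / total


class AlwaysZero(nn.Module):
    """Stand-in model that always predicts class 0."""

    def forward(self, x):
        logits = torch.zeros(x.size(0), 10, device=x.device)
        logits[:, 0] = 1.0
        return logits


if __name__ == "__main__":
    # Half of the labels are class 0, so the stand-in scores 50%
    data = TensorDataset(torch.randn(4, 3, 8, 8), torch.tensor([0, 0, 1, 1]))
    acc = evaluate(AlwaysZero(), DataLoader(data, batch_size=2),
                   torch.device("cpu"))
    print(f"Accuracy: {acc:.1f}%")
```

With the real model, the same function would be called with the loader returned by get_test_loader.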
Key Takeaways
- Detailed walk-through of the AlexNet architecture.
- Hands-on experience in building a CNN in PyTorch.
- Data loading and preprocessing techniques.
- Training and validation methodologies.
- Testing model accuracy on unseen data (CIFAR-10 test set).