Data Augmentation for Object Detection: Rotate and Shear Images with Bounding Boxes

Want to improve your object detection model? Image rotation and shearing let you create additional training data from the images you already have. Augmenting a dataset this way can improve a model's accuracy and robustness. This article provides a step-by-step guide to rotating and shearing images with OpenCV while keeping the bounding boxes accurate.

Why Data Augmentation is Crucial for Object Detection

Data augmentation increases the size and diversity of your training dataset without collecting new data. This helps your model generalize to unseen images and reduces overfitting. Techniques like rotation and shearing artificially introduce variety, making the model more robust to different object orientations and perspectives. For the augmented data to help, the bounding boxes must be transformed together with the images.

Increase Dataset Size: Generate more training examples from existing data.
Improve Generalization: Reduce overfitting by exposing the model to diverse variations.
Enhance Robustness: Make the model less sensitive to changes in object pose and viewpoint.

Understanding Affine Transformations: The Basics

Affine transformations keep parallel lines parallel. Scaling, translation, rotation, and shearing are all examples of affine transformations. Each can be represented by a transformation matrix (a 2x3 matrix in OpenCV), and the new coordinates of a pixel are obtained by multiplying that matrix with the pixel's coordinates. With cv2.warpAffine, you supply the matrix and OpenCV applies it to every pixel, so you do not have to write the resampling code yourself.

Affine Transformation: A transformation that preserves parallel lines.
Transformation Matrix: A matrix used to perform affine transformations efficiently.
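
To make the matrix multiplication concrete, here is a minimal sketch that applies a 2x3 affine matrix to a few points with NumPy. The helper name apply_affine and the example matrix are illustrative assumptions, not part of OpenCV.

```python
import numpy as np

def apply_affine(M, points):
    """Apply a 2x3 affine matrix M to an (N, 2) array of (x, y) points.

    Hypothetical helper for illustration only.
    """
    points = np.asarray(points, dtype=np.float64)
    # Append a column of ones so the translation column of M is applied too.
    homogeneous = np.hstack([points, np.ones((len(points), 1))])
    return homogeneous @ np.asarray(M, dtype=np.float64).T

# Example matrix: scale x by 2, leave y unchanged, then translate by (10, 5).
M = np.array([[2.0, 0.0, 10.0],
              [0.0, 1.0, 5.0]])
moved = apply_affine(M, [[0, 0], [3, 4]])
print(moved)
```

With this matrix, the point (0, 0) maps to (10, 5) and the point (3, 4) maps to (16, 9).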

Rotating Images and Bounding Boxes with OpenCV

Rotating the image is only half the job: the bounding boxes must be rotated with it, or the labels no longer match the objects.

Image Rotation Explained

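OpenCV's cv2.getRotationMatrix2D function computes the transformation matrix for rotating an image about a given point, here the image center. The angle is given in degrees, and positive values rotate counter-clockwise. The cv2.warpAffine function then applies this transformation.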

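(h, w) = image.shape[:2]
(cX, cY) = (w // 2, h // 2)  # rotate about the image center
M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)  # angle in degrees, scale 1.0
image = cv2.warpAffine(image, M, (w, h))  # output keeps the original size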

Because the output keeps the original width and height, the corners of the rotated image fall outside the canvas and are cropped, which loses information. To solve this problem, enlarge the output image so that it accommodates the entire rotated image.

Calculating New Image Dimensions

To avoid cropping issues, we must calculate the new width and height of the rotated image using trigonometry:

$$ N_w = h \, |\sin\theta| + w \, |\cos\theta| \\ N_h = h \, |\cos\theta| + w \, |\sin\theta| $$

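Here N_w and N_h are the new width and height, w and h are the original width and height, and theta is the rotation angle. The absolute values keep both dimensions positive for any angle. The translation column of the transformation matrix is then shifted by the difference between the new and old centers, so the rotated image stays centered in the enlarged canvas.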
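
As a quick check of these formulas, the following sketch computes the new canvas size with only the standard library. The function name rotated_canvas_size is a hypothetical helper, and it rounds to the nearest pixel, whereas the article's rotate_im truncates with int(), so the two can differ by one pixel for some angles.

```python
import math

def rotated_canvas_size(w, h, angle_deg):
    """Return (new_w, new_h) of the canvas that holds a w x h image rotated by angle_deg.

    Hypothetical helper; rounding to the nearest pixel is an assumption of this sketch.
    """
    theta = math.radians(angle_deg)
    cos_t = abs(math.cos(theta))
    sin_t = abs(math.sin(theta))
    new_w = h * sin_t + w * cos_t
    new_h = h * cos_t + w * sin_t
    return int(round(new_w)), int(round(new_h))

print(rotated_canvas_size(640, 480, 0))   # (640, 480)
print(rotated_canvas_size(640, 480, 90))  # (480, 640)
print(rotated_canvas_size(640, 480, 45))  # (792, 792)
```

A 90 degree rotation simply swaps width and height, while a 45 degree rotation needs a canvas larger than the original in both dimensions.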

Rotating Bounding Boxes Accurately

Rotating bounding boxes involves more than rotating the image. A rotated box is no longer parallel to the image sides, so you need the tightest axis-aligned rectangle that encloses it. First, determine the four corners of the original bounding box. Next, apply the same transformation matrix used for the image to each corner. Finally, build a new bounding box around the transformed corners. A helper function, here called get_enclosing_box, does this last step by taking the minimum and maximum of the rotated corner coordinates. Note that the resulting box is usually somewhat larger than the object it contains.
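
The three steps above can be sketched with NumPy as follows. This is one possible version under stated assumptions: boxes are stored as an (N, 4) array of [x1, y1, x2, y2], M is the same 2x3 matrix passed to cv2.warpAffine, and the names get_corners, transform_corners, and rotate_boxes are hypothetical. The article names get_enclosing_box without showing its body, so the body here is a guess at what it might look like.

```python
import numpy as np

def get_corners(bboxes):
    """(N, 4) boxes [x1, y1, x2, y2] -> (N, 4, 2) corner points."""
    x1, y1, x2, y2 = bboxes[:, 0], bboxes[:, 1], bboxes[:, 2], bboxes[:, 3]
    return np.stack([
        np.stack([x1, y1], axis=1),  # top-left
        np.stack([x2, y1], axis=1),  # top-right
        np.stack([x2, y2], axis=1),  # bottom-right
        np.stack([x1, y2], axis=1),  # bottom-left
    ], axis=1)

def transform_corners(corners, M):
    """Apply a 2x3 affine matrix M to (N, 4, 2) corner points."""
    ones = np.ones(corners.shape[:2] + (1,))
    homogeneous = np.concatenate([corners, ones], axis=2)  # (N, 4, 3)
    return homogeneous @ M.T                               # (N, 4, 2)

def get_enclosing_box(corners):
    """Tightest axis-aligned box around each set of corners -> (N, 4)."""
    mins = corners.min(axis=1)  # smallest x and y per box
    maxs = corners.max(axis=1)  # largest x and y per box
    return np.hstack([mins, maxs])

def rotate_boxes(bboxes, M):
    """Transform boxes with M and return the enclosing axis-aligned boxes."""
    bboxes = np.asarray(bboxes, dtype=np.float64)
    M = np.asarray(M, dtype=np.float64)
    return get_enclosing_box(transform_corners(get_corners(bboxes), M))

# Example matrix (an assumption for the demo): maps (x, y) to (y, 100 - x).
M = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 100.0]])
new_boxes = rotate_boxes([[10, 20, 30, 60]], M)
print(new_boxes)
```

In this example the box [10, 20, 30, 60] becomes [20, 70, 60, 90]. For the boxes to line up with the rotated image, M has to be the final matrix given to cv2.warpAffine, including any adjustment made to its translation column.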

Code Implementation for Rotation and Bounding Boxes

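import cv2
import numpy as np

def rotate_im(image, angle):
    """Rotate image by angle (degrees) about its center without cropping it."""
    (h, w) = image.shape[:2]
    (cX, cY) = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
    # With scale 1.0, M[0, 0] is cos(angle) and M[0, 1] is sin(angle).
    cos = np.abs(M[0, 0])
    sin = np.abs(M[0, 1])
    # Width and height of the canvas that holds the whole rotated image.
    nW = int((h * sin) + (w * cos))
    nH = int((h * cos) + (w * sin))
    # Shift the translation so the old center lands on the new canvas center.
    M[0, 2] += (nW / 2) - cX
    M[1, 2] += (nH / 2) - cY
    image = cv2.warpAffine(image, M, (nW, nH))
    return image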

Shearing Images and Bounding Boxes for Enhanced Variety

Shearing transforms a rectangular image into a parallelogram. This technique can be particularly useful for images where the objects might appear skewed or tilted.

Horizontal Shearing Transformation

The transformation matrix for horizontal shearing is:

$$ M = \begin{bmatrix} 1 & \alpha & 0 \\ 0 & 1 & 0 \end{bmatrix} $$

The pixel with coordinates (x, y) is moved to (x + alpha*y, y), where alpha is the shearing factor.

Implementing Shearing in OpenCV

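To implement shearing, use OpenCV's cv2.warpAffine function with the shear matrix. The output width grows by the shear factor times the image height so that the sheared image still fits. Each bounding box x-coordinate is shifted by the shear factor times the corresponding y-coordinate, while the y-coordinates stay unchanged.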

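import cv2
import numpy as np

# bboxes: (N, 4) array of [x1, y1, x2, y2]. abs() forces a rightward shear, so the sign of shear_factor is ignored.
M = np.array([[1, abs(shear_factor), 0], [0, 1, 0]], dtype=np.float64)
nW = img.shape[1] + abs(shear_factor * img.shape[0])  # widen the canvas to fit the sheared image
bboxes[:, [0, 2]] += (bboxes[:, [1, 3]] * abs(shear_factor)).astype(int)  # x1 shifts by y1, x2 by y2
img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))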
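
The snippet above treats every shear factor as positive. As a hedged sketch of one way to support both directions, the function below adds a horizontal translation for negative factors so the image stays in frame, and picks the box edge that moves least or most depending on the sign. The name shear_boxes_horizontal, its return values, and the [x1, y1, x2, y2] layout are assumptions of this sketch, not the article's method.

```python
import numpy as np

def shear_boxes_horizontal(bboxes, shear_factor, img_w, img_h):
    """Return (M, new_w, sheared_boxes) for x' = x + shear_factor * y + tx.

    Hypothetical helper; bboxes is an (N, 4) array of [x1, y1, x2, y2].
    """
    bboxes = np.asarray(bboxes, dtype=np.float64)
    s = float(shear_factor)
    # A negative factor pushes lower rows to the left, so shift everything right.
    tx = -s * img_h if s < 0 else 0.0
    M = np.array([[1.0, s, tx],
                  [0.0, 1.0, 0.0]])
    new_w = int(np.ceil(img_w + abs(s) * img_h))
    x1, y1, x2, y2 = bboxes.T
    # s * y grows with y when s >= 0 and shrinks with y when s < 0, so the
    # left edge uses the y that gives the smaller shift and the right edge
    # uses the y that gives the larger one.
    y_left, y_right = (y1, y2) if s >= 0 else (y2, y1)
    out = np.stack([x1 + s * y_left + tx, y1, x2 + s * y_right + tx, y2], axis=1)
    return M, new_w, out

M_pos, w_pos, boxes_pos = shear_boxes_horizontal([[10, 0, 30, 40]], 0.5, 100, 80)
M_neg, w_neg, boxes_neg = shear_boxes_horizontal([[10, 0, 30, 40]], -0.5, 100, 80)
print(w_pos, boxes_pos)
print(w_neg, boxes_neg)
```

For a 100x80 image, both calls give a new width of 140. The box [10, 0, 30, 40] becomes [10, 0, 50, 40] with a factor of 0.5 and [30, 0, 70, 40] with a factor of -0.5. The returned matrix and size can then be passed to cv2.warpAffine(img, M, (new_w, img_h)) to shear the image itself.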

Maximize Object Detection Performance

Data augmentation through rotation and shearing can improve object detection models by increasing the variability of the training data. With OpenCV's affine transformation functions, both images and their bounding boxes can be rotated and sheared with the same transformation matrix, which keeps the labels aligned with the objects.