Bounding Box Data Augmentation: Rotate and Shear Images for Object Detection

Want to make your object detection models more robust? Learn how to implement data augmentation using rotation and shearing techniques to improve accuracy. This guide provides a step-by-step tutorial using OpenCV, complete with code examples that you can directly apply.

Why Data Augmentation Matters for Object Detection

Data augmentation is essential for training robust object detection models. By artificially increasing the size and diversity of your dataset, you can improve your model's ability to generalize to new, unseen images. Techniques like rotation and shearing introduce variations in object orientation and perspective, making the model less sensitive to these factors. This tutorial shows how to implement these transformations effectively.

Source Code

All the code discussed in this article can be found in this GitHub repository:

https://github.com/Paperspace/DataAugmentationForObjectDetection

Feel free to clone and experiment.

Unleash Powerful Image Rotation for Data Augmentation

Rotation involves rotating an image by a certain angle. It's one of the trickier augmentations to manage, particularly when dealing with bounding boxes.

Let's walk through the implementation details.

Understanding Affine Transformations

Before diving into the code, let's clarify some concepts:

Affine Transformation: A transformation that preserves straight lines and parallelism. Scaling, translation, rotation, and shearing are examples.
Transformation Matrix: A matrix used to perform affine transformations. Multiplying this matrix with a point's coordinates (with a 1 appended, so that translation can be expressed too) yields the transformed coordinates.
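
As a minimal sketch of the matrix multiplication described above (NumPy only; the matrix values and the point are made up for illustration), a 2x3 affine matrix is applied to a point by appending a 1 to its coordinates:

```python
import numpy as np

# A 2x3 affine matrix: scale x by 2, leave y unchanged, then translate by (5, -3).
# These values are illustrative and do not come from the article.
M = np.array([[2.0, 0.0, 5.0],
              [0.0, 1.0, -3.0]])

# Homogeneous coordinates: append 1 so the last column of M acts as a translation.
point = np.array([10.0, 20.0, 1.0])

transformed = M @ point
print(transformed)  # [25. 17.]
```

The same 2x3 layout is what cv2.warpAffine expects as its matrix argument.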

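OpenCV's cv2.warpAffine function applies an affine transformation, given as a 2x3 matrix, to an image. First, let's define the __init__ function of the rotation augmentation, which stores the range of angles to sample from: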

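def __init__(self, angle = 10):
 self.angle = angle

 # a tuple is taken as an explicit (min, max) range of angles;
 # a single number a is expanded to the symmetric range (-a, a)
 if isinstance(self.angle, tuple):
  assert len(self.angle) == 2, "Invalid range"
 else:
  self.angle = (-self.angle, self.angle)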
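
The stored range is later used to draw a random angle for each call. A small standalone sketch of that idea follows; to_range is a hypothetical helper written for illustration and is not part of the repository:

```python
import random

def to_range(value):
    """Return a (low, high) tuple from a single number or a 2-tuple."""
    if isinstance(value, tuple):
        assert len(value) == 2, "Invalid range"
        return value
    return (-value, value)

angle_range = to_range(10)             # (-10, 10)
angle = random.uniform(*angle_range)   # a different angle on every call
```

Passing a tuple such as (5, 15) restricts the augmentation to rotations in one direction only.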

Rotating Images with OpenCV

We use OpenCV's getRotationMatrix2D function to obtain the transformation matrix for rotation by an angle $\theta$ about the image center:

(h, w) = image.shape[:2]
(cX, cY) = (w // 2, h // 2)
M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)

Apply the transformation using warpAffine:

image = cv2.warpAffine(image, M, (w, h))
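
For reference, the 2x3 matrix that the OpenCV documentation gives for getRotationMatrix2D can be written out by hand. The sketch below rebuilds it with NumPy; rotation_matrix is a hypothetical helper used only for illustration, and the centre (200, 150) is an arbitrary example value:

```python
import math
import numpy as np

def rotation_matrix(center, angle_deg, scale=1.0):
    """Build the 2x3 matrix documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    a = scale * math.cos(math.radians(angle_deg))
    b = scale * math.sin(math.radians(angle_deg))
    return np.array([[a,  b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])

M = rotation_matrix((200, 150), 30)

# The centre of rotation is a fixed point of the transform.
centre = M @ np.array([200.0, 150.0, 1.0])
```

The last column is what keeps the chosen centre in place; without it the rotation would be about the top-left corner of the image.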

Preventing Image Cropping During Rotation

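A standard rotation keeps the original width and height, so the corners of the rotated image are cropped. To avoid this, calculate the new dimensions of the rotated image so that they accommodate the entire content, and shift the result so that it stays centered: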

cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])

# compute the new bounding dimensions of the image
nW = int((h * sin) + (w * cos))
nH = int((h * cos) + (w * sin))

# adjust the rotation matrix to take into account translation
M[0, 2] += (nW / 2) - cX
M[1, 2] += (nH / 2) - cY
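
To make the arithmetic concrete, here is a small sketch that computes the enlarged canvas and the translation for an example image. It uses only the math module; rotated_canvas is a hypothetical helper and the 400x300 size is an assumed example:

```python
import math

def rotated_canvas(w, h, angle_deg):
    """Return (nW, nH, dx, dy): new canvas size and the shift added to M."""
    cos = abs(math.cos(math.radians(angle_deg)))
    sin = abs(math.sin(math.radians(angle_deg)))
    nW = int(h * sin + w * cos)
    nH = int(h * cos + w * sin)
    cX, cY = w // 2, h // 2
    return nW, nH, nW / 2 - cX, nH / 2 - cY

print(rotated_canvas(400, 300, 30))  # (496, 459, 48.0, 79.5)
```

A 400x300 image rotated by 30 degrees therefore needs a 496x459 canvas, and its content is shifted by (48, 79.5) so that the old centre lands on the new one.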

Encapsulate the image rotation logic in the function rotate_im.

def rotate_im(image, angle):
 """Rotate the image.

 Rotate the image such that the rotated image is enclosed inside the tightest
 rectangle. The area not occupied by the pixels of the original image is colored
 black.

 Parameters
 ----------

 image : numpy.ndarray
  numpy image

 angle : float
  angle by which the image is to be rotated

 Returns
 -------

 numpy.ndarray
  Rotated Image

 """
 # grab the dimensions of the image and then determine the
 # centre
 (h, w) = image.shape[:2]
 (cX, cY) = (w // 2, h // 2)

 # grab the rotation matrix (in OpenCV a positive angle rotates
 # counter-clockwise), then grab the sine and cosine
 # (i.e., the rotation components of the matrix)
 M = cv2.getRotationMatrix2D((cX, cY), angle, 1.0)
 cos = np.abs(M[0, 0])
 sin = np.abs(M[0, 1])

 # compute the new bounding dimensions of the image
 nW = int((h * sin) + (w * cos))
 nH = int((h * cos) + (w * sin))

 # adjust the rotation matrix to take into account translation
 M[0, 2] += (nW / 2) - cX
 M[1, 2] += (nH / 2) - cY

 # perform the actual rotation on the enlarged canvas; the caller
 # resizes the result back to (w, h) if needed
 image = cv2.warpAffine(image, M, (nW, nH))

 return image

Rotating Bounding Boxes

The biggest challenge lies in rotating the bounding boxes correctly. The goal is to find the tightest rectangle, parallel to the image sides, that contains the rotated bounding box.

Calculate the coordinates for all four corners of the box.

def get_corners(bboxes):
 """Get corners of bounding boxes

 Parameters
 ----------

 bboxes: numpy.ndarray
  Numpy array containing bounding boxes of shape `N X 4` where N is the
  number of bounding boxes and the bounding boxes are represented in the
  format `x1 y1 x2 y2`

 Returns
 -------

 numpy.ndarray
  Numpy array of shape `N x 8` containing N bounding boxes each described by their
  corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`

 """
 width = (bboxes[:,2] - bboxes[:,0]).reshape(-1,1)
 height = (bboxes[:,3] - bboxes[:,1]).reshape(-1,1)

 x1 = bboxes[:,0].reshape(-1,1)
 y1 = bboxes[:,1].reshape(-1,1)

 x2 = x1 + width
 y2 = y1

 x3 = x1
 y3 = y1 + height

 x4 = bboxes[:,2].reshape(-1,1)
 y4 = bboxes[:,3].reshape(-1,1)

 corners = np.hstack((x1,y1,x2,y2,x3,y3,x4,y4))

 return corners
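
A quick way to check the corner ordering is to run it on a single box. The sketch below is a condensed restatement of get_corners (box_corners is a hypothetical name, and the box values are made up):

```python
import numpy as np

def box_corners(bboxes):
    """Condensed restatement of get_corners for boxes in x1 y1 x2 y2 format."""
    x1, y1, x2, y2 = (bboxes[:, i].reshape(-1, 1) for i in range(4))
    # Order: top-left, top-right, bottom-left, bottom-right.
    return np.hstack((x1, y1, x2, y1, x1, y2, x2, y2))

boxes = np.array([[10.0, 20.0, 50.0, 80.0]])
print(box_corners(boxes))  # [[10. 20. 50. 20. 10. 80. 50. 80.]]
```

All four corners are needed because, after rotation, any of them can end up defining the left, right, top, or bottom edge of the enclosing box.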

Define the rotate_box function to rotate the bounding boxes.

def rotate_box(corners,angle, cx, cy, h, w):

 """Rotate the bounding box.

 Parameters
 ----------

 corners : numpy.ndarray
  Numpy array of shape `N x 8` containing N bounding boxes each described by their
  corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`

 angle : float
  angle by which the image is to be rotated

 cx : int
  x coordinate of the center of image (about which the box will be rotated)

 cy : int
  y coordinate of the center of image (about which the box will be rotated)

 h : int
  height of the image

 w : int
  width of the image

 Returns
 -------

 numpy.ndarray
  Numpy array of shape `N x 8` containing N rotated bounding boxes each described by their
  corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`
 """

 corners = corners.reshape(-1,2)
 corners = np.hstack((corners, np.ones((corners.shape[0],1), dtype = type(corners[0][0]))))

 M = cv2.getRotationMatrix2D((cx, cy), angle, 1.0)

 cos = np.abs(M[0, 0])
 sin = np.abs(M[0, 1])

 nW = int((h * sin) + (w * cos))
 nH = int((h * cos) + (w * sin))
 # adjust the rotation matrix to take into account translation
 M[0, 2] += (nW / 2) - cx
 M[1, 2] += (nH / 2) - cy
 # apply the transformation to every corner
 calculated = np.dot(M,corners.T).T

 calculated = calculated.reshape(-1,8)

 return calculated
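
The reshaping steps in rotate_box can be followed on a small array. This is a minimal sketch using NumPy only; the matrix here is an illustrative translation rather than a rotation, so the numbers are easy to check by eye:

```python
import numpy as np

# Two boxes, each as four (x, y) corners flattened into 8 numbers.
corners = np.array([[0.0, 0.0, 4.0, 0.0, 0.0, 2.0, 4.0, 2.0],
                    [1.0, 1.0, 3.0, 1.0, 1.0, 5.0, 3.0, 5.0]])

# Illustrative matrix: a pure translation by (10, 20) instead of a rotation.
M = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0]])

points = corners.reshape(-1, 2)                           # shape (8, 2)
points = np.hstack((points, np.ones((len(points), 1))))   # shape (8, 3)
moved = (M @ points.T).T.reshape(-1, 8)                   # back to shape (2, 8)
```

Flattening every corner into one long list of points lets a single matrix product transform all boxes at once.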

Finally, define the get_enclosing_box function to determine the coordinates of the rotated bounding box in the augmented image.

def get_enclosing_box(corners):
 """Get an enclosing box for rotated corners of a bounding box

 Parameters
 ----------

 corners : numpy.ndarray
  Numpy array of shape `N x 8` containing N bounding boxes each described by their
  corner co-ordinates `x1 y1 x2 y2 x3 y3 x4 y4`

 Returns
 -------

 numpy.ndarray
  Numpy array containing enclosing bounding boxes of shape `N X 4` where N is the
  number of bounding boxes and the bounding boxes are represented in the
  format `x1 y1 x2 y2`

 """
 x_ = corners[:,[0,2,4,6]]
 y_ = corners[:,[1,3,5,7]]

 xmin = np.min(x_,1).reshape(-1,1)
 ymin = np.min(y_,1).reshape(-1,1)
 xmax = np.max(x_,1).reshape(-1,1)
 ymax = np.max(y_,1).reshape(-1,1)

 final = np.hstack((xmin, ymin, xmax, ymax,corners[:,8:]))

 return final
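
One consequence of taking the enclosing box is that it is usually larger than the original annotation. The following sketch (NumPy only, with a made-up 2x2 square rotated about the origin) shows the effect at 45 degrees:

```python
import numpy as np

# A 2x2 square centred on the origin, rotated by 45 degrees about the origin.
square = np.array([[-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, 1.0]])
theta = np.radians(45)
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
rotated = square @ R.T

# Tightest axis-aligned rectangle around the rotated corners.
x_min, y_min = rotated.min(axis=0)
x_max, y_max = rotated.max(axis=0)

width, height = x_max - x_min, y_max - y_min   # both about 2.828 (2 * sqrt(2))
```

The box grows from 2x2 to roughly 2.83x2.83, so large rotation angles tend to produce looser boxes around the objects.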

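Put it all together in the __call__ function, which rotates the image and its boxes, resizes the result back to the original resolution, and then clips the boxes with clip_box (a helper from the repository that is not shown in this article).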

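def __call__(self, img, bboxes):

 # draw a random angle from the configured range
 angle = random.uniform(*self.angle)

 w,h = img.shape[1], img.shape[0]
 cx, cy = w//2, h//2

 # rotate the image onto a canvas large enough to hold all of it
 img = rotate_im(img, angle)

 # rotate the four corners of every box, keeping any columns beyond the first four
 corners = get_corners(bboxes)
 corners = np.hstack((corners, bboxes[:,4:]))
 corners[:,:8] = rotate_box(corners[:,:8], angle, cx, cy, h, w)

 # tightest axis-aligned box around each rotated box
 new_bbox = get_enclosing_box(corners)

 # resize back to the original resolution and rescale the boxes to match
 scale_factor_x = img.shape[1] / w
 scale_factor_y = img.shape[0] / h
 img = cv2.resize(img, (w,h))
 new_bbox[:,:4] /= [scale_factor_x, scale_factor_y, scale_factor_x, scale_factor_y]

 bboxes = new_bbox

 # clip the boxes to the image bounds (clip_box is defined elsewhere in the repository)
 bboxes = clip_box(bboxes, [0,0,w, h], 0.25)

 return img, bboxes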
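
To see what the rescaling step does to the boxes, here is a small sketch with made-up numbers (a 400x300 image whose rotated canvas is assumed to be 496x459, and one arbitrary box):

```python
import numpy as np

w, h = 400, 300            # original size
nW, nH = 496, 459          # assumed size of the rotated canvas

scale_x, scale_y = nW / w, nH / h

# One box in the rotated (enlarged) image, format x1 y1 x2 y2.
boxes = np.array([[124.0, 153.0, 372.0, 306.0]])

# Shrink the coordinates by the same factors used to resize the image.
boxes[:, :4] /= [scale_x, scale_y, scale_x, scale_y]
print(boxes)  # approximately [[100. 100. 300. 200.]]
```

Because the two scale factors generally differ, resizing back to the original resolution also changes the aspect ratio of the image and of every box.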

Add Variety with Bounding Box Shearing Data Augmentation

Shearing transforms a rectangular image into a parallelogram. The transformation matrix for a horizontal shear is $\begin{bmatrix} 1 & \alpha & 0 \\ 0 & 1 & 0 \end{bmatrix}$, where $\alpha$ is the shear factor.

Implementing Horizontal Shear

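A horizontal shear leaves the y-coordinates unchanged and moves each x-coordinate according to the equation x = x + alpha*y, where alpha is the shear factor. The __init__ function stores the range from which the shear factor is drawn.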

class RandomShear(object):
 """Randomly shears an image in horizontal direction


 Bounding boxes which have less than 25% of their area remaining in the
 transformed image are dropped. The resolution is maintained, and the remaining
 area, if any, is filled with black.

 Parameters
 ----------
 shear_factor: float or tuple(float)
  if **float**, the image is sheared horizontally by a factor drawn
  randomly from a range (-`shear_factor`, `shear_factor`). If **tuple**,
  the `shear_factor` is drawn randomly from values specified by the
  tuple

 Returns
 -------

 numpy.ndarray
  Sheared image in the numpy format of shape `HxWxC`

 numpy.ndarray
  Transformed bounding box co-ordinates of the format `n x 4` where n is
  number of bounding boxes and 4 represents `x1,y1,x2,y2` of the box

 """

 def __init__(self, shear_factor = 0.2):
  self.shear_factor = shear_factor

  # a tuple is an explicit (min, max) range; a single number s
  # is expanded to the symmetric range (-s, s)
  if isinstance(self.shear_factor, tuple):
   assert len(self.shear_factor) == 2, "Invalid range for shear factor"
  else:
   self.shear_factor = (-self.shear_factor, self.shear_factor)
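
The shear equation x = x + alpha*y can be checked numerically by applying the shear matrix to a couple of points. This is a minimal NumPy sketch; the factor 0.2 and the points are illustrative values:

```python
import numpy as np

alpha = 0.2  # illustrative shear factor
M = np.array([[1.0, alpha, 0.0],
              [0.0, 1.0,   0.0]])

# Points in homogeneous form (x, y, 1).
points = np.array([[50.0,   0.0, 1.0],    # on the top row: x is unchanged
                   [50.0, 100.0, 1.0]])   # 100 rows down: x moves by alpha * 100

sheared = (M @ points.T).T
print(sheared)  # x becomes 50 and 70; y stays 0 and 100
```

Pixels further down the image move further to the right, which is what turns the rectangle into a parallelogram.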

Shearing Augmentation Logic

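The __call__ function applies the horizontal shear transformation. For a negative shear factor it first flips the image and boxes horizontally (using the HorizontalFlip augmentation from the repository), shears by the absolute value of the factor, and then flips the result back.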

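def __call__(self, img, bboxes):

 shear_factor = random.uniform(*self.shear_factor)

 w,h = img.shape[1], img.shape[0]

 # a negative shear is handled as: flip, shear by the positive factor, flip back
 if shear_factor < 0:
  img, bboxes = HorizontalFlip()(img, bboxes)

 # 2x3 affine matrix for the mapping x -> x + |shear_factor| * y
 M = np.array([[1, abs(shear_factor), 0],[0,1,0]])

 # the sheared image is wider by |shear_factor| * height
 nW = img.shape[1] + abs(shear_factor*img.shape[0])

 # shift x1 by its y1 and x2 by its y2
 bboxes[:,[0,2]] += ((bboxes[:,[1,3]]) * abs(shear_factor) ).astype(int)

 img = cv2.warpAffine(img, M, (int(nW), img.shape[0]))

 if shear_factor < 0:
  img, bboxes = HorizontalFlip()(img, bboxes)

 # crop back to the original width and clip the boxes to the image
 img = img[:,:w]

 bboxes[:,[0,2]] = bboxes[:,[0,2]].clip(0,w)
 bboxes[:,[1,3]] = bboxes[:,[1,3]].clip(0,h)

 return img, bboxes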
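
The box update in __call__ only shifts x1 and x2, so it is worth checking that this matches the enclosing box of the sheared corners. The sketch below does that for a non-negative factor, using NumPy only and made-up values; unlike the code above, it skips the cast to int:

```python
import numpy as np

alpha = 0.2
box = np.array([10.0, 20.0, 50.0, 80.0])   # x1 y1 x2 y2

# Shear all four corners with x' = x + alpha * y, then take the enclosing box.
xs = np.array([box[0], box[2], box[0], box[2]])
ys = np.array([box[1], box[1], box[3], box[3]])
sheared_xs = xs + alpha * ys
enclosing = np.array([sheared_xs.min(), ys.min(), sheared_xs.max(), ys.max()])

# Shortcut for a non-negative factor: shift x1 by alpha * y1 and x2 by alpha * y2.
shortcut = box.copy()
shortcut[[0, 2]] += box[[1, 3]] * alpha
```

Both give the box (14, 20, 66, 80). With a negative factor the same shortcut would pair the wrong corners and produce a box that is too narrow, which is presumably why the code flips the image, shears with a positive factor, and flips back.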

Elevate your Object Detection Models

By implementing rotation and shearing, you expose your object detection models to a wider range of object orientations and perspectives, which can make them more robust. Use this guide to add these transformations to your data augmentation pipelines, and consider the GitHub repository as a practical source of code.
