Master OpenAI Gym: Build Custom Environments for AI Training
Want to push the boundaries of reinforcement learning? Learn how to create custom OpenAI Gym environments and tailor your training to specific challenges. This tutorial guides you through building a `ChopperScape` environment, inspired by the classic Dino Run game, where an agent learns to fly a helicopter and avoid obstacles.
Why Custom OpenAI Gym Environments?
OpenAI Gym provides many pre-built environments, but sometimes you need something specific. Creating your own environment lets you:
- Control the complexity and dynamics of your training task.
- Simulate real-world scenarios more accurately.
- Create truly novel and challenging learning experiences.
Prerequisites
Before diving in, make sure you have:
- Basic Python knowledge.
- OpenAI Gym installed (`pip install gym`).
Initial Setup: Dependencies and Imports
Let's begin by installing the necessary packages for building our environment:
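The exact package list depends on your setup; this tutorial assumes OpenCV is used for drawing and displaying frames and Matplotlib for plotting, so a typical install looks like:

```
pip install gym opencv-python matplotlib
```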
Now, import the required libraries:
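A minimal import block for this tutorial, assuming OpenCV (`cv2`) handles rendering and Matplotlib displays observations:

```python
import random

import cv2
import numpy as np
import matplotlib.pyplot as plt

import gym
from gym import Env, spaces
```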
Defining the `ChopperScape` Environment: A Bird's-Eye View
Imagine a game where a helicopter (the "Chopper") must navigate a landscape, avoiding birds and collecting fuel tanks.
Key elements of the game:
- Objective: Fly the Chopper as far as possible to maximize reward.
- Hazards: Birds that must be avoided. Crashing ends the episode.
- Resources: Fuel tanks replenish the Chopper's fuel supply. Running out of fuel also ends the episode.
Defining the observation space and the action space is crucial (illustrated after this list):
- Observation Space: What information does the agent have access to? (e.g., the game screen as pixel data).
- Action Space: What actions can the agent take? (e.g., move left, right, up, down).
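In Gym, these are declared with `spaces` objects. A quick standalone illustration; the screen shape and the five-action set here are arbitrary choices for this sketch:

```python
import numpy as np
from gym import spaces

# Pixel observations: an image of shape (600, 800, 3) with values in [0, 1]
observation_space = spaces.Box(low=0.0, high=1.0, shape=(600, 800, 3), dtype=np.float32)

# Five discrete actions, e.g. right, left, down, up, do nothing
action_space = spaces.Discrete(5)

print(action_space.sample())    # a random valid action, e.g. 3
print(observation_space.shape)  # (600, 800, 3)
```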
Structuring the `ChopperScape` Class
Let's define the `ChopperScape` class, which sets the initial parameters of the game and creates the environment.
Key attributes set in the `__init__` method (a sketch follows the list):
- `observation_shape`: Defines the dimensions of the game screen (height, width, color channels).
- `observation_space`: Specifies that the agent receives visual input (using `spaces.Box`).
- `action_space`: Defines the valid actions for the agent (using `spaces.Discrete`).
- `canvas`: The image array on which the game is rendered.
- `elements`: A list storing all dynamic elements in the game (Chopper, birds, fuel).
- `max_fuel`: The maximum amount of fuel the Chopper can hold.
- `x_min, y_min, x_max, y_max`: The permissible area within which the Chopper can move.
Building Blocks: Environment Elements
We need classes to represent the objects in our game: `Chopper`, `Bird`, and `Fuel`. These classes inherit from a base class called `Point`.
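A minimal sketch of the base class; `Chopper`, `Bird`, and `Fuel` differ mainly in the icon they draw on the canvas (the icon file path and dimensions below are placeholders):

```python
class Point:
    """An element on the canvas, clamped to a permissible area."""

    def __init__(self, name, x_max, x_min, y_max, y_min):
        self.x = 0
        self.y = 0
        self.x_min = x_min
        self.x_max = x_max
        self.y_min = y_min
        self.y_max = y_max
        self.name = name
        # Subclasses override these with their icon's dimensions
        self.icon_w = 0
        self.icon_h = 0

    def set_position(self, x, y):
        # Keep the element inside the permissible area
        self.x = self.clamp(x, self.x_min, self.x_max - self.icon_w)
        self.y = self.clamp(y, self.y_min, self.y_max - self.icon_h)

    def get_position(self):
        return (self.x, self.y)

    def move(self, del_x, del_y):
        self.set_position(self.x + del_x, self.y + del_y)

    @staticmethod
    def clamp(n, minn, maxn):
        return max(min(maxn, n), minn)


class Chopper(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        # Load the sprite drawn on the canvas (path is a placeholder)
        self.icon = cv2.imread("chopper.png") / 255.0
        self.icon_w, self.icon_h = 64, 64
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))


# Bird and Fuel are defined the same way, with their own icons and sizes.
```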
Resetting the Environment: The `reset` Function
The `reset()` function initializes the environment (sketched after this list):
- Resets fuel levels, score, and element positions.
- Places the Chopper in a random starting location.
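Here's a sketch of `reset`; `draw_elements_on_canvas` is a hypothetical helper that stamps each element's icon onto `self.canvas`:

```python
# Inside the ChopperScape class:
def reset(self):
    # Reset fuel and the episodic return
    self.fuel_left = self.max_fuel
    self.ep_return = 0

    # Place the Chopper at a random spot near the top-left of the screen
    x = random.randrange(int(self.observation_shape[1] * 0.05),
                         int(self.observation_shape[1] * 0.10))
    y = random.randrange(int(self.observation_shape[0] * 0.15),
                         int(self.observation_shape[0] * 0.20))
    self.chopper = Chopper("chopper", self.x_max, self.x_min,
                           self.y_max, self.y_min)
    self.chopper.set_position(x, y)

    # Only the Chopper exists at the start of an episode
    self.elements = [self.chopper]

    # Draw the first frame and return it as the initial observation
    self.canvas = np.ones(self.observation_shape)
    self.draw_elements_on_canvas()  # hypothetical helper
    return self.canvas
```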
Showing the Environment
Let's instantiate the environment and render it.
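For example, using Matplotlib to display the initial observation:

```python
env = ChopperScape()
obs = env.reset()

plt.imshow(obs)
plt.show()
```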
Rendering the Environment: The `render` Function
The `render()` function displays the game state in one of two modes (sketched after this list):
- `human` mode: Opens a pop-up window to visualize the game.
- `rgb_array` mode: Returns the frame as a pixel array, useful for recording gameplay.
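A sketch of `render` covering both modes, with OpenCV providing the pop-up window; a matching `close` method tears the window down:

```python
# Inside the ChopperScape class:
def render(self, mode="human"):
    assert mode in ["human", "rgb_array"], \
        'Invalid mode, must be either "human" or "rgb_array"'
    if mode == "human":
        # Pop up a window showing the current frame
        cv2.imshow("Game", self.canvas)
        cv2.waitKey(10)
    elif mode == "rgb_array":
        # Hand back the raw frame, e.g. for recording gameplay
        return self.canvas

def close(self):
    cv2.destroyAllWindows()
```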
Executing Actions: The `step` Function, Part 1
The `step()` function is the core of the environment, simulating a single time step (a skeleton follows the lists below):
- Apply actions to the agent (Chopper): Move the Chopper based on the selected action.
- Update the environment: Spawn birds, fuel tanks, check for collisions, and update the score.
It returns four values:
- Observation: The updated game screen.
- Reward: A numerical value indicating the agent's performance.
- Done: A boolean indicating whether the episode has ended.
- Info: Additional information (e.g., debugging data).
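A skeleton of `step` that reflects this structure; the movement deltas and the per-step reward are illustrative, and the spawning/collision logic is deferred to the next part:

```python
# Inside the ChopperScape class:
def step(self, action):
    assert self.action_space.contains(action), "Invalid action"
    done = False

    # Flying costs fuel; surviving a step earns a small reward
    self.fuel_left -= 1
    reward = 1

    # Apply the chosen action to the Chopper (deltas are illustrative)
    if action == 0:
        self.chopper.move(5, 0)    # right
    elif action == 1:
        self.chopper.move(-5, 0)   # left
    elif action == 2:
        self.chopper.move(0, 5)    # down
    elif action == 3:
        self.chopper.move(0, -5)   # up
    # action == 4: do nothing

    # TODO: spawn birds and fuel tanks, check for collisions, update score

    # Running out of fuel ends the episode
    if self.fuel_left == 0:
        done = True

    self.ep_return += reward
    self.draw_elements_on_canvas()  # hypothetical helper

    return self.canvas, reward, done, {}
```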
Next Steps
The tutorial provides the foundation for building a more complex custom environment. The next steps would involve completing the `step()` function, adding the spawning and movement logic for birds and fuel tanks, implementing collision detection, and designing a suitable reward function.