Master OpenAI Gym: Build Custom AI Environments for Breakthrough Results
Ready to take your AI skills to the next level? Learn how to create totally custom OpenAI Gym environments and train agents to solve unique challenges! This tutorial dives into building your own game-like environment, using the popular OpenAI Gym framework.
Why Build Custom OpenAI Gym Environments?
While OpenAI Gym offers fantastic pre-built environments, sometimes you need something tailored. Building custom environments allows you to:
- Tackle Industry-Specific Problems: Simulate real-world scenarios in robotics, finance, or logistics.
- Research Novel Algorithms: Design testbeds to evaluate new reinforcement learning approaches.
- Gain Deeper Understanding: Master the inner workings of reinforcement learning by building from scratch.
Prerequisites: Get Ready to Code
Before diving in, make sure you have the following in place:
- Python: A working Python installation. Basic Python knowledge is helpful.
- OpenAI Gym: Install the OpenAI Gym package with pip install gym.
Essential Dependencies: Importing the Right Tools
Let's import the necessary libraries. These give us image manipulation, environment creation, and number crunching power:
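The exact list depends on your setup, but a minimal sketch covering those three needs (OpenCV and PIL for images, gym for the environment scaffolding, NumPy for arrays) might look like this:

```python
import random                 # random spawn positions and timings
import numpy as np            # array math for the game canvas
import cv2                    # loading/resizing icons, drawing text, pop-up window
import PIL.Image as Image     # optional: convenient image handling
import gym
from gym import Env, spaces   # environment base class and space definitions
```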
Designing Your Environment: ChopperScape Game
We'll craft a "ChopperScape" environment, inspired by the classic Chrome Dino Run game:
- The Goal: A chopper pilot must navigate obstacles (birds) and collect fuel tanks to maximize travel distance and score.
- Game Over: The episode ends if the chopper hits a bird or runs out of fuel.
- Fuel: Collecting floating fuel tanks refills the chopper to its maximum fuel capacity (1000L).
This example prioritizes learning. It's not about perfect graphics but about understanding how to structure a custom environment.
Defining Observation and Action Spaces: Key Decisions
Before coding, decide how your agent will perceive the world and what actions it can take:
- Observation Space: Can be continuous (real-valued coordinates) or discrete (like cells in a grid).
- Action Space: Can also be continuous (like stretching a slingshot in Angry Birds) or discrete (move left/right/jump in Mario).
In our game, the observation space will be the game screen (image), and the action space will be discrete (move up, down, left, right, idle).
Building Blocks: The ChopperScape Class
Let's create the core class for our environment, ChopperScape. Its key attributes (a code sketch follows this list):
- observation_space: Defines the image size (600x800 pixels) and color channels (RGB).
- action_space: Allows five discrete actions (up, down, left, right, idle).
- canvas: Represents the game screen.
- elements: List to store objects like the chopper, birds, and fuel tanks.
- max_fuel: The chopper's initial fuel capacity.
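Here's a minimal sketch of the constructor. The normalized [0, 1] pixel range and the 10% screen margins are assumptions you can tune:

```python
class ChopperScape(Env):
    def __init__(self):
        super().__init__()

        # Observations are the raw game screen: 600x800 RGB pixels in [0, 1]
        self.observation_shape = (600, 800, 3)
        self.observation_space = spaces.Box(
            low=0.0, high=1.0, shape=self.observation_shape, dtype=np.float32
        )

        # Five discrete actions: right, left, down, up, do nothing
        self.action_space = spaces.Discrete(5)

        # Canvas that the game is drawn on (starts out blank/white)
        self.canvas = np.ones(self.observation_shape, dtype=np.float32)

        # Elements currently on screen (chopper, birds, fuel tanks)
        self.elements = []

        # The chopper's initial (and maximum) fuel capacity
        self.max_fuel = 1000

        # Permissible area for the chopper to move in
        self.y_min = int(self.observation_shape[0] * 0.1)
        self.x_min = 0
        self.y_max = int(self.observation_shape[0] * 0.9)
        self.x_max = self.observation_shape[1]
```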
Representing Game Objects: Point, Chopper, Bird, and Fuel Classes
We'll create classes for the objects in our game:
- Point (Base Class): Represents a generic point on the screen with (x, y) coordinates and boundaries.
- Chopper (Derived): The player-controlled aircraft, with an image icon.
- Bird (Derived): Obstacles that the chopper must avoid.
- Fuel (Derived): Collectible items to replenish the chopper's fuel.
Code for the Point class:
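A sketch of the base class. It stores a position and clamps it to the screen boundaries; it assumes derived classes set icon_w and icon_h:

```python
class Point(object):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        self.x = 0
        self.y = 0
        self.x_min = x_min
        self.x_max = x_max
        self.y_min = y_min
        self.y_max = y_max
        self.name = name

    def set_position(self, x, y):
        # Clamp so the icon never leaves the permissible area
        self.x = self.clamp(x, self.x_min, self.x_max - self.icon_w)
        self.y = self.clamp(y, self.y_min, self.y_max - self.icon_h)

    def get_position(self):
        return (self.x, self.y)

    def move(self, del_x, del_y):
        self.set_position(self.x + del_x, self.y + del_y)

    def clamp(self, n, minn, maxn):
        return max(min(maxn, n), minn)
```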
Code for the Chopper, Bird, and Fuel classes:
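The derived classes differ only in which icon they load and how large it is. The icon sizes below are arbitrary choices:

```python
class Chopper(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        self.icon_w = 64
        self.icon_h = 64
        # Load the icon, normalize pixels to [0, 1], resize to (width, height)
        self.icon = cv2.imread("chopper.png") / 255.0
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))

class Bird(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        self.icon_w = 32
        self.icon_h = 32
        self.icon = cv2.imread("bird.png") / 255.0
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))

class Fuel(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        self.icon_w = 32
        self.icon_h = 32
        self.icon = cv2.imread("fuel.png") / 255.0
        self.icon = cv2.resize(self.icon, (self.icon_w, self.icon_h))
```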
Important: Replace "chopper.png"
, "bird.png"
, and "fuel.png"
with the actual paths to your image files.
reset() Function: Starting Fresh
The reset() function initializes the environment:
- Resets fuel, score, and element lists.
- Places the chopper in a random starting position.
- Draws all elements onto the canvas.
- Returns the initial observation (the game screen). A correct reset() is the foundation of any working OpenAI Gym example, so it's worth getting right.
Code for the reset() function:
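Here's a sketch of reset(), along with the draw_elements_on_canvas() helper it relies on (a helper name introduced here; it simply stamps each element's icon onto a blank canvas):

```python
def reset(self):
    # Fresh fuel and score for the new episode
    self.fuel_left = self.max_fuel
    self.ep_return = 0

    # Counters used to give spawned birds and fuel tanks unique names
    self.bird_count = 0
    self.fuel_count = 0

    # Place the chopper at a random spot near the left edge
    x = random.randrange(int(self.observation_shape[1] * 0.05),
                         int(self.observation_shape[1] * 0.10))
    y = random.randrange(int(self.observation_shape[0] * 0.15),
                         int(self.observation_shape[0] * 0.20))
    self.chopper = Chopper("chopper", self.x_max, self.x_min,
                           self.y_max, self.y_min)
    self.chopper.set_position(x, y)

    # The chopper is the only element at the start of an episode
    self.elements = [self.chopper]

    # Draw everything and return the screen as the initial observation
    self.draw_elements_on_canvas()
    return self.canvas

def draw_elements_on_canvas(self):
    # Start from a blank canvas, then stamp each element's icon at its position
    self.canvas = np.ones(self.observation_shape, dtype=np.float32)
    for elem in self.elements:
        x, y = elem.get_position()
        self.canvas[y: y + elem.icon_h, x: x + elem.icon_w] = elem.icon

    # Overlay the remaining fuel and current return as text
    text = "Fuel Left: {} | Return: {}".format(self.fuel_left, self.ep_return)
    self.canvas = cv2.putText(self.canvas, text, (10, 20),
                              cv2.FONT_HERSHEY_COMPLEX_SMALL,
                              0.8, (0, 0, 0), 1)
```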
Let's see how the game looks after calling reset():
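Assuming the class above is in scope, a quick way to inspect the initial screen is with matplotlib (a usage sketch, not part of the environment itself):

```python
import matplotlib.pyplot as plt

env = ChopperScape()
obs = env.reset()

plt.imshow(obs)   # display the initial game screen
plt.show()
```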
render() Function: Visualizing the Game
The render() function displays the game screen in one of two modes:
- human mode: Shows the game in a pop-up window.
- rgb_array mode: Returns the game screen as a pixel array (useful for recording videos).
Code for the render function:
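A minimal sketch of render(), plus a close() method for cleaning up the pop-up window:

```python
def render(self, mode="human"):
    assert mode in ["human", "rgb_array"], \
        'Invalid mode, must be either "human" or "rgb_array"'
    if mode == "human":
        # Pop up an OpenCV window showing the current canvas
        cv2.imshow("ChopperScape", self.canvas)
        cv2.waitKey(10)
    elif mode == "rgb_array":
        # Return the raw pixels, e.g. for recording videos
        return self.canvas

def close(self):
    cv2.destroyAllWindows()
```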
step() Function: The Heart of the Environment
The step() function is the most critical part. It defines how the environment changes after each action (a code sketch follows this list):
- Apply the Action: Move the chopper based on the chosen action (up, down, left, right, or do nothing).
- Update the Environment:
  - Spawn birds randomly from the right.
  - Spawn fuel tanks randomly from the bottom.
  - Move birds to the left.
  - Move fuel tanks upwards.
  - Check for collisions (bird hits chopper, fuel tank collected).
  - End the episode if a collision occurs or fuel runs out.
- Calculate the Reward: Based on distance traveled.
- Return the Results: The new observation, the reward, done (whether the episode finished), and info (an empty dictionary for now).
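Here's a sketch that stitches those steps together. The movement speed, spawn probabilities, collision penalty, and the has_collided() helper are all assumptions; tune them to taste:

```python
def has_collided(self, elem1, elem2):
    # Two elements collide when their icons are closer than half
    # the sum of their widths (and heights)
    x_col = 2 * abs(elem1.x - elem2.x) <= (elem1.icon_w + elem2.icon_w)
    y_col = 2 * abs(elem1.y - elem2.y) <= (elem1.icon_h + elem2.icon_h)
    return x_col and y_col

def step(self, action):
    assert self.action_space.contains(action), "Invalid action"
    done = False

    # Flying costs one unit of fuel per timestep
    self.fuel_left -= 1

    # One point per timestep survived, a proxy for distance traveled
    reward = 1

    # 1. Apply the action: 0 right, 1 left, 2 down, 3 up, 4 do nothing
    if action == 0:
        self.chopper.move(5, 0)
    elif action == 1:
        self.chopper.move(-5, 0)
    elif action == 2:
        self.chopper.move(0, 5)
    elif action == 3:
        self.chopper.move(0, -5)

    # 2. Update the environment: occasionally spawn a bird on the right...
    if random.random() < 0.01:
        bird = Bird("bird_{}".format(self.bird_count),
                    self.x_max, self.x_min, self.y_max, self.y_min)
        bird.set_position(self.x_max, random.randrange(self.y_min, self.y_max))
        self.bird_count += 1
        self.elements.append(bird)

    # ...and a fuel tank at the bottom
    if random.random() < 0.01:
        fuel = Fuel("fuel_{}".format(self.fuel_count),
                    self.x_max, self.x_min, self.y_max, self.y_min)
        fuel.set_position(random.randrange(self.x_min, self.x_max), self.y_max)
        self.fuel_count += 1
        self.elements.append(fuel)

    # Move obstacles/collectibles and resolve collisions
    for elem in list(self.elements):
        if isinstance(elem, Bird):
            elem.move(-5, 0)  # birds fly left
            if elem.get_position()[0] <= self.x_min:
                self.elements.remove(elem)       # flew off screen
            elif self.has_collided(self.chopper, elem):
                done = True                      # hitting a bird ends the episode
                reward = -10
        elif isinstance(elem, Fuel):
            elem.move(0, -5)  # fuel tanks float upwards
            if elem.get_position()[1] <= self.y_min:
                self.elements.remove(elem)       # floated off screen
            elif self.has_collided(self.chopper, elem):
                self.elements.remove(elem)       # collect the tank
                self.fuel_left = self.max_fuel   # refill to capacity

    # Running out of fuel also ends the episode
    if self.fuel_left == 0:
        done = True

    self.ep_return += 1
    self.draw_elements_on_canvas()

    # 3. Return observation, reward, done flag, and an (empty) info dict
    return self.canvas, reward, done, {}
```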
Key Takeaways for OpenAI Gym Custom Environments
- Modular Design: Break down your environment into classes representing different objects and functionalities.
- Clear State Representation: Define observation spaces that provide relevant information to the agent.
- Reward Shaping: Carefully design the reward function to encourage desired behavior.
- Test Thoroughly: Validate your environment to catch bugs and inconsistencies.