Build a Custom OpenAI Gym Environment: The Ultimate Guide
Tired of the same old OpenAI Gym environments? Learn how to create your own custom environment and tailor it to your specific AI/ML project. This tutorial provides a step-by-step guide on building a custom OpenAI Gym environment from scratch, complete with code examples and clear explanations.
Why Build a Custom OpenAI Gym Environment?
- Tailored Training: Design environments that perfectly match your specific problem, optimizing training for your unique agent.
- Realistic Simulations: Create more complex and realistic simulations, pushing the boundaries of your AI/ML models.
- Unique Challenges: Introduce novel challenges to test your agent's capabilities and drive innovation.
Prerequisites: Your Toolkit for Success
Before diving in, make sure you have the following:
- Python: Basic Python coding knowledge is essential.
- OpenAI Gym: Install the OpenAI Gym package using `pip install gym`.
- Dependencies: Install `opencv-python` and `pillow` using `pip install opencv-python pillow`.
Setting Up the Foundation: Essential Imports
Let's start by importing the necessary libraries:
Concept: Dino Run-Inspired Environment
In this example, we'll create a "ChopperScape" environment, inspired by the classic Dino Run game. The agent (a chopper) must navigate the environment, avoiding birds and collecting fuel tanks to maximize its reward.
Defining the Action Space & Observation Space: Understanding the Game's Rules
The first step in building any OpenAI Gym environment is to define the action space and observation space.
- Observation Space: This defines what the agent "sees." It can be continuous (real-valued coordinates) or discrete (e.g., a grid).
- Action Space: This defines what the agent "can do." It can also be continuous (e.g., stretching a slingshot) or discrete (e.g., moving left, right, or jumping).
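As a quick illustration, `gym.spaces` provides ready-made classes for both kinds of space (the shapes here anticipate the environment built below):

```python
import numpy as np
from gym import spaces

# A discrete action space with six possible actions, sampled as integers 0..5
action_space = spaces.Discrete(6)

# A continuous observation space: a 600x800 RGB image with values in [0, 1]
observation_space = spaces.Box(
    low=0.0, high=1.0, shape=(600, 800, 3), dtype=np.float32
)

print(action_space.sample())    # a random integer in [0, 6)
print(observation_space.shape)  # (600, 800, 3)
```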
Building the ChopperScape Class: The Heart of Your Environment
Let's create the `ChopperScape` class, which will inherit from the `gym.Env` class:
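The class listing is missing from this copy of the article, so here is a hedged reconstruction of the constructor matching the description below (a 600x800 RGB observation and six discrete actions). The attribute names (`canvas`, `max_fuel`, the movement bounds) are assumptions used by the rest of this sketch:

```python
import numpy as np
import gym
from gym import spaces

class ChopperScape(gym.Env):
    def __init__(self):
        super().__init__()

        # Observations are the game screen: a 600x800 RGB image in [0, 1]
        self.observation_shape = (600, 800, 3)
        self.observation_space = spaces.Box(
            low=np.zeros(self.observation_shape),
            high=np.ones(self.observation_shape),
            dtype=np.float64,
        )

        # Six discrete actions, e.g. right, left, down, up, do nothing, ...
        self.action_space = spaces.Discrete(6)

        # The canvas the game is drawn on (starts blank)
        self.canvas = np.ones(self.observation_shape)

        # Elements (chopper, birds, fuel tanks) currently in the environment
        self.elements = []

        # Maximum fuel the chopper can carry
        self.max_fuel = 1000

        # Permissible area the chopper can occupy
        self.y_min = int(self.observation_shape[0] * 0.1)
        self.x_min = 0
        self.y_max = int(self.observation_shape[0] * 0.9)
        self.x_max = self.observation_shape[1]
```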
This code defines the observation space as a 600x800 RGB image and the action space as a discrete space with six possible actions.
Populating the Environment: Introducing Elements
Our environment will consist of three key elements: the Chopper, Birds, and Fuel Tanks. To manage positions and attributes effectively, all elements will inherit from a base class `Point`.
The Point Base Class
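The `Point` listing is missing here; a reconstruction consistent with the description (position management with movement clamped to the permissible area) might look like the following. Note that `set_position` and `move` reference `icon_w`/`icon_h`, which the child classes supply:

```python
class Point(object):
    """Base class for anything that occupies an (x, y) position on the canvas."""

    def __init__(self, name, x_max, x_min, y_max, y_min):
        self.x = 0
        self.y = 0
        self.x_min = x_min
        self.x_max = x_max
        self.y_min = y_min
        self.y_max = y_max
        self.name = name

    def set_position(self, x, y):
        # Clamp so the icon never leaves the permissible area
        self.x = self.clamp(x, self.x_min, self.x_max - self.icon_w)
        self.y = self.clamp(y, self.y_min, self.y_max - self.icon_h)

    def get_position(self):
        return (self.x, self.y)

    def move(self, del_x, del_y):
        self.x = self.clamp(self.x + del_x, self.x_min, self.x_max - self.icon_w)
        self.y = self.clamp(self.y + del_y, self.y_min, self.y_max - self.icon_h)

    @staticmethod
    def clamp(n, minn, maxn):
        return max(min(maxn, n), minn)
```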
Child Classes
The following classes inherit from the `Point` class and introduce a set of new attributes:
- `icon`: Icon of the point that will be displayed on the observation image when the game is rendered.
- `(icon_w, icon_h)`: Dimensions of the icon.
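The child-class listings are also missing from this copy. In the original article each class loads a sprite image for its icon; the sketch below substitutes solid-colour placeholder arrays (the sizes and shades are assumptions) so it runs without the image files, and includes a minimal `Point` stand-in so the snippet is self-contained:

```python
import numpy as np

class Point:
    """Minimal stand-in for the Point base class described above."""
    def __init__(self, name, x_max, x_min, y_max, y_min):
        self.name = name
        self.x, self.y = 0, 0
        self.x_min, self.x_max = x_min, x_max
        self.y_min, self.y_max = y_min, y_max

class Chopper(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        # Placeholder icon; the article loads a helicopter sprite instead,
        # e.g. an image read with cv2.imread and resized to (icon_w, icon_h)
        self.icon_w, self.icon_h = 64, 64
        self.icon = np.full((self.icon_h, self.icon_w, 3), 0.5)

class Bird(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        self.icon_w, self.icon_h = 32, 32
        self.icon = np.full((self.icon_h, self.icon_w, 3), 0.2)

class Fuel(Point):
    def __init__(self, name, x_max, x_min, y_max, y_min):
        super().__init__(name, x_max, x_min, y_max, y_min)
        self.icon_w, self.icon_h = 32, 32
        self.icon = np.full((self.icon_h, self.icon_w, 3), 0.8)
```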
Essential Functions: reset and step
Every OpenAI Gym environment needs two core functions:
- `reset()`: Resets the environment to its initial state.
- `step(action)`: Takes an action, updates the environment, and returns the new observation, reward, `done` flag, and additional information.
The reset Function: Starting Fresh
The `reset` function initializes the environment:
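The `reset` listing is missing from this copy; below is a hedged reconstruction of the flow: refill the fuel, zero the episodic return, spawn the chopper at a random spot, and redraw the canvas. The helper `draw_elements_on_canvas` and the `Chopper` stand-in are assumptions for illustration:

```python
import random
import numpy as np

class Chopper:
    """Stand-in for the Point subclass described earlier."""
    icon_w, icon_h = 64, 64
    icon = np.full((64, 64, 3), 0.5)
    def set_position(self, x, y):
        self.x, self.y = x, y

class ChopperScape:
    """Reduced to what reset() needs."""
    def __init__(self):
        self.observation_shape = (600, 800, 3)
        self.max_fuel = 1000
        self.canvas = np.ones(self.observation_shape)
        self.elements = []

    def draw_elements_on_canvas(self):
        # Blank the canvas, then stamp each element's icon at its position
        self.canvas = np.ones(self.observation_shape)
        for elem in self.elements:
            x, y = int(elem.x), int(elem.y)
            self.canvas[y:y + elem.icon_h, x:x + elem.icon_w] = elem.icon

    def reset(self):
        # Refill the tank and zero the episodic return
        self.fuel_left = self.max_fuel
        self.ep_return = 0

        # Spawn the chopper at a random spot near the top-left of the screen
        x = random.randrange(int(self.observation_shape[1] * 0.05),
                             int(self.observation_shape[1] * 0.10))
        y = random.randrange(int(self.observation_shape[0] * 0.15),
                             int(self.observation_shape[0] * 0.20))
        self.chopper = Chopper()
        self.chopper.set_position(x, y)

        # The chopper is the only element at the start of an episode
        self.elements = [self.chopper]

        # Redraw and return the initial observation
        self.draw_elements_on_canvas()
        return self.canvas
```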
The step Function: Taking Action and Updating the World
The `step` function is the core of the environment's logic. It applies the agent's action, updates the environment's state, and calculates the reward. It is responsible for:
- Applying the action to our agent.
- Everything else that happens in the environment, such as the behaviour of the non-RL actors (e.g. birds and floating fuel tanks).
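The original `step` listing is also lost, so here is a hedged sketch of that structure. The reward scheme (+1 per surviving timestep), the action mapping, and the movement amounts are assumptions; the bird/fuel-tank dynamics are summarized in comments rather than implemented:

```python
import numpy as np

class ChopperScapeStepSketch:
    """Just enough state to demonstrate the flow of step()."""
    def __init__(self):
        self.fuel_left = 1000
        self.ep_return = 0
        self.chopper_x, self.chopper_y = 100, 300

    def step(self, action):
        assert action in range(6), "invalid action"
        done = False

        # 1. Fuel burns every timestep; an empty tank ends the episode
        self.fuel_left -= 1
        if self.fuel_left == 0:
            done = True

        # 2. Reward the agent for surviving one more timestep
        reward = 1

        # 3. Apply the agent's action (assumed mapping: 0 right, 1 left,
        #    2 down, 3 up, 4 do nothing)
        if action == 0:
            self.chopper_x += 5
        elif action == 1:
            self.chopper_x -= 5
        elif action == 2:
            self.chopper_y += 5
        elif action == 3:
            self.chopper_y -= 5

        # 4. Everything else in the environment: spawn birds and fuel tanks
        #    with some probability, move the non-RL actors, end the episode
        #    on a bird collision, refuel on a fuel-tank pickup (omitted here).

        # 5. Redraw the canvas and return the Gym 4-tuple
        self.ep_return += reward
        obs = np.ones((600, 800, 3))  # stands in for the redrawn canvas
        return obs, reward, done, {}
```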
Render & Close Functions
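The listings for these two functions are missing from this copy as well. Conventionally, `render` supports a `human` mode (pop up an OpenCV window) and an `rgb_array` mode (return the raw pixels), and `close` tears down any windows. A sketch, assuming the environment keeps its screen in `self.canvas`:

```python
import numpy as np

class ChopperScapeRenderSketch:
    def __init__(self):
        self.canvas = np.ones((600, 800, 3))

    def render(self, mode="human"):
        assert mode in ["human", "rgb_array"], "invalid render mode"
        if mode == "human":
            # Show the current canvas in a window (requires a display)
            import cv2
            cv2.imshow("Game", self.canvas)
            cv2.waitKey(10)
        return self.canvas  # rgb_array mode: hand the pixels to the caller

    def close(self):
        import cv2
        cv2.destroyAllWindows()
```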
Mastering Custom Environments for OpenAI Gym
Creating custom OpenAI Gym environments opens a world of possibilities for tailored AI/ML training. By following this guide and adapting the code to your specific needs, you can build engaging and challenging environments that push the boundaries of your agent's capabilities.