Benchmarks’ Core

Benchmarks is built around three core objects: Playground, Agent and TurnEnv.

They are all linked by the Playground, as shown in the diagram below:

[Diagram: the Playground drives the loop between the Agents and the TurnEnv. It calls reset(), turn(observation) and step(action) on the environment, passes each experience (Observation, Action, Reward, Done, Info) to the active Agent through act(), remember() and learn(), and manages the play order via agents_order / set_agents_order().]

Playground

class Playground(environement, agents, agents_order=None)

A playground is used to run interactions between an environment and agent(s).

env

Environment in which the agent(s) will play.

Type

gym.Env

agents

List of agents to play.

Type

list of benchmarks.Agent

A playground is used to run agent(s) on an environment (a construction sketch is given after the parameter list below).

Parameters
  • env – Environment in which the agent(s) will play.

  • agents (Union[Agent, List[Agent]]) – List of agents to play (can be only one agent).
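
As a concrete sketch (assuming the package is importable as benchmarks and that any gym environment can be used), a playground wrapping a single agent could be built like this. MyAgent is a hypothetical Agent subclass, not part of the library:

>>> import gym
>>> from benchmarks import Playground    # assumed import path
>>> env = gym.make('CartPole-v1')        # any gym.Env
>>> agent = MyAgent(env.action_space)    # hypothetical Agent subclass (see Agent below)
>>> playground = Playground(env, agent)  # a single agent or a list of agents can be passed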

run(episodes, render=True, render_mode='human', learn=True, steps_cycle_len=10, episodes_cycle_len=0.05, verbose=0, callbacks=None, logger=None, reward_handler=None, done_handler=None, **kwargs)

Let the agent(s) play on the environment for a number of episodes (a usage sketch follows the parameter list below).

Additional arguments will be passed to the default logger.

Parameters
  • episodes (int) – Number of episodes to run.

  • render (bool) – If True, call the environment’s render() every step.

  • render_mode (str) – Rendering mode. One of {‘human’, ‘rgb_array’, ‘ansi’} (see the gym render documentation).

  • learn (bool) – If True, call Agent.learn() every step.

  • steps_cycle_len (int) – Number of steps that compose a cycle.

  • episodes_cycle_len – Number of episodes that compose a cycle. If between 0 and 1, this is understood as a proportion of the total number of episodes.

  • verbose (int) – The verbosity level: 0 (silent), 1 (cycle), 2 (episode), 3 (step_cycle), 4 (step), 5 (detailed step).

  • callbacks (Optional[List[Callback]]) – List of Callback objects to use during runs.

  • reward_handler (Union[Callable, RewardHandler, None]) – A callable to redefine the rewards of the environment.

  • done_handler (Union[Callable, DoneHandler, None]) – A callable to redefine when the environment is done.

  • logger (Optional[Callback]) – Logging callback to use. If None, the default Logger is used.
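
For illustration, a hedged sketch of a typical call (the parameter values are arbitrary, not recommended defaults):

>>> playground.run(episodes=300,
...                render=False,
...                learn=True,
...                episodes_cycle_len=0.1,  # log a summary every 10% of the episodes
...                verbose=1)               # cycle-level verbosity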

fit(episodes, **kwargs)

Train the agent(s) on the environment for a number of episodes.

test(episodes, **kwargs)

Test the agent(s) on the environment for a number of episodes.
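
Both appear to be convenience wrappers around run() that forward their keyword arguments, presumably with learning enabled for fit() and disabled for test(). A sketch:

>>> playground.fit(300, verbose=1)   # training phase
>>> playground.test(10, verbose=2)   # evaluation phase, episode-level logs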

set_agents_order(agents_order)

Change the order in which the agents play.

Parameters

agents_order (list) – New order of agent indices. Default is range(n_agents).

Return type

list

Returns

The updated list of ordered agent indices.
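
For a multi-agent playground, the play order can be changed before a run. A sketch, assuming three agents indexed 0 to 2:

>>> playground.set_agents_order([2, 0, 1])   # agent 2 plays first, then agent 0, then agent 1
[2, 0, 1]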

Agent

class Agent

A general structure for any learning agent.

abstract act(observation, greedy=False)

How the Agent acts given an observation.

Parameters
  • observation – The observation given by the environment.

  • greedy (bool) – If True, act greedily (without exploration).

Return type

Union[int, float, ndarray]

learn()

How the Agent learns from its experiences.

Returns

The agent’s learning logs (must be NumPy or native Python types).

Return type

logs

remember(observation, action, reward, done, next_observation=None, info=None, **param)

How the Agent will remember experiences.

Often, the agent will use a perfect hash function to store observations efficiently.

Example

>>> self.memory.remember(self.observation_encoder(observation),
...                      self.action_encoder(action),
...                      reward, done,
...                      self.observation_encoder(next_observation),
...                      info, **param)
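
Putting the three methods together, below is a minimal sketch of a non-learning random agent. It assumes a discrete gym action space and stores transitions in a plain Python list instead of the library’s Memory helper; the import path and attribute names are assumptions, not part of the documented API.

from benchmarks import Agent  # assumed import path


class RandomAgent(Agent):
    """Acts uniformly at random; remembers transitions but learns nothing."""

    def __init__(self, action_space):
        self.action_space = action_space
        self.transitions = []

    def act(self, observation, greedy=False):
        # The greedy flag is ignored: a random policy has no greedy mode.
        return self.action_space.sample()

    def learn(self):
        # Return empty logs; a real agent would update its policy here.
        return {}

    def remember(self, observation, action, reward, done,
                 next_observation=None, info=None, **param):
        self.transitions.append(
            (observation, action, reward, done, next_observation, info))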

TurnEnv

class TurnEnv

Turn-based multi-agent gym environment.

A layer over the gym environment class, able to handle turn-based environments with multiple agents.

Note

A TurnEnv must be used inside a Playground in order to work!

The only addition in TurnEnv is the method turn, on top of the basic methods of the main gym API (see environment):

  • step – take a step of the environment given the action of the active player.

  • reset – reset the environment and return the first observation.

  • render

  • close

  • seed

action_space

The Space object corresponding to actions.

Type

space

observation_space

The Space object corresponding to observations.

Type

space

abstract step(action)

Perform a step of the environment.

Parameters

action – The action taken by the agent whose turn was given by turn().

Returns

  • observation – The observation to give to the Agent.

  • reward (float) – The reward given to the Agent for this step.

  • done (bool) – True if the environment is done after this step.

  • info (dict) – Additional information given by the environment.

abstract turn(state)

Give the turn to the next agent to play.

Agents are assumed to be indexed as range(n_player), where n_player is the number of players in the game.

Parameters

state – The state of the environment. It should be enough to determine which agent plays next.

Returns

The next player id

Return type

agent_id (int)

abstract reset()

Reset the environment and return the initial observation.

Returns

The observation for the first Agent to play

Return type

observation
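
To make the interface concrete, here is a hedged sketch of a trivial two-player TurnEnv: each player adds 1 or 2 to a shared counter, and whoever reaches 10 wins. Everything beyond the documented API (the counter, the current_player attribute, the reward scheme) is illustrative, not part of the base class.

from gym import spaces
from benchmarks import TurnEnv  # assumed import path


class CountToTen(TurnEnv):
    """Two players alternately add 1 or 2 to a counter; reaching 10 wins."""

    def __init__(self):
        self.action_space = spaces.Discrete(2)        # 0 -> add 1, 1 -> add 2
        self.observation_space = spaces.Discrete(11)  # counter value in 0..10
        self.counter = 0
        self.current_player = 0

    def reset(self):
        self.counter = 0
        self.current_player = 0
        return self.counter            # observation for the first agent to play

    def turn(self, state):
        # Round-robin order between the two players (ids 0 and 1).
        return self.current_player

    def step(self, action):
        self.counter = min(self.counter + action + 1, 10)
        done = self.counter >= 10
        reward = 1.0 if done else 0.0  # the player reaching 10 gets the reward
        self.current_player = 1 - self.current_player
        return self.counter, reward, done, {}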

Handlers

RewardHandler

class RewardHandler

Helper to modify the rewards given by the environment.

You need to specify the method:
  • reward(self, observation, action, reward, done, info, next_observation) -> float

You can also define __init__ and reset() if you want to store anything.

abstract reward(observation, action, reward, done, info, next_observation, logs)

Replace the environment reward.

Often used to scale rewards or to do reward shaping.

Parameters
  • observation – Current observation.

  • action – Current action.

  • reward – Current reward.

  • done – Done flag given by the environment.

  • info – Additional information given by the environment.

  • next_observation – Next observation.

Return type

float

reset()

Reset the RewardHandler.

Called automatically in Playground.run(). Useful only if variables are stored by the RewardHandler.
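
As an illustration, a hedged sketch of a RewardHandler that rescales rewards (the scale attribute is an assumption; logs is accepted as an optional argument to match the abstract signature above):

from benchmarks import RewardHandler  # assumed import path


class ScaledReward(RewardHandler):
    """Multiplies every environment reward by a constant factor."""

    def __init__(self, scale=0.01):
        self.scale = scale

    def reward(self, observation, action, reward, done, info,
               next_observation, logs=None):
        return self.scale * reward

    def reset(self):
        # Nothing to reset: this handler stores no state between episodes.
        pass

It would then be passed to a run, e.g. playground.run(episodes=100, reward_handler=ScaledReward()).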

DoneHandler

class DoneHandler

Helper to modify the done given by the environment.

You need to specify the method:
  • done(self, observation, action, reward, done, info, next_observation) -> bool

You can also define __init__ and reset() if you want to store anything.

abstract done(observation, action, reward, done, info, next_observation, logs)

Replace the environment done.

Often used, for example, to make episodes shorter when the agent is stuck.

Parameters
  • observation – Current observation.

  • action – Current action.

  • reward – Current reward.

  • done – Done flag given by the environment.

  • info – Additional information given by the environment.

  • next_observation – Next observation.

Return type

bool

reset()

Reset the DoneHandler.

Called automatically in Playground.run(). Used only if variables are stored by the DoneHandler.
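
Similarly, a hedged sketch of a DoneHandler that caps episode length (the step counter and max_steps are assumptions, and it relies on reset() being called at the start of each episode):

from benchmarks import DoneHandler  # assumed import path


class MaxStepsDone(DoneHandler):
    """Forces the episode to end after a fixed number of steps."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.steps = 0

    def done(self, observation, action, reward, done, info,
             next_observation, logs=None):
        self.steps += 1
        return done or self.steps >= self.max_steps

    def reset(self):
        # Assumed to be called at each episode start; reset the step counter.
        self.steps = 0

It would be used the same way, e.g. playground.run(episodes=100, done_handler=MaxStepsDone(max_steps=100)).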