Benchmarks' Core¶
Benchmarks is based on these core objects: Playground, Agent, and TurnEnv.
They are all linked together by the Playground.
Playground¶
- class Playground(environement, agents, agents_order=None)¶
A playground is used to run interactions between an environment and agent(s).
- env¶
Environment in which the agent(s) will play.
- Type
gym.Env
- run(episodes, render=True, render_mode='human', learn=True, steps_cycle_len=10, episodes_cycle_len=0.05, verbose=0, callbacks=None, logger=None, reward_handler=None, done_handler=None, **kwargs)¶
Let the agent(s) play on the environment for a number of episodes.
Additional arguments will be passed to the default logger.
- Parameters
episodes (int) – Number of episodes to run.
render (bool) – If True, call |gym.render| every step.
render_mode (str) – Rendering mode. One of {‘human’, ‘rgb_array’, ‘ansi’} (see |gym.render|).
learn (bool) – If True, call Agent.learn() every step.
steps_cycle_len (int) – Number of steps that compose a cycle.
episodes_cycle_len – Number of episodes that compose a cycle. If between 0 and 1, this is understood as a proportion.
verbose (int) – The verbosity level: 0 (silent), 1 (cycle), 2 (episode), 3 (step_cycle), 4 (step), 5 (detailed step).
callbacks (Optional[List[Callback]]) – List of Callback to use in runs.
reward_handler (Union[Callable, RewardHandler, None]) – A callable to redefine the rewards of the environment.
done_handler (Union[Callable, DoneHandler, None]) – A callable to redefine the end of the environment.
logger (Optional[Callback]) – Logging callback to use. If None, use the default Logger.
- fit(episodes, **kwargs)¶
Train the agent(s) on the environment for a number of episodes.
- test(episodes, **kwargs)¶
Test the agent(s) on the environment for a number of episodes.
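The control flow of run() can be sketched as follows. This is a simplified, self-contained illustration under assumed names: CountEnv, RandomAgent, and the local run function are minimal stand-ins, not the library's actual classes.

```python
# Minimal sketch of the Playground.run() control flow
# (illustrative stand-ins, not the actual library classes).

class CountEnv:
    """Toy gym-like environment that ends after 3 steps."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done, {}

class RandomAgent:
    """Toy agent that always acts 0 and counts learning calls."""
    def __init__(self):
        self.learn_calls = 0

    def act(self, observation):
        return 0

    def learn(self):
        self.learn_calls += 1

def run(env, agent, episodes, learn=True):
    """Simplified episode loop, analogous to Playground.run()."""
    total_reward = 0.0
    for _ in range(episodes):
        observation, done = env.reset(), False
        while not done:
            action = agent.act(observation)
            observation, reward, done, info = env.step(action)
            total_reward += reward
            if learn:
                agent.learn()  # skipped when learn=False, as in test()
    return total_reward

env, agent = CountEnv(), RandomAgent()
print(run(env, agent, episodes=2))  # 6.0: 2 episodes of 3 steps, reward 1.0 each
```

Under this reading, fit() is run() with learn=True and test() is run() with learn=False.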
Agent¶
- class Agent¶
A general structure for any learning agent.
- learn()¶
How the Agent learns from its experiences.
- Returns
The agent's learning logs (must be numpy or python types).
- Return type
logs
- remember(observation, action, reward, done, next_observation=None, info=None, **param)¶
How the Agent will remember experiences.
Often, the agent will use a perfect hash function to store observations efficiently.
Example
>>> self.memory.remember(self.observation_encoder(observation),
...                      self.action_encoder(action),
...                      reward, done,
...                      self.observation_encoder(next_observation),
...                      info, **param)
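A minimal agent along these lines could look as follows. This is a self-contained sketch: MemoryAgent, its hash-based encoders, and the list-backed memory are assumptions for illustration, not the library's actual implementation.

```python
# Sketch of a minimal agent implementing remember() and learn()
# (hypothetical names; encoders and memory layout are assumptions).

class MemoryAgent:
    def __init__(self):
        self.memory = []  # list of encoded transition tuples

    def observation_encoder(self, observation):
        # Hypothetical encoder: hash observations into compact keys.
        return hash(observation)

    def action_encoder(self, action):
        return hash(action)

    def remember(self, observation, action, reward, done,
                 next_observation=None, info=None, **param):
        # Store the encoded transition, as in the Example above.
        self.memory.append((self.observation_encoder(observation),
                            self.action_encoder(action),
                            reward, done,
                            self.observation_encoder(next_observation),
                            info))

    def learn(self):
        # Toy learning step: return plain-python logs.
        return {'transitions_seen': len(self.memory)}

agent = MemoryAgent()
agent.remember(observation=0, action=1, reward=1.0, done=False,
               next_observation=2)
print(agent.learn())  # {'transitions_seen': 1}
```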
TurnEnv¶
- class TurnEnv¶
Turn based multi-agents gym environment.
A layer over the gym environment class able to handle turn based environments with multiple agents.
Note
A TurnEnv must be in a Playground in order to work!
On top of the main basic API methods (see environment), the only addition in TurnEnv is the method turn:
* step: take a step of the environment given the action of the active player
* reset: reset the environment and return the first observation
* render
* close
* seed
- abstract step(action)¶
Perform a step of the environment.
- Parameters
action – The action taken by the agent whose turn was given by turn().
- Returns
observation – The observation to give to the Agent.
reward (float) – The reward given to the Agent for this step.
done (bool) – True if the environment is done after this step.
info (dict) – Additional information given by the environment.
- abstract turn(state)¶
Give the turn to the next agent to play.
Agents are assumed to be represented by a list like range(n_player), where n_player is the number of players in the game.
- Parameters
state – The state of the environment. Should be enough to determine which agent plays next.
- Returns
The next player id
- Return type
agent_id (int)
- abstract reset()¶
Reset the environment and return the initial state.
- Returns
The observation for the first Agent to play.
- Return type
observation
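The three abstract methods above can be sketched with a toy two-player game. This is a self-contained stand-in under assumed names (TwoPlayerCountEnv is hypothetical, and the real TurnEnv inherits from the gym environment class and runs inside a Playground):

```python
# Sketch of a minimal turn-based environment implementing step/turn/reset
# (self-contained stand-in, not a real TurnEnv subclass).

class TwoPlayerCountEnv:
    """Two players alternate adding to a shared counter; reaching 5 ends the game."""
    n_players = 2

    def reset(self):
        # Reset the environment and return the observation
        # for the first Agent to play.
        self.total = 0
        self.next_player = 0
        return self.total

    def turn(self, state):
        # Give the turn to the next agent, as a player id in range(n_players).
        return self.next_player

    def step(self, action):
        # Take a step given the action of the active player,
        # then pass the turn to the other player.
        self.total += action
        done = self.total >= 5
        reward = 1.0 if done else 0.0
        self.next_player = (self.next_player + 1) % self.n_players
        return self.total, reward, done, {}

env = TwoPlayerCountEnv()
obs = env.reset()
assert env.turn(obs) == 0               # player 0 starts
obs, reward, done, info = env.step(3)   # player 0 plays
assert env.turn(obs) == 1               # now player 1's turn
obs, reward, done, info = env.step(2)   # player 1 plays and ends the game
print(done, reward)  # True 1.0
```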
Handlers¶
RewardHandler¶
- class RewardHandler¶
Helper to modify the rewards given by the environment.
- You need to specify the method:
reward(self, observation, action, reward, done, info, next_observation) -> float
You can also define __init__ and reset() if you want to store anything.
- abstract reward(observation, action, reward, done, info, next_observation, logs)¶
Replace the environment reward.
Often used to scale rewards or to do reward shaping.
- Parameters
observation – Current observation.
action – Current action.
reward – Current reward.
done – The done flag given by the environment.
info – Additional information given by the environment.
next_observation – Next observation.
- Return type
float
- reset()¶
Reset the RewardHandler.
Called automatically in Playground.run(). Useful only if variables are stored by the RewardHandler.
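A reward handler following this interface might look as follows. This is a self-contained sketch: ScaledRewardHandler is a hypothetical name, it mirrors the reward(self, observation, action, reward, done, info, next_observation) -> float signature above rather than subclassing the actual base class.

```python
# Sketch of a reward handler that scales rewards
# (stand-in mirroring the RewardHandler interface, not the library class).

class ScaledRewardHandler:
    def __init__(self, scale):
        self.scale = scale
        self.cumulated = 0.0  # stored variable, hence the need for reset()

    def reward(self, observation, action, reward, done, info, next_observation):
        # Replace the environment reward by a scaled version.
        scaled = self.scale * reward
        self.cumulated += scaled
        return scaled

    def reset(self):
        # Called automatically in Playground.run(); clears stored variables.
        self.cumulated = 0.0

handler = ScaledRewardHandler(scale=0.1)
print(handler.reward(None, None, 10.0, False, {}, None))  # 1.0
handler.reset()
print(handler.cumulated)  # 0.0
```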
DoneHandler¶
- class DoneHandler¶
Helper to modify the done given by the environment.
- You need to specify the method:
done(self, observation, action, reward, done, info, next_observation) -> bool
You can also define __init__ and reset() if you want to store anything.
- abstract done(observation, action, reward, done, info, next_observation, logs)¶
Replace the environment done.
Often used, for example, to make episodes shorter when the agent is stuck.
- Parameters
observation – Current observation.
action – Current action.
reward – Current reward.
done – The done flag given by the environment.
info – Additional information given by the environment.
next_observation – Next observation.
- Return type
bool
- reset()¶
Reset the DoneHandler.
Called automatically in Playground.run(). Used only if variables are stored by the DoneHandler.
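A done handler of this shape could cap episode length as follows. This is a self-contained sketch: MaxStepsDoneHandler is a hypothetical name, it mirrors the done(self, observation, action, reward, done, info, next_observation) -> bool signature above rather than subclassing the actual base class.

```python
# Sketch of a done handler that caps episode length
# (stand-in mirroring the DoneHandler interface, not the library class).

class MaxStepsDoneHandler:
    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0  # stored variable, hence the need for reset()

    def done(self, observation, action, reward, done, info, next_observation):
        # End the episode early once max_steps is reached.
        self.steps += 1
        return bool(done) or self.steps >= self.max_steps

    def reset(self):
        # Called automatically in Playground.run(); clears the step counter.
        self.steps = 0

handler = MaxStepsDoneHandler(max_steps=2)
print(handler.done(None, None, 0.0, False, {}, None))  # False
print(handler.done(None, None, 0.0, False, {}, None))  # True
```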