Benchmarks


Benchmarks is a tool to monitor and log reinforcement learning experiments. You build or find any compatible agent (it only needs an act method), you build or find a gym environment, and Benchmarks makes them interact together! Benchmarks also provides both TensorBoard and Weights & Biases integrations for beautiful and shareable experiment tracking! Benchmarks is also cross-platform compatible! That is why no agents are built into Benchmarks itself.

You can build and run your own Agent in a clear and shareable manner!

import benchmarks as rl
import gym

class MyAgent(rl.Agent):

   def act(self, observation, greedy=False):
      """ How the Agent act given an observation """
      ...
      return action

   def learn(self):
      """ How the Agent learns from his experiences """
      ...
      return logs

   def remember(self, observation, action, reward, done, next_observation=None, info=None, **param):
      """ How the Agent will remember experiences """
      ...

env = gym.make('FrozenLake-v0', is_slippery=True) # This could be any gym-like environment!
agent = MyAgent(env.observation_space, env.action_space)

pg = rl.Playground(env, agent)
pg.fit(2000, verbose=1)

Note that 'learn' and 'remember' are optional, so this framework can also be used for baselines!

You can log any custom metrics that your Agent/Env gives you and even choose how to aggregate them through different timescales. See the metric codes for more details.

metrics=[
     ('reward~env-rwd', {'steps': 'sum', 'episode': 'sum'}),
     ('handled_reward~reward', {'steps': 'sum', 'episode': 'sum'}),
     'value_loss~vloss',
     'actor_loss~aloss',
     'exploration~exp'
 ]

pg.fit(2000, verbose=1, metrics=metrics)

The Playground will allow you to have clean logs adapted to your will with the verbose parameter:

  • Verbose 1 (episodes cycles) - If your environment makes a lot of quick episodes.
  • Verbose 2 (episode) - To log each individual episode.
  • Verbose 3 (steps cycles) - If your environment makes a lot of quick steps but has long episodes.
  • Verbose 4 (step) - To log each individual step.
  • Verbose 5 (detailed step) - To debug each individual step (with observations, actions, …).

The Playground also allows you to add Callbacks with ease, for example the WandbCallback to get a nice experiment tracking dashboard using Weights & Biases!

Installation

Install Benchmarks by running:

pip install benchmarks

Documentation

See the latest complete documentation for more details.
See the development documentation to see what's coming!

Contribute

Support

If you are having issues, please contact us on Discord.

License

The project is licensed under the GNU LGPLv3 license.
See LICENCE, COPYING and COPYING.LESSER for more details.

Table Of Content

Benchmarks’s Core

Benchmarks is based on these core objects: Playground, Agent, and TurnEnv.

They are all linked by the Playground, as shown below:

(Diagram: the Playground links Agents and a TurnEnv. Each turn, the environment's turn(observation) selects the active agent, the agent's act(observation) produces an action, the environment's step(action) returns the next observation, reward, done and info, and each agent can remember(observation, action, reward, done, next_observation, info) and learn(). When done is True, the environment is reset(), and the agents_order can be changed with set_agents_order().)

Playground

class Playground(environement, agents, agents_order=None)

A playground is used to run interactions between an environment and agent(s).

env

Environment in which the agent(s) will play.

Type

gym.Env

agents

List of agents to play.

Type

list of benchmarks.Agent

A playground is used to run agent(s) on an environment.

Parameters
  • env – Environment in which the agent(s) will play.

  • agents (Union[Agent, List[Agent]]) – List of agents to play (can be only one agent).

run(episodes, render=True, render_mode='human', learn=True, steps_cycle_len=10, episodes_cycle_len=0.05, verbose=0, callbacks=None, logger=None, reward_handler=None, done_handler=None, **kwargs)

Let the agent(s) play on the environment for a number of episodes.

Additional arguments will be passed to the default logger.

Parameters
  • episodes (int) – Number of episodes to run.

  • render (bool) – If True, call the environment's render() every step.

  • render_mode (str) – Rendering mode. One of {'human', 'rgb_array', 'ansi'} (see the gym render documentation).

  • learn (bool) – If True, call Agent.learn() every step.

  • steps_cycle_len (int) – Number of steps that compose a cycle.

  • episodes_cycle_len – Number of episodes that compose a cycle. If between 0 and 1, this is understood as a proportion of the total number of episodes.

  • verbose (int) – The verbosity level: 0 (silent), 1 (cycle), 2 (episode), 3 (step_cycle), 4 (step), 5 (detailed step).

  • callbacks (Optional[List[Callback]]) – List of Callback to use in runs.

  • reward_handler (Union[Callable, RewardHandler, None]) – A callable to redefine the rewards of the environment.

  • done_handler (Union[Callable, DoneHandler, None]) – A callable to redefine when the environment is done.

  • logger (Optional[Callback]) – Logging callback to use. If None use the default Logger.
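
For example, a run call could look like the following sketch (reusing the playground pg built earlier; the values are only illustrative):

pg.run(
    episodes=500,
    render=False,            # do not render the environment at every step
    learn=True,              # call Agent.learn() every step
    episodes_cycle_len=0.1,  # aggregate logs over cycles of 10% of the episodes
    verbose=1,
)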

fit(episodes, **kwargs)

Train the agent(s) on the environment for a number of episodes.

test(episodes, **kwargs)

Test the agent(s) on the environment for a number of episodes.

set_agents_order(agents_order)

Change the agents_order.

This updates the order in which the agents play.

Parameters

agents_order (list) – New agents indices order. Default is range(n_agents).

Return type

list

Returns

The updated agents ordered indices list.
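
For instance, with two agents (the agent names are placeholders):

pg = rl.Playground(env, [agent_a, agent_b])  # agents get indices 0 and 1
pg.set_agents_order([1, 0])                  # agent_b will now play first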

Agent

class Agent

A general structure for any learning agent.

abstract act(observation, greedy=False)

How the Agent acts given an observation.

Parameters
  • observation – The observation given by the environment.

  • greedy (bool) – If True, act greedily (without exploration).

Return type

Union[int, float, ndarray]

learn()

How the Agent learns from its experiences.

Returns

The agent's learning logs (values must be numpy or plain Python types).

Return type

logs

remember(observation, action, reward, done, next_observation=None, info=None, **param)

How the Agent will remember experiences.

Often, the agent will use a perfect hash function to store observations efficiently.

Example

>>>  self.memory.remember(self.observation_encoder(observation),
...                       self.action_encoder(action),
...                       reward, done,
...                       self.observation_encoder(next_observation),
...                       info, **param)

TurnEnv

class TurnEnv

Turn based multi-agents gym environment.

A layer over the gym environment class able to handle turn based environments with multiple agents.

Note

A TurnEnv must be in a Playground in order to work!

On top of the basic gym API methods (see environment), the only addition in TurnEnv is the turn method:
  • step: take a step of the environment given the action of the active player
  • reset: reset the environment and return the first observation
  • render
  • close
  • seed

action_space

The Space object corresponding to actions.

Type

space

observation_space

The Space object corresponding to observations.

Type

space

abstract step(action)

Perform a step of the environment.

Parameters

action – The action taken by the agent whose turn was given by turn().

Returns

  • observation – The observation to give to the Agent.

  • reward (float) – The reward given to the Agent for this step.

  • done (bool) – True if the environment is done after this step.

  • info (dict) – Additional information given by the environment.

Return type

tuple

abstract turn(state)

Give the turn to the next agent to play.

Assuming that agents are represented by a list like range(n_player) where n_player is the number of players in the game.

Parameters

state – The state of the environment. It should be enough to determine which agent plays next.

Returns

The next player id

Return type

agent_id (int)

abstract reset()

Reset the environment and return the initial observation.

Returns

The observation for the first Agent to play

Return type

observation
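
As an illustration, here is a minimal TurnEnv sketch for a toy two-player game where players alternate marking cells of a small board (the class name and game logic are placeholders, and TurnEnv is assumed to be exposed at the package top level like Agent and Playground):

import numpy as np
from gym import spaces
import benchmarks as rl

class AlternatingBoardEnv(rl.TurnEnv):

   def __init__(self):
      super().__init__()
      self.observation_space = spaces.Box(low=0, high=2, shape=(9,))
      self.action_space = spaces.Discrete(9)
      self.board = np.zeros(9)

   def reset(self):
      """ Empty the board and return the first observation """
      self.board = np.zeros(9)
      return self.board

   def turn(self, state):
      """ Players 0 and 1 alternate: the active player is given by the number of filled cells """
      return int(np.sum(state != 0)) % 2

   def step(self, action):
      """ The active player marks a cell; the episode ends when the board is full """
      player = self.turn(self.board)
      self.board[action] = player + 1
      done = bool(np.all(self.board != 0))
      return self.board, 0.0, done, {}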

Handlers

RewardHandler
class RewardHandler

Helper to modify the rewards given by the environment.

You need to specify the method:
  • reward(self, observation, action, reward, done, info, next_observation) -> float

You can also define __init__ and reset() if you want to store anything.

abstract reward(observation, action, reward, done, info, next_observation, logs)

Replace the environment reward.

Often used to scale rewards or to do reward shaping.

Parameters
  • observation – Current observation.

  • action – Current action.

  • reward – Current reward.

  • done – Done flag given by the environment.

  • info – Additional information given by the environment.

  • next_observation – Next observation.

Return type

float

reset()

Reset the RewardHandler

Called automatically in Playground.run(). Useful only if variables are stored by the RewardHandler.
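
For example, a sketch of a handler that simply scales every reward (assuming RewardHandler is exposed at the package top level, and reusing the playground pg from the examples above):

import benchmarks as rl

class ScaledRewardHandler(rl.RewardHandler):

   def __init__(self, scale=0.01):
      self.scale = scale

   def reward(self, observation, action, reward, done, info, next_observation, logs=None):
      """ Scale the environment reward by a constant factor """
      return self.scale * reward

pg.run(1000, reward_handler=ScaledRewardHandler(), verbose=1)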

DoneHandler
class DoneHandler

Helper to modify the done given by the environment.

You need to specify the method:
  • done(self, observation, action, reward, done, info, next_observation) -> bool

You can also define __init__ and reset() if you want to store anything.

abstract done(observation, action, reward, done, info, next_observation, logs)

Replace the environment done.

Often used to make episodes shorter, for example when the agent is stuck.

Parameters
  • observation – Current observation.

  • action – Current action.

  • reward – Current reward.

  • done – Done flag given by the environment.

  • info – Additional information given by the environment.

  • next_observation – Next observation.

Return type

bool

reset()

Reset the DoneHandler

Called automatically in Playground.run(). Used only if variables are stored by the DoneHandler.
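
Similarly, a sketch of a handler that ends the episode on a condition read from info (the 'stuck' key is only a hypothetical example, and DoneHandler is assumed to be exposed at the package top level):

import benchmarks as rl

class StuckDoneHandler(rl.DoneHandler):

   def done(self, observation, action, reward, done, info, next_observation, logs=None):
      """ End the episode early if the environment reports being stuck """
      return bool(done or info.get('stuck', False))

pg.run(1000, done_handler=StuckDoneHandler(), verbose=1)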

Callbacks

Callback API

class Callback

An object to call functions while the Playground is running. You can define custom functions on_{position}, where position can be:

>>> run_begin
...     episodes_cycle_begin
...         episode_begin
...             steps_cycle_begin
...                 step_begin
...                 # env.step()
...                 step_end
...             steps_cycle_end
...         # done==True
...         episode_end
...     episodes_cycle_end
... run_end
set_params(params)

Sets run parameters

set_playground(playground)

Sets reference to the used playground

on_step_begin(step, logs=None)

Triggers on each step beginning

Parameters
  • step (int) – current step.

  • logs (Optional[dict]) – current logs.

on_step_end(step, logs=None)

Triggers on each step end

Parameters
  • step (int) – current step.

  • logs (Optional[dict]) – current logs.

on_steps_cycle_begin(step, logs=None)

Triggers on each step cycle beginning

Parameters
  • step (int) – current step.

  • logs (Optional[dict]) – current logs.

on_steps_cycle_end(step, logs=None)

Triggers on each step cycle end

Parameters
  • step (int) – current step.

  • logs (Optional[dict]) – current logs.

on_episode_begin(episode, logs=None)

Triggers on each episode beginning

Parameters
  • episode (int) – current episode.

  • logs (Optional[dict]) – current logs.

on_episode_end(episode, logs=None)

Triggers on each episode end

Parameters
  • episode (int) – current episode.

  • logs (Optional[dict]) – current logs.

on_episodes_cycle_begin(episode, logs=None)

Triggers on each episode cycle beginning

Parameters
  • episode (int) – current episode.

  • logs (Optional[dict]) – current logs.

on_episodes_cycle_end(episode, logs=None)

Triggers on each episode cycle end

Parameters
  • episode (int) – current episode.

  • logs (Optional[dict]) – current logs.

on_run_begin(logs=None)

Triggers on each run beginning

Parameters

logs (Optional[dict]) – current logs.

on_run_end(logs=None)

Triggers on run end

Parameters

logs (Optional[dict]) – current logs.
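
For example, a minimal custom Callback that prints a short summary at the end of every episode (assuming Callback is exposed at the package top level, and reusing the playground pg from the examples above):

import benchmarks as rl

class EpisodePrinter(rl.Callback):

   def on_episode_end(self, episode, logs=None):
      """ Print the logs gathered for this episode """
      print(f"Episode {episode} done, logs: {logs}")

pg.run(100, callbacks=[EpisodePrinter()], verbose=0)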

Logger

class Logger(metrics=None, detailed_step_metrics=None, episode_only_metrics=None, titles_on_top=True)

Default logger in every Playground run.

This will print relevant information to the console.

You can regulate the flow of information with the verbose argument passed directly to run():

  • 0 is silent (nothing will be printed)

  • 1 is cycles of episodes (aggregated metrics over multiple episodes)

  • 2 is every episode (aggregated metrics over all steps)

  • 3 is cycles of steps (aggregated metrics over some steps)

  • 4 is every step

  • 5 is every step detailed (all metrics of all steps)

You can also replace it with your own Logger, using the logger argument of run().

To build your own logger, you have to choose which metrics will be displayed and how metrics will be aggregated over steps/episodes/cycles. To do that, see the Metric codes format.

Default logger in every Playground run.

Parameters
  • metrics (Optional[List[Union[str, tuple]]]) – Metrics to display and how to aggregate them.

  • detailed_step_metrics (Optional[List[str]]) – Metrics to display only on detailed steps.

  • episode_only_metrics (Optional[List[str]]) – Metrics to display only on episodes.

  • titles_on_top (bool) – If True, titles will be displayed on top instead of on every line.
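
For example, a sketch of a customized Logger passed to a run (assuming Logger is exposed at the package top level, and reusing the playground pg from the examples above):

import benchmarks as rl

logger = rl.Logger(
    metrics=['reward~rwd.sum', 'loss'],  # which metrics to display and how to aggregate them
    titles_on_top=False,                 # repeat titles on every line instead of once on top
)
pg.run(2000, verbose=2, logger=logger)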

on_step_begin(step, logs=None)

Triggers on each step beginning

Parameters
  • step – current step.

  • logs – current logs.

on_step_end(step, logs=None)

Triggers on each step end

Parameters
  • step – current step.

  • logs – current logs.

on_steps_cycle_begin(step, logs=None)

Triggers on each step cycle beginning

Parameters
  • step – current step.

  • logs – current logs.

on_steps_cycle_end(step, logs=None)

Triggers on each step cycle end

Parameters
  • step – current step.

  • logs – current logs.

on_episode_begin(episode, logs=None)

Triggers on each episode beginning

Parameters
  • episode – current episode.

  • logs – current logs.

on_episode_end(episode, logs=None)

Triggers on each episode end

Parameters
  • episode – current episode.

  • logs – current logs.

on_episodes_cycle_begin(episode, logs=None)

Triggers on each episode cycle beginning

Parameters
  • episode – current episode.

  • logs – current logs.

on_episodes_cycle_end(episode, logs=None)

Triggers on each episode cycle end

Parameters
  • episode – current episode.

  • logs – current logs.

on_run_begin(logs=None)

Triggers on each run beginning

Parameters

logs – current logs.

Metric codes

To fully represent a metric and how to aggregate it, we use metric codes as such:

<logs_key>~<display_name>.<aggregator_function>

Where logs_key is the metric key in the logs.

display_name is the optional name that will be displayed in the console.
If not specified, the logs_key will be displayed.

Finally, aggregator_function is one of { avg, sum, last }:

  • avg computes the average of the metric while aggregating. (default)

  • sum computes the sum of the metric while aggregating.

  • last only shows the last value of the metric.

Examples
  • reward~rwd.sum will aggregate the sum of rewards and display Rwd

  • loss will show the average loss as Loss (no display name specified)

  • dt_step~ will show the average step time with no display name (the display name is '')

  • exploration~exp.last will show the last exploration value as Exp
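
For example, a metrics list built from such codes can be passed directly to a run (reusing the playground pg from the examples above):

metrics = [
    'reward~rwd.sum',        # sum of rewards, displayed as Rwd
    'loss',                  # average loss, displayed as Loss
    'exploration~exp.last',  # last exploration value, displayed as Exp
]
pg.fit(2000, verbose=1, metrics=metrics)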