Benchmarks¶
Benchmarks is a tool to monitor and log reinforcement learning experiments. You build or find any compatible agent (it only needs an act method), you build or find a gym environment, and Benchmarks makes them interact together! Benchmarks also contains both TensorBoard and Weights & Biases integrations for beautiful and shareable experiment tracking! Benchmarks is also cross-platform compatible! That is why no agents are built into Benchmarks itself.
You can build and run your own Agent in a clear and shareable manner!
import benchmarks as rl
import gym

class MyAgent(rl.Agent):

    def act(self, observation, greedy=False):
        """How the Agent acts given an observation."""
        ...
        return action

    def learn(self):
        """How the Agent learns from its experiences."""
        ...
        return logs

    def remember(self, observation, action, reward, done, next_observation=None, info={}, **param):
        """How the Agent remembers experiences."""
        ...

env = gym.make('FrozenLake-v0', is_slippery=True)  # This could be any gym-like Environment!
agent = MyAgent(env.observation_space, env.action_space)

pg = rl.Playground(env, agent)
pg.fit(2000, verbose=1)
Note that 'learn' and 'remember' are optional, so this framework can also be used for baselines!
You can log any custom metrics that your Agent/Env gives you, and even choose how to aggregate them over different timescales. See the metric codes for more details.
metrics=[
('reward~env-rwd', {'steps': 'sum', 'episode': 'sum'}),
('handled_reward~reward', {'steps': 'sum', 'episode': 'sum'}),
'value_loss~vloss',
'actor_loss~aloss',
'exploration~exp'
]
pg.fit(2000, verbose=1, metrics=metrics)
The Playground will give you clean logs adapted to your needs with the verbose parameter:
- Verbose 1 (episodes cycles) - If your environment makes a lot of quick episodes.
- Verbose 2 (episode) - To log each individual episode.
- Verbose 3 (steps cycles) - If your environment makes a lot of quick steps but has long episodes.
- Verbose 4 (step) - To log each individual step.
- Verbose 5 (detailed step) - To debug each individual step (with observations, actions, …).
The Playground also allows you to add Callbacks with ease, for example the WandbCallback to get a nice experiment-tracking dashboard using Weights & Biases!
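For instance, a minimal sketch (the import path and the run_kwargs argument of WandbCallback are assumptions here; check your installed version):
from benchmarks.callbacks import WandbCallback  # hypothetical import path

wandb_callback = WandbCallback(run_kwargs={'project': 'my-project', 'name': 'my-run'})
pg.fit(2000, verbose=1, metrics=metrics, callbacks=[wandb_callback])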
Installation¶
Install Benchmarks by running:
pip install benchmarks
Documentation¶
Contribute¶
Support¶
If you are having issues, please contact us on Discord.
License¶
Table Of Contents¶
Benchmarks’s Core¶
Benchmarks is based on these core objects: Playground, Agent, and TurnEnv.
They are all linked together by the Playground.
Playground¶
- class Playground(environement, agents, agents_order=None)¶
A playground is used to run interactions between an environment and agent(s).
- env¶
Environment in which the agent(s) will play.
- Type
gym.Env
- run(episodes, render=True, render_mode='human', learn=True, steps_cycle_len=10, episodes_cycle_len=0.05, verbose=0, callbacks=None, logger=None, reward_handler=None, done_handler=None, **kwargs)¶
Let the agent(s) play on the environment for a number of episodes.
Additional arguments will be passed to the default logger.
- Parameters
  - episodes (int) – Number of episodes to run.
  - render (bool) – If True, call the environment's render every step.
  - render_mode (str) – Rendering mode. One of {'human', 'rgb_array', 'ansi'} (see gym.Env.render).
  - learn (bool) – If True, call Agent.learn() every step.
  - steps_cycle_len (int) – Number of steps that compose a cycle.
  - episodes_cycle_len – Number of episodes that compose a cycle. If between 0 and 1, this is understood as a proportion of the total number of episodes.
  - verbose (int) – The verbosity level: 0 (silent), 1 (episodes cycle), 2 (episode), 3 (steps cycle), 4 (step), 5 (detailed step).
  - callbacks (Optional[List[Callback]]) – List of Callback to use in runs.
  - reward_handler (Union[Callable, RewardHandler, None]) – A callable to redefine the rewards of the environment.
  - done_handler (Union[Callable, DoneHandler, None]) – A callable to redefine the end (done) of the environment.
  - logger (Optional[Callback]) – Logging callback to use. If None, use the default Logger.
- fit(episodes, **kwargs)¶
Train the agent(s) on the environment for a number of episodes.
- test(episodes, **kwargs)¶
Test the agent(s) on the environment for a number of episodes.
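For example, a typical workflow (reusing the playground built in the quickstart above; keyword arguments are forwarded to run()):
pg.fit(2000, verbose=1)   # train the agent(s) on the environment
pg.test(100, verbose=2)   # evaluate the agent(s), logging every episode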
Agent¶
- class Agent¶
A general structure for any learning agent.
- learn()¶
How the Agent learns from its experiences.
- Returns
The agent learning logs (must be numpy or python types).
- Return type
logs
- remember(observation, action, reward, done, next_observation=None, info=None, **param)¶
How the Agent will remember experiences.
Often, the agent will use a perfect hash function to store observations efficiently.
Example
>>> self.memory.remember(self.observation_encoder(observation),
...                      self.action_encoder(action),
...                      reward, done,
...                      self.observation_encoder(next_observation),
...                      info, **param)
TurnEnv¶
- class TurnEnv¶
Turn-based multi-agent gym environment.
A layer over the gym environment class able to handle turn-based environments with multiple agents.
Note
A TurnEnv must be inside a Playground in order to work!
The only addition in TurnEnv is the method turn, on top of the basic API methods (see environment):
- step: take a step of the environment given the action of the active player
- reset: reset the environment and return the first observation
- render
- close
- seed
- abstract step(action)¶
Perform a step of the environment.
- Parameters
action – The action taken by the agent whose turn was given by turn().
- Returns
  - observation – The observation to give to the Agent.
  - reward (float) – The reward given to the Agent for this step.
  - done (bool) – True if the environment is done after this step.
  - info (dict) – Additional information given by the environment.
- abstract turn(state)¶
Give the turn to the next agent to play.
Assuming that agents are represented by a list like range(n_player) where n_player is the number of players in the game.
- Parameters
state – The state of the environment. It should be enough to determine which agent plays next.
- Returns
The next player id
- Return type
agent_id (int)
- abstract reset()¶
Reset the environment and return the initial state.
- Returns
The observation for the first Agent to play.
- Return type
observation
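As an illustration, here is a minimal sketch of a two-player TurnEnv (the game, spaces, and reward scheme are made up for the example; check your version for the exact attributes Playground expects in multi-agent runs):
import benchmarks as rl
from gym import spaces

class CountToTenEnv(rl.TurnEnv):
    """Toy two-player game: players alternately add 1 or 2 to a counter,
    and the player reaching 10 first wins."""

    def __init__(self):
        self.observation_space = spaces.Discrete(11)
        self.action_space = spaces.Discrete(2)  # 0 -> add 1, 1 -> add 2
        self.counter = 0
        self.current_player = 0

    def turn(self, state):
        # Index of the next agent to play in the agents list
        return self.current_player

    def reset(self):
        self.counter = 0
        self.current_player = 0
        return self.counter  # observation for the first Agent to play

    def step(self, action):
        self.counter += int(action) + 1
        done = self.counter >= 10
        reward = 1 if done else 0  # reward for the agent that just played
        self.current_player = (self.current_player + 1) % 2
        return min(self.counter, 10), reward, done, {}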
Handlers¶
RewardHandler¶
- class RewardHandler¶
Helper to modify the rewards given by the environment.
- You need to specify the method:
reward(self, observation, action, reward, done, info, next_observation) -> float
You can also define __init__ and reset() if you want to store anything.
- abstract reward(observation, action, reward, done, info, next_observation, logs)¶
Replace the environment reward.
Often used to scale rewards or to do reward shaping.
- Parameters
observation – Current observation.
action – Current action.
reward – Current reward.
done – done given by the environment.
info – Additional information given by the environment.
next_observation – Next observation.
- Return type
float
- reset()¶
Reset the RewardHandler
Called automatically in Playground.run(). Useful only if variables are stored by the RewardHandler.
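For example, a minimal sketch of a handler that scales every reward (the class name and scale factor are illustrative, and we assume RewardHandler is exposed at the package root; the reward signature follows the abstract method above):
import benchmarks as rl

class ScaledRewardHandler(rl.RewardHandler):
    """Scale every environment reward by a constant factor."""

    def __init__(self, scale=0.01):
        self.scale = scale

    def reward(self, observation, action, reward, done, info, next_observation, logs):
        return self.scale * reward

pg.run(500, reward_handler=ScaledRewardHandler(scale=0.01))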
DoneHandler¶
- class DoneHandler¶
Helper to modify the done given by the environment.
- You need to specify the method:
done(self, observation, action, reward, done, info, next_observation) -> bool
You can also define __init__ and reset() if you want to store anything.
- abstract done(observation, action, reward, done, info, next_observation, logs)¶
Replace the environment done.
Often used to make episodes shorter, for example when the agent is stuck.
- Parameters
observation – Current observation.
action – Current action.
reward – Current reward.
done – done given by the environment.
info – Additional information given by the environment.
next_observation – Next observation.
- Return type
bool
- reset()¶
Reset the DoneHandler
Called automatically in Playground.run(). Used only if variables are stored by the DoneHandler.
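Similarly, a minimal sketch of a handler that cuts episodes after a fixed number of steps (the class name and step limit are illustrative, we assume DoneHandler is exposed at the package root, and the sketch assumes reset() is called at the start of every episode; check the library's behavior):
import benchmarks as rl

class TimeLimitDoneHandler(rl.DoneHandler):
    """End the episode once max_steps steps have been taken."""

    def __init__(self, max_steps=200):
        self.max_steps = max_steps
        self.steps = 0

    def done(self, observation, action, reward, done, info, next_observation, logs):
        self.steps += 1
        return done or self.steps >= self.max_steps

    def reset(self):
        # Called automatically in Playground.run()
        self.steps = 0

pg.run(500, done_handler=TimeLimitDoneHandler(max_steps=200))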
Callbacks¶
Callback API¶
- class Callback¶
An object to call functions while the
Playground
is running. You can define the custom functions on_{position} where position can be :>>> run_begin ... episodes_cycle_begin ... episode_begin ... steps_cycle_begin ... step_begin ... # env.step() ... step_end ... steps_cycle_end ... # done==True ... episode_end ... episodes_cycle_end ... run_end
- set_params(params)¶
Sets run parameters
- set_playground(playground)¶
Sets reference to the used playground
- on_step_begin(step, logs=None)¶
Triggers on each step beginning
- on_step_end(step, logs=None)¶
Triggers on each step end
- on_steps_cycle_begin(step, logs=None)¶
Triggers on each step cycle beginning
- on_steps_cycle_end(step, logs=None)¶
Triggers on each step cycle end
- on_episode_begin(episode, logs=None)¶
Triggers on each episode beginning
- on_episode_end(episode, logs=None)¶
Triggers on each episode end
- on_episodes_cycle_begin(episode, logs=None)¶
Triggers on each episode cycle beginning
- on_episodes_cycle_end(episode, logs=None)¶
Triggers on each episode cycle end
- on_run_begin(logs=None)¶
Triggers on each run beginning
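For example, a minimal sketch of a custom callback that prints the aggregated logs at the end of each episodes cycle (assuming Callback is exposed at the package root; the class name is illustrative):
import benchmarks as rl

class PrintCycleCallback(rl.Callback):
    """Print the aggregated logs at the end of every episodes cycle."""

    def on_episodes_cycle_end(self, episode, logs=None):
        print(f"End of episodes cycle at episode {episode}: {logs}")

pg.fit(2000, callbacks=[PrintCycleCallback()])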
Logger¶
- class Logger(metrics=None, detailed_step_metrics=None, episode_only_metrics=None, titles_on_top=True)¶
Default logger in every Playground run. This will print relevant information to the console.
You can regulate the flow of information with the argument verbose in run() directly:
- 0 is silent (nothing will be printed)
- 1 is cycles of episodes (aggregated metrics over multiple episodes)
- 2 is every episode (aggregated metrics over all steps)
- 3 is cycles of steps (aggregated metrics over some steps)
- 4 is every step
- 5 is every step detailed (all metrics of all steps)
You can also replace it with your own Logger, using the argument logger in run().
To build your own logger, you have to choose which metrics will be displayed and how metrics will be aggregated over steps/episodes/cycles. To do that, see the Metric codes format.
- Parameters
  - metrics (Optional[List[Union[str, tuple]]]) – Metrics to display and how to aggregate them.
  - detailed_step_metrics (Optional[List[str]]) – Metrics to display only on detailed steps.
  - episode_only_metrics (Optional[List[str]]) – Metrics to display only on episodes.
  - titles_on_top (bool) – If True, titles will be displayed on top and not at every line.
- on_step_begin(step, logs=None)¶
Triggers on each step beginning
- Parameters
step – current step.
logs – current logs.
- on_step_end(step, logs=None)¶
Triggers on each step end
- Parameters
step – current step.
logs – current logs.
- on_steps_cycle_begin(step, logs=None)¶
Triggers on each step cycle beginning
- Parameters
step – current step.
logs – current logs.
- on_steps_cycle_end(step, logs=None)¶
Triggers on each step cycle end
- Parameters
step – current step.
logs – current logs.
- on_episode_begin(episode, logs=None)¶
Triggers on each episode beginning
- Parameters
episode – current episode.
logs – current logs.
- on_episode_end(episode, logs=None)¶
Triggers on each episode end
- Parameters
episode – current episode.
logs – current logs.
- on_episodes_cycle_begin(episode, logs=None)¶
Triggers on each episode cycle beginning
- Parameters
episode – current episode.
logs – current logs.
- on_episodes_cycle_end(episode, logs=None)¶
Triggers on each episode cycle end
- Parameters
episode – current episode.
logs – current logs.
- on_run_begin(logs=None)¶
Triggers on each run beginning
- Parameters
logs – current logs.
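For instance, a minimal sketch of a custom logger built from metric codes (see the next section) and passed to run() (assuming Logger is exposed at the package root; the metric names are examples):
import benchmarks as rl

my_logger = rl.Logger(
    metrics=[
        ('reward~rwd', {'steps': 'sum', 'episode': 'sum'}),  # sum of rewards, displayed as Rwd
        'loss',                                              # average loss, displayed as Loss
        'exploration~exp.last',                              # last exploration value, displayed as Exp
    ],
    titles_on_top=False,
)
pg.run(500, verbose=2, logger=my_logger)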
Metric codes¶
To fully represent a metric and how to aggregate it, we use metric codes as such:
<logs_key>~<display_name>.<aggregator_function>
Where logs_key is the metric key in the logs and display_name is the optional name that will be displayed in the console (if not specified, the logs_key itself is displayed).
Finally, aggregator_function is one of {avg, sum, last}:
- avg computes the average of the metric while aggregating (default).
- sum computes the sum of the metric while aggregating.
- last only shows the last value of the metric.
Examples¶
- reward~rwd.sum will aggregate the sum of rewards and display it as Rwd.
- loss will show the average loss as Loss (no surname).
- dt_step~ will show the average step time with no name (surname is '').
- exploration~exp.last will show the last exploration value as Exp.