Benchmarks' Core¶
Benchmarks is based on these core objects: Playground, Agent, and TurnEnv.
They are all linked together by the Playground.
Playground¶
- class Playground(environement, agents, agents_order=None)¶
A playground is used to run interactions between an environment and agent(s).
- env¶
Environment in which the agent(s) will play.
- Type
gym.Env
- run(episodes, render=True, render_mode='human', learn=True, steps_cycle_len=10, episodes_cycle_len=0.05, verbose=0, callbacks=None, logger=None, reward_handler=None, done_handler=None, **kwargs)¶
Let the agent(s) play on the environment for a number of episodes.
Additional arguments will be passed to the default logger.
- Parameters
episodes (int) – Number of episodes to run.
render (bool) – If True, call |gym.render| every step.
render_mode (str) – Rendering mode. One of {‘human’, ‘rgb_array’, ‘ansi’} (see |gym.render|).
learn (bool) – If True, call Agent.learn() every step.
steps_cycle_len (int) – Number of steps that compose a cycle.
episodes_cycle_len – Number of episodes that compose a cycle. If between 0 and 1, this is understood as a proportion.
verbose (int) – The verbosity level: 0 (silent), 1 (cycle), 2 (episode), 3 (step_cycle), 4 (step), 5 (detailed step).
callbacks (Optional[List[Callback]]) – List of Callback to use in runs.
reward_handler (Union[Callable, RewardHandler, None]) – A callable to redefine the rewards of the environment.
done_handler (Union[Callable, DoneHandler, None]) – A callable to redefine the end of the environment.
logger (Optional[Callback]) – Logging callback to use. If None, use the default Logger.
- fit(episodes, **kwargs)¶
Train the agent(s) on the environment for a number of episodes.
- test(episodes, **kwargs)¶
Test the agent(s) on the environment for a number of episodes.
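The control flow of run() can be sketched as follows. This is a simplified, self-contained illustration under assumed names: CountEnv, RandomAgent, and the local run function are minimal stand-ins, not the library's actual classes.

```python
# Minimal sketch of the Playground.run() control flow
# (illustrative stand-ins, not the actual library classes).

class CountEnv:
    """Toy gym-like environment that ends after 3 steps."""
    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= 3
        return self.t, 1.0, done, {}

class RandomAgent:
    """Toy agent that always acts 0 and counts learning calls."""
    def __init__(self):
        self.learn_calls = 0

    def act(self, observation):
        return 0

    def learn(self):
        self.learn_calls += 1

def run(env, agent, episodes, learn=True):
    """Simplified episode loop, analogous to Playground.run()."""
    total_reward = 0.0
    for _ in range(episodes):
        observation, done = env.reset(), False
        while not done:
            action = agent.act(observation)
            observation, reward, done, info = env.step(action)
            total_reward += reward
            if learn:
                agent.learn()  # skipped when learn=False, as in test()
    return total_reward

env, agent = CountEnv(), RandomAgent()
print(run(env, agent, episodes=2))  # 6.0: 2 episodes of 3 steps, reward 1.0 each
```

Under this reading, fit() is run() with learn=True and test() is run() with learn=False.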
Agent¶
- class Agent¶
A general structure for any learning agent.
- learn()¶
How the Agent learns from its experiences.
- Returns
The agent's learning logs (must be numpy or python types).
- Return type
logs
- remember(observation, action, reward, done, next_observation=None, info=None, **param)¶
How the Agent will remember experiences.
Often, the agent will use a perfect hash function to store observations efficiently.
Example
>>> self.memory.remember(self.observation_encoder(observation),
...                      self.action_encoder(action),
...                      reward, done,
...                      self.observation_encoder(next_observation),
...                      info, **param)
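A minimal agent along these lines could look as follows. This is a self-contained sketch: MemoryAgent, its hash-based encoders, and the list-backed memory are assumptions for illustration, not the library's actual implementation.

```python
# Sketch of a minimal agent implementing remember() and learn()
# (hypothetical names; encoders and memory layout are assumptions).

class MemoryAgent:
    def __init__(self):
        self.memory = []  # list of encoded transition tuples

    def observation_encoder(self, observation):
        # Hypothetical encoder: hash observations into compact keys.
        return hash(observation)

    def action_encoder(self, action):
        return hash(action)

    def remember(self, observation, action, reward, done,
                 next_observation=None, info=None, **param):
        # Store the encoded transition, as in the Example above.
        self.memory.append((self.observation_encoder(observation),
                            self.action_encoder(action),
                            reward, done,
                            self.observation_encoder(next_observation),
                            info))

    def learn(self):
        # Toy learning step: return plain-python logs.
        return {'transitions_seen': len(self.memory)}

agent = MemoryAgent()
agent.remember(observation=0, action=1, reward=1.0, done=False,
               next_observation=2)
print(agent.learn())  # {'transitions_seen': 1}
```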
TurnEnv¶
- class TurnEnv¶
Turn based multi-agents gym environment.
A layer over the gym environment class able to handle turn based environments with multiple agents.
Note
A TurnEnv must be in a Playground in order to work!
On top of the main basic API methods (see environment), the only addition in TurnEnv is the method turn:
* step: take a step of the environment given the action of the active player
* reset: reset the environment and return the first observation
* render
* close
* seed
- abstract step(action)¶
Perform a step of the environment.
- Parameters
action – The action taken by the agent whose turn was given by turn().
- Returns
observation – The observation to give to the Agent.
reward (float) – The reward given to the Agent for this step.
done (bool) – True if the environment is done after this step.
info (dict) – Additional information given by the environment.
- abstract turn(state)¶
Give the turn to the next agent to play.
Agents are assumed to be represented by a list like range(n_player), where n_player is the number of players in the game.
- Parameters
state – The state of the environment. Should be enough to determine which agent plays next.
- Returns
The next player id
- Return type
agent_id (int)
- abstract reset()¶
Reset the environment and return the initial state.
- Returns
The observation for the first Agent to play.
- Return type
observation
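The three abstract methods above can be sketched with a toy two-player game. This is a self-contained stand-in under assumed names (TwoPlayerCountEnv is hypothetical, and the real TurnEnv inherits from the gym environment class and runs inside a Playground):

```python
# Sketch of a minimal turn-based environment implementing step/turn/reset
# (self-contained stand-in, not a real TurnEnv subclass).

class TwoPlayerCountEnv:
    """Two players alternate adding to a shared counter; reaching 5 ends the game."""
    n_players = 2

    def reset(self):
        # Reset the environment and return the observation
        # for the first Agent to play.
        self.total = 0
        self.next_player = 0
        return self.total

    def turn(self, state):
        # Give the turn to the next agent, as a player id in range(n_players).
        return self.next_player

    def step(self, action):
        # Take a step given the action of the active player,
        # then pass the turn to the other player.
        self.total += action
        done = self.total >= 5
        reward = 1.0 if done else 0.0
        self.next_player = (self.next_player + 1) % self.n_players
        return self.total, reward, done, {}

env = TwoPlayerCountEnv()
obs = env.reset()
assert env.turn(obs) == 0               # player 0 starts
obs, reward, done, info = env.step(3)   # player 0 plays
assert env.turn(obs) == 1               # now player 1's turn
obs, reward, done, info = env.step(2)   # player 1 plays and ends the game
print(done, reward)  # True 1.0
```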
Handlers¶
RewardHandler¶
- class RewardHandler¶
Helper to modify the rewards given by the environment.
- You need to specify the method:
reward(self, observation, action, reward, done, info, next_observation) -> float
You can also define __init__ and reset() if you want to store anything.
- abstract reward(observation, action, reward, done, info, next_observation, logs)¶
Replace the environment reward.
Often used to scale rewards or to do reward shaping.
- Parameters
observation – Current observation.
action – Current action.
reward – Current reward.
done – The done flag given by the environment.
info – Additional information given by the environment.
next_observation – Next observation.
- Return type
float
- reset()¶
Reset the RewardHandler.
Called automatically in Playground.run(). Useful only if variables are stored by the RewardHandler.
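A reward handler following this interface might look as follows. This is a self-contained sketch: ScaledRewardHandler is a hypothetical name, it mirrors the reward(self, observation, action, reward, done, info, next_observation) -> float signature above rather than subclassing the actual base class.

```python
# Sketch of a reward handler that scales rewards
# (stand-in mirroring the RewardHandler interface, not the library class).

class ScaledRewardHandler:
    def __init__(self, scale):
        self.scale = scale
        self.cumulated = 0.0  # stored variable, hence the need for reset()

    def reward(self, observation, action, reward, done, info, next_observation):
        # Replace the environment reward by a scaled version.
        scaled = self.scale * reward
        self.cumulated += scaled
        return scaled

    def reset(self):
        # Called automatically in Playground.run(); clears stored variables.
        self.cumulated = 0.0

handler = ScaledRewardHandler(scale=0.1)
print(handler.reward(None, None, 10.0, False, {}, None))  # 1.0
handler.reset()
print(handler.cumulated)  # 0.0
```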
DoneHandler¶
- class DoneHandler¶
Helper to modify the done given by the environment.
- You need to specify the method:
done(self, observation, action, reward, done, info, next_observation) -> bool
You can also define __init__ and reset() if you want to store anything.
- abstract done(observation, action, reward, done, info, next_observation, logs)¶
Replace the environment done.
Often used, for example, to make episodes shorter when the agent is stuck.
- Parameters
observation – Current observation.
action – Current action.
reward – Current reward.
done – The done flag given by the environment.
info – Additional information given by the environment.
next_observation – Next observation.
- Return type
bool
- reset()¶
Reset the DoneHandler.
Called automatically in Playground.run(). Used only if variables are stored by the DoneHandler.
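A done handler of this shape could cap episode length as follows. This is a self-contained sketch: MaxStepsDoneHandler is a hypothetical name, it mirrors the done(self, observation, action, reward, done, info, next_observation) -> bool signature above rather than subclassing the actual base class.

```python
# Sketch of a done handler that caps episode length
# (stand-in mirroring the DoneHandler interface, not the library class).

class MaxStepsDoneHandler:
    def __init__(self, max_steps):
        self.max_steps = max_steps
        self.steps = 0  # stored variable, hence the need for reset()

    def done(self, observation, action, reward, done, info, next_observation):
        # End the episode early once max_steps is reached.
        self.steps += 1
        return bool(done) or self.steps >= self.max_steps

    def reset(self):
        # Called automatically in Playground.run(); clears the step counter.
        self.steps = 0

handler = MaxStepsDoneHandler(max_steps=2)
print(handler.done(None, None, 0.0, False, {}, None))  # False
print(handler.done(None, None, 0.0, False, {}, None))  # True
```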