

@Manuscrit (Collaborator)

Remove the use of lock_replay during training (it must not be used in LTFT).
Create the submodule marltoolbox.utils.log.
Move the methods that summarize a model into a helper class.
Use before_init_loss instead of after_init (policy class factory argument).

Maxime Riché added 15 commits April 15, 2021 15:28
Remove the use of lock_replay during training (it must not be used in LTFT).
Create the submodule marltoolbox.utils.log.
Move the methods that summarize a model into a helper class.
Use before_init_loss instead of after_init (policy class factory argument).
Fix some tests.
Add augmented R2D2.
Add examples with R2D2.
Add some end-to-end tests for amTFT vs. exploiter, the meta game, and R2D2.
Fix a speed issue in the entropy computation.
Refactor some configs and hyperparameters (DQN, R2D2, LOLA-PG).
Tune hyperparameters for R2D2.
A few corrections.
A few style changes.
Partially refactor the coin game environment tests.
Add logging & plot of exploration temperature.
…than 2.

Add a rolling average for the LOLA-PG reward centering and normalization.
- punishment helped in CGs
- customizable matrix game
- coop coins log in vectorized MCPCG
Add the "punishment helped" option in vectorized_ssd_mm_coin_game.py.
Add new default plots in the cross-play and self-play evaluation.
Add a script to plot the bar-chart summary figure.
…ns (instead of 2 or 3 for LOLA-Exact and instead of 2 for SOS-Exact)