Feat: sebulba ff_ippo #1088

Louay-Ben-nessir · 2024-07-10T14:27:02Z

What?

Implemented the ff_ippo system for sebulba

Why?

To integrate the Sebulba architecture, which is effective in scenarios involving non-jitted/non-JAX environments.

How?

By implementing the ff_ippo system.

Extra

The other PPO systems will be added once this PR is merged.
This PR was built on top of #1080

RuanJohn

Thanks for all the work on this! It is looking very good. This is a first pass with some general comments and questions.
I will review the system run file later today.

mava/configs/arch/sebulba.yaml

mava/evaluator.py

RuanJohn · 2024-11-04T15:20:45Z

mava/evaluator.py

+        def _episode(key: PRNGKey) -> Tuple[PRNGKey, Metrics]:
+            """Simulates `num_envs` episodes."""
+
+            seeds = np_rng.integers(np.iinfo(np.int32).max, size=n_parallel_envs).tolist()


Perhaps a comment about this above? This is not very clear to me.
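One possible reading of the line in question, as a minimal sketch: it draws one non-negative seed below the int32 maximum for each parallel environment, presumably so that each evaluation environment can be seeded independently. The sketch assumes np_rng is a NumPy Generator (the .integers call suggests this); the generator seed 0 and n_parallel_envs = 4 are illustrative values, not taken from the PR.

```python
import numpy as np

# Assumption: np_rng is a numpy.random.Generator; the seed 0 is illustrative.
np_rng = np.random.default_rng(0)
n_parallel_envs = 4  # illustrative value, not from the PR

# With a single positional bound, Generator.integers samples from [0, bound).
max_seed = np.iinfo(np.int32).max
seeds = np_rng.integers(max_seed, size=n_parallel_envs).tolist()

# `seeds` is a plain Python list holding one int per parallel environment.
print(len(seeds), all(isinstance(s, int) for s in seeds))
```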

mava/configs/arch/sebulba.yaml

mava/wrappers/gym.py

RuanJohn · 2024-11-05T08:14:02Z

mava/wrappers/gym.py

+        metrics = {
+            "episode_return": self.running_count_episode_return,
+            "episode_length": self.running_count_episode_length,
+            "is_terminal_step": True,


Should this not be False?

Correct me if I'm wrong, but I think this works like the auto-reset wrapper. When is_terminal_step = True, it means a new episode started on that step, so it includes the first observation of the new episode along with the final metrics from the previous episode.

I think @sash-a is the best person for this question.

Good spot Ruan! We do it slightly differently in the anakin version, but it is essentially the same here.

Here the running counts are only reset after the metrics dict is created, so marking this as a terminal step with the previous step's metrics will work fine. In anakin we always set it to False in reset; otherwise we set it to the done flag. @Louay-Ben-nessir can we change it to do that for consistency 🙏 See here for a reference

updated! I'd appreciate another pair of eyes on it 👀
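The behaviour agreed on in this thread can be sketched roughly as follows. This is not the wrapper from the PR: EpisodeMetricsTracker is a hypothetical stand-in that takes a scalar reward per step, and it only illustrates the two points discussed, namely that reset always reports is_terminal_step as False and that the running counts are cleared only after the metrics dict has been built.

```python
import numpy as np


class EpisodeMetricsTracker:
    """Hypothetical stand-in for the metrics logic discussed in this thread."""

    def __init__(self) -> None:
        self.running_count_episode_return = 0.0
        self.running_count_episode_length = 0

    def reset(self) -> dict:
        self.running_count_episode_return = 0.0
        self.running_count_episode_length = 0
        # A reset never counts as a terminal step.
        return {"episode_return": 0.0, "episode_length": 0, "is_terminal_step": False}

    def step(self, reward: float, terminated, truncated) -> dict:
        # The episode is treated as done only when every agent is done.
        done = np.logical_or(terminated, truncated).all().item()
        self.running_count_episode_return += reward
        self.running_count_episode_length += 1
        metrics = {
            "episode_return": self.running_count_episode_return,
            "episode_length": self.running_count_episode_length,
            "is_terminal_step": done,
        }
        # Clear the counts only after the metrics dict has been created,
        # so the terminal step still carries the finished episode's totals.
        if done:
            self.running_count_episode_return = 0.0
            self.running_count_episode_length = 0
        return metrics


tracker = EpisodeMetricsTracker()
first = tracker.reset()
mid = tracker.step(1.0, [False, False], [False, False])
last = tracker.step(2.0, [True, True], [False, False])
after = tracker.step(0.5, [False, False], [False, False])
```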

mava/wrappers/gym.py

RuanJohn · 2024-11-05T08:16:54Z

mava/wrappers/gym.py

+                    observation = None
+                pipe.send(((observation, info), True))
+            elif command == "step":
+                # Modified the step function to align with 'AutoResetWrapper'.


Somewhere, it would be nice to document that the Sebulba systems don't distinguish between termination and truncation, but instead reset the episode on either.

Maybe we should support it from the start?

Added a comment on top of the async worker code for visibility, since that's the first place where termination and truncation get mixed.
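For readers unfamiliar with the auto-reset convention referred to here, a minimal single-environment sketch is given below. It is not the worker code from the PR: CountdownEnv and step_with_auto_reset are hypothetical names, and the environment only mimics the Gymnasium-style reset/step return shapes. The point it illustrates is that termination and truncation are handled identically: on either, the environment is reset and the first observation of the new episode is returned in place of the final one.

```python
class CountdownEnv:
    """Hypothetical toy environment that terminates after `horizon` steps."""

    def __init__(self, horizon: int = 3) -> None:
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        terminated = self.t >= self.horizon
        # (observation, reward, terminated, truncated, info)
        return self.t, 1.0, terminated, False, {}


def step_with_auto_reset(env, action):
    """Step the env; on termination OR truncation, reset immediately."""
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        # The returned observation is the first one of the next episode;
        # the reward and done flags still describe the step that just ended.
        obs, info = env.reset()
    return obs, reward, terminated, truncated, info


env = CountdownEnv(horizon=2)
env.reset()
step_one = step_with_auto_reset(env, action=0)
step_two = step_with_auto_reset(env, action=0)
```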

sash-a

Had a look at everything except the system file. Super happy with this, and I will have a look at that file on Monday. Only a few small changes needed 🙏

mava/wrappers/gym.py

mava/configs/arch/sebulba.yaml

mava/configs/env/smac_gym.yaml

mava/evaluator.py

mava/utils/logger.py

mava/utils/make_env.py

mava/utils/sebulba.py

mava/wrappers/gym.py

sash-a · 2024-11-08T15:19:23Z

mava/wrappers/gym.py

+        metrics = {
+            "episode_return": self.running_count_episode_return,
+            "episode_length": self.running_count_episode_length,
+            "is_terminal_step": np.logical_or(terminated, truncated).all().item(),


I prefer the | operator, but feel free to keep logical_or.

Suggested change

-            "is_terminal_step": np.logical_or(terminated, truncated).all().item(),
+            "is_terminal_step": (terminated | truncated).all().item(),

I say we keep logical_or, since it works for regular Python lists, unlike the | operator.
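The difference mentioned here can be checked with a short sketch. The flag values are illustrative; the behaviour shown is standard NumPy and Python: np.logical_or converts list inputs to arrays, while | between two plain lists raises a TypeError.

```python
import numpy as np

# Illustrative per-agent flags as plain Python lists.
terminated = [True, False]
truncated = [False, False]

# np.logical_or accepts lists and returns a boolean array.
from_lists = np.logical_or(terminated, truncated).all().item()

# The | operator is not defined between two Python lists.
try:
    terminated | truncated
    list_or_supported = True
except TypeError:
    list_or_supported = False

# With NumPy arrays, both spellings give the same result.
from_arrays = (np.array(terminated) | np.array(truncated)).all().item()
```

Both spellings are equivalent once the inputs are NumPy boolean arrays, so the choice only matters if the flags can arrive as plain lists.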

mava/wrappers/gym.py

Louay-Ben-nessir added 30 commits June 10, 2024 11:28

feat: gym wrapper

adc2114

chore : pre-commit hooks

ce86d09

fix: merged the observations and action mask

d5edf45

fix: Create the gym wrappers directly

f891be5

chore: pre-commit

15f4867

fix: fixed the async env creation

82ea827

fix: gymV26 compatability wrapper

4e94df5

fix: various minor fixes

8a86be9

fix: handling rware reset function

1da5c15

feat: async env wrapper , changed the gym wrapper to rware wrapper

4466044

fix: fixed the async env wrapper

24d8aae

fix: info only contains the action_mask and reformated (n_agents, n_env) ->(n_env, n_agents)

a6deae2

chore: removed async gym wrapper

1475bd0

feat: gym metric tracker wrapper

9fce9c6

feat: init sebulba ippo

055a326

feat: initial learner / training loop

a435a0a

fix: changes the env creation

7e80d7b

fix: fixed function calls

b961336

fix: fixed the training and added training logger

502730d

fix: changed the anakin ppo type import

1985729

feat: fulll sebulba functional

89ed246

fix: logging and added LBF

7f43a33

fix: batch size calc for multiple devices

8a87258

fix: num_updates and code refactoring

7f0acd9

chore : code cleanup + comments + added checkpoint save

3e352cf

feat: mappo + removed sebulba specifique types and made the rware wrapper generic

bcdaa38

fix: removed the sebulba spesifique types

7044fbe

feat: ff_mappo and rec_ippo in sebulba

9433f2e

fix: removed the lbf import/wrapper

627215d

chore: clean up & updated the code to match the sebulba-ff-ippo branch

c3b405d

sash-a and others added 22 commits October 14, 2024 16:02

chore: remove some more device transfers

133ea1a

chore: better graceful exit

9260e9b

fix: create envs in main thread to avoid deadlocks

d61dcfb

chore: use orginal rware and lbf

105d796

fix: possible off by one fix

f292bf3

fix: change to using gym.make to create envs and fix StepType

d42d732

feat: learner env accumulation

d4359c1

feat: jit evaluation on cpu

7c78478

Merge branch 'seb-ff-ippo-only' of github.com:Louay-Ben-nessir/Mava into seb-ff-ippo-only

aa49c6f

fix: timestep calculation with accumulation

c252ffe

feat: shardmap almost working

fd7a025

feat: shard_map working

4013a22

fix: key use in actor loss

0e559d9

fix: align gym config with other configs

0a6bd49

feat: better env creation and safer sharding

641a548

chore: minor env typing fixes

c0c88bc

Merge branch 'develop' into seb-ff-ippo-only

354159a

fix: start actors simultaneously to avoid deadlocks

6b2d01c

feat: support for smac

a13ab65

chore: pre-commits

bc55375

fix: random segfault

c6d460f

fix: give each learner a unique random key

659a837

RuanJohn requested changes Nov 5, 2024

Louay-Ben-nessir added 5 commits November 5, 2024 14:53

chore: bunch of minor changes and fixes

7deb75b

chore: removed learner accumulation

c024b71

fix: Metric tracking more aligned with Jumanji

db378b9

fix: removed axis swaping & wrapper rename

3d3cec8

chore: pre-commits

a7665f9

sash-a requested changes Nov 8, 2024

chore: bunch of minor changes

0c4e83b
