Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat: sebulba ff_ippo #1088

Open
wants to merge 141 commits into
base: develop
Choose a base branch
from

Conversation

Louay-Ben-nessir
Copy link
Contributor

What?

Implemented the ff_ippo system for sebulba

Why?

Integrate Sebulba's architecture due to its effectiveness in scenarios involving non-jitted/non-jax environments.

How?

By implementing the ff_ippo system.

Extra

The other PPO systems will be added once this PR is merged.
This PR was built on top of #1080

Copy link
Collaborator

@RuanJohn RuanJohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the work on this! It is looking very good. This is a first pass with some general comments and questions.
I will review the system run file later today.

mava/configs/arch/sebulba.yaml Outdated Show resolved Hide resolved
mava/evaluator.py Outdated Show resolved Hide resolved
mava/evaluator.py Outdated Show resolved Hide resolved
def _episode(key: PRNGKey) -> Tuple[PRNGKey, Metrics]:
"""Simulates `num_envs` episodes."""

seeds = np_rng.integers(np.iinfo(np.int32).max, size=n_parallel_envs).tolist()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps a comment about this above? This is not very clear to me.

mava/configs/arch/sebulba.yaml Show resolved Hide resolved
mava/wrappers/gym.py Outdated Show resolved Hide resolved
mava/wrappers/gym.py Outdated Show resolved Hide resolved
metrics = {
"episode_return": self.running_count_episode_return,
"episode_length": self.running_count_episode_length,
"is_terminal_step": True,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this not be False?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct me if I'm wrong, but I think this works like the auto-reset wrapper. When is_terminal_step = True, it means a new episode started on that step, so it includes the first observation of the new episode along with the final metrics from the previous episode.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think @sash-a is the best person for this question.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good spot Ruan! We do it slightly differently in the anakin version, but it is essentially the same here.

Here the running counts are only reset after creating the metrics dict, so marking this as a terminal step with the previous steps metrics will work fine. In anakin we always set it to false in reset otherwise we set it to the done flag. @Louay-Ben-nessir can we change it to do that for consistence 🙏 See here for a reference

Copy link
Contributor Author

@Louay-Ben-nessir Louay-Ben-nessir Nov 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated! I'd appreciate another pair of eyes on it 👀

mava/wrappers/gym.py Outdated Show resolved Hide resolved
observation = None
pipe.send(((observation, info), True))
elif command == "step":
# Modified the step function to align with 'AutoResetWrapper'.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Somewhere, it would be nice to document that we don't support termination or truncation then for the Sebulba systems but instead reset episodes on term or trunc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should support it from the start?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added a comment on top of the async worker code for visibility since that's the first place where the mixing happens

Copy link
Contributor

@sash-a sash-a left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had a look at everything except the system file, super happy with this and will have a look at that file on Monday, only a few small changes needed 🙏

mava/wrappers/gym.py Outdated Show resolved Hide resolved
mava/configs/arch/sebulba.yaml Outdated Show resolved Hide resolved
mava/configs/env/smac_gym.yaml Outdated Show resolved Hide resolved
mava/evaluator.py Outdated Show resolved Hide resolved
mava/utils/logger.py Show resolved Hide resolved
mava/utils/make_env.py Outdated Show resolved Hide resolved
mava/utils/sebulba.py Show resolved Hide resolved
mava/wrappers/gym.py Outdated Show resolved Hide resolved
metrics = {
"episode_return": self.running_count_episode_return,
"episode_length": self.running_count_episode_length,
"is_terminal_step": np.logical_or(terminated, truncated).all().item(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the | feel free to keep logical_or though

Suggested change
"is_terminal_step": np.logical_or(terminated, truncated).all().item(),
"is_terminal_step": (terminated | truncated).all().item(),

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I say we keep the logical_or since it works for regular Python lists unlike the | operator

mava/wrappers/gym.py Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants