-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feat: sebulba ff_ippo #1088
base: develop
Are you sure you want to change the base?
Feat: sebulba ff_ippo #1088
Conversation
…nv) ->(n_env, n_agents)
…nto seb-ff-ippo-only
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the work on this! It is looking very good. This is a first pass with some general comments and questions.
I will review the system run file later today.
def _episode(key: PRNGKey) -> Tuple[PRNGKey, Metrics]: | ||
"""Simulates `num_envs` episodes.""" | ||
|
||
seeds = np_rng.integers(np.iinfo(np.int32).max, size=n_parallel_envs).tolist() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps a comment about this above? This is not very clear to me.
mava/wrappers/gym.py
Outdated
metrics = { | ||
"episode_return": self.running_count_episode_return, | ||
"episode_length": self.running_count_episode_length, | ||
"is_terminal_step": True, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not be False?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct me if I'm wrong, but I think this works like the auto-reset wrapper. When is_terminal_step = True
, it means a new episode started on that step, so it includes the first observation of the new episode along with the final metrics from the previous episode.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think @sash-a is the best person for this question.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good spot Ruan! We do it slightly differently in the anakin version, but it is essentially the same here.
Here the running counts are only reset after creating the metrics dict, so marking this as a terminal step with the previous steps metrics will work fine. In anakin we always set it to false in reset otherwise we set it to the done
flag. @Louay-Ben-nessir can we change it to do that for consistence 🙏 See here for a reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated! I'd appreciate another pair of eyes on it 👀
observation = None | ||
pipe.send(((observation, info), True)) | ||
elif command == "step": | ||
# Modified the step function to align with 'AutoResetWrapper'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Somewhere, it would be nice to document that we don't support termination or truncation then for the Sebulba systems but instead reset episodes on term or trunc
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should support it from the start?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added a comment on top of the async worker code for visibility since that's the first place where the mixing happens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Had a look at everything except the system file, super happy with this and will have a look at that file on Monday, only a few small changes needed 🙏
metrics = { | ||
"episode_return": self.running_count_episode_return, | ||
"episode_length": self.running_count_episode_length, | ||
"is_terminal_step": np.logical_or(terminated, truncated).all().item(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the |
feel free to keep logical_or
though
"is_terminal_step": np.logical_or(terminated, truncated).all().item(), | |
"is_terminal_step": (terminated | truncated).all().item(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I say we keep the logical_or since it works for regular Python lists unlike the | operator
What?
Implemented the ff_ippo system for sebulba
Why?
Integrate Sebulba's architecture due to its effectiveness in scenarios involving non-jitted/non-jax environments.
How?
By implementing the ff_ippo system.
Extra
The other PPO systems will be added once this PR is merged.
This PR was built on top of #1080