Inconsistent order of execution and StepCount #6190

Open
kezzyhko opened this issue Feb 12, 2025 · 1 comment
Labels
bug Issue describes a potential bug in ml-agents.


Describe the bug

The order of execution of FixedUpdate and OnEpisodeBegin differs depending on how the episode started or ended.
In the first episode after starting the game, FixedUpdate is called with StepCount == 0 before OnEpisodeBegin, which causes an incorrect reward and possible errors due to incomplete initialization.
This would be less of an issue if it were consistent with other episodes, but in an episode that starts after MaxStep was reached, the order is different: FixedUpdate is called after OnEpisodeBegin.
I have not tested what happens with EndEpisode, but the order might be different there as well.

To Reproduce

  1. Open the CrawlerAgent script
  2. Add the changes described below to the script
  3. Set MaxStep to a small number, for example 5
  4. Disable all copies of the Agent except one
  5. Enable "Pause" and then click "Play"
  6. Click the "Step" button a few times, until the second episode starts
  7. Check the logs: the order is not consistent, and in OnEpisodeBegin the cumulative reward already includes the reward from FixedUpdate

Changes to the Crawler environment

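    void FixedUpdate()
    {
        // Added: log the step counter and give a constant reward each physics step
        Debug.Log($"FixedUpdate: step={StepCount}");
        AddReward(1);
        // ... rest of the existing method unchanged
    }

    public override void OnEpisodeBegin()
    {
        // Added: log the step counter and the reward accumulated so far
        Debug.Log($"OnEpisodeBegin: step={StepCount}, reward={GetCumulativeReward()}");
        // ... rest of the existing method unchanged
    }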

Console logs / stack traces / screenshots

I waited a couple of seconds between "Step" clicks, so that you can see which operations happened in the same frame.
In the first case, OnEpisodeBegin is called after FixedUpdate with StepCount=0 (not before it!).
In the second case, it is called immediately after StepCount=4 (not right before StepCount=0).
Image

Environment (please complete the following information):

  • Unity Version: Unity 6000.0.26f1
  • OS + version: Windows 11
  • ML-Agents version: release_22 / 3.0.0
  • Torch version: 2.2.2+cu121
  • Environment: Crawler
@kezzyhko kezzyhko added the bug Issue describes a potential bug in ml-agents. label Feb 12, 2025
@kezzyhko
Author

This might seem like a small, inconsequential bug, but it makes a big difference in some cases.

For context, here is my use case:

  • OnEpisodeBegin randomizes the initial position of the agent's body
  • FixedUpdate (StepCount=0) remembers the initial position
  • FixedUpdate (StepCount=1) gets the agent's new position, calculates the difference, and rewards the agent based on how much it moved towards the goal

Now, due to the bug, OnEpisodeBegin moves the agent between steps 0 and 1, and the following happens:

  • FixedUpdate (StepCount=0) remembers the agent's position
  • OnEpisodeBegin changes the position of the agent's body
  • FixedUpdate (StepCount=1) gets the agent's new position, calculates the difference, and rewards the agent based on the random change made inside OnEpisodeBegin

This makes the first reward random, and sometimes it is so large that it overwhelms the other rewards.
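
A possible workaround for this use case, until the ordering is made consistent, might be to stop relying on StepCount == 0 and instead invalidate the remembered position from inside OnEpisodeBegin. The sketch below is only an illustration under assumptions: `ProgressRewardTracker`, `Reset` and `Step` are hypothetical names that are not part of ML-Agents, and the reward (decrease in distance to the goal) is a simplified stand-in for the real reward logic.

```csharp
// Hypothetical helper, not part of ML-Agents: computes a progress reward
// that does not depend on whether OnEpisodeBegin runs before or after
// the first FixedUpdate of an episode.
public sealed class ProgressRewardTracker
{
    private bool hasBaseline;        // false until a distance is recorded after the last reset
    private float previousDistance;  // distance to the goal seen on the previous step

    // Intended to be called from OnEpisodeBegin, after the agent was repositioned.
    public void Reset()
    {
        hasBaseline = false;
    }

    // Intended to be called once per FixedUpdate with the current distance
    // to the goal. Returns the value that would be passed to AddReward.
    public float Step(float distanceToGoal)
    {
        if (!hasBaseline)
        {
            // First step after a reset (or before the very first
            // OnEpisodeBegin): only record the baseline, give no reward.
            previousDistance = distanceToGoal;
            hasBaseline = true;
            return 0f;
        }

        // Positive when the agent moved closer to the goal.
        float reward = previousDistance - distanceToGoal;
        previousDistance = distanceToGoal;
        return reward;
    }
}
```

In the agent, OnEpisodeBegin would call `Reset()` after randomizing the position, and FixedUpdate would call `AddReward(tracker.Step(distance))`. With this arrangement the jump caused by the randomization is never rewarded in either of the two call orders described above, at the cost of one step without reward after every reset.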
