Description
Describe the bug
The order of execution between FixedUpdate and OnEpisodeBegin differs depending on how the previous episode ended and the new one started.
In the first episode after starting the game, FixedUpdate with StepCount == 0 is called before OnEpisodeBegin, causing an incorrect reward and possible errors due to incomplete initialization.
This would be less of an issue if it were consistent with other episodes, but in an episode that starts after MaxStep was reached, the order is different: FixedUpdate is called after OnEpisodeBegin.
I have not tested what happens with EndEpisode, but this might also be different.
To Reproduce
- Open CrawlerAgent script
- Add changes to the script (described below)
- Set MaxStep to a small number, for example 5
- Disable all copies of Agent except one
- Enable "Pause" and then click "Play"
- Click the "Step" button a couple of times, until the second episode starts
- See in logs: the order is not consistent, and OnEpisodeBegin already has reward from FixedUpdate
Changes to the Crawler environment
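void FixedUpdate()
{
    // Log the step counter on every physics step and add a constant reward.
    Debug.Log($"FixedUpdate: step={StepCount}");
    AddReward(1);
}

public override void OnEpisodeBegin()
{
    // The reward should be 0 here if no FixedUpdate ran before the episode began.
    Debug.Log($"OnEpisodeBegin: step={StepCount}, reward={GetCumulativeReward()}");
}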
Console logs / stack traces / screenshots
I waited a couple of seconds between each "step" click, so that you can see which operations were in one frame.
In the first case, OnEpisodeBegin is called after the FixedUpdate with StepCount=0 (not before!).
In the second case, it is called immediately after the FixedUpdate with StepCount=4 (not before StepCount=0).
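A possible workaround for the first-episode case is to skip the FixedUpdate logic until OnEpisodeBegin has run at least once. The following is only an untested sketch: the class name `GuardedAgent` and the field `m_EpisodeStarted` are hypothetical and not part of ML-Agents or the Crawler example. It does not make the ordering consistent between episodes; it only avoids adding reward before initialization.

```csharp
using Unity.MLAgents;
using UnityEngine;

// Hypothetical agent for illustration only; not part of ML-Agents or the Crawler example.
public class GuardedAgent : Agent
{
    // Assumption: a simple flag is enough to detect that OnEpisodeBegin has run at least once.
    bool m_EpisodeStarted;

    public override void OnEpisodeBegin()
    {
        m_EpisodeStarted = true;
        Debug.Log($"OnEpisodeBegin: step={StepCount}, reward={GetCumulativeReward()}");
    }

    void FixedUpdate()
    {
        // Skip per-step logic until the first episode has been initialized.
        if (!m_EpisodeStarted)
        {
            return;
        }

        Debug.Log($"FixedUpdate: step={StepCount}");
        AddReward(1f);
    }
}
```

The flag is never reset, so later episodes behave as before. Only the FixedUpdate call that precedes the very first OnEpisodeBegin is skipped.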
Environment (please complete the following information):
- Unity Version: Unity 6000.0.26f1
- OS + version: Windows 11
- ML-Agents version: release_22 / 3.0.0
- Torch version: 2.2.2+cu121
- Environment: Crawler