As described in #22, there are two planned phases of learning: online and offline. It could be interesting to let the agent decide to enter offline training on its own, much like many animals choose to rest or go to sleep.
In this mode, the environment simulation would continue to run in the background, but the forced action would be a no-op. Perhaps the mode could be triggered once the agent chooses some number of consecutive no-ops on its own, though I'm not sure what a "natural" threshold for that would be.
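A rough sketch of the trigger logic, just to make the idea concrete. Everything here is hypothetical: `NOOP`, `NOOP_THRESHOLD`, and the `env`/`agent` interfaces (`act`, `step`, `train_offline`) are placeholder names, not anything that exists in the codebase yet.

```python
# Hypothetical sketch: agent-initiated offline phase after consecutive no-ops.
NOOP = 0
NOOP_THRESHOLD = 10  # arbitrary; the "natural" value is the open question

def run(env, agent, total_steps):
    consecutive_noops = 0
    obs = env.reset()
    for _ in range(total_steps):
        action = agent.act(obs)
        # Track how many no-ops in a row the agent has chosen.
        consecutive_noops = consecutive_noops + 1 if action == NOOP else 0
        if consecutive_noops >= NOOP_THRESHOLD:
            # Enter offline training; the simulation keeps running in the
            # background while the forced action is a no-op.
            agent.train_offline()
            obs, _, _, _ = env.step(NOOP)
            consecutive_noops = 0
        else:
            obs, _, _, _ = env.step(action)
```

One open design question with this shape is whether the counter should reset after a single offline step (as above) or whether the agent should stay "asleep" for a longer stretch once it enters the mode.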