Description
Model-free (habitual) and model-based (goal-directed) control algorithms will most likely expose different internal interfaces to the agent. If both algorithms propose an action, how do we know which one to trust?
Confidence is the key: habitual control can be trusted in parts of the environment where the agent has high confidence. For example, if the agent has mined 1,000 blocks of iron ore, it probably does not need planning or a model of the environment to mine the 1,001st block. However, if the agent is exploring a never-before-visited biome or underground mine, goal-directed control would be more useful. With no experience in this new part of the environment, the agent has to rely on the environment dynamics it has learned so far.
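One way to make this concrete is a small arbiter that counts how often the agent has acted in a given context and trusts the habitual action only once that count reaches a threshold. This is a minimal sketch, not a proposed design: `Arbiter`, the `context` key, and the use of a visit count as the confidence measure are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable


class Arbiter:
    """Hypothetical arbiter: picks between a habitual and a planned action."""

    def __init__(self, threshold: int = 1000) -> None:
        # Assumption: experience in a context is measured by a plain visit count.
        self.threshold = threshold
        self.visits: Counter = Counter()

    def record(self, context: Hashable) -> None:
        """Note that the agent has acted once more in this context."""
        self.visits[context] += 1

    def confidence(self, context: Hashable) -> float:
        """Confidence in [0, 1], saturating once the threshold is reached."""
        return min(1.0, self.visits[context] / self.threshold)

    def choose(self, context: Hashable, habitual_action, planned_action):
        """Trust the habit in familiar contexts, the planner otherwise."""
        if self.confidence(context) >= 1.0:
            return habitual_action
        return planned_action


arbiter = Arbiter(threshold=1000)
for _ in range(1000):
    arbiter.record("iron_ore")

familiar = arbiter.choose("iron_ore", "mine_block", "run_planner")
novel = arbiter.choose("unvisited_biome", "mine_block", "run_planner")
```

A visit count is the crudest possible confidence signal. Something like the prediction error of a learned dynamics model could play the same role, at the cost of a more involved threshold choice.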
The Intrinsic Curiosity Module (ICM) already introduces models that predict the dynamics of the environment. How can we extend the idea of this model-based approach to introduce planning to the agent?
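As a starting point for discussion, a learned forward model can be used for planning by rolling it forward over candidate action sequences and taking the first action of the best one. The sketch below uses brute-force lookahead on a toy problem. `plan_first_action`, `forward_model`, and `reward_fn` are hypothetical names, and the reward function is an assumption: ICM's forward model predicts next-state features, so a real version would also need some way to score predicted states.

```python
from itertools import product
from typing import Callable, Optional, Sequence, TypeVar

S = TypeVar("S")  # state type (for ICM this would be a feature vector)
A = TypeVar("A")  # action type


def plan_first_action(
    state: S,
    actions: Sequence[A],
    forward_model: Callable[[S, A], S],
    reward_fn: Callable[[S], float],
    horizon: int = 3,
) -> Optional[A]:
    """Enumerate every action sequence of length `horizon`, roll each one out
    through the forward model, and return the first action of the sequence
    with the highest summed reward."""
    best_return = float("-inf")
    best_action: Optional[A] = None
    for sequence in product(actions, repeat=horizon):
        predicted = state
        total = 0.0
        for action in sequence:
            predicted = forward_model(predicted, action)  # imagined step
            total += reward_fn(predicted)
        if total > best_return:
            best_return, best_action = total, sequence[0]
    return best_action


# Toy stand-ins: the state is a position on a line and the goal is position 3.
def toy_model(position: int, step: int) -> int:
    return position + step


def toy_reward(position: int) -> float:
    return -abs(position - 3)


first_from_0 = plan_first_action(0, [-1, 0, 1], toy_model, toy_reward)
first_from_5 = plan_first_action(5, [-1, 0, 1], toy_model, toy_reward)
```

Exhaustive enumeration grows exponentially with the horizon, so anything beyond a toy setting would need sampled rollouts or a tree search instead.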