- what has the model learned? unclear
- graph the training error and validation error? looks bad
- test what the model says to do? seems pretty pessimistic
- problems?
- is the network objective set properly? seems so
- is the training set reasonable? action matches pre_state
- is the validation happening correctly? code review ok
- what happens if we validate on model B? same thing
- are the weights changing? i guess so
- are the q-learning models being swapped? code review ok
- is the q-learning update calculated properly? walked through and checked
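For reference when walking through the update by hand, a minimal tabular sketch of the standard Q-learning step (the table layout and argument names here are hypothetical, not the project's actual code):

```python
def q_update(q, q_target, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """One tabular Q-learning step.

    q:        dict state -> {action: value}, the model being trained
    q_target: dict state -> {action: value}, the frozen/swapped model
              used to bootstrap the target
    Returns the TD error so it can be logged and sanity-checked.
    """
    # Bootstrapped target: immediate reward plus discounted best
    # next-state value under the target model.
    td_target = r + gamma * max(q_target[s_next].values())
    td_error = td_target - q[s][a]
    # Move the estimate a fraction alpha toward the target.
    q[s][a] += alpha * td_error
    return td_error
```

Checking a single hand-computed transition against this (e.g. reward 1.0, best next value 2.0, gamma 0.9 gives a target of 2.8) is a quick way to confirm the walked-through numbers.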
- is the learning rate too high? probably
This returns an OrderedDict; compare the magnitude of each update to the magnitude of the corresponding weight matrix — a healthy ratio is roughly 1:1000 (~1e-3), and a much larger ratio suggests the learning rate is too high.
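A minimal sketch of the ratio check, assuming weights are snapshotted as plain lists in an OrderedDict (the real code presumably holds framework tensors, but the arithmetic is the same):

```python
from collections import OrderedDict
import math

def update_ratios(before, after):
    """Per-layer ratio of update magnitude to weight magnitude.

    before/after: OrderedDicts mapping layer name -> flat list of
    weights, snapshotted around one optimizer step.
    A healthy ratio is roughly 1e-3; much larger hints that the
    learning rate is too high.
    """
    ratios = OrderedDict()
    for name, w0 in before.items():
        w1 = after[name]
        # L2 norm of the update and of the original weights.
        upd = math.sqrt(sum((a - b) ** 2 for a, b in zip(w1, w0)))
        mag = math.sqrt(sum(b ** 2 for b in w0))
        ratios[name] = upd / mag
    return ratios
```

Snapshot the weights, take one training step, snapshot again, and eyeball the resulting per-layer ratios.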
- update function arguments (in case we change or slow down learning rate at some point)
- whether the training was resumed on a particular snapshot (to help with interpreting behaviour resulting from start/stop)
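The two items above could be captured with a small append-only metadata log; a sketch under an assumed JSON-lines schema (field names here are made up, not the project's):

```python
import json
import time

def log_run_metadata(path, update_args, resumed_from=None):
    """Append one metadata record per training run/segment.

    update_args:  dict of the update function's arguments (learning
                  rate, schedule, etc.) so later changes are traceable
    resumed_from: snapshot identifier if training was resumed, else
                  None — helps interpret behaviour around start/stop
    """
    record = {
        "timestamp": time.time(),
        "update_args": update_args,
        "resumed_from": resumed_from,
    }
    with open(path, "a") as f:
        f.write(json.dumps(record) + "\n")
```

One line per run segment keeps the history greppable next to the training output.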
This gives an estimate of how reliable the model-A validation stat is.
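One way to quantify that reliability, assuming the validation stat is a mean over per-episode scores (an assumption — the actual stat isn't specified above), is the standard error across episodes:

```python
import math

def mean_and_stderr(scores):
    """Mean and standard error of per-episode validation scores.

    A standard error that is large relative to differences between
    model snapshots means the single validation number is too noisy
    to compare runs on.
    """
    n = len(scores)
    mean = sum(scores) / n
    # Sample variance (Bessel-corrected), then standard error of the mean.
    var = sum((v - mean) ** 2 for v in scores) / (n - 1)
    return mean, math.sqrt(var / n)
```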