Dev Notes Cribbage

Discard Models

Early dqlearner models
  • what has the model learned? unclear
  • graph the training error and validation error? looks bad
  • test what the model says to do? seems pretty pessimistic
  • problems?
    • is the network objective set properly? seems so
    • is the training set reasonable? action matches pre_state
    • is the validation happening correctly? code review ok
    • what happens if we validate on model B? same thing
    • are the weights changing? i guess so
    • are the q-learning models being swapped? code review ok
    • is the q-learning update calculated properly? walked through and checked (see the target sketch below this list)
    • is the learning rate too high? probably
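
For reference when re-checking the update by hand, a minimal sketch of the one-step target the walk-through should reproduce. Using model B for the target matches the A/B swapping above; predict_q and gamma=0.99 are assumptions, not project code.

#+begin_src python
import numpy as np

def q_target(reward, post_state, model_b, gamma=0.99, terminal=False):
    # one-step Q-learning target: r + gamma * max_a' Q_B(s', a')
    # model_b.predict_q and gamma=0.99 are assumptions, not project code
    if terminal:
        return reward
    return reward + gamma * float(np.max(model_b.predict_q(post_state)))
#+end_src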
profile one loop of q-learning: where are we spending time?
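
A minimal way to get that breakdown with the standard library; run_one_qlearning_loop is a hypothetical wrapper around a single training iteration:

#+begin_src python
import cProfile
import pstats

# profile one full loop and dump stats to disk
cProfile.run("run_one_qlearning_loop()", "qlearn.prof")
# show the 20 entries with the largest cumulative time
pstats.Stats("qlearn.prof").sort_stats("cumulative").print_stats(20)
#+end_src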
try normalising inputs and outputs
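
A sketch of the usual standardisation, assuming the inputs live in an array X_train; the same mu/sigma must be reused for validation data, and the Q-value targets can be rescaled with the same recipe:

#+begin_src python
import numpy as np

mu = X_train.mean(axis=0)
sigma = X_train.std(axis=0) + 1e-8   # avoid division by zero
X_train_norm = (X_train - mu) / sigma
#+end_src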
rmsprop (learning_rate=0.001 or 0.01) or NAG, no dropout, switch back to relus
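
For comparison while trying optimisers, a plain-Theano RMSProp sketch (the project's own update code may differ; this is not taken from it):

#+begin_src python
import theano
import theano.tensor as T

def rmsprop_updates(cost, params, learning_rate=0.001, rho=0.9, eps=1e-6):
    # keep a running average of squared gradients per parameter and
    # scale each step by its square root
    updates = []
    for p in params:
        grad = T.grad(cost, p)
        acc = theano.shared(p.get_value() * 0.)  # accumulator, same shape as p
        acc_new = rho * acc + (1 - rho) * grad ** 2
        updates.append((acc, acc_new))
        updates.append((p, p - learning_rate * grad / T.sqrt(acc_new + eps)))
    return updates
#+end_src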
can sample the updates() function (the return value from get_theano_functions)

This returns an OrderedDict; compare the magnitude of the updates to the magnitude of the weight-matrix values, and look for a ratio on the order of 1:1000.
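
An equivalent empirical check, measuring the actual step taken against the weights it changes; params and train_step are assumed names for the network's shared variables and the compiled training function:

#+begin_src python
import numpy as np

def update_to_weight_ratios(params, train_step):
    # run one training step and report |delta w| / |w| per parameter;
    # ratios around 1e-3 are the usual healthy range, and much larger
    # values suggest the learning rate is too high
    before = [p.get_value() for p in params]
    train_step()
    for p, old in zip(params, before):
        delta = np.linalg.norm(p.get_value() - old)
        print(p.name, delta / (np.linalg.norm(old) + 1e-12))
#+end_src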

test case that the 5-card post_state correctly follows the 6-card pre_state
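
A sketch of that test; sample_transition, .hand, and .discarded_card are hypothetical names for whatever the training-set accessors actually are:

#+begin_src python
import unittest

class DiscardTransitionTest(unittest.TestCase):
    def test_post_state_follows_pre_state(self):
        # sample_transition() is a hypothetical helper returning one
        # (pre_state, action, post_state) triple from the training set
        pre_state, action, post_state = sample_transition()
        self.assertEqual(len(pre_state.hand), 6)
        self.assertEqual(len(post_state.hand), 5)
        self.assertEqual(set(post_state.hand),
                         set(pre_state.hand) - {action.discarded_card})
#+end_src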
fix time monitoring code in netbuilder.build
move RL state into DQLearner class, to allow RL to be run progressively
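
A skeleton of what that might look like; every name here is an assumption about the eventual shape of the class:

#+begin_src python
class DQLearner(object):
    # holding loop state on the instance lets training be stopped,
    # snapshotted, and resumed progressively
    def __init__(self):
        self.epoch = 0
        self.epsilon = 1.0
        self.replay_memory = []

    def train_one_epoch(self):
        pass  # hypothetical: one pass of acting + q-learning updates

    def run(self, n_epochs):
        for _ in range(n_epochs):
            self.train_one_epoch()
            self.epoch += 1
#+end_src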
allow arbitrary logging to snapshots (e.g., epsilon); see the sketch after this list
  • update function arguments (in case we change or decay the learning rate at some point)
  • whether training was resumed from a particular snapshot (to help interpret behaviour around start/stop)
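
A sketch of an extensible snapshot record; all field and variable names are assumptions:

#+begin_src python
snapshot = {
    "weights": [p.get_value() for p in params],
    "extras": {
        "epsilon": epsilon,                   # exploration rate at snapshot time
        "learning_rate": learning_rate,       # current update-function arguments
        "resumed": resumed_from is not None,  # was training resumed here?
    },
}
#+end_src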
try NAG
prioritised Q updates
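
One common reading of this is proportional prioritisation (sample transitions by TD-error magnitude); a sketch, with alpha and the per-transition td_errors array as assumptions:

#+begin_src python
import numpy as np

def sample_prioritised(td_errors, batch_size, alpha=0.6):
    # probability proportional to |TD error|^alpha, with a small floor
    # so no transition is starved
    p = np.abs(td_errors) ** alpha + 1e-6
    p /= p.sum()
    return np.random.choice(len(td_errors), size=batch_size, p=p)
#+end_src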
validation: average Q values for a fixed set of states
validation: check both A and B models

This gives an estimate of how reliable the A validation stat is.
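
A sketch of that validation stat, run against both models each validation pass; model.predict_q and fixed_states are assumptions:

#+begin_src python
import numpy as np

def mean_max_q(model, fixed_states):
    # average max-Q over a fixed, held-out set of states: a smoother
    # learning signal than raw game results
    return np.mean([np.max(model.predict_q(s)) for s in fixed_states])

# log both each validation pass, e.g.:
# print(mean_max_q(model_a, fixed_states), mean_max_q(model_b, fixed_states))
#+end_src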