Conversation
jcoreyes
commented
Apr 29, 2016
- Keep replay memory (screens, pre and post states) in gpu memory
- Use transpose kernel to switch to chwn format
- Training steps per second on breakout rom at 455 up from 260
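The CHWN conversion mentioned above can be sketched as follows. This is a host-side stand-in using NumPy: the helper name and the NHWC source layout are assumptions, and in the PR the equivalent permutation is done by a transpose kernel on tensors living in GPU memory.

```python
import numpy as np

def to_chwn(batch_nhwc):
    """Reorder a batch of screens from NHWC (batch, height, width,
    channels) to CHWN (channels, height, width, batch).

    Hypothetical stand-in for the PR's GPU transpose kernel: here the
    reorder runs on the host with NumPy, while the PR applies the same
    permutation to GPU-resident tensors.
    """
    return np.ascontiguousarray(batch_nhwc.transpose(3, 1, 2, 0))

# Example: a minibatch of 32 screens, 84x84 pixels, 4 history frames.
batch = np.zeros((32, 84, 84, 4), dtype=np.uint8)
print(to_chwn(batch).shape)  # (4, 84, 84, 32)
```

CHWN puts the batch dimension innermost, which is the layout Neon's GPU kernels prefer, so doing this transpose once per minibatch avoids per-layer reshuffling.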
Thanks for a nice pull request; together those changes result in almost a 2x improvement! But I would like to keep the code runnable on lesser GPUs as well, so I would like to have two ReplayMemory implementations that you can choose between with a command-line switch. I would also like to keep the main code independent of Neon, so we need to figure out how to share the backend between ReplayMemory and DeepQNetwork without instantiating it in main. Or can we just use two separate backends? I also understood that the current version achieves 38% GPU utilization on a Titan X. I wonder what could be done to reach 100%? Some ideas:
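The command-line switch suggested above might look like the following sketch. The flag name `--replay_memory` and both class names are hypothetical, not the repository's actual API; the GPU class is a placeholder for the PR's backend-tensor implementation.

```python
import argparse

class ReplayMemory:
    """Hypothetical baseline: stores screens in host (CPU) memory."""
    def __init__(self, size):
        self.size = size

class GPUReplayMemory:
    """Hypothetical GPU-resident variant: in the PR this would allocate
    screens as Neon backend tensors instead of host arrays."""
    def __init__(self, size):
        self.size = size

def make_replay_memory(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical switch letting low-memory GPUs fall back to host storage.
    parser.add_argument("--replay_memory", choices=("cpu", "gpu"), default="cpu")
    parser.add_argument("--replay_size", type=int, default=1000000)
    args = parser.parse_args(argv)
    cls = GPUReplayMemory if args.replay_memory == "gpu" else ReplayMemory
    return cls(args.replay_size)

print(type(make_replay_memory(["--replay_memory", "gpu"])).__name__)  # GPUReplayMemory
```

Defaulting to the CPU implementation keeps the code runnable on lesser GPUs, as requested, while the switch opts in to the faster GPU-resident memory.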
Does this fork really keep the replay memory in GPU memory? I tried the latest version, but `nvidia-smi` shows little GPU memory in use, while the process's resident main memory (RES in `top`) is 6.766g, which is about the size of a 1M replay memory held in main memory.
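The size estimate in the last comment can be checked with quick arithmetic. The 84x84 grayscale uint8 screen shape is an assumption based on the standard DQN preprocessing; the repository's actual screen size may differ.

```python
# Back-of-the-envelope check: how big is a 1M-screen replay memory
# held in main memory? Assumes 84x84 grayscale uint8 screens
# (an assumption; standard DQN preprocessing).
screens = 1_000_000
bytes_per_screen = 84 * 84  # uint8: one byte per pixel
total_gib = screens * bytes_per_screen / 2**30
print(f"{total_gib:.2f} GiB")  # prints "6.57 GiB"
```

That is close to the 6.766g RES reported above (the remainder being the process itself), which supports the suspicion that the screens are sitting in host memory rather than on the GPU.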