[Feature Request] Extend TDLambdaEstimator with QLambdaEstimator #2397

roger-creus · 2024-08-13T23:15:24Z

Motivation

I am attempting to implement Parallel Q Networks (an online DQN variant without a replay buffer or target networks), which uses Q(λ) returns.

TDLambdaEstimator expects state_value keys, but a Q(λ) estimator would need action_value keys instead.
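To make the request concrete, here is a minimal sketch of one common form of the Q(λ) return (Peng-style), where the bootstrap term is the maximum over next-state action values rather than a state value. It is plain Python with no TorchRL dependency. The function name, argument names, and list-based layout are hypothetical and for illustration only; this is not an existing TorchRL API and not necessarily the exact variant the final estimator should use.

```python
def q_lambda_returns(rewards, next_action_values, dones, gamma=0.99, lmbda=0.95):
    """Compute Q(lambda) returns for a single trajectory (illustrative sketch).

    rewards:            list of floats, r_t for t = 0..T-1
    next_action_values: list of lists, Q(s_{t+1}, a) for every action a
    dones:              list of bools, True if s_{t+1} is terminal
    """
    T = len(rewards)
    returns = [0.0] * T
    next_return = None  # return computed for step t+1, if any
    for t in reversed(range(T)):
        # Greedy bootstrap value: max over actions instead of a state value.
        next_max_q = max(next_action_values[t])
        if next_return is None:
            # Last step of the trajectory: bootstrap from the Q-values only.
            bootstrap = next_max_q
        else:
            # Mix the lambda-return of the next step with the greedy bootstrap.
            bootstrap = lmbda * next_return + (1.0 - lmbda) * next_max_q
        not_done = 0.0 if dones[t] else 1.0
        returns[t] = rewards[t] + gamma * not_done * bootstrap
        next_return = returns[t]
    return returns
```

With lmbda=0 this reduces to the usual one-step Q-learning target, r_t + gamma * max_a Q(s_{t+1}, a). The only structural difference from a TD(λ) return is that the bootstrap comes from action values, which is why an estimator would need to read action_value entries instead of state_value ones.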


roger-creus added the enhancement New feature or request label Aug 13, 2024

roger-creus assigned vmoens Aug 13, 2024