[BUG] Inadequate Default Priority Design in PrioritizedSampler #2210
Comments
@vmoens I found that #2215 does not actually work well :O #2215 may erase the previous max_priority on sampler.add() and sampler.extend(), but the max priority within the buffer can also decrease with update_priority(). For example, let’s consider a buffer with three samples: index: [0,1,2] So, it seems that the only way is to compute max_priority on the fly, using something like the negative min tree approach I proposed in #2205. As for the time cost, the tree structure can perform queries and inserts in O(log N), as the min_tree already does.
Hmm in this case I think we can just add a check in
I think we cannot determine what the new max priority will be after erasing the old max priority (we need to find the second largest priority in the buffer).
Oh yeah but not every time the buffer is written!
wow this seems like a feasible solution, I agree
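To make the "compute max_priority on the fly" idea from this thread concrete, here is a minimal pure-Python sketch. It is not torchrl's `MinSegmentTree` and not the code from #2205 or #2215: the `MinSegmentTree` class below is a toy stand-in, `MaxPriorityTracker` is a hypothetical helper, and the priorities are made-up numbers rather than the example from the comment above. It stores negated priorities in a min tree, so the negated root is the largest priority currently in the buffer.

```python
import math


class MinSegmentTree:
    """Toy array-backed min segment tree (a stand-in, not torchrl's class)."""

    def __init__(self, capacity):
        # Round the capacity up to a power of two so the tree is complete.
        self.size = 1
        while self.size < capacity:
            self.size *= 2
        # Unused leaves hold +inf so they never win a min query.
        self.tree = [math.inf] * (2 * self.size)

    def update(self, index, value):
        # Write the leaf, then recompute every ancestor: O(log N).
        pos = index + self.size
        self.tree[pos] = value
        pos //= 2
        while pos >= 1:
            self.tree[pos] = min(self.tree[2 * pos], self.tree[2 * pos + 1])
            pos //= 2

    def min(self):
        # The root holds the minimum over all leaves.
        return self.tree[1]


class MaxPriorityTracker:
    """Hypothetical helper: current max priority via a negated min tree."""

    def __init__(self, capacity, default=1.0):
        self._neg_min_tree = MinSegmentTree(capacity)
        self._default = default
        # Running maximum that only ever grows, for comparison.
        self.historical_max = default

    def set_priority(self, index, priority):
        # Storing -priority turns "max priority" into a min query.
        self._neg_min_tree.update(index, -priority)
        self.historical_max = max(self.historical_max, priority)

    def current_max(self):
        smallest = self._neg_min_tree.min()
        # Fall back to the default while the buffer is empty.
        return self._default if smallest == math.inf else -smallest


tracker = MaxPriorityTracker(capacity=3)
for i, p in enumerate([5.0, 0.2, 0.1]):  # index 0 is an early outlier
    tracker.set_priority(i, p)
tracker.set_priority(0, 0.05)  # the outlier's priority is lowered later

print(tracker.historical_max)  # 5.0: stale, no sample has this priority now
print(tracker.current_max())   # 0.2: largest priority still in the buffer
```

The running maximum stays at the outlier's old value, while the tree query follows the buffer contents, at the cost of one extra O(log N) update per priority write.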
This issue comes from the original issue #2205.
Current Implementation and Issues
The current implementation maintains `_max_priority`, which represents the maximum priority of all samples seen historically, not just those in the current buffer. Early in RL training, outliers can cause `_max_priority` to remain high, making it unrepresentative. Additionally, `_max_priority` is initialized to 1, while most RL algorithms use the Bellman error as the priority, which can often be much smaller (close to 0). Consequently, `_max_priority` may never be updated. New samples are thus given a priority of 1, so they are sampled almost immediately, but their PER importance weight is close to 0 and they contribute little to the weighted loss, reducing sample efficiency.
Proposed Solution
Maintain a `_neg_min_tree = MinSegmentTree()` to track the maximum priority in the current buffer. With this, and with `self._upper_priority = 1` added, some of the `PrioritizedSampler` methods can be updated as follows:
This change implies that the `default_priority` function will need to take `storage` as an additional parameter, which will eventually affect several methods such as `Sampler(ABC).extend()`, `Sampler(ABC).add()`, and `Sampler(ABC).mark_update()`, but I believe this is reasonable, akin to how `Sampler.sample()` already takes storage as a parameter.
Additional Context
See discussion in the original issue #2205.
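As further context for the weighting concern raised under Current Implementation and Issues, the arithmetic can be checked with a small sketch. It uses the standard prioritized experience replay formulas, P(i) = p_i^alpha / sum_k p_k^alpha and w_i = (N * P(i))^(-beta) normalized by the largest weight, which may differ in detail from what `PrioritizedSampler` computes. The function name `per_weights`, the buffer of 99 old samples with priority 0.01, and the choice alpha = beta = 1 are all assumptions made for illustration.

```python
def per_weights(priorities, alpha=1.0, beta=1.0):
    """Sampling probabilities and normalized importance weights (standard PER)."""
    scaled = [p ** alpha for p in priorities]
    total = sum(scaled)
    n = len(priorities)
    # Probability of drawing each sample.
    probs = [s / total for s in scaled]
    # Importance weights, normalized so the largest weight is 1.
    raw = [(n * q) ** (-beta) for q in probs]
    top = max(raw)
    weights = [w / top for w in raw]
    return probs, weights


# Hypothetical buffer: 99 old samples with small Bellman errors, plus one
# new sample that received the default priority of 1.
priorities = [0.01] * 99 + [1.0]
probs, weights = per_weights(priorities)

print(f"new sample: prob={probs[-1]:.3f}, weight={weights[-1]:.3f}")
print(f"old sample: prob={probs[0]:.4f}, weight={weights[0]:.3f}")
```

Under these assumptions the new sample is drawn about half of the time, yet its normalized weight is roughly 0.01, so it barely moves the weighted loss. A default priority taken from the current buffer maximum (0.01 here) would instead give it the same probability and weight as the other samples.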