perf: improve extend_sorted_vec & batch write hashed_state #19990
Conversation
mattsse left a comment:
pedantic doc nit
```rust
        target.sort_unstable_by(|a, b| a.0.cmp(&b.0));
    }
})
.collect();
```
mediocregopher left a comment:

The previous implementation was specifically designed to avoid a big `collect` like this; the memory allocation from the `collect` dwarfs any ostensible speedup you get from not having to sort. I just ran a bench comparing your implementation against the previous one, and the new one is about 2x slower on synthetic datasets.

You can see the bench here if you're curious.
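For context, here is a minimal sketch of what a merge-based `extend_sorted_vec` can look like without allocating a fresh buffer for the merged result: grow the target once, then merge the two sorted runs backward in place. The function name, generic bounds, and element type are assumptions for illustration, not reth's actual code.

```rust
// Hypothetical sketch: extend a sorted vec with another sorted slice by
// merging backward in place, so no second full-size buffer is allocated.
fn extend_sorted<T: Ord + Copy>(target: &mut Vec<T>, other: &[T]) {
    if other.is_empty() {
        return;
    }
    let mut i = target.len(); // unmerged elements left in the original run
    target.extend_from_slice(other); // grow once; the tail gets overwritten
    let mut j = other.len(); // unmerged elements left in `other`
    let mut k = target.len(); // next slot to fill, walking from the back
    while i > 0 && j > 0 {
        k -= 1;
        if target[i - 1] > other[j - 1] {
            i -= 1;
            target[k] = target[i];
        } else {
            j -= 1;
            target[k] = other[j];
        }
    }
    // Any leftover `other` elements belong at the front of the merged tail;
    // leftover original elements are already in place.
    while j > 0 {
        k -= 1;
        j -= 1;
        target[k] = other[j];
    }
}

fn main() {
    let mut target = vec![1, 3, 5, 7];
    extend_sorted(&mut target, &[2, 4, 8]);
    assert_eq!(target, vec![1, 2, 3, 4, 5, 7, 8]);
}
```

The single `extend_from_slice` is the only allocation-sensitive step, which is the property the previous implementation was preserving.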
Very useful bench @mediocregopher.

When I first benchmarked the old version with the aggregated hashed state, the results were not good, which is why I thought something was wrong with `extend_ref` or `extend_sorted_vec`.

When I ran your bench against my own custom merge version (not using `merge_join_by`), the new version only shines when the target size is smaller than the other size; across the board, the old version still wins. That gives a hint for using this function well: keep the target size similar to or larger than the other size, which favors the old version.

But overall (sizes similar, or target size > other size), the old version is still better.
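Since `merge_join_by` comes up above, here is a hedged std-only sketch of the same idea: a single forward pass that merges two key-sorted vecs, with `other` winning on equal keys. The function name, key/value types, and the overwrite rule are assumptions for illustration.

```rust
// Hypothetical sketch of a merge-based combine of two key-sorted vecs
// (plain std, no itertools): one forward pass over both inputs.
fn merge_sorted(target: Vec<(u64, u64)>, other: Vec<(u64, u64)>) -> Vec<(u64, u64)> {
    let mut out = Vec::with_capacity(target.len() + other.len());
    let mut t = target.into_iter().peekable();
    let mut o = other.into_iter().peekable();
    loop {
        match (t.peek(), o.peek()) {
            (Some(a), Some(b)) => {
                use std::cmp::Ordering::*;
                match a.0.cmp(&b.0) {
                    Less => out.push(t.next().unwrap()),
                    Greater => out.push(o.next().unwrap()),
                    Equal => {
                        t.next(); // drop the target entry; `other` wins
                        out.push(o.next().unwrap());
                    }
                }
            }
            (Some(_), None) => out.push(t.next().unwrap()),
            (None, Some(_)) => out.push(o.next().unwrap()),
            (None, None) => break,
        }
    }
    out
}

fn main() {
    let merged = merge_sorted(vec![(1, 10), (3, 30)], vec![(2, 20), (3, 99)]);
    assert_eq!(merged, vec![(1, 10), (2, 20), (3, 99)]);
}
```

This version does allocate the output buffer up front, which is the trade-off the benchmark discussion is about: the cost of that allocation versus the cost of sorting after a bulk extend.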

I dumped the raw data of `HashedPostStateSorted` while benchmarking with native-transfer.
Here are the bench results for `extend_ref`; the new version is better on both in this test case.
You can double-check the bench here; the hashed state raw data is already attached.
Could it be that the raw data has properties the synthetic benchmark doesn't fully cover 🤔?

As discussed in #19739 (comment), batch writing of `hashed_state` is safe, so I created this PR to cherry-pick the old reverted commit.

Changes
- `extend_sorted_vec` uses a merge (avoiding the sort at the end)
- batch write `hashed_state`

Before
- erc20 transfers spam: ~100ms
- native transfers spam: ~50ms

After
- erc20 transfers spam: ~80ms
- native transfers spam: ~30ms
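The batch-write half of the change can be illustrated abstractly: instead of one storage call per hashed-state entry, accumulate the sorted entries and hand them to the writer in a single call. The trait, types, and counting writer below are hypothetical stand-ins, not reth's actual DB API.

```rust
// Hypothetical illustration of batching writes: one call for the whole
// sorted batch instead of one call per entry.
trait HashedStateWriter {
    fn write_batch(&mut self, entries: &[(u64, u64)]);
}

// A counting writer stands in for the real storage layer so the call
// pattern is observable.
struct CountingWriter {
    calls: usize,
    written: usize,
}

impl HashedStateWriter for CountingWriter {
    fn write_batch(&mut self, entries: &[(u64, u64)]) {
        self.calls += 1;
        self.written += entries.len();
    }
}

fn main() {
    let entries: Vec<(u64, u64)> = (0..1000).map(|k| (k, k * 2)).collect();
    let mut writer = CountingWriter { calls: 0, written: 0 };
    // One call covers the entire batch rather than 1000 per-entry calls.
    writer.write_batch(&entries);
    assert_eq!((writer.calls, writer.written), (1, 1000));
}
```

Fewer, larger writes amortize per-call overhead (cursor positioning, locking, transaction bookkeeping), which is where the ~20ms improvements above plausibly come from.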