-
Notifications
You must be signed in to change notification settings - Fork 500
New circular buffer using faa instead of cas to optimize performance #3644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
The committers listed above are authorized under a signed CLA. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #3644 +/- ##
==========================================
+ Coverage 90.03% 90.08% +0.06%
==========================================
Files 220 220
Lines 7065 7111 +46
==========================================
+ Hits 6360 6405 +45
- Misses 705 706 +1
🚀 New features to boost your workflow:
|
|
||
CircularBufferRange<AtomicUniquePtr<T>> PeekImpl() noexcept | ||
{ | ||
uint64_t current_count = count_.load(std::memory_order_relaxed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only sync operation is processed here. Is it possible return the same CircularBufferRange for more than one thread, and cause data race?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm sorry I can't get the point here, could you elaborate a more specific scene.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When two threads call this function and finish the callings before any change, they may returns the same values. And callback(range);
in Consume
will be called with the same range.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this circular buffer implementation is mpsc and assuming only one consumer thread calling this method. That's how it used in batch span processor.
solve #3645
Changes
I noticed that original circular buffer using by batch span processor use lock free MPSC queu based on CAS. When experiencing intense multithreading contention, Compare-And-Swap (CAS) exhibits poorer scalability compared to Fetch-And-Add (FAA). In reference of https://github.com/dbittman/waitfree-mpsc-queue, this pr attempt to implement a wait-free MPSC (Multiple Producer, Single Consumer) queue using FAA. Base on original benchmark, it indicate that this approach demonstrates better performance scalability.
I would like to refine this PR if it's acceptable.