Skip to content

Conversation

ZENOTME
Copy link

@ZENOTME ZENOTME commented Sep 17, 2025

solve #3645

Changes

I noticed that original circular buffer using by batch span processor use lock free MPSC queu based on CAS. When experiencing intense multithreading contention, Compare-And-Swap (CAS) exhibits poorer scalability compared to Fetch-And-Add (FAA). In reference of https://github.com/dbittman/waitfree-mpsc-queue, this pr attempt to implement a wait-free MPSC (Multiple Producer, Single Consumer) queue using FAA. Base on original benchmark, it indicate that this approach demonstrates better performance scalability.

Run on (48 X 2593.99 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x24)
  L1 Instruction 32 KiB (x24)
  L2 Unified 256 KiB (x24)
  L3 Unified 30720 KiB (x2)
Load Average: 7.85, 5.70, 4.48
----------------------------------------------------------------
Benchmark                      Time             CPU   Iterations
----------------------------------------------------------------
BM_BaselineBuffer/1     10178537 ns        51528 ns         1000
BM_BaselineBuffer/2      7408646 ns        69828 ns         1000
BM_BaselineBuffer/4      7684772 ns       127549 ns         1000
BM_BaselineBuffer/8      7222459 ns       278660 ns         1000
BM_BaselineBuffer/16     6716972 ns       603712 ns         1215
BM_LockFreeBuffer/1      3915343 ns        53125 ns         1000
BM_LockFreeBuffer/2      4798406 ns        70581 ns         1000
BM_LockFreeBuffer/4      4562709 ns       128493 ns         1000
BM_LockFreeBuffer/8      4935221 ns       289996 ns         1000
BM_LockFreeBuffer/16     5187913 ns       618856 ns         1081
BM_OptimizedBuffer/1     4256507 ns        49970 ns         1000
BM_OptimizedBuffer/2     3398719 ns        67712 ns         1000
BM_OptimizedBuffer/4     3204749 ns       127378 ns         1000
BM_OptimizedBuffer/8     3230722 ns       296507 ns         1000
BM_OptimizedBuffer/16    3859005 ns       769220 ns         1000

I would like to refine this PR if it's acceptable.

Copy link

linux-foundation-easycla bot commented Sep 17, 2025

CLA Signed

  • ✅login: ZENOTME / name: zenotme / (1836bbf)

The committers listed above are authorized under a signed CLA.

@marcalff marcalff added the pr:waiting-on-cla Waiting on CLA label Sep 17, 2025
Copy link

codecov bot commented Sep 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 90.08%. Comparing base (aeb5a01) to head (1836bbf).
⚠️ Report is 50 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3644      +/-   ##
==========================================
+ Coverage   90.03%   90.08%   +0.06%     
==========================================
  Files         220      220              
  Lines        7065     7111      +46     
==========================================
+ Hits         6360     6405      +45     
- Misses        705      706       +1     
Files with missing lines Coverage Δ
...include/opentelemetry/sdk/common/circular_buffer.h 98.15% <ø> (-1.85%) ⬇️
...e/opentelemetry/sdk/common/circular_buffer_range.h 100.00% <100.00%> (ø)

... and 21 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.


CircularBufferRange<AtomicUniquePtr<T>> PeekImpl() noexcept
{
uint64_t current_count = count_.load(std::memory_order_relaxed);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only sync operation is processed here. Is it possible return the same CircularBufferRange for more than one thread, and cause data race?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sorry I can't get the point here, could you elaborate a more specific scene.

Copy link
Member

@owent owent Sep 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When two threads call this function and finish the callings before any change, they may returns the same values. And callback(range); in Consume will be called with the same range.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this circular buffer implementation is mpsc and assuming only one consumer thread calling this method. That's how it used in batch span processor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants