Open
Labels
feature request: New feature or request.
Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
libcu++
Is your feature request related to a problem? Please describe.
In investigating an issue with CUDA's atomicAdd, we noticed that the instruction emitted for an atomicAdd whose return value is unused was suboptimal when the kernel or device function uses no fence.acquire anywhere in the kernel, even if it's unrelated to the atomicAdd. In some cases atomicAdd needs the ATOM instruction, but when the atomic is used purely for something like statistics keeping, a RED instruction is sufficient.
We tried to work around this by using atomic_ref, but it still does not emit RED:
// atomicAdd(&stats[0], local_stats[0]);
cuda::atomic_ref<idx_type, cuda::thread_scope_thread> s0(stats[0]);
s0.fetch_add(local_stats[0], cuda::std::memory_order_relaxed);
We needed a way, short of writing PTX, to emit a RED, and @gonzalobg mentioned that the new atomic::store_add in C++26 should emit that instruction in this case.
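To make the requested semantics concrete, here is a minimal host-side sketch in standard C++ of what a store_add-style operation looks like from the caller's side: an atomic read-modify-write that returns nothing, so no code can depend on the previous value. The names store_add_sketch and accumulate_stats_sketch are hypothetical and exist only for illustration. They are not part of CCCL or the standard library, and this sketch says nothing about which instruction a GPU compiler would actually emit.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (an assumption for illustration, not a CCCL API):
// mimics the shape of a store_add operation by performing the atomic
// read-modify-write and returning void.
template <class T>
void store_add_sketch(std::atomic<T>& target, T value,
                      std::memory_order order = std::memory_order_relaxed) {
  // Host-side stand-in: fetch_add with the result deliberately discarded.
  (void)target.fetch_add(value, order);
}

// Accumulates a few local counters into a shared statistic, as in the
// statistics-keeping use case, and returns the final total.
std::uint64_t accumulate_stats_sketch() {
  std::atomic<std::uint64_t> stats{0};
  const std::uint64_t local_stats[] = {3, 4, 5};
  for (std::uint64_t s : local_stats) {
    store_add_sketch(stats, s);
  }
  return stats.load(std::memory_order_relaxed);
}
```

Because the helper returns void, the "result is unused" property is part of the interface rather than something the compiler has to prove, which is the property this request hopes would allow a reduction instruction to be selected.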
Describe the solution you'd like
Support for atomic::store_* in CCCL.
Describe alternatives you've considered
Writing PTX
Additional context
No response
Projects
Status
Todo