[FEA]: Add support for C++26's atomic::store_add #7003

@cliffburdick

Area

libcu++

Is your feature request related to a problem? Please describe.

While investigating an issue with CUDA's atomicAdd, we noticed that the instruction emitted for an atomicAdd whose return value is unused is suboptimal when the kernel or device function uses a fence.acquire anywhere in the kernel, even if the fence is unrelated to the atomicAdd. In some cases atomicAdd does need the ATOM instruction, but when the atomic is used purely for something like statistics keeping, a RED instruction is fine.

We tried to work around this by using atomic_ref, but it still does not emit RED:

// atomicAdd(&stats[0], local_stats[0]);
cuda::atomic_ref<idx_type, cuda::thread_scope_thread> s0(stats[0]);
s0.fetch_add(local_stats[0], cuda::std::memory_order_relaxed);

We needed a way, short of writing PTX, to emit a RED, and @gonzalobg mentioned that the new atomic::store_add in C++26 should emit that instruction in this case.
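For reference, here is a minimal host-side sketch of the semantics we are after, written against plain std::atomic so it compiles without CCCL. As we understand the C++26 addition, store_add performs the addition and returns void, so the old value can never be observed by the caller. The names store_add_sketch and record_stats are hypothetical and exist only for this illustration. The sketch is built on fetch_add with a discarded result, so it shows the interface shape only and does not by itself cause RED to be emitted.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper for illustration only; not a libcu++ or standard API.
// It mirrors the shape of a store_add-style operation: add `operand` to the
// atomic object and return nothing, so no caller can depend on the old value.
template <class T>
void store_add_sketch(std::atomic<T>& obj, T operand,
                      std::memory_order order = std::memory_order_seq_cst) noexcept {
    // Implemented with fetch_add and a discarded result. This shows the
    // semantics only; it does not by itself make a compiler emit RED.
    (void)obj.fetch_add(operand, order);
}

// Statistics-style use: bump a counter without ever reading it back here.
inline void record_stats(std::atomic<std::uint64_t>& counter,
                         std::uint64_t local_count) {
    store_add_sketch(counter, local_count, std::memory_order_relaxed);
}
```

Because the operation has no result, an implementation would be free to lower it to a reduction rather than a read-modify-write that returns the old value, which is the property we want for statistics counters.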

Describe the solution you'd like

Support for atomic::store_* in CCCL.

Describe alternatives you've considered

Writing PTX

Additional context

No response
