Skip to content

Conversation

@Pennycook
Copy link
Contributor

This started as an attempt just to clarify work-item scope atomics, but I think the fix has ended up addressing some broader long-standing issues with the memory model.

I think figuring out the steps that a SYCL application needs to take to guarantee "sequential consistency" is the only remaining open in the memory model, and I'm afraid to touch it. 😆

Closes #665.

In OpenCL, these atomics are only required to support a very specific use-case
involving images, and are forbidden in all other contexts.

In SYCL, we would like a work-item to be viewed as a degenerate case of a group
containing a single work-item. Work-item scope atomics should thus be
permitted, and their effect should be equivalent to non-atomic operations.
An implementation of atomic_ref<T> that was not lock-free needs to know which
work-items may access the lock in order to decide where to allocate the lock.

This additionally serves as a clarification of the behavior of work-item scope;
using an atomic_ref with work-item scope at the same time as an atomic_ref
with broader scope is invalid.
Even when using two atomic_ref objects with the same DefaultScope, it's
possible to encounter a data race by overriding the scope parameter of
individual operations.

This is a general clean-up but was motivated by work-item scope atomics:
any potentially concurrent use of work-item scope atomics and atomics with a
different scope results in undefined behavior.
The ISO C++ synchronizes-with relationship does not account for scopes.
The scopes do not need to match exactly, but there are restrictions on which
pairs of scopes are valid.

This is the final part of the clarification for work-item scope atomics;
a work-item scope atomic cannot sychronize with the atomic operations
performed by other work-items, and so their effects are not guaranteed to be
visible to other work-items without some other synchronization taking place.
The previously proposed wording suggested that any difference in scopes would
lead to undefined behavior, which was inconsistent with the paragraph
immediately afterwards about which atomics synchronize-with each other.
@Pennycook Pennycook force-pushed the clarification/work-item-scope-atomics branch from 7f05756 to eb0b73a Compare June 26, 2025 08:26
@TApplencourt
Copy link
Contributor

TApplencourt commented Jun 26, 2025

Potentially concurrent conflicting actions with different memory scopes may lead
to a data race, resulting in undefined behavior.
An atomic operation _A_ with scope _S~1~_ operating on the same memory location
as atomic operation _B_ with scope _S~2~_ is a data race if:

* The work-items which executed _A_ and _B_ are not both in the same group of
 work-items associated with scope _S~1~_; or
* The work-items which executed _A_ and _B_ are not both in the same group of
 work-items associated with scope _S~2~_.

same group of work-items associated

We use the term "set" , should we replace group for set here?

The set of work-items and devices to which the memory ordering constraints of a given atomic operation apply is controlled)

I tried to rewrite this little section, but really not sure if it's better...

Let:

- An atomic operation _A1_ with scope _MS~1~_ operating on Memory _M1_; and MS~1_ is associated with a set of work-item _MS~1_
- An atomic operation _A2_ with scope _MS~2~_ operating on Memory _M2_; and _MS~2_ is associated with a set of work-item _MS~2_

A data-race exist if and only if _MS~1_ == _MS~1_ and _S~1_ != _S~2_

@Pennycook
Copy link
Contributor Author

We use the term "set" , should we replace group for set here?
...
I tried to rewrite this little section, but really not sure if it's better...

I think I'll need to read (and re-read!) things a few times before I can determine which wording I prefer.

But to shed some light on why I chose the wording I did... One thing that is pretty subtle here is that it's not enough to just compare the memory_scope values themselves. As a concrete example of what I mean, this is a data race:

float* ptr = 0x42;

// Work-item 0, in Work-group 0
atomic_ref<float, memory_order::seq_cst, memory_scope::work_group>(*ptr) += 1;

// Work-item 0, in Work-group 1
atomic_ref<float, memory_order::seq_cst, memory_scope::work_group>(*ptr) += 1;

Even though these atomics use the same memory_scope value, the work-items themselves are in different work-groups. This is what I was trying to convey when talking about the "group of work-items associated with the scope".

The OpenCL version of this wording defines the concept of "inclusive scope" to try and explain this (see here) but I personally think their wording is quite unclear.

Copy link
Member

@keryell keryell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@tomdeakin
Copy link
Contributor

WG approved merge as clarification to SYCL 2020

@tomdeakin
Copy link
Contributor

@gmlueck Please can you add to your cherry-pick list. Thanks.

* The work-items which executed _A_ and _B_ are not both in the same group of
work-items associated with scope _S~1~_; or
* The work-items which executed _A_ and _B_ are not both in the same group of
work-items associated with scope _S~2~_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

John and I were having a side discussion about this part of the PR before he left. The question is whether operations A and B need to have the same scope, or whether it is sufficient for the scopes to include both work-items. To illustrate, consider the following example:

// Work-item A
sycl::atomic_ref<int, memory_order::release, memory_scope::work_group> a(mem);
a.store(1);

// Work-item B (in the same work-group a A)
sycl::atomic_ref<int, memory_order::acquire, memory_scope::device> a(mem);
int x = a.load();

Note that the two operations have different scopes, but each scope includes both A and B.

The question is whether SYCL should guarantee that these operations are atomic even though the scopes are different. My first question to John was about Intel hardware. We think that Intel hardware is guaranteed to be atomic in this scenario, so we have no concerns from the standpoint of our own ability to implement the proposed SYCL wording.

However, then we realized that the OpenCL specification seems to not guarantee atomicity in this case. Instead, the OpenCL wording seems to require both operations to have the same scope in order to guarantee atomicity. There is some debate, though, about whether the OpenCL wording should be changed. There is an open internal issue against the OpenCL specification on this point:

https://gitlab.khronos.org/opencl/OpenCL-Docs/-/issues/367

The SYCL WG should consider whether we want to adopt the wording that John proposes in this PR even though it guarantees atomicity in a case that OpenCL does not guarantee. Or, whether we should adopt the same language about atomicity that is currently in the OpenCL spec,

@gmlueck
Copy link
Contributor

gmlueck commented Sep 3, 2025

I realize that the WG approved this already, but I was wondering if we could reconsider. There are two point I would like to raise:

  • The review comment here did not get resolved. In fact, John's last response ends with "... so maybe it's better to take this out". I think this means that he intended to remove that paragraph before merging this PR.

  • John and I were having an internal side discussion about this PR, which never really got resolved before he left. That conversation was initially about Intel hardware (which is why it was internal), but then it branched out into OpenCL semantics. In retrospect, we should have made the conversation public at that point. I tried to capture our discussion in this comment.

@gmlueck gmlueck added the Agenda To be discussed during a SYCL committee meeting label Sep 3, 2025
@tomdeakin
Copy link
Contributor

Further discussion required, and then a re-review.

We decided that this requirement doesn't actually help implementations.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Agenda To be discussed during a SYCL committee meeting clarification Something is unclear memory model

Projects

None yet

Development

Successfully merging this pull request may close these issues.

What does it mean for an atomic_ref to have work_item scope?

5 participants