-
Notifications
You must be signed in to change notification settings - Fork 73
Clarify work-item scope atomics (and memory model in general) #849
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Clarify work-item scope atomics (and memory model in general) #849
Conversation
In OpenCL, these atomics are only required to support a very specific use-case involving images, and are forbidden in all other contexts. In SYCL, we would like a work-item to be viewed as a degenerate case of a group containing a single work-item. Work-item scope atomics should thus be permitted, and their effect should be equivalent to non-atomic operations.
An implementation of atomic_ref<T> that was not lock-free needs to know which work-items may access the lock in order to decide where to allocate the lock. This additionally serves as a clarification of the behavior of work-item scope; using an atomic_ref with work-item scope at the same time as an atomic_ref with broader scope is invalid.
Even when using two atomic_ref objects with the same DefaultScope, it's possible to encounter a data race by overriding the scope parameter of individual operations. This is a general clean-up but was motivated by work-item scope atomics: any potentially concurrent use of work-item scope atomics and atomics with a different scope results in undefined behavior.
The ISO C++ synchronizes-with relationship does not account for scopes. The scopes do not need to match exactly, but there are restrictions on which pairs of scopes are valid. This is the final part of the clarification for work-item scope atomics; a work-item scope atomic cannot sychronize with the atomic operations performed by other work-items, and so their effects are not guaranteed to be visible to other work-items without some other synchronization taking place.
ea207e8 to
7f05756
Compare
The previously proposed wording suggested that any difference in scopes would lead to undefined behavior, which was inconsistent with the paragraph immediately afterwards about which atomics synchronize-with each other.
7f05756 to
eb0b73a
Compare
We use the term "set" , should we replace
I tried to rewrite this little section, but really not sure if it's better... |
I think I'll need to read (and re-read!) things a few times before I can determine which wording I prefer. But to shed some light on why I chose the wording I did... One thing that is pretty subtle here is that it's not enough to just compare the float* ptr = 0x42;
// Work-item 0, in Work-group 0
atomic_ref<float, memory_order::seq_cst, memory_scope::work_group>(*ptr) += 1;
// Work-item 0, in Work-group 1
atomic_ref<float, memory_order::seq_cst, memory_scope::work_group>(*ptr) += 1;Even though these atomics use the same The OpenCL version of this wording defines the concept of "inclusive scope" to try and explain this (see here) but I personally think their wording is quite unclear. |
keryell
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
|
WG approved merge as clarification to SYCL 2020 |
|
@gmlueck Please can you add to your cherry-pick list. Thanks. |
| * The work-items which executed _A_ and _B_ are not both in the same group of | ||
| work-items associated with scope _S~1~_; or | ||
| * The work-items which executed _A_ and _B_ are not both in the same group of | ||
| work-items associated with scope _S~2~_. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
John and I were having a side discussion about this part of the PR before he left. The question is whether operations A and B need to have the same scope, or whether it is sufficient for the scopes to include both work-items. To illustrate, consider the following example:
// Work-item A
sycl::atomic_ref<int, memory_order::release, memory_scope::work_group> a(mem);
a.store(1);
// Work-item B (in the same work-group a A)
sycl::atomic_ref<int, memory_order::acquire, memory_scope::device> a(mem);
int x = a.load();
Note that the two operations have different scopes, but each scope includes both A and B.
The question is whether SYCL should guarantee that these operations are atomic even though the scopes are different. My first question to John was about Intel hardware. We think that Intel hardware is guaranteed to be atomic in this scenario, so we have no concerns from the standpoint of our own ability to implement the proposed SYCL wording.
However, then we realized that the OpenCL specification seems to not guarantee atomicity in this case. Instead, the OpenCL wording seems to require both operations to have the same scope in order to guarantee atomicity. There is some debate, though, about whether the OpenCL wording should be changed. There is an open internal issue against the OpenCL specification on this point:
https://gitlab.khronos.org/opencl/OpenCL-Docs/-/issues/367
The SYCL WG should consider whether we want to adopt the wording that John proposes in this PR even though it guarantees atomicity in a case that OpenCL does not guarantee. Or, whether we should adopt the same language about atomicity that is currently in the OpenCL spec,
|
I realize that the WG approved this already, but I was wondering if we could reconsider. There are two point I would like to raise:
|
|
Further discussion required, and then a re-review. |
We decided that this requirement doesn't actually help implementations.
This started as an attempt just to clarify work-item scope atomics, but I think the fix has ended up addressing some broader long-standing issues with the memory model.
I think figuring out the steps that a SYCL application needs to take to guarantee "sequential consistency" is the only remaining open in the memory model, and I'm afraid to touch it. 😆
Closes #665.