
Conversation

@gmlueck (Contributor) commented Oct 7, 2025

Add a proposed extension specification which allows an application to reuse the same event object in multiple command submissions, rather than creating a new event for each submission.

@gmlueck requested a review from a team as a code owner on October 7, 2025 21:43
@gmlueck (Author) commented Oct 7, 2025

This PR proposes a new extension that relies on changes in #20308. However, I think the two PRs can be approved and merged separately. Note that this PR (#20309) only proposes a new extension, so it's OK if this proposed extension is merged even before #20308 is implemented. It is only necessary that #20308 be implemented by the time the extension proposed in #20309 is implemented.

@CaoZhongZ commented:

Just catching up with the discussion: a reusable event might need a counter-based event, or at least would be very straightforward to implement with one. So, when the backend feature is available, we should use it.

Revamp the proposed specification to provide convenience APIs that are
similar to CUDA's `cudaEventRecord` and `cudaStreamWaitEvent` because
this is the immediate request from our customer.

I think we do still want to add a `record_event` property, but I think
we could add that separately as part of the KHR being proposed in
KhronosGroup/SYCL-Docs#922, or as a separate oneapi extension based on
that KHR.
@gmlueck changed the title from [SYCL][Doc] Add spec to record an event to [SYCL][Doc] Add spec to reuse an event on Oct 29, 2025
@gmlueck (Author) commented Oct 29, 2025

Just catching up with the discussion: a reusable event might need a counter-based event, or at least would be very straightforward to implement with one. So, when the backend feature is available, we should use it.

Thanks, @CaoZhongZ. I agree that counter-based events will make this easier. I updated the proposed spec, and I included a description about how it can work in Level Zero in the "Implementation notes" section. DPC++ already uses counter-based events for in-order queues.

Comment on lines 407 to 414
However, standard events do not map directly, because a standard Level Zero event
_E_ cannot be passed to `zeCommandListAppendSignalEvent` until all previous
"signal" operations on _E_ have completed and all previous commands using _E_
as a "wait event" (i.e., a dependency) have completed.
To compensate for this, the runtime can track the lifetime of the backend event.
If the backend event associated with `enqueue_event_signal` is still in use,
the runtime can disassociate that backend event from the SYCL event, and
allocate a new backend event.
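The scheme in this excerpt (disassociate a still-in-use backend event and allocate a fresh one) can be sketched with plain C++ mock types. BackendEvent, SyclEventImpl, and backend_for_signal are illustrative names, not DPC++ internals:

```cpp
#include <cassert>
#include <memory>

// Mock stand-in for a standard Level Zero event.  "in_use" models whether
// any previously submitted signal/wait command still references it.
struct BackendEvent {
  bool in_use = false;
};

// Mock stand-in for the runtime-side event implementation.
struct SyclEventImpl {
  std::shared_ptr<BackendEvent> backend = std::make_shared<BackendEvent>();

  // Called on each signal submission: if the current backend event is still
  // referenced by earlier commands, disassociate it and allocate a fresh
  // one.  The old event stays alive (via shared_ptr) until those commands
  // complete.
  std::shared_ptr<BackendEvent> backend_for_signal() {
    if (backend->in_use)
      backend = std::make_shared<BackendEvent>();
    backend->in_use = true;
    return backend;
  }
};
```

The key point of the sketch is that the SYCL-level event identity is stable while the backend event behind it can be swapped out whenever a previous use is still outstanding.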
@pbalcer (Contributor) commented Oct 31, 2025:

If the make_event function doesn't accept a queue, the runtime won't be able to choose what type of event to allocate, since that is tied directly to the type of the command list the event will be used with.

The trade-off here is that counter-based events have less overhead and better semantics, but potentially might not be supported by all types of queues. L0v1 doesn't support these types of events at all. On the other hand, in L0v2 we are exclusively using counter-based events, for now at least. If we go with the proposed specification as is, we'd be locked into that choice. That's fine, but we'd be giving up, e.g., the ability to use native out-of-order command lists to implement SYCL out-of-order queues in the new adapter. Personally, I think that's an OK trade-off to make.

So my suggestion is that we support this extension only on L0v2, using counter-based events. The runtime tracking and transparently allocating new standard L0 events in the background to support L0v1 seems very complex and potentially error-prone. Not sure if it's worth the effort.

@gmlueck (Author) replied:

My answer here addresses both your comment above and also the one asking what the "primary benefit" of this extension is.

I think there are two goals of this extension. One goal is to make the SYCL API easier for people migrating code from CUDA. I also thought that restructuring the API in this way would help with performance.

I also want to have a consistent style in SYCL for events. We are creating a new "KHR" extension that changes the way applications submit commands to kernels. This is also an opportunity for us to change the way events work. That new KHR can either return an event when a command is submitted (similar to what SYCL does now), or it can take a "signaling" event as input (similar to the way CUDA and Level Zero work). However, I want to choose one style and use it consistently for all types of queues and for all backends. I don't want to use one style for in-order queues and a different style for out-of-order queues, for example. We should view this PR as related to the KHR. If this PR takes a "signaling" event as input, I'd like to change the KHR in the same way.

@gujinghui is the motivating customer for this extension, but I think his situation could be common for other people migrating CUDA code to SYCL. @gujinghui is porting PyTorch to SYCL. Unsurprisingly, PyTorch code has an internal structure that is similar to the CUDA APIs. For example, I think they have internal APIs similar to cudaEventCreate, cudaEventRecord, and cudaStreamWaitEvent that need to be implemented using SYCL. I think this is possible even if SYCL returns an event when a command is submitted. Here is what I think the code would look like in this case:

// Hypothetical "event" implementation inside of PyTorch.
struct Event_s {
  sycl::event e;
};
using Event = Event_s*;

void EventCreate(Event *evt) {
  // Creating a PyTorch event is mostly a nop in this scenario.  We just create an empty wrapper around
  // a dummy "sycl::event" object.
  *evt = new Event_s;
}

void EventRecord(Event evt, sycl::queue q) {
  // Each time we record an event, we really create a new "sycl::event" and overwrite the previous one.
  evt->e = syclex::enqueue_event_signal(q);
}

void StreamWaitEvent(sycl::queue q, Event evt) {
  syclex::enqueue_event_wait(q, {evt->e});
}

My performance concern here is that each call to EventRecord will drop the shared_ptr reference to the event_impl and then create a new event_impl for the new event. This seems like a lot of recycling both in the SYCL runtime itself and also lower down in the stack because the native events also have to be "recycled".

By contrast, if the SYCL APIs take a signaling event as input, the PyTorch code would look like this:

struct Event_s {
  Event_s(const sycl::event &e) : e{e} {}
  sycl::event e;
};
using Event = Event_s*;

void EventCreate(Event *evt) {
  *evt = new Event_s{syclex::make_event()};
}

void EventRecord(Event evt, sycl::queue q) {
  syclex::enqueue_event_signal(q, evt->e);
}

void StreamWaitEvent(sycl::queue q, Event evt) {
  syclex::enqueue_event_wait(q, {evt->e});
}

So my suggestion is that we support this extension only on L0v2, using counter-based events. The runtime tracking and transparently allocating new standard L0 events in the background to support L0v1 seems very complex and potentially error-prone. Not sure if it's worth the effort.

I think the L0v2 is used only for in-order queues, right? As I said, I want the SYCL API to be the same for both in-order and out-of-order queues. However, that doesn't necessarily mean that the L0v1 adapter needs to change. We could decide instead to put this code in the SYCL runtime. That might make sense since (I think) we will need similar code to handle OpenCL. The OpenCL API is similar to the current SYCL API, where the call to submit a command returns an event. Therefore, I expect there will be some code that overwrites the native event object each time the sycl::event is "recorded". Perhaps this code could be shared between the OpenCL and L0v1 adapters. I suspect there will also be similar cases in the SYCL runtime when commands depend on a host task, so the same code might be shared there too.

As far as performance goes, I'm OK if out-of-order queues go through a slow path. Our main performance concern is making in-order queues fast on Level Zero, so we should optimize for that.

@gmlueck (Author) replied:

If the make_event function doesn't accept a queue, the runtime won't be able to choose what type of event to allocate, since that is tied directly to the type of the command list the event will be used with.

Oh, I forgot to address this part of the comment. This is a good observation. I can think of two possible solutions:

  • We could change make_event to take an additional property which tells whether the event will be used with in-order queues vs. out-of-order queues. It would then be an error to use the event as the "signaling" event for the wrong type of queue. Would that be enough to choose the right type of underlying L0 event? Or,

  • We could change make_event to take a queue parameter. In that case, it would be an error to use the event as the "signaling" event for any other queue. @gujinghui, would that be too restrictive for you?

Note that in both cases, we would have to support any type of event in enqueue_event_wait. For example, we need to support an in-order queue waiting on an out-of-order event (or even on an event from a different backend).
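To visualize the first option, here is a minimal sketch with mock types. queue_kind, reusable_event, and the error check are all assumed names for illustration, not the proposed API:

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical tag saying which kind of queue the event will signal on,
// fixed when the event is created.  A Level Zero implementation could use
// this to pick counter-based vs. standard events up front.
enum class queue_kind { in_order, out_of_order };

// Mock stand-in for the reusable event.
struct reusable_event {
  queue_kind signal_scope;  // set at creation, never changes
};

reusable_event make_event(queue_kind scope) { return {scope}; }

// Using the event as the signaling event on the wrong kind of queue is an
// error under this scheme; waiting would still accept any event.
void enqueue_event_signal(queue_kind queue_scope, const reusable_event& e) {
  if (e.signal_scope != queue_scope)
    throw std::invalid_argument("event was created for a different queue kind");
}
```

This keeps make_event queue-independent (any in-order queue can signal an in-order event) while still giving the backend enough information at creation time.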

@pbalcer (Contributor) commented Oct 31, 2025:

I think the L0v2 is used only for in-order queues, right?

No, L0v2 out-of-order queues are implemented as a collection of in-order command lists that are used in a round-robin fashion, specifically so that we can use counter-based events. We can easily support the proposed extension on any of the platforms where L0v2 is the default. But there's a small performance gap between our current v2 out-of-order queue implementation and what could be achieved if we used native L0 out-of-order command lists. So far, all of this has been an implementation detail of the adapter that we can easily change. With this extension, we would be precluding the implementation from ever (efficiently) using native out-of-order command lists.
Like I said before, that is an OK trade-off in my view.

The added benefit of supporting this extension only in L0v2 (only with counter-based events) is that we wouldn't have to worry about standard events. This makes the question about queues moot.

A contributor replied:

We could decide instead to put this code in the SYCL runtime.

That might be best long term, in the context of liboffload and upstreaming.

@gmlueck (Author) replied:

The added benefit of supporting this extension only in L0v2 (only with counter based event) ...

What prevents us from dropping the L0v1 adapter entirely? Do we still need it for use with the LTS driver, for example? Remember that the KHR will follow the same pattern as this PR. I think it is not a good option to say that the KHR will not work with the LTS driver.

In reply to the two make_event options above:

We should not associate the event with a particular queue; that would be too restrictive.
If needed, it's OK to add an additional property to this API.

@gmlueck (Author) replied:

I think we may not need to change the interface to make_event. @sergey-semenov what do you think about the implementation proposal in 3af5004?

@pbalcer (Contributor) commented Oct 31, 2025

If the primary benefit of this extension is to improve performance, I suggest first creating a simple prototype and a microbenchmark that would show what sort of benefit we can expect.

This is a non-trivial change to the L0 adapter, and adds additional restrictions on the implementation. Right now I'm skeptical the benefit will be meaningful.

----
namespace sycl::ext::oneapi::experimental {

void enqueue_event_wait(queue q, const std::vector<event>& evts);
A contributor commented:

Is my understanding correct that this is a counterpart of the existing event barrier, but one that does not return an event? Would it make sense to add an optional event& evt argument here, similar to enqueue_event_signal, to provide the same functionality we have today for this type of barrier, but with a reusable event?
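A minimal sketch of the suggested overload set, using mock queue/event types so it compiles standalone (the int return value only tags which overload ran, for illustration; in the extension these would be sycl::queue and the reusable sycl::event):

```cpp
#include <cassert>
#include <vector>

// Mock types standing in for sycl::queue and the reusable sycl::event.
struct queue {};
struct event {};

// Existing barrier-style wait on a list of events (returns no event).
int enqueue_event_wait(queue, const std::vector<event>&) { return 0; }

// Suggested variant: also record completion into a reusable event,
// mirroring enqueue_event_signal.
int enqueue_event_wait(queue, const std::vector<event>&, event& /*signal*/) {
  return 1;
}
```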

I've been sketching out how reusable events can be added to the proposed
KHR sycl_khr_free_function_commands.  Align the names here with the ones
I plan to propose in the KHR.
@gmlueck (Author) commented Nov 17, 2025

I'd like to drive this proposal to some sort of conclusion. I think we need the following:

  • Someone from @intel/dpcpp-specification-reviewers needs to review and either approve or make comments on what needs to change, and
  • @sergey-semenov: Please comment here on whether you think the proposed implementation in 3af5004 is reasonable.

@steffenlarsen (Contributor) left a review:

LGTM!

@gmlueck (Author) commented Dec 3, 2025

@intel/llvm-gatekeepers I think this is ready to merge.

If we identify any other issues during implementation, we will still have time to consider changes to the API. This PR only makes the extension "proposed".

@againull merged commit abde64e into intel:sycl on Dec 9, 2025
3 checks passed

9 participants