@slawekptak slawekptak commented Oct 10, 2025

This is a follow-up PR to #644, which was originally created by John Pennycook. All future work related to that PR will continue here. A new PR is needed because ownership of the original PR has to be transferred.

This extension provides an alternative mechanism for submitting commands to a device via free functions that require developers to opt in to the creation of event objects.

It also proposes alternative names for several commands (e.g., launch) and simplifies some concepts (e.g., by removing the need for the nd_range class).

Pennycook and others added 30 commits October 9, 2025 07:59
Previous "0 or more" wording only made sense when reductions could be
optionally provided to functions like parallel_for; now that there are
dedicated *_reduce functions, at least one reduction is required.
"is" is more consistent with ISO C++ wording.
There is no need to constrain T here because T must be device-copyable in order
to construct the accessor passed as an argument.
Renaming sycl::nd_item is not a necessary part of the API redesign for
submitting work, so it should be moved to its own extension.

This will also give us more time to consider the design and naming of any
proposed replacement(s), including how they should interact with new
functionality proposed in other KHRs.
There are currently no backends that define interop for reductions,
so we can remove these functions for now. If we decide later that
these functions are necessary, we can release a revision of the KHR.
Co-authored-by: Andrey Alekseenko <[email protected]>
Pennycook and others added 19 commits October 10, 2025 09:01
Changes to names and cv-qualifiers resulted in inconsistent spacing.
This restriction potentially improves performance by giving implementations the
freedom to submit work immediately where possible.
khr_free_function_commands renames several of the old enqueue APIs.
The comments added in this commit are intended to help reviewers,
and will not be visible in the specification.
This reverts commit caf3b0a.
After discussion, the SYCL WG decided that this was too error-prone.
A more detailed investigation of performance overheads in SYCL implementations
has uncovered that the cost associated with using a handler is similar to the
cost associated with returning a sycl::event.

This commit removes all the handler overloads from the KHR, as a first step
towards introducing an alternative design that does not depend on handler.
An instance of the requirements class represents all of the scheduling
requirements that must be satisfied when submitting a command. It acts
as a replacement for handler that delivers two main improvements:

1) All requirements are captured at once (by the requirements constructor),
   allowing the presence or absence of specific requirements to be detected
   at compile-time.

2) All requirements are passed as an argument to the command function,
   allowing the command function to enqueue work immediately.
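
The two improvements above can be illustrated with a small, SYCL-free C++17 sketch. Every name in it (`depends_on_event`, `tracking` as a plain struct, `requirements::has`, `launch_now`) is a hypothetical stand-in chosen for illustration and is not the interface proposed by the KHR. It only shows the general idea: because the constructor captures all requirements at once, their types become part of the object's type, and because the object is passed alongside the work, the command function can act immediately.

```cpp
#include <cassert>
#include <tuple>
#include <type_traits>
#include <utility>

// Hypothetical stand-ins for individual scheduling requirements.
struct depends_on_event { int event_id; };
struct tracking { bool enabled; };

// Hypothetical sketch: every requirement is captured by the constructor,
// so the set of requirement types is encoded in the class type.
template <typename... Rs>
class requirements {
public:
  explicit requirements(Rs... rs) : values_(std::move(rs)...) {}

  // Presence or absence of a requirement type is a compile-time constant.
  template <typename R>
  static constexpr bool has() { return (std::is_same_v<R, Rs> || ...); }

  // Access a captured requirement by type (only valid when has<R>() is true).
  template <typename R>
  const R& get() const { return std::get<R>(values_); }

private:
  std::tuple<Rs...> values_;
};

// Hypothetical command function: work and requirements arrive together,
// so the work can run right away instead of being recorded for later.
// Returns whether the caller asked for the command to be tracked.
template <typename... Rs, typename Work>
bool launch_now(const requirements<Rs...>& reqs, Work&& work) {
  (void)reqs;  // unused when no tracking requirement is present
  std::forward<Work>(work)();
  if constexpr (requirements<Rs...>::template has<tracking>()) {
    return reqs.template get<tracking>().enabled;
  } else {
    return false;
  }
}
```

In this sketch, a call such as `launch_now(requirements{tracking{true}}, work)` takes the tracking branch at compile time, while a call with an empty `requirements<>` never instantiates it.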

There are some existing APIs that were dependent on handler that do not yet
have a requirements-based equivalent, including:

- local memory
- specialization constants
- kernel bundles
Adding a kernel_bundle<executable> as a requirement should have the same effect
as calling handler::use_kernel_bundle.

handler::use_kernel_bundle cannot be used in conjunction with commands
accepting a kernel object, and the command is defined as ignoring the
kernel_bundle in that case.

Since the new API receives the command and all requirements simultaneously, and
the requirements are known at compile-time, we can instead use a Constraint to
ensure that such code doesn't compile.
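
As a rough illustration of the Constraint described above, the following SYCL-free C++17 sketch removes the kernel-object overload from overload resolution whenever the requirements contain a bundle. The types `kernel` and `executable_bundle`, the `requirements` template, and the `can_launch_kernel` trait are hypothetical stand-ins, and `std::enable_if_t` is only one possible way to express such a constraint; the specification's actual wording and mechanism may differ.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

struct kernel {};             // hypothetical stand-in for a kernel object
struct executable_bundle {};  // hypothetical stand-in for kernel_bundle<executable>
struct depends_on_event {};   // hypothetical unrelated requirement

// Hypothetical requirements type that only records which requirement types it holds.
template <typename... Rs>
struct requirements {
  template <typename R>
  static constexpr bool has() { return (std::is_same_v<R, Rs> || ...); }
};

// Callable kernels (e.g. lambdas): a bundle requirement is accepted.
template <typename... Rs, typename KernelFunc,
          std::enable_if_t<!std::is_same_v<std::decay_t<KernelFunc>, kernel>, int> = 0>
void launch_task(const requirements<Rs...>&, KernelFunc&& f) {
  std::forward<KernelFunc>(f)();
}

// Kernel objects: the overload only participates when no bundle is present,
// so combining a kernel object with a bundle is a compile-time error.
template <typename... Rs,
          std::enable_if_t<!requirements<Rs...>::template has<executable_bundle>(), int> = 0>
void launch_task(const requirements<Rs...>&, const kernel&) {}

// Detection helper used only to observe the constraint without a hard error.
template <typename Reqs, typename = void>
struct can_launch_kernel : std::false_type {};

template <typename Reqs>
struct can_launch_kernel<
    Reqs, std::void_t<decltype(launch_task(std::declval<const Reqs&>(),
                                           std::declval<const kernel&>()))>>
    : std::true_type {};
```

Here `can_launch_kernel<requirements<executable_bundle>>::value` is `false`, whereas the same requirements remain usable with a lambda kernel.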
This is consistent with info::event_command_status.

Co-authored-by: Gordon Brown <[email protected]>
Although we could limit errors to the {tracking(true), tracking(false)} case,
that check would have to be deferred until runtime, because the values are not
known at compile-time. Ensuring that each requirements object contains at most
one tracking requirement is simpler and less error-prone.
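
A minimal sketch of why the type-based rule can be checked at compile-time: counting how many requirement types are `tracking` needs no knowledge of their runtime values. The `tracking` struct, `count_v`, `valid_requirement_list_v`, and this `requirements` template are hypothetical names used only for illustration, not the KHR's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

struct tracking { bool enabled; };  // hypothetical stand-in; value known only at runtime
struct depends_on_event {};         // hypothetical unrelated requirement

// Number of times R appears in the list Rs..., computed from types alone.
template <typename R, typename... Rs>
inline constexpr std::size_t count_v =
    (std::size_t{0} + ... + (std::is_same_v<R, Rs> ? std::size_t{1} : std::size_t{0}));

// A list is valid when it names tracking at most once, whatever the values are.
template <typename... Rs>
inline constexpr bool valid_requirement_list_v = (count_v<tracking, Rs...> <= 1);

// Hypothetical requirements type that rejects duplicate tracking requirements
// when it is instantiated, before any runtime value is inspected.
template <typename... Rs>
struct requirements {
  static_assert(valid_requirement_list_v<Rs...>,
                "a requirements object may contain only one tracking requirement");
};
```

With this approach, `requirements<tracking, tracking>` fails to compile even when both values would agree, which trades a little flexibility for a diagnostic that never waits until runtime.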
Commands like copy, memcpy, fill, etc. are not kernels, so passing a
kernel_bundle as a requirement is not meaningful.
Commands like copy, memcpy, fill, etc. take their arguments explicitly rather
than being captured by a function, and so there is no need to inform the
runtime about which accessors are used. If a command uses an accessor, it must
have been passed as an argument.
Any accessor passed to a command that will run on the device must have
target::device.

CLAassistant commented Oct 10, 2025

All committers have signed the CLA.
