Skip to content

Conversation

@Pennycook
Copy link
Contributor

@Pennycook Pennycook commented Oct 17, 2024

This PR has been replaced by #922. All future work related to this PR (including the open discussions) should be continued in the new PR.

This extension provides an alternative mechanism for submitting commands to a device via free-functions that require developers to opt-in to the creation of event objects.

It also proposes alternative names for several commands (e.g., launch) and simplifies some concepts (e.g., by removing the need for the nd_range class).

This extension provides an alternative mechanism for submitting commands to a
device via free-functions that require developers to opt-in to the creation of
event objects.

It also proposes alternative names for several commands (e.g., launch) and
simplifies some concepts (e.g., by removing the need for the nd_range class).
@gmlueck gmlueck self-requested a review June 17, 2025 16:53
Commands like copy, memcpy, fill, etc are not kernels and so passing a
kernel_bundle as a requirement is not meaningful.
Commands like copy, memcpy, fill, etc take their arguments explicitly rather
than being captured by a function, and so there is no need to inform the
runtime about which accessors are used. If a command uses an accessor, it must
have been passed as an argument.
Any accessor passed to a command that will run on the device must have
target::device.
@Pennycook Pennycook added the enhancement New feature or request label Jun 27, 2025
@tomdeakin
Copy link
Contributor

CTS will need further tests with sycl::requirements

@tomdeakin
Copy link
Contributor

Needs re-review since requirements was added.

.[apititle]#requirements::has-tracking#
[source,role=synopsis,id=api:has-tracking]
----
bool has-tracking() const;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
bool has-tracking() const;
bool has_tracking() const;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the dash is intended. John named all the exposition-only functions with dashes to help emphasize that they are not part of the specified API.

Copy link
Contributor

@tomdeakin tomdeakin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed since requirements were added, and have no comments LGTM!

namespace sycl::khr {

template <typename KernelType, typename... Requirements>
std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, const KernelType& k); (1)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, const KernelType& k); (1)
std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, KernelType&& k); (1)

I think using perfect-forwarding here is strictly not worse than const & while giving implementations ability to better optimize their quirks. Our current implementation of the host_tasks/scheduler might need to make a copy even for the no-handler path. Using perfect forwarding would allow us to use move instead of copy for rvalues.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm probably too tired to answer this well, but...

The problem with passing it by universal reference is that it becomes unclear which signature of k is being called, should it have more than one function call operator. For instance:

struct MyKernel {
    template<typename... Ts>
    void operator()(Ts&&...) { /* ... */ }

    template<typename... Ts>
    void operator()(Ts&&...) const { /* ... */ }
};

And how many kernels would really benefit from being movable? Are there issues with move-only types?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And how many kernels would really benefit from being movable?

In our implementation - everything using SYCL special classes (accessors/streams/etc.) would benefit, as those have std::shared_ptr inside managing SYCL RT internal details.

Are there issues with move-only types?

I believe there are. Move-only types can be trivially-copyable and those are device-copyable by definition. We won't be able to copy them at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The WG discussed this at the F2F. We will work on an example to test existing implementation behaviour to understand the case of member variables not used on the device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In particular, do we diagnose an error in this case:

  • The member variable is not used inside the kernel, and
  • The type of the member variable is not a legal kernel argument.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rewrite this using a named kernel object, so the operators are more clear. Also, there's no such thing as event::set_not_ready or event::set_ready, so let's use real code.

#include <sycl/sycl.hpp>

struct MyKernel {
  sycl::accessor acc;
  MyKernel(syc::accessor<int> a) : acc(a) {}
  void operator()() const { acc[0] = 42; }
};

int main() {
  sycl::queue q;
  sycl::buffer<int, 1> b{1};

  sycl::event e = q.submit([&](sycl::handler& cgh) {
    cgh.host_task([=](){ sleep(1); });
  });

  {
    sycl::accessor acc{b, sycl::read_write};
    sycl::khr::requirements reqs{acc, e};
    sycl::khr::launch_task(q, reqs, MyKernel{acc});
  }

  // Somewhere internally GPU task above gets enqueued here.
  // User kernel has to be alive at that time, meaning the kernel 
  // above had to be copied/moved internally.
  q.wait();
  assert(host_accessor{b}[0] == 42);
}

I think you are saying that the implementation of launch_task will (likely) need to copy / move the temporary MyKernel object in the code above, correct? This is because the SYCL runtime will (likely) need to implement the host task in software via a thread, and it can't enqueue the kernel to the device until after the host task completes. This implies that the SYCL runtime has either taken a copy or moved the kernel object because the host task may complete long after the call to launch_task has returned.

Do I have that right?

Completing the analysis, the MyKernel object contains an accessor member variable, which can be moved more efficiently than it can be copied. Therefore, there would be a performance advantage if the SYCL runtime could move the temporary MyKernel object rather than copying it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I have that right?

Yes!

the MyKernel object contains an accessor member variable, which can be moved more efficiently than it can be copied.

Yes too.

Therefore, there would be a performance advantage if the SYCL runtime could move the temporary MyKernel object rather than copying it.

Yes. In addition to that, user kernel maybe be non-copyable and only movable. In that case, not being able to move would mean we can't even compile the code.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move-only kernels would be a feature request which does not compile now. It also severely limits implementation techniques (possibly to the point of unimplementable), as you can only have one copy of a kernel across all devices.

For host_task specifically, maybe, as the member function already takes it by forwarding reference, although I'm not really a fan of having different rules for host vs. device.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on how one reads the spec. struct sycl::is_device_copyable<MyKernel> : std::true_type{}; means that we can "bit-copy" when submitting to the device, but that doesn't mean the same for host/rt copies, IMO. If that's not the case, then maybe the SYCL spec needs clarifications?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Move-only kernel objects seem like a niche case, but I don't think we need to rely on that in order to justify the change that @aelovikov-intel proposes. Does the accessor example above provide enough justification?

@nliber you said this above:

The problem with passing it by universal reference is that it becomes unclear which signature of k is being called, should it have more than one function call operator. For instance:

struct MyKernel {
    template<typename... Ts>
    void operator()(Ts&&...) { /* ... */ }

    template<typename... Ts>
    void operator()(Ts&&...) const { /* ... */ }
};

I think the example above is clear because section 4.12.1. "Defining kernels as named function objects" says:

The operator() member function must be const-qualified ...

However, this bizarre case would still be unclear:

struct MyKernel {
    void operator()() const { /* ... */ }
    void operator()() const volatile { /* ... */ }
};

I guess we could clarify this by changing section 4.12.1 to say:

The operator() member function must be const-qualified and not volatile-qualified ...

Aside from this issue that @nliber raised, are there other disadvantages to @aelovikov-intel's proposal?

}
}
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add an example with buffer/accessors somewhere?

Testing if I can push commits to this branch.
@slawekptak slawekptak self-requested a review October 8, 2025 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Submision optimization when event are not used