-
Notifications
You must be signed in to change notification settings - Fork 73
Add sycl_khr_free_function_commands extension #644
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add sycl_khr_free_function_commands extension #644
Conversation
This extension provides an alternative mechanism for submitting commands to a device via free-functions that require developers to opt-in to the creation of event objects. It also proposes alternative names for several commands (e.g., launch) and simplifies some concepts (e.g., by removing the need for the nd_range class).
Previous "0 or more" wording only made sense when reductions could be optionally provided to functions like parallel_for; now that there are dedicated *_reduce functions, at least one reduction is required.
"is" is more consistent with ISO C++ wording.
Commands like copy, memcpy, fill, etc are not kernels and so passing a kernel_bundle as a requirement is not meaningful.
Commands like copy, memcpy, fill, etc take their arguments explicitly rather than being captured by a function, and so there is no need to inform the runtime about which accessors are used. If a command uses an accessor, it must have been passed as an argument.
Any accessor passed to a command that will run on the device must have target::device.
|
CTS will need further tests with |
|
Needs re-review since requirements was added. |
| .[apititle]#requirements::has-tracking# | ||
| [source,role=synopsis,id=api:has-tracking] | ||
| ---- | ||
| bool has-tracking() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| bool has-tracking() const; | |
| bool has_tracking() const; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the dash is intended. John named all the exposition-only functions with dashes to help emphasize that they are not part of the specified API.
tomdeakin
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-reviewed since requirements were added, and have no comments LGTM!
| namespace sycl::khr { | ||
|
|
||
| template <typename KernelType, typename... Requirements> | ||
| std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, const KernelType& k); (1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, const KernelType& k); (1) | |
| std::optional<event> launch(const queue& q, range<1> r, const requirements<Requirements...>& reqs, KernelType&& k); (1) |
I think using perfect-forwarding here is strictly not worse than const & while giving implementations ability to better optimize their quirks. Our current implementation of the host_tasks/scheduler might need to make a copy even for the no-handler path. Using perfect forwarding would allow us to use move instead of copy for rvalues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm probably too tired to answer this well, but...
The problem with passing it by universal reference is that it becomes unclear which signature of k is being called, should it have more than one function call operator. For instance:
struct MyKernel {
template<typename... Ts>
void operator()(Ts&&...) { /* ... */ }
template<typename... Ts>
void operator()(Ts&&...) const { /* ... */ }
};And how many kernels would really benefit from being movable? Are there issues with move-only types?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And how many kernels would really benefit from being movable?
In our implementation - everything using SYCL special classes (accessors/streams/etc.) would benefit, as those have std::shared_ptr inside managing SYCL RT internal details.
Are there issues with move-only types?
I believe there are. Move-only types can be trivially-copyable and those are device-copyable by definition. We won't be able to copy them at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The WG discussed this at the F2F. We will work on an example to test existing implementation behaviour to understand the case of member variables not used on the device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In particular, do we diagnose an error in this case:
- The member variable is not used inside the kernel, and
- The type of the member variable is not a legal kernel argument.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's rewrite this using a named kernel object, so the operators are more clear. Also, there's no such thing as event::set_not_ready or event::set_ready, so let's use real code.
#include <sycl/sycl.hpp>
struct MyKernel {
sycl::accessor acc;
MyKernel(syc::accessor<int> a) : acc(a) {}
void operator()() const { acc[0] = 42; }
};
int main() {
sycl::queue q;
sycl::buffer<int, 1> b{1};
sycl::event e = q.submit([&](sycl::handler& cgh) {
cgh.host_task([=](){ sleep(1); });
});
{
sycl::accessor acc{b, sycl::read_write};
sycl::khr::requirements reqs{acc, e};
sycl::khr::launch_task(q, reqs, MyKernel{acc});
}
// Somewhere internally GPU task above gets enqueued here.
// User kernel has to be alive at that time, meaning the kernel
// above had to be copied/moved internally.
q.wait();
assert(host_accessor{b}[0] == 42);
}
I think you are saying that the implementation of launch_task will (likely) need to copy / move the temporary MyKernel object in the code above, correct? This is because the SYCL runtime will (likely) need to implement the host task in software via a thread, and it can't enqueue the kernel to the device until after the host task completes. This implies that the SYCL runtime has either taken a copy or moved the kernel object because the host task may complete long after the call to launch_task has returned.
Do I have that right?
Completing the analysis, the MyKernel object contains an accessor member variable, which can be moved more efficiently than it can be copied. Therefore, there would be a performance advantage if the SYCL runtime could move the temporary MyKernel object rather than copying it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I have that right?
Yes!
the MyKernel object contains an accessor member variable, which can be moved more efficiently than it can be copied.
Yes too.
Therefore, there would be a performance advantage if the SYCL runtime could move the temporary MyKernel object rather than copying it.
Yes. In addition to that, user kernel maybe be non-copyable and only movable. In that case, not being able to move would mean we can't even compile the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move-only kernels would be a feature request which does not compile now. It also severely limits implementation techniques (possibly to the point of unimplementable), as you can only have one copy of a kernel across all devices.
For host_task specifically, maybe, as the member function already takes it by forwarding reference, although I'm not really a fan of having different rules for host vs. device.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on how one reads the spec. struct sycl::is_device_copyable<MyKernel> : std::true_type{}; means that we can "bit-copy" when submitting to the device, but that doesn't mean the same for host/rt copies, IMO. If that's not the case, then maybe the SYCL spec needs clarifications?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move-only kernel objects seem like a niche case, but I don't think we need to rely on that in order to justify the change that @aelovikov-intel proposes. Does the accessor example above provide enough justification?
@nliber you said this above:
The problem with passing it by universal reference is that it becomes unclear which signature of
kis being called, should it have more than one function call operator. For instance:struct MyKernel { template<typename... Ts> void operator()(Ts&&...) { /* ... */ } template<typename... Ts> void operator()(Ts&&...) const { /* ... */ } };
I think the example above is clear because section 4.12.1. "Defining kernels as named function objects" says:
The
operator()member function must be const-qualified ...
However, this bizarre case would still be unclear:
struct MyKernel {
void operator()() const { /* ... */ }
void operator()() const volatile { /* ... */ }
};I guess we could clarify this by changing section 4.12.1 to say:
The
operator()member function must be const-qualified and not volatile-qualified ...
Aside from this issue that @nliber raised, are there other disadvantages to @aelovikov-intel's proposal?
| } | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add an example with buffer/accessors somewhere?
This PR has been replaced by #922. All future work related to this PR (including the open discussions) should be continued in the new PR.
This extension provides an alternative mechanism for submitting commands to a device via free-functions that require developers to opt-in to the creation of event objects.
It also proposes alternative names for several commands (e.g.,
launch) and simplifies some concepts (e.g., by removing the need for thend_rangeclass).