-
Notifications
You must be signed in to change notification settings - Fork 797
[SYCL] Implement free function kernel enqueue functions #20698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: sycl
Are you sure you want to change the base?
Conversation
cperkinsintel
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't realize we were changing this. Nice!e
| // Free function kernel enqueue functions | ||
| template <auto *Func, typename... ArgsT> | ||
| void single_task(queue Q, kernel_function_s<Func> KernelFunc, ArgsT &&...Args) { | ||
| (void)KernelFunc; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| (void)KernelFunc; |
same below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the catch, the functions that take queue actually use the parameter, it's the handler ones that don't.
| submit(Q, [&](handler &CGH) { | ||
| single_task(CGH, KernelFunc, std::forward<ArgsT>(Args)...); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a submit_direct* version of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a
submit_direct*version of this? Please sync with @slawekptak to implement it properly from the start rather than create more future work for him.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, there is no submit_direct* version of this in the spec.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can have it in detail:: still. Also, queue::* itself can act as submit_direct.
| queue Q = CGH.getQueue(); | ||
| sycl::kernel_bundle Bndl = | ||
| get_kernel_bundle<Func, sycl::bundle_state::executable>(Q.get_context()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This creates and destroys two std::shared_ptrs for almost no reason. IMO, we should fix this getQueue() hack while we're in an ABI breaking window. Maybe by changing handler_impl to store a reference to the sycl::queue it was created with? handler_impl::MQueueOrGraph isn't used directly outside a few getters, so the change should be very simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate a bit please?
What shared pointers are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
llvm/sycl/include/sycl/queue.hpp
Lines 3744 to 3745 in bb941bd
| std::shared_ptr<detail::queue_impl> impl; | |
| queue(std::shared_ptr<detail::queue_impl> impl) : impl(impl) {} |
and similar for
sycl::context.
| template <auto *> struct kernel_function_s; | ||
| template <auto *Func, typename... Args> | ||
| void single_task(handler &, kernel_function_s<Func>, Args &&...); | ||
| template <auto *Func, int Dimensions, typename... Args> | ||
| void nd_launch(handler &, nd_range<Dimensions>, kernel_function_s<Func>, | ||
| Args &&...); | ||
| template <auto *Func, int Dimensions, typename Properties, typename... Args> | ||
| void nd_launch(handler &, launch_config<nd_range<Dimensions>, Properties>, | ||
| kernel_function_s<Func>, Args &&...); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is all of that just for handler::getQueue()? Can you extend
llvm/sycl/include/sycl/handler.hpp
Lines 3226 to 3280 in 259c433
| namespace detail { | |
| class HandlerAccess { | |
| public: | |
| static void internalProfilingTagImpl(handler &Handler) { | |
| Handler.internalProfilingTagImpl(); | |
| } | |
| template <typename RangeT, typename PropertiesT> | |
| static void parallelForImpl(handler &Handler, RangeT Range, PropertiesT Props, | |
| kernel Kernel) { | |
| Handler.parallel_for_impl(Range, Props, Kernel); | |
| } | |
| static void swap(handler &LHS, handler &RHS) { | |
| std::swap(LHS.implOwner, RHS.implOwner); | |
| std::swap(LHS.impl, RHS.impl); | |
| std::swap(LHS.MLocalAccStorage, RHS.MLocalAccStorage); | |
| std::swap(LHS.MStreamStorage, RHS.MStreamStorage); | |
| std::swap(LHS.MKernelName, RHS.MKernelName); | |
| std::swap(LHS.MKernel, RHS.MKernel); | |
| std::swap(LHS.MSrcPtr, RHS.MSrcPtr); | |
| std::swap(LHS.MDstPtr, RHS.MDstPtr); | |
| std::swap(LHS.MLength, RHS.MLength); | |
| std::swap(LHS.MPattern, RHS.MPattern); | |
| std::swap(LHS.MHostKernel, RHS.MHostKernel); | |
| std::swap(LHS.MCodeLoc, RHS.MCodeLoc); | |
| } | |
| // pre/postProcess are used only for reductions right now, but the | |
| // abstractions they provide aren't reduction-specific. The main problem they | |
| // solve is | |
| // | |
| // # User code | |
| // q.submit([&](handler &cgh) { | |
| // set_dependencies(cgh); | |
| // enqueue_whatever(cgh); | |
| // }); // single submission | |
| // | |
| // that needs to be implemented as multiple enqueues involving | |
| // pre-/post-processing internally. SYCL prohibits recursive submits from | |
| // inside control group function object (lambda above) so we need some | |
| // internal interface to implement that. | |
| __SYCL_EXPORT static void preProcess(handler &CGH, type_erased_cgfo_ty F); | |
| __SYCL_EXPORT static void postProcess(handler &CGH, type_erased_cgfo_ty F); | |
| template <class FunctorTy> | |
| static void preProcess(handler &CGH, FunctorTy &Func) { | |
| preProcess(CGH, type_erased_cgfo_ty{Func}); | |
| } | |
| template <class FunctorTy> | |
| static void postProcess(handler &CGH, FunctorTy &Func) { | |
| postProcess(CGH, type_erased_cgfo_ty{Func}); | |
| } | |
| }; | |
| } // namespace detail |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that has been added to access getQueue. Now that you've brought HandlerAccess to my attention, it seems like a better solution so I'll try to migrate it over there instead.
Implement the new enqueue functions for free function kernels that were added in #19995