-
Notifications
You must be signed in to change notification settings - Fork 1
[cublas] introduce onemkl_cublas_host_task #7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
[cublas] move dpc++ internal headers into cublas_task.hpp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some comments so far.. main issue: the interop_handler
in scopedcontexthandler
.. how do you plan to replace that in hipSYCL
} | ||
|
||
template <typename H, typename F> | ||
static inline void onemkl_cublas_host_task(H &cgh, cl::sycl::queue queue, F f) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so.. you will be ifdef
ing the host_task_internal
depending on the implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This what I had in mind, yes.
namespace cublas { | ||
|
||
template <typename H, typename F> | ||
static inline auto host_task_internal(H &cgh, cl::sycl::queue queue, F f) -> decltype(cgh.interop_task(f)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the decltype here looks fishy.. first of all, you don't return anything, did you mean to write return cgh.interop_task([f,queue]..)
?
secondly, if making decltype(cgh.interop_task(f))
shouldn't work at all, as interop_task
would not be able to instantiate f
, if it doesn't take an interop_handler
as parameter.. right?
so either replace the auto and decltype.. with void (as you are discarding the result in onemkl_cublas_host_task
anyways or you have to use something like decltype(cgh.interop_task([](cl::sycl::interop_handler ih))
(you could also change the parameter type to auto, I think..)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your totally right, this part of the code is quite confusing. I followed the pattern that has been used for the CPU functions see: https://github.com/oneapi-src/oneMKL/blob/e8e3dabf9fbda0556b8075c76b657336f88440f0/src/blas/backends/mklcpu/mklcpu_common.hpp#L42-L56
Probably would be nicer to do something like:
template <typname H, typename F, deceltype(cgh.interop_task(f))
.... void host_task_internal...
I intend to add the functionality for hipSYCL like this:
sbalint98#2
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it even return? I think none of the handler
member functions that are there to be called inside queue::submit
have a non-void return type in SYCL. I imagine this is also the case for interop_task
- so why can the return type not simply be void
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's basically sfinae on whether the function exists or not...
For CPU they already use something similar in oneMKL, but there the interface of f fits the requirements of the task function, which is not the case here(taking the scoped thing instead of a handle(r), but you can't wrap it in a lambda either, as it's in an unevaluated context. )
It probably works as log the checked functions don't use sfinae to check the signature of the parameter..
@@ -68,12 +68,13 @@ class CublasScopedContextHandler { | |||
CUcontext original_; | |||
cl::sycl::context placedContext_; | |||
bool needToRecover_; | |||
cl::sycl::interop_handler& ih; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how will this work, if there's no interop_handler
in hipsycl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was to ifdef the headers in cublas_task.hpp, such that this header is not even included in case we are compiling with hipSYCL.
see: sbalint98#2
The main idea is to define multiple EDIT: |
c96127d
to
12edf97
Compare
#include "cublas_scope_handle.hpp" | ||
#include "cublas_task.hpp" | ||
|
||
#include "cublas_task.hpp" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Double include?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, thanks!
@@ -42,12 +42,12 @@ inline void gemm_batch(Func func, cl::sycl::queue &queue, transpose transa, tran | |||
auto a_acc = a.template get_access<cl::sycl::access::mode::read>(cgh); | |||
auto b_acc = b.template get_access<cl::sycl::access::mode::read>(cgh); | |||
auto c_acc = c.template get_access<cl::sycl::access::mode::read_write>(cgh); | |||
cgh.interop_task([=](cl::sycl::interop_handler ih) { | |||
auto sc = CublasScopedContextHandler(queue); | |||
onemkl_cublas_host_task(cgh, queue,[=](CublasScopedContextHandler sc) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does sc
have to be passed in by copy or would non-const reference make more sense? In general I imagine that this object would contain state, so passing in by reference might be more convenient and/or more performant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, indeed sc
can be passed as a reference.
namespace cublas { | ||
|
||
template <typename H, typename F> | ||
static inline auto host_task_internal(H &cgh, cl::sycl::queue queue, F f) -> decltype(cgh.interop_task(f)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does it even return? I think none of the handler
member functions that are there to be called inside queue::submit
have a non-void return type in SYCL. I imagine this is also the case for interop_task
- so why can the return type not simply be void
?
It was neceassry for SFINAE but we use ifdefs instead
This is one of five PRs intended to bring support for hipSYCL to oneMKL.
This PR abstracts the
interop_task
by defining aonemkl_cublas_host_task
. This function takes the command group handler, the queue, and the lambda function parameterised bycublasScopedContextHandler
. This abstraction allows for using our owncublasScopedContextHandler
class in case we compile for hipSYCL and for being able to use thehipSYCL_enqueue_custom_operation
extension. If this PR is approved I would like to upstream this to the oneMKL repository.