Skip to content

[cublas] introduce onemkl_cublas_host_task #7

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

sbalint98
Copy link

This is one of five PRs intended to bring support for hipSYCL to oneMKL.

This PR abstracts the interop_task by defining a onemkl_cublas_host_task. This function takes the command group handler, the queue, and the lambda function parameterised by cublasScopedContextHandler. This abstraction allows for using our own cublasScopedContextHandler class in case we compile for hipSYCL and for being able to use the hipSYCL_enqueue_custom_operationextension. If this PR is approved I would like to upstream this to the oneMKL repository.

[cublas] move dpc++ internal headers into cublas_task.hpp
Copy link

@fodinabor fodinabor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments so far.. main issue: the interop_handler in scopedcontexthandler .. how do you plan to replace that in hipSYCL

}

template <typename H, typename F>
static inline void onemkl_cublas_host_task(H &cgh, cl::sycl::queue queue, F f) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so.. you will be ifdefing the host_task_internal depending on the implementation?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This what I had in mind, yes.

namespace cublas {

template <typename H, typename F>
static inline auto host_task_internal(H &cgh, cl::sycl::queue queue, F f) -> decltype(cgh.interop_task(f)) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the decltype here looks fishy.. first of all, you don't return anything, did you mean to write return cgh.interop_task([f,queue]..)?
secondly, if making decltype(cgh.interop_task(f)) shouldn't work at all, as interop_task would not be able to instantiate f, if it doesn't take an interop_handler as parameter.. right?

so either replace the auto and decltype.. with void (as you are discarding the result in onemkl_cublas_host_task anyways or you have to use something like decltype(cgh.interop_task([](cl::sycl::interop_handler ih)) (you could also change the parameter type to auto, I think..)

Copy link
Author

@sbalint98 sbalint98 May 21, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your totally right, this part of the code is quite confusing. I followed the pattern that has been used for the CPU functions see: https://github.com/oneapi-src/oneMKL/blob/e8e3dabf9fbda0556b8075c76b657336f88440f0/src/blas/backends/mklcpu/mklcpu_common.hpp#L42-L56

Probably would be nicer to do something like:

template <typname H, typename F, deceltype(cgh.interop_task(f))
.... void host_task_internal...

I intend to add the functionality for hipSYCL like this:
sbalint98#2

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it even return? I think none of the handler member functions that are there to be called inside queue::submit have a non-void return type in SYCL. I imagine this is also the case for interop_task - so why can the return type not simply be void?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's basically sfinae on whether the function exists or not...
For CPU they already use something similar in oneMKL, but there the interface of f fits the requirements of the task function, which is not the case here(taking the scoped thing instead of a handle(r), but you can't wrap it in a lambda either, as it's in an unevaluated context. )
It probably works as log the checked functions don't use sfinae to check the signature of the parameter..

@@ -68,12 +68,13 @@ class CublasScopedContextHandler {
CUcontext original_;
cl::sycl::context placedContext_;
bool needToRecover_;
cl::sycl::interop_handler& ih;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will this work, if there's no interop_handler in hipsycl?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was to ifdef the headers in cublas_task.hpp, such that this header is not even included in case we are compiling with hipSYCL.

see: sbalint98#2

@sbalint98
Copy link
Author

sbalint98 commented May 21, 2021

The main idea is to define multiple cublasScopedContextHandler class in different headers, and source files, and then use ifdef to decide which one will get included.

EDIT:
As suggested by @fodinabor I opened a subPR to show how adding hipSYCL will look like: sbalint98#2

@sbalint98 sbalint98 force-pushed the prepare_hipSYCL_host_task branch from c96127d to 12edf97 Compare May 21, 2021 19:35
#include "cublas_scope_handle.hpp"
#include "cublas_task.hpp"

#include "cublas_task.hpp"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Double include?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, thanks!

@@ -42,12 +42,12 @@ inline void gemm_batch(Func func, cl::sycl::queue &queue, transpose transa, tran
auto a_acc = a.template get_access<cl::sycl::access::mode::read>(cgh);
auto b_acc = b.template get_access<cl::sycl::access::mode::read>(cgh);
auto c_acc = c.template get_access<cl::sycl::access::mode::read_write>(cgh);
cgh.interop_task([=](cl::sycl::interop_handler ih) {
auto sc = CublasScopedContextHandler(queue);
onemkl_cublas_host_task(cgh, queue,[=](CublasScopedContextHandler sc) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does sc have to be passed in by copy or would non-const reference make more sense? In general I imagine that this object would contain state, so passing in by reference might be more convenient and/or more performant.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, indeed sc can be passed as a reference.

namespace cublas {

template <typename H, typename F>
static inline auto host_task_internal(H &cgh, cl::sycl::queue queue, F f) -> decltype(cgh.interop_task(f)) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does it even return? I think none of the handler member functions that are there to be called inside queue::submit have a non-void return type in SYCL. I imagine this is also the case for interop_task - so why can the return type not simply be void?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants