-
Notifications
You must be signed in to change notification settings - Fork 73
Description
I sometimes find myself in a situation where I'd like to copy some strided data from the host to a SYCL buffer on the device, for subsequent use in a kernel. As it turns out however, the existing APIs for explicit memory operations only allow me to pass a contiguous host pointer as the source of a copy.
I had previously been playing with the idea of using a temporary SYCL buffer constructed with a pointer to my (strided) host memory, from which I could then create a host accessor to use as the src in an explicit copy operation. However, @AerialMantis pointed out to me that the copy is considered a kernel executed on the device (see section 4.8.6), and using host accessors inside kernel functions results in undefined behavior (as they are only allowed to be used on the host, see section 4.7.6.3).
Now, maybe one way of doing this could be to use a device accessor for my src as well, and hope that the SYCL runtime will recognize that instead of doing a H -> D -> D copy, this could be optimized to a strided H -> D copy. However, in doing so I'm pretty much at the mercy of the implementors, and have no guarantee about how much memory will actually be used for this operation (other than an upper bound). More likely than not, in any of the current implementations, the entire temporary host buffer would first be copied to the device (correct me if I'm wrong!).
The other option, which I'm using now, is to first do a host-side copy of the strided data into a contiguous staging buffer, and using that as the src for the copy operation. That is of course not ideal, and if host memory gets tight, might also not be feasible (especially if the implementation uses another pinned staging buffer internally...).
Ultimately I think, given that both OpenCL and CUDA provide APIs for doing strided H -> D copies, SYCL could also benefit from having something like this.
As I recently ran into this issue again, it got me thinking: Why not simply provide the ability to create a SYCL accessor for arbitrary user pointers? Like so:
float* my_ptr = malloc(128 * 128 * sizeof(float));
// ...
cl::sycl::accessor<
float, 2, cl::sycl::access::mode::read_write /* mode is probably not needed */,
cl::sycl::access::target::user_pointer>
my_accessor(my_ptr,
cl::sycl::range<2>(128, 128) /* range of data pointed to */,
cl::sycl::range<2>(64, 128) /* optional sub-range to access */,
cl::sycl::id<2>(32, 0) /* optional offset to access */);With this API, my_accessor could then be used as the src in an explicit copy, implying that the copy should be strided. As an added bonus, such an accessor would allow users to index into their self-managed data just like they can for SYCL buffers, without having to worry about the data's layout in memory.