Description
Is there a significant performance difference between the CUDA math function "powf()" and "sycl::pow()" in SYCL on an NVIDIA GPU (e.g., a V100)? The "fast math" option is enabled when building both the CUDA and the SYCL programs. Thanks for your investigation.
https://github.com/zjin-lcf/oneAPI-DirectProgramming/tree/master/minkowski-sycl
https://github.com/zjin-lcf/oneAPI-DirectProgramming/tree/master/minkowski-cuda