[SYCL][NVPTX][AMDGCN] Move devicelib cmath to header #18706

npmiller · 2025-05-28T15:45:10Z

This patch experiments with moving standard library math built-ins from libdevice into headers.

This is based on the way clang handles this for CUDA and HIP. In these languages you can define device functions as overloads. This allows redefining standard library functions specifically for the device in a header, so that we can provide device-specific implementations of certain built-ins while still using the regular standard library headers.

By default SYCL doesn't support overloading on device functions, so this patch introduces a new `sycl_device_only` attribute. The attribute makes a function device-only and allows it to overload existing functions.
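A minimal sketch of what a device-only overload could look like, assuming a compiler with this patch applied. The macro `EXAMPLE_DEVICE_ONLY` and the function `example::half_sin` are hypothetical names used only for illustration, and `__builtin_sinf` merely stands in for whatever target-specific built-in a real header would call. On the host pass the second definition is compiled out, so the snippet also builds with an unpatched compiler.

```cpp
#include <cmath>

// Hypothetical macro: expands to the new attribute only on the SYCL device pass.
#if defined(__SYCL_DEVICE_ONLY__)
#define EXAMPLE_DEVICE_ONLY __attribute__((sycl_device_only))
#endif

namespace example {

// Ordinary definition, visible to both the host and the device pass.
inline float half_sin(float x) { return std::sin(x * 0.5f); }

#if defined(__SYCL_DEVICE_ONLY__)
// Same signature as above. Without the attribute this would be a
// redefinition error; with it (assumption: patched compiler) the device
// pass is expected to accept it as an overload of the function above.
EXAMPLE_DEVICE_ONLY inline float half_sin(float x) {
  return __builtin_sinf(x * 0.5f); // stand-in for a target-specific built-in
}
#endif

} // namespace example
```

Host code calling `example::half_sin` keeps using the first definition; only the device pass ever sees the second one.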


npmiller · 2025-05-28T16:40:44Z

@bader this is a proof of concept for moving C++ library handling from libdevice code into headers. It allows us to remove the hack blocking LLVM intrinsic generation for standard math built-ins, since we intercept them earlier in the header for device side, which is in-line with what clang cuda does. For now this only covers cmath, and only the Nvidia and AMD targets.

I've currently placed the header into the stl_wrappers directory, it might be better as a clang header, but at least on CUDA the clang header is always included whereas with the stl wrappers it will only be included when the matching standard library header is included.

This still needs a ton of work which is why it's a draft, but let me know if you have any feedback on the approach.

It would be good to know if this would be interesting for non-AOT targets as well, there's a lot of logic in the driver to conditionally link libdevice libraries, I suspect in theory most of that could be replaced with this header approach, but I haven't looked into this much so I'm not 100% sure if this is something we'd want.

bader

@npmiller, thanks for working on this.

It allows us to remove the hack blocking LLVM intrinsic generation for standard math built-ins, since we intercept them earlier in the header for device side, which is in-line with what clang cuda does.

I discussed this approach with Johannes Doerfert a few years ago. He told me that he doesn't like "what clang cuda does" and plans to change it. I think clang still uses the header solution, but it may be worth double-checking whether the LLVM community is doing any work in that direction.

I've currently placed the header into the stl_wrappers directory, it might be better as a clang header, but at least on CUDA the clang header is always included whereas with the stl wrappers it will only be included when the matching standard library header is included.

Interesting... I thought that clang only adds path to the clang headers at the beginning of the search paths list to make sure that clang wrapper header is included before STL one. I didn't know that CUDA compiler always includes clang wrapper headers.

It would be good to know if this would be interesting for non-AOT targets as well, there's a lot of logic in the driver to conditionally link libdevice libraries, I suspect in theory most of that could be replaced with this header approach, but I haven't looked into this much so I'm not 100% sure if this is something we'd want.

@AlexeySachkov, could you look into the SPIR-V part, please?

The change looks to be aligned with the community approach. The only concern I have is compile time, but any increase should be negligible.

cc @Naghasan just to keep in the loop.

bader · 2025-05-28T17:11:16Z

clang/lib/AST/Decl.cpp

+  if (Context.getLangOpts().isSYCL() && hasAttr<SYCLDeviceOnlyAttr>() &&
+      !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc)) {


If I get it right, we allow printf and malloc for all SYCL targets. I'm open to discussing whether requiring printf support is reasonable, but I'm not sure malloc can be supported by all SYCL targets.

I think this is a copy/paste from the CUDA code just above, malloc shouldn't be there
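The effect of dropping `malloc` from the condition can be illustrated with a toy model. This is not clang code: `Builtin`, `BuiltinModel` and `effectiveBuiltin` are hypothetical names, and the sketch assumes the branch in the hunk reports "not a builtin" in the same way the CUDA branch above it does.

```cpp
// Toy model of the builtin-ID filter; every name here is hypothetical.
enum class Builtin { NotBuiltin = 0, Printf, Malloc, Sin };

struct BuiltinModel {
  Builtin id;          // builtin the declaration's name would map to
  bool syclDeviceOnly; // whether the declaration carries sycl_device_only
};

// Returns the builtin ID the front end would report for the declaration.
inline Builtin effectiveBuiltin(const BuiltinModel &f, bool isSYCL) {
  // A device-only function is treated as an ordinary function rather than a
  // builtin, unless it is printf. malloc is deliberately not exempted.
  if (isSYCL && f.syclDeviceOnly && f.id != Builtin::Printf)
    return Builtin::NotBuiltin;
  return f.id;
}
```

With this shape, a device-only `sin` overload is no longer lowered as the standard builtin, which is what lets a header-provided definition take over.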

sycl/include/sycl/stl_wrappers/cmath-fallback.h

Naghasan

don't forget to add tests and documentation for the attribute before undrafting :)

Naghasan · 2025-05-28T19:32:29Z

clang/lib/AST/Decl.cpp

+  if (Context.getLangOpts().isSYCL() && hasAttr<SYCLDeviceOnlyAttr>() &&
+      !(BuiltinID == Builtin::BIprintf || BuiltinID == Builtin::BImalloc)) {


I think this is a copy/paste from the CUDA code just above, malloc shouldn't be there

Naghasan · 2025-05-28T19:33:58Z

clang/lib/Sema/SemaOverload.cpp

@@ -1629,6 +1629,14 @@ static bool IsOverloadOrOverrideImpl(Sema &SemaRef, FunctionDecl *New,
    }
  }

+  // Allow overloads with SYCLDeviceOnlyAttr
+  if (SemaRef.getLangOpts().isSYCL()) {
+    if (hasExplicitAttr<SYCLDeviceOnlyAttr>(Old) !=


we shouldn't limit to explicit attr

Updated all the hasExplicitAttr to hasAttr
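The difference between the two checks can be shown with a small stand-alone model. This is a sketch, not the clang implementation: `DeclModel` and `isOverload` are hypothetical, and it assumes the truncated comparison in the hunk compares `Old` against `New` and that "explicit" means the attribute was not added implicitly by the compiler.

```cpp
#include <string>

// Toy model of a function declaration; all names are hypothetical.
struct DeclModel {
  std::string signature; // e.g. "float sinf(float)"
  bool deviceOnly;       // declaration carries sycl_device_only
  bool implicitAttr;     // attribute was added implicitly, not written out
};

inline bool hasAttr(const DeclModel &d) { return d.deviceOnly; }

inline bool hasExplicitAttr(const DeclModel &d) {
  return d.deviceOnly && !d.implicitAttr;
}

// Two declarations overload each other if their signatures differ, or if
// exactly one of them is device-only (checked with hasAttr, as suggested).
inline bool isOverload(const DeclModel &oldFn, const DeclModel &newFn) {
  if (oldFn.signature != newFn.signature)
    return true;
  return hasAttr(oldFn) != hasAttr(newFn);
}
```

In this model a declaration with an implicitly added attribute still counts as device-only for overloading purposes, whereas a check based on `hasExplicitAttr` would treat it like a plain function.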

Naghasan · 2025-05-28T19:37:46Z

clang/include/clang/Basic/Attr.td

+def SYCLDeviceOnly : InheritableAttr {
+  let Spellings = [GNU<"sycl_device_only">];
+  let Subjects = SubjectList<[Function]>;
+  let LangOpts = [SYCLIsDevice];


Suggested change

-  let LangOpts = [SYCLIsDevice];
+  let LangOpts = [SYCLIsHost, SYCLIsDevice];

otherwise this would create a warning during host compilation and the filtering is dead code

Added `SilentlyIgnoreSYCLIsHost` instead. `sycl_device` already had that, and it sounds like it should suppress the warning you mention.


npmiller · 2025-05-29T09:43:18Z

Interesting... I thought that clang only adds path to the clang headers at the beginning of the search paths list to make sure that clang wrapper header is included before STL one. I didn't know that CUDA compiler always includes clang wrapper headers.

Yeah, in the driver it does:

  CC1Args.push_back("-include");
  CC1Args.push_back("__clang_cuda_runtime_wrapper.h");

And `__clang_cuda_cmath.h` is included from that runtime wrapper header, which also includes `<cmath>`.

Using our stl wrappers solution should allow us to be a little more conservative about when we include all of this stuff.



bader reviewed May 28, 2025


Naghasan reviewed May 28, 2025


[SYCL] Fixup attribute handling

9ccd5e8

We don't support malloc in SYCL, silence warnings for host compilation with `sycl_device_only`. Fix failing clang test with new attribute.

npmiller added 3 commits May 29, 2025 10:54

[SYCL] Use hasAttr instead of hasExplicitAttr

ebe3c35

[SYCL] Update fallback header

dca8911

[SYCL] Remove sycl-libdevice-cmath.cpp test

8e11b15

This test was relying on the hack preventing LLVM intrinsics from being emitted so it doesn't work at all with the new approach.

npmiller force-pushed the rip-libdevice branch from 009c38b to 8e11b15 Compare May 29, 2025 15:31


[SYCL] Add missing abs

bf6aea1
