[LIBCLC][BINDLESS][CUDA] always inline redirection functs #18699

Open: wants to merge 1 commit into base: sycl

@JackAKirk (Contributor) commented May 28, 2025

These functions do, at most, some casting and have effectively zero register overhead at the default optimization level, so there should be no usage scenario in which always inlining them has a downside.

This brings the nvptx libclc image backend in line with the AMD one, which requires no such changes. The AMD libclc backend already does the same thing through consistent use of the _CLC_DECL macro for all functions. Although it is not immediately obvious to the libclc programmer, the _CLC_DECL macro expands to __attribute__((always_inline)).
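
As a rough illustration of the pattern described above (not code from this PR), a redirection function that only casts its argument and forwards to another function might look like the sketch below. The names `query_impl` and `query_redirect` are hypothetical, and `static inline` is added only so that the sketch compiles cleanly as plain host C with GCC or Clang.

```c
/* Hypothetical backing function standing in for a real builtin. */
static int query_impl(long handle) {
  return (int)(handle & 0xff);
}

/* Hypothetical redirection function: it only casts and forwards,
   so forcing it inline removes the call without adding real work. */
__attribute__((always_inline)) static inline int query_redirect(unsigned long handle) {
  return query_impl((long)handle);
}
```

With the attribute in place, a call such as `query_redirect(h)` should compile down to the body of `query_impl` plus the cast, rather than an extra call.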

There are also a few cases with low register usage to which I've added the inline hint, probably being overly cautious.

These functions have effectively zero register overhead,
therefore no usage circumstance that brings downside to
always inlining.

Signed-off-by: JackAKirk <[email protected]>
@JackAKirk (Contributor, Author) commented:

Marking as ready for review: will fix format later.

@JackAKirk JackAKirk marked this pull request as ready for review May 29, 2025 11:06
@JackAKirk JackAKirk requested a review from a team as a code owner May 29, 2025 11:06
@JackAKirk JackAKirk requested a review from npmiller May 29, 2025 11:06
@@ -149,7 +149,7 @@ void __nvvm_sust_3d_v4i32_clamp(write_only image3d_t, int, int, int, int, int,

 int __nvvm_suq_width(long) __asm("llvm.nvvm.suq.width");
 int __nvvm_suq_height(long) __asm("llvm.nvvm.suq.height");
-int __nvvm_suq_depth(long arg) {
+__attribute__((always_inline)) int __nvvm_suq_depth(long arg) {
Contributor

It might be prudent to define some kind of IMAGE_DEF_ATTRS macro, like CLC_DEF, that is set to __attribute__((always_inline))? That way, more attributes could be added less noisily in the future if needed.
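
A minimal sketch of what this suggestion could look like, assuming the macro name IMAGE_DEF_ATTRS from the comment above. The function `image_depth_query` is hypothetical, and the sketch is plain host C (with `static inline` added for that purpose) rather than the actual libclc source.

```c
/* Hypothetical macro from the review suggestion: one place to hold the
   attributes applied to every image redirection function. */
#define IMAGE_DEF_ATTRS __attribute__((always_inline))

/* Hypothetical redirection function using the macro instead of
   spelling the attribute out at each definition. */
IMAGE_DEF_ATTRS static inline int image_depth_query(long handle) {
  return (int)handle;
}
```

Under this scheme, adding another attribute later would mean editing only the macro definition, not every function signature.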
