@ZuseZ4 ZuseZ4 commented Jan 5, 2026

This intrinsic helps support AMD and NVIDIA libraries such as rocBLAS and cuBLAS.
These libraries provide functions that must be called from the host but take a mixture of host and device pointers.
The offload_args intrinsic maps our host allocations to device allocations and transfers memory as required.
It reuses the infrastructure we already have for the main offload intrinsic.
Unlike the main offload intrinsic, it already works fully with std. I also got it to work with a single cargo invocation:
RUSTFLAGS="-L native=/opt/rocm-6.4.0/lib -l dylib=rocblas -l dylib=amdhip64 -l dylib=omp -l dylib=omptarget -Zoffload=Args -Zunstable-options" cargo +offload run -r
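
To make the mapping described above concrete, here is a minimal host-only model of the idea. It is a sketch, not the intrinsic itself: `DeviceMap`, `vendor_scale`, and `scale_via_device` are hypothetical names, and "device memory" is just a `Vec` so the example runs without a GPU. It shows the three steps: mirror a host allocation, call a host-side routine on the mirrored buffer, and copy the result back.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for device memory: each "device allocation" is a Vec.
/// A real offload runtime would allocate on the GPU instead.
struct DeviceMap {
    /// Host address -> index of the mirrored "device" buffer.
    mirrors: HashMap<usize, usize>,
    buffers: Vec<Vec<f32>>,
}

impl DeviceMap {
    fn new() -> Self {
        DeviceMap { mirrors: HashMap::new(), buffers: Vec::new() }
    }

    /// Mirror a host slice on the "device" and copy its contents over.
    fn map_to_device(&mut self, host: &[f32]) -> usize {
        let idx = self.buffers.len();
        self.buffers.push(host.to_vec());
        self.mirrors.insert(host.as_ptr() as usize, idx);
        idx
    }

    /// Copy the mirrored buffer back into the host slice it was created from.
    fn copy_back(&self, host: &mut [f32]) {
        let idx = self.mirrors[&(host.as_ptr() as usize)];
        host.copy_from_slice(&self.buffers[idx]);
    }
}

/// Hypothetical vendor routine: called from the host, but it only touches
/// the "device" buffer it is handed.
fn vendor_scale(dev: &mut [f32], alpha: f32) {
    for x in dev.iter_mut() {
        *x *= alpha;
    }
}

/// Map the host data, run the vendor routine on the mirror, copy back.
fn scale_via_device(host: &mut [f32], alpha: f32) {
    let mut map = DeviceMap::new();
    let idx = map.map_to_device(host);
    vendor_scale(&mut map.buffers[idx], alpha);
    map.copy_back(host);
}

fn main() {
    let mut data = vec![1.0_f32, 2.0, 3.0];
    scale_via_device(&mut data, 2.0);
    println!("{:?}", data);
}
```

In this model the caller only ever holds host data; the mapping and both transfers happen around the library call, which is the role the description above assigns to the intrinsic.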

To be cleaned up.

TODO: handle mangled function names. Done.

I updated compiler/rustc_monomorphize/src/collector/autodiff.rs so that it now works without no_mangle; without that change the function won't be codegen'ed. It also works without lto=fat if we only have main.rs.
If we define and use code in lib.rs and call it from main.rs, it currently trips the verifier. I guess that should be easy to fix:

thread 'rustc' (494962) panicked at compiler/rustc_monomorphize/src/collector.rs:468:13:
assertion failed: tcx.should_codegen_locally(instance)
stack backtrace:

cc @kevinsala @Sa4dUs

@rustbot rustbot added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 5, 2026
@ZuseZ4 ZuseZ4 added the F-gpu_offload `#![feature(gpu_offload)]` label Jan 5, 2026
…ts, but calls some host code with device ptrs
@ZuseZ4 ZuseZ4 force-pushed the offload-host-intrinsic branch from 020f669 to 555131e Compare January 5, 2026 15:21
@ZuseZ4 ZuseZ4 mentioned this pull request Jan 6, 2026
5 tasks
@rust-log-analyzer

The job aarch64-gnu-llvm-20-2 failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
fmt: checked 6627 files
tidy check
tidy [rustdoc_json (src)]: `rustdoc-json-types` modified, checking format version
tidy: Skipping binary file check, read-only filesystem
tidy [style (tests)]: /checkout/tests/codegen-llvm/gpu_offload/offload_args.rs:39: line longer than 100 chars
tidy [style (tests)]: /checkout/tests/codegen-llvm/gpu_offload/offload_args.rs:40: line longer than 100 chars
tidy [style (tests)]: /checkout/tests/codegen-llvm/gpu_offload/offload_args.rs:41: line longer than 100 chars
tidy [style (compiler)]: /checkout/compiler/rustc_codegen_llvm/src/builder/gpu_offload.rs:475: TODO is used for tasks that should be done before merging a PR; If you want to leave a message in the codebase use FIXME
tidy [style (compiler)]: FAIL
tidy [style (tests)]: FAIL
tidy: The following checks failed: style (compiler), style (tests)
Bootstrap failed while executing `--stage 2 test --skip tests --skip coverage-map --skip coverage-run --skip library --skip tidyselftest`
Command `/checkout/obj/build/aarch64-unknown-linux-gnu/stage1-tools-bin/rust-tidy /checkout /checkout/obj/build/aarch64-unknown-linux-gnu/stage0/bin/cargo /checkout/obj/build 4 yarn` failed with exit code 1
Created at: src/bootstrap/src/core/build_steps/tool.rs:1612:23
Executed at: src/bootstrap/src/core/build_steps/test.rs:1357:29

Command has failed. Rerun with -v to see more details.
Build completed unsuccessfully in 0:00:53
  local time: Wed Jan  7 19:44:59 UTC 2026
  network time: Wed, 07 Jan 2026 19:44:59 GMT
##[error]Process completed with exit code 1.
##[group]Run echo "disk usage:"
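
The two style failures in the log above (lines over 100 characters, and a `TODO` left in compiler code) can be reproduced locally with a small checker. The sketch below is a hypothetical re-creation of those two rules only; `style_issues` and `MAX_COLS` are made-up names and this is not tidy's actual implementation.

```rust
/// Column limit taken from the "line longer than 100 chars" message in the log.
const MAX_COLS: usize = 100;

/// Return (line number, message) pairs for the two rules seen in the log.
/// Hypothetical helper; tidy itself applies many more checks and exceptions.
fn style_issues(source: &str) -> Vec<(usize, &'static str)> {
    let mut issues = Vec::new();
    for (i, line) in source.lines().enumerate() {
        let line_no = i + 1; // report 1-based line numbers
        if line.chars().count() > MAX_COLS {
            issues.push((line_no, "line longer than 100 chars"));
        }
        if line.contains("TODO") {
            issues.push((line_no, "TODO found; use FIXME for notes that stay in the codebase"));
        }
    }
    issues
}

fn main() {
    // Line 1 has a TODO, line 3 is 123 characters long.
    let src = format!("// TODO: clean up\nlet x = 1;\n// {}\n", "a".repeat(120));
    for (line, msg) in style_issues(&src) {
        println!("line {line}: {msg}");
    }
}
```

For this PR the corresponding fixes would be to wrap lines 39 to 41 of the test file and to turn the `TODO` at gpu_offload.rs:475 into a `FIXME` or resolve it before merging.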
