ci(hotswap): test runtime hookup of relocated hsa-hotswap tool #6094

Open
lamb-j wants to merge 7 commits into main from users/lambj/hotswap-debug

lamb-j (Contributor) commented Jun 24, 2026

ISSUE ID: #6096

Motivation

Exploratory / debugging PR for the refactored hotswap layout. Failures are expected and acceptable — this PR is to debug the end-to-end runtime hookup of the relocated HSA tool, not to land as-is.

The hotswap HSA_TOOLS_LIB tool has moved out of comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) into rocm-systems projects/hotswap (libhsa-hotswap.so). comgr now provides only the amd_comgr_hotswap_rewrite API.

What this PR does

  • compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep COMGR_ENABLE_HOTSWAP_TRANSPILE (comgr keeps the rewrite API).
  • core/CMakeLists.txt: declare the hsa-hotswap subproject (rocm-systems projects/hotswap), depending on amd-comgr + ROCR-Runtime, and add it to the core-runtime artifact so libhsa-hotswap.so is packaged.
  • test_component.yml: hookup test — set HSA_TOOLS_LIB=<artifacts>/lib/libhsa-hotswap.so on the Linux hip-tests stage so ROCr exercises the tool-load path. With the allowlist restricted to gfx1250->gfx1250 (rocm-systems #7715), the tool is inert on gfx942 and must not change results.
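For illustration, here is a minimal Python sketch of what the hookup amounts to: point HSA_TOOLS_LIB at the packaged tool before launching the test process, and fail loudly if the tool is absent. `hotswap_env` is a hypothetical helper, not code from TheRock; the actual hookup is a step in test_component.yml, and the `<artifacts>/lib` layout is taken from the path above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def hotswap_env(artifacts_dir: Path, base_env: Optional[Mapping[str, str]] = None) -> dict:
    """Return a copy of base_env with HSA_TOOLS_LIB pointing at the packaged tool.

    Hypothetical helper: raises instead of silently running the tests without
    the tool-load path when libhsa-hotswap.so was not packaged.
    """
    tool = Path(artifacts_dir) / "lib" / "libhsa-hotswap.so"
    if not tool.is_file():
        raise FileNotFoundError(f"Hotswap tool not found at {tool}")
    # Copy so the caller's environment (or os.environ) is never mutated.
    env = dict(os.environ if base_env is None else base_env)
    # Overwrites any existing HSA_TOOLS_LIB value rather than appending to it.
    env["HSA_TOOLS_LIB"] = str(tool)
    return env
```

A caller would pass the result as `env=` to `subprocess.run` when launching the hip-tests binaries.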

What this tests (and what it does not)

  • ✅ The tool builds and packages, ROCr can dlopen it (deps resolve, HSA tool ABI present), and it stays inert on gfx942 (no regression).
  • ❌ Actual B0->A0 transpilation — no gfx1250 runner exists in CI, so the real transpile path is out of scope.

Temporary SMP bumps (to be removed)

These pin to test branches and will be dropped once the real content lands via the normal SMP process. The tables below track exactly what each temp branch carries on top of its base pin — we'll keep appending here as we add debug commits.

compiler/amd-llvm (ROCm/llvm-project): users/lambj/therock-hotswap-cherrypick-v4

Base = current TheRock/main pin 46fcb339 (already carries the hotswap cherry-pick stack through #2987, landed via #6007). Only delta on top:

| PR | Commit | Description |
|---|---|---|
| #3007 | aa451e1f | Remove COMGR hotswap HSA tool |

rocm-systems (ROCm/rocm-systems): users/lambj/hotswap-test-integration

Base SMP pin: a0952b2b · PRs / commits on top (integration branch tip f34d63914e):

| PR / source | Description |
|---|---|
| #7629 | Build/install libhsa-hotswap.so, link hsa-runtime64, re-key HSA_TOOLS_LIB to libhsa-hotswap.so, ISA derivation + tests, OnUnload use-after-free fix, opt-in HSA_HOTSWAP_VERBOSE logging |
| #7715 | Restrict HotSwap forwarding to gfx1250→gfx1250 |
| local (c3bde1d6) | Install the rocjitsu CLI (for the planned rocjitsu emulation test) |

Known risks

  • The hsa-hotswap subproject wiring is a first attempt; the amd_comgr / hsa-runtime64 CONFIG package resolution in the superbuild is the most likely failure point.
  • therock_test_validate_shared_lib will hard-fail the build if libhsa-hotswap.so isn't produced — intentional, to surface packaging misses.
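As a rough illustration of the second risk, the Python sketch below approximates a "validate shared lib" check: hard-fail if the library was not produced, and hard-fail if it cannot be dlopen'ed (which usually points at unresolved dependencies). It is an assumption-labeled stand-in with a hypothetical name, not the implementation of therock_test_validate_shared_lib, whose internals are not shown in this PR.

```python
import ctypes
from pathlib import Path


def validate_shared_lib(path: Path) -> None:
    """Hypothetical check: raise unless `path` exists and can be dlopen'ed.

    A missing file signals a packaging miss; a load failure usually means a
    dependency (e.g. hsa-runtime64 or amd_comgr) did not resolve.
    """
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"shared library not produced: {path}")
    try:
        # RTLD_LOCAL keeps the library's symbols out of the global namespace.
        ctypes.CDLL(str(path), mode=ctypes.RTLD_LOCAL)
    except OSError as e:
        raise RuntimeError(f"failed to load {path}: {e}") from e
```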

lamb-j added 2 commits June 24, 2026 08:26
Adapt TheRock to the refactored hotswap layout: the HSA_TOOLS_LIB tool moves
from comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007)
to rocm-systems projects/hotswap (libhsa-hotswap.so).

- compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep
  COMGR_ENABLE_HOTSWAP_TRANSPILE (comgr still provides the rewrite API).
- core/CMakeLists.txt: declare the hsa-hotswap subproject (rocm-systems
  projects/hotswap), depending on amd-comgr + ROCR-Runtime, and add it to the
  core-runtime artifact so libhsa-hotswap.so is packaged.

TEMPORARY (testing only, to be dropped once the real pins advance via the
normal SMP process):
- compiler/amd-llvm -> users/lambj/therock-hotswap-cherrypick-v4 (v3 + #3007)
- rocm-systems -> users/lambj/hotswap-test-integration (pin + #7629 + #7715)

Hookup test for the relocated HSA tool: set HSA_TOOLS_LIB to the packaged
libhsa-hotswap.so during the Linux hip-tests stage so ROCr exercises the
tool-load path. The allowlist is gfx1250->gfx1250 only (rocm-systems #7715),
so the tool stays inert on gfx942 and must not change results. Fails loudly if
the tool is missing, since THEROCK_ENABLE_HOTSWAP is on by default.
therock-pr-bot (bot) commented Jun 24, 2026

✅ All Policy Checks Passed

| Check | Status | Details |
|---|---|---|
| 🌿 Branch Name | ✅ Pass | |
| 📝 PR Title/Description | ✅ Pass | |
| Forbidden Files | ✅ Pass | |
| 🧪 Unit Test | ✅ Pass | |
| 🚫 Draft PR | 🔜 To Be Enabled | |
| 🚩 Feature Flag | 🔜 To Be Enabled | |
| 📊 Code Coverage | 🔜 To Be Enabled | |

🎉 All policy checks passed!

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

therock-pr-bot (bot) added the "Not ready to Review" label (PR has unresolved policy failures; reviews blocked) Jun 24, 2026
therock-pr-bot (bot) commented Jun 24, 2026

🎉 All checks passed! This PR is ready for review.

lamb-j changed the title from "[Testing Only] Hotswap runtime hookup: relocated hsa-hotswap tool + HSA_TOOLS_LIB hip-tests" to "ci(hotswap): test runtime hookup of relocated hsa-hotswap tool" Jun 24, 2026

Fixes configure error "Subproject hsa-hotswap requires AMDGPU targets but none
were selected." Matches amd-comgr and other host-only runtime components; the
target vars are ignored by this host-only tool.

therock-pr-bot (bot) removed the "Not ready to Review" label (PR has unresolved policy failures; reviews blocked) Jun 24, 2026
lamb-j added 4 commits June 24, 2026 13:50
The hsa-hotswap subproject was added to core-runtime's SUBPROJECT_DEPS, but the
artifact descriptor (artifact-core-runtime.toml) is scoped per subproject stage
directory, so libhsa-hotswap.so was never captured. The gfx94X hip-tests hookup
step then failed: "Hotswap tool not found at ./build/lib/libhsa-hotswap.so".

Add the core/hsa-hotswap/stage component entries (optional, since the subproject
only builds when THEROCK_ENABLE_HOTSWAP is on).

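The packaging miss described in this commit (library built, but its stage dir not captured by the artifact) can be caught with a small check. A minimal sketch, assuming the staged file lives at `lib/libhsa-hotswap.so` under the component's stage dir; `missing_from_stage` is hypothetical and does not read artifact-core-runtime.toml.

```python
from pathlib import Path
from typing import Iterable, List


def missing_from_stage(
    stage_dir: Path,
    enabled: bool,
    expected: Iterable[str] = ("lib/libhsa-hotswap.so",),
) -> List[str]:
    """Return expected files absent from an optional component's stage dir.

    Hypothetical helper. When the subproject is disabled, nothing is required
    (the component is optional); when it is enabled, every expected file must
    have been staged, or the artifact would silently drop it.
    """
    if not enabled:
        return []
    stage_dir = Path(stage_dir)
    return [rel for rel in expected if not (stage_dir / rel).is_file()]
```

A non-empty result would be the point to fail the build, rather than discovering the miss later in the test stage.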
Pulls in the two new #7629 commits (OnUnload use-after-free fix; opt-in
HSA_HOTSWAP_VERBOSE logging), plus the rocjitsu CLI install. Temporary test pin
-> users/lambj/hotswap-test-integration.

…ests

Replace the hip-tests-wide HSA_TOOLS_LIB hookup with a minimal sanity check.
rocminfo triggers hsa_init, which is when ROCr dlopens the HSA_TOOLS_LIB tools,
so running it is sufficient to confirm that libhsa-hotswap.so loads cleanly
without running the full hip-tests suite.

The test keys off whether libamd_comgr.so exports amd_comgr_hotswap_rewrite (a
reliable signal that THEROCK_ENABLE_HOTSWAP was on): if so, libhsa-hotswap.so
must be packaged and must load cleanly under rocminfo (allowlist keeps it inert
on non-gfx1250, so rocminfo still succeeds); otherwise the test skips.

- tests/test_rocm_sanity.py: add test_hotswap_tool_loads.
- .github/workflows/test_component.yml: drop the hip-tests HSA_TOOLS_LIB step.
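The decision logic described in this commit could look roughly like the Python sketch below. `exports_symbol` and `check_hotswap_tool` are hypothetical names, not the actual test_hotswap_tool_loads; the symbol probe uses ctypes (dlopen + dlsym), and `cmd` defaults to rocminfo but can be swapped for any command that calls hsa_init.

```python
import ctypes
import os
import subprocess
from pathlib import Path
from typing import Optional, Sequence


def exports_symbol(lib_path: Optional[str], symbol: str) -> bool:
    """Return True if lib_path can be dlopen'ed and exports `symbol`."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # Missing file or unresolved dependencies: treat as "not exported".
        return False
    # Attribute lookup calls dlsym; a missing symbol raises AttributeError.
    return hasattr(lib, symbol)


def check_hotswap_tool(
    comgr_lib: str,
    tool_lib: Path,
    cmd: Sequence[str] = ("rocminfo",),
    symbol: str = "amd_comgr_hotswap_rewrite",
) -> Optional[str]:
    """Return a skip reason, None on success, or raise AssertionError on failure."""
    # comgr exporting the rewrite API is the signal that hotswap was enabled.
    if not exports_symbol(comgr_lib, symbol):
        return f"{symbol} not exported by {comgr_lib}; hotswap disabled"
    tool_lib = Path(tool_lib)
    if not tool_lib.is_file():
        raise AssertionError(f"hotswap enabled but {tool_lib} was not packaged")
    # ROCr dlopens HSA_TOOLS_LIB during hsa_init, which rocminfo triggers.
    env = dict(os.environ, HSA_TOOLS_LIB=str(tool_lib))
    result = subprocess.run(list(cmd), env=env, capture_output=True, text=True)
    if result.returncode != 0:
        raise AssertionError(
            f"{cmd[0]} exited {result.returncode} with the tool loaded:\n{result.stderr}"
        )
    return None
```

In a pytest test, a non-None return would map to `pytest.skip(reason)`; because the allowlist keeps the tool inert off gfx1250, a non-zero rocminfo exit on gfx942 is treated as a real failure.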

Labels

None yet

Projects

Status: TODO


1 participant