ci(hotswap): test runtime hookup of relocated hsa-hotswap tool #6094
Open
lamb-j wants to merge 7 commits into
Adapt TheRock to the refactored hotswap layout: the HSA_TOOLS_LIB tool moves from comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) to rocm-systems projects/hotswap (libhsa-hotswap.so).

- compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep COMGR_ENABLE_HOTSWAP_TRANSPILE (comgr still provides the rewrite API).
- core/CMakeLists.txt: declare the hsa-hotswap subproject (rocm-systems projects/hotswap), depending on amd-comgr + ROCR-Runtime, and add it to the core-runtime artifact so libhsa-hotswap.so is packaged.

TEMPORARY (testing only, to be dropped once the real pins advance via the normal SMP process):
- compiler/amd-llvm -> users/lambj/therock-hotswap-cherrypick-v4 (v3 + #3007)
- rocm-systems -> users/lambj/hotswap-test-integration (pin + #7629 + #7715)
Hookup test for the relocated HSA tool: set HSA_TOOLS_LIB to the packaged libhsa-hotswap.so during the Linux hip-tests stage so ROCr exercises the tool-load path. The allowlist is gfx1250->gfx1250 only (rocm-systems #7715), so the tool stays inert on gfx942 and must not change results. Fails loudly if the tool is missing, since THEROCK_ENABLE_HOTSWAP is on by default.
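A minimal Python sketch of the "fail loudly" hookup, assuming the artifacts are unpacked so the tool sits at `<artifacts>/lib/libhsa-hotswap.so`. `tools_lib_env` is a hypothetical helper, not part of TheRock or its workflows: it refuses to return an environment when the tool is missing, so a packaging miss cannot silently turn into a run without the tool.

```python
import os
from pathlib import Path


def tools_lib_env(artifacts_dir, base_env=None):
    """Return an environment with HSA_TOOLS_LIB pointing at the packaged tool.

    Hypothetical helper: raises instead of silently running without the
    tool, because THEROCK_ENABLE_HOTSWAP is on by default and a missing
    library most likely means the artifact did not capture it.
    """
    tool = Path(artifacts_dir) / "lib" / "libhsa-hotswap.so"
    if not tool.is_file():
        raise FileNotFoundError(f"Hotswap tool not found at {tool}")
    env = dict(os.environ if base_env is None else base_env)
    # ROCr dlopen()s the libraries listed in HSA_TOOLS_LIB during hsa_init.
    env["HSA_TOOLS_LIB"] = str(tool)
    return env
```

The returned dict can be passed as `env=` to `subprocess.run` for the test command, which keeps the tool scoped to that one stage instead of the whole job.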
✅ All Policy Checks Passed
🎉 All checks passed! This PR is ready for review.
Fixes configure error "Subproject hsa-hotswap requires AMDGPU targets but none were selected." Matches amd-comgr and other host-only runtime components; the target vars are ignored by this host-only tool.
The hsa-hotswap subproject was added to core-runtime's SUBPROJECT_DEPS, but the artifact descriptor (artifact-core-runtime.toml) is scoped per subproject stage directory, so libhsa-hotswap.so was never captured. The gfx94X hip-tests hookup step then failed: "Hotswap tool not found at ./build/lib/libhsa-hotswap.so". Add the core/hsa-hotswap/stage component entries (optional, since the subproject only builds when THEROCK_ENABLE_HOTSWAP is on).
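To illustrate why listing the subproject in SUBPROJECT_DEPS was not enough, here is a simplified, hypothetical Python model of per-stage-directory capture. It is not TheRock's artifact tooling and does not follow the artifact-core-runtime.toml schema; the `core/ROCR-Runtime/stage` path is illustrative. Only stage directories the descriptor names contribute files, and a named but absent directory is skipped, mirroring the "optional" entries.

```python
from pathlib import Path


def captured_files(build_root, stage_dirs):
    """Collect the files an artifact would capture from the listed stage dirs.

    Simplified model: only files under stage directories that the
    descriptor names are captured, so a subproject whose stage dir is not
    listed contributes nothing, even if the artifact depends on it.
    """
    root = Path(build_root)
    files = set()
    for stage in stage_dirs:
        stage_path = root / stage
        if not stage_path.is_dir():
            # Optional component: the subproject may not have been built
            # (e.g. THEROCK_ENABLE_HOTSWAP off), so skip it rather than fail.
            continue
        for path in stage_path.rglob("*"):
            if path.is_file():
                files.add(path.relative_to(stage_path).as_posix())
    return files
```

In this model, adding `core/hsa-hotswap/stage` to the list is what makes `lib/libhsa-hotswap.so` appear, while builds without hotswap still succeed because the missing directory is ignored.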
Pulls in the two new #7629 commits (OnUnload use-after-free fix; opt-in HSA_HOTSWAP_VERBOSE logging), plus the rocjitsu CLI install. Temporary test pin -> users/lambj/hotswap-test-integration.
…ests

Replace the hip-tests-wide HSA_TOOLS_LIB hookup with a minimal sanity check. rocminfo triggers hsa_init, which is when ROCr dlopens HSA_TOOLS_LIB tools, so it is sufficient to confirm that libhsa-hotswap.so loads cleanly without running the full hip-tests suite.

The test keys off whether libamd_comgr.so exports amd_comgr_hotswap_rewrite (a reliable signal that THEROCK_ENABLE_HOTSWAP was on). If so, libhsa-hotswap.so must be packaged and must load cleanly under rocminfo (the allowlist keeps it inert on non-gfx1250, so rocminfo still succeeds); otherwise the test skips.

- tests/test_rocm_sanity.py: add test_hotswap_tool_loads.
- .github/workflows/test_component.yml: drop the hip-tests HSA_TOOLS_LIB step.
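The decision logic described above can be sketched in Python as follows. This is a hypothetical outline, not the actual `test_hotswap_tool_loads`: the helper names are invented, and the real test presumably maps "skip" to a pytest skip. The symbol check uses `ctypes` with `RTLD_NOW`, so an unresolvable dependency surfaces as `OSError` at load time. As an assumption, the same helper could also confirm that the tool exports `OnLoad`, the entry point ROCr calls in `HSA_TOOLS_LIB` libraries.

```python
import ctypes
import os
import subprocess


def missing_symbols(lib_path, symbols):
    """dlopen lib_path eagerly and return the names from `symbols` it lacks.

    RTLD_NOW makes the loader resolve undefined symbols up front, so a
    broken dependency raises OSError here instead of failing later.
    """
    lib = ctypes.CDLL(str(lib_path), mode=os.RTLD_NOW | os.RTLD_LOCAL)
    # ctypes looks names up with dlsym(); an absent symbol raises
    # AttributeError, which hasattr() turns into False.
    return [name for name in symbols if not hasattr(lib, name)]


def hotswap_action(comgr_has_rewrite, tool_exists):
    """Decide what the sanity test should do (hypothetical helper)."""
    if not comgr_has_rewrite:
        return "skip"  # THEROCK_ENABLE_HOTSWAP was off: nothing to check
    if not tool_exists:
        return "fail"  # hotswap enabled but libhsa-hotswap.so not packaged
    return "run"       # enabled and packaged: load it under rocminfo


def rocminfo_with_tool(tool_path):
    """Run rocminfo with the tool set; hsa_init is where ROCr loads it."""
    env = dict(os.environ, HSA_TOOLS_LIB=str(tool_path))
    result = subprocess.run(["rocminfo"], env=env, capture_output=True, text=True)
    return result.returncode, result.stderr
```

For example, `comgr_has_rewrite` would be `not missing_symbols(comgr_path, ["amd_comgr_hotswap_rewrite"])`, and on the "run" path a nonzero return code from `rocminfo_with_tool` fails the test with the captured stderr.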
ISSUE ID: #6096
Motivation
Exploratory / debugging PR for the refactored hotswap layout. Failures are expected and acceptable — this PR is to debug the end-to-end runtime hookup of the relocated HSA tool, not to land as-is.
The hotswap HSA_TOOLS_LIB tool has moved out of comgr (libamd_comgr_hotswap_tool.so, removed by ROCm/llvm-project#3007) into rocm-systems projects/hotswap (libhsa-hotswap.so). comgr now provides only the amd_comgr_hotswap_rewrite API.
What this PR does
- compiler/CMakeLists.txt: drop the removed HOTSWAP_BUILD_TOOL args; keep COMGR_ENABLE_HOTSWAP_TRANSPILE (comgr keeps the rewrite API).
- core/CMakeLists.txt: declare the hsa-hotswap subproject (rocm-systems projects/hotswap), depending on amd-comgr + ROCR-Runtime, and add it to the core-runtime artifact so libhsa-hotswap.so is packaged.
- .github/workflows/test_component.yml: set HSA_TOOLS_LIB=<artifacts>/lib/libhsa-hotswap.so on the Linux hip-tests stage so ROCr exercises the tool-load path. With the allowlist restricted to gfx1250->gfx1250 (rocm-systems #7715), the tool is inert on gfx942 and must not change results.
What this tests (and what it does not)
dlopen it (deps resolve, HSA tool ABI present), and it stays inert on gfx942 (no regression).
Temporary SMP bumps (to be removed)
These pin to test branches and will be dropped once the real content lands via the normal SMP process. The tables below track exactly what each temp branch carries on top of its base pin — we'll keep appending here as we add debug commits.
- compiler/amd-llvm → ROCm/llvm-project, branch users/lambj/therock-hotswap-cherrypick-v4. Base = current TheRock/main pin 46fcb339 (already carries the hotswap cherry-pick stack through #2987, landed via #6007). Only delta on top: aa451e1f.
- rocm-systems → ROCm/rocm-systems, branch users/lambj/hotswap-test-integration. Base SMP pin: a0952b2b · PRs / commits on top (integration branch tip f34d63914e): HSA_HOTSWAP_VERBOSE logging c3bde1d6)
Known risks
- The hsa-hotswap subproject wiring is a first attempt; the amd_comgr / hsa-runtime64 CONFIG package resolution in the superbuild is the most likely failure point.
- therock_test_validate_shared_lib will hard-fail the build if libhsa-hotswap.so isn't produced. This is intentional, to surface packaging misses.