Skip to content

[hotswap] Fixes to use projects/hotswap libhsa_hotswap.so and resolve header paths#7629

Merged
harsh-amd merged 10 commits into
ROCm:developfrom
nirmie:users/nirmie/hotswap-hsa-fixes
Jun 26, 2026
Merged

[hotswap] Fixes to use projects/hotswap libhsa_hotswap.so and resolve header paths#7629
harsh-amd merged 10 commits into
ROCm:developfrom
nirmie:users/nirmie/hotswap-hsa-fixes

Conversation

@nirmie

@nirmie nirmie commented Jun 22, 2026

Copy link
Copy Markdown
Contributor

Motivation

This PR contains two components. One component is to update to use the tooling bundled in projects/hotswap instead of the tool that exists at comgr (deprecated). The other component is to resolve amd_comgr header issues during the build. Additionally, there are some changes to the CMakeLists.txt to have the hotswap HSA tool against hsa-runtime64::hsa-runtime64.

Technical Details

Builds on top of Harsh's PRs here: #7577 (deprecated, #7577 code is directly in this)
Relevant PRs: amd-llvm: ROCm/llvm-project#3007

JIRA ID

N/A

Test Plan

Contained build for the tool (verifies if linking resolves)

git sparse-checkout add projects/rocr-runtime/runtime/hsa-runtime/inc 2>/dev/null || true

P=~/hotswap/npi-build/_gapfix/comgr-prefix
DIST=~/hotswap/npi-build/therock/build/dist/rocm

cmake -S projects/hotswap -B /tmp/build-hotswap -GNinja -DCMAKE_BUILD_TYPE=Release \
  -DCMAKE_PREFIX_PATH="$DIST" \
  -Damd_comgr_DIR="$P/lib/cmake/amd_comgr" \
  -Dhsa-runtime64_DIR="$DIST/lib/cmake/hsa-runtime64"

cmake --build /tmp/build-hotswap --target hsa-hotswap

Tool name testing must be done with full build

Test Result

Ran:
ls -l /tmp/build-hotswap/libhsa-hotswap.so
nm -D /tmp/build-hotswap/libhsa-hotswap.so | grep hotswap_rewrite

nm -D /tmp/build-hotswap/libhsa-hotswap.so | grep hotswap_rewrite
-rwxrwxr-x 1 nisenthi nisenthi 50480 Jun 22 18:45 /tmp/build-hotswap/libhsa-hotswap.so
                 U amd_comgr_hotswap_rewrite@amd_comgr_3.2

Submission Checklist

@nirmie nirmie marked this pull request as ready for review June 22, 2026 19:05
@nirmie nirmie changed the title [hotswap] Fixes to build on top of #5977 to update to new tooling name/resolve header paths [hotswap] Fixes to build on top of #7577 to update to new tooling name/resolve header paths Jun 22, 2026
nirmie added 3 commits June 22, 2026 23:22
The ROCr v1 tool-load carve-out (runtime.cpp) and the CLR hotswap
enable-gate (clr/rocclr/device/hotswap.hpp) matched the old comgr-hosted
tool name "libamd_comgr_hotswap_tool.so". After the HSA tool moved to
rocm-systems' libhsa-hotswap.so, the name no longer matches: on a
rocprofiler-register (v3) box allow_v1_registration stays false, so ROCr
never invokes the tool's OnLoad and no rewrite is applied (silent). Re-key
both constants to "libhsa-hotswap.so".
Installed comgr ships its public header at include/amd_comgr/amd_comgr.h
and its CMake package exposes the include root (-Iinclude), so the flat
#include <amd_comgr.h> does not resolve when building against an installed
comgr (the path TheRock builds the tool against). Use
"amd_comgr/amd_comgr.h" to match every other comgr consumer in
rocm-systems (clr comgrctx.hpp/devkernel.hpp/devprogram.hpp, rocdbgapi,
rocprofiler-sdk).
Fold in the install rule and HSA runtime linkage previously
provided by ROCm#7577 so this branch no longer depends on it:
add GNUInstallDirs, find_package(hsa-runtime64), link
hsa-runtime64::hsa-runtime64, and install(TARGETS hsa-hotswap).
@nirmie nirmie force-pushed the users/nirmie/hotswap-hsa-fixes branch from b27da72 to 8fa66e4 Compare June 22, 2026 23:24
@nirmie nirmie changed the title [hotswap] Fixes to build on top of #7577 to update to new tooling name/resolve header paths [hotswap] Fixes to use projects/hotswap libhsa_hotswap.so and resolve header paths Jun 22, 2026
@nirmie nirmie force-pushed the users/nirmie/hotswap-hsa-fixes branch from b011b8f to f4571ca Compare June 23, 2026 06:26
Comment thread projects/hotswap/hotswap.hpp Outdated
Comment thread projects/hotswap/hotswap.hpp
@nirmie nirmie force-pushed the users/nirmie/hotswap-hsa-fixes branch 2 times, most recently from 1406d54 to e6b00f2 Compare June 23, 2026 16:13
The hand-rolled metadata-note parser (find_isa_in_metadata) over-read the
msgpack-encoded ISA string, spilling past the gfx target into the next note
key, so COMGR rejected the malformed source ISA (rc=2) and every rewrite fell
back to the original code object.

Read the source ISA from the code object via COMGR's
amd_comgr_get_data_isa_name instead. It uses LLVM's canonical parser, so it is
correct and tracks triple normalization automatically, and it removes the
fragile hand-rolled ELF/note parsing. The target ISA stays the running GPU's
ISA (from the HSA agent), so amd_comgr_hotswap_rewrite still receives source
(code object) and target (GPU) as designed.

Also drops the now-dead note-parsing helpers and updates the unit test.
Comment thread projects/hotswap/tests/hotswap_test.cpp
Comment thread projects/hotswap/CMakeLists.txt
Comment thread projects/hotswap/hotswap.cpp
Move code-object ISA discovery into GetCodeObjectIsaName (still via
amd_comgr_get_data_isa_name) and have the HSA tool derive source_isa from the
code object and target_isa from the agent, passing both into
RetargetCodeObject. This keeps RetargetCodeObject a pure rewrite primitive
whose source/target can be overridden (e.g. cross-gen).
Comment thread projects/hotswap/tests/fixtures/gfx1250_min.hsaco

@harsh-amd harsh-amd left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm!

Add a minimal gfx1250 fixture (tests/fixtures/gfx1250_min.hsaco), embedded
into the test binary at build time, plus tests exercising GetCodeObjectIsaName
and RetargetCodeObject end-to-end. GPU-free and file-free, so it runs on any
CI runner.
@nirmie nirmie force-pushed the users/nirmie/hotswap-hsa-fixes branch from f524b17 to a13f50e Compare June 23, 2026 19:38
nirmie added 3 commits June 25, 2026 01:53
OnUnload is driven by libamdhip64's atexit handler (hsa_shut_down ->
UnloadTools), which runs AFTER this library's C++ static destructors because
the tool is dlopen'd by hsa_init (its __cxa_atexit dtors are LIFO-earlier).
With ordinary file-scope statics, the reader-map / rewritten-ELF containers and
their mutexes were destroyed first, then OnUnload's clear()/lock touched freed
memory -> SIGSEGV during process exit (confirmed on gfx1250 A0: ~unique_ptr
calling a tcache-corrupted deleter).

Make these four objects never-destroyed heap singletons so their lifetime spans
past static destruction. OnUnload still clear()s the containers and frees every
malloc'd ELF buffer; only the small container shells persist to exit. Not a
payload leak.
Add a runtime-gated HOTSWAP_LOG(...) macro (driven by HSA_HOTSWAP_VERBOSE,
off by default) that traces each intercepted reader-create, load_agent_code_object
(with has_bytes), the gate decision, and each rewrite (src/tgt ISA, rc, in/out
size, changed). No rebuild needed to toggle; when off the cost is one cached bool
load per site. Used to confirm on real gfx1250 A0 that torch/hipBLASLt kernels
are intercepted and rewritten.
…etons)

The OnUnload use-after-free is better avoided by NOT touching heap-owning state
at teardown than by making that state immortal. OnUnload now only restores the
HSA API-table function pointers (POD writes); it no longer locks the mutexes or
clear()s g_reader_map / g_rewritten_elfs. Those revert to ordinary file-scope
statics, freed once by their normal static destructors at process exit.

Mirrors the original comgr hotswap tool (llvm-project ROCm#2936), whose OnUnload
likewise only restored the table and never crashed at teardown. Safe because
ROCr dereferences the retained ELF bytes during the executable's life
(debugger/profiler queries), not during teardown.

Supersedes the immortal-singleton approach from 56ed59c.
@harsh-amd harsh-amd merged commit b70e3cf into ROCm:develop Jun 26, 2026
60 of 64 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants