[AMDGPU][COMGR] Add hotswap entry trampolines #3008

Draft
harsh-amd wants to merge 14 commits into ROCm:amd-staging from harsh-amd:comgr-hotswap-entry-trampolines

harsh-amd commented Jun 22, 2026

Summary

  • Add opt-in gfx125x kernel-entry trampoline rewriting through AMD_COMGR_HOTSWAP_ENTRY_TRAMPOLINES.
  • Keep the trampoline pass separate from B0-to-A0 instruction patch dispatch.
  • Update COMGR ELF handling and unit/lit coverage for appended entry stubs, descriptor fixups, and prefetch-size clearing.
  • Update COMGR MC API usage for the standalone hotswap build against newer LLVM MC APIs.
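
As a rough illustration of the opt-in gate in the first bullet, the check could look something like the sketch below. Only the variable name `AMD_COMGR_HOTSWAP_ENTRY_TRAMPOLINES` comes from this PR. The helper names and the rule that any non-empty value other than `0` enables the pass are assumptions made for the example, not the PR's actual parsing logic.

```cpp
#include <cstdlib>
#include <string_view>

// Hypothetical helper: decides whether an environment value counts as "on".
// ASSUMPTION: unset, empty, and "0" mean disabled; anything else enables.
bool isEnabledValue(const char *Value) {
  if (!Value)
    return false;
  std::string_view V(Value);
  return !V.empty() && V != "0";
}

// Hypothetical wrapper around the variable named in the PR summary.
bool entryTrampolinesRequested() {
  return isEnabledValue(std::getenv("AMD_COMGR_HOTSWAP_ENTRY_TRAMPOLINES"));
}
```

Splitting the value check from the `getenv` call keeps the policy testable without touching the process environment.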

Stack

  1. [AMDGPU][COMGR] Remove COMGR hotswap HSA tool #3007: COMGR hotswap tool removal.
  2. This PR: trampoline implementation.
  3. [AMDGPU][COMGR] Add hotswap text displacement infrastructure #3000: displacement infrastructure.

GitHub cannot use a fork-only branch as the ROCm PR base. The layer-only diff is harsh-amd:comgr-hotswap-tool-removal...harsh-amd:comgr-hotswap-entry-trampolines.

Testing

  • git diff --check comgr-hotswap-tool-removal..comgr-hotswap-entry-trampolines
  • make -j$(nproc) HotswapElfTests HotswapMCTests hotswap-rewrite in build-comgr-displacement-shared
  • build-comgr-displacement-shared/test-unit/HotswapElfTests
  • build-comgr-displacement-shared/test-unit/HotswapMCTests
  • make -j$(nproc) test-lit in build-comgr-displacement-shared (70 passed / 11 unsupported)

harsh-amd force-pushed the comgr-hotswap-entry-trampolines branch 7 times, most recently from 3a769bf to b070a45, on June 25, 2026 18:26

Port the non-blit loader behavior from ROCm/rocm-systems#7581 into COMGR's opt-in hotswap entry-trampoline path.

Keep COMPUTE_PGM_RSRC3.INST_PREF_SIZE intact and append an s_code_end guard sized from max(INST_PREF_SIZE * 128 - 256, 0) so prefetch from appended stubs stays within readable .text bytes.
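
The guard-size expression above can be written out as a small helper. This is a minimal sketch that assumes the result is a byte count; the function and constant names are hypothetical, and only the formula `max(INST_PREF_SIZE * 128 - 256, 0)` is taken from the commit message.

```cpp
#include <algorithm>
#include <cstdint>

// Constants copied from the formula in the commit message; the names are
// illustrative only.
constexpr std::int64_t PrefetchUnitBytes = 128;
constexpr std::int64_t GuardSlackBytes = 256;

// Hypothetical helper: size of the s_code_end guard for a given
// COMPUTE_PGM_RSRC3.INST_PREF_SIZE value, clamped so it never goes negative.
constexpr std::int64_t codeEndGuardBytes(std::uint32_t InstPrefSize) {
  return std::max<std::int64_t>(
      static_cast<std::int64_t>(InstPrefSize) * PrefetchUnitBytes -
          GuardSlackBytes,
      0);
}

// Worked values: sizes of 0 to 2 need no guard; larger sizes grow linearly.
static_assert(codeEndGuardBytes(0) == 0, "clamped at zero");
static_assert(codeEndGuardBytes(2) == 0, "2 * 128 - 256 == 0");
static_assert(codeEndGuardBytes(3) == 128, "3 * 128 - 256 == 128");
```

The signed 64-bit intermediate avoids unsigned wraparound when `INST_PREF_SIZE * 128` is smaller than 256.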

Enable entry trampolines for the gfx12.5/gfx125 family, including gfx12-5-generic, while keeping B0/A0 instruction patching limited to gfx1250.
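
The split between the two target sets might be expressed along these lines. This is a sketch under stated assumptions: the predicate names are hypothetical, and treating every processor name that starts with `gfx125` as part of the gfx125 family is a guess at what "gfx12.5/gfx125 family" means here, not the PR's COMGR ISA parsing.

```cpp
#include <string_view>

// Hypothetical predicate: targets eligible for entry-trampoline rewriting.
// ASSUMPTION: the family is any name beginning with "gfx125", plus the
// generic target named in the commit message.
bool supportsEntryTrampolines(std::string_view Processor) {
  return Processor == "gfx12-5-generic" ||
         Processor.substr(0, 6) == "gfx125";
}

// Hypothetical predicate: B0/A0 instruction patching stays limited to gfx1250.
bool supportsInstructionPatching(std::string_view Processor) {
  return Processor == "gfx1250";
}
```

Keeping the two predicates separate mirrors the PR's stated goal of not coupling the trampoline pass to instruction patch dispatch.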

harsh-amd force-pushed the comgr-hotswap-entry-trampolines branch from b070a45 to 1819c8c on June 25, 2026 18:46

Use COMGR ISA parsing for hotswap target identifiers and centralize gfx12-5-generic metadata.

Resolve hotswap stub opcodes through MC assembly parsing instead of TableGen mnemonic names, and add fallback diagnostics for rejected byte replacements.

Labels

hotswap Related to the Comgr Hotswap feature
