
Use Triton portable intrinsics in device_utils #511

Merged
mawad-amd merged 3 commits into main from muhaawad/portable-device-utils on Apr 16, 2026

@mawad-amd

Collaborator

Summary

  • Replace hardcoded CDNA inline assembly in device_utils.py with Triton's architecture-aware APIs (tl.extra.hip.memrealtime, tl.extra.hip.smid, triton.language.target_info constexpr checks) so iris tracing works on all supported GPU families
  • Pass TRACING constexpr through to the gluon all-gather kernel so record_event_start/end events are emitted when tracing is enabled
  • Remove unused get_se_id()

Details

device_utils.py previously hardcoded CDNA-only assembly (s_memrealtime, HW_REG_XCC_ID, HW_REG_HW_ID), which meant iris tracing was broken on any non-CDNA target. Triton already provides portable equivalents:

| Function | Before | After |
|---|---|---|
| read_realtime() | s_memrealtime inline asm | tl.extra.hip.memrealtime(), which emits the correct instruction per arch |
| get_cu_id() | HW_REG_HW_ID bits [11:8] | tl.extra.hip.smid(), which reads CU_ID (CDNA) or WGP_ID (RDNA) |
| get_xcc_id() | HW_REG_XCC_ID always | Constexpr arch check: inline asm on multi-XCC parts, 0 elsewhere |
| get_se_id() | HW_REG_HW_ID bits [15:13] | Removed (unused) |
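
For readers unfamiliar with the bit ranges in the "Before" column, here is a minimal host-side sketch of how a field such as bits [11:8] or [15:13] is decoded from a register value. The helper name `extract_bits` and the sample value are hypothetical and only illustrate the arithmetic. This is not the code that was in device_utils.py.

```python
def extract_bits(value: int, hi: int, lo: int) -> int:
    """Return the unsigned field occupying bits [hi:lo] (inclusive) of value."""
    width = hi - lo + 1
    mask = (1 << width) - 1
    return (value >> lo) & mask


# Hypothetical register snapshot: SE field = 5, CU field = 6, low bits are noise.
sample_hw_id = (5 << 13) | (6 << 8) | 0x3F

cu_id = extract_bits(sample_hw_id, 11, 8)   # field the old get_cu_id() read
se_id = extract_bits(sample_hw_id, 15, 13)  # field the removed get_se_id() read
```

Because the layout of these fields differs between GPU families, hardcoding the shifts ties the code to one architecture, which is the motivation for delegating to the Triton intrinsics.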

The gluon all-gather kernel now accepts a TRACING: gl.constexpr = False parameter and passes it to IrisDeviceCtx.initialize(). When tracing is disabled (the default), all tracing code is eliminated as dead code (DCE'd) at compile time, so it adds zero overhead.
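
The effect of a compile-time flag can be illustrated without a GPU. The sketch below is a plain-Python analogy, not Triton's implementation: it folds `if TRACING:` branches out of a function's AST before compiling it, so the specialized function contains no tracing statements at all. The function `all_gather_step` and the helper names are hypothetical.

```python
import ast

KERNEL_SRC = """
def all_gather_step(events, value):
    if TRACING:
        events.append("start")
    result = value * 2
    if TRACING:
        events.append("end")
    return result
"""


class FoldFlag(ast.NodeTransformer):
    """Replace `if <flag>:` with the branch selected by a known constant."""

    def __init__(self, name, value):
        self.name = name
        self.value = value

    def visit_If(self, node):
        self.generic_visit(node)
        if isinstance(node.test, ast.Name) and node.test.id == self.name:
            # Returning a list splices the statements in; an empty list
            # removes the whole `if` statement.
            return node.body if self.value else node.orelse
        return node


def specialize(src, flag, value):
    """Compile src with `flag` fixed to `value`; return (function, folded source)."""
    tree = FoldFlag(flag, value).visit(ast.parse(src))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<specialized>", "exec"), namespace)
    return namespace["all_gather_step"], ast.unparse(tree)


step_off, src_off = specialize(KERNEL_SRC, "TRACING", False)
step_on, src_on = specialize(KERNEL_SRC, "TRACING", True)
```

Printing `src_off` shows a function body with no `events.append` calls, which is the same property the test plan's DCE check looks for in the compiled kernel.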

Test plan

  • Run existing all-gather tests on CDNA hardware (MI300X/MI325X) to confirm no regression
  • Enable tracing (shmem.tracing.enable()) and verify events are recorded
  • Verify constexpr DCE: get_xcc_id() on single-die targets should compile to constant 0
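
The intended behavior of get_xcc_id() can be modeled on the host as a simple dispatch. This is a sketch under assumptions: the architecture strings gfx942 and gfx950 stand in for the multi-XCC CDNA3/4 parts, gfx1100 stands in for a single-die target, and `xcc_id_for` and the register-reader callback are hypothetical names. The real function makes this decision with constexpr target checks inside the kernel.

```python
from typing import Callable

# Assumption: these arch names represent the multi-XCC (CDNA3/4) parts.
MULTI_XCC_ARCHS = frozenset({"gfx942", "gfx950"})


def xcc_id_for(arch: str, read_xcc_register: Callable[[], int]) -> int:
    """Read the XCC ID only on multi-XCC parts; return 0 everywhere else."""
    if arch in MULTI_XCC_ARCHS:
        return read_xcc_register()
    # Single-die targets never touch the register.
    return 0


register_reads = []


def fake_register() -> int:
    """Stand-in for the hardware read; records that it was called."""
    register_reads.append(1)
    return 5


multi_xcc_result = xcc_id_for("gfx942", fake_register)
single_die_result = xcc_id_for("gfx1100", fake_register)
```

On the single-die path the reader is never invoked, which mirrors the expectation that the compiled kernel contains only the constant 0.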

🤖 Generated with Claude Code

Replace hardcoded CDNA inline assembly with Triton's architecture-aware
APIs so iris tracing works across all supported GPU families:

- read_realtime(): delegate to tl.extra.hip.memrealtime() which emits
  the correct timestamp instruction per architecture
- get_cu_id(): delegate to tl.extra.hip.smid() which reads the right
  hardware register (CU_ID on CDNA, WGP_ID on RDNA)
- get_xcc_id(): use constexpr arch detection to read HW_REG_XCC_ID on
  multi-XCC parts, return 0 elsewhere
- Remove unused get_se_id()
- Pass TRACING constexpr through to the gluon all-gather kernel so
  record_event_start/end are emitted when tracing is enabled

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mawad-amd mawad-amd requested review from BKP and neoblizz as code owners April 15, 2026 23:22
Copilot AI review requested due to automatic review settings April 15, 2026 23:22
@github-actions github-actions Bot added the in-progress (We are working on it) and iris (Iris project issue) labels Apr 15, 2026

Copilot AI left a comment

Contributor


Pull request overview

Note

Copilot was unable to run its full agentic suite in this review.

This PR replaces AMD CDNA-specific inline assembly in Iris device utilities with Triton’s HIP portable intrinsics and threads a TRACING constexpr through the Gluon all-gather kernel so tracing events are compiled in only when enabled.

Changes:

  • Replace s_memrealtime and HW register reads with tl.extra.hip.memrealtime() / tl.extra.hip.smid() and target-info constexpr checks.
  • Make get_xcc_id() return 0 on single-die targets while retaining multi-XCC support on CDNA3/4.
  • Add a TRACING: gl.constexpr parameter to the Gluon all-gather kernel and pass host-side tracing enablement through launch.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

| File | Description |
|---|---|
| iris/device_utils.py | Swaps hardcoded CDNA asm for Triton HIP intrinsics and adds target-aware get_xcc_id() behavior. |
| iris/ccl/all_gather.py | Adds TRACING constexpr to kernel and wires host-side tracing enablement into the launch. |

Comment thread iris/device_utils.py
Comment thread iris/device_utils.py
Comment thread iris/ccl/all_gather.py Outdated
mawad-amd and others added 2 commits April 15, 2026 16:48
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mawad-amd mawad-amd merged commit c6ebe9c into main Apr 16, 2026
33 checks passed
@mawad-amd mawad-amd deleted the muhaawad/portable-device-utils branch April 16, 2026 06:10
