Use Triton portable intrinsics in device_utils#511
Merged
Conversation
Replace hardcoded CDNA inline assembly with Triton's architecture-aware APIs so iris tracing works across all supported GPU families: - read_realtime(): delegate to tl.extra.hip.memrealtime() which emits the correct timestamp instruction per architecture - get_cu_id(): delegate to tl.extra.hip.smid() which reads the right hardware register (CU_ID on CDNA, WGP_ID on RDNA) - get_xcc_id(): use constexpr arch detection to read HW_REG_XCC_ID on multi-XCC parts, return 0 elsewhere - Remove unused get_se_id() - Pass TRACING constexpr through to the gluon all-gather kernel so record_event_start/end are emitted when tracing is enabled Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Contributor
There was a problem hiding this comment.
Pull request overview
Note
Copilot was unable to run its full agentic suite in this review.
This PR replaces AMD CDNA-specific inline assembly in Iris device utilities with Triton’s HIP portable intrinsics and threads a TRACING constexpr through the Gluon all-gather kernel so tracing events are compiled in only when enabled.
Changes:
- Replace
s_memrealtimeand HW register reads withtl.extra.hip.memrealtime()/tl.extra.hip.smid()and target-info constexpr checks. - Make
get_xcc_id()return0on single-die targets while retaining multi-XCC support on CDNA3/4. - Add a
TRACING: gl.constexprparameter to the Gluon all-gather kernel and pass host-side tracing enablement through launch.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| iris/device_utils.py | Swaps hardcoded CDNA asm for Triton HIP intrinsics and adds target-aware get_xcc_id() behavior. |
| iris/ccl/all_gather.py | Adds TRACING constexpr to kernel and wires host-side tracing enablement into the launch. |
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
device_utils.pywith Triton's architecture-aware APIs (tl.extra.hip.memrealtime,tl.extra.hip.smid,triton.language.target_infoconstexpr checks) so iris tracing works on all supported GPU familiesTRACINGconstexpr through to the gluon all-gather kernel sorecord_event_start/endevents are emitted when tracing is enabledget_se_id()Details
device_utils.pypreviously hardcoded CDNA-only assembly (s_memrealtime,HW_REG_XCC_ID,HW_REG_HW_ID), which meant iris tracing was broken on any non-CDNA target. Triton already provides portable equivalents:read_realtime()s_memrealtimeinline asmtl.extra.hip.memrealtime()— emits correct instruction per archget_cu_id()HW_REG_HW_IDbits [11:8]tl.extra.hip.smid()— reads CU_ID (CDNA) or WGP_ID (RDNA)get_xcc_id()HW_REG_XCC_IDalwaysget_se_id()HW_REG_HW_IDbits [15:13]The gluon all-gather kernel now accepts a
TRACING: gl.constexpr = Falseparameter and passes it toIrisDeviceCtx.initialize(). When tracing is disabled (default), all tracing code is DCE'd at compile time — zero overhead.Test plan
shmem.tracing.enable()) and verify events are recordedget_xcc_id()on single-die targets should compile to constant 0🤖 Generated with Claude Code