
Use configure_pytorch_release_matrix.py to drive the CI job matrix #6082

Merged
ScottTodd merged 9 commits into main from users/scotttodd/pytorch-shared-matrix on Jun 30, 2026



ScottTodd (Member) commented Jun 24, 2026

Motivation

Fixes #6030. In #5744 we added gfx1250 support to ROCm, but PyTorch does not yet support it on all supported release branches, so we filtered it from the GPU targets list for release workflows. We had no equivalent filtering for CI workflows, so PRs that used the ci:run-all-archs label, such as bump PRs (example: #6020), failed to build PyTorch.

Now we'll get the same filtering across CI and releases.

Technical Details

This is similar to #5942, which added build_tools/github_actions/configure_rocm_python_test_matrix.py. Generating the job matrix dynamically with a script gives us more flexibility (finer-grained inclusions/exclusions, opt-in matrix expansion, etc.) at the cost of some code complexity and indirection.
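GitHub Actions supports dynamic matrices through a two-step pattern. An earlier job writes JSON to the file named by `$GITHUB_OUTPUT`, and later jobs parse that value with `fromJSON`. The sketch below covers only the producing side. It assumes a hypothetical output name (`pytorch_matrix`) and borrows its matrix keys from the `_append_build_pytorch` summary code quoted later on this page. It is not the actual implementation of configure_pytorch_release_matrix.py.

```python
import json
import os


def write_matrix_output(name: str, entries: list[dict[str, str]]) -> str:
    """Serialize matrix entries and append them to $GITHUB_OUTPUT if it is set.

    A downstream job can then consume the value with
    `strategy: matrix: ${{ fromJSON(needs.<setup-job>.outputs.<name>) }}`.
    """
    # Wrapping the list under "include" makes the JSON a complete matrix object.
    # json.dumps without `indent` emits a single line, as `name=value` requires;
    # the compact separators only shorten it.
    value = json.dumps({"include": entries}, separators=(",", ":"))
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path:
        with open(output_path, "a", encoding="utf-8") as f:
            f.write(f"{name}={value}\n")
    return value


if __name__ == "__main__":
    # Hypothetical entry; the real script decides refs, Python versions, and families.
    example = [
        {
            "python_version": "3.12",
            "pytorch_git_ref": "release/2.11",
            "amdgpu_families": "gfx94X-dcgpu;gfx125X-dcgpu",
        }
    ]
    print(write_matrix_output("pytorch_matrix", example))
```

Keeping the matrix logic in Python means it can be unit-tested like the other configure scripts. A matrix written directly in workflow YAML cannot be tested that way.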

Note

CI workflows use workflow_call to run PyTorch jobs, while release workflows use workflow_dispatch. This PR generates a new matrix for CI but keeps the existing workflow_dispatch inputs for releases, so those workflows stay easy to dispatch directly (otherwise you would need to type out a JSON matrix to trigger them). As a result, the matrix generated for CI and the matrix used by releases could become inconsistent across that boundary.
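One way to detect the inconsistency described above is to expand the release workflow's dispatch inputs into the same matrix shape and compare the result with the generated matrix. The sketch below does this under stated assumptions. The dispatch inputs are assumed to be comma-separated strings, with a single `;`-separated families string. The names `matrix_from_dispatch_inputs` and `matrix_drift` are hypothetical, and the real release inputs may be named or shaped differently.

```python
from itertools import product


def matrix_from_dispatch_inputs(
    python_versions: str, pytorch_git_refs: str, amdgpu_families: str
) -> list[dict[str, str]]:
    """Expand hypothetical comma-separated dispatch inputs into matrix entries."""
    pythons = [v.strip() for v in python_versions.split(",") if v.strip()]
    refs = [r.strip() for r in pytorch_git_refs.split(",") if r.strip()]
    return [
        {
            "python_version": py,
            "pytorch_git_ref": ref,
            "amdgpu_families": amdgpu_families,
        }
        for ref, py in product(refs, pythons)
    ]


def _key(entry: dict[str, str]) -> tuple[str, str, str]:
    # Sort families so "a;b" and "b;a" compare equal.
    families = ";".join(sorted(entry["amdgpu_families"].split(";")))
    return (entry["pytorch_git_ref"], entry["python_version"], families)


def matrix_drift(
    generated: list[dict[str, str]], dispatched: list[dict[str, str]]
) -> tuple[set[tuple[str, str, str]], set[tuple[str, str, str]]]:
    """Return (keys only in the generated matrix, keys only in the dispatched one)."""
    gen = {_key(e) for e in generated}
    disp = {_key(e) for e in dispatched}
    return gen - disp, disp - gen
```

Comparing order-insensitive keys avoids false alarms when the same families are listed in a different order. A non-empty result on either side points to a real difference in what would be built.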

Test Plan

Test Result

  • gfx125X-dcgpu / gfx125x is filtered from pytorch builds except for the release/2.11 branch
  • a "build-pytorch" section is added to the "Multi-Arch CI Configuration" summary, listing which matrix jobs are going to be added to the build graph

Submission Checklist

Move PyTorch version, Python version, and PyTorch-specific family filtering into configure_pytorch_release_matrix.py, then thread the resulting matrix through multi-arch CI and release workflows.

Tests: D:/projects/TheRock/.venv/Scripts/python.exe -m pytest github_actions/tests/configure_pytorch_release_matrix_test.py github_actions/tests/configure_multi_arch_ci_test.py

Tests: pre-commit run --files <changed files>

Assisted-by: Codex
Replace the per-ref dict configuration with explicit release/CI ref lists and an unsupported-family map. Remove the unused matrix wrapper so callers use a single entry point with defaulted optional overrides.

Testing:

- D:/projects/TheRock/.venv/Scripts/python.exe -m pytest github_actions/tests/configure_pytorch_release_matrix_test.py github_actions/tests/configure_multi_arch_ci_test.py

- pre-commit run --files build_tools/github_actions/configure_pytorch_release_matrix.py build_tools/github_actions/configure_multi_arch_ci.py build_tools/github_actions/tests/configure_pytorch_release_matrix_test.py

Assisted-by: Codex
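The commit above replaces a per-ref dict with two pieces: explicit ref lists and an unsupported-family map. Both sit behind a single entry point whose optional arguments default to the module-level configuration. The sketch below shows that shape. All names (`RELEASE_REFS`, `CI_REFS`, `UNSUPPORTED_FAMILIES`, `generate_pytorch_matrix`) and all ref and Python values are hypothetical. The only behavior taken from this PR is that gfx125X-dcgpu is dropped for every ref except release/2.11.

```python
from collections.abc import Mapping, Sequence
from typing import Optional

# Hypothetical configuration; the real script's ref lists and values may differ.
RELEASE_REFS: list[str] = ["release/2.10", "release/2.11", "nightly"]
CI_REFS: list[str] = ["release/2.11"]
PYTHON_VERSIONS: list[str] = ["3.11", "3.12"]
# family -> refs that cannot build it (here: every hypothetical ref but release/2.11).
UNSUPPORTED_FAMILIES: dict[str, set[str]] = {
    "gfx125X-dcgpu": {"release/2.10", "nightly"},
}


def generate_pytorch_matrix(
    families: Sequence[str],
    *,
    for_release: bool,
    python_versions: Optional[Sequence[str]] = None,
    unsupported: Optional[Mapping[str, set[str]]] = None,
) -> list[dict[str, str]]:
    """Return one entry per (ref, Python version), with unsupported families removed."""
    refs = RELEASE_REFS if for_release else CI_REFS
    if python_versions is None:
        python_versions = PYTHON_VERSIONS
    if unsupported is None:
        unsupported = UNSUPPORTED_FAMILIES

    matrix: list[dict[str, str]] = []
    for ref in refs:
        kept = [f for f in families if ref not in unsupported.get(f, set())]
        if not kept:
            continue  # nothing left to build for this ref
        for py in python_versions:
            matrix.append(
                {
                    "pytorch_git_ref": ref,
                    "python_version": py,
                    "amdgpu_families": ";".join(kept),
                }
            )
    return matrix
```

Defaulting the optional overrides to `None` keeps normal callers down to one short call. Tests can still inject their own ref or family data without patching module globals.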
ScottTodd added the gfx125x label (Issue/PR relates to gfx125x family) on Jun 24, 2026
therock-pr-bot (bot) commented Jun 24, 2026

❌ PR Check — Action Required

| Check | Status | Details |
|-------|--------|---------|
| 🌿 Branch Name | ✅ Pass | |
| 📝 PR Title/Description | ❌ Fail | See errors below |
| Forbidden Files | ✅ Pass | |
| 🧪 Unit Test | ✅ Pass | |
| 🔎 pre-commit | ⏳ Pending | ⏳ Still running… |
| 🚫 Draft PR | 🔜 To Be Enabled | |
| 🚩 Feature Flag | 🔜 To Be Enabled | |
| 📊 Code Coverage | 🔜 To Be Enabled | |

PR Title/Description errors:

Error: Title does not follow Conventional Commits style.
Expected: start with a valid type (feat, fix, docs, …).
Desired format: type(optional-scope): short description

Error: PR description must reference a JIRA ID or ISSUE ID.
Expected: include a JIRA ID or ISSUE ID line. The separator may be : or - (or omitted), and the value can be a JIRA key, a number (with or without #), or a link. Accepted examples:
JIRA ID : TESTAUTO-6039
JIRA ID - #330
JIRA ID #330
ISSUE ID : TESTUTO-3334
ISSUE ID #3334
ISSUE ID - TESTAUTO-3433
ISSUE ID : https://github.com/<org_name>/<repo_name>/issues/1234
Current: no valid JIRA/ISSUE reference found

⚠️ 1 policy check(s) failed. Please address the issues above before this PR can be Reviewed.

🚫 Please fix the failed policies

  • ❌ PR Title/Description

The Not ready to Review label was added to this PR. Once all policies pass, the label is removed automatically.

📖 Need help? See the Policy FAQ for details on every check and how to fix failures.

therock-pr-bot added the Not ready to Review label (PR has unresolved policy failures; reviews blocked) on Jun 24, 2026
therock-pr-bot (bot) commented:

🚫 Please fix the failed policies before requesting reviews.

The following policy checks failed:

  • ❌ PR Title/Description

The Not ready to Review label has been added to this PR.
Once all policies pass, the label will be removed automatically.

Pass the PyTorch build toggle and release Python version into the multi-arch setup workflow so setup-generated build configs match the release workflows that consume them.

Also let manually triggered multi_arch_ci.yml runs skip PyTorch builds, and keep ASAN setup callers opted out explicitly.

Assisted-by: Codex
Avoid depending on the current real GPU-family support matrix when testing that an empty generated PyTorch matrix disables PyTorch builds.

Testing:

- D:/projects/TheRock/.venv/Scripts/python.exe -m pytest github_actions/tests/configure_multi_arch_ci_test.py

Assisted-by: Codex
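The commit above keeps a test independent of the real GPU-family support matrix. The pytest-style sketch below illustrates that idea using hypothetical helpers (`generate_matrix`, `should_build_pytorch`) rather than the script's real functions. The test injects fake refs and a fake family that is unsupported everywhere, so its result does not change when real PyTorch support changes.

```python
from collections.abc import Mapping, Sequence


def generate_matrix(
    families: Sequence[str],
    refs: Sequence[str],
    unsupported: Mapping[str, set[str]],
) -> list[dict[str, str]]:
    """Hypothetical generator: one entry per ref that still has families left."""
    matrix = []
    for ref in refs:
        kept = [f for f in families if ref not in unsupported.get(f, set())]
        if kept:
            matrix.append({"pytorch_git_ref": ref, "amdgpu_families": ";".join(kept)})
    return matrix


def should_build_pytorch(requested: bool, matrix: list[dict[str, str]]) -> bool:
    # An empty matrix can make GitHub Actions reject the job, so treat it as "skip".
    return requested and bool(matrix)


def test_empty_matrix_disables_pytorch_build() -> None:
    refs = ["ref-a", "ref-b"]  # fake refs: no dependency on real release branches
    unsupported = {"fake-family": set(refs)}  # fake family unsupported on every ref
    matrix = generate_matrix(["fake-family"], refs, unsupported)
    assert matrix == []
    assert should_build_pytorch(True, matrix) is False


def test_nonempty_matrix_keeps_pytorch_build() -> None:
    matrix = generate_matrix(["fake-family"], ["ref-a"], unsupported={})
    assert should_build_pytorch(True, matrix) is True
    assert should_build_pytorch(False, matrix) is False
```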
ScottTodd marked this pull request as ready for review on June 26, 2026 15:54
ScottTodd requested review from geomin12 and rahulc-gh on June 26, 2026 15:54
```yaml
linux_test_labels: ${{ inputs.linux_test_labels || '' }}
prebuilt_stages: ${{ inputs.prebuilt_stages || '' }}
baseline_run_id: ${{ inputs.baseline_run_id || '' }}
build_pytorch: false
```

ScottTodd (Member, Author) commented:

Note that build_pytorch is now an input to setup_multi_arch.yml, and configure_multi_arch_ci.py no longer sets build_pytorch=(suffix != "asan").

Comment on lines +49 to +52
```yaml
build_pytorch:
  type: boolean
  default: true
  description: "Build PyTorch wheels"
```

ScottTodd (Member, Author) commented:

Note that the CI workflow (https://github.com/ROCm/TheRock/actions/workflows/multi_arch_ci.yml) now has this input, just like the release workflow:

[Screenshot]

I'll also add "build_jax" in #6117
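Within workflow YAML, a `type: boolean` input like the build_pytorch input above is a real boolean. Once it is forwarded to a Python script through an environment variable or command-line argument, however, it becomes the string `true` or `false`, and any non-empty string is truthy in Python. The parsing sketch below assumes a hypothetical `BUILD_PYTORCH` environment variable. This PR does not say how configure_multi_arch_ci.py actually receives the value.

```python
import os
from typing import Optional


def parse_bool_input(value: Optional[str], default: bool) -> bool:
    """Parse a GitHub Actions boolean input that arrived as a string."""
    if value is None or value.strip() == "":
        return default  # input not provided, e.g. a caller that omits it
    normalized = value.strip().lower()
    if normalized == "true":
        return True
    if normalized == "false":
        return False
    raise ValueError(f"expected 'true' or 'false', got {value!r}")


if __name__ == "__main__":
    # Hypothetical variable, e.g. set with `env: BUILD_PYTORCH: ${{ inputs.build_pytorch }}`.
    build_pytorch = parse_bool_input(os.environ.get("BUILD_PYTORCH"), default=True)
    print(f"build_pytorch={build_pytorch}")
```

Rejecting unexpected values, rather than treating any non-empty string as true, means a typo like `flase` fails loudly instead of silently enabling a build.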

ScottTodd (Member, Author) commented:

I have a few follow-ups that are blocked on this, so I would like reviews soon.

ScottTodd removed the Not ready to Review label (PR has unresolved policy failures; reviews blocked) on Jun 29, 2026

rahulc-gh (Contributor) left a comment

LGTM

Comment on lines +206 to +231
```python
def _append_build_pytorch(lines: list[str], outputs: CIOutputs) -> None:
    lines.append("| Platform | Python | PyTorch ref | Families |")
    lines.append("|----------|--------|-------------|----------|")

    rows = 0
    for platform, config in [
        ("Linux", outputs.builds.linux),
        ("Windows", outputs.builds.windows),
    ]:
        if config is None:
            continue
        for row in config.pytorch_build_matrix:
            families = ", ".join(
                f"`{family}`" for family in row["amdgpu_families"].split(";")
            )
            lines.append(
                f"| {platform} | `{row['python_version']}` | "
                f"`{row['pytorch_git_ref']}` | {families} |"
            )
            rows += 1

    if rows == 0:
        lines.append("| — | — | — | — |")
    lines.append("")
```


rahulc-gh (Contributor) commented:

This is nice, having it in the summary.

rahulc-gh (Contributor) commented:

Thank you for cleaning it up, looks good to me.

ScottTodd merged commit 554b425 into main on Jun 30, 2026
148 of 197 checks passed
ScottTodd deleted the users/scotttodd/pytorch-shared-matrix branch on June 30, 2026 16:42
ScottTodd added a commit that referenced this pull request Jun 30, 2026
## Motivation

Progress on #5634 and
#6218.

This builds on #6117 to provide the
JAX equivalent to #6082. Now the
dynamic matrix of JAX builds to run will be generated at the _start_ of
CI and release pipelines during "setup" / "configure CI" and a table
will be included in the summary:
<img width="750" height="683" alt="image"
src="https://github.com/user-attachments/assets/37b21bd3-1906-4d2f-9732-0602fab9776a"
/>

## Technical Details

The new `build_jax` option now has these settings:

Workflow | `build_jax` value | notes
-- | -- | --
`multi_arch_ci.yml` | `false` | Matching previous behavior<br>can later be enabled (opt-in, automatic based on files edited, etc.)
`multi_arch_ci_asan.yml` | `false` | New behavior due to #6218
`multi_arch_release.yml` | `inputs.build_jax`<br>(default `true`) | Matching previous behavior
`multi_arch_release_asan.yml` | `false` | New behavior due to #6218
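The table above maps each workflow to a `build_jax` value. The sketch below expresses the same mapping as data, for example in a consistency test. The helper `resolve_build_jax` and its arguments are hypothetical. In the actual workflows these values are set directly in YAML.

```python
from typing import Optional

# None means "use the workflow_dispatch input"; values mirror the table above.
BUILD_JAX_BY_WORKFLOW: dict[str, Optional[bool]] = {
    "multi_arch_ci.yml": False,  # matches previous behavior
    "multi_arch_ci_asan.yml": False,  # new behavior due to #6218
    "multi_arch_release.yml": None,  # inputs.build_jax, default true
    "multi_arch_release_asan.yml": False,  # new behavior due to #6218
}


def resolve_build_jax(workflow: str, dispatch_input: Optional[bool] = None) -> bool:
    """Return whether the given workflow should build JAX."""
    if workflow not in BUILD_JAX_BY_WORKFLOW:
        raise KeyError(f"unknown workflow: {workflow}")
    fixed = BUILD_JAX_BY_WORKFLOW[workflow]
    if fixed is not None:
        return fixed
    # The release workflow's build_jax input defaults to true when not given.
    return True if dispatch_input is None else dispatch_input
```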

## Test Plan

* Dev release triggered to observe CI configuration:
https://github.com/ROCm/TheRock/actions/runs/28470132517

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Labels

gfx125x (Issue/PR relates to gfx125x family)


Development

Successfully merging this pull request may close these issues.

[Issue][PyTorch] CI (Linux fat build): CK_BUFFER_RESOURCE_3RD_DWORD undeclared for gfx1250 in build_pytorch_wheel_fat CI job

2 participants