Use configure_pytorch_release_matrix.py to drive the CI job matrix #6082
Move PyTorch version, Python version, and PyTorch-specific family filtering into configure_pytorch_release_matrix.py, then thread the resulting matrix through multi-arch CI and release workflows.

Tests: D:/projects/TheRock/.venv/Scripts/python.exe -m pytest github_actions/tests/configure_pytorch_release_matrix_test.py github_actions/tests/configure_multi_arch_ci_test.py
Tests: pre-commit run --files <changed files>

Assisted-by: Codex
Replace the per-ref dict configuration with explicit release/CI ref lists and an unsupported-family map. Remove the unused matrix wrapper so callers use a single entry point with defaulted optional overrides.

Testing:
- D:/projects/TheRock/.venv/Scripts/python.exe -m pytest github_actions/tests/configure_pytorch_release_matrix_test.py github_actions/tests/configure_multi_arch_ci_test.py
- pre-commit run --files build_tools/github_actions/configure_pytorch_release_matrix.py build_tools/github_actions/configure_multi_arch_ci.py build_tools/github_actions/tests/configure_pytorch_release_matrix_test.py

Assisted-by: Codex
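The shape described in this commit (explicit ref lists, an unsupported-family map, and one entry point with defaulted overrides) might look roughly like the sketch below. This is not the code from the PR: the function name, constant names, ref lists, and Python version are hypothetical placeholders. Only the row keys (`python_version`, `pytorch_git_ref`, `amdgpu_families`) and the semicolon-separated family string are taken from the summary helper reviewed in this PR.

```python
from itertools import product

# Illustrative placeholders only; the real lists live in
# configure_pytorch_release_matrix.py and may differ.
RELEASE_PYTORCH_REFS = ["release/2.10", "release/2.11"]
CI_PYTORCH_REFS = ["release/2.11"]
PYTHON_VERSIONS = ["3.12"]

# Families that a given PyTorch ref is assumed unable to build.
UNSUPPORTED_FAMILIES = {
    "release/2.10": {"gfx125X-dcgpu"},
}


def generate_matrix(amdgpu_families, *, refs=None, python_versions=None,
                    for_release=False):
    """Return one matrix row per (ref, python) pair with supported families."""
    if refs is None:
        refs = RELEASE_PYTORCH_REFS if for_release else CI_PYTORCH_REFS
    if python_versions is None:
        python_versions = PYTHON_VERSIONS

    rows = []
    for ref, python_version in product(refs, python_versions):
        unsupported = UNSUPPORTED_FAMILIES.get(ref, set())
        supported = [f for f in amdgpu_families if f not in unsupported]
        if not supported:
            # Nothing left to build for this ref, so emit no row at all.
            continue
        rows.append(
            {
                "python_version": python_version,
                "pytorch_git_ref": ref,
                "amdgpu_families": ";".join(supported),
            }
        )
    return rows


rows = generate_matrix(["gfx94X-dcgpu", "gfx125X-dcgpu"], for_release=True)
```

With these placeholder values, the `release/2.10` row keeps only `gfx94X-dcgpu`, while the `release/2.11` row keeps both families. A ref whose families are all unsupported produces no row, which is how an empty matrix can arise.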
…orch-shared-matrix
Pass the PyTorch build toggle and release Python version into the multi-arch setup workflow so setup-generated build configs match the release workflows that consume them. Also let manually triggered multi_arch_ci.yml runs skip PyTorch builds, and keep ASAN setup callers opted out explicitly. Assisted-by: Codex
…orch-shared-matrix
Avoid depending on the current real GPU-family support matrix when testing that an empty generated PyTorch matrix disables PyTorch builds.

Testing:
- D:/projects/TheRock/.venv/Scripts/python.exe -m pytest github_actions/tests/configure_multi_arch_ci_test.py

Assisted-by: Codex
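One way to test "an empty matrix disables PyTorch builds" without relying on the real support matrix is to inject a stub generator. The sketch below assumes a hypothetical helper, `resolve_build_pytorch`, and a hypothetical family name. The actual test in `configure_multi_arch_ci_test.py` may be structured differently.

```python
from typing import Callable

MatrixRow = dict[str, str]


def resolve_build_pytorch(
    requested: bool,
    amdgpu_families: list[str],
    generate_matrix: Callable[[list[str]], list[MatrixRow]],
) -> tuple[bool, list[MatrixRow]]:
    """Build PyTorch only if it was requested and the matrix is non-empty."""
    matrix = generate_matrix(amdgpu_families) if requested else []
    return bool(matrix), matrix


def test_empty_matrix_disables_pytorch() -> None:
    # The stub always returns an empty matrix, so the test stays valid
    # regardless of which GPU families are currently supported.
    enabled, matrix = resolve_build_pytorch(
        True, ["gfx-placeholder"], lambda families: []
    )
    assert enabled is False
    assert matrix == []
```

Because the generator is passed in, the test stays stable when the set of supported families changes.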
…orch-shared-matrix
    linux_test_labels: ${{ inputs.linux_test_labels || '' }}
    prebuilt_stages: ${{ inputs.prebuilt_stages || '' }}
    baseline_run_id: ${{ inputs.baseline_run_id || '' }}
    build_pytorch: false
Note how `build_pytorch` is now an input to `setup_multi_arch.yml` and how `configure_multi_arch_ci.py` no longer has `build_pytorch=(suffix != "asan"),`.
    build_pytorch:
      type: boolean
      default: true
      description: "Build PyTorch wheels"
Note that CI https://github.com/ROCm/TheRock/actions/workflows/multi_arch_ci.yml now has this input, just like the release workflow.
I'll also add "build_jax" in #6117
I have a few follow-ups that are blocked on this and would like reviews soon.
    def _append_build_pytorch(lines: list[str], outputs: CIOutputs) -> None:
        lines.append("| Platform | Python | PyTorch ref | Families |")
        lines.append("|----------|--------|-------------|----------|")

        rows = 0
        for platform, config in [
            ("Linux", outputs.builds.linux),
            ("Windows", outputs.builds.windows),
        ]:
            if config is None:
                continue
            for row in config.pytorch_build_matrix:
                families = ", ".join(
                    f"`{family}`" for family in row["amdgpu_families"].split(";")
                )
                lines.append(
                    f"| {platform} | `{row['python_version']}` | "
                    f"`{row['pytorch_git_ref']}` | {families} |"
                )
                rows += 1

        if rows == 0:
            lines.append("| — | — | — | — |")
        lines.append("")
This is nice, having it in summary
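For readers who want to see what the summary table looks like, here is a minimal sketch that mirrors the table layout of `_append_build_pytorch` using stand-in objects. `SimpleNamespace` substitutes for the real `CIOutputs` type, and the Python version, ref, and family values are illustrative assumptions rather than actual CI output.

```python
from types import SimpleNamespace


def append_build_pytorch(lines, outputs):
    """Simplified stand-in that mirrors the reviewed helper's table rows."""
    lines.append("| Platform | Python | PyTorch ref | Families |")
    lines.append("|----------|--------|-------------|----------|")
    for platform, config in [
        ("Linux", outputs.builds.linux),
        ("Windows", outputs.builds.windows),
    ]:
        if config is None:
            continue  # Platform not configured for this run.
        for row in config.pytorch_build_matrix:
            families = ", ".join(
                f"`{family}`" for family in row["amdgpu_families"].split(";")
            )
            lines.append(
                f"| {platform} | `{row['python_version']}` | "
                f"`{row['pytorch_git_ref']}` | {families} |"
            )


# Stand-in for CIOutputs: Linux has one matrix row, Windows is disabled.
outputs = SimpleNamespace(
    builds=SimpleNamespace(
        linux=SimpleNamespace(
            pytorch_build_matrix=[
                {
                    "python_version": "3.12",
                    "pytorch_git_ref": "release/2.11",
                    "amdgpu_families": "gfx94X-dcgpu;gfx125X-dcgpu",
                }
            ]
        ),
        windows=None,
    )
)

summary_lines = []
append_build_pytorch(summary_lines, outputs)
print("\n".join(summary_lines))
```

Each matrix row becomes one Markdown table row, with the semicolon-separated family string split into individually backticked names.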
Thank you for cleaning it up, looks good to me.
## Motivation

Progress on #5634 and #6218. This builds on #6117 to provide the JAX equivalent to #6082.

Now the dynamic matrix of JAX builds to run will be generated at the _start_ of CI and release pipelines during "setup" / "configure CI" and a table will be included in the summary:

<img width="750" height="683" alt="image" src="https://github.com/user-attachments/assets/37b21bd3-1906-4d2f-9732-0602fab9776a" />

## Technical Details

The new `build_jax` option now has these settings:

| Workflow | `build_jax` value | Notes |
| -- | -- | -- |
| `multi_arch_ci.yml` | `false` | Matching previous behavior<br>can later be enabled (opt-in, automatic based on files edited, etc.) |
| `multi_arch_ci_asan.yml` | `false` | New behavior due to #6218 |
| `multi_arch_release.yml` | `inputs.build_jax`<br>(default `true`) | Matching previous behavior |
| `multi_arch_release_asan.yml` | `false` | New behavior due to #6218 |

## Test Plan

* Dev release triggered to observe CI configuration: https://github.com/ROCm/TheRock/actions/runs/28470132517

## Submission Checklist

- [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
## Motivation

Fixes #6030.

On #5744 we added gfx1250 support to ROCm but support is not yet there in PyTorch across all supported release branches, so we filtered it from the GPU targets list for release workflows. We did not have a way to filter for CI workflows, so PRs that used the `ci:run-all-archs` label like bump PRs (example: #6020) were failing to build pytorch.

Now we'll get the same filtering across CI and releases.

## Technical Details

This is like #5942 which added `build_tools/github_actions/configure_rocm_python_test_matrix.py`. Dynamically generating a job matrix using a script will give us greater flexibility (finer-grained inclusions/exclusions, opt-in to expand the matrix, etc.), at the cost of some code complexity and indirection.

Note: CI workflows use `workflow_call` to run pytorch jobs while release workflows use `workflow_dispatch`. This generates a new matrix for CI but continues to use the same workflow_dispatch inputs for releases, to keep the workflows easy to dispatch directly (otherwise you'd need to type out a JSON matrix to trigger). There is potential there for the generated/used matrix to be inconsistent across that boundary.

## Test Plan

- `multi_arch_ci.yml` run using prebuilt artifacts: https://github.com/ROCm/TheRock/actions/runs/28060884840/job/83076620803
- `multi_arch_ci_asan.yml` run using prebuilt artifacts: https://github.com/ROCm/TheRock/actions/runs/28202142368
- `multi_arch_release.yml` run: https://github.com/ROCm/TheRock/actions/runs/28187486907 --> triggered https://github.com/ROCm/TheRock/actions/runs/28196069145
- `multi_arch_ci.yml` on this PR with the opt-in gfx125x label: https://github.com/ROCm/TheRock/actions/runs/28200714865?pr=6082

## Test Result

`gfx125X-dcgpu`/`gfx125x` is filtered from pytorch builds except for the release/2.11 branch

## Submission Checklist