[CI] Add Windows variant (Build + 3 test jobs) #196

Merged: lamb-j merged 14 commits into amd-staging from users/lambj/spirv-ci-windows on May 13, 2026

lamb-j (Collaborator) commented May 10, 2026

Summary

Adds a Windows variant of the SPIRV CI alongside the existing Linux one. New PR rollup checks:

SPIRV Compiler CI / Windows::release / Build
SPIRV Compiler CI / Windows::release / Test SPIRV translator lit
SPIRV Compiler CI / Windows::release / Test LLVM SPIRV codegen
SPIRV Compiler CI / Windows::release / Test Comgr

Same shape as the Linux variant — workflow_call reusable workflow invoked from a Windows::release job in the dispatcher.

TheRock convention alignment

  • Runner: azure-windows-scale-rocm (static label)
  • No container — Windows runs natively (TheRock convention; Windows containers are too heavy)
  • MSVC env via ilammy/msvc-dev-cmd so cmake's -GNinja finds cl.exe / link.exe
  • Pinned ninja 1.12.1 via Chocolatey (TheRock pins this — 1.13.0 has a bug)
  • bash shell defaults for all steps (TheRock pattern)
  • git config --global core.longpaths true for the deep llvm-project tree
  • No strip step — PE/COFF debug info lives in separate .pdb files, not embedded in .exe/.dll

Risks / things to watch

  • Wall time: Windows LLVM builds typically take 25-45 min vs ~14 min on Linux, so total PR wall-clock time roughly doubles.
  • Test-side unknowns: Windows lit + Comgr fixture generation aren't proven on this runner. If a specific test job chronically fails, it is easy to split it out into a follow-up PR, since each test job depends only on the build job (needs: build).
  • Translator-lit comment marker uses spirv-ci:translator-lit-windows (vs :translator-lit on Linux) so the two platforms don't fight over the same sticky comment.
  • Cascade-rebuild bug we're chasing on Linux (#195, "[CI debug - DO NOT MERGE] Diagnose ninja cascade-rebuild on test side") likely affects Windows too: same tar -xmf / artifact pattern. Fix lands in #195's follow-up; will apply symmetrically here.
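
As a rough illustration of the per-platform sticky-comment markers, the sketch below picks a marker and checks whether a comment body carries it. Both function names are hypothetical, and how the real workflow embeds or matches the marker is not shown in this PR. Because the Linux marker is a prefix of the Windows one, the sketch matches on a token boundary rather than a plain substring.

```shell
# Hypothetical helper: pick the sticky-comment marker for a platform.
translator_lit_marker() {
  case "$1" in
    linux)   echo "spirv-ci:translator-lit" ;;
    windows) echo "spirv-ci:translator-lit-windows" ;;
    *)       echo "unknown platform: $1" >&2; return 1 ;;
  esac
}

# Succeeds when the comment body on stdin carries this platform's marker
# as a whole token (not merely as a prefix of a longer marker).
comment_belongs_to() {
  local marker
  marker="$(translator_lit_marker "$1")" || return 1
  grep -qE "(^|[^A-Za-z0-9-])${marker}([^A-Za-z0-9-]|\$)"
}
```

For example, `echo "$body" | comment_belongs_to windows` would only succeed for the Windows comment.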

Not in scope

  • Windows variant of the ROCm/llvm-project copy of this workflow (PR #2451). Will port once Windows is proven on the translator side.
  • ccache for cross-PR build reuse on Windows (TheRock uses Chocolatey ccache; defer until we're past the cascade-rebuild fix).

Adds Windows::release dispatcher job + spirv-ci-windows.yml — full
parity with the Linux variant: Build + Test SPIRV translator lit (with
baseline-diff + gate) + Test LLVM SPIRV codegen + Test Comgr.

Mirrors TheRock's Windows CI conventions:
  - Runner: azure-windows-scale-rocm
  - No container (Windows runs natively)
  - MSVC env via ilammy/msvc-dev-cmd action
  - Pinned ninja 1.12.1 via Chocolatey (1.13.0 has a known bug)
  - bash shell defaults
  - git long paths enabled
  - No strip step (PE/COFF debug info lives in separate .pdb)

Translator-lit uses a separate sticky-comment marker
(spirv-ci:translator-lit-windows) so it doesn't fight with the Linux
variant's comment on the same PR.

After this lands the PR rollup gets 4 more checks:
  SPIRV Compiler CI / Windows::release / Build
  SPIRV Compiler CI / Windows::release / Test SPIRV translator lit
  SPIRV Compiler CI / Windows::release / Test LLVM SPIRV codegen
  SPIRV Compiler CI / Windows::release / Test Comgr

Wall-time risk: Windows build is typically 25-45 min vs ~14 min on
Linux. Total wall-clock for a PR roughly 2x.

Test-side risk: Windows lit + Comgr fixture generation are unproven
on this runner. If individual test jobs flake, can break them out
into follow-up PRs.

Companion: ROCm/llvm-project copy of this workflow doesn't get the
Windows variant in this PR — defer until Windows is proven on the
translator side.

github-actions (Bot, Contributor) commented May 11, 2026

SPIRV translator lit suite: clean on both PR head and amd-staging baseline.

🔴 New failures (0) — likely caused by this PR

(none)

🟢 Fixed by this PR (0) — failing on baseline, passing here

(none)

⚠️ Pre-existing on `amd-staging` (0)

(none)

lamb-j added 3 commits May 10, 2026 17:40
Same fix that #197 applied to the Linux variant.
tar -m does not restore archived mtimes, so each file is stamped with the
time it was extracted; build.ninja can end up older than CMakeCache.txt,
which triggers ninja's regen rule and a cascade rebuild. Touch build.ninja
explicitly after extraction to make it the newest file in the tree.
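
A minimal sketch of the extract-then-touch pattern, assuming the artifact is a tar archive with build.ninja at its root. The function name and argument layout are hypothetical and not copied from the workflow.

```shell
# Hypothetical helper: unpack a build-tree artifact, then refresh build.ninja.
extract_build_tree() {
  local archive="$1" dest="$2"
  mkdir -p "$dest" || return 1
  # -m: do not restore archived mtimes (the extraction pattern the CI uses).
  tar -xmf "$archive" -C "$dest" || return 1
  # Make build.ninja at least as new as everything else just extracted,
  # so ninja does not consider it stale relative to CMakeCache.txt.
  touch "$dest/build.ninja"
}
```

A test job would call something like `extract_build_tree build.tar build-llvm` before invoking ninja.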

Comgr's find_package(AMDDeviceLibs) fails on Windows. Add a one-shot
debug step before the Comgr configure to print:
  - $PWD (validates path expansion in MSVC bash)
  - file path of AMDDeviceLibs*.cmake under build-device-libs
  - any cmake/ subdirs under build-device-libs

Once we see where the file actually lands we can pin AMDDeviceLibs_DIR
explicitly. Revert this debug after the targeted fix.

Comgr's find_package(AMDDeviceLibs) failed on Windows even though the
config file was at the expected build-device-libs/lib/cmake/AMDDeviceLibs/
location. Cause: $PWD in MSYS bash returns a Unix-style path
(/c/home/runner/...) that cmake's find_package can't traverse on Windows.

Switch device-libs and Comgr configures to compute PWD_WIN=$(pwd -W)
which returns Windows-style with forward slashes (C:/home/runner/...)
that cmake handles natively.

Also pin AMDDeviceLibs_DIR explicitly as belt-and-suspenders.

Drop the debug step.
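
The path fix can be sketched as below. The helper names are hypothetical; the fallback to plain `pwd` is an addition so the snippet also runs outside MSYS, where `pwd -W` is not available.

```shell
# Print the working directory in a form cmake accepts on Windows.
# `pwd -W` is specific to MSYS/Git-for-Windows bash; elsewhere fall back to `pwd`.
cmake_friendly_pwd() {
  pwd -W 2>/dev/null || pwd
}

# Emit the explicit package-dir argument for the Comgr configure.
device_libs_dir_arg() {
  local root
  root="$(cmake_friendly_pwd)"
  echo "-DAMDDeviceLibs_DIR=${root}/build-device-libs/lib/cmake/AMDDeviceLibs"
}
```

On the Windows runner this would yield a `C:/...` style path, which the configure step can pass straight to cmake.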

github-actions (Bot, Contributor) commented May 11, 2026

SPIRV translator lit suite (Windows): clean on both PR head and amd-staging baseline.

🔴 New failures (0) — likely caused by this PR

(none)

🟢 Fixed by this PR (0) — failing on baseline, passing here

(none)

⚠️ Pre-existing on `amd-staging` (0)

(none)

lamb-j added 7 commits May 10, 2026 20:27
Comgr's HotswapMCTests gtest binary fails to link on Windows because
the hotswap apply* path is fully MSVC-guarded out (see
ROCm/llvm-project#2479). Replace `ninja check-comgr` with
`ninja test test-lit` on Windows so we run the lit suite and ctest
without dragging in the gtest aggregator (test-unit).

Linux variant unchanged — still runs full check-comgr including the
gtest binaries.

Drop test-unit (gtest binaries) at configure time so check-comgr's lit
and ctest layers still work on Windows. HotswapMCTests fails to link
because of the MSVC-guarded hotswap apply* path
(ROCm/llvm-project#2479); rather than special-casing the test step,
sed out add_subdirectory(test-unit) from Comgr's CMakeLists.txt before
configuring.

This is a workflow-side hack — when the upstream Comgr issue is
resolved (or Comgr exposes a CMake option to gate test-unit), drop
the sed and revert to plain check-comgr.
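
While this workaround was in place, the edit could have looked roughly like the sketch below. The function name is hypothetical, and it assumes GNU sed (as found in MSYS and on Linux) and that the line appears in Comgr's CMakeLists.txt exactly as quoted in the commit message.

```shell
# Hypothetical helper: drop the test-unit subdirectory from a CMakeLists.txt in place.
strip_test_unit() {
  local cmakelists="$1"
  # In sed's basic regex syntax, unescaped parentheses match literally.
  sed -i '/add_subdirectory(test-unit)/d' "$cmakelists"
}
```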

Two of the Comgr lit failures on Windows (unbundle, cache) were hitting
"Compression not supported" from clang-offload-bundler because LLVM was
built without zstd or zlib. The manylinux container ships zstd dev libs on
Linux; on Windows we install zstd via vcpkg (already present at $VCPKG_ROOT)
and pass CMAKE_TOOLCHAIN_FILE + LLVM_ENABLE_ZSTD=FORCE_ON so the
configure fails loudly if zstd isn't located.

The runner has two vcpkg installs: C:\vcpkg (first in PATH) and the
MSVC-bundled one at $VCPKG_ROOT. Plain `vcpkg install` resolved to
C:\vcpkg, but Configure LLVM's CMAKE_TOOLCHAIN_FILE points at the
MSVC-bundled vcpkg.cmake, so find_package(zstd) couldn't see the
package. Invoke $VCPKG_ROOT\vcpkg.exe directly so install and lookup
share the same root.

The MSVC-bundled vcpkg at $VCPKG_ROOT is manifest-mode only: `vcpkg
install <pkg>` errors out with "this distribution does not have a
classic mode instance". The other vcpkg on the runner (C:\vcpkg, what
plain `vcpkg` resolves to via PATH) supports classic mode. Point both
the install command and CMAKE_TOOLCHAIN_FILE at C:\vcpkg so that zstd
installs and LLVM's find_package can locate it.

LLVM built with zstd exports LLVMSupport linking against
zstd::libzstd_shared. Downstream find_package(LLVM) callers in
device-libs and Comgr need CMAKE_TOOLCHAIN_FILE pointing at the same
vcpkg.cmake so the imported zstd target is defined when the export
file is processed; otherwise cmake errors with "target was not found".
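
One way to keep the three configures consistent is to generate the vcpkg-related arguments in one place, as in this sketch. The variable and function names are hypothetical; the toolchain file location follows vcpkg's standard layout under the C:\vcpkg root named in the commit messages.

```shell
# Assumed vcpkg root, following the commit messages (C:\vcpkg, written with forward slashes).
VCPKG_DIR="C:/vcpkg"

# Toolchain argument shared by the LLVM, device-libs, and Comgr configures,
# so the imported zstd target resolves wherever find_package(LLVM) runs.
vcpkg_toolchain_arg() {
  echo "-DCMAKE_TOOLCHAIN_FILE=${VCPKG_DIR}/scripts/buildsystems/vcpkg.cmake"
}

# Arguments for the LLVM configure only: toolchain plus a hard zstd requirement.
llvm_zstd_args() {
  vcpkg_toolchain_arg
  echo "-DLLVM_ENABLE_ZSTD=FORCE_ON"
}
```

The LLVM configure would append the output of `llvm_zstd_args`, while device-libs and Comgr would append only `vcpkg_toolchain_arg`.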

build.ninja embeds C:/vcpkg/installed/.../zstd.lib as a build-edge
input that ninja validates at planning time. Test jobs running
ninja against the downloaded artifact need the same vcpkg state as
the build runner. Install zstd in each Windows test job to match.

This pattern doesn't scale — every new build-time system dep
requires updates to N test jobs. Followup planned to drop ninja
from test jobs in favor of llvm-lit-direct invocation, which
removes the test-time dependency on build-time system libs.
lamb-j added a commit that referenced this pull request May 12, 2026
Temporarily check out llvm-project at refs/pull/2491/head instead of
amd-staging in the Windows variant only. Goal: verify whether PR #2491
fixes the 12 hotswap-* lit failures on Windows. Revert before merging
PR #196 (or after PR #2491 lands and trickles to amd-staging).
lamb-j added a commit that referenced this pull request May 12, 2026
Drops the sed that strips add_subdirectory(test-unit) so
HotswapMCTests builds and runs. Pairs with the prior [TEMP] commit
checking out llvm-project at refs/pull/2491/head. Both [TEMP]
commits are to be reverted before merging PR #196.

ROCm/llvm-project#2491 (merged to amd-staging) fixes the
HotswapMCTests link failure on Windows that #2479 tracked. The sed
that stripped add_subdirectory(test-unit) is no longer needed;
test-unit builds and runs cleanly. check-comgr on Windows now
exercises lit + ctest + gtest, matching the Linux variant.
lamb-j force-pushed the users/lambj/spirv-ci-windows branch from eb4f2fb to 7da7df2 on May 13, 2026 07:20
lamb-j requested a review from kirthana14m on May 13, 2026 18:22
lamb-j marked this pull request as ready for review on May 13, 2026 18:22
lamb-j added 2 commits May 13, 2026 11:22
Build and test jobs both used `ref: amd-staging` for the llvm-project
checkout. If amd-staging advances mid-run (e.g., an upstream merge lands
between Build and Test), test jobs land on a newer commit than the one
Build compiled. ninja then sees sources newer than their objects, decides
to rebuild, and uses the OLD compile command from build.ninja, which can
lack include paths added by the newer commit (recently hit when
libc/shared became a Support dependency for APFloat).

The Build job now exports the resolved SHA as a job output. Test jobs
check out llvm-project at that exact SHA via needs.build.outputs.
This applies to both the Linux and Windows variants.
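
The Build-side half of this could be sketched as the step body below. The function name and the output key `llvm_sha` are hypothetical; writing `key=value` lines to the file named by `GITHUB_OUTPUT` is the standard GitHub Actions mechanism for step outputs.

```shell
# Hypothetical step body for the Build job: record the checked-out llvm-project SHA.
export_resolved_sha() {
  local repo_dir="$1" sha
  sha="$(git -C "$repo_dir" rev-parse HEAD)" || return 1
  # GITHUB_OUTPUT names the file GitHub Actions reads step outputs from.
  echo "llvm_sha=${sha}" >> "$GITHUB_OUTPUT"
}
```

The workflow would still need to map that step output to a job output so test jobs can read it through `needs.build.outputs` and pass it as the checkout `ref`.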
lamb-j merged commit 3b2c927 into amd-staging on May 13, 2026
8 checks passed
lamb-j deleted the users/lambj/spirv-ci-windows branch on May 13, 2026 23:41