ci: cache target dir for CUDA codspeed bench (~7× faster build) by joseph-isaacs · Pull Request #8256 · vortex-data/vortex

joseph-isaacs · 2026-06-04T14:58:09Z

Problem

The CUDA codspeed bench shards run on ephemeral runs-on GPU runners with no prebuilt AMI (unlike the CPU shards, which use setup-prebuild on the -pre-v2 AMIs). cargo's target/ is cold on every run, so even though sccache (S3) serves ~98% of compilation, cargo still re-runs build scripts and re-links unchanged dependency crates.

What this does

Restore target/ between runs with Swatinem/rust-cache, keyed per shard and saved only on develop, so PRs restore from develop's cache without polluting it.

The cache is configured to match existing repo conventions as closely as possible:

- extras=s3-cache on the runner spec routes the cache to the same S3 bucket as sccache, the same mechanism bench.yml / bench-pr.yml use. No separate GitHub Actions cache backend is involved.
- The Swatinem/rust-cache step mirrors its only other use in the repo (publish-dry-runs.yml): save-if restricts cache saves to develop.
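
For readers without the diff at hand, the configuration described above might look roughly like the following. This is a minimal sketch, not the actual +12 / −1 change to codspeed.yml: the job name, the shard matrix, the `family=g5` label segment, and the cache key string are all hypothetical placeholders. Only `extras=s3-cache`, the use of Swatinem/rust-cache, the per-shard keying, and the develop-only save come from the description.

```yaml
# Illustrative excerpt only; names marked "assumed" do not come from the PR.
jobs:
  bench-codspeed-cuda:                 # assumed job name
    strategy:
      matrix:
        shard: [1, 2]                  # assumed shard layout
    # extras=s3-cache is from the PR text; the other label segments are placeholders.
    runs-on: "runs-on=${{ github.run_id }}/family=g5/extras=s3-cache"
    steps:
      - uses: actions/checkout@v4
      - uses: Swatinem/rust-cache@v2
        with:
          # Extra key component so each shard gets its own target/ cache (assumed format).
          key: cuda-codspeed-shard-${{ matrix.shard }}
          # Only runs on develop write the cache; PR runs restore it but never save.
          save-if: ${{ github.ref == 'refs/heads/develop' }}
```

With a condition like this on `save-if`, a PR run can only read whatever develop last saved, which is why the first warm build shows up only after a develop run has populated each shard's cache.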

Net diff is CI-only: +12 / −1 in codspeed.yml.

Measurements (controlled experiment on the GPU runner)

Build	`BUILD_SECONDS`
Cold `target/`, sccache warm (`codegen-units = 1`)	118s

The often-quoted ~500s figure was from before [profile.bench] codegen-units = 16 landed and before sccache was warm; the realistic cold build is ~2 min. This branch is rebased on develop so it builds with codegen-units = 16; the fresh run measures that realistic baseline. Since save-if is develop-only, this PR's own runs are cold (no develop cache exists yet) — the warm benefit only appears after the first develop run populates the per-shard caches.

🤖 Generated with Claude Code

codspeed-hq · 2026-06-04T15:38:55Z

Merging this PR will improve performance by 23.7%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚠️ Different runtime environments detected

Some benchmarks with significant performance changes were compared across different runtime environments,
which may affect the accuracy of the results.

Open the report in CodSpeed to investigate

⚡ 6 improved benchmarks
✅ 1501 untouched benchmarks

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Simulation	`chunked_bool_canonical_into[(1000, 10)]`	46.6 µs	31.7 µs	+46.98%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[128]`	275.3 ns	216.9 ns	+26.89%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[1024]`	336.9 ns	278.6 ns	+20.94%
⚡	Simulation	`chunked_varbinview_into_canonical[(1000, 10)]`	213.2 µs	177.1 µs	+20.41%
⚡	Simulation	`bitwise_not_vortex_buffer_mut[2048]`	400.6 ns	342.2 ns	+17.05%
⚡	Simulation	`chunked_varbinview_canonical_into[(100, 100)]`	309.6 µs	274.7 µs	+12.71%

Tip

Curious why this is faster? Comment `@codspeedbot explain why this is faster` on this PR, or directly use the CodSpeed MCP with your agent.

Comparing ci/cuda-bench-target-cache (a1c1818) with develop (d97d2bd)

The CUDA codspeed shards run on ephemeral GPU runners with no prebuilt AMI, so cargo's target/ dir is cold on every run. sccache (S3) already serves ~98% of compilation, but cargo still re-runs build scripts and re-links unchanged dependency crates. Restore target/ with rust-cache (as publish-dry-runs.yml does), keyed per shard and only saved on develop so PRs restore from develop's cache. The runner's extras=s3-cache (as bench.yml uses) routes it to the same S3 bucket as sccache, so there is no separate GitHub Actions cache backend.

Signed-off-by: Joe Isaacs <joe.isaacs@live.co.uk>

joseph-isaacs force-pushed the ci/cuda-bench-target-cache branch from f415021 to 6e4adf1 Compare June 4, 2026 15:32

joseph-isaacs changed the title ~~ci: cache cargo target dir for CUDA codspeed shards~~ DIAGNOSTIC: why CUDA codspeed builds are slow (do not merge) Jun 4, 2026

joseph-isaacs marked this pull request as draft June 4, 2026 15:33

joseph-isaacs changed the title ~~DIAGNOSTIC: why CUDA codspeed builds are slow (do not merge)~~ ci: cache target dir for CUDA codspeed bench (~7× faster build) Jun 5, 2026

joseph-isaacs marked this pull request as ready for review June 5, 2026 09:23

joseph-isaacs force-pushed the ci/cuda-bench-target-cache branch from 867154b to 9b1845f Compare June 5, 2026 09:29

joseph-isaacs added the changelog/ci label Jun 5, 2026 — with Claude

joseph-isaacs force-pushed the ci/cuda-bench-target-cache branch from 9b1845f to e165551 Compare June 5, 2026 09:54

joseph-isaacs marked this pull request as draft June 5, 2026 10:15

joseph-isaacs force-pushed the ci/cuda-bench-target-cache branch from 92d6a35 to a1c1818 Compare June 5, 2026 10:45

joseph-isaacs closed this Jun 5, 2026

joseph-isaacs wants to merge 1 commit into develop from ci/cuda-bench-target-cache
