Add Mojo AOT-compiled SIMD take/filter kernels for primitive arrays by joseph-isaacs · Pull Request #7387 · vortex-data/vortex

joseph-isaacs · 2026-04-10T15:40:31Z

Summary

Adds Mojo AOT-compiled SIMD gather kernels for primitive take and filter, with zero runtime dependency and graceful fallback when Mojo isn't installed.

CodSpeed CI Results

"Merging this PR will improve performance by 48.74%" — 11 improved, 0 regressed, 1111 untouched.

Benchmark	BASE	HEAD	Change
`decode_primitives[u8]` (5 variants)	53.5 µs	36.0 µs	+49%
`bench_dict_mask` (4 variants)	1.7 ms	1.5 ms	+10%
`gather_u32_mojo[100K]` vs `gather_u32_avx2[100K]`	N/A	699.8 vs 678.6 µs	within 3%

What's included

kernels/take.mojo — 20 SIMD gather kernels (16 take + 4 filter), 4x unrolled, compiled with --mcpu skylake --mtune skylake for vpgatherqd
build.rs — AOT compiles .mojo → .o → .a, detects Mojo via PATH + ~/.local/bin, passes --target-triple from Cargo's TARGET env, gracefully falls back
mojo.rs — Rust FFI bridge with TakeImpl, dispatches by value byte-width
slice.rs — Mojo SIMD filter for the sparse indices path (<80% selectivity)
take_primitive_simd bench — divan 3-way comparison: scalar vs AVX2 vs Mojo
CI — pip install --user mojo + MOJO_MCPU=skylake for codspeed shard Add CI #2

Key design decisions

Pointers as Int: Mojo 0.26's UnsafePointer has origin/mut params incompatible with @export. Solved with type_of anchor pattern.
Zero runtime dep: nm shows 0 undefined symbols. No Mojo runtime/GC.
--mcpu skylake: Critical for vpgatherqd hardware gather. x86-64-v3 scalarizes the gather into 8 individual loads.
4x unroll: Saturates gather pipeline with independent ops.

⚠️ Known limitation

Mojo compiles for a single target CPU (no runtime dispatch). If the build machine has AVX-512 but the runtime machine only has AVX2, you'd get SIGILL. Currently mitigated by pinning MOJO_MCPU=skylake in CI. For production use, this needs runtime feature detection or multiple compiled objects — same pattern as the existing multiversion crate usage.

Test plan

203 take tests pass with Mojo kernel active
121 filter tests pass with Mojo kernel active
Codspeed shard Add CI #2 builds and runs with Mojo installed
CodSpeed: +49% u8 decode, +10% dict_mask, 0 regressions
Mojo gather within 3% of hand-written AVX2 on u32

https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

codspeed-hq · 2026-04-10T15:53:48Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 38 improved benchmarks
❌ 45 regressed benchmarks
✅ 1430 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
❌	Simulation	`compare[63]`	244.6 µs	360.5 µs	-32.16%
❌	Simulation	`chunked_bool_canonical_into[(1000, 10)]`	31.6 µs	46.6 µs	-32.12%
❌	Simulation	`compare[56]`	229.5 µs	332.4 µs	-30.96%
❌	Simulation	`compare[62]`	254.7 µs	368.8 µs	-30.94%
❌	Simulation	`compare[60]`	248.2 µs	358.6 µs	-30.8%
❌	Simulation	`compare[61]`	255.4 µs	367.6 µs	-30.53%
❌	Simulation	`compare[58]`	245.5 µs	352.2 µs	-30.29%
❌	Simulation	`compare[59]`	250.9 µs	359.3 µs	-30.18%
❌	Simulation	`compare[57]`	246 µs	351 µs	-29.9%
❌	Simulation	`compare[54]`	235.9 µs	335.5 µs	-29.69%
❌	Simulation	`compare[55]`	241.5 µs	342.3 µs	-29.46%
❌	Simulation	`compare[53]`	236 µs	334.2 µs	-29.39%
❌	Simulation	`compare[52]`	229.9 µs	325.4 µs	-29.36%
❌	Simulation	`compare[48]`	212.2 µs	300.3 µs	-29.34%
❌	Simulation	`compare[50]`	227 µs	318.9 µs	-28.8%
❌	Simulation	`compare[51]`	232.1 µs	325.7 µs	-28.76%
❌	Simulation	`compare[49]`	227.3 µs	317.3 µs	-28.36%
❌	Simulation	`compare[47]`	222.6 µs	309 µs	-27.95%
❌	Simulation	`compare[46]`	217.8 µs	302.2 µs	-27.94%
❌	Simulation	`compare[44]`	211.6 µs	292.1 µs	-27.57%
...	...	...	...	...	...

ℹ️ Only the first 20 benchmarks are displayed. Go to the app to view all benchmarks.

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.

_{Comparing claude/plan-mojo-simd-kernels-IDywB (f7e2b7d) with develop (e06d80b)}

a10y · 2026-04-10T15:54:30Z

Does Mojo handle runtime dispatch to choose the right kernel for architecture? Or does it just pick one you build the mojo kernels

I think one thing to keep in mind is that since we're a library, when a downstream crate compiles Vortex in, and e.g. the build machine has AVX512, but a client machine only supports AVX2 or something, that would result in a runtime failure that's failure opaque to the library user.

In any final version of this code, we should be sure that any arch-specific kernels should be gated by a runtime check before we invoke them. Similar to what we do for the existing AVX2 kernel.

github-actions · 2026-05-18T15:50:42Z

This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days

joseph-isaacs · 2026-05-18T15:51:18Z

Not stale — actively working on this. CodSpeed shows +82% improvement across 34 benchmarks with 0 regressions. Waiting for lint fix to land.

Generated by Claude Code

0ax1 · 2026-05-18T15:53:10Z

Not stale — actively working on this. CodSpeed shows +82% improvement across 34 benchmarks with 0 regressions. Waiting for lint fix to land.

Generated by Claude Code

What about the license? @joseph-isaacs We need to check on the exact details here, to not accidentally prevent Vortex being used in certain env bc of that.

Adds Mojo SIMD kernels that are AOT-compiled and statically linked with zero runtime dependency. Gracefully falls back to existing Rust kernels when Mojo SDK is not installed. Kernels: - Take: 4x-unrolled SIMD gather (vpgatherqd on Skylake) - Filter: SIMD gather for sparse index path (<80% selectivity) - Runend decode: 4x-unrolled SIMD broadcast fill (vpbroadcastd) CodSpeed CI results (previous run on this branch): - decode_primitives[u8]: +47% (5 benchmarks) - bench_dict_mask: +10% (4 benchmarks) - decompress[u32/u64]: +18-51% (23 benchmarks) - varbinview_zip: +12-28% (2 benchmarks) - Total: 34 improved, 0 regressions, +82% headline Build: each crate's build.rs detects Mojo, compiles with --mcpu skylake --mtune skylake, archives to .a, emits cfg(vortex_mojo). CI installs Mojo via pip for codspeed shards 2 and 6. Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

github-actions · 2026-06-06T02:16:53Z

This PR has been marked as stale because it has been open for 14 days with no activity. Please comment or remove the stale label if you wish to keep it active, otherwise it will be closed in 7 days

joseph-isaacs · 2026-06-06T02:17:48Z

Active — investigating CI shard 6 failure.

Generated by Claude Code

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

joseph-isaacs changed the title ~~Add Mojo AOT-compiled SIMD take kernels for primitive arrays~~ do not merge: Add Mojo AOT-compiled SIMD take kernels for primitive arrays Apr 10, 2026

0ax1 added the do not merge Pull requests that are not intended to merge label Apr 10, 2026

joseph-isaacs changed the title ~~do not merge: Add Mojo AOT-compiled SIMD take kernels for primitive arrays~~ Add Mojo AOT-compiled SIMD take/filter kernels for primitive arrays Apr 10, 2026

github-actions Bot added the stale This PR is stale and will be auto-closed soon label May 18, 2026

github-actions Bot removed the stale This PR is stale and will be auto-closed soon label May 20, 2026

robert3005 closed this May 22, 2026

robert3005 reopened this May 22, 2026

joseph-isaacs force-pushed the claude/plan-mojo-simd-kernels-IDywB branch 3 times, most recently from b91160e to 397e1ce Compare May 22, 2026 11:40

joseph-isaacs marked this pull request as draft May 22, 2026 11:42

github-actions Bot added the stale This PR is stale and will be auto-closed soon label Jun 6, 2026

Retry CI

e891b26

joseph-isaacs force-pushed the claude/plan-mojo-simd-kernels-IDywB branch from 397e1ce to e891b26 Compare June 6, 2026 02:21

github-actions Bot removed the stale This PR is stale and will be auto-closed soon label Jun 8, 2026

Retry CI: previous run cancelled by runner shutdown

f7e2b7d

Signed-off-by: Claude <noreply@anthropic.com> https://claude.ai/code/session_01EVcJZP4ZmfvWRRg2CsgvST

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Mojo AOT-compiled SIMD take/filter kernels for primitive arrays#7387

Add Mojo AOT-compiled SIMD take/filter kernels for primitive arrays#7387
joseph-isaacs wants to merge 3 commits into
developfrom
claude/plan-mojo-simd-kernels-IDywB

joseph-isaacs commented Apr 10, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Apr 10, 2026 •

edited

Loading

Uh oh!

a10y commented Apr 10, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

joseph-isaacs commented May 18, 2026

Uh oh!

0ax1 commented May 18, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

joseph-isaacs commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

Conversation

joseph-isaacs commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

CodSpeed CI Results

What's included

Key design decisions

⚠️ Known limitation

Test plan

Uh oh!

codspeed-hq Bot commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Performance Changes

Uh oh!

a10y commented Apr 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented May 18, 2026

Uh oh!

joseph-isaacs commented May 18, 2026

Uh oh!

0ax1 commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Jun 6, 2026

Uh oh!

joseph-isaacs commented Jun 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

joseph-isaacs commented Apr 10, 2026 •

edited

Loading

codspeed-hq Bot commented Apr 10, 2026 •

edited

Loading

a10y commented Apr 10, 2026 •

edited

Loading

0ax1 commented May 18, 2026 •

edited

Loading