feat(vortex-array): add Interleave array encoding by joseph-isaacs · Pull Request #8277 · vortex-data/vortex

joseph-isaacs · 2026-06-05T22:54:53Z

Adds an Interleave array: a lazy, random-access gather of N value arrays into one array, taking output row i from values[array_indices[i]][row_indices[i]].

Adds an `Interleave` array: a lazy, random-access gather of `N` value arrays into one array, taking output row `i` from `values[array_indices[i]][row_indices[i]]`. It is the random-access analog of `Merge` — instead of consuming each branch under a cursor, `row_indices` names an explicit position, so rows may be reordered, skipped, or repeated. The layout mirrors `Merge`: an array encoding with `N` value children plus two non-nullable selector children (`array_indices`, `row_indices`), a single `check` source of truth for invariants, value-type-dispatched execution with an optimized boolean kernel, oracle-backed tests, and an `ArrayBuiltins::interleave` constructor. As with the merge skeleton, only the boolean (two-value) selector form is wired into execution; integer selectors construct but panic on execute. Signed-off-by: Claude <noreply@anthropic.com>

…gned Removes the boolean two-value `array_indices` special case from the `Interleave` encoding. `array_indices` is now always a non-nullable unsigned integer indexing into `values`, unifying selector validation in `check` (which remains the single source of truth used by both `try_new` and `validate`). With the boolean selector gone, the boolean-value execute kernel now implements the (previously panicking) integer-selector path directly: it gathers `N` boolean values routed by unsigned `array_indices` / `row_indices`, so multi-value interleaves execute end to end. Tests are updated to build unsigned selectors and now cover a three-value random-access gather. Signed-off-by: Claude <noreply@anthropic.com>

The Interleave module docs linked to `Merge`, which does not exist on `develop` (it lives in a separate, not-yet-merged PR). Under `-D rustdoc::broken-intra-doc-links` this failed the docs build. Demote the references to plain code spans so the docs build standalone. Signed-off-by: Claude <noreply@anthropic.com>

codspeed-hq · 2026-06-05T22:59:48Z

Merging this PR will not alter performance

⚠️

Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 1513 untouched benchmarks
🆕 4 new benchmarks

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
🆕	Simulation	`interleave_bool[2]`	N/A	1.4 ms	N/A
🆕	Simulation	`interleave_bool[4]`	N/A	1.4 ms	N/A
🆕	Simulation	`interleave_bool_nullable[2]`	N/A	2.3 ms	N/A
🆕	Simulation	`interleave_bool_nullable[4]`	N/A	2.3 ms	N/A

_{Comparing claude/interleave-method-6BEge (cc24b84) with develop (48c3f41)}

The boolean interleave kernel decoded both selectors into two `Vec<usize>` of the output length before the scatter loop, allocating `2 * len * size_of::<usize>()` bytes per call purely to widen indices. Read the selectors straight from their packed primitive buffers instead: the scatter loop is monomorphized on the `(array_index, row_index)` integer widths via a generic `gather` helper, indexing each selector slice in place and writing output bits (and validity) directly into bit buffers. No intermediate index materialization. Signed-off-by: Claude <noreply@anthropic.com>

The boolean interleave kernel built its output with a per-row `if bit { set(i) }` plus an `unset(i)` for validity. Each `set`/`unset` bounds-checks and the conditional branches on the (random) bit value, so the scatter mispredicted on essentially every row. Validate the per-row `(array_index, row_index)` bounds once up front (still returning an error rather than panicking), then build both the values and validity buffers with `BitBufferMut::collect_bool`, which packs 64 results per word and writes each bit branchlessly. `gather`, `collect_bool`, and the closures inline into the encoding's `execute` (confirmed via `cargo asm` — no out-of-line per-row calls). Adds a `divan` benchmark (`benches/interleave.rs`) over 2/4 boolean branches, nullable and non-nullable, gathering 65,536 random rows. Measured on this machine: interleave_bool 2 628 µs -> 189 µs (~3.3x) interleave_bool 4 628 µs -> 189 µs (~3.3x) interleave_bool_nullable 2 747 µs -> 337 µs (~2.2x) interleave_bool_nullable 4 782 µs -> 345 µs (~2.3x) Signed-off-by: Claude <noreply@anthropic.com>

robert3005 · 2026-06-08T13:24:00Z

annoyingly arrow calls your Merge -> interleave and we already have merge that zips two structs. I guess you could consider the existing merge the special case of no selector child where all children are kept

joseph-isaacs · 2026-06-08T13:29:03Z

Can you link what you mean, I based this off arrow-rs naming

robert3005 · 2026-06-08T13:33:41Z

I thought arrow interleave just takes a single value per row but they take tuple per row. This is the same then

joseph-isaacs · 2026-06-09T13:05:06Z

@robert3005 are we good to merge this?

The new interleave benchmarks gathered 65,536 rows, which CodSpeed's Simulation instrument reported at up to ~2.3ms per case. Drop the workload to 8,192 rows so every case runs comfortably under 1ms while still exercising the multi-word gather path. Signed-off-by: Claude <noreply@anthropic.com>

robert3005

Will approve once we have working ci. I think this is the right approach.

claude added 2 commits June 5, 2026 22:30

joseph-isaacs added the changelog/feature A new feature label Jun 5, 2026 — with Claude

claude added 2 commits June 8, 2026 12:28

joseph-isaacs marked this pull request as ready for review June 8, 2026 13:29

joseph-isaacs requested a review from a team June 8, 2026 13:29

robert3005 reviewed Jun 9, 2026

View reviewed changes

Merge branch 'develop' into claude/interleave-method-6BEge

f6f550b

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(vortex-array): add Interleave array encoding#8277

feat(vortex-array): add Interleave array encoding#8277
joseph-isaacs wants to merge 7 commits into
developfrom
claude/interleave-method-6BEge

joseph-isaacs commented Jun 5, 2026 •

edited

Loading

Uh oh!

codspeed-hq Bot commented Jun 5, 2026 •

edited

Loading

Uh oh!

robert3005 commented Jun 8, 2026

Uh oh!

joseph-isaacs commented Jun 8, 2026

Uh oh!

robert3005 commented Jun 8, 2026

Uh oh!

joseph-isaacs commented Jun 9, 2026

Uh oh!

robert3005 left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

joseph-isaacs commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

codspeed-hq Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Merging this PR will not alter performance

Performance Changes

Uh oh!

robert3005 commented Jun 8, 2026

Uh oh!

joseph-isaacs commented Jun 8, 2026

Uh oh!

robert3005 commented Jun 8, 2026

Uh oh!

joseph-isaacs commented Jun 9, 2026

Uh oh!

robert3005 left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

joseph-isaacs commented Jun 5, 2026 •

edited

Loading

codspeed-hq Bot commented Jun 5, 2026 •

edited

Loading