feat(vortex-array): add Interleave array encoding#8277
Conversation
Adds an `Interleave` array: a lazy, random-access gather of `N` value arrays into one array, taking output row `i` from `values[array_indices[i]][row_indices[i]]`. It is the random-access analog of `Merge` — instead of consuming each branch under a cursor, `row_indices` names an explicit position, so rows may be reordered, skipped, or repeated. The layout mirrors `Merge`: an array encoding with `N` value children plus two non-nullable selector children (`array_indices`, `row_indices`), a single `check` source of truth for invariants, value-type-dispatched execution with an optimized boolean kernel, oracle-backed tests, and an `ArrayBuiltins::interleave` constructor. As with the merge skeleton, only the boolean (two-value) selector form is wired into execution; integer selectors construct but panic on execute. Signed-off-by: Claude <noreply@anthropic.com>
…gned Removes the boolean two-value `array_indices` special case from the `Interleave` encoding. `array_indices` is now always a non-nullable unsigned integer indexing into `values`, unifying selector validation in `check` (which remains the single source of truth used by both `try_new` and `validate`). With the boolean selector gone, the boolean-value execute kernel now implements the (previously panicking) integer-selector path directly: it gathers `N` boolean values routed by unsigned `array_indices` / `row_indices`, so multi-value interleaves execute end to end. Tests are updated to build unsigned selectors and now cover a three-value random-access gather. Signed-off-by: Claude <noreply@anthropic.com>
The Interleave module docs linked to `Merge`, which does not exist on `develop` (it lives in a separate, not-yet-merged PR). Under `-D rustdoc::broken-intra-doc-links` this failed the docs build. Demote the references to plain code spans so the docs build standalone. Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| 🆕 | Simulation | interleave_bool[2] |
N/A | 1.4 ms | N/A |
| 🆕 | Simulation | interleave_bool[4] |
N/A | 1.4 ms | N/A |
| 🆕 | Simulation | interleave_bool_nullable[2] |
N/A | 2.3 ms | N/A |
| 🆕 | Simulation | interleave_bool_nullable[4] |
N/A | 2.3 ms | N/A |
Comparing claude/interleave-method-6BEge (cc24b84) with develop (48c3f41)
The boolean interleave kernel decoded both selectors into two `Vec<usize>` of the output length before the scatter loop, allocating `2 * len * size_of::<usize>()` bytes per call purely to widen indices. Read the selectors straight from their packed primitive buffers instead: the scatter loop is monomorphized on the `(array_index, row_index)` integer widths via a generic `gather` helper, indexing each selector slice in place and writing output bits (and validity) directly into bit buffers. No intermediate index materialization. Signed-off-by: Claude <noreply@anthropic.com>
The boolean interleave kernel built its output with a per-row
`if bit { set(i) }` plus an `unset(i)` for validity. Each `set`/`unset`
bounds-checks and the conditional branches on the (random) bit value, so
the scatter mispredicted on essentially every row.
Validate the per-row `(array_index, row_index)` bounds once up front
(still returning an error rather than panicking), then build both the
values and validity buffers with `BitBufferMut::collect_bool`, which packs
64 results per word and writes each bit branchlessly. `gather`,
`collect_bool`, and the closures inline into the encoding's `execute`
(confirmed via `cargo asm` — no out-of-line per-row calls).
Adds a `divan` benchmark (`benches/interleave.rs`) over 2/4 boolean
branches, nullable and non-nullable, gathering 65,536 random rows.
Measured on this machine:
interleave_bool 2 628 µs -> 189 µs (~3.3x)
interleave_bool 4 628 µs -> 189 µs (~3.3x)
interleave_bool_nullable 2 747 µs -> 337 µs (~2.2x)
interleave_bool_nullable 4 782 µs -> 345 µs (~2.3x)
Signed-off-by: Claude <noreply@anthropic.com>
|
annoyingly arrow calls your |
|
Can you link what you mean, I based this off arrow-rs naming |
|
I thought arrow interleave just takes a single value per row but they take tuple per row. This is the same then |
|
@robert3005 are we good to merge this? |
The new interleave benchmarks gathered 65,536 rows, which CodSpeed's Simulation instrument reported at up to ~2.3ms per case. Drop the workload to 8,192 rows so every case runs comfortably under 1ms while still exercising the multi-word gather path. Signed-off-by: Claude <noreply@anthropic.com>
robert3005
left a comment
There was a problem hiding this comment.
Will approve once we have working ci. I think this is the right approach.
Adds an
Interleavearray: a lazy, random-access gather ofNvalue arrays into one array, taking output rowifromvalues[array_indices[i]][row_indices[i]].