Optimize validity checks by resolving to masks upfront#8306

Closed
joseph-isaacs wants to merge 4 commits into develop from claude/audit-per-row-accessors-ln5lho

@joseph-isaacs

Summary

This PR optimizes performance by resolving Validity objects to Mask objects once upfront, rather than calling Validity::is_valid() repeatedly in loops. This is particularly important for array-backed validity, where each is_valid() call executes a scalar operation and spins up an execution context.

The changes follow a consistent pattern across multiple files:

  1. Call validity.execute_mask() once to resolve to a concrete Mask
  2. Use mask.value(idx) for O(1) lookups instead of executing a scalar on every is_valid() call
  3. Handle the three cases of Mask::bit_buffer(): AllOr::All, AllOr::None, and AllOr::Some
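
The three steps above can be sketched as follows. This is a minimal, self-contained illustration, not the Vortex implementation: `Validity`, `Mask`, and `AllOr` here are hypothetical stand-in types that only borrow the names used in this PR, and the `probes` counter is an assumption added to make the cost of each approach visible.

```rust
use std::cell::Cell;

// Hypothetical stand-ins for illustration; these are NOT the real Vortex types.
enum Validity {
    NonNullable,
    AllInvalid,
    // `probes` counts how often the backing array is consulted, standing in
    // for the per-call scalar execution described above.
    Array { bits: Vec<bool>, probes: Cell<usize> },
}

// The three shapes a resolved mask can take.
enum AllOr<T> {
    All,
    None,
    Some(T),
}

struct Mask {
    len: usize,
    bits: AllOr<Vec<bool>>,
}

impl Validity {
    // Per-row probe: touches the backing array on every call.
    fn is_valid(&self, idx: usize) -> bool {
        match self {
            Validity::NonNullable => true,
            Validity::AllInvalid => false,
            Validity::Array { bits, probes } => {
                probes.set(probes.get() + 1);
                bits[idx]
            }
        }
    }

    // Resolve to a concrete mask with a single pass over the backing array.
    fn execute_mask(&self, len: usize) -> Mask {
        let bits = match self {
            Validity::NonNullable => AllOr::All,
            Validity::AllInvalid => AllOr::None,
            Validity::Array { bits, probes } => {
                probes.set(probes.get() + 1);
                AllOr::Some(bits.clone())
            }
        };
        Mask { len, bits }
    }
}

impl Mask {
    fn bit_buffer(&self) -> AllOr<&[bool]> {
        match &self.bits {
            AllOr::All => AllOr::All,
            AllOr::None => AllOr::None,
            AllOr::Some(b) => AllOr::Some(b.as_slice()),
        }
    }
}

// Before: one validity probe per row.
fn valid_indices_per_row(validity: &Validity, len: usize) -> Vec<usize> {
    (0..len).filter(|&i| validity.is_valid(i)).collect()
}

// After: resolve once, then branch on the all/none/some cases.
fn valid_indices_resolved(validity: &Validity, len: usize) -> Vec<usize> {
    let mask = validity.execute_mask(len);
    match mask.bit_buffer() {
        AllOr::All => (0..mask.len).collect(),
        AllOr::None => Vec::new(),
        AllOr::Some(bits) => (0..mask.len).filter(|&i| bits[i]).collect(),
    }
}

fn probe_count(validity: &Validity) -> usize {
    match validity {
        Validity::Array { probes, .. } => probes.get(),
        _ => 0,
    }
}

fn main() {
    let v = Validity::Array { bits: vec![true, false, true, true], probes: Cell::new(0) };
    let slow = valid_indices_per_row(&v, 4);
    let slow_probes = probe_count(&v);
    let fast = valid_indices_resolved(&v, 4);
    let fast_probes = probe_count(&v) - slow_probes;
    assert_eq!(slow, fast);
    println!("per-row probes: {slow_probes}, resolved-once probes: {fast_probes}");
}
```

In this sketch the `AllOr::All` and `AllOr::None` arms never inspect individual bits, which is the fast path the third step refers to.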

Key Changes

  • varbin filter: Changed from Validity to Mask parameter, with optimized handling of all/none/some cases
  • display module: Resolve validity once for table display instead of per-row checks
  • listview rebuild: Resolve validity once for both rebuild_with_take and rebuild_with_builder
  • vortex-tui layouts: Canonicalize struct fields once to avoid re-decoding per row
  • vortex-tensor l2_denorm: Resolve validity masks once for normalization loops
  • list/listview/fixed_size_list operations: Canonicalize element arrays once before per-element scalar execution
  • variant kernel: Canonicalize inputs once before per-row merging

All changes maintain the same logical behavior while improving performance by reducing redundant work in hot loops.
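
The "canonicalize once" changes listed above follow the same idea as the validity change. The sketch below shows it under stated assumptions: `RunEndArray`, `scalar_at`, and `canonicalize` are hypothetical stand-ins for an encoded array and its accessors, not the Vortex API, and `decode_steps` exists only to count work.

```rust
use std::cell::Cell;

// Hypothetical run-end encoded array; NOT a Vortex type.
struct RunEndArray {
    run_ends: Vec<usize>, // exclusive end index of each run
    values: Vec<i32>,     // one value per run
    decode_steps: Cell<usize>,
}

impl RunEndArray {
    fn len(&self) -> usize {
        self.run_ends.last().copied().unwrap_or(0)
    }

    // Per-row access: walks the runs again on every call.
    fn scalar_at(&self, idx: usize) -> i32 {
        for (run, &end) in self.run_ends.iter().enumerate() {
            self.decode_steps.set(self.decode_steps.get() + 1);
            if idx < end {
                return self.values[run];
            }
        }
        panic!("index out of bounds");
    }

    // Decode the whole array into a flat buffer in one pass.
    fn canonicalize(&self) -> Vec<i32> {
        let mut out = Vec::with_capacity(self.len());
        let mut start = 0;
        for (run, &end) in self.run_ends.iter().enumerate() {
            self.decode_steps.set(self.decode_steps.get() + 1);
            out.extend(std::iter::repeat(self.values[run]).take(end - start));
            start = end;
        }
        out
    }
}

// Before: re-decode the encoding for every row.
fn sum_per_row(a: &RunEndArray) -> i64 {
    (0..a.len()).map(|i| a.scalar_at(i) as i64).sum()
}

// After: canonicalize once, then iterate the flat buffer.
fn sum_canonical(a: &RunEndArray) -> i64 {
    a.canonicalize().iter().map(|&v| v as i64).sum()
}

fn main() {
    let a = RunEndArray {
        run_ends: vec![2, 5, 6],
        values: vec![10, 20, 30],
        decode_steps: Cell::new(0),
    };
    let slow = sum_per_row(&a);
    let slow_steps = a.decode_steps.get();
    let fast = sum_canonical(&a);
    let fast_steps = a.decode_steps.get() - slow_steps;
    assert_eq!(slow, fast);
    println!("per-row decode steps: {slow_steps}, canonicalized decode steps: {fast_steps}");
}
```

Both functions return the same sum; only the amount of decoding work differs, which matches the claim that behavior is preserved.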

Testing

Existing tests pass. The changes are refactorings that preserve behavior while optimizing performance characteristics. The pattern of resolving validity to masks upfront is already used elsewhere in the codebase and is well-tested through existing array operation tests.

https://claude.ai/code/session_016i3hghJiXxmtWWcVu55Pjn

claude added 4 commits June 8, 2026 21:48
`rebuild_with_take` and `rebuild_list_by_list` probed validity per row via
`self.validity()?.is_valid(index)?`. For array-backed validity this executes a
scalar (and creates an execution context) on every row, adding expensive
per-row work to what should be a cheap O(n) loop; even for non-nullable
validity it reconstructed the `Validity` enum on every iteration.

Resolve validity to a `Mask` once before the loop with `execute_mask`, then use
the O(1) `Mask::value(index)` lookup. Add nullable ListView rebuild benchmarks
(the existing ones only covered non-nullable validity).

Measured (divan, this machine):
- listview_rebuild i32_small_nullable: 231.7 us -> 33.55 us (~6.9x)
- listview_rebuild i32_large_nullable: 1.244 ms -> 1.014 ms (~1.23x)
- zstd listview_rebuild rebuild_naive (non-nullable, 1024 lists): 41.1 us -> 30.1 us (~1.37x)

Signed-off-by: Claude <noreply@anthropic.com>
Two more sites that probed validity per row via `Validity::is_valid` (which
executes a scalar per call for array-backed validity):

- VarBin filter index path (`filter_select_var_bin_by_index`): now resolves the
  validity mask once with `execute_mask` and matches on `AllOr` fast paths, the
  same way the slice path already does. Removes the lingering
  `TODO(ngates): pass LogicalValidity instead`.
- L2Denorm normalize and verify loops: resolve the validity mask once before the
  per-row loop instead of calling `is_valid(i)` each iteration.

Signed-off-by: Claude <noreply@anthropic.com>
…ild benches

Replace per-element/per-row `execute_scalar` decode loops with a single
canonicalization of the source before iterating:

- List/ListView/FixedSizeList `scalar_at`: canonicalize the element slice once so
  building the list scalar does not re-decode the element encoding per element.
- Variant `merge_typed_as_variant`: canonicalize the `typed`/`fallback` inputs
  once before the row-wise fallback loop.

Also remove the nullable ListView rebuild benchmarks that were added earlier.

Signed-off-by: Claude <noreply@anthropic.com>
- Table display: reuse a single execution context across the struct table loops
  and resolve validity to a mask once, instead of creating a context and probing
  validity per row.
- TUI browse stats table: canonicalize each field array once before the per-row
  `execute_scalar` loop so field encodings are not re-decoded per row.

Signed-off-by: Claude <noreply@anthropic.com>
@codspeed-hq

codspeed-hq Bot commented Jun 9, 2026


Merging this PR will improve performance by 20.44%

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 6 improved benchmarks
❌ 3 regressed benchmarks
✅ 1504 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| Simulation | varbinview_zip_block_mask | 2.9 ms | 3.7 ms | -21.57% |
| Simulation | encode_varbin[(1000, 2)] | 143.1 µs | 164.1 µs | -12.77% |
| Simulation | varbinview_zip_fragmented_mask | 6.1 ms | 6.9 ms | -11.3% |
| Simulation | extend_from_array_non_zctl_overlapping[(10000, 8)] | 4.7 ms | 2.3 ms | ×2 |
| Simulation | extend_from_array_non_zctl_overlapping[(1000, 8)] | 527.9 µs | 287.6 µs | +83.56% |
| Simulation | rebuild_naive | 142.1 µs | 104.9 µs | +35.46% |
| Simulation | extend_from_array_non_zctl_overlapping[(1000, 32)] | 1,023.8 µs | 782.9 µs | +30.76% |
| Simulation | extend_from_array_zctl[(10000, 8)] | 2.5 ms | 2.1 ms | +17.04% |
| Simulation | extend_from_array_zctl[(1000, 8)] | 291 µs | 254.7 µs | +14.25% |
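
For readers checking the figures: the Efficiency values above appear consistent with `BASE / HEAD - 1`, expressed as a percentage. That formula is inferred from the reported numbers, not taken from CodSpeed documentation, and rows whose times are shown rounded to one decimal in ms will not reproduce exactly. A small sketch, with `efficiency_pct` as a hypothetical helper name:

```rust
// Assumed formula, inferred from the rows above: efficiency = base / head - 1.
fn efficiency_pct(base: f64, head: f64) -> f64 {
    (base / head - 1.0) * 100.0
}

fn main() {
    // (benchmark, BASE in µs, HEAD in µs), copied from the rows above.
    let rows = [
        ("rebuild_naive", 142.1, 104.9),
        ("extend_from_array_non_zctl_overlapping[(1000, 8)]", 527.9, 287.6),
        ("encode_varbin[(1000, 2)]", 143.1, 164.1),
    ];
    for (name, base, head) in rows {
        println!("{name}: {:+.2}%", efficiency_pct(base, head));
    }
}
```

Under this reading, a positive value means HEAD is faster than BASE and a negative value marks a regression.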

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing claude/audit-per-row-accessors-ln5lho (2c2f634) with develop (1b19ac9)
