Optimize validity checks by resolving to masks upfront#8306
Closed
joseph-isaacs wants to merge 4 commits into
Closed
Optimize validity checks by resolving to masks upfront#8306joseph-isaacs wants to merge 4 commits into
joseph-isaacs wants to merge 4 commits into
Conversation
`rebuild_with_take` and `rebuild_list_by_list` probed validity per row via `self.validity()?.is_valid(index)?`. For array-backed validity this executes a scalar (and creates an execution context) on every row, turning an O(n) loop into O(n) per-row work; even for non-nullable validity it reconstructed the `Validity` enum on every iteration. Resolve validity to a `Mask` once before the loop with `execute_mask`, then use the O(1) `Mask::value(index)` lookup. Add nullable ListView rebuild benchmarks (the existing ones only covered non-nullable validity). Measured (divan, this machine): - listview_rebuild i32_small_nullable: 231.7 us -> 33.55 us (~6.9x) - listview_rebuild i32_large_nullable: 1.244 ms -> 1.014 ms (~1.23x) - zstd listview_rebuild rebuild_naive (non-nullable, 1024 lists): 41.1 us -> 30.1 us (~1.37x) Signed-off-by: Claude <noreply@anthropic.com>
Two more sites that probed validity per row via `Validity::is_valid` (which executes a scalar per call for array-backed validity): - VarBin filter index path (`filter_select_var_bin_by_index`): now resolves the validity mask once with `execute_mask` and matches on `AllOr` fast paths, the same way the slice path already does. Removes the lingering `TODO(ngates): pass LogicalValidity instead`. - L2Denorm normalize and verify loops: resolve the validity mask once before the per-row loop instead of calling `is_valid(i)` each iteration. Signed-off-by: Claude <noreply@anthropic.com>
…ild benches Replace per-element/per-row `execute_scalar` decode loops with a single canonicalization of the source before iterating: - List/ListView/FixedSizeList `scalar_at`: canonicalize the element slice once so building the list scalar does not re-decode the element encoding per element. - Variant `merge_typed_as_variant`: canonicalize the `typed`/`fallback` inputs once before the row-wise fallback loop. Also remove the nullable ListView rebuild benchmarks that were added earlier. Signed-off-by: Claude <noreply@anthropic.com>
- Table display: reuse a single execution context across the struct table loops and resolve validity to a mask once, instead of creating a context and probing validity per row. - TUI browse stats table: canonicalize each field array once before the per-row `execute_scalar` loop so field encodings are not re-decoded per row. Signed-off-by: Claude <noreply@anthropic.com>
Merging this PR will improve performance by 20.44%
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | varbinview_zip_block_mask |
2.9 ms | 3.7 ms | -21.57% |
| ❌ | Simulation | encode_varbin[(1000, 2)] |
143.1 µs | 164.1 µs | -12.77% |
| ❌ | Simulation | varbinview_zip_fragmented_mask |
6.1 ms | 6.9 ms | -11.3% |
| ⚡ | Simulation | extend_from_array_non_zctl_overlapping[(10000, 8)] |
4.7 ms | 2.3 ms | ×2 |
| ⚡ | Simulation | extend_from_array_non_zctl_overlapping[(1000, 8)] |
527.9 µs | 287.6 µs | +83.56% |
| ⚡ | Simulation | rebuild_naive |
142.1 µs | 104.9 µs | +35.46% |
| ⚡ | Simulation | extend_from_array_non_zctl_overlapping[(1000, 32)] |
1,023.8 µs | 782.9 µs | +30.76% |
| ⚡ | Simulation | extend_from_array_zctl[(10000, 8)] |
2.5 ms | 2.1 ms | +17.04% |
| ⚡ | Simulation | extend_from_array_zctl[(1000, 8)] |
291 µs | 254.7 µs | +14.25% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing claude/audit-per-row-accessors-ln5lho (2c2f634) with develop (1b19ac9)
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
This PR optimizes performance by resolving
Validityobjects toMaskobjects once upfront, rather than callingValidity::is_valid()repeatedly in loops. This is particularly important for array-backed validity, where eachis_valid()call executes a scalar operation and spins up an execution context.The changes follow a consistent pattern across multiple files:
validity.execute_mask()once to resolve to a concreteMaskmask.value(idx)for O(1) lookups instead of O(n) scalar executions per callMask::bit_buffer():AllOr::All,AllOr::None, andAllOr::SomeKey Changes
ValiditytoMaskparameter, with optimized handling of all/none/some casesrebuild_with_takeandrebuild_with_builderAll changes maintain the same logical behavior while improving performance by reducing redundant work in hot loops.
Testing
Existing tests pass. The changes are refactorings that preserve behavior while optimizing performance characteristics. The pattern of resolving validity to masks upfront is already used elsewhere in the codebase and is well-tested through existing array operation tests.
https://claude.ai/code/session_016i3hghJiXxmtWWcVu55Pjn