Optimize buffer ops #8322
Merging this PR will improve performance by 68.31%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | slice_empty_vortex | 2,599.4 ns | 368.3 ns | ×7.1 |
| ⚡ | Simulation | append_buffer_vortex_buffer[65536] | 95.4 µs | 27 µs | ×3.5 |
| ⚡ | Simulation | append_buffer_vortex_buffer[16384] | 32 µs | 12.9 µs | ×2.5 |
| ⚡ | Simulation | append_buffer_vortex_buffer[128] | 11.6 µs | 5.4 µs | ×2.2 |
| ⚡ | Simulation | append_buffer_vortex_buffer[1024] | 13.6 µs | 8.5 µs | +61.24% |
| ⚡ | Simulation | slice_vortex_buffer[1024] | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | slice_vortex_buffer[16384] | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | slice_vortex_buffer[2048] | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | slice_vortex_buffer[128] | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | slice_vortex_buffer[65536] | 1,276.7 ns | 813.1 ns | +57.02% |
| ⚡ | Simulation | append_buffer_vortex_buffer[2048] | 11.4 µs | 7.9 µs | +45.37% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 213.2 µs | 176.3 µs | +20.95% |
| ⚡ | Simulation | search_index_below_min_chunked | 1.5 ms | 1.3 ms | +15.71% |
| ⚡ | Simulation | search_index_mixed_out_of_range_chunked | 1.5 ms | 1.3 ms | +15.31% |
| ⚡ | Simulation | compare[6] | 79.4 µs | 69.7 µs | +13.88% |
| ⚡ | Simulation | search_index_full_range_random_chunked | 1.6 ms | 1.4 ms | +13.6% |
| ⚡ | Simulation | compare[5] | 75.6 µs | 67.8 µs | +11.47% |
Comparing adamg/buffer-slice-fast (b8ad2b4) with develop (729e17c)
## Summary
Adds a basic benchmark for slicing, including an Arrow baseline. Hopefully building up to #8322, but I want a baseline first.
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Force-pushed from fd451bf to 341039a
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.090x ➖
datafusion / vortex-file-compressed (1.090x ➖, 0↑ 4↓)
No file size changes detected. |
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.112x ❌, 0↑ 6↓)
datafusion / vortex-compact (1.080x ➖, 0↑ 2↓)
datafusion / parquet (1.121x ❌, 0↑ 5↓)
duckdb / vortex-file-compressed (1.141x ❌, 0↑ 8↓)
duckdb / vortex-compact (1.083x ➖, 0↑ 4↓)
duckdb / parquet (1.102x ❌, 0↑ 3↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
|
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.062x ➖, 0↑ 2↓)
datafusion / vortex-compact (0.932x ➖, 10↑ 0↓)
datafusion / parquet (1.041x ➖, 1↑ 4↓)
datafusion / arrow (0.960x ➖, 4↑ 3↓)
duckdb / vortex-file-compressed (1.041x ➖, 0↑ 6↓)
duckdb / vortex-compact (1.000x ➖, 1↑ 0↓)
duckdb / parquet (0.990x ➖, 0↑ 0↓)
duckdb / duckdb (1.006x ➖, 0↑ 0↓)
File Size Changes (9 files changed, +0.2% overall, 9↑ 0↓)
|
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.991x ➖, 6↑ 1↓)
datafusion / vortex-compact (1.003x ➖, 0↑ 2↓)
datafusion / parquet (0.996x ➖, 3↑ 0↓)
duckdb / vortex-file-compressed (0.994x ➖, 2↑ 1↓)
duckdb / vortex-compact (0.994x ➖, 0↑ 1↓)
duckdb / parquet (0.999x ➖, 1↑ 1↓)
duckdb / duckdb (0.997x ➖, 0↑ 3↓)
File Size Changes (7 files changed, +0.0% overall, 7↑ 0↓)
|
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.022x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.020x ➖, 0↑ 0↓)
duckdb / parquet (1.025x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
|
Benchmarks: FineWeb S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.971x ➖, 1↑ 1↓)
datafusion / vortex-compact (1.014x ➖, 2↑ 1↓)
datafusion / parquet (0.920x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (0.853x ➖, 2↑ 0↓)
duckdb / vortex-compact (0.968x ➖, 0↑ 1↓)
duckdb / parquet (0.972x ➖, 0↑ 0↓)
|
BENCHMARK FAILEDBenchmark |
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.993x ➖, 0↑ 0↓)
datafusion / vortex-compact (1.006x ➖, 0↑ 0↓)
datafusion / parquet (0.997x ➖, 0↑ 0↓)
datafusion / arrow (0.926x ➖, 7↑ 0↓)
duckdb / vortex-file-compressed (1.002x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.003x ➖, 0↑ 0↓)
duckdb / parquet (1.000x ➖, 0↑ 0↓)
duckdb / duckdb (0.998x ➖, 0↑ 0↓)
File Size Changes (26 files changed, -0.0% overall, 8↑ 18↓)
|
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (0.969x ➖, 13↑ 2↓)
datafusion / parquet (0.913x ➖, 13↑ 0↓)
duckdb / vortex-file-compressed (1.011x ➖, 4↑ 0↓)
duckdb / parquet (1.002x ➖, 0↑ 0↓)
duckdb / duckdb (1.013x ➖, 0↑ 1↓)
File Size Changes (107 files changed, -0.0% overall, 56↑ 51↓)
|
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.873x ➖, 3↑ 0↓)
datafusion / vortex-compact (0.867x ➖, 5↑ 0↓)
datafusion / parquet (1.267x ➖, 1↑ 13↓)
duckdb / vortex-file-compressed (0.875x ➖, 1↑ 0↓)
duckdb / vortex-compact (0.933x ➖, 0↑ 0↓)
duckdb / parquet (0.917x ➖, 0↑ 0↓)
|
Benchmarks: Appian on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.008x ➖, 0↑ 0↓)
datafusion / parquet (1.007x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.031x ➖, 0↑ 0↓)
duckdb / parquet (1.006x ➖, 0↑ 0↓)
duckdb / duckdb (1.007x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 1↑ 3↓)
|
Benchmarks: Compression
Vortex (geomean): 1.016x ➖
unknown / unknown (1.045x ➖, 2↑ 29↓)
|
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (0.870x ➖, 4↑ 1↓)
datafusion / vortex-compact (0.867x ➖, 3↑ 0↓)
datafusion / parquet (1.044x ➖, 0↑ 4↓)
duckdb / vortex-file-compressed (0.929x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.065x ➖, 0↑ 2↓)
duckdb / parquet (0.921x ➖, 0↑ 0↓)
|
|
I made #8162 that optimizes some of these code paths |
Force-pushed from 06ac2f8 to dde48e4
|
@robert3005 do you want to merge that first? |
|
that would be ideal, the pr I made is a revival of an older pr already |
|
I'll review it |
Force-pushed from dde48e4 to ae9bc12
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Force-pushed from d4f17f3 to 64fb797
        }
    } else {
        // Use bitvec for unaligned bit copying.
        let self_slice = self
We have benchmarked this and bitvec is faster for anything that's bigger than 128 bits. I think we want to keep it
Force-pushed from 26984f8 to bf88b71
Signed-off-by: Adam Gutglick <adam@spiraldb.com>
Force-pushed from bf88b71 to b8ad2b4
Summary
This PR includes a few optimizations for buffer-level ops:
- `BitBufferMut::append_buffer` uses arrow's word-sized append for unaligned bit buffers instead of bitvec, which copies 1 bit at a time.
- Alignment, instead of having less specific checks in different callsites.

After this PR is merged, I'll follow up and remove `bitvec` as a dependency. It's currently used in a couple of pretty random places, and I suspect there's nothing special about them compared to our own `BitBuffer`.
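For readers unfamiliar with the difference between the two append strategies, here is a minimal sketch. It is not the Vortex or arrow implementation: `Bits`, `append_bitwise`, and `append_wordwise` are hypothetical names made up for this illustration, and the sketch assumes an LSB-first bit layout with unused trailing bits kept at zero. It only shows the general idea of shifting a whole 64-bit source word into an unaligned destination offset rather than pushing one bit per iteration.

```rust
/// Hypothetical LSB-first bit vector, for illustration only
/// (not the Vortex `BitBufferMut` API).
/// Invariant: `bytes.len() == ceil(len / 8)` and unused trailing bits are zero.
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct Bits {
    bytes: Vec<u8>,
    len: usize, // number of valid bits
}

impl Bits {
    pub fn from_bools(values: &[bool]) -> Self {
        let mut out = Bits::default();
        for &v in values {
            out.push(v);
        }
        out
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn get(&self, i: usize) -> bool {
        assert!(i < self.len);
        (self.bytes[i / 8] >> (i % 8)) & 1 == 1
    }

    pub fn push(&mut self, v: bool) {
        if self.len % 8 == 0 {
            self.bytes.push(0);
        }
        if v {
            self.bytes[self.len / 8] |= 1 << (self.len % 8);
        }
        self.len += 1;
    }

    /// Baseline: copies one bit per loop iteration.
    pub fn append_bitwise(&mut self, other: &Bits) {
        for i in 0..other.len {
            self.push(other.get(i));
        }
    }

    /// Word-sized variant: reads 64 source bits at a time, shifts them to the
    /// destination's bit offset, and ORs them in. Any leftover source bytes
    /// are handled one byte at a time with the same shift.
    pub fn append_wordwise(&mut self, other: &Bits) {
        let start = self.len;
        let new_len = start + other.len;
        let shift = (start % 8) as u32;
        let mut dst = start / 8;

        // Over-allocate so every 8-byte store and its carry byte stay in
        // bounds; the zeroed slack is truncated again at the end.
        self.bytes.resize((new_len + 7) / 8 + 9, 0);

        let mut chunks = other.bytes.chunks_exact(8);
        for chunk in &mut chunks {
            let mut src = [0u8; 8];
            src.copy_from_slice(chunk);
            let word = u64::from_le_bytes(src);

            let mut cur = [0u8; 8];
            cur.copy_from_slice(&self.bytes[dst..dst + 8]);
            let merged = u64::from_le_bytes(cur) | (word << shift);
            self.bytes[dst..dst + 8].copy_from_slice(&merged.to_le_bytes());

            // Bits shifted out of the top of the word spill into the next byte.
            if shift != 0 {
                self.bytes[dst + 8] |= (word >> (64 - shift)) as u8;
            }
            dst += 8;
        }
        for &byte in chunks.remainder() {
            let wide = (byte as u16) << shift;
            self.bytes[dst] |= wide as u8;
            self.bytes[dst + 1] |= (wide >> 8) as u8;
            dst += 1;
        }

        self.bytes.truncate((new_len + 7) / 8);
        self.len = new_len;
    }
}
```

Both methods should produce identical buffers. The sketch makes no performance claim on its own; whether a word-sized path beats bitvec for a given size is exactly what the benchmarks in this PR are meant to measure.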