deser array tree from a vortex array #8063
Merging this PR will not alter performance
Signed-off-by: Onur Satici <onur@spiraldb.com>
/// `fba::ArrayStats`. Binary columns hold `ScalarValue::to_proto_bytes` blobs (decoded
/// using the array's dtype at read time); the `*_exact` bools tag whether `min`/`max`
/// were exact. `sum` is exact-only by construction so there is no `sum_exact` column.
pub static STATS_COLUMNS_DTYPE: LazyLock<DType> = LazyLock::new(|| {
This is soon going to be an extensible scalar (that is from the agg partial).
joseph-isaacs
left a comment
My main question is about the hard-coded stat serde.
Benchmarks: FineWeb NVMe
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.018x ➖, 0↑ 1↓)
datafusion / vortex-compact (0.992x ➖, 0↑ 0↓)
datafusion / parquet (0.993x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.033x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.014x ➖, 0↑ 0↓)
duckdb / parquet (1.005x ➖, 0↑ 0↓)
File Size Changes (1 files changed, +0.0% overall, 1↑ 0↓)
Benchmarks: TPC-H SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.058x ➖, 0↑ 5↓)
datafusion / vortex-compact (1.039x ➖, 0↑ 1↓)
datafusion / parquet (0.983x ➖, 2↑ 0↓)
datafusion / arrow (1.005x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.014x ➖, 0↑ 0↓)
duckdb / vortex-compact (1.014x ➖, 0↑ 0↓)
duckdb / parquet (1.009x ➖, 1↑ 3↓)
duckdb / duckdb (1.000x ➖, 0↑ 0↓)
File Size Changes (9 files changed, -0.1% overall, 4↑ 5↓)
Benchmarks: PolarSignals Profiling
Vortex (geomean): 1.217x ❌
datafusion / vortex-file-compressed (1.217x ❌, 0↑ 8↓)
No file size changes detected.
Benchmarks: TPC-DS SF=1 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.018x ➖, 0↑ 10↓)
datafusion / vortex-compact (1.000x ➖, 1↑ 0↓)
datafusion / parquet (0.993x ➖, 2↑ 0↓)
duckdb / vortex-file-compressed (1.016x ➖, 1↑ 3↓)
duckdb / vortex-compact (1.004x ➖, 1↑ 2↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
duckdb / duckdb (1.000x ➖, 3↑ 1↓)
File Size Changes (6 files changed, -0.0% overall, 1↑ 5↓)
Benchmarks: FineWeb S3
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.368x ❌, 0↑ 3↓)
datafusion / vortex-compact (1.113x ➖, 0↑ 2↓)
datafusion / parquet (1.316x ❌, 0↑ 5↓)
duckdb / vortex-file-compressed (1.115x ➖, 0↑ 2↓)
duckdb / vortex-compact (1.140x ➖, 0↑ 2↓)
duckdb / parquet (1.143x ➖, 0↑ 0↓)
Benchmarks: Statistical and Population Genetics
Verdict: No clear signal (low confidence)
duckdb / vortex-file-compressed (1.037x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.014x ➖, 0↑ 0↓)
duckdb / parquet (1.026x ➖, 0↑ 0↓)
File Size Changes (1 files changed, -0.0% overall, 0↑ 1↓)
Benchmarks: Random Access
Vortex (geomean): 0.907x ➖
unknown / unknown (0.980x ➖, 13↑ 2↓)
Benchmarks: TPC-H SF=10 on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.027x ➖, 0↑ 1↓)
datafusion / vortex-compact (1.010x ➖, 0↑ 0↓)
datafusion / parquet (0.996x ➖, 0↑ 0↓)
datafusion / arrow (1.000x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.005x ➖, 0↑ 0↓)
duckdb / vortex-compact (0.996x ➖, 0↑ 0↓)
duckdb / parquet (0.991x ➖, 1↑ 0↓)
duckdb / duckdb (0.995x ➖, 0↑ 0↓)
File Size Changes (26 files changed, +0.0% overall, 13↑ 13↓)
Benchmarks: Clickbench on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.028x ➖, 0↑ 4↓)
datafusion / parquet (1.003x ➖, 0↑ 1↓)
duckdb / vortex-file-compressed (1.028x ➖, 2↑ 6↓)
duckdb / parquet (0.999x ➖, 0↑ 0↓)
duckdb / duckdb (1.009x ➖, 0↑ 0↓)
File Size Changes (104 files changed, -0.0% overall, 50↑ 54↓)
Benchmarks: Appian on NVME
Verdict: No clear signal (low confidence)
datafusion / vortex-file-compressed (1.014x ➖, 0↑ 0↓)
datafusion / parquet (1.010x ➖, 0↑ 0↓)
duckdb / vortex-file-compressed (1.016x ➖, 0↑ 0↓)
duckdb / parquet (1.006x ➖, 0↑ 0↓)
duckdb / duckdb (0.998x ➖, 0↑ 0↓)
File Size Changes (4 files changed, -0.0% overall, 1↑ 3↓)
Benchmarks: TPC-H SF=1 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.237x ➖, 0↑ 8↓)
datafusion / vortex-compact (1.365x ❌, 0↑ 12↓)
datafusion / parquet (1.057x ➖, 1↑ 3↓)
duckdb / vortex-file-compressed (1.166x ➖, 0↑ 5↓)
duckdb / vortex-compact (1.176x ➖, 0↑ 5↓)
duckdb / parquet (1.179x ➖, 0↑ 1↓)
Benchmarks: Compression
Vortex (geomean): 1.007x ➖
unknown / unknown (1.012x ➖, 0↑ 7↓)
Benchmarks: TPC-H SF=10 on S3
Verdict: No clear signal (environment too noisy confidence)
datafusion / vortex-file-compressed (1.203x ➖, 0↑ 4↓)
datafusion / vortex-compact (1.231x ➖, 0↑ 6↓)
datafusion / parquet (1.160x ➖, 0↑ 5↓)
duckdb / vortex-file-compressed (1.098x ➖, 0↑ 1↓)
duckdb / vortex-compact (1.133x ➖, 0↑ 2↓)
duckdb / parquet (1.139x ➖, 0↑ 0↓)
Summary
Add support for serialising an `ArrayNode` tree into a vortex array, with array stats. Previously we always serialised these trees into flatbuffers that were appended to the tail of the serialised array buffers. This PR adds the scaffolding to produce a vortex array that can later be used at read time to deserialise the buffers into `ArrayRef`s.

The main user of this will be the array tree layout, which will land in a follow-up PR.
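
To make the idea in the summary concrete, here is a minimal sketch of flattening a tree into columns (one row per node, in pre-order) and rebuilding it at read time. It is not the code in this PR: `Node`, `TreeColumns`, `flatten` and `rebuild` are hypothetical names, plain `Vec`s stand in for vortex arrays, and only a `min` stat with its exactness flag is modelled.

```rust
// Hypothetical stand-ins for illustration only; none of this is Vortex's real API.

/// A node in an encoding tree: an encoding name, an optional stat, and children.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    encoding: String,
    /// Serialised min value plus a flag saying whether it is exact.
    min: Option<(Vec<u8>, bool)>,
    children: Vec<Node>,
}

/// Columnar (struct-of-arrays) form of the tree: one row per node, in pre-order.
#[derive(Debug, Default, PartialEq)]
struct TreeColumns {
    encoding: Vec<String>,
    child_count: Vec<u32>,
    min: Vec<Option<Vec<u8>>>,
    min_exact: Vec<bool>,
}

/// Append `node` and all of its descendants to `cols` in pre-order.
fn flatten(node: &Node, cols: &mut TreeColumns) {
    cols.encoding.push(node.encoding.clone());
    cols.child_count.push(node.children.len() as u32);
    match &node.min {
        Some((bytes, exact)) => {
            cols.min.push(Some(bytes.clone()));
            cols.min_exact.push(*exact);
        }
        None => {
            // No stat recorded: the exactness flag is a placeholder.
            cols.min.push(None);
            cols.min_exact.push(false);
        }
    }
    for child in &node.children {
        flatten(child, cols);
    }
}

/// Rebuild the subtree rooted at row `*row`, advancing `*row` past that subtree.
fn rebuild(cols: &TreeColumns, row: &mut usize) -> Node {
    let i = *row;
    *row += 1;
    let min = cols.min[i].clone().map(|bytes| (bytes, cols.min_exact[i]));
    // Children follow their parent directly in pre-order, so recurse `child_count` times.
    let children: Vec<Node> = (0..cols.child_count[i])
        .map(|_| rebuild(cols, row))
        .collect();
    Node { encoding: cols.encoding[i].clone(), min, children }
}
```

Storing a child count per row is one of several ways to encode the tree shape; parent indices or explicit offsets would work as well, and the sketch makes no claim about which one the PR uses.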