Summary
Add benchmark variants that run string-aggregate workloads with TopK disabled (representative case) in addition to the existing TopK-enabled worst-case benchmarks, so we can directly compare performance and verify correctness across Utf8 and Utf8View group keys.
Background
- The current `datafusion/core/benches/topk_aggregate.rs` benchmarks exercise the TopK-enabled code path for string aggregates.
- @haohuaijin suggested adding non-TopK benchmarks so we can compare TopK and non-TopK behavior and performance, and validate the TopK fix in PR #19285 (Fix TopK aggregation for UTF-8/Utf8View group keys and add safe fallback for unsupported string aggregates).
Why this matters
- Allows straightforward measurement of TopK's benefit (or regression) relative to the fallback path.
- Helps validate correctness (e.g., Utf8View grouping) under both code paths.
- Provides repeatable benchmarks to include in PRs and discussions.
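
To illustrate the kind of cross-check the paired benchmarks would enable, here is a minimal standalone sketch in plain Rust. It is not DataFusion's implementation and uses no DataFusion APIs. The function names `full_then_limit` and `bounded_top_k`, the `MAX`-per-group aggregate, and the tie-breaking rule are all assumptions made for illustration. It computes the top `k` string-keyed groups two ways (aggregate everything then sort and truncate, versus keeping only `k` candidate groups while scanning) so the two results can be compared.

```rust
use std::cmp::Ordering;
use std::collections::HashMap;

/// Hypothetical ranking: aggregate value descending, then key ascending,
/// so that ties resolve deterministically in both paths.
fn rank(a: &(String, i64), b: &(String, i64)) -> Ordering {
    b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0))
}

/// Reference ("TopK disabled") path: compute MAX(val) for every group,
/// then sort all groups and keep the first `k`.
fn full_then_limit(rows: &[(&str, i64)], k: usize) -> Vec<(String, i64)> {
    let mut groups: HashMap<&str, i64> = HashMap::new();
    for &(key, val) in rows {
        let slot = groups.entry(key).or_insert(val);
        *slot = (*slot).max(val);
    }
    let mut out: Vec<(String, i64)> = groups
        .into_iter()
        .map(|(key, v)| (key.to_string(), v))
        .collect();
    out.sort_by(rank);
    out.truncate(k);
    out
}

/// Bounded ("TopK-style") path: never hold more than `k` groups.
/// A linear scan finds the worst kept group, which is fine for a sketch;
/// a real implementation would likely use a heap.
fn bounded_top_k(rows: &[(&str, i64)], k: usize) -> Vec<(String, i64)> {
    let mut kept: Vec<(String, i64)> = Vec::with_capacity(k);
    if k == 0 {
        return kept;
    }
    for &(key, val) in rows {
        // Group already tracked: update its running maximum.
        if let Some(entry) = kept.iter_mut().find(|e| e.0 == key) {
            entry.1 = entry.1.max(val);
            continue;
        }
        let candidate = (key.to_string(), val);
        if kept.len() < k {
            kept.push(candidate);
            continue;
        }
        // Replace the worst-ranked kept group only if the candidate ranks ahead of it.
        let worst = (0..kept.len())
            .max_by(|&i, &j| rank(&kept[i], &kept[j]))
            .unwrap();
        if rank(&candidate, &kept[worst]) == Ordering::Less {
            kept[worst] = candidate;
        }
    }
    kept.sort_by(rank);
    kept
}
```

A benchmark or test harness could call both functions on the same input and assert that the results are equal before timing either one, which mirrors the correctness check described above.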