Hybrid layout design for HashJoin/Sort #119
   << ", spill enabled: " << spillEnabled()
-  << ", maxHashTableSize = " << maxHashTableBucketCount_;
+  << ", maxHashTableSize = " << maxHashTableBucketCount_
+  << ", hybrid mode " << (hybridJoin_ ? "enabled" : "disbaled");
Typo: "disbaled" should be "disabled".
if (hybridJoin_) {
  BOLT_CHECK_LE(
      driverId_,
      255,
Why hardcode the limit to 255?
Currently we store the rowId as a BIGINT (64 bits), where the top 8 bits represent the driverId and the remaining 56 bits represent the rowId within each driver. So the largest driverId it supports is 255. Maybe we can make it configurable.
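The encoding described in this reply can be sketched as follows. This is only an illustration that assumes the stated layout (top 8 bits for the driver id, low 56 bits for the per-driver row id). The names `packRowId`, `driverIdOf`, and `rowIdOf` are hypothetical and not taken from the PR, and the sketch uses an unsigned 64-bit integer rather than a signed BIGINT so that the shifts stay well defined.

```cpp
#include <cstdint>
#include <stdexcept>

// Assumed layout: [ 8 bits driverId | 56 bits rowId ].
constexpr int kRowIdBits = 56;
constexpr uint64_t kRowIdMask = (uint64_t{1} << kRowIdBits) - 1;
constexpr uint64_t kMaxDriverId = 255;  // largest value that fits in 8 bits

// Hypothetical helper: combine a driver id and a per-driver row id.
inline uint64_t packRowId(uint64_t driverId, uint64_t rowId) {
  if (driverId > kMaxDriverId || rowId > kRowIdMask) {
    throw std::out_of_range("driverId or rowId does not fit the 8/56-bit layout");
  }
  return (driverId << kRowIdBits) | rowId;
}

// Hypothetical helpers: recover the two fields from a packed id.
inline uint64_t driverIdOf(uint64_t packed) {
  return packed >> kRowIdBits;
}

inline uint64_t rowIdOf(uint64_t packed) {
  return packed & kRowIdMask;
}
```

With 8 bits, driver ids 0 through 255 are representable, which is why a check such as `driverId_ <= 255` is needed before packing. Making the split configurable would mean turning `kRowIdBits` into a runtime value.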
const T* rawValues = flatChild->rawValues();
const uint64_t* rawNulls = flatChild->rawNulls();

constexpr vector_size_t kPrefetchDist = 16;
Does the magic number 16 suit all architectures, e.g. x86 and ARM?
I'm not sure. I tuned it on an x86_64 machine, where k = 8 to 32 performed similarly.
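For readers unfamiliar with the pattern under discussion, a software prefetch distance is typically used as in the sketch below. This is not the PR's extraction loop: `gatherSum` and its arguments are hypothetical, and the example assumes GCC or Clang, where `__builtin_prefetch` is available (on other compilers the prefetch is simply skipped).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Same constant as in the diff; the best value is hardware dependent.
constexpr std::size_t kPrefetchDist = 16;

// Hypothetical gather loop: sums values[indices[i]] while prefetching the
// element that will be needed kPrefetchDist iterations later.
int64_t gatherSum(
    const std::vector<int64_t>& values,
    const std::vector<uint32_t>& indices) {
  int64_t sum = 0;
  const std::size_t n = indices.size();
  for (std::size_t i = 0; i < n; ++i) {
    if (i + kPrefetchDist < n) {
#if defined(__GNUC__) || defined(__clang__)
      // Arguments: address, 0 = prefetch for read, 3 = high temporal locality.
      __builtin_prefetch(&values[indices[i + kPrefetchDist]], 0, 3);
#endif
    }
    sum += values[indices[i]];
  }
  return sum;
}
```

The prefetch only affects timing, never the result, so a distance tuned on one machine stays correct elsewhere even if it is not optimal there.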
What problem does this PR solve?
Issue Number: #11
Type of Change
Description
Currently, HashJoin and Sort store their data in a row-based RowContainer, which incurs non-trivial layout conversion overhead. This PR introduces a hybrid storage design that stores only the keys in RowContainer and keeps payload columns in their original columnar format, reducing that overhead.
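The idea can be illustrated with a toy sort, shown here as a rough sketch rather than the PR's implementation. `KeyRow`, `PayloadColumns`, and `sortHybrid` are hypothetical names: only the key and a payload row index are stored row-wise, while the payload stays in column vectors and is gathered once at output time.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical row-wise entry: just the sort key and a reference to the payload row.
struct KeyRow {
  int64_t key;
  std::size_t payloadRow;
};

// Hypothetical payload kept in its original columnar form.
struct PayloadColumns {
  std::vector<double> price;
  std::vector<std::string> name;
};

// Sorts by key while touching only the small KeyRow entries, then gathers
// each payload column in the sorted order.
PayloadColumns sortHybrid(
    const std::vector<int64_t>& keys,
    const PayloadColumns& payload) {
  std::vector<KeyRow> rows;
  rows.reserve(keys.size());
  for (std::size_t i = 0; i < keys.size(); ++i) {
    rows.push_back({keys[i], i});
  }
  std::stable_sort(rows.begin(), rows.end(), [](const KeyRow& a, const KeyRow& b) {
    return a.key < b.key;
  });

  PayloadColumns out;
  out.price.reserve(rows.size());
  out.name.reserve(rows.size());
  for (const KeyRow& row : rows) {
    out.price.push_back(payload.price[row.payloadRow]);
    out.name.push_back(payload.name[row.payloadRow]);
  }
  return out;
}
```

In this sketch the payload is never converted to a row layout and back, which is the conversion cost the description refers to.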
Main Changes
HybridContainer
- Separates key storage (RowContainer) from payload storage (kept as RowVectorPtr).
- Uses HybridRowId to reference payload rows.
- HybridRowId encodes {containerId, rowId} to support multi-driver parallel execution in HashJoin.

Multi-driver support in HashJoin
- The allContainers_ map enables cross-container payload extraction during the probe phase.

Extraction optimizations
- coalesceBatches() flattens multiple payload batches into a single contiguous batch to reduce TLB misses during extraction.
- sortByContainerId() reorders rows by containerId before extraction to improve cache locality in multi-container scenarios.
- isSingleContainer() provides a fast path that skips sorting overhead in the single-driver scenario.

Configuration options
- hybrid_join_enabled / hybrid_sort_enabled to opt in to hybrid execution.
- hybrid_join_reorder_enabled to control row reordering (disabled in tests to preserve deterministic output).

Performance Impact
No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).
Positive Impact: I have run benchmarks.
Negative Impact: Explained below (e.g., trade-off for correctness).
Tested with bolt_tpch_benchmark at sf=10, iteration=5. Performance is almost the same.
Release Note
Please describe the changes in this PR
Release Note:
Checklist (For Author)
Breaking Changes
No
Yes (Description: ...)