Hybrid layout design for HashJoin/Sort #119

ruochenj123 · 2026-01-14T20:41:58Z

What problem does this PR solve?

Issue Number: #11

Type of Change

🐛 Bug fix (non-breaking change which fixes an issue)
✨ New feature (non-breaking change which adds functionality)
🚀 Performance improvement (optimization)
⚠️ Breaking change (fix or feature that would cause existing functionality to change)
🔨 Refactoring (no logic changes)
🔧 Build/CI or Infrastructure changes
📝 Documentation only

Description

Currently HashJoin and Sort operations store data in row-based RowContainer, which incurs non-trivial layout conversion overhead. This PR introduces a hybrid storage design that keeps payload columns in their original columnar format while only storing keys in RowContainer, reducing this layout conversion overhead.

Main Changes

HybridContainer
- Separates key storage (in RowContainer) from payload storage (kept as RowVectorPtr).
- Introduces encoded HybridRowId to reference payload rows.
- HybridRowId encodes {containerId, rowId} to support multi-driver parallel execution in HashJoin.
Multi-driver support in HashJoin
- Each driver builds its own HybridContainer.
- After table merge, the allContainers_ map enables cross-container payload extraction during the probe phase.
Extraction optimizations
- coalesceBatches() flattens multiple payload batches into a single contiguous batch to reduce TLB misses during extraction.
- sortByContainerId() reorders rows by containerId before extraction to improve cache locality in multi-container scenarios.
- Prefetching during extraction to hide memory latency and reduce data loading time.
- isSingleContainer() provides a fast path that skips sorting overhead in the single-driver scenario.
Configuration options
- hybrid_join_enabled / hybrid_sort_enabled to opt in to hybrid execution.
- hybrid_join_reorder_enabled to control row reordering (disabled in tests to preserve deterministic output).

Performance Impact

No Impact: This change does not affect the critical path (e.g., build system, doc, error handling).

Positive Impact: I have run benchmarks.

Click to view Benchmark Results

Paste your google-benchmark or TPC-H results here.
Before: 10.5s
After:   8.2s  (+20%)

Negative Impact: Explained below (e.g., trade-off for correctness).

Tested on bolt_tpch_benchmark with sf=10 iteration=5. The performances are almost the same.

+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| Type     | q1   | q2   | q3   | q4   | q5   | q6   | q7   | q8   | q9   | q10  | q11  | q12  | q13  | q14  | q15  | q16  | q17  | q18  | q19  | q20  | q21  | q22  |
+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+
| baseline | 856  | 287.4| 1096 | 2492 | 1374 | 453.6| 1090 | 1420 | 1990 | 1662 | 151  | 558.8| 1544 | 874  | 717.4| 353.6| 3816 | 2250 | 952  | 639.2| 2234 | 583.2|
| hybrid   | 882.8| 315  | 1100 | 2390 | 1370 | 450.4| 1078 | 1266 | 2072 | 1738 | 149.2| 631  | 1532 | 852.6| 698.6| 343.4| 4052 | 2062 | 918.4| 664.4| 2248 | 545.4|
+----------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+------+

Release Note

Please describe the changes in this PR

Release Note:

Release Note:
- Fixed a crash in `substr` when input is null.
- optimized `group by` performance by 20%.

Checklist (For Author)

I have added/updated unit tests (ctest).
I have verified the code with local build (Release/Debug).
I have run clang-format / linters.
(Optional) I have run Sanitizers (ASAN/TSAN) locally for complex C++ changes.
No need to test or manual test.

Breaking Changes

No

Yes (Description: ...)

Click to view Breaking Changes

Breaking Changes:
- Description of the breaking change.
- Possible solutions or workarounds.
- Any other relevant information.

CLAassistant · 2026-01-14T20:42:05Z

All committers have signed the CLA.

yangzhg · 2026-01-21T02:24:47Z

bolt/exec/HashBuild.cpp

            << ", spill enabled: " << spillEnabled()
-            << ", maxHashTableSize = " << maxHashTableBucketCount_;
+            << ", maxHashTableSize = " << maxHashTableBucketCount_
+            << ", hybrid mode " << (hybridJoin_ ? "enabled" : "disbaled");


typo disabled

yangzhg · 2026-01-21T02:25:36Z

bolt/exec/HashBuild.cpp

+  if (hybridJoin_) {
+    BOLT_CHECK_LE(
+        driverId_,
+        255,


why hardcode limit to 255?

Currently we store a BIGINT (64 bits) of rowId where the top 8 bits represents the driverId and the remaining 56 bits represents the rowId for each driver. So the max # of driver it supports is 255. Maybe we can make it as a config.

yangzhg · 2026-01-21T03:10:09Z

bolt/exec/RowContainer.h

+    const T* rawValues = flatChild->rawValues();
+    const uint64_t* rawNulls = flatChild->rawNulls();
+
+    constexpr vector_size_t kPrefetchDist = 16;


Does the magic number 16 fit for all the arch? e.g. x86 arm?

I'm not sure. I tuned it on a x86_64 machine, k = 8-32 perform similarly.

ruochenj123 changed the title ~~Hybrid layout design for HashJoin/Sort~~ [WIP] Hybrid layout design for HashJoin/Sort Jan 14, 2026

markjin1990 requested review from kexianda and markjin1990 January 14, 2026 21:34

markjin1990 added the performance performance improvement needed label Jan 14, 2026

markjin1990 requested a review from fzhedu January 14, 2026 21:35

ruochenj123 force-pushed the hybrid-design branch from 641c03a to 03120aa Compare January 15, 2026 08:32

ruochenj123 added 6 commits January 15, 2026 10:10

WIP hybrid design

36a3177

implement hybrid for spill

fe56c10

add fast path

31e04b1

clang-format

6513688

fix lisence

6e64cfb

fix empty container case

dfe7dce

ruochenj123 force-pushed the hybrid-design branch from 03120aa to dfe7dce Compare January 15, 2026 15:11

markjin1990 changed the title ~~[WIP] Hybrid layout design for HashJoin/Sort~~ Hybrid layout design for HashJoin/Sort Jan 20, 2026

yangzhg reviewed Jan 21, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hybrid layout design for HashJoin/Sort #119

Hybrid layout design for HashJoin/Sort #119

Uh oh!

ruochenj123 commented Jan 14, 2026 •

edited

Loading

Uh oh!

CLAassistant commented Jan 14, 2026 •

edited

Loading

Uh oh!

yangzhg Jan 21, 2026

Uh oh!

yangzhg Jan 21, 2026

Uh oh!

ruochenj123 Jan 21, 2026

Uh oh!

yangzhg Jan 21, 2026

Uh oh!

ruochenj123 Jan 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Hybrid layout design for HashJoin/Sort #119

Are you sure you want to change the base?

Hybrid layout design for HashJoin/Sort #119

Uh oh!

Conversation

ruochenj123 commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What problem does this PR solve?

Type of Change

Description

Main Changes

Performance Impact

Release Note

Checklist (For Author)

Breaking Changes

Uh oh!

CLAassistant commented Jan 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yangzhg Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

yangzhg Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

ruochenj123 Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

yangzhg Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

ruochenj123 Jan 21, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

ruochenj123 commented Jan 14, 2026 •

edited

Loading

CLAassistant commented Jan 14, 2026 •

edited

Loading