@gabotechs gabotechs commented Jan 12, 2026

Which issue does this PR close?

It does not close any issue, but it's related to:

Rationale for this change

This is a PR from a batch of PRs that attempt to improve performance in hash joins:

It adds the new BufferExec node at the top of the probe side of hash joins so that some work is eagerly performed before the build side of the hash join is completely finished.

Why should this speed up joins?

To understand the impact of this PR, it helps to know how streams work in Rust: creating a stream does not perform any work; progress is only made when the stream gets polled.

This means that when we call .execute() on an ExecutionPlan (like the probe side of a join), nothing happens: not even the most basic TCP connections or system calls are performed. Instead, all this work is deferred until the stream is polled for the first time, losing the opportunity to make some early progress.
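The laziness described above can be reproduced with the standard library alone. The following is a minimal sketch, not DataFusion code: `ProbeSide`, `NoopWaker` and `lazy_demo` are hypothetical names, and a hand-written `Future` stands in for a `Stream` (which lives outside `std`), since both only make progress when polled.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough to poll a future by hand.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Hypothetical stand-in for a probe-side input. The "expensive" work
/// (here just a counter bump) lives in `poll`, not in the constructor.
struct ProbeSide {
    work_done: Arc<AtomicUsize>,
}

impl Future for ProbeSide {
    type Output = usize;

    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<usize> {
        // Pretend this opens files or connections and yields a row count.
        self.work_done.fetch_add(1, Ordering::SeqCst);
        Poll::Ready(42)
    }
}

/// Returns the work counter before and after the first poll.
fn lazy_demo() -> (usize, usize) {
    let work_done = Arc::new(AtomicUsize::new(0));
    // Constructing the future performs no work at all.
    let mut probe = ProbeSide { work_done: Arc::clone(&work_done) };
    let before = work_done.load(Ordering::SeqCst);

    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    // Only this poll drives the work.
    let result = Pin::new(&mut probe).poll(&mut cx);
    assert!(matches!(result, Poll::Ready(42)));

    (before, work_done.load(Ordering::SeqCst))
}
```

Calling `lazy_demo()` shows the counter still at zero after construction and incremented only once the future has been polled.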

This gets worse when multiple hash joins are chained together: they execute in cascade, like falling dominoes. This keeps the memory footprint small, but underutilizes the machine's resources, which could otherwise be used to execute the query faster.
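The idea of making early progress can be pictured with a small eager buffer. This is only an analogy under stated assumptions: it uses a background thread and a bounded `std::sync::mpsc` channel instead of async streams, and `buffer_eagerly` is a hypothetical helper, not the BufferExec implementation from this PR.

```rust
use std::sync::mpsc::{sync_channel, Receiver};
use std::thread;

/// Hypothetical helper: starts draining `input` on a background thread
/// right away, queueing up to `capacity` items before the consumer asks.
fn buffer_eagerly<I>(input: I, capacity: usize) -> Receiver<I::Item>
where
    I: Iterator + Send + 'static,
    I::Item: Send + 'static,
{
    let (tx, rx) = sync_channel(capacity);
    thread::spawn(move || {
        for item in input {
            // `send` blocks while the queue is full (bounded memory) and
            // returns an error once the receiver has been dropped.
            if tx.send(item).is_err() {
                break;
            }
        }
    });
    rx
}
```

The consumer can do other work first (for a join, building the hash table) and later drain the receiver with `rx.iter()`, by which point some items are already queued. The bounded capacity is what trades a little extra memory for the earlier start.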

NOTE: still don't know if this improves the benchmarks, just experimenting for now

What changes are included in this PR?

Adds a new HashJoinBuffering physical optimizer rule that idempotently places BufferExec nodes on the probe side of hash joins:

            ┌───────────────────┐
            │   HashJoinExec    │
            └─────▲────────▲────┘
          ┌───────┘        └─────────┐
          │                          │
 ┌────────────────┐         ┌─────────────────┐
 │   Build side   │       + │   BufferExec    │
 └────────────────┘         └────────▲────────┘
                                     │
                            ┌────────┴────────┐
                            │   Probe side    │
                            └─────────────────┘
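What "idempotently" means for such a rewrite can be sketched on a toy plan tree. The `Plan` enum and `add_probe_buffers` function below are hypothetical and much simpler than DataFusion's `ExecutionPlan` trees and optimizer rule API; they only illustrate the shape of the transformation in the diagram.

```rust
/// Toy physical plan, assumed for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(&'static str),
    Buffer(Box<Plan>),
    HashJoin { build: Box<Plan>, probe: Box<Plan> },
}

/// Wraps the probe side of every hash join in a `Buffer` node, unless it
/// is already wrapped, so running the rule twice changes nothing.
fn add_probe_buffers(plan: Plan) -> Plan {
    match plan {
        Plan::Scan(name) => Plan::Scan(name),
        Plan::Buffer(input) => Plan::Buffer(Box::new(add_probe_buffers(*input))),
        Plan::HashJoin { build, probe } => {
            let build = add_probe_buffers(*build);
            let probe = match add_probe_buffers(*probe) {
                // Already buffered: keep as is.
                buffered @ Plan::Buffer(_) => buffered,
                other => Plan::Buffer(Box::new(other)),
            };
            Plan::HashJoin {
                build: Box::new(build),
                probe: Box::new(probe),
            }
        }
    }
}
```

Applying `add_probe_buffers` to its own output returns an equal plan, which is the property that lets an optimizer run the rule more than once without stacking buffers.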

Are these changes tested?

yes, by existing tests

Are there any user-facing changes?

yes, users will see a new BufferExec placed at the top of the probe side of each hash join. (Still unsure about whether the default mode should be enabled)


Results

Warning

I'm very skeptical about these benchmarks, which were run on my laptop. Take them with a grain of salt; they should be run in a more controlled environment.

Comparing main and hash-join-buffering-on-probe-side
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃       main ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │   37.80 ms │                          19.07 ms │ +1.98x faster │
│ QQuery 2  │  130.36 ms │                          54.25 ms │ +2.40x faster │
│ QQuery 3  │   99.05 ms │                          90.99 ms │ +1.09x faster │
│ QQuery 4  │  894.61 ms │                         340.70 ms │ +2.63x faster │
│ QQuery 5  │  151.16 ms │                         147.84 ms │     no change │
│ QQuery 6  │  566.37 ms │                         513.89 ms │ +1.10x faster │
│ QQuery 7  │  290.12 ms │                         248.25 ms │ +1.17x faster │
│ QQuery 8  │   97.46 ms │                          90.59 ms │ +1.08x faster │
│ QQuery 9  │   88.59 ms │                          94.18 ms │  1.06x slower │
│ QQuery 10 │   85.89 ms │                          48.71 ms │ +1.76x faster │
│ QQuery 11 │  567.85 ms │                         180.30 ms │ +3.15x faster │
│ QQuery 12 │   35.66 ms │                          32.78 ms │ +1.09x faster │
│ QQuery 13 │  313.89 ms │                         312.86 ms │     no change │
│ QQuery 14 │  741.51 ms │                         367.39 ms │ +2.02x faster │
│ QQuery 15 │   23.11 ms │                          49.44 ms │  2.14x slower │
│ QQuery 16 │   32.72 ms │                         109.53 ms │  3.35x slower │
│ QQuery 17 │  220.05 ms │                         160.70 ms │ +1.37x faster │
│ QQuery 18 │  114.36 ms │                         162.51 ms │  1.42x slower │
│ QQuery 19 │  133.50 ms │                         123.87 ms │ +1.08x faster │
│ QQuery 20 │   12.37 ms │                          52.66 ms │  4.26x slower │
│ QQuery 21 │   15.53 ms │                         132.58 ms │  8.54x slower │
│ QQuery 22 │  288.69 ms │                         375.91 ms │  1.30x slower │
│ QQuery 23 │  772.46 ms │                         488.07 ms │ +1.58x faster │
│ QQuery 24 │  340.42 ms │                         287.51 ms │ +1.18x faster │
│ QQuery 25 │  307.77 ms │                         195.09 ms │ +1.58x faster │
│ QQuery 26 │   81.78 ms │                         123.89 ms │  1.51x slower │
│ QQuery 27 │  297.72 ms │                         240.88 ms │ +1.24x faster │
│ QQuery 28 │  127.20 ms │                         127.28 ms │     no change │
│ QQuery 29 │  261.03 ms │                         161.52 ms │ +1.62x faster │
│ QQuery 30 │   35.53 ms │                          26.18 ms │ +1.36x faster │
│ QQuery 31 │  120.02 ms │                         101.47 ms │ +1.18x faster │
│ QQuery 32 │   48.49 ms │                          43.37 ms │ +1.12x faster │
│ QQuery 33 │  112.83 ms │                         110.45 ms │     no change │
│ QQuery 34 │   85.92 ms │                          80.71 ms │ +1.06x faster │
│ QQuery 35 │   81.94 ms │                          51.65 ms │ +1.59x faster │
│ QQuery 36 │  165.56 ms │                         168.79 ms │     no change │
│ QQuery 37 │  153.98 ms │                         155.81 ms │     no change │
│ QQuery 38 │   60.75 ms │                          53.06 ms │ +1.14x faster │
│ QQuery 39 │   81.49 ms │                         294.01 ms │  3.61x slower │
│ QQuery 40 │   87.94 ms │                          76.12 ms │ +1.16x faster │
│ QQuery 41 │   10.61 ms │                           9.61 ms │ +1.10x faster │
│ QQuery 42 │   89.63 ms │                          88.33 ms │     no change │
│ QQuery 43 │   69.61 ms │                          63.42 ms │ +1.10x faster │
│ QQuery 44 │    9.08 ms │                           7.78 ms │ +1.17x faster │
│ QQuery 45 │   53.17 ms │                          32.19 ms │ +1.65x faster │
│ QQuery 46 │  175.44 ms │                         167.41 ms │     no change │
│ QQuery 47 │  478.10 ms │                         123.03 ms │ +3.89x faster │
│ QQuery 48 │  224.20 ms │                         212.88 ms │ +1.05x faster │
│ QQuery 49 │  206.10 ms │                         200.87 ms │     no change │
│ QQuery 50 │  176.44 ms │                         141.12 ms │ +1.25x faster │
│ QQuery 51 │  141.42 ms │                         105.32 ms │ +1.34x faster │
│ QQuery 52 │   90.66 ms │                          89.26 ms │     no change │
│ QQuery 53 │   89.56 ms │                          83.37 ms │ +1.07x faster │
│ QQuery 54 │  123.43 ms │                         119.06 ms │     no change │
│ QQuery 55 │   88.73 ms │                          90.23 ms │     no change │
│ QQuery 56 │  114.66 ms │                         112.92 ms │     no change │
│ QQuery 57 │  131.64 ms │                          69.73 ms │ +1.89x faster │
│ QQuery 58 │  228.01 ms │                         127.59 ms │ +1.79x faster │
│ QQuery 59 │  169.17 ms │                         127.03 ms │ +1.33x faster │
│ QQuery 60 │  118.92 ms │                         115.28 ms │     no change │
│ QQuery 61 │  149.06 ms │                         147.06 ms │     no change │
│ QQuery 62 │  441.11 ms │                         433.50 ms │     no change │
│ QQuery 63 │   95.44 ms │                          85.84 ms │ +1.11x faster │
│ QQuery 64 │  606.32 ms │                         442.72 ms │ +1.37x faster │
│ QQuery 65 │  208.68 ms │                          91.03 ms │ +2.29x faster │
│ QQuery 66 │  188.17 ms │                         177.41 ms │ +1.06x faster │
│ QQuery 67 │  249.91 ms │                         234.31 ms │ +1.07x faster │
│ QQuery 68 │  235.92 ms │                         224.15 ms │     no change │
│ QQuery 69 │   89.95 ms │                          46.44 ms │ +1.94x faster │
│ QQuery 70 │  278.67 ms │                         203.35 ms │ +1.37x faster │
│ QQuery 71 │  109.23 ms │                         109.86 ms │     no change │
│ QQuery 72 │  508.24 ms │                         391.84 ms │ +1.30x faster │
│ QQuery 73 │   90.02 ms │                          78.49 ms │ +1.15x faster │
│ QQuery 74 │  373.75 ms │                         112.90 ms │ +3.31x faster │
│ QQuery 75 │  227.43 ms │                         172.97 ms │ +1.31x faster │
│ QQuery 76 │  116.42 ms │                         110.72 ms │     no change │
│ QQuery 77 │  170.31 ms │                         144.66 ms │ +1.18x faster │
│ QQuery 78 │  422.27 ms │                         245.42 ms │ +1.72x faster │
│ QQuery 79 │  190.47 ms │                         166.21 ms │ +1.15x faster │
│ QQuery 80 │  265.88 ms │                         242.36 ms │ +1.10x faster │
│ QQuery 81 │   23.05 ms │                          17.96 ms │ +1.28x faster │
│ QQuery 82 │  173.94 ms │                         162.41 ms │ +1.07x faster │
│ QQuery 83 │   40.37 ms │                          18.62 ms │ +2.17x faster │
│ QQuery 84 │   40.52 ms │                          26.07 ms │ +1.55x faster │
│ QQuery 85 │  138.45 ms │                          71.38 ms │ +1.94x faster │
│ QQuery 86 │   30.41 ms │                          28.27 ms │ +1.08x faster │
│ QQuery 87 │   62.64 ms │                          54.20 ms │ +1.16x faster │
│ QQuery 88 │   84.50 ms │                          74.60 ms │ +1.13x faster │
│ QQuery 89 │  108.95 ms │                          89.03 ms │ +1.22x faster │
│ QQuery 90 │   19.19 ms │                          16.36 ms │ +1.17x faster │
│ QQuery 91 │   53.45 ms │                          34.82 ms │ +1.54x faster │
│ QQuery 92 │   49.13 ms │                          25.47 ms │ +1.93x faster │
│ QQuery 93 │  151.86 ms │                         134.34 ms │ +1.13x faster │
│ QQuery 94 │   52.94 ms │                          46.45 ms │ +1.14x faster │
│ QQuery 95 │  125.23 ms │                          50.85 ms │ +2.46x faster │
│ QQuery 96 │   59.70 ms │                          54.86 ms │ +1.09x faster │
│ QQuery 97 │   99.90 ms │                          71.00 ms │ +1.41x faster │
│ QQuery 98 │  129.60 ms │                         111.11 ms │ +1.17x faster │
│ QQuery 99 │ 4562.37 ms │                        4353.70 ms │     no change │
└───────────┴────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (main)                                │ 21975.53ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 17884.01ms │
│ Average Time (main)                              │   221.98ms │
│ Average Time (hash-join-buffering-on-probe-side) │   180.65ms │
│ Queries Faster                                   │         70 │
│ Queries Slower                                   │          9 │
│ Queries with No Change                           │         20 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      main ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │  44.90 ms │                          40.62 ms │ +1.11x faster │
│ QQuery 2  │  18.76 ms │                          12.43 ms │ +1.51x faster │
│ QQuery 3  │  28.97 ms │                          23.39 ms │ +1.24x faster │
│ QQuery 4  │  17.85 ms │                          16.29 ms │ +1.10x faster │
│ QQuery 5  │  93.97 ms │                          43.91 ms │ +2.14x faster │
│ QQuery 6  │  17.08 ms │                          17.50 ms │     no change │
│ QQuery 7  │  90.73 ms │                          46.86 ms │ +1.94x faster │
│ QQuery 8  │  85.72 ms │                          36.05 ms │ +2.38x faster │
│ QQuery 9  │  74.19 ms │                          43.14 ms │ +1.72x faster │
│ QQuery 10 │  89.22 ms │                          39.76 ms │ +2.24x faster │
│ QQuery 11 │  13.64 ms │                           9.49 ms │ +1.44x faster │
│ QQuery 12 │  53.55 ms │                          28.44 ms │ +1.88x faster │
│ QQuery 13 │  20.46 ms │                          20.60 ms │     no change │
│ QQuery 14 │  44.52 ms │                          22.86 ms │ +1.95x faster │
│ QQuery 15 │  33.20 ms │                          27.10 ms │ +1.22x faster │
│ QQuery 16 │  12.82 ms │                          11.75 ms │ +1.09x faster │
│ QQuery 17 │  82.07 ms │                          50.03 ms │ +1.64x faster │
│ QQuery 18 │ 109.41 ms │                          62.02 ms │ +1.76x faster │
│ QQuery 19 │  39.01 ms │                          34.62 ms │ +1.13x faster │
│ QQuery 20 │  53.24 ms │                          26.53 ms │ +2.01x faster │
│ QQuery 21 │  76.87 ms │                          53.66 ms │ +1.43x faster │
│ QQuery 22 │   9.18 ms │                           8.46 ms │ +1.09x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                                │ 1109.37ms │
│ Total Time (hash-join-buffering-on-probe-side)   │  675.51ms │
│ Average Time (main)                              │   50.43ms │
│ Average Time (hash-join-buffering-on-probe-side) │   30.71ms │
│ Queries Faster                                   │        20 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         2 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      main ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 333.88 ms │                         333.10 ms │     no change │
│ QQuery 2  │ 149.56 ms │                          95.79 ms │ +1.56x faster │
│ QQuery 3  │ 291.89 ms │                         272.45 ms │ +1.07x faster │
│ QQuery 4  │ 115.77 ms │                         116.32 ms │     no change │
│ QQuery 5  │ 435.41 ms │                         408.67 ms │ +1.07x faster │
│ QQuery 6  │ 122.00 ms │                         119.41 ms │     no change │
│ QQuery 7  │ 597.53 ms │                         554.64 ms │ +1.08x faster │
│ QQuery 8  │ 505.06 ms │                         447.98 ms │ +1.13x faster │
│ QQuery 9  │ 718.08 ms │                         664.75 ms │ +1.08x faster │
│ QQuery 10 │ 355.45 ms │                         318.31 ms │ +1.12x faster │
│ QQuery 11 │ 117.63 ms │                          87.23 ms │ +1.35x faster │
│ QQuery 12 │ 229.20 ms │                         197.97 ms │ +1.16x faster │
│ QQuery 13 │ 250.32 ms │                         219.43 ms │ +1.14x faster │
│ QQuery 14 │ 197.94 ms │                         173.28 ms │ +1.14x faster │
│ QQuery 15 │ 318.42 ms │                         288.27 ms │ +1.10x faster │
│ QQuery 16 │  85.11 ms │                          66.98 ms │ +1.27x faster │
│ QQuery 17 │ 723.73 ms │                         667.37 ms │ +1.08x faster │
│ QQuery 18 │ 794.77 ms │                         726.88 ms │ +1.09x faster │
│ QQuery 19 │ 320.78 ms │                         292.61 ms │ +1.10x faster │
│ QQuery 20 │ 293.52 ms │                         258.06 ms │ +1.14x faster │
│ QQuery 21 │ 786.11 ms │                         732.63 ms │ +1.07x faster │
│ QQuery 22 │  84.85 ms │                          79.90 ms │ +1.06x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (main)                                │ 7827.02ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 7122.04ms │
│ Average Time (main)                              │  355.77ms │
│ Average Time (hash-join-buffering-on-probe-side) │  323.73ms │
│ Queries Faster                                   │        19 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         3 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘
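For reading the tables above, the Change column can be recomputed from the two timings. The sketch below is an assumption about how the labels are derived: the 5% noise threshold is inferred from the rows shown here, not taken from the benchmark script, and `describe_change` is a hypothetical name.

```rust
/// Classifies a (baseline, branch) timing pair in the style of the
/// "Change" column. The 5% threshold is an assumption.
fn describe_change(baseline_ms: f64, branch_ms: f64) -> String {
    const NOISE_THRESHOLD: f64 = 0.05; // assumed, inferred from the tables
    let change = (branch_ms - baseline_ms) / baseline_ms;
    if change.abs() <= NOISE_THRESHOLD {
        "no change".to_string()
    } else if change < 0.0 {
        // Branch is faster: report baseline / branch.
        format!("+{:.2}x faster", baseline_ms / branch_ms)
    } else {
        // Branch is slower: report branch / baseline.
        format!("{:.2}x slower", branch_ms / baseline_ms)
    }
}
```

For example, the TPC-DS query 1 timings (37.80 ms and 19.07 ms) give a ratio of about 1.98, and query 21 (15.53 ms and 132.58 ms) gives about 8.54 in the slower direction, matching the rows above.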

@github-actions bot added the optimizer, core, sqllogictest, common, execution, proto, datasource, and physical-plan labels on Jan 12, 2026
@gabotechs
Contributor Author

run benchmarks

@alamb-ghbot

🤖 Hi @gabotechs, thanks for the request (#19761 (comment)). scrape_comments.py only responds to whitelisted users. Allowed users: Dandandan, Omega359, adriangb, alamb, comphead, geoffreyclaude, klion26, rluvaton, xudong963, zhuqi-lucas.

@github-actions bot added the documentation label on Jan 12, 2026
@gabotechs
Contributor Author

run benchmarks

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch_mem clickbench_partitioned clickbench_extended
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark clickbench_extended.json
--------------------
┏━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query    ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0 │  2479.50 ms │                        2365.91 ms │     no change │
│ QQuery 1 │   933.04 ms │                         961.61 ms │     no change │
│ QQuery 2 │  2128.72 ms │                        1828.41 ms │ +1.16x faster │
│ QQuery 3 │  1140.67 ms │                        1106.77 ms │     no change │
│ QQuery 4 │  2349.73 ms │                        2265.79 ms │     no change │
│ QQuery 5 │ 28477.94 ms │                       27819.90 ms │     no change │
│ QQuery 6 │  3913.85 ms │                        3886.72 ms │     no change │
│ QQuery 7 │  2907.17 ms │                        2857.38 ms │     no change │
└──────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 44330.62ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 43092.50ms │
│ Average Time (HEAD)                              │  5541.33ms │
│ Average Time (hash-join-buffering-on-probe-side) │  5386.56ms │
│ Queries Faster                                   │          1 │
│ Queries Slower                                   │          0 │
│ Queries with No Change                           │          7 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark clickbench_partitioned.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 0  │     1.91 ms │                           1.94 ms │     no change │
│ QQuery 1  │    50.86 ms │                          51.03 ms │     no change │
│ QQuery 2  │   129.07 ms │                         131.06 ms │     no change │
│ QQuery 3  │   151.75 ms │                         154.89 ms │     no change │
│ QQuery 4  │  1070.04 ms │                        1218.71 ms │  1.14x slower │
│ QQuery 5  │  1377.65 ms │                        1501.78 ms │  1.09x slower │
│ QQuery 6  │     1.82 ms │                           1.87 ms │     no change │
│ QQuery 7  │    56.03 ms │                          61.22 ms │  1.09x slower │
│ QQuery 8  │  1423.84 ms │                        1561.18 ms │  1.10x slower │
│ QQuery 9  │  1748.54 ms │                        1871.82 ms │  1.07x slower │
│ QQuery 10 │   343.11 ms │                         350.58 ms │     no change │
│ QQuery 11 │   390.93 ms │                         400.26 ms │     no change │
│ QQuery 12 │  1249.28 ms │                        1460.10 ms │  1.17x slower │
│ QQuery 13 │  1916.12 ms │                        2067.22 ms │  1.08x slower │
│ QQuery 14 │  1214.64 ms │                        1359.01 ms │  1.12x slower │
│ QQuery 15 │  1224.35 ms │                        1382.17 ms │  1.13x slower │
│ QQuery 16 │  2587.35 ms │                        2651.10 ms │     no change │
│ QQuery 17 │  2481.42 ms │                        2645.83 ms │  1.07x slower │
│ QQuery 18 │  6019.63 ms │                        4969.84 ms │ +1.21x faster │
│ QQuery 19 │   118.04 ms │                         122.91 ms │     no change │
│ QQuery 20 │  1977.36 ms │                        1907.42 ms │     no change │
│ QQuery 21 │  2282.79 ms │                        2227.74 ms │     no change │
│ QQuery 22 │  4147.94 ms │                        3809.68 ms │ +1.09x faster │
│ QQuery 23 │ 18037.69 ms │                       12405.70 ms │ +1.45x faster │
│ QQuery 24 │   203.52 ms │                         236.74 ms │  1.16x slower │
│ QQuery 25 │   482.62 ms │                         517.70 ms │  1.07x slower │
│ QQuery 26 │   218.15 ms │                         233.60 ms │  1.07x slower │
│ QQuery 27 │  2805.96 ms │                        2772.92 ms │     no change │
│ QQuery 28 │ 22174.76 ms │                       21847.75 ms │     no change │
│ QQuery 29 │   977.94 ms │                         952.08 ms │     no change │
│ QQuery 30 │  1315.68 ms │                        1336.40 ms │     no change │
│ QQuery 31 │  1366.16 ms │                        1421.09 ms │     no change │
│ QQuery 32 │  5155.78 ms │                        4350.28 ms │ +1.19x faster │
│ QQuery 33 │  5715.37 ms │                        5687.38 ms │     no change │
│ QQuery 34 │  6016.46 ms │                        5853.35 ms │     no change │
│ QQuery 35 │  1918.61 ms │                        2098.03 ms │  1.09x slower │
│ QQuery 36 │    67.22 ms │                          70.35 ms │     no change │
│ QQuery 37 │    45.47 ms │                          49.54 ms │  1.09x slower │
│ QQuery 38 │    65.42 ms │                          68.24 ms │     no change │
│ QQuery 39 │   104.44 ms │                         111.22 ms │  1.06x slower │
│ QQuery 40 │    27.46 ms │                          27.62 ms │     no change │
│ QQuery 41 │    23.04 ms │                          24.38 ms │  1.06x slower │
│ QQuery 42 │    19.89 ms │                          21.71 ms │  1.09x slower │
└───────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 98706.12ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 91995.42ms │
│ Average Time (HEAD)                              │  2295.49ms │
│ Average Time (hash-join-buffering-on-probe-side) │  2139.43ms │
│ Queries Faster                                   │          4 │
│ Queries Slower                                   │         18 │
│ Queries with No Change                           │         21 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘
--------------------
Benchmark tpch_mem_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 140.65 ms │                         101.97 ms │ +1.38x faster │
│ QQuery 2  │  37.21 ms │                          30.95 ms │ +1.20x faster │
│ QQuery 3  │  44.92 ms │                          32.31 ms │ +1.39x faster │
│ QQuery 4  │  31.87 ms │                          30.19 ms │ +1.06x faster │
│ QQuery 5  │  92.53 ms │                          94.55 ms │     no change │
│ QQuery 6  │  21.01 ms │                          20.99 ms │     no change │
│ QQuery 7  │ 157.97 ms │                         165.53 ms │     no change │
│ QQuery 8  │  41.01 ms │                          35.06 ms │ +1.17x faster │
│ QQuery 9  │ 102.50 ms │                          93.90 ms │ +1.09x faster │
│ QQuery 10 │  68.82 ms │                          67.90 ms │     no change │
│ QQuery 11 │  19.57 ms │                          17.92 ms │ +1.09x faster │
│ QQuery 12 │  52.47 ms │                          54.41 ms │     no change │
│ QQuery 13 │  50.52 ms │                          47.74 ms │ +1.06x faster │
│ QQuery 14 │  15.26 ms │                          15.25 ms │     no change │
│ QQuery 15 │  31.19 ms │                          30.51 ms │     no change │
│ QQuery 16 │  30.26 ms │                          28.22 ms │ +1.07x faster │
│ QQuery 17 │ 144.19 ms │                         150.21 ms │     no change │
│ QQuery 18 │ 286.83 ms │                         262.07 ms │ +1.09x faster │
│ QQuery 19 │  40.60 ms │                          41.31 ms │     no change │
│ QQuery 20 │  57.30 ms │                          56.06 ms │     no change │
│ QQuery 21 │ 188.92 ms │                         179.62 ms │     no change │
│ QQuery 22 │  22.42 ms │                          22.15 ms │     no change │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 1678.03ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 1578.79ms │
│ Average Time (HEAD)                              │   76.27ms │
│ Average Time (hash-join-buffering-on-probe-side) │   71.76ms │
│ Queries Faster                                   │        10 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │        12 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

@gabotechs
Contributor Author

run benchmark tpcds tpch10

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 1.

Last 10 lines of output:

BRANCH_NAME: HEAD
DATA_DIR: /home/alamb/arrow-datafusion/benchmarks/data
RESULTS_DIR: /home/alamb/arrow-datafusion/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************

Please prepare TPC-DS data first by following instructions:
  ./bench.sh data tpcds

@gabotechs
Contributor Author

run benchmark tpch10

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch10
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf10.json
--------------------
┏━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Query     ┃ HEAD ┃ hash-join-buffering-on-probe-side ┃       Change ┃
┡━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ QQuery 1  │ FAIL │                              FAIL │ incomparable │
│ QQuery 2  │ FAIL │                              FAIL │ incomparable │
│ QQuery 3  │ FAIL │                              FAIL │ incomparable │
│ QQuery 4  │ FAIL │                              FAIL │ incomparable │
│ QQuery 5  │ FAIL │                              FAIL │ incomparable │
│ QQuery 6  │ FAIL │                              FAIL │ incomparable │
│ QQuery 7  │ FAIL │                              FAIL │ incomparable │
│ QQuery 8  │ FAIL │                              FAIL │ incomparable │
│ QQuery 9  │ FAIL │                              FAIL │ incomparable │
│ QQuery 10 │ FAIL │                              FAIL │ incomparable │
│ QQuery 11 │ FAIL │                              FAIL │ incomparable │
│ QQuery 12 │ FAIL │                              FAIL │ incomparable │
│ QQuery 13 │ FAIL │                              FAIL │ incomparable │
│ QQuery 14 │ FAIL │                              FAIL │ incomparable │
│ QQuery 15 │ FAIL │                              FAIL │ incomparable │
│ QQuery 16 │ FAIL │                              FAIL │ incomparable │
│ QQuery 17 │ FAIL │                              FAIL │ incomparable │
│ QQuery 18 │ FAIL │                              FAIL │ incomparable │
│ QQuery 19 │ FAIL │                              FAIL │ incomparable │
│ QQuery 20 │ FAIL │                              FAIL │ incomparable │
│ QQuery 21 │ FAIL │                              FAIL │ incomparable │
│ QQuery 22 │ FAIL │                              FAIL │ incomparable │
└───────────┴──────┴───────────────────────────────────┴──────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━┓
┃ Benchmark Summary                                ┃        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━┩
│ Total Time (HEAD)                                │ 0.00ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 0.00ms │
│ Average Time (HEAD)                              │ 0.00ms │
│ Average Time (hash-join-buffering-on-probe-side) │ 0.00ms │
│ Queries Faster                                   │      0 │
│ Queries Slower                                   │      0 │
│ Queries with No Change                           │      0 │
│ Queries with Failure                             │     22 │
└──────────────────────────────────────────────────┴────────┘

@gabotechs
Contributor Author

run benchmark tpch

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 186.54 ms │                         180.81 ms │     no change │
│ QQuery 2  │  92.79 ms │                          48.71 ms │ +1.90x faster │
│ QQuery 3  │ 129.28 ms │                         106.07 ms │ +1.22x faster │
│ QQuery 4  │  80.78 ms │                          74.64 ms │ +1.08x faster │
│ QQuery 5  │ 186.74 ms │                         163.71 ms │ +1.14x faster │
│ QQuery 6  │  70.54 ms │                          66.87 ms │ +1.06x faster │
│ QQuery 7  │ 222.50 ms │                         194.54 ms │ +1.14x faster │
│ QQuery 8  │ 175.16 ms │                         125.23 ms │ +1.40x faster │
│ QQuery 9  │ 231.17 ms │                         174.24 ms │ +1.33x faster │
│ QQuery 10 │ 190.18 ms │                         148.84 ms │ +1.28x faster │
│ QQuery 11 │  70.01 ms │                          46.31 ms │ +1.51x faster │
│ QQuery 12 │ 120.18 ms │                         109.09 ms │ +1.10x faster │
│ QQuery 13 │ 219.34 ms │                         204.01 ms │ +1.08x faster │
│ QQuery 14 │  95.98 ms │                          88.23 ms │ +1.09x faster │
│ QQuery 15 │ 132.46 ms │                         100.40 ms │ +1.32x faster │
│ QQuery 16 │  64.09 ms │                          46.41 ms │ +1.38x faster │
│ QQuery 17 │ 280.98 ms │                         211.97 ms │ +1.33x faster │
│ QQuery 18 │ 332.62 ms │                         271.65 ms │ +1.22x faster │
│ QQuery 19 │ 140.44 ms │                         130.87 ms │ +1.07x faster │
│ QQuery 20 │ 135.30 ms │                         100.57 ms │ +1.35x faster │
│ QQuery 21 │ 265.90 ms │                         234.12 ms │ +1.14x faster │
│ QQuery 22 │  41.36 ms │                          37.33 ms │ +1.11x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 3464.36ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 2864.63ms │
│ Average Time (HEAD)                              │  157.47ms │
│ Average Time (hash-join-buffering-on-probe-side) │  130.21ms │
│ Queries Faster                                   │        21 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         1 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

@gabotechs
Contributor Author

run benchmark tpcds

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 1.

Last 10 lines of output:

BRANCH_NAME: HEAD
DATA_DIR: /home/alamb/arrow-datafusion/benchmarks/data
RESULTS_DIR: /home/alamb/arrow-datafusion/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************

Please prepare TPC-DS data first by following instructions:
  ./bench.sh data tpcds

@gabotechs
Contributor Author

🤔 the tpcds benchmark command seems broken

@Dandandan
Contributor

Dandandan commented Jan 12, 2026

Interesting idea, do you have some insights on the memory usage vs not doing this "eager execution"?

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpch
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpch_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃      HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │ 176.46 ms │                         181.14 ms │     no change │
│ QQuery 2  │  96.16 ms │                          51.82 ms │ +1.86x faster │
│ QQuery 3  │ 133.76 ms │                         110.27 ms │ +1.21x faster │
│ QQuery 4  │  84.55 ms │                          70.30 ms │ +1.20x faster │
│ QQuery 5  │ 175.00 ms │                         161.67 ms │ +1.08x faster │
│ QQuery 6  │  66.67 ms │                          69.00 ms │     no change │
│ QQuery 7  │ 210.79 ms │                         191.70 ms │ +1.10x faster │
│ QQuery 8  │ 167.95 ms │                         121.06 ms │ +1.39x faster │
│ QQuery 9  │ 230.88 ms │                         172.45 ms │ +1.34x faster │
│ QQuery 10 │ 189.20 ms │                         152.50 ms │ +1.24x faster │
│ QQuery 11 │  63.40 ms │                          44.03 ms │ +1.44x faster │
│ QQuery 12 │ 117.89 ms │                         108.69 ms │ +1.08x faster │
│ QQuery 13 │ 214.23 ms │                         201.13 ms │ +1.07x faster │
│ QQuery 14 │  92.73 ms │                          84.70 ms │ +1.09x faster │
│ QQuery 15 │ 127.15 ms │                         100.18 ms │ +1.27x faster │
│ QQuery 16 │  61.95 ms │                          45.03 ms │ +1.38x faster │
│ QQuery 17 │ 265.32 ms │                         214.97 ms │ +1.23x faster │
│ QQuery 18 │ 306.08 ms │                         276.84 ms │ +1.11x faster │
│ QQuery 19 │ 136.73 ms │                         133.87 ms │     no change │
│ QQuery 20 │ 128.70 ms │                         101.25 ms │ +1.27x faster │
│ QQuery 21 │ 257.48 ms │                         232.27 ms │ +1.11x faster │
│ QQuery 22 │  41.03 ms │                          36.87 ms │ +1.11x faster │
└───────────┴───────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃           ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 3344.12ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 2861.74ms │
│ Average Time (HEAD)                              │  152.01ms │
│ Average Time (hash-join-buffering-on-probe-side) │  130.08ms │
│ Queries Faster                                   │        19 │
│ Queries Slower                                   │         0 │
│ Queries with No Change                           │         3 │
│ Queries with Failure                             │         0 │
└──────────────────────────────────────────────────┴───────────┘

@Dandandan
Contributor

For tpcds, it seems to be mostly speedups, but also some (4x) slowdowns. Is there any way we could avoid those?

@gabotechs
Contributor Author

Interesting idea, do you have some insights on the memory usage vs not doing this "eager execution"?

This definitely has an impact on memory consumption, as it holds record batches in memory until the hash join decides to start consuming them. This is why it's important to put a limit on how much memory is buffered (currently configurable).

With the current setup reported in the benchmarks, it will buffer at most 1MB per partition (configurable with execution.hash_join_buffering_capacity), so the memory footprint is at most ~1MB * execution.target_partitions for each hash join present in the query.
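
To make the bound above concrete, here is a minimal sketch of the arithmetic. The function name `max_buffered_bytes` and the example numbers (16 target partitions, 3 hash joins) are hypothetical and not taken from the PR; it also assumes 1MB means 1024 * 1024 bytes and that every hash join gets exactly one buffer per partition.

```rust
/// Worst-case bytes held by probe-side buffers, assuming one buffer per
/// partition per hash join (an assumption for illustration only).
fn max_buffered_bytes(capacity_per_partition: u64, target_partitions: u64, hash_joins: u64) -> u64 {
    capacity_per_partition * target_partitions * hash_joins
}

fn main() {
    const MB: u64 = 1024 * 1024;
    // Hypothetical query: 1MB capacity, 16 target partitions, 3 hash joins.
    let bound = max_buffered_bytes(MB, 16, 3);
    println!("worst-case buffered memory: {} MB", bound / MB);
}
```

The bound grows linearly with both the partition count and the number of joins, so deep join chains on wide machines are where the capacity setting matters most.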

@gabotechs
Contributor Author

For tpcds, it seems to be mostly speedups, but also some (4x) slowdowns. Is there any way we could avoid those?

I would not put too much trust in the benchmarks I reported in the PR description, for better or worse: I've seen the same query take 300ms or 2500ms depending on whatever else my laptop happens to be doing while the benchmark runs.

I'd like to run the TPC-DS benchmarks using robot Andrew, which I assume runs in a more stable environment than my laptop.

@Dandandan
Contributor

Dandandan commented Jan 13, 2026

Thanks for the explanations!

  • It might make sense to not have a buffer limit per partition, but on a BufferExec level? E.g. a 16MB limit instead of 16 * 1MB

  • I am also wondering whether, in the presence of limits / early exits, more "eager" evaluation might do a large amount of work loading probe-side data that is not needed in the end. Perhaps we can detect this and not apply BufferExec in those cases? Or just load a minimal amount based on the limit?

@Dandandan
Contributor

run benchmark tpcds

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

Benchmark script failed with exit code 1.

Last 10 lines of output:

BRANCH_NAME: HEAD
DATA_DIR: /home/alamb/arrow-datafusion/benchmarks/data
RESULTS_DIR: /home/alamb/arrow-datafusion/benchmarks/results/HEAD
CARGO_COMMAND: cargo run --release
PREFER_HASH_JOIN: true
***************************

Please prepare TPC-DS data first by following instructions:
  ./bench.sh data tpcds

@gabotechs
Contributor Author

It might make sense to not have a buffer limit per partition, but on a BufferExec level? E.g. a 16MB limit instead of 16 * 1MB

That would be easy to do. However, I fear that it could very easily end up in deadlocks. For example, if partition 0 exhausts the whole memory budget, polling any other partition will block until someone pulls something out of partition 0, which might never happen, as whoever could poll partition 0 is too busy being blocked on partition 1.

A healthier behavior IMO would be to have a memory budget per partition and just set the limit lower: rather than having a global 10MB limit, just have a per-partition 1MB limit.
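
The deadlock scenario described above can be sketched with a toy, single-threaded model. This is not how BufferExec is implemented; the `Budget` enum and the `has_room` and `consumer_stalls` functions are hypothetical names, and the model assumes partition 0's producer is scheduled first and buffers eagerly while the consumer is waiting for a batch from partition 1.

```rust
/// Two hypothetical ways of limiting buffered memory.
#[derive(Clone, Copy)]
enum Budget {
    /// One limit shared by all partitions of a node.
    Shared(usize),
    /// An independent limit for each partition.
    PerPartition(usize),
}

/// Returns true if `partition` may buffer one more batch of `batch` bytes.
fn has_room(budget: Budget, buffered: &[usize], partition: usize, batch: usize) -> bool {
    match budget {
        Budget::Shared(limit) => buffered.iter().sum::<usize>() + batch <= limit,
        Budget::PerPartition(limit) => buffered[partition] + batch <= limit,
    }
}

/// Returns true if a consumer waiting on partition 1 can never make progress.
fn consumer_stalls(budget: Budget, partitions: usize, batch: usize) -> bool {
    assert!(partitions >= 2 && batch > 0);
    let mut buffered = vec![0usize; partitions];
    // Partition 0 is scheduled first and buffers until it runs out of room.
    while has_room(budget, &buffered, 0, batch) {
        buffered[0] += batch;
    }
    // The consumer is waiting for a batch from partition 1, so partition 1's
    // producer needs room to buffer at least one batch.
    if has_room(budget, &buffered, 1, batch) {
        buffered[1] += batch;
    }
    // Nothing drains partition 0 in the meantime, so an empty partition 1
    // means the consumer is stuck.
    buffered[1] == 0
}

fn main() {
    const MB: usize = 1024 * 1024;
    println!("shared 16MB stalls: {}", consumer_stalls(Budget::Shared(16 * MB), 16, MB));
    println!("per-partition 1MB stalls: {}", consumer_stalls(Budget::PerPartition(MB), 16, MB));
}
```

In this model the shared budget stalls because partition 0 can consume the entire allowance, whereas a per-partition budget always leaves room for partition 1 regardless of what partition 0 does.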

I am also wondering whether, in the presence of limits / early exits, more "eager" evaluation might do a large amount of work loading probe-side data that is not needed in the end. Perhaps we can detect this and not apply BufferExec in those cases? Or just load a minimal amount based on the limit?

🤔 that's interesting, we might be able to react appropriately to with_fetch() in BufferExec in order to buffer at most fetch rows, as an earlier limit than the configured memory budget.
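
A minimal sketch of what such a combined limit could look like, purely as an illustration: the `BufferLimits` struct and its `can_accept` method are hypothetical and do not exist in this PR or in DataFusion. It assumes the buffer tracks both the bytes it currently holds and the total rows it has pulled from its input so far.

```rust
/// Hypothetical limits for an eager buffer: a memory budget plus an
/// optional row limit pushed down via something like with_fetch().
struct BufferLimits {
    max_bytes: usize,
    fetch: Option<usize>,
}

impl BufferLimits {
    /// Returns true if the buffer should keep pulling batches from its input.
    fn can_accept(&self, buffered_bytes: usize, rows_seen: usize) -> bool {
        let under_memory = buffered_bytes < self.max_bytes;
        // Without a fetch limit, only the memory budget applies.
        let under_fetch = self.fetch.map_or(true, |fetch| rows_seen < fetch);
        under_memory && under_fetch
    }
}

fn main() {
    let limits = BufferLimits { max_bytes: 1024 * 1024, fetch: Some(100) };
    // 100 rows were already pulled, so eager buffering stops even though
    // the memory budget is far from exhausted.
    println!("keep buffering: {}", limits.can_accept(4096, 100));
}
```

Whichever limit is hit first stops the eager work, so a small LIMIT would bound the wasted effort even when the memory budget is generous.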

@Dandandan
Contributor

run benchmark tpcds

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    74.04 ms │                          44.70 ms │ +1.66x faster │
│ QQuery 2  │   211.54 ms │                         117.40 ms │ +1.80x faster │
│ QQuery 3  │   164.11 ms │                         146.09 ms │ +1.12x faster │
│ QQuery 4  │  1876.88 ms │                         816.05 ms │ +2.30x faster │
│ QQuery 5  │   292.98 ms │                         267.92 ms │ +1.09x faster │
│ QQuery 6  │  1426.19 ms │                        1182.01 ms │ +1.21x faster │
│ QQuery 7  │   491.02 ms │                         457.64 ms │ +1.07x faster │
│ QQuery 8  │   176.52 ms │                         163.67 ms │ +1.08x faster │
│ QQuery 9  │   306.59 ms │                         300.29 ms │     no change │
│ QQuery 10 │   177.42 ms │                         113.19 ms │ +1.57x faster │
│ QQuery 11 │  1263.86 ms │                         504.32 ms │ +2.51x faster │
│ QQuery 12 │    69.65 ms │                          53.80 ms │ +1.29x faster │
│ QQuery 13 │   548.27 ms │                         505.42 ms │ +1.08x faster │
│ QQuery 14 │  1907.31 ms │                        1421.21 ms │ +1.34x faster │
│ QQuery 15 │    29.62 ms │                         111.16 ms │  3.75x slower │
│ QQuery 16 │    64.23 ms │                         174.52 ms │  2.72x slower │
│ QQuery 17 │   358.70 ms │                         323.75 ms │ +1.11x faster │
│ QQuery 18 │   194.27 ms │                         331.66 ms │  1.71x slower │
│ QQuery 19 │   231.44 ms │                         205.67 ms │ +1.13x faster │
│ QQuery 20 │    25.35 ms │                          88.86 ms │  3.51x slower │
│ QQuery 21 │    39.23 ms │                         250.06 ms │  6.38x slower │
│ QQuery 22 │   705.13 ms │                         772.32 ms │  1.10x slower │
│ QQuery 23 │  1747.34 ms │                        1704.10 ms │     no change │
│ QQuery 24 │   650.55 ms │                         575.09 ms │ +1.13x faster │
│ QQuery 25 │   516.40 ms │                         378.87 ms │ +1.36x faster │
│ QQuery 26 │   128.02 ms │                         243.55 ms │  1.90x slower │
│ QQuery 27 │   490.69 ms │                         457.35 ms │ +1.07x faster │
│ QQuery 28 │   311.45 ms │                         309.04 ms │     no change │
│ QQuery 29 │   441.28 ms │                         323.29 ms │ +1.36x faster │
│ QQuery 30 │    77.34 ms │                          58.17 ms │ +1.33x faster │
│ QQuery 31 │   314.86 ms │                         227.26 ms │ +1.39x faster │
│ QQuery 32 │    83.81 ms │                          71.98 ms │ +1.16x faster │
│ QQuery 33 │   212.24 ms │                         211.75 ms │     no change │
│ QQuery 34 │   162.21 ms │                         143.61 ms │ +1.13x faster │
│ QQuery 35 │   174.84 ms │                         127.93 ms │ +1.37x faster │
│ QQuery 36 │   288.58 ms │                         288.38 ms │     no change │
│ QQuery 37 │   253.48 ms │                         251.77 ms │     no change │
│ QQuery 38 │   152.25 ms │                         130.66 ms │ +1.17x faster │
│ QQuery 39 │   206.28 ms │                         616.52 ms │  2.99x slower │
│ QQuery 40 │   172.74 ms │                         162.90 ms │ +1.06x faster │
│ QQuery 41 │    22.88 ms │                          21.08 ms │ +1.09x faster │
│ QQuery 42 │   150.01 ms │                         138.01 ms │ +1.09x faster │
│ QQuery 43 │   129.42 ms │                         113.35 ms │ +1.14x faster │
│ QQuery 44 │    28.05 ms │                          26.93 ms │     no change │
│ QQuery 45 │    86.05 ms │                          71.32 ms │ +1.21x faster │
│ QQuery 46 │   323.39 ms │                         293.57 ms │ +1.10x faster │
│ QQuery 47 │  1020.85 ms │                         374.39 ms │ +2.73x faster │
│ QQuery 48 │   402.68 ms │                         378.88 ms │ +1.06x faster │
│ QQuery 49 │   370.82 ms │                         349.24 ms │ +1.06x faster │
│ QQuery 50 │   330.80 ms │                         305.01 ms │ +1.08x faster │
│ QQuery 51 │   300.20 ms │                         244.82 ms │ +1.23x faster │
│ QQuery 52 │   147.72 ms │                         137.81 ms │ +1.07x faster │
│ QQuery 53 │   151.60 ms │                         139.08 ms │ +1.09x faster │
│ QQuery 54 │   227.12 ms │                         189.47 ms │ +1.20x faster │
│ QQuery 55 │   148.87 ms │                         137.51 ms │ +1.08x faster │
│ QQuery 56 │   211.10 ms │                         205.67 ms │     no change │
│ QQuery 57 │   294.17 ms │                         201.04 ms │ +1.46x faster │
│ QQuery 58 │   471.25 ms │                         296.58 ms │ +1.59x faster │
│ QQuery 59 │   292.52 ms │                         244.71 ms │ +1.20x faster │
│ QQuery 60 │   215.60 ms │                         212.67 ms │     no change │
│ QQuery 61 │   244.07 ms │                         253.37 ms │     no change │
│ QQuery 62 │  1302.36 ms │                        1256.77 ms │     no change │
│ QQuery 63 │   154.42 ms │                         140.25 ms │ +1.10x faster │
│ QQuery 64 │  1155.14 ms │                         996.81 ms │ +1.16x faster │
│ QQuery 65 │   356.87 ms │                         176.11 ms │ +2.03x faster │
│ QQuery 66 │   387.01 ms │                         401.11 ms │     no change │
│ QQuery 67 │   533.93 ms │                         516.58 ms │     no change │
│ QQuery 68 │   375.03 ms │                         358.80 ms │     no change │
│ QQuery 69 │   170.85 ms │                         111.88 ms │ +1.53x faster │
│ QQuery 70 │   503.57 ms │                         396.10 ms │ +1.27x faster │
│ QQuery 71 │   190.73 ms │                         187.03 ms │     no change │
│ QQuery 72 │  2044.52 ms │                       11637.03 ms │  5.69x slower │
│ QQuery 73 │   158.34 ms │                         144.26 ms │ +1.10x faster │
│ QQuery 74 │   797.68 ms │                         347.86 ms │ +2.29x faster │
│ QQuery 75 │   408.67 ms │                         391.63 ms │     no change │
│ QQuery 76 │   187.48 ms │                         182.58 ms │     no change │
│ QQuery 77 │   300.57 ms │                         264.04 ms │ +1.14x faster │
│ QQuery 78 │   925.99 ms │                         687.13 ms │ +1.35x faster │
│ QQuery 79 │   326.55 ms │                         303.54 ms │ +1.08x faster │
│ QQuery 80 │   515.45 ms │                         490.71 ms │     no change │
│ QQuery 81 │    53.24 ms │                          43.91 ms │ +1.21x faster │
│ QQuery 82 │   284.29 ms │                         259.03 ms │ +1.10x faster │
│ QQuery 83 │    80.60 ms │                          55.97 ms │ +1.44x faster │
│ QQuery 84 │    69.41 ms │                          52.27 ms │ +1.33x faster │
│ QQuery 85 │   228.06 ms │                         157.50 ms │ +1.45x faster │
│ QQuery 86 │    60.21 ms │                          54.78 ms │ +1.10x faster │
│ QQuery 87 │   151.56 ms │                         127.95 ms │ +1.18x faster │
│ QQuery 88 │   274.28 ms │                         265.11 ms │     no change │
│ QQuery 89 │   170.70 ms │                         149.77 ms │ +1.14x faster │
│ QQuery 90 │    45.84 ms │                          44.39 ms │     no change │
│ QQuery 91 │    93.85 ms │                          70.36 ms │ +1.33x faster │
│ QQuery 92 │    83.75 ms │                          51.22 ms │ +1.64x faster │
│ QQuery 93 │   267.01 ms │                         246.05 ms │ +1.09x faster │
│ QQuery 94 │    93.84 ms │                          84.47 ms │ +1.11x faster │
│ QQuery 95 │   240.79 ms │                         159.30 ms │ +1.51x faster │
│ QQuery 96 │   116.75 ms │                         109.48 ms │ +1.07x faster │
│ QQuery 97 │   186.67 ms │                         149.50 ms │ +1.25x faster │
│ QQuery 98 │   219.37 ms │                         168.59 ms │ +1.30x faster │
│ QQuery 99 │ 14072.57 ms │                       14124.13 ms │     no change │
└───────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 50175.82ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 54332.40ms │
│ Average Time (HEAD)                              │   506.83ms │
│ Average Time (hash-join-buffering-on-probe-side) │   548.81ms │
│ Queries Faster                                   │         69 │
│ Queries Slower                                   │          9 │
│ Queries with No Change                           │         21 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘

@gabotechs
Contributor Author

run benchmark tpcds

@alamb-ghbot

🤖 ./gh_compare_branch.sh gh_compare_branch.sh Running
Linux aal-dev 6.14.0-1018-gcp #19~24.04.1-Ubuntu SMP Wed Sep 24 23:23:09 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Comparing hash-join-buffering-on-probe-side (3e4660b) to 0c5c97b diff using: tpcds
Results will be posted here when complete

@alamb-ghbot

🤖: Benchmark completed

Details

Comparing HEAD and hash-join-buffering-on-probe-side
--------------------
Benchmark tpcds_sf1.json
--------------------
┏━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query     ┃        HEAD ┃ hash-join-buffering-on-probe-side ┃        Change ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1  │    72.13 ms │                          46.54 ms │ +1.55x faster │
│ QQuery 2  │   213.83 ms │                         121.49 ms │ +1.76x faster │
│ QQuery 3  │   162.67 ms │                         144.96 ms │ +1.12x faster │
│ QQuery 4  │  1888.89 ms │                         846.37 ms │ +2.23x faster │
│ QQuery 5  │   286.61 ms │                         276.72 ms │     no change │
│ QQuery 6  │  1472.04 ms │                        1188.47 ms │ +1.24x faster │
│ QQuery 7  │   506.45 ms │                         461.49 ms │ +1.10x faster │
│ QQuery 8  │   179.84 ms │                         163.70 ms │ +1.10x faster │
│ QQuery 9  │   301.56 ms │                         294.23 ms │     no change │
│ QQuery 10 │   173.72 ms │                         119.63 ms │ +1.45x faster │
│ QQuery 11 │  1276.58 ms │                         520.15 ms │ +2.45x faster │
│ QQuery 12 │    70.65 ms │                          54.67 ms │ +1.29x faster │
│ QQuery 13 │   551.59 ms │                         511.33 ms │ +1.08x faster │
│ QQuery 14 │  1914.84 ms │                        1417.35 ms │ +1.35x faster │
│ QQuery 15 │    30.18 ms │                         109.81 ms │  3.64x slower │
│ QQuery 16 │    65.35 ms │                         174.27 ms │  2.67x slower │
│ QQuery 17 │   355.69 ms │                         317.68 ms │ +1.12x faster │
│ QQuery 18 │   195.33 ms │                         322.07 ms │  1.65x slower │
│ QQuery 19 │   230.45 ms │                         205.17 ms │ +1.12x faster │
│ QQuery 20 │    26.41 ms │                          83.97 ms │  3.18x slower │
│ QQuery 21 │    38.42 ms │                         239.44 ms │  6.23x slower │
│ QQuery 22 │   707.02 ms │                         775.70 ms │  1.10x slower │
│ QQuery 23 │  1759.11 ms │                        1702.52 ms │     no change │
│ QQuery 24 │   650.20 ms │                         578.79 ms │ +1.12x faster │
│ QQuery 25 │   520.53 ms │                         375.28 ms │ +1.39x faster │
│ QQuery 26 │   127.14 ms │                         249.25 ms │  1.96x slower │
│ QQuery 27 │   506.20 ms │                         459.81 ms │ +1.10x faster │
│ QQuery 28 │   311.09 ms │                         311.86 ms │     no change │
│ QQuery 29 │   442.52 ms │                         321.71 ms │ +1.38x faster │
│ QQuery 30 │    74.33 ms │                          58.42 ms │ +1.27x faster │
│ QQuery 31 │   322.99 ms │                         230.00 ms │ +1.40x faster │
│ QQuery 32 │    86.09 ms │                          74.36 ms │ +1.16x faster │
│ QQuery 33 │   212.72 ms │                         206.90 ms │     no change │
│ QQuery 34 │   163.62 ms │                         145.24 ms │ +1.13x faster │
│ QQuery 35 │   179.33 ms │                         124.47 ms │ +1.44x faster │
│ QQuery 36 │   289.68 ms │                         285.16 ms │     no change │
│ QQuery 37 │   256.52 ms │                         258.03 ms │     no change │
│ QQuery 38 │   152.83 ms │                         128.51 ms │ +1.19x faster │
│ QQuery 39 │   204.12 ms │                         661.07 ms │  3.24x slower │
│ QQuery 40 │   171.49 ms │                         155.84 ms │ +1.10x faster │
│ QQuery 41 │    23.33 ms │                          21.48 ms │ +1.09x faster │
│ QQuery 42 │   148.54 ms │                         138.65 ms │ +1.07x faster │
│ QQuery 43 │   131.26 ms │                         112.00 ms │ +1.17x faster │
│ QQuery 44 │    29.13 ms │                          27.14 ms │ +1.07x faster │
│ QQuery 45 │    83.45 ms │                          70.32 ms │ +1.19x faster │
│ QQuery 46 │   326.72 ms │                         302.35 ms │ +1.08x faster │
│ QQuery 47 │  1051.17 ms │                         375.59 ms │ +2.80x faster │
│ QQuery 48 │   405.49 ms │                         374.86 ms │ +1.08x faster │
│ QQuery 49 │   371.95 ms │                         351.60 ms │ +1.06x faster │
│ QQuery 50 │   334.58 ms │                         301.09 ms │ +1.11x faster │
│ QQuery 51 │   302.79 ms │                         248.39 ms │ +1.22x faster │
│ QQuery 52 │   150.07 ms │                         140.97 ms │ +1.06x faster │
│ QQuery 53 │   153.47 ms │                         137.04 ms │ +1.12x faster │
│ QQuery 54 │   228.96 ms │                         209.98 ms │ +1.09x faster │
│ QQuery 55 │   149.41 ms │                         140.02 ms │ +1.07x faster │
│ QQuery 56 │   210.53 ms │                         214.56 ms │     no change │
│ QQuery 57 │   298.38 ms │                         201.33 ms │ +1.48x faster │
│ QQuery 58 │   484.00 ms │                         299.61 ms │ +1.62x faster │
│ QQuery 59 │   296.72 ms │                         251.57 ms │ +1.18x faster │
│ QQuery 60 │   216.52 ms │                         212.91 ms │     no change │
│ QQuery 61 │   250.28 ms │                         251.23 ms │     no change │
│ QQuery 62 │  1291.20 ms │                        1272.41 ms │     no change │
│ QQuery 63 │   153.50 ms │                         137.88 ms │ +1.11x faster │
│ QQuery 64 │  1146.28 ms │                        1012.42 ms │ +1.13x faster │
│ QQuery 65 │   361.55 ms │                         183.41 ms │ +1.97x faster │
│ QQuery 66 │   379.94 ms │                         400.16 ms │  1.05x slower │
│ QQuery 67 │   534.98 ms │                         521.33 ms │     no change │
│ QQuery 68 │   382.40 ms │                         364.34 ms │     no change │
│ QQuery 69 │   173.65 ms │                         112.89 ms │ +1.54x faster │
│ QQuery 70 │   506.83 ms │                         384.98 ms │ +1.32x faster │
│ QQuery 71 │   191.13 ms │                         184.23 ms │     no change │
│ QQuery 72 │  2002.45 ms │                        1985.33 ms │     no change │
│ QQuery 73 │   157.75 ms │                         143.58 ms │ +1.10x faster │
│ QQuery 74 │   814.30 ms │                         321.63 ms │ +2.53x faster │
│ QQuery 75 │   412.83 ms │                         382.34 ms │ +1.08x faster │
│ QQuery 76 │   197.15 ms │                         190.73 ms │     no change │
│ QQuery 77 │   296.83 ms │                         265.03 ms │ +1.12x faster │
│ QQuery 78 │   931.69 ms │                         678.51 ms │ +1.37x faster │
│ QQuery 79 │   333.08 ms │                         301.90 ms │ +1.10x faster │
│ QQuery 80 │   512.52 ms │                         490.02 ms │     no change │
│ QQuery 81 │    54.31 ms │                          44.66 ms │ +1.22x faster │
│ QQuery 82 │   283.16 ms │                         260.95 ms │ +1.09x faster │
│ QQuery 83 │    77.92 ms │                          58.65 ms │ +1.33x faster │
│ QQuery 84 │    68.50 ms │                          56.74 ms │ +1.21x faster │
│ QQuery 85 │   228.81 ms │                         164.25 ms │ +1.39x faster │
│ QQuery 86 │    60.12 ms │                          53.81 ms │ +1.12x faster │
│ QQuery 87 │   155.62 ms │                         127.95 ms │ +1.22x faster │
│ QQuery 88 │   274.99 ms │                         263.76 ms │     no change │
│ QQuery 89 │   170.43 ms │                         148.49 ms │ +1.15x faster │
│ QQuery 90 │    46.17 ms │                          43.29 ms │ +1.07x faster │
│ QQuery 91 │    98.30 ms │                          70.32 ms │ +1.40x faster │
│ QQuery 92 │    85.00 ms │                          48.11 ms │ +1.77x faster │
│ QQuery 93 │   266.97 ms │                         247.07 ms │ +1.08x faster │
│ QQuery 94 │    92.65 ms │                          83.55 ms │ +1.11x faster │
│ QQuery 95 │   239.09 ms │                         159.94 ms │ +1.49x faster │
│ QQuery 96 │   116.06 ms │                         111.24 ms │     no change │
│ QQuery 97 │   186.06 ms │                         150.64 ms │ +1.24x faster │
│ QQuery 98 │   221.38 ms │                         165.85 ms │ +1.33x faster │
│ QQuery 99 │ 14219.97 ms │                       14260.64 ms │     no change │
└───────────┴─────────────┴───────────────────────────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary                                ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (HEAD)                                │ 50523.12ms │
│ Total Time (hash-join-buffering-on-probe-side)   │ 44958.12ms │
│ Average Time (HEAD)                              │   510.33ms │
│ Average Time (hash-join-buffering-on-probe-side) │   454.12ms │
│ Queries Faster                                   │         70 │
│ Queries Slower                                   │          9 │
│ Queries with No Change                           │         20 │
│ Queries with Failure                             │          0 │
└──────────────────────────────────────────────────┴────────────┘

@gabotechs gabotechs force-pushed the hash-join-buffering-on-probe-side branch from 3e4660b to cdc6ad1 Compare January 13, 2026 09:51
@gabotechs
Contributor Author

It does seem that some queries get a significant slowdown... I think this needs further investigation.

Labels

common (Related to common crate), core (Core DataFusion crate), datasource (Changes to the datasource crate), documentation (Improvements or additions to documentation), execution (Related to the execution crate), optimizer (Optimizer rules), physical-plan (Changes to the physical-plan crate), proto (Related to proto crate), sqllogictest (SQL Logic Tests (.slt))

3 participants