@forsaken628
Collaborator

@forsaken628 forsaken628 commented Nov 13, 2025

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Allow the new hash join to switch to a nested loop join implementation when the size of the build side is less than the new setting nested_loop_join_threshold. Only inner joins are supported for now.
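The switch described above can be pictured with a minimal sketch. It is not the PR's implementation: the names JoinStrategy, choose_strategy and nested_loop_inner_join are hypothetical, and it assumes the threshold is compared against the build-side row count, so a threshold of 0 never selects the nested loop path.

```rust
/// Join strategy picked once the build side is known (hypothetical type).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum JoinStrategy {
    Hash,
    NestedLoop,
}

/// Assumption: "size" means the build-side row count, and the nested loop
/// path is taken only when that count is strictly below the threshold.
fn choose_strategy(build_rows: usize, nested_loop_join_threshold: usize) -> JoinStrategy {
    if build_rows < nested_loop_join_threshold {
        JoinStrategy::NestedLoop
    } else {
        JoinStrategy::Hash
    }
}

/// Naive nested loop inner join on an integer key: every probe row is
/// compared with every build row, and matching pairs are emitted.
fn nested_loop_inner_join<'a>(
    probe: &[(i64, &'a str)],
    build: &[(i64, &'a str)],
) -> Vec<(&'a str, &'a str)> {
    let mut out = Vec::new();
    for (probe_key, probe_val) in probe {
        for (build_key, build_val) in build {
            if probe_key == build_key {
                out.push((*probe_val, *build_val));
            }
        }
    }
    out
}
```

Under these assumptions, choose_strategy(25, 100) selects the nested loop path, while choose_strategy(25, 0) keeps the hash join.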

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):

@github-actions bot added the pr-feature label (this PR introduces a new feature to the codebase) on Nov 13, 2025
@forsaken628 changed the title from "feat(query): nestedloopjoin" to "feat(query): add nested loop join for new experimental hash join" on Nov 20, 2025
@forsaken628 marked this pull request as ready for review on November 20, 2025 08:35

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

return;
}

if matches!(
Member

Perhaps we can move the branch logic into the nested loop join implementation? I see it only prevents the real finalize build logic. Please let me know if I'm missing any critical information.

@zhang2014 added the ci-benchmark label (Benchmark: run all test) on Nov 23, 2025
@github-actions
Contributor

Docker Image for PR

  • tag: pr-18961-658c957-1763886162

note: this image tag is only available for internal use.

.map(|data_type| data_type.remove_nullable().is_bitmap())
.unwrap_or_default() =>
{
// no function matches signature `eq(Bitmap NULL, Bitmap NULL)
Collaborator Author

Is it reasonable that eq(Bitmap NULL, Bitmap NULL) is only available in joins? @sundy-li

Member

@sundy-li sundy-li Nov 25, 2025

We can add more signatures for bitmap types.

  1. eq(bitmap, bitmap), as an alias for bitmap_count(bitmap_xor(a, b)) = 0
  2. bitmap_and_count, bitmap_or_count, bitmap_xor_count
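The equality rule suggested above can be illustrated with a small sketch: two bitmaps are equal exactly when their XOR has no set bits. This is not Databend's bitmap implementation. It assumes a BTreeSet<u64> of set bit positions as a stand-in for a bitmap value, and the Rust functions below only borrow the names from the comment.

```rust
use std::collections::BTreeSet;

/// Stand-in for a bitmap value: the set of positions whose bit is set.
/// (Assumption for illustration only; a real engine would use a compressed bitmap.)
type Bitmap = BTreeSet<u64>;

/// Number of bits set in exactly one of the two bitmaps.
fn bitmap_xor_count(a: &Bitmap, b: &Bitmap) -> usize {
    a.symmetric_difference(b).count()
}

/// Number of bits set in both bitmaps.
fn bitmap_and_count(a: &Bitmap, b: &Bitmap) -> usize {
    a.intersection(b).count()
}

/// Number of bits set in at least one of the two bitmaps.
fn bitmap_or_count(a: &Bitmap, b: &Bitmap) -> usize {
    a.union(b).count()
}

/// eq(bitmap, bitmap) expressed as bitmap_count(bitmap_xor(a, b)) = 0.
fn bitmap_eq(a: &Bitmap, b: &Bitmap) -> bool {
    bitmap_xor_count(a, b) == 0
}
```

Defining equality through the XOR count keeps the comparison independent of how the bitmap is stored, at the cost of computing the full symmetric difference.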

@forsaken628 added the ci-benchmark-cloud label (Benchmark: run only cloud tests for tpch/hits) and removed the ci-benchmark label (Benchmark: run all test) on Nov 24, 2025
@github-actions
Contributor

Docker Image for PR

  • tag: pr-18961-34d5d36-1763980755

note: this image tag is only available for internal use.

@forsaken628 added the ci-benchmark-load label (Benchmark: run data load test) and removed the ci-benchmark-cloud label (Benchmark: run only cloud tests for tpch/hits) on Nov 26, 2025
@github-actions
Contributor

Docker Image for PR

  • tag: pr-18961-bbf72d7-1764129337

note: this image tag is only available for internal use.

@forsaken628
Collaborator Author

forsaken628 commented Nov 27, 2025

benchmark

version: pr-18961-bbf72d7-1764129337
dataset: tpch_sf_100
table supplier rows: 1000000

SETTINGS (
enable_experimental_new_join=1,
nested_loop_join_threshold=0
)
select    s.*,n.n_name
from      supplier s
join      nation n on s.s_nationkey = n.n_nationkey IGNORE_RESULT;

duration: 687ms

SETTINGS (
enable_experimental_new_join=1,
nested_loop_join_threshold=100
)
select    s.*,n.n_name
from      supplier s
join      nation n on s.s_nationkey = n.n_nationkey IGNORE_RESULT;

duration: 834ms

Labels

  • ci-benchmark-load (Benchmark: run data load test)
  • pr-feature (this PR introduces a new feature to the codebase)

3 participants