-
Notifications
You must be signed in to change notification settings - Fork 1.6k
Add JoinType preservation helpers and dynamic_filter_side
; enable dynamic filter pushdown in HashJoinExec
#17518
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kosiew
wants to merge
45
commits into
apache:main
Choose a base branch
from
kosiew:df-join-metadata-16973
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
45 commits
Select commit
Hold shift + click to select a range
7e894be
Dynamic Filter and Join Handling Refactor Summary
kosiew 37c95a4
Dynamic Filter Pushdown Enhancements
kosiew 02f2d00
Refactor Dynamic Filter Handling in `HashJoinExec`
kosiew 24d2fea
Refactor dynamic filter side determination in `JoinType` for clarity …
kosiew 5116377
Enhance dynamic filter handling in HashJoinExec to support left and r…
kosiew 1f4a413
Remove unnecessary JoinType imports from dynamic filter pushdown tests
kosiew b3b7a30
Refactor dynamic filter tests to use a helper function for scan creation
kosiew 190270d
Refactor dynamic filter pushdown tests: remove unused CoalescePartiti…
kosiew 459abf4
Enhance dynamic filter pushdown logic in HashJoinExec and SharedBound…
kosiew 36d1a61
Refactor dynamic filter handling in HashJoinExec: enforce specified j…
kosiew 1c40186
Implement `preserves` method in `JoinType` to enhance join side prese…
kosiew 581cbcd
Add test for dynamic filter pushdown in partitioned HashJoinExec
kosiew 12ac262
Add tests for dynamic filter pushdown in right and left mark joins in…
kosiew 50e8e17
Revert "Add tests for dynamic filter pushdown in right and left mark …
kosiew 7dcdfdc
Add LEFT_PRESERVING and RIGHT_PRESERVING constants for join type pres…
kosiew f744fc3
Add preservation checks for LeftAnti and RightAnti join types
kosiew 5755a1a
Enhance documentation for dynamic_filter_side to clarify behavior wit…
kosiew 500a32b
Refactor imports in filter_pushdown tests for clarity and consistency
kosiew 9764539
Remove unused dynamic filter references from HashJoinExec and HashJoi…
kosiew d9384f8
Refactor dynamic filter expression handling in HashJoinExec for clari…
kosiew 29c6d63
Enhance dynamic filter pushdown in HashJoinExec by adding CoalescePar…
kosiew d2722c1
Restore main test_hashjoin_dynamic_filter_pushdown_partitioned
kosiew 2bf7dc4
Add test for nested hash join dynamic filter pushdown
kosiew 4d14bc1
Add comments for tests
kosiew b4f9701
Implement dynamic filter pushdown support for HashJoinExec
kosiew 411a459
Refactor filter pushdown tests to utilize helper functions for buildi…
kosiew 81145aa
Enhance dynamic filter selection logic for JoinType to accurately ref…
kosiew 434ad72
Refactor dynamic_filter_side logic to improve join type handling and …
kosiew 688ba1f
Fix dynamic filter creation to return an error for unspecified join s…
kosiew b814c22
Clarify documentation for dynamic filter pushdown eligibility in Join…
kosiew 5c499d8
Add probe-side bounds accumulation for dynamic filter pushdown in Has…
kosiew c353d5e
Add test for dynamic filter bounds accumulation in HashJoinExec
kosiew 5a51266
Add comment to explain waits_for_all_partitions_before_updating
kosiew ea03b6d
Implement dynamic filter handling for FULL joins and add test for ski…
kosiew b6dc5ea
Fix import path for MaxAccumulator and MinAccumulator in HashJoinStream
kosiew 0782214
Merge branch 'main' into df-join-metadata-16973
kosiew 61c33b5
Fix clippy error
kosiew fcca244
Clarify comment on dynamic filter pushdown eligibility in JoinType im…
kosiew f5f1fc3
Fix clippy error
kosiew 39390be
Refactor TestMemoryExec initialization to use schema.clone() for clarity
kosiew 6cf11d5
Fix clippy error
kosiew 5cfb95c
Merge branch 'main' into df-join-metadata-16973
kosiew d67f5db
Add dynamic filter side preference tests for join types
kosiew 238a57e
Refactor dynamic filter side logic in join types and update related t…
kosiew 990d940
Clarify dynamic filter behavior for semi/anti joins in pushdown logic
kosiew File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about Semi and Anti joins?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Semi and Anti joins purposely aren’t listed in LEFT_PRESERVING: they only return the subset of left rows that either match (Semi) or don’t match (Anti), so they don’t preserve all left input rows. The right-side analogue has the same behaviour, which is why only the outer/mark variants appear in those preservation tables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. Sorry about that. Given I think this logic is trying to figure out when to push down predicates, I was thinking it should also be considering SEMI and ANTI joins.
For example, I was thinking about these two joins
In both cases, I believe it is correct to push the predicate on
a.x
below the join on thea
side. However, also in both cases I don't think it is correct to push theb.y
predicate below the join:For the
LEFT JOIN
pushingb.y
can re-introduce rows froma
that should be filtered after the join. Same thing forANTI JOIN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for double-checking! In this file the LEFT_PRESERVING array is meant to capture only the join variants that are guaranteed to emit every left row at least once—i.e. the outer and mark joins. Semi/anti joins intentionally aren’t listed because they drop rows from their respective inputs, so they don’t satisfy that preservation property.
That distinction matters a few lines later when we decide which child can safely receive a dynamic filter. If we marked LeftSemi/LeftAnti as left-preserving, dynamic_filter_pushdown_side would classify them the same way as a left outer join and start attaching the dynamic filter to the right input—the exact misbehaviour you’re warning about for predicates that reference b.y. Because they remain non-preserving, those join types fall through to the (false, false) arm and we keep the dynamic filter on the left side only, which means predicates like a.x < 5 are still pushed while b.y < 10 is not.
I’ll add a short comment in the code/tests to spell this out so it’s harder to miss in the future.