Compute Dynamic Filters only when a consumer supports them #19546

LiaCastaneda · 2025-12-29T13:16:51Z

Which issue does this PR close?

Rationale for this change

Currently, DataFusion computes bounds for every query that contains a HashJoinExec node whenever the enable_dynamic_filter_pushdown option is set to true (the default). It makes more sense to compute these bounds only when we know there is a consumer that will use them.

What changes are included in this PR?

As suggested in #17527 (comment), this PR adds an is_used() method to DynamicFilterPhysicalExpr that checks if any consumers are holding a reference to the filter using Arc::strong_count().

During filter pushdown, any consumer that accepts the filter and uses it later in execution has to retain a reference to the filter's Arc. Scan nodes such as ParquetSource are one example.
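The reference-counting idea can be sketched roughly as follows. This is a minimal illustration, not the DataFusion implementation: `DynamicFilter`, `share`, and the `String` payload are hypothetical stand-ins, and only `Arc::strong_count` is a real standard-library call.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a dynamic filter (not the DataFusion type).
/// The producer, e.g. a hash join, owns one handle; each consumer that
/// accepts the filter during pushdown keeps its own handle.
struct DynamicFilter {
    /// Shared state. Every handle given to a consumer bumps the strong count.
    inner: Arc<String>,
}

impl DynamicFilter {
    /// Create the producer's handle with a placeholder predicate.
    fn new() -> Self {
        DynamicFilter { inner: Arc::new(String::from("true")) }
    }

    /// Hand out another handle that shares the same inner state.
    /// Assumption: this models a consumer retaining the pushed-down filter.
    fn share(&self) -> DynamicFilter {
        DynamicFilter { inner: Arc::clone(&self.inner) }
    }

    /// A count above 1 means someone other than the producer holds a reference.
    fn is_used(&self) -> bool {
        Arc::strong_count(&self.inner) > 1
    }
}
```

With this shape, a producer that never had its filter accepted sees `is_used()` return false, and the answer flips back to false once the last consumer handle is dropped.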

Are these changes tested?

I added a unit test in dynamic_filters.rs (test_is_used) that verifies the Arc reference counting behavior.
Existing integration tests in datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs validate the end-to-end behavior. These tests verify that dynamic filters are computed and filled when consumers are present.

Are there any user-facing changes?

A new is_used() method on DynamicFilterPhysicalExpr.

LiaCastaneda · 2025-12-29T14:14:16Z

This is another (desired) alternative to #18938
cc @adriangb this PR implements the is_used approach.

adriangb

I like this! The advantages over #19387 are:

No API change / breaking changes
Less code churn for us and users
Complexity is contained within dynamic filters and even there within producers
Should work for distributed systems (whatever is broadcasting updates to filters will also need to hold onto a reference to the dynamic filter)

This also means that if we run into issues with this approach it's easy to back out of 😄

Is there any way we can add a test showing that if there are no downstream consumers we don't compute the filters?

LiaCastaneda · 2025-12-29T15:22:24Z

Is there any way we can add a test showing that if there are no downstream consumers we don't compute the filters?

I added test_hashjoin_dynamic_filter_pushdown_not_used, which creates a TestScanNode with support == false on the probe side while enable_dynamic_filter_pushdown is enabled, so that this variable

let enable_dynamic_filter_pushdown = context
    .session_config()
    .options()
    .optimizer
    .enable_join_dynamic_filter_pushdown
    && self
        .dynamic_filter
        .as_ref()
        .map(|df| df.filter.is_used())
        .unwrap_or(false);

still evaluates to false even though the config option is enabled.
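The gating logic in the snippet above can be exercised in isolation with a small sketch. Everything here is an assumption made for illustration: `FilterHandle` and `should_compute_bounds` are hypothetical names, and the boolean argument stands in for the session config option.

```rust
use std::sync::Arc;

/// Hypothetical filter handle: the join keeps one, consumers clone it.
#[derive(Clone)]
struct FilterHandle(Arc<()>);

impl FilterHandle {
    /// Create the producer-side handle.
    fn new() -> Self {
        FilterHandle(Arc::new(()))
    }

    /// True when at least one handle besides this one is alive.
    fn is_used(&self) -> bool {
        Arc::strong_count(&self.0) > 1
    }
}

/// Compute bounds only when the option is on AND a consumer holds the filter.
/// A join without a dynamic filter (None) never computes bounds.
fn should_compute_bounds(config_enabled: bool, filter: Option<&FilterHandle>) -> bool {
    config_enabled && filter.is_some_and(|f| f.is_used())
}
```

The point of the `&&` is that the config option alone is no longer enough: a probe side that rejects the filter leaves the count at 1, so the join skips the work.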

datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs

adriangb

Maybe append to the same test the positive case (probe side does support pushdown, is_used is true) just to prove the point? Could even be in a loop to avoid code duplication.

LiaCastaneda · 2025-12-29T17:00:16Z

datafusion/physical-plan/src/joins/hash_join/exec.rs

+        let _consumer = Arc::clone(&dynamic_filter)
+            .with_new_children(vec![])
+            .unwrap();


I had to add a consumer in these tests; otherwise is_used returns false, no filters are computed, and wait_complete never returns. I will also add an is_used check inside wait_complete. I can't imagine this ever happening (unless wait_complete is called on a probe node that does not accept dynamic filters, which would be incorrect usage), but it's worth adding just in case.
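A rough sketch of why the guard matters, under loud assumptions: the type below is hypothetical, and it blocks on a `Condvar` purely to keep the example standard-library only, whereas the method discussed in this PR is not shown here and may look quite different. The `bool` return value is also an addition for illustration.

```rust
use std::sync::{Arc, Condvar, Mutex};

/// Hypothetical shared state: a completion flag plus a signal to wake waiters.
struct Shared {
    complete: Mutex<bool>,
    signal: Condvar,
}

/// Hypothetical filter handle (not the DataFusion type).
#[derive(Clone)]
struct Filter {
    inner: Arc<Shared>,
}

impl Filter {
    fn new() -> Self {
        Filter {
            inner: Arc::new(Shared { complete: Mutex::new(false), signal: Condvar::new() }),
        }
    }

    /// True when more than one handle to the shared state exists.
    fn is_used(&self) -> bool {
        Arc::strong_count(&self.inner) > 1
    }

    /// Called by the producer once the filter has been filled in.
    fn mark_complete(&self) {
        *self.inner.complete.lock().unwrap() = true;
        self.inner.signal.notify_all();
    }

    /// Returns false immediately if nobody else holds the filter, because
    /// in that case no producer will ever call mark_complete.
    /// Otherwise blocks until completion and returns true.
    fn wait_complete(&self) -> bool {
        if !self.is_used() {
            return false;
        }
        let mut done = self.inner.complete.lock().unwrap();
        while !*done {
            done = self.inner.signal.wait(done).unwrap();
        }
        true
    }
}
```

Without the early return, a lone handle would wait forever on a flag that nothing is going to set, which is the hang described above.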

adriangb · 2025-12-31T01:06:19Z

@LiaCastaneda thank you! Maybe a nice follow up would be to split up the CASE structure so that each dynamic piece is its own unit and can be computed independently, and maybe remove the barrier? I can write an issue to describe.

LiaCastaneda · 2025-12-31T08:59:31Z

By computed independently and removing the barrier, do you mean computing and emitting each filter for each partition progressively?

adriangb · 2025-12-31T13:34:06Z

I opened #19580 😄
There is also #16973 which may interest you.

LiaCastaneda · 2025-12-31T15:08:16Z

Nice! I will take a look.

LiaCastaneda added 2 commits December 29, 2025 12:14

Compute Dynamic Filters only when a consumer supports them

e116a9e

use strong_count of inner struct instead

e6130c8

github-actions bot added the physical-expr (Changes to the physical-expr crates) and physical-plan (Changes to the physical-plan crate) labels Dec 29, 2025

Adjust unit test

1097b7d

LiaCastaneda marked this pull request as ready for review December 29, 2025 14:14

adriangb approved these changes Dec 29, 2025

add integration test

061d2b1

github-actions bot added the core (Core DataFusion crate) label Dec 29, 2025

Fix cargo doc

c33c073

adriangb reviewed Dec 29, 2025

datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs (outdated)

Change testing approach

bb1db12

adriangb approved these changes Dec 29, 2025

LiaCastaneda added 3 commits December 29, 2025 17:13

test when is_used should be true and false

392b7c9

clippy

0ef0893

Adjust test in exec.rs

95581d3

LiaCastaneda commented Dec 29, 2025

LiaCastaneda added 2 commits December 29, 2025 18:16

Add is_used check inside wait_complete

a035dd1

test name

c1712bf

adriangb added this pull request to the merge queue Dec 31, 2025

Merged via the queue into apache:main with commit f1e5c94 Dec 31, 2025
32 checks passed

LiaCastaneda mentioned this pull request Dec 31, 2025

Compute Dynamic Filters only when a consumer supports them #18938

Closed
