
Make filter pushdown API more precise for their purpose #18856

@2010YOUY01

Is your feature request related to a problem or challenge?

Original PR discussion: #18644 (comment)

Background

  • Background on dynamic filters: https://datafusion.apache.org/blog/2025/09/10/dynamic-filters/

  • Background on parquet filter pushdown: Parquet accepts a pushed-down filter for two purposes: 1) pruning using statistics (e.g. skipping row groups), and 2) row-level filtering for late materialization

  • Current APIs TLDR:

    • When pushing down predicates, we don't distinguish whether a predicate is meant for stats pruning only or for both stats pruning and row-level filtering, so the parquet operator tries to apply both (unless turned off by an option). Reference:
      fn gather_filters_for_pushdown(
    • When a child reports the pushdown result back in the optimizer pass, it answers either Yes or No (I think the current semantics are Yes -> stat+row, No -> maybe-stat)
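
To make the ambiguity concrete, here is a minimal sketch of the two-state reply as I read it above. `Reply`, `ParentView`, and `interpret` are hypothetical names for illustration only, not the actual DataFusion types, and the Yes/No interpretation is the tentative one stated in the bullet.

```rust
/// Hypothetical stand-in for the two-state reply a child gives today.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reply {
    Yes,
    No,
}

/// What the parent can conclude from a reply (assumed reading, not confirmed).
#[derive(Debug, PartialEq, Eq)]
struct ParentView {
    /// The parent still has to evaluate the predicate itself.
    must_keep_filter: bool,
    /// The parent knows for certain whether stats pruning will happen.
    stats_pruning_known: bool,
}

fn interpret(reply: Reply) -> ParentView {
    match reply {
        // Yes -> stat+row: the child applies the predicate at row level too.
        Reply::Yes => ParentView {
            must_keep_filter: false,
            stats_pruning_known: true,
        },
        // No -> maybe-stat: the child may or may not still prune with it.
        Reply::No => ParentView {
            must_keep_filter: true,
            stats_pruning_known: false,
        },
    }
}
```

With only two states, a No reply cannot tell the parent whether the predicate is still being used for pruning.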

Describe the solution you'd like

When an ExecutionPlan pushes down a predicate, it should be able to indicate its intent, e.g. "this predicate is only for stats pruning". The reason is that some predicates might be too expensive to be worth evaluating at row level.

When the child propagates the pushdown result back, it should also indicate how it accepted the filter (only for stats pruning, or for both stats pruning and row-level filtering), so that the node that produced the predicate can respond accordingly.
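
A rough sketch of what the two directions could look like, just to anchor the discussion. All names here (`PushdownIntent`, `Acceptance`, `SourceCaps`, `child_accepts`, `parent_must_keep_filter`) are hypothetical and not part of the existing API; the real design may look quite different.

```rust
/// Hypothetical: what the parent asks for when pushing a predicate down.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PushdownIntent {
    /// Use the predicate for stats pruning only (too costly per row).
    StatsOnly,
    /// Use the predicate for stats pruning and row-level filtering.
    StatsAndRows,
}

/// Hypothetical: how the child reports that it accepted the predicate.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Acceptance {
    Rejected,
    StatsOnly,
    StatsAndRows,
}

/// Hypothetical capabilities of a toy source node.
#[derive(Debug, Clone, Copy)]
struct SourceCaps {
    stats_pruning: bool,
    row_filtering: bool,
}

/// The child never does more than the parent asked for.
fn child_accepts(intent: PushdownIntent, caps: SourceCaps) -> Acceptance {
    match (intent, caps.stats_pruning, caps.row_filtering) {
        (PushdownIntent::StatsAndRows, true, true) => Acceptance::StatsAndRows,
        (_, true, _) => Acceptance::StatsOnly,
        _ => Acceptance::Rejected,
    }
}

/// The parent may drop its own filter only if rows are filtered below it.
fn parent_must_keep_filter(acceptance: Acceptance) -> bool {
    acceptance != Acceptance::StatsAndRows
}
```

In this sketch an expensive predicate pushed with `StatsOnly` is never evaluated per row by the source, and the parent can tell from the reply whether it still has to filter rows itself.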

Describe alternatives you've considered

No response

Additional context

No response

Labels: enhancement
