fix(rust/sedona-spatial-join): prevent filter pushdown past KNN joins#611

Merged: Kontinuation merged 5 commits into apache:main from Kontinuation:dont-pushdown-filters-for-knnjoin on Feb 18, 2026

@Kontinuation Kontinuation commented Feb 13, 2026

Summary

KNN joins have different semantics than regular spatial joins — pushing filters to the object (build) side changes which objects are the k nearest neighbors, producing incorrect results. DataFusion's builtin PushDownFilter optimizer rule doesn't know this and incorrectly pushes filters through KNN joins.
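To make the semantic difference concrete, here is a minimal sketch on toy 1-D data. The `Obj` type, the `category` attribute, and the sample points are hypothetical and are not taken from the PR; the sketch only shows that filtering the object side before taking the k nearest neighbors is not the same as filtering afterwards.

```rust
/// Toy object-side row: a position on a line plus an attribute to filter on.
#[derive(Debug, Clone, PartialEq)]
struct Obj {
    id: u32,
    x: f64,
    category: &'static str,
}

/// Hypothetical sample data for the object (build) side.
fn objects() -> Vec<Obj> {
    vec![
        Obj { id: 1, x: 1.0, category: "bar" },
        Obj { id: 2, x: 2.0, category: "cafe" },
        Obj { id: 3, x: 3.0, category: "cafe" },
        Obj { id: 4, x: 4.0, category: "bar" },
    ]
}

/// Return the k objects nearest to `query_x`, nearest first.
fn knn(query_x: f64, objects: &[Obj], k: usize) -> Vec<Obj> {
    let mut sorted = objects.to_vec();
    sorted.sort_by(|a, b| (a.x - query_x).abs().total_cmp(&(b.x - query_x).abs()));
    sorted.truncate(k);
    sorted
}

/// Intended semantics: find the 2 nearest objects, then keep the cafes.
fn knn_then_filter() -> Vec<u32> {
    knn(0.0, &objects(), 2)
        .into_iter()
        .filter(|o| o.category == "cafe")
        .map(|o| o.id)
        .collect()
}

/// What an incorrect pushdown computes: keep the cafes, then find the 2 nearest.
fn filter_then_knn() -> Vec<u32> {
    let cafes: Vec<Obj> = objects()
        .into_iter()
        .filter(|o| o.category == "cafe")
        .collect();
    knn(0.0, &cafes, 2).into_iter().map(|o| o.id).collect()
}
```

On this data, `knn_then_filter()` returns `[2]` while `filter_then_knn()` returns `[2, 3]`: pushing the filter below the KNN step changes the neighbor set.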

This PR adds a KnnJoinEarlyRewrite optimizer rule that converts KNN joins to SpatialJoinPlanNode extension nodes before DataFusion's PushDownFilter rule runs. Extension nodes naturally block filter pushdown via prevent_predicate_push_down_columns() returning all columns.
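The blocking behaviour described above can be modelled without DataFusion. The following is a simplified sketch, not DataFusion's implementation: `Predicate`, `all_columns`, and `partition_predicates` are hypothetical names, and the rule "a predicate stays above the node if it references any prevented column" is an assumption about how the prevented-column set is used.

```rust
use std::collections::HashSet;

/// Toy predicate: a label plus the names of the columns it references.
struct Predicate {
    label: &'static str,
    columns: Vec<&'static str>,
}

/// Models a node that prevents pushdown on every column of its schema.
fn all_columns(schema: &[&str]) -> HashSet<String> {
    schema.iter().map(|c| c.to_string()).collect()
}

/// Split predicates into (pushed below the node, kept above the node).
/// A predicate is kept above if it touches any prevented column.
fn partition_predicates(
    predicates: &[Predicate],
    prevented: &HashSet<String>,
) -> (Vec<&'static str>, Vec<&'static str>) {
    let mut pushed = Vec::new();
    let mut kept = Vec::new();
    for p in predicates {
        if p.columns.iter().any(|c| prevented.contains(*c)) {
            kept.push(p.label);
        } else {
            pushed.push(p.label);
        }
    }
    (pushed, kept)
}
```

With `prevented = all_columns(&["l.geom", "r.geom", "r.category"])`, a predicate on `r.category` lands in the kept list, so it stays above the node. With an empty prevented set the same predicate would be pushed down.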

Changes

  • New KnnJoinEarlyRewrite optimizer rule — handles two patterns:
    1. Join(filter=ST_KNN(...)) — when the ON clause has only the spatial predicate
    2. Filter(ST_KNN(...), Join(on=[...])) — when the ON clause also has equi-join conditions (DataFusion's SQL planner separates these)
  • Positional rule insertion: MergeSpatialProjectionIntoJoin and KnnJoinEarlyRewrite are inserted before PushDownFilter, while SpatialJoinLogicalRewrite (for non-KNN joins) remains after it, so non-KNN joins still benefit from filter pushdown
  • Updated SpatialJoinLogicalRewrite — skips KNN joins (already handled by the early rewrite)
  • Integration tests verifying that object-side filters are NOT pushed down for KNN joins, but ARE pushed down for non-KNN spatial joins
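The two plan shapes handled by the early rewrite can be sketched with a toy plan type. This is an illustration under stated assumptions, not the code in optimizer.rs: `Plan`, `Expr`, `KnnExtension`, and `early_rewrite` are hypothetical stand-ins, the sketch only inspects the root of the plan, and carrying the equi-join keys on the extension node is an assumption.

```rust
/// Toy expression: just enough to tell a KNN predicate from anything else.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    /// Stands in for ST_KNN(left.geom, right.geom, k).
    Knn { k: usize },
    /// Stands in for any non-KNN predicate, e.g. ST_Intersects(...).
    Other(&'static str),
}

type JoinKeys = Vec<(&'static str, &'static str)>;

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Scan(&'static str),
    Join { left: Box<Plan>, right: Box<Plan>, on: JoinKeys, filter: Option<Expr> },
    Filter { predicate: Expr, input: Box<Plan> },
    /// Stands in for the SpatialJoinPlanNode extension node.
    KnnExtension { left: Box<Plan>, right: Box<Plan>, on: JoinKeys, knn: Expr },
}

fn is_knn(expr: &Expr) -> bool {
    matches!(expr, Expr::Knn { .. })
}

/// Rewrite the root of `plan` if it matches one of the two KNN patterns.
fn early_rewrite(plan: Plan) -> Plan {
    match plan {
        // Pattern 1: Join(filter=ST_KNN(...)).
        Plan::Join { left, right, on, filter: Some(knn) } if is_knn(&knn) => {
            Plan::KnnExtension { left, right, on, knn }
        }
        // Pattern 2: Filter(ST_KNN(...), Join(on=[...])).
        Plan::Filter { predicate, input } if is_knn(&predicate) => match *input {
            Plan::Join { left, right, on, filter: None } => {
                Plan::KnnExtension { left, right, on, knn: predicate }
            }
            // Not a plain join underneath: leave the filter where it is.
            other => Plan::Filter { predicate, input: Box::new(other) },
        },
        // Everything else (including non-KNN spatial joins) is left untouched.
        other => other,
    }
}
```

In this model a join whose filter is `Expr::Other("ST_Intersects")` passes through unchanged, which mirrors the point that non-KNN joins are left for the later rewrite and can still benefit from filter pushdown.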

Rule ordering

... → MergeSpatialProjectionIntoJoin → KnnJoinEarlyRewrite → PushDownFilter → ... → SpatialJoinLogicalRewrite
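The positional insertion can be sketched on a plain list of rule names. This is a toy model: the strings are labels rather than real optimizer rule objects, `insert_before` and `build_rule_order` are hypothetical helpers, and the fallback used when the anchor rule is missing is a choice made for this sketch only.

```rust
/// Insert `new_rules` immediately before the rule named `anchor`, keeping their order.
/// Returns false and leaves `rules` untouched if the anchor is not present.
fn insert_before(rules: &mut Vec<String>, anchor: &str, new_rules: &[&str]) -> bool {
    match rules.iter().position(|r| r.as_str() == anchor) {
        Some(idx) => {
            for (offset, rule) in new_rules.iter().enumerate() {
                rules.insert(idx + offset, rule.to_string());
            }
            true
        }
        None => false,
    }
}

/// Build the rule order described above from a list of default rule names.
fn build_rule_order(default_rules: &[&str]) -> Vec<String> {
    let early = ["MergeSpatialProjectionIntoJoin", "KnnJoinEarlyRewrite"];
    let mut rules: Vec<String> = default_rules.iter().map(|r| r.to_string()).collect();
    if !insert_before(&mut rules, "PushDownFilter", &early) {
        // Sketch-only fallback: without the anchor, run the early rules last.
        rules.extend(early.iter().map(|r| r.to_string()));
    }
    // The non-KNN rewrite stays at the end, after filter pushdown has run.
    rules.push("SpatialJoinLogicalRewrite".to_string());
    rules
}
```

Given the hypothetical defaults `["EarlierRule", "PushDownFilter", "LaterRule"]`, the result places the two early rules directly before `PushDownFilter` and `SpatialJoinLogicalRewrite` at the very end.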

Follow-ups

  1. We don't enforce that ST_KNN appears first in a chain of AND expressions. For instance, ST_KNN(L.geom, R.geom, 5) AND L.id = R.id has the same semantics as L.id = R.id AND ST_KNN(L.geom, R.geom, 5), which seems unnatural. An optimizer rule does not seem to be a good place to enforce this, so we leave it to future patches that work on the SQL parser and ASTs.
  2. We don't allow any filter pushdown for ST_KNN for now. Filters applied to the query side could actually be pushed down without any problem, but we need to implement such rules ourselves in future patches.
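The claim in the second follow-up, that query-side filters are safe to push down, can be checked on toy data. The sketch below is an assumption-laden illustration: `Point`, `knn_join`, and the sample coordinates are hypothetical, and the KNN join is modelled as "for each query point, emit its k nearest objects".

```rust
/// Toy row with an id and a position on a line.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    id: u32,
    x: f64,
}

/// For each query point, emit (query id, object id) for its k nearest objects.
fn knn_join(queries: &[Point], objects: &[Point], k: usize) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    for q in queries {
        let mut sorted = objects.to_vec();
        sorted.sort_by(|a, b| (a.x - q.x).abs().total_cmp(&(b.x - q.x).abs()));
        out.extend(sorted.iter().take(k).map(|o| (q.id, o.id)));
    }
    out
}

/// Filter the query side first, then join (the pushed-down form).
fn filter_then_join(queries: &[Point], objects: &[Point], keep_id: u32) -> Vec<(u32, u32)> {
    let kept: Vec<Point> = queries.iter().copied().filter(|q| q.id == keep_id).collect();
    knn_join(&kept, objects, 2)
}

/// Join first, then filter on the query-side column (the original form).
fn join_then_filter(queries: &[Point], objects: &[Point], keep_id: u32) -> Vec<(u32, u32)> {
    knn_join(queries, objects, 2)
        .into_iter()
        .filter(|(query_id, _)| *query_id == keep_id)
        .collect()
}
```

Because each query row's neighbors are computed independently of the other query rows, dropping query rows before or after the join gives the same pairs. The object side has no such independence, which is why only query-side pushdown is a candidate for a future rule.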

TODO

The prevent_predicate_push_down_columns method seems to do the trick. I'll experiment with it. Hopefully we can implement query-side filter pushdown easily.
UPDATE: No. It is a terrible idea. There's no shortcut. We have to implement the optimization rule ourselves.

Closes #605

KNN joins have different semantics than regular spatial joins — pushing
filters to the object (build) side changes which objects are the k
nearest neighbors, producing incorrect results.

Add KnnJoinEarlyRewrite optimizer rule that converts KNN joins to
SpatialJoinPlanNode extension nodes before DataFusion's PushDownFilter
rule runs, since extension nodes naturally block filter pushdown via
prevent_predicate_push_down_columns().

Rule ordering: MergeSpatialProjectionIntoJoin → KnnJoinEarlyRewrite →
PushDownFilter → ... → SpatialJoinLogicalRewrite (for non-KNN joins).

Closes apache#605
@Kontinuation Kontinuation force-pushed the dont-pushdown-filters-for-knnjoin branch from cde2b7f to d6bc0d7 Compare February 13, 2026 13:54
Comment on lines 109 to 116
fn necessary_children_exprs(&self, _output_columns: &[usize]) -> Option<Vec<Vec<usize>>> {
    // Request all columns from both children. This ensures the optimizer
    // recurses into children while preserving all columns needed by the
    // join filter and output schema.
    let left_indices: Vec<usize> = (0..self.left.schema().fields().len()).collect();
    let right_indices: Vec<usize> = (0..self.right.schema().fields().len()).collect();
    Some(vec![left_indices, right_indices])
}

This is for working around a bug in DataFusion. I'll submit a patch later.

Copilot AI left a comment

Pull request overview

Adds an early logical optimizer rewrite to ensure KNN joins are converted into SpatialJoinPlanNode extension nodes before DataFusion’s PushDownFilter runs, preventing incorrect filter pushdown onto the KNN build side.

Changes:

  • Insert MergeSpatialFilterIntoJoin + new KnnJoinEarlyRewrite before PushDownFilter, and run SpatialJoinLogicalRewrite after it for non-KNN joins.
  • Remove is_spatial_predicate and update predicate-name collection tests accordingly.
  • Add integration tests asserting object-side filter pushdown is blocked for KNN joins but allowed for non-KNN spatial joins; add necessary_children_exprs to the extension node.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| rust/sedona-spatial-join/src/planner/optimizer.rs | Adds KnnJoinEarlyRewrite, reorders optimizer rules around PushDownFilter, refactors join→extension rewrite helper. |
| rust/sedona-spatial-join/src/planner/logical_plan_node.rs | Adds necessary_children_exprs implementation for SpatialJoinPlanNode. |
| rust/sedona-spatial-join/src/planner/spatial_expr_utils.rs | Removes is_spatial_predicate and updates tests to validate collect_spatial_predicate_names. |
| rust/sedona-spatial-join/tests/spatial_join_integration.rs | Expands KNN filter correctness coverage and adds physical-plan assertions for filter pushdown behavior. |

@Kontinuation Kontinuation force-pushed the dont-pushdown-filters-for-knnjoin branch from e8c0c85 to b4381c3 Compare February 17, 2026 07:43

@paleolimbot paleolimbot left a comment

Thank you!

@Kontinuation Kontinuation merged commit 55b822f into apache:main Feb 18, 2026
17 checks passed
@paleolimbot paleolimbot added this to the 0.3.0 milestone Feb 19, 2026
Development

Successfully merging this pull request may close these issues.

Improve clarify of filter after KNN join
