feat(rust/sedona-spatial-join): Support partitioned KNN join to handle larger than memory object side #573

Open
Kontinuation wants to merge 3 commits into apache:main from Kontinuation:pr-partitioned-knnj

@Kontinuation
Member

This patch breaks the object side of KNN(query, object, K) into smaller partitions and spills the results of the KNN queries to a set of spill files. For each partition, we merge its KNN query result with the nearest-so-far results obtained from the previous partitions. The global KNN result is produced after all partitions have been processed.
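
To make the merge step concrete, here is a minimal in-memory sketch of the idea, not the code in this PR. It assumes 1-D points, ignores spilling entirely, and uses hypothetical names (`Neighbor`, `merge_top_k`, `knn_over_partitions`) that do not appear in the patch. Each partition contributes candidates, and a bounded max-heap keeps only the K nearest seen so far.

```rust
use std::cmp::Ordering;
use std::collections::BinaryHeap;

/// One candidate neighbor: distance to the query plus a hypothetical object id.
#[derive(Debug, Clone, Copy)]
struct Neighbor {
    dist: f64,
    object_id: usize,
}

impl Ord for Neighbor {
    // Order by distance first, then by object id so that ties are deterministic.
    fn cmp(&self, other: &Self) -> Ordering {
        self.dist
            .total_cmp(&other.dist)
            .then(self.object_id.cmp(&other.object_id))
    }
}

impl PartialOrd for Neighbor {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.cmp(other))
    }
}

impl PartialEq for Neighbor {
    fn eq(&self, other: &Self) -> bool {
        self.cmp(other) == Ordering::Equal
    }
}

impl Eq for Neighbor {}

/// Merge the nearest-so-far list with the result of one more partition,
/// keeping only the `k` smallest distances (returned in ascending order).
fn merge_top_k(nearest_so_far: &[Neighbor], partition_result: &[Neighbor], k: usize) -> Vec<Neighbor> {
    let mut heap: BinaryHeap<Neighbor> = BinaryHeap::with_capacity(k + 1);
    for &n in nearest_so_far.iter().chain(partition_result) {
        heap.push(n);
        if heap.len() > k {
            heap.pop(); // evict the current farthest candidate
        }
    }
    heap.into_sorted_vec()
}

/// Process the object-side partitions one at a time; the global answer is
/// only final once every partition has been merged.
fn knn_over_partitions(query: f64, partitions: &[Vec<f64>], k: usize) -> Vec<Neighbor> {
    let mut nearest: Vec<Neighbor> = Vec::new();
    let mut next_id: usize = 0;
    for part in partitions {
        let candidates: Vec<Neighbor> = part
            .iter()
            .map(|&x| {
                let n = Neighbor { dist: (x - query).abs(), object_id: next_id };
                next_id += 1;
                n
            })
            .collect();
        nearest = merge_top_k(&nearest, &candidates, k);
    }
    nearest
}
```

In this sketch the nearest-so-far list lives in memory; the patch instead keeps it in spill files between partitions, which is where most of the complexity comes from.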

The tricky part, spilling and merging the nearest-so-far KNN query results, is implemented in knn_results_merger.rs. It has to handle many edge cases correctly, so we were very cautious when implementing it. Fuzz tests were added to verify the correctness of the KNN result merger.
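
For readers unfamiliar with this style of testing, the following is a rough sketch of a differential fuzz check, not the fuzz tests added in this PR. It assumes integer distances, a simplified in-memory merger (`merge_nearest`, a hypothetical name), and a small hand-rolled xorshift generator so that it needs no external crates. Each round compares the partition-by-partition merge against a brute-force top-K over all distances.

```rust
/// Tiny deterministic xorshift64 generator (seed must be non-zero).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Pseudo-random value in `0..bound`; `bound` must be greater than zero.
    fn below(&mut self, bound: u64) -> u64 {
        self.next_u64() % bound
    }
}

/// Simplified merger under test: fold one partition's result into the
/// nearest-so-far distances and keep the `k` smallest.
fn merge_nearest(so_far: &mut Vec<u64>, partition_result: &[u64], k: usize) {
    so_far.extend_from_slice(partition_result);
    so_far.sort_unstable();
    so_far.truncate(k);
}

/// Returns true if the incremental merge matched the brute-force answer
/// in every round.
fn fuzz_merger(seed: u64, rounds: usize) -> bool {
    let mut rng = XorShift64(seed);
    for _ in 0..rounds {
        let k = rng.below(5) as usize + 1;
        let num_partitions = rng.below(4) as usize + 1;
        let mut all: Vec<u64> = Vec::new();
        let mut nearest: Vec<u64> = Vec::new();
        for _ in 0..num_partitions {
            // Partitions may be empty or hold fewer than k objects.
            let len = rng.below(6) as usize;
            let mut part: Vec<u64> = (0..len).map(|_| rng.below(20)).collect();
            all.extend_from_slice(&part);
            // A per-partition KNN query returns at most k distances.
            part.sort_unstable();
            part.truncate(k);
            merge_nearest(&mut nearest, &part, k);
        }
        // Reference answer: global top-k over every distance seen.
        all.sort_unstable();
        all.truncate(k);
        if nearest != all {
            return false;
        }
    }
    true
}
```

The small value range (`0..20`) is deliberate: it produces many tied distances, one of the edge cases a merger has to get right.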

}

fn partition_no_multi(&self, _bbox: &BoundingBox) -> Result<SpatialPartition> {
let idx = self.counter.fetch_add(1, Ordering::Relaxed);
Member Author

This round-robin partitioner is nondeterministic because the partition assigned to each record depends on the order in which concurrent tasks are scheduled. We will address this by making all partitioners non-sync, so that each async task has its own partitioner with its own mutable internal state. That is a relatively large refactoring, so I'll leave it to the next PR.
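
As a rough illustration of the planned direction (an assumption about the follow-up, not code from this PR), a per-task partitioner could hold a plain counter behind `&mut self` instead of a shared atomic. The type and method names below are hypothetical.

```rust
/// Hypothetical per-task round-robin partitioner. It takes `&mut self`,
/// so it is owned by one task and needs no atomics.
struct LocalRoundRobin {
    num_partitions: usize,
    next: usize,
}

impl LocalRoundRobin {
    fn new(num_partitions: usize) -> Self {
        assert!(num_partitions > 0, "need at least one partition");
        Self { num_partitions, next: 0 }
    }

    /// Returns the partition index for the next record seen by this task.
    fn partition(&mut self) -> usize {
        let idx = self.next;
        self.next = (self.next + 1) % self.num_partitions;
        idx
    }
}
```

Because each task advances only its own counter, the sequence of assignments within a task no longer depends on how other tasks are interleaved.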

@Kontinuation marked this pull request as ready for review February 4, 2026 12:25
@Kontinuation requested a review from Copilot February 5, 2026 00:49
Contributor

Copilot AI left a comment

Pull request overview

This PR adds support for partitioned KNN (K-Nearest Neighbors) spatial joins to handle object sides larger than available memory by splitting the object data into smaller partitions and maintaining nearest-so-far results across partitions.

Changes:

  • Implements a KNN results merger that spills intermediate results to disk and merges them across partitions
  • Adds round-robin and broadcast partitioners for distributing KNN join data
  • Updates query methods to track and filter distances alongside join indices
  • Adds fuzz tests for the correctness of the merger
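
The third item, tracking distances alongside join indices, can be pictured with a small sketch. This is an assumption about the general shape of such a filter, not the function in join_utils.rs; the name `filter_with_distances` and the plain-slice signature are hypothetical. The point is that the same mask is applied to the index arrays and the distance array, so they stay aligned.

```rust
/// Apply one boolean mask to build indices, probe indices, and distances
/// in lockstep so the three outputs stay row-aligned.
fn filter_with_distances(
    build_idx: &[u64],
    probe_idx: &[u32],
    distances: &[f64],
    keep: &[bool],
) -> (Vec<u64>, Vec<u32>, Vec<f64>) {
    assert_eq!(build_idx.len(), probe_idx.len());
    assert_eq!(build_idx.len(), distances.len());
    assert_eq!(build_idx.len(), keep.len());

    let mut out_build = Vec::new();
    let mut out_probe = Vec::new();
    let mut out_dist = Vec::new();
    for i in 0..keep.len() {
        if keep[i] {
            out_build.push(build_idx[i]);
            out_probe.push(probe_idx[i]);
            out_dist.push(distances[i]);
        }
    }
    (out_build, out_probe, out_dist)
}
```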

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| rust/sedona-spatial-join/src/probe/knn_results_merger.rs | New implementation of KNN results merger with spilling and merging logic |
| rust/sedona-spatial-join/src/stream.rs | Integration of KNN results merger into the spatial join stream processing |
| rust/sedona-spatial-join/src/utils/join_utils.rs | Extended filter function to handle distance filtering |
| rust/sedona-spatial-join/src/partitioning/round_robin.rs | New round-robin partitioner for even data distribution |
| rust/sedona-spatial-join/src/partitioning/broadcast.rs | New broadcast partitioner for probe side distribution |
| rust/sedona-spatial-join/src/prepare.rs | Updated to use new partitioners for KNN joins |
| rust/sedona-spatial-join/src/index/spatial_index.rs | Added distance tracking to KNN query methods |
| rust/sedona-spatial-join/src/exec.rs | Added comprehensive tests for partitioned KNN joins |
| rust/sedona-spatial-join/src/probe.rs | Module registration for KNN results merger |
| rust/sedona-spatial-join/src/partitioning.rs | Module registration for new partitioners |
