
fix(rust/sedona-spatial-join): Fix several bugs related to KNN join #508

Merged
Kontinuation merged 2 commits into apache:main from Kontinuation:fix-knn-join-bugs on Jan 14, 2026

Kontinuation (Member) commented Jan 12, 2026

This patch fixes three bugs related to KNN join:

  1. Panic when the joined relations have different numbers of partitions.
  2. Incorrect join result when the left and right sides project different numbers of columns.
  3. Incorrect K-nearest neighbor search result when the query object is not a point and `include_tie_breakers` is enabled.

Rust tests were also added to make testing KNN joins easier.
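
To illustrate why bug 2 depends on the column counts, here is a minimal, self-contained sketch of how join output columns are commonly addressed by a `(side, index)` pair, with left columns laid out before right columns. `Side`, `ColumnIndex`, `output_layout`, `project_row`, and `swap_sides` are hypothetical names for illustration only; `swap_sides` is not the code this PR removed, just an example of why swapping side tags is fragile. Once the two sides have different widths, the swapped indices point into the wrong input and can run past its last column.

```rust
/// Which input of the join a column comes from (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Side {
    Left,
    Right,
}

/// Hypothetical reference to one output column: which input, and which column in it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ColumnIndex {
    index: usize,
    side: Side,
}

/// Output layout: all left columns first, then all right columns.
fn output_layout(left_width: usize, right_width: usize) -> Vec<ColumnIndex> {
    let left = (0..left_width).map(|index| ColumnIndex { index, side: Side::Left });
    let right = (0..right_width).map(|index| ColumnIndex { index, side: Side::Right });
    left.chain(right).collect()
}

/// Hypothetical swap of the side tags only; the inputs themselves are not swapped.
fn swap_sides(layout: &[ColumnIndex]) -> Vec<ColumnIndex> {
    layout
        .iter()
        .map(|c| ColumnIndex {
            index: c.index,
            side: match c.side {
                Side::Left => Side::Right,
                Side::Right => Side::Left,
            },
        })
        .collect()
}

/// Assemble one output row; returns None if any index is out of range for its side.
fn project_row<'a>(
    layout: &[ColumnIndex],
    left: &[&'a str],
    right: &[&'a str],
) -> Option<Vec<&'a str>> {
    layout
        .iter()
        .map(|c| match c.side {
            Side::Left => left.get(c.index).copied(),
            Side::Right => right.get(c.index).copied(),
        })
        .collect()
}

fn main() {
    let left = ["l_id", "l_geom"];
    let right = ["r_id", "r_geom", "r_name"];
    let layout = output_layout(left.len(), right.len());

    // Correct layout: left columns, then right columns.
    println!("{:?}", project_row(&layout, &left, &right));

    // Swapped tags with unequal widths: the last entry asks for left column 2,
    // which does not exist, so the projection fails.
    println!("{:?}", project_row(&swap_sides(&layout), &left, &right));
}
```

With a 2-column left side and a 3-column right side, the correct layout yields all five values in order, while the swapped layout returns `None`.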

Kontinuation changed the title from "fix: Fix several bugs related to KNN join" to "fix(rust/sedona-spatial-join): Fix several bugs related to KNN join" on Jan 12, 2026

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled.

Changes:

  • Fixed build/probe side determination for KNN joins to use `knn.probe_side.negate()` instead of hardcoded `JoinSide::Right`
  • Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach
  • Removed incorrect column index swapping logic for KNN joins
  • Added comprehensive test coverage for KNN join correctness
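
The tie-breaker envelope change can be illustrated with a small geometric sketch. The assumption here, based only on the change description, is that when `include_tie_breakers` is enabled the index must find every candidate whose distance to the query is no greater than the k-th nearest distance `d`, so that ties are not dropped. For a non-point query, a box of half-width `d` around the centroid can miss a candidate that lies within `d` of the query's edge, while expanding the query's bounding box by `d` on every side cannot. `Rect`, `centroid_envelope`, and `expanded_envelope` are hypothetical names, and the query is treated as its own rectangle to keep the distance computation simple.

```rust
/// Axis-aligned rectangle (hypothetical stand-in for an index envelope).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Rect {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

impl Rect {
    fn contains_point(&self, x: f64, y: f64) -> bool {
        x >= self.min_x && x <= self.max_x && y >= self.min_y && y <= self.max_y
    }

    /// Euclidean distance from the rectangle to a point (0 if the point is inside).
    fn distance_to_point(&self, x: f64, y: f64) -> f64 {
        let dx = (self.min_x - x).max(0.0).max(x - self.max_x);
        let dy = (self.min_y - y).max(0.0).max(y - self.max_y);
        (dx * dx + dy * dy).sqrt()
    }
}

/// Centroid-based envelope: a square of half-width `d` around the query's centroid.
fn centroid_envelope(query: &Rect, d: f64) -> Rect {
    let cx = (query.min_x + query.max_x) / 2.0;
    let cy = (query.min_y + query.max_y) / 2.0;
    Rect { min_x: cx - d, min_y: cy - d, max_x: cx + d, max_y: cy + d }
}

/// Bounding-box expansion: grow the query's bounding box by `d` on every side.
fn expanded_envelope(query: &Rect, d: f64) -> Rect {
    Rect {
        min_x: query.min_x - d,
        min_y: query.min_y - d,
        max_x: query.max_x + d,
        max_y: query.max_y + d,
    }
}

fn main() {
    // A long, thin query object and an assumed k-th nearest distance of 1.0.
    let query = Rect { min_x: 0.0, min_y: 0.0, max_x: 10.0, max_y: 2.0 };
    let d = 1.0;
    // A candidate 0.5 away from the query's right edge, i.e. within `d`.
    let (px, py) = (10.5, 1.0);

    println!("distance to query: {}", query.distance_to_point(px, py));
    println!("in centroid envelope: {}", centroid_envelope(&query, d).contains_point(px, py));
    println!("in expanded envelope: {}", expanded_envelope(&query, d).contains_point(px, py));
}
```

For a point query the bounding box collapses to the point itself, so both envelopes are identical; that is consistent with the bug only appearing for non-point query objects.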

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `rust/sedona-spatial-join/src/stream.rs` | Updated build side determination to use `probe_side.negate()` instead of hardcoded `Right` |
| `rust/sedona-spatial-join/src/index/spatial_index.rs` | Fixed tie-breaker envelope calculation to use bounding box expansion and added a test for non-point queries |
| `rust/sedona-spatial-join/src/exec.rs` | Fixed output partitioning logic, removed incorrect column index swapping, and added KNN join correctness tests |

Kontinuation marked this pull request as ready for review on January 13, 2026 at 01:11

paleolimbot (Member) left a comment

Thank you!

Comment on lines 1222 to 1234
```rust
let geoms_binary = geoms_col
    .as_any()
    .downcast_ref::<arrow_array::BinaryArray>();
let geoms_binary_view = geoms_col
    .as_any()
    .downcast_ref::<arrow_array::BinaryViewArray>();

if geoms_binary.is_none() && geoms_binary_view.is_none() {
    panic!(
        "Column 'geometry' should be Binary or BinaryView. Schema: {:?}",
        batch.schema()
    );
}
```

Member:

I wonder if you could simplify this part with an executor:

```rust
let executor = GeoTypesExecutor::new(...);
let mut idx = geom_idx.iter();
executor.execute_wkb_void(|maybe_geom| {
    result.push((idx.next().unwrap().unwrap(), maybe_geom.unwrap()));
});
```
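
The executor pattern suggested above can be sketched with the standard library alone. `ToyExecutor` and `collect_pairs` are hypothetical stand-ins written for this illustration; they are not the sedona `GeoTypesExecutor` API, whose constructor arguments are elided in the comment. The point is the shape: the executor drives a per-row closure, and the closure advances a separate index iterator in lockstep, which is why that iterator has to be declared `mut`.

```rust
/// Hypothetical stand-in for a geometry executor: invokes the closure once per
/// row with that row's WKB bytes, or None for a null row.
struct ToyExecutor<'a> {
    rows: &'a [Option<Vec<u8>>],
}

impl<'a> ToyExecutor<'a> {
    fn execute_wkb_void<F: FnMut(Option<&[u8]>)>(&self, mut f: F) {
        for row in self.rows {
            f(row.as_deref());
        }
    }
}

/// Pair each row's index with its geometry bytes, walking both in lockstep.
fn collect_pairs(geom_idx: &[Option<usize>], geoms: &[Option<Vec<u8>>]) -> Vec<(usize, Vec<u8>)> {
    let executor = ToyExecutor { rows: geoms };
    let mut result = Vec::new();
    // `mut` is required because the closure calls `next()` on this iterator.
    let mut idx = geom_idx.iter();
    executor.execute_wkb_void(|maybe_geom| {
        result.push((idx.next().unwrap().unwrap(), maybe_geom.unwrap().to_vec()));
    });
    result
}

fn main() {
    let geoms = vec![Some(vec![1u8, 2]), Some(vec![3u8])];
    let pairs = collect_pairs(&[Some(7), Some(9)], &geoms);
    println!("{pairs:?}");
}
```

Like the suggestion, the sketch unwraps both the index and the geometry, so it assumes neither contains nulls; a null in either would panic.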

Member (Author):

Refactored this code to use an executor.

```diff
 // For regular joins, build_side is always Left.
 let build_side = match &self.spatial_predicate {
-    SpatialPredicate::KNearestNeighbors(_) => JoinSide::Right,
+    SpatialPredicate::KNearestNeighbors(knn) => knn.probe_side.negate(),
```
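
The change to `build_side` can be sketched in isolation. `JoinSide`, `KnnPredicate`, and `SpatialPredicate` below are simplified, hypothetical mirrors; the real types live in DataFusion and sedona-spatial-join and may differ, and `Other` stands in for the non-KNN predicates. With the old hardcoded `JoinSide::Right`, a KNN join whose probe side is also `Right` would treat the same input as both build and probe side. If partition counts are then taken from the wrong input, differing partition counts could plausibly surface as the partition-out-of-range panic described in bug 1.

```rust
/// Simplified, hypothetical mirror of a join side.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinSide {
    Left,
    Right,
}

impl JoinSide {
    /// The opposite side of the join.
    fn negate(self) -> JoinSide {
        match self {
            JoinSide::Left => JoinSide::Right,
            JoinSide::Right => JoinSide::Left,
        }
    }
}

/// Hypothetical KNN predicate: only the probe side matters here.
struct KnnPredicate {
    probe_side: JoinSide,
}

/// Hypothetical predicate enum; `Other` stands in for non-KNN predicates.
enum SpatialPredicate {
    Other,
    KNearestNeighbors(KnnPredicate),
}

/// Build side before the fix: KNN joins always built on the right input.
fn build_side_old(pred: &SpatialPredicate) -> JoinSide {
    match pred {
        SpatialPredicate::KNearestNeighbors(_) => JoinSide::Right,
        SpatialPredicate::Other => JoinSide::Left,
    }
}

/// Build side after the fix: the input opposite the KNN probe side.
fn build_side_new(pred: &SpatialPredicate) -> JoinSide {
    match pred {
        SpatialPredicate::KNearestNeighbors(knn) => knn.probe_side.negate(),
        // For regular joins, the build side is always Left.
        SpatialPredicate::Other => JoinSide::Left,
    }
}

fn main() {
    for probe_side in [JoinSide::Left, JoinSide::Right] {
        let pred = SpatialPredicate::KNearestNeighbors(KnnPredicate { probe_side });
        let old = build_side_old(&pred);
        let new = build_side_new(&pred);
        // The old rule agrees with the new one only when probing from the left.
        println!("probe={probe_side:?} old_build={old:?} new_build={new:?} old_collides={}", old == probe_side);
    }
    assert_eq!(build_side_new(&SpatialPredicate::Other), JoinSide::Left);
}
```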

Member:

When debugging my partition-out-of-range issue with `sd_random_geometry()`, Copilot sent me to this line and asked me to debug-print the value of `build_side` (I should have listened!)

Kontinuation added a commit to Kontinuation/sedona-db that referenced this pull request Jan 13, 2026

zhangfengcdt (Member) left a comment

LGTM

Kontinuation merged commit a2d3244 into apache:main on Jan 14, 2026
15 checks passed
pwrliang pushed a commit to pwrliang/sedona-db that referenced this pull request Jan 15, 2026
…pache#508)

Fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled.

**Changes:**
- Fixed build/probe side determination for KNN joins to use `knn.probe_side.negate()` instead of hardcoded `JoinSide::Right`
- Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach
- Removed incorrect column index swapping logic for KNN joins
- Added comprehensive test coverage for KNN join correctness
pwrliang pushed a commit to zhangfengcdt/sedona-db that referenced this pull request Jan 15, 2026
paleolimbot added this to the 0.3.0 milestone on Jan 26, 2026
Labels: None yet
Projects: None yet

4 participants