fix(rust/sedona-spatial-join): Fix several bugs related to KNN join#508
fix(rust/sedona-spatial-join): Fix several bugs related to KNN join#508Kontinuation merged 2 commits intoapache:mainfrom
Conversation
54489c4 to
6e474da
Compare
There was a problem hiding this comment.
Pull request overview
This PR fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled.
Changes:
- Fixed build/probe side determination for KNN joins to use
knn.probe_side.negate()instead of hardcodedJoinSide::Right - Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach
- Removed incorrect column index swapping logic for KNN joins
- Added comprehensive test coverage for KNN join correctness
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
rust/sedona-spatial-join/src/stream.rs |
Updated build side determination to use probe_side.negate() instead of hardcoded Right |
rust/sedona-spatial-join/src/index/spatial_index.rs |
Fixed tie-breaker envelope calculation to use bounding box expansion and added test for non-point queries |
rust/sedona-spatial-join/src/exec.rs |
Fixed output partitioning logic and removed incorrect column index swapping, added KNN join correctness tests |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
rust/sedona-spatial-join/src/exec.rs
Outdated
| let geoms_binary = geoms_col | ||
| .as_any() | ||
| .downcast_ref::<arrow_array::BinaryArray>(); | ||
| let geoms_binary_view = geoms_col | ||
| .as_any() | ||
| .downcast_ref::<arrow_array::BinaryViewArray>(); | ||
|
|
||
| if geoms_binary.is_none() && geoms_binary_view.is_none() { | ||
| panic!( | ||
| "Column 'geometry' should be Binary or BinaryView. Schema: {:?}", | ||
| batch.schema() | ||
| ); | ||
| } |
There was a problem hiding this comment.
I wonder if you could simplify this part with an executor:
let executor = GeoTypesExecutor::new(...);
let idx = geom_idx.iter();
executor.execute_wkb_void(|maybe_geom| { result.push((idx.next().unwrap().unwrap(), maybe_geom.unwrap())); })There was a problem hiding this comment.
Refactored this code to use an executor.
| // For regular joins, build_side is always Left. | ||
| let build_side = match &self.spatial_predicate { | ||
| SpatialPredicate::KNearestNeighbors(_) => JoinSide::Right, | ||
| SpatialPredicate::KNearestNeighbors(knn) => knn.probe_side.negate(), |
There was a problem hiding this comment.
When debugging my partition out of range issue with sd_random_geometry(), copilot sent me to this line and asked me to debug print the output of build_side (I should have listened!)
…pache#508) Fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled. **Changes:** - Fixed build/probe side determination for KNN joins to use `knn.probe_side.negate()` instead of hardcoded `JoinSide::Right` - Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach - Removed incorrect column index swapping logic for KNN joins - Added comprehensive test coverage for KNN join correctness
…pache#508) Fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled. **Changes:** - Fixed build/probe side determination for KNN joins to use `knn.probe_side.negate()` instead of hardcoded `JoinSide::Right` - Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach - Removed incorrect column index swapping logic for KNN joins - Added comprehensive test coverage for KNN join correctness
…pache#508) Fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled. **Changes:** - Fixed build/probe side determination for KNN joins to use `knn.probe_side.negate()` instead of hardcoded `JoinSide::Right` - Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach - Removed incorrect column index swapping logic for KNN joins - Added comprehensive test coverage for KNN join correctness
…pache#508) Fixes three critical bugs in the KNN join implementation: handling different partition counts between joined relations, correcting column projection when left/right sides have different column counts, and fixing incorrect K-nearest neighbor search results for non-point query objects with tie-breakers enabled. **Changes:** - Fixed build/probe side determination for KNN joins to use `knn.probe_side.negate()` instead of hardcoded `JoinSide::Right` - Corrected tie-breaker envelope calculation to use bounding box expansion instead of centroid-based approach - Removed incorrect column index swapping logic for KNN joins - Added comprehensive test coverage for KNN join correctness
This patch fixes 3 bugs related to KNN join:
include_tie_breakersis enabled.Rust tests were also added to make testing KNN joins easier.