chore(rust/sedona-spatial-join): Support right mark joins and enabled tests for all kinds of spatial joins#514
Conversation
There was a problem hiding this comment.
Pull request overview
This PR adds support for RightMark joins to the Rust spatial join implementation and enables previously commented-out tests for all spatial join types. The changes align the implementation with DataFusion 50.2.0.
Changes:
- Added RightMark join handling in join utilities to compute mark indices from the build side
- Updated function signatures to accept join_type parameter for proper mark column generation
- Enabled comprehensive spatial join tests for all join types (Semi, Anti, and Mark variants)
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| rust/sedona-spatial-join/src/utils/join_utils.rs | Added RightMark join logic, refactored bitmap building, updated to DataFusion 50.2.0, changed return type of asymmetric_join_output_partitioning |
| rust/sedona-spatial-join/src/stream.rs | Added join_type parameter to build_batch_from_indices calls |
| rust/sedona-spatial-join/src/exec.rs | Updated asymmetric_join_output_partitioning calls with error handling, enabled all join type tests, added mark join test implementation |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you!
This would also benefit from a Python integration/sanity for each join type while you're here.
| pub(crate) fn get_mark_indices<T: ArrowPrimitiveType, R: ArrowPrimitiveType>( | ||
| range: &Range<usize>, | ||
| input_indices: &PrimitiveArray<T>, | ||
| ) -> PrimitiveArray<R> | ||
| where | ||
| NativeAdapter<T>: From<<T as ArrowPrimitiveType>::Native>, | ||
| { |
There was a problem hiding this comment.
It would help (me + future llms) to have a comment on what exactly this is doing (outputting an array consisting of all zeroes where some of the values are null?).
Does it need to be generic on Range and the primitive type? (might be much easier to read if one can assume these are just int64s).
There was a problem hiding this comment.
I have commented this function. I'd like to keep the generic interface as-is, in order to be consistent with the original DataFusion code.
|
Python integration tests were added for SEMI/ANIT/MARK joins in 1051f5a. |
… tests for all kinds of spatial joins (apache#514) This patch updates join_utils.rs to support RightMark join. Spatial join tests for all join types were enabled.
This patch updates join_utils.rs to support RightMark join. Spatial join tests for all join types were enabled.