feat(rust/sedona-spatial-join) Spill EvaluatedBatch and add external evaluated batch stream #522
Pull request overview
This PR implements spilling functionality for EvaluatedBatch streams in the Rust sedona-spatial-join crate. The implementation wraps EvaluatedBatch data into a regular RecordBatch format for disk serialization, containing the original data in a struct array along with geometry and distance fields.
Changes:
- Added generic Arrow IPC spill reader/writer infrastructure for RecordBatch
- Implemented EvaluatedBatch spilling with schema transformation and serialization
- Created ExternalEvaluatedBatchStream to read spilled batches back from disk
- Refactored build-side collection to use evaluated batch streams
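The write-then-read-back shape of a spill writer and reader can be sketched with the standard library alone. This is not the PR's implementation: the PR uses Arrow IPC on disk, while the sketch below swaps in simple length-prefixed framing over an in-memory `Cursor`, and every name in it (`SpillWriterSketch`, `SpillReaderSketch`, `round_trip`) is hypothetical.

```rust
use std::io::{self, Cursor, Read, Write};

/// Hypothetical writer: each batch is stored as a little-endian u32 length
/// followed by the payload bytes. (The PR itself uses Arrow IPC, not this framing.)
pub struct SpillWriterSketch<W: Write> {
    inner: W,
}

impl<W: Write> SpillWriterSketch<W> {
    pub fn new(inner: W) -> Self {
        SpillWriterSketch { inner }
    }

    pub fn write_batch(&mut self, payload: &[u8]) -> io::Result<()> {
        if payload.len() > u32::MAX as usize {
            return Err(io::Error::new(io::ErrorKind::InvalidInput, "batch too large"));
        }
        self.inner.write_all(&(payload.len() as u32).to_le_bytes())?;
        self.inner.write_all(payload)
    }

    /// Hands back the underlying sink so it can be reopened for reading.
    pub fn finish(self) -> W {
        self.inner
    }
}

/// Hypothetical reader: yields one batch per call, so only a single batch
/// needs to be resident in memory at a time.
pub struct SpillReaderSketch<R: Read> {
    inner: R,
}

impl<R: Read> SpillReaderSketch<R> {
    pub fn new(inner: R) -> Self {
        SpillReaderSketch { inner }
    }
}

impl<R: Read> Iterator for SpillReaderSketch<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut len_buf = [0u8; 4];
        match self.inner.read_exact(&mut len_buf) {
            Ok(()) => {}
            // Simplification: EOF while reading a length header ends the stream.
            Err(e) if e.kind() == io::ErrorKind::UnexpectedEof => return None,
            Err(e) => return Some(Err(e)),
        }
        let mut payload = vec![0u8; u32::from_le_bytes(len_buf) as usize];
        Some(self.inner.read_exact(&mut payload).map(|_| payload))
    }
}

/// Writes all batches, rewinds, and reads them back in order.
pub fn round_trip(batches: &[Vec<u8>]) -> io::Result<Vec<Vec<u8>>> {
    let mut writer = SpillWriterSketch::new(Cursor::new(Vec::new()));
    for batch in batches {
        writer.write_batch(batch)?;
    }
    let mut file = writer.finish();
    file.set_position(0);
    SpillReaderSketch::new(file).collect()
}
```

Exposing the reader as an iterator mirrors the idea of an external stream: the consumer pulls batches one by one instead of loading the whole spill file.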
Reviewed changes
Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| rust/sedona-spatial-join/src/utils/spill.rs | Generic RecordBatch spill reader/writer with batch splitting and compaction |
| rust/sedona-spatial-join/src/utils/arrow_utils.rs | Added compact_batch/compact_array functions to reorganize view array buffers |
| rust/sedona-spatial-join/src/evaluated_batch/spill.rs | EvaluatedBatch-specific spill reader/writer with schema transformation |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/external.rs | External stream implementation for reading spilled evaluated batches |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/evaluate.rs | Stream adapters for evaluating geometry expressions on record batches |
| rust/sedona-spatial-join/src/stream.rs | Updated to use evaluated batch streams for probe side |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Refactored to use evaluated batch streams and consolidate sequential/concurrent logic |
b0cece0 to 0a6da89
paleolimbot left a comment
Only (very) minor optional comments from me...thank you!
    .as_any()
    .downcast_ref::<StructArray>()
    .ok_or_else(|| {
        DataFusionError::Internal("Expected data column to be a StructArray".to_string())
We don't have a great way of making this a sedona internal err (maybe in some future we could make a sedona_internal_datafusion_err macro).
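One possible shape for such a macro is sketched below. The macro name comes from the comment above, but its body, the message prefix, and the helper `expect_struct` are assumptions for illustration. `SketchError` is a stand-in for `DataFusionError` so the example compiles without the DataFusion crate.

```rust
use std::fmt;

/// Stand-in for `DataFusionError`; only the `Internal` variant is modeled.
#[derive(Debug, PartialEq)]
pub enum SketchError {
    Internal(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Internal(msg) => write!(f, "Internal error: {}", msg),
        }
    }
}

impl std::error::Error for SketchError {}

/// Hypothetical macro: formats the message and prefixes it so the error is
/// recognizable as originating in Sedona rather than in DataFusion itself.
macro_rules! sedona_internal_datafusion_err {
    ($($arg:tt)*) => {
        SketchError::Internal(format!("SedonaDB internal error: {}", format!($($arg)*)))
    };
}

/// Mimics the failed-downcast path from the diff: `None` becomes an internal error.
pub fn expect_struct(column: Option<&str>) -> Result<&str, SketchError> {
    column.ok_or_else(|| {
        sedona_internal_datafusion_err!("Expected {} column to be a StructArray", "data")
    })
}
```

With a macro like this, call sites would no longer need to build the `String` by hand with `.to_string()`.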
Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
Add spill reader and writer for evaluated batch stream.
EvaluatedBatch is wrapped as a regular RecordBatch with the following fields:
- data: The original batch in EvaluatedBatch::batch
- geom: The value of EvaluatedBatch::geom_array::geometry_array
- dist: The value of EvaluatedBatch::geom_array::distance

ExternalEvaluatedBatchStream implements EvaluatedBatchStream and produces EvaluatedBatch from spill files.
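The three-field layout can be illustrated with plain Rust types. This is a sketch only: the real code works with Arrow arrays and a `RecordBatch`, whereas here rows are `String`s, geometries are optional byte vectors, and `distance` is assumed to be an optional column of `f64`. All type and function names ending in `Sketch` or `_spill` are hypothetical.

```rust
/// Stand-in for the evaluated geometry operands (Arrow arrays in the real code).
#[derive(Debug, Clone, PartialEq)]
pub struct GeomArraySketch {
    pub geometry_array: Vec<Option<Vec<u8>>>, // e.g. one WKB value per row
    pub distance: Option<Vec<f64>>,           // assumed to be optional
}

/// Stand-in for `EvaluatedBatch`: the original rows plus evaluated operands.
#[derive(Debug, Clone, PartialEq)]
pub struct EvaluatedBatchSketch {
    pub batch: Vec<String>,
    pub geom_array: GeomArraySketch,
}

/// Flat shape that gets serialized: one column each for data, geom, and dist.
#[derive(Debug, Clone, PartialEq)]
pub struct SpillRecordSketch {
    pub data: Vec<String>,
    pub geom: Vec<Option<Vec<u8>>>,
    pub dist: Option<Vec<f64>>,
}

/// Flattens an evaluated batch into the spill layout.
pub fn wrap_for_spill(b: EvaluatedBatchSketch) -> SpillRecordSketch {
    SpillRecordSketch {
        data: b.batch,
        geom: b.geom_array.geometry_array,
        dist: b.geom_array.distance,
    }
}

/// Rebuilds the evaluated batch, checking that all columns have the same length.
pub fn unwrap_from_spill(r: SpillRecordSketch) -> Result<EvaluatedBatchSketch, String> {
    if r.data.len() != r.geom.len() {
        return Err("data and geom columns differ in length".to_string());
    }
    if let Some(dist) = &r.dist {
        if dist.len() != r.data.len() {
            return Err("data and dist columns differ in length".to_string());
        }
    }
    Ok(EvaluatedBatchSketch {
        batch: r.data,
        geom_array: GeomArraySketch {
            geometry_array: r.geom,
            distance: r.dist,
        },
    })
}
```

Keeping the evaluated geometry next to the original rows means the reader does not have to re-evaluate the geometry expression after reading a batch back from disk.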