
feat(rust/sedona-spatial-join) Spill EvaluatedBatch and add external evaluated batch stream#522

Merged: Kontinuation merged 8 commits into apache:main from Kontinuation:spill on Jan 19, 2026

Kontinuation (Member) commented Jan 16, 2026

Add spill reader and writer for evaluated batch stream. EvaluatedBatch is wrapped as a regular RecordBatch with the following fields:

  • data: The original batch in EvaluatedBatch::batch
  • geom: The value of EvaluatedBatch::geom_array::geometry_array
  • dist: The value of EvaluatedBatch::geom_array::distance
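
As a rough illustration of the three-field layout listed above, the sketch below models the wrapping with plain `Vec`s instead of Arrow arrays. The `EvaluatedBatch` and `SpillRecord` structs, their field types, the optional `distance`, and the `to_spill_record`/`from_spill_record` helpers are all hypothetical stand-ins invented for this example; they are not the crate's actual types or API.

```rust
// Simplified stand-in for the crate's EvaluatedBatch (assumption: the real
// type holds Arrow arrays, and whether distance is optional is a guess).
#[derive(Debug, Clone, PartialEq)]
struct EvaluatedBatch {
    batch: Vec<(i64, String)>,  // original rows
    geometry: Vec<Vec<u8>>,     // one serialized geometry per row
    distance: Option<Vec<f64>>, // optional per-row distance
}

// Hypothetical on-disk shape: one record with `data`, `geom`, `dist` columns.
#[derive(Debug, Clone, PartialEq)]
struct SpillRecord {
    data: Vec<(i64, String)>,
    geom: Vec<Vec<u8>>,
    dist: Option<Vec<f64>>,
}

// Wrap an evaluated batch, checking that every column has the same row count.
fn to_spill_record(b: EvaluatedBatch) -> Result<SpillRecord, String> {
    let rows = b.batch.len();
    if b.geometry.len() != rows {
        return Err(format!("geom has {} rows, data has {}", b.geometry.len(), rows));
    }
    if let Some(d) = &b.distance {
        if d.len() != rows {
            return Err(format!("dist has {} rows, data has {}", d.len(), rows));
        }
    }
    Ok(SpillRecord { data: b.batch, geom: b.geometry, dist: b.distance })
}

// Unwrap a record read back from a spill file into an evaluated batch.
fn from_spill_record(r: SpillRecord) -> EvaluatedBatch {
    EvaluatedBatch { batch: r.data, geometry: r.geom, distance: r.dist }
}
```

Keeping the original batch as a single `data` column means the wrapper schema always has the same three top-level fields regardless of how many columns the input has.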

ExternalEvaluatedBatchStream implements EvaluatedBatchStream and produces EvaluatedBatch from spill files.
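
A minimal sketch of the read-back idea, assuming a much simpler file format than the PR uses: each batch is treated as opaque bytes and stored with an 8-byte length prefix, where the PR itself uses Arrow IPC. The names `write_spill_file` and `ExternalBatchReader` are hypothetical, and a blocking `Iterator` stands in for the asynchronous stream.

```rust
use std::fs::File;
use std::io::{self, BufReader, BufWriter, Read, Write};
use std::path::Path;

// Write each batch as an 8-byte little-endian length followed by its payload.
fn write_spill_file(path: &Path, batches: &[Vec<u8>]) -> io::Result<()> {
    let mut w = BufWriter::new(File::create(path)?);
    for b in batches {
        w.write_all(&(b.len() as u64).to_le_bytes())?;
        w.write_all(b)?;
    }
    w.flush()
}

// Hypothetical reader that yields batches lazily, one per `next()` call.
struct ExternalBatchReader {
    reader: BufReader<File>,
}

impl ExternalBatchReader {
    fn open(path: &Path) -> io::Result<Self> {
        Ok(Self { reader: BufReader::new(File::open(path)?) })
    }
}

impl Iterator for ExternalBatchReader {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut len_buf = [0u8; 8];
        // Read the first byte separately so a clean end of file ends the
        // iteration instead of being reported as a truncated record.
        loop {
            match self.reader.read(&mut len_buf[..1]) {
                Ok(0) => return None,
                Ok(_) => break,
                Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
                Err(e) => return Some(Err(e)),
            }
        }
        if let Err(e) = self.reader.read_exact(&mut len_buf[1..]) {
            return Some(Err(e));
        }
        let len = u64::from_le_bytes(len_buf) as usize;
        let mut payload = vec![0u8; len];
        match self.reader.read_exact(&mut payload) {
            Ok(()) => Some(Ok(payload)),
            Err(e) => Some(Err(e)),
        }
    }
}
```

Only one batch is held in memory at a time, which is the property a spill-backed stream needs in order to be useful.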

@Kontinuation Kontinuation requested a review from Copilot January 16, 2026 07:05
Copilot AI (Contributor) left a comment

Pull request overview

This PR implements spilling functionality for EvaluatedBatch streams in the Rust sedona-spatial-join crate. The implementation wraps EvaluatedBatch data into a regular RecordBatch format for disk serialization, containing the original data in a struct array along with geometry and distance fields.

Changes:

  • Added generic Arrow IPC spill reader/writer infrastructure for RecordBatch
  • Implemented EvaluatedBatch spilling with schema transformation and serialization
  • Created ExternalEvaluatedBatchStream to read spilled batches back from disk
  • Refactored build-side collection to use evaluated batch streams

Reviewed changes

Copilot reviewed 18 out of 19 changed files in this pull request and generated 6 comments.

Summary per file:

| File | Description |
| --- | --- |
| rust/sedona-spatial-join/src/utils/spill.rs | Generic RecordBatch spill reader/writer with batch splitting and compaction |
| rust/sedona-spatial-join/src/utils/arrow_utils.rs | Added compact_batch/compact_array functions to reorganize view array buffers |
| rust/sedona-spatial-join/src/evaluated_batch/spill.rs | EvaluatedBatch-specific spill reader/writer with schema transformation |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/external.rs | External stream implementation for reading spilled evaluated batches |
| rust/sedona-spatial-join/src/evaluated_batch/evaluated_batch_stream/evaluate.rs | Stream adapters for evaluating geometry expressions on record batches |
| rust/sedona-spatial-join/src/stream.rs | Updated to use evaluated batch streams for probe side |
| rust/sedona-spatial-join/src/index/build_side_collector.rs | Refactored to use evaluated batch streams and consolidate sequential/concurrent logic |
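
The view-array compaction mentioned for arrow_utils.rs can be pictured with a toy model. The sketch below assumes a single shared byte buffer and `(offset, len)` views; the real Arrow view layout is more involved, and `View`, `ViewArray`, and `compact` here are hypothetical names, not the PR's `compact_batch`/`compact_array`. The motivation is presumably that a sliced or filtered view array can keep a large buffer alive while referencing only a small part of it, which would inflate a spill file.

```rust
// A view into a shared byte buffer (toy model, not Arrow's 16-byte view).
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

// Simplified stand-in for a view array: values are slices of `buffer`.
struct ViewArray {
    buffer: Vec<u8>,
    views: Vec<View>,
}

impl ViewArray {
    fn value(&self, i: usize) -> &[u8] {
        let v = self.views[i];
        &self.buffer[v.offset..v.offset + v.len]
    }
}

// Copy only the referenced bytes into a fresh buffer and rewrite the offsets,
// so the result carries no unreferenced data.
fn compact(array: &ViewArray) -> ViewArray {
    let total: usize = array.views.iter().map(|v| v.len).sum();
    let mut buffer = Vec::with_capacity(total);
    let mut views = Vec::with_capacity(array.views.len());
    for v in &array.views {
        let offset = buffer.len();
        buffer.extend_from_slice(&array.buffer[v.offset..v.offset + v.len]);
        views.push(View { offset, len: v.len });
    }
    ViewArray { buffer, views }
}
```

Values keep their order and contents; only the backing storage shrinks.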


@Kontinuation Kontinuation force-pushed the spill branch 2 times, most recently from b0cece0 to 0a6da89 Compare January 16, 2026 07:39
@Kontinuation Kontinuation marked this pull request as ready for review January 17, 2026 02:45
paleolimbot (Member) left a comment

Only (very) minor optional comments from me...thank you!

    .as_any()
    .downcast_ref::<StructArray>()
    .ok_or_else(|| {
        DataFusionError::Internal("Expected data column to be a StructArray".to_string())

We don't have a great way of making this a sedona internal err (maybe in some future we could make a sedona_internal_datafusion_err macro).
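
For illustration only, the macro floated in this comment might look something like the sketch below. Nothing here exists in the codebase: the `DataFusionError` enum is a local stand-in for the real DataFusion type so the example compiles on its own, and the macro body, the "SedonaDB internal error" prefix, and the `expect_struct` helper are assumptions.

```rust
// Local stand-in for DataFusion's error type; only the variant used here.
#[derive(Debug)]
enum DataFusionError {
    Internal(String),
}

// Hypothetical macro: formats a message and tags it as a Sedona-side
// internal error while still producing a DataFusionError.
macro_rules! sedona_internal_datafusion_err {
    ($($arg:tt)*) => {
        DataFusionError::Internal(format!(
            "SedonaDB internal error: {}",
            format!($($arg)*)
        ))
    };
}

// Hypothetical call site mirroring the downcast check under review.
fn expect_struct(type_name: &str) -> Result<(), DataFusionError> {
    if type_name == "Struct" {
        Ok(())
    } else {
        Err(sedona_internal_datafusion_err!(
            "Expected data column to be a StructArray, got {}",
            type_name
        ))
    }
}
```

A macro along these lines would let call sites drop the explicit `.to_string()` and keep a consistent prefix for errors that originate in Sedona code.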

Co-authored-by: Dewey Dunnington <dewey@dunnington.ca>
@Kontinuation Kontinuation merged commit 9b6e2e7 into apache:main Jan 19, 2026
15 checks passed
@paleolimbot paleolimbot added this to the 0.3.0 milestone Jan 26, 2026