Skip to content

chore(rust/sedona-spatial-join): Make partitioner not sync, create dedicated partitioner for each task#592

Merged
Kontinuation merged 4 commits intoapache:mainfrom
Kontinuation:pr-make-partitioner-non-sync
Feb 11, 2026
Merged

chore(rust/sedona-spatial-join): Make partitioner not sync, create dedicated partitioner for each task#592
Kontinuation merged 4 commits intoapache:mainfrom
Kontinuation:pr-make-partitioner-non-sync

Conversation

@Kontinuation
Copy link
Member

This is a follow up of comment #573 (comment). The round robin partitioner uses an internal atomic counter and produces nondeterministic partitioning results due to task scheduling orders. This patch did a large refactoring to make each task own its own partitioner instance, thus eliminates randomness caused by racing. We have also removed the Sync constraint from the partitioner trait to enforce this design.

@Kontinuation Kontinuation marked this pull request as ready for review February 10, 2026 14:40
@Kontinuation Kontinuation requested a review from Copilot February 10, 2026 15:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors spatial partitioning to avoid nondeterministic behavior from shared, stateful partitioners by ensuring each task owns its own partitioner instance, and removes the Sync constraint from SpatialPartitioner.

Changes:

  • Replaced shared Arc<dyn SpatialPartitioner> usage with per-task Box<dyn SpatialPartitioner> cloning via a new box_clone() trait method.
  • Updated probe-side stream options to hold a clonable partitioner prototype and create dedicated partitioners per task.
  • Refactored partitioner implementations (e.g., RoundRobin, RTree) to support cloning under the new trait contract.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
rust/sedona-spatial-join/src/probe/partitioned_stream_provider.rs Stores a clonable partitioner prototype and clones per task/stream creation.
rust/sedona-spatial-join/src/probe/first_pass_stream.rs Moves from shared Arc partitioner to owned Box partitioner per stream.
rust/sedona-spatial-join/src/prepare.rs Updates join preparation to pass owned partitioners and clone per spawned task.
rust/sedona-spatial-join/src/partitioning/stream_repartitioner.rs Updates repartitioner to own a Box partitioner and accept cloned instances.
rust/sedona-spatial-join/src/partitioning/rtree.rs Makes RTreePartitioner cheaply cloneable via Arc-backed inner implementation.
rust/sedona-spatial-join/src/partitioning/round_robin.rs Replaces atomic counter with task-local Cell and adds box_clone().
rust/sedona-spatial-join/src/partitioning/kdb.rs Adds Clone + box_clone() to satisfy new trait API.
rust/sedona-spatial-join/src/partitioning/flat.rs Adds Clone + box_clone() to satisfy new trait API.
rust/sedona-spatial-join/src/partitioning/broadcast.rs Adds Clone + box_clone() to satisfy new trait API.
rust/sedona-spatial-join/src/partitioning.rs Drops Sync bound and adds box_clone() to enable per-task ownership.
rust/sedona-spatial-join/bench/partitioning/stream_repartitioner.rs Updates benchmarks to use Box partitioners and clone per iteration.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Kontinuation and others added 3 commits February 10, 2026 23:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Kontinuation Kontinuation merged commit 76ae8fc into apache:main Feb 11, 2026
16 checks passed
@paleolimbot paleolimbot added this to the 0.3.0 milestone Feb 19, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants