chore(rust/sedona-spatial-join): use RTree partitioner for probe side when partition count > 48 #598

Merged
Kontinuation merged 4 commits into apache:main from Kontinuation:support-rtree-partitioner on Feb 13, 2026

@Kontinuation
Member

Summary

  • Switch the probe-side partitioner from always using FlatPartitioner (linear scan) to using RTreePartitioner when the number of partitions exceeds 48.
  • Add a flat_vs_rtree benchmark for head-to-head comparison across partition counts (16–400).
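
The selection rule described in the summary can be sketched as follows. This is a minimal illustration, not the code in `prepare.rs`: the constant name `RTREE_PARTITION_THRESHOLD`, the `PartitionerKind` enum, and the `choose_partitioner` function are hypothetical names introduced here. Only the threshold value (48) and the strict "exceeds" comparison come from the PR description.

```rust
/// Hypothetical constant name; the PR only states the threshold value (48).
const RTREE_PARTITION_THRESHOLD: usize = 48;

/// Illustrative stand-in for the two partitioner implementations named in the PR.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
enum PartitionerKind {
    /// Linear scan over all partition boundaries.
    Flat,
    /// Lookup through an R-tree built over the partition boundaries.
    RTree,
}

/// Pick the probe-side partitioner from the partition count.
/// RTree is used only when the count strictly exceeds the threshold.
fn choose_partitioner(num_partitions: usize) -> PartitionerKind {
    if num_partitions > RTREE_PARTITION_THRESHOLD {
        PartitionerKind::RTree
    } else {
        PartitionerKind::Flat
    }
}
```

With this rule, exactly 48 partitions still selects the flat scan, and 49 is the first count that selects the R-tree.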

Benchmark Results

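| Partitions | Flat (µs) | RTree (µs) | Ratio (Flat/RTree) | Winner |
|---|---|---|---|---|
| 16 | 70.5 | 101.8 | 0.69x | Flat |
| 25 | 96.6 | 111.5 | 0.87x | Flat |
| 36 | 123.9 | 118.1 | 1.05x | RTree (marginal) |
| 64 | 175.2 | 121.2 | 1.45x | RTree |
| 100 | 228.6 | 137.9 | 1.66x | RTree |
| 256 | 486.0 | 169.1 | 2.87x | RTree |
| 400 | 735.3 | 223.7 | 3.29x | RTree |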

The measured crossover falls between 25 and 36 partitions; at 36, RTree leads by only 1.05x. A threshold of 48 provides a comfortable margin above that point.
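
As a check on the numbers, the ratio column is the Flat time divided by the RTree time, so values above 1.0 favor RTree. The sketch below recomputes it from the reported timings and finds the first measured partition count where RTree wins. The names `RESULTS`, `ratio`, and `first_rtree_win` are illustrative and are not part of the PR.

```rust
/// (partitions, Flat µs, RTree µs), copied from the benchmark results.
const RESULTS: [(u32, f64, f64); 7] = [
    (16, 70.5, 101.8),
    (25, 96.6, 111.5),
    (36, 123.9, 118.1),
    (64, 175.2, 121.2),
    (100, 228.6, 137.9),
    (256, 486.0, 169.1),
    (400, 735.3, 223.7),
];

/// Speedup of RTree over Flat: values above 1.0 mean RTree is faster.
fn ratio(flat_us: f64, rtree_us: f64) -> f64 {
    flat_us / rtree_us
}

/// Smallest measured partition count at which RTree beats Flat.
fn first_rtree_win() -> Option<u32> {
    RESULTS
        .iter()
        .find(|(_, flat, rtree)| ratio(*flat, *rtree) > 1.0)
        .map(|(n, _, _)| *n)
}
```

Because only seven partition counts were measured, this locates the crossover only to within the gap between 25 and 36.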

…on count > 48

The probe side partitioner was always using FlatPartitioner (linear scan,
O(n) per query). Benchmarks show that RTreePartitioner becomes faster
at ~36 partitions, with the advantage growing significantly beyond that
(e.g. 2.87x at 256 partitions, 3.29x at 400).

Use a threshold of 48 partitions to switch from Flat to RTree, providing
a comfortable margin above the measured crossover point.

Also adds a flat_vs_rtree benchmark for head-to-head comparison across
partition counts (16-400).
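
To make the "linear scan, O(n) per query" cost concrete, here is a minimal sketch of a flat lookup over partition bounding boxes. The `Rect` type and `flat_lookup` function are hypothetical stand-ins, and intersection is an assumed matching rule; the actual FlatPartitioner may use different types and assignment semantics. An R-tree answers the same query by descending only into nodes whose bounds overlap the query, which is why its advantage grows with the partition count.

```rust
/// Axis-aligned bounding box; a hypothetical stand-in for the crate's own type.
#[derive(Debug, Clone, Copy)]
struct Rect {
    min_x: f64,
    min_y: f64,
    max_x: f64,
    max_y: f64,
}

impl Rect {
    /// True when the two boxes overlap or touch.
    fn intersects(&self, other: &Rect) -> bool {
        self.min_x <= other.max_x
            && other.min_x <= self.max_x
            && self.min_y <= other.max_y
            && other.min_y <= self.max_y
    }
}

/// Linear scan: test the query against every partition boundary.
/// Cost is O(n) in the number of partitions for each query.
fn flat_lookup(partitions: &[Rect], query: &Rect) -> Vec<usize> {
    partitions
        .iter()
        .enumerate()
        .filter(|(_, bounds)| bounds.intersects(query))
        .map(|(i, _)| i)
        .collect()
}
```
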
@Kontinuation requested a review from Copilot on February 11, 2026 16:19
Contributor

Copilot AI left a comment

Pull request overview

This PR optimizes the spatial join probe-side partitioning by switching from a linear-scan (FlatPartitioner) to an RTree-based partitioner when the partition count exceeds 48, based on benchmark evidence showing RTree outperforms linear scan above ~36 partitions.

Changes:

  • Introduce a partition count threshold (48) to select between flat and RTree partitioners for probe-side partitioning
  • Add benchmark comparing FlatPartitioner vs RTreePartitioner across various partition counts (16–400)
  • Remove unused build_index.rs file

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
|---|---|
| rust/sedona-spatial-join/src/prepare.rs | Adds threshold constant and conditional logic to switch between FlatPartitioner and RTreePartitioner based on partition count |
| rust/sedona-spatial-join/src/build_index.rs | Removes entire file (appears to be dead code) |
| rust/sedona-spatial-join/bench/partitioning/flat_vs_rtree.rs | Adds new benchmark comparing FlatPartitioner vs RTreePartitioner performance across different partition counts |
| rust/sedona-spatial-join/Cargo.toml | Registers the new flat_vs_rtree benchmark |
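
For readers unfamiliar with how a benchmark is registered in a Cargo manifest, the entry could look roughly like the fragment below. This is an assumed shape, not the PR's actual diff: only the benchmark name and the file location come from the PR. The `harness = false` line is a guess based on the common setup for external benchmark harnesses such as Criterion.

```toml
# Assumed shape of the registration; only the name and path come from the PR.
[[bench]]
name = "flat_vs_rtree"
path = "bench/partitioning/flat_vs_rtree.rs"
# Assumption: disable the default libtest harness, as external harnesses usually require.
harness = false
```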

@Kontinuation marked this pull request as ready for review on February 11, 2026 16:43
@Kontinuation changed the title from "feat(spatial-join): use RTree partitioner for probe side when partition count > 48" to "chore(rust/sedona-spatial-join): use RTree partitioner for probe side when partition count > 48" on Feb 11, 2026
Member

@paleolimbot paleolimbot left a comment

Thank you!

Comment on lines 19 to 22
//!
//! This module is included (`mod common;`) by multiple independent bench binaries,
//! not all of which use every item. Suppress dead-code warnings accordingly.
#![allow(dead_code)]
Member

I would love to avoid a module-level clippy allow (so we can know that this needs to be removed if there's a time when all the benchmarks stop using it). Maybe you can make this a pub export of a regular module in src called internal_benchmark_util?

Member Author

Moved to src/utils. Also switched from rand to fastrand to avoid introducing a new dependency.

…rk_util

Replace the bench/partitioning/common.rs module (which needed a
module-level #![allow(dead_code)]) with a proper library module at
src/utils/internal_benchmark_util.rs. Switch from rand to fastrand
(already a regular dependency) so no feature gate is needed.
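
The reasoning behind the move can be illustrated with a small sketch. In a library crate, items exported with `pub` count as part of the public API, so rustc's `dead_code` lint does not fire for them even when a particular bench binary uses only some of them. The contents below are hypothetical: the module name follows the commit message, but `XorShift64` and `random_points` are invented for illustration. The hand-rolled generator stands in for `fastrand` only so the sketch has no external dependencies.

```rust
/// Hypothetical sketch of a benchmark helper living in the library crate.
/// No `#![allow(dead_code)]` is needed: public library items are not
/// reported as dead code when an individual bench binary skips them.
pub mod internal_benchmark_util {
    /// Tiny deterministic xorshift generator, used here only to keep the
    /// sketch dependency-free; the PR itself uses the `fastrand` crate.
    pub struct XorShift64(u64);

    impl XorShift64 {
        pub fn new(seed: u64) -> Self {
            // A xorshift generator must not start from an all-zero state.
            Self(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
        }

        pub fn next_u64(&mut self) -> u64 {
            let mut x = self.0;
            x ^= x << 13;
            x ^= x >> 7;
            x ^= x << 17;
            self.0 = x;
            x
        }

        /// Float in [0, 1) built from the top 53 bits of the next value.
        pub fn next_f64(&mut self) -> f64 {
            (self.next_u64() >> 11) as f64 / (1u64 << 53) as f64
        }
    }

    /// Generate `n` reproducible points in the unit square.
    pub fn random_points(n: usize, seed: u64) -> Vec<(f64, f64)> {
        let mut rng = XorShift64::new(seed);
        (0..n).map(|_| (rng.next_f64(), rng.next_f64())).collect()
    }
}
```

A fixed seed keeps benchmark inputs identical across runs, which makes Flat and RTree timings comparable.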
@Kontinuation merged commit 04b86a6 into apache:main on Feb 13, 2026
17 checks passed
@paleolimbot added this to the 0.3.0 milestone on Feb 19, 2026