
chore(rust/sedona-spatial-join): Revamp memory reservation pattern for spatial join#534

Merged: Kontinuation merged 2 commits into apache:main from Kontinuation:reserve-all-mem-beforehand on Jan 23, 2026


@Kontinuation (Member)

This change moves spatial join memory planning earlier, into the build-side collection flow. This simplifies reservation handling and makes it more robust to DataFusion's reservation behavior. Memory is reserved only while collecting the build-side batches used to build the (possibly partitioned) spatial indexes; no memory is reserved while probing the index or producing result batches.

Note: reserving memory while producing batches can trigger DataFusion reservation failures (see https://github.com/apache/datafusion/issues/17334).
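A minimal, std-only sketch of the reserve-during-collection pattern described above. `MemoryPool`, `Reservation`, `collect_build_side`, and the `index_overhead` estimator are hypothetical stand-ins, not DataFusion's or SedonaDB's actual types. Each build-side batch reserves its own size plus an estimated index overhead, so a build side that does not fit fails during collection, and the probe phase never touches the pool.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical bounded memory pool (a stand-in, not DataFusion's `MemoryPool`).
struct MemoryPool {
    limit: usize,
    used: Mutex<usize>,
}

impl MemoryPool {
    fn new(limit: usize) -> Arc<Self> {
        Arc::new(MemoryPool { limit, used: Mutex::new(0) })
    }

    fn used(&self) -> usize {
        *self.used.lock().unwrap()
    }
}

/// Hypothetical reservation handle; returns its bytes to the pool on drop.
struct Reservation {
    pool: Arc<MemoryPool>,
    size: usize,
}

impl Reservation {
    fn new(pool: &Arc<MemoryPool>) -> Self {
        Reservation { pool: Arc::clone(pool), size: 0 }
    }

    /// Grows the reservation, failing instead of exceeding the pool limit.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        let mut used = self.pool.used.lock().unwrap();
        // `used <= limit` always holds, so this subtraction cannot underflow.
        if bytes > self.pool.limit - *used {
            return Err(format!(
                "cannot reserve {bytes} bytes: {} of {} already in use",
                *used, self.pool.limit
            ));
        }
        *used += bytes;
        self.size += bytes;
        Ok(())
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        *self.pool.used.lock().unwrap() -= self.size;
    }
}

/// Build phase: reserve each collected batch plus its estimated index
/// overhead (`index_overhead` is a hypothetical per-batch estimator).
/// On error, the partial reservation is dropped and its bytes released.
fn collect_build_side(
    pool: &Arc<MemoryPool>,
    batch_sizes: &[usize],
    index_overhead: impl Fn(usize) -> usize,
) -> Result<Reservation, String> {
    let mut reservation = Reservation::new(pool);
    for &bytes in batch_sizes {
        reservation.try_grow(bytes + index_overhead(bytes))?;
    }
    Ok(reservation)
}

fn main() {
    let pool = MemoryPool::new(1_000);
    // 100 + 200 bytes of batches, with an assumed 1x index overhead.
    let reservation = collect_build_side(&pool, &[100, 200], |b| b).unwrap();
    assert_eq!(pool.used(), 600);
    // Probing and emitting result batches would happen here without
    // any further reservation calls.
    drop(reservation);
    assert_eq!(pool.used(), 0);

    // An oversized build side fails during collection, before any probing.
    assert!(collect_build_side(&pool, &[300, 300], |b| b).is_err());
    assert_eq!(pool.used(), 0);
}
```

Because all growth happens while collecting the build side, an out-of-memory condition surfaces before any result batch is produced, which avoids reserving memory on the output path.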

Kontinuation requested a review from Copilot on January 21, 2026 11:37

Copilot AI (Contributor) left a comment

Pull request overview

This PR refactors the memory reservation pattern for spatial joins by moving memory planning earlier in the build-side collection flow. The change eliminates the concurrent reservation wrapper and shifts from incremental memory tracking during probing to upfront estimation during collection.

Changes:

  • Removed the ConcurrentReservation wrapper and associated reservation logic during probe operations
  • Introduced upfront memory estimation for spatial index components (R-tree, refiner, KNN components)
  • Added estimate_max_memory_usage method to refiners for pre-allocation planning
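To make "upfront estimation for spatial index components" concrete, here is a sketch of summing per-component estimates into one figure to reserve before the index is built. `BuildStats`, the `Refiner` trait shape, both refiner types, and the per-entry R-tree cost are all hypothetical; the real `estimate_max_memory_usage` signature in SedonaDB may differ.

```rust
/// Hypothetical summary of the collected build side (not SedonaDB's type).
struct BuildStats {
    num_geometries: usize,
    /// Total WKB bytes, if known.
    total_size_bytes: Option<usize>,
}

/// Hypothetical trait capturing the idea behind `estimate_max_memory_usage`;
/// the real signature may differ.
trait Refiner {
    fn estimate_max_memory_usage(&self, stats: &BuildStats) -> usize;
}

/// Assumed: a refiner that converts every build-side geometry into its own
/// in-memory format, taking roughly twice the WKB size.
struct ConvertingRefiner;

impl Refiner for ConvertingRefiner {
    fn estimate_max_memory_usage(&self, stats: &BuildStats) -> usize {
        stats.total_size_bytes.unwrap_or(0) * 2
    }
}

/// Assumed: a refiner that evaluates predicates directly on WKB and keeps
/// no extra per-geometry state.
struct InPlaceRefiner;

impl Refiner for InPlaceRefiner {
    fn estimate_max_memory_usage(&self, _stats: &BuildStats) -> usize {
        0
    }
}

/// Assumed R-tree cost: one 2D bounding box (4 x f64) plus a u32 id per
/// entry, ignoring internal nodes and allocator slack.
fn estimate_rtree_bytes(num_entries: usize) -> usize {
    num_entries * (4 * std::mem::size_of::<f64>() + std::mem::size_of::<u32>())
}

/// Up-front total to reserve before building the index: the sum of the
/// component estimates (a KNN component would add another term).
fn estimate_index_memory(stats: &BuildStats, refiner: &dyn Refiner) -> usize {
    estimate_rtree_bytes(stats.num_geometries) + refiner.estimate_max_memory_usage(stats)
}

fn main() {
    let stats = BuildStats { num_geometries: 1_000, total_size_bytes: Some(50_000) };
    // R-tree: 1_000 * 36 = 36_000 bytes; converting refiner: 100_000 bytes.
    println!("{}", estimate_index_memory(&stats, &ConvertingRefiner)); // 136000
    println!("{}", estimate_index_memory(&stats, &InPlaceRefiner)); // 36000
}
```

Keeping each component's estimate behind its own method lets the collector compute a single reservation without building any component first.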

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Summary per file:

  • rust/sedona-spatial-join/src/utils/concurrent_reservation.rs: Removed the entire concurrent reservation wrapper implementation
  • rust/sedona-spatial-join/src/utils.rs: Removed the concurrent_reservation module export
  • rust/sedona-spatial-join/src/refine/tg.rs: Added estimate_max_memory_usage method for the TG refiner
  • rust/sedona-spatial-join/src/refine/geos.rs: Added estimate_max_memory_usage method for the GEOS refiner
  • rust/sedona-spatial-join/src/refine/geo.rs: Added estimate_max_memory_usage method for the Geo refiner
  • rust/sedona-spatial-join/src/refine.rs: Added trait method for estimating maximum memory usage
  • rust/sedona-spatial-join/src/index/spatial_index_builder.rs: Refactored to use upfront memory estimation and removed incremental reservation growth
  • rust/sedona-spatial-join/src/index/spatial_index.rs: Removed refiner reservation tracking during query operations
  • rust/sedona-spatial-join/src/index/knn_adapter.rs: Changed from reservation-based to estimation-based memory tracking
  • rust/sedona-spatial-join/src/index/build_side_collector.rs: Added upfront memory reservation based on estimated usage
  • rust/sedona-spatial-join/src/build_index.rs: Updated collector initialization to pass the spatial predicate and options


Kontinuation force-pushed the reserve-all-mem-beforehand branch from ad5e50b to cc5dd4b on January 21, 2026 13:11
Kontinuation marked this pull request as ready for review on January 22, 2026 07:31
@paleolimbot (Member) left a comment

Thank you!

Comment on lines +271 to +274
```rust
// TODO: This is a rough estimate of the memory usage of the tg geometry and
// may not be accurate.
// https://github.com/apache/sedona-db/issues/281
build_stats.total_size_bytes().unwrap_or(0) as usize * 2
```
Member:

A tg geometry can be queried for its memory size. Is that relevant here?

Kontinuation (Member, Author):

We prefer to estimate the memory needed without the overhead of creating a tg geometry for each input geometry, so we compute an estimated size instead of measuring the real size.

The estimate here is accurate enough in practice. I have also run some experiments using tg_geom_size to derive a better formula for a more accurate estimate, and I'll submit a separate patch to refine it.
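The trade-off in this thread is between estimating from input sizes and measuring each converted geometry. A sketch, assuming byte counts arrive as `Option<u64>` (the original only shows an `as usize` cast): `estimate_tg_bytes` mirrors the reviewed heuristic, while `calibrate_factor` and `estimate_with_factor` are hypothetical helpers showing one way sampled measurements (the thread mentions `tg_geom_size`) could replace the fixed factor of 2. This is not necessarily what the follow-up patch does.

```rust
/// Mirrors the reviewed heuristic: assume a tg geometry needs about twice
/// its WKB size; an unknown total is treated as 0.
fn estimate_tg_bytes(total_wkb_bytes: Option<u64>) -> usize {
    total_wkb_bytes.unwrap_or(0) as usize * 2
}

/// Hypothetical calibration: from a sample of (wkb_bytes, measured_bytes)
/// pairs, derive a size ratio to use in place of the fixed factor of 2.
/// Returns None when the sample carries no WKB bytes.
fn calibrate_factor(samples: &[(usize, usize)]) -> Option<f64> {
    let wkb: usize = samples.iter().map(|&(w, _)| w).sum();
    let measured: usize = samples.iter().map(|&(_, m)| m).sum();
    if wkb == 0 {
        None
    } else {
        Some(measured as f64 / wkb as f64)
    }
}

/// Applies a calibrated factor, rounding up so the estimate is not optimistic.
fn estimate_with_factor(total_wkb_bytes: Option<u64>, factor: f64) -> usize {
    (total_wkb_bytes.unwrap_or(0) as f64 * factor).ceil() as usize
}

fn main() {
    println!("{}", estimate_tg_bytes(Some(1_000))); // 2000
    if let Some(factor) = calibrate_factor(&[(100, 250), (300, 650)]) {
        println!("{factor}"); // 2.25
        println!("{}", estimate_with_factor(Some(1_000), factor)); // 2250
    }
}
```

Calibrating on a small sample keeps per-geometry conversion off the planning path while still grounding the multiplier in measured sizes.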

Kontinuation merged commit a9ceb6a into apache:main on Jan 23, 2026
21 of 23 checks passed
paleolimbot added this to the 0.3.0 milestone on Jan 26, 2026