feat(rust/sedona): Add SedonaFairSpillPool memory pool and CLI memory limit support #599
… limit support

Add a new `SedonaFairSpillPool` that reserves a configurable fraction of memory for unspillable consumers, preventing spillable operators from exhausting all available memory. This addresses the issue where unspillable operators such as `RepartitionExec` could fail when spillable spatial join operators consumed all memory. Also add `--memory-limit`, `--mem-pool-type`, and `--unspillable-reserve-ratio` CLI options to `sedona-cli`, and refactor `SedonaContext` to accept a custom `RuntimeEnv` for memory pool configuration.
Pull request overview
Adds configurable memory-pool behavior to Sedona (including a new fair spill pool with reserved capacity for unspillable consumers) and wires this through sedona-cli flags by allowing SedonaContext to be created with an injected RuntimeEnv.
Changes:
- Introduces `SedonaFairSpillPool` with an "unspillable reserve" mechanism plus unit tests.
- Adds `--memory-limit`, `--mem-pool-type`, and `--unspillable-reserve-ratio` to `sedona-cli`, including parsing for human-readable sizes.
- Refactors `SedonaContext` construction to support injecting a pre-configured `RuntimeEnv`.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sedona-cli/src/pool_type.rs | Adds a PoolType enum to select greedy vs fair pool from CLI. |
| sedona-cli/src/main.rs | Adds CLI flags, builds a RuntimeEnv with selected memory pool, and adds size parsing helpers. |
| sedona-cli/src/lib.rs | Exposes the new pool_type module. |
| rust/sedona/src/memory_pool.rs | Implements SedonaFairSpillPool with reserved unspillable capacity and tests. |
| rust/sedona/src/lib.rs | Exposes the new memory_pool module publicly. |
| rust/sedona/src/context.rs | Adds new_local_interactive_with_runtime_env to allow custom runtime env injection. |
- Validate `unspillable_reserve_ratio` is within `0.0..=1.0` via clap `value_parser`
- Deduplicate `NonZeroUsize::new(10)` into a shared local variable
- Remove the negative sign from the size-parsing regex for clearer error messages
- Fix grammar: 'resulting a reservation' -> 'resulting in a reservation'
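The first commit item range-checks the reserve ratio. A minimal std-only sketch of such a check follows; the function name `parse_reserve_ratio` is hypothetical and not taken from the PR, which may validate the value through a different clap mechanism.

```rust
/// Hypothetical value parser for `--unspillable-reserve-ratio`.
/// Accepts any f64 in the closed range 0.0..=1.0. NaN is rejected because
/// `RangeInclusive::contains` returns false for it.
fn parse_reserve_ratio(s: &str) -> Result<f64, String> {
    let value = s
        .trim()
        .parse::<f64>()
        .map_err(|e| format!("invalid ratio '{s}': {e}"))?;
    if (0.0..=1.0).contains(&value) {
        Ok(value)
    } else {
        Err(format!("ratio must be within 0.0..=1.0, got {value}"))
    }
}
```

With clap 4's derive API, a function of this shape can be attached as `#[arg(long, value_parser = parse_reserve_ratio)]`, so an out-of-range value is rejected at argument-parsing time rather than when the pool is built.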
```rust
fn parse_size_string(size: &str, label: &str) -> Result<usize, String> {
    static BYTE_SUFFIXES: LazyLock<HashMap<&'static str, ByteUnit>> = LazyLock::new(|| {
```
Can you copy DataFusion's test for this?
This could also go in rust/sedona (we'll need it in R and Python, too?)
Added the test, and extended it with cases for decimal numbers and numbers that are too large.
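For readers following the size-parsing thread, here is a minimal std-only sketch of parsing human-readable sizes that accepts decimal values and reports overflow. It mirrors the `(size, label) -> Result<usize, String>` shape of `parse_size_string` from the excerpt, but the name `parse_size_sketch`, the suffix set, and the binary (1024-based) units are assumptions; it uses no regex or `ByteUnit` and is not the PR's implementation.

```rust
/// Bytes per unit for a lowercase suffix. Binary (1024-based) units are an
/// assumption of this sketch; an empty suffix means plain bytes.
fn suffix_multiplier(suffix: &str) -> Option<u64> {
    match suffix {
        "" | "b" => Some(1),
        "k" | "kb" => Some(1 << 10),
        "m" | "mb" => Some(1 << 20),
        "g" | "gb" => Some(1 << 30),
        "t" | "tb" => Some(1 << 40),
        _ => None,
    }
}

/// Hypothetical size parser: accepts "512m", "1.5g", "0.5M", and so on.
/// Rejects negative numbers, unknown units, and results that overflow usize.
fn parse_size_sketch(size: &str, label: &str) -> Result<usize, String> {
    let invalid = || format!("Invalid {label}: '{size}'");
    let lower = size.trim().to_ascii_lowercase();
    // The numeric part is the leading run of ASCII digits and '.'.
    let split = lower
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(lower.len());
    let (number, suffix) = lower.split_at(split);
    if number.is_empty() {
        // Covers inputs such as "-1g" or "g" with no leading digit or '.'.
        return Err(invalid());
    }
    let multiplier = suffix_multiplier(suffix).ok_or_else(|| invalid())?;

    if number.contains('.') {
        // Decimal input: scale in f64, then truncate any fractional byte.
        let value = number.parse::<f64>().map_err(|_| invalid())?;
        let bytes = value * multiplier as f64;
        // `usize::MAX as f64` rounds up to 2^64 on 64-bit targets, so `>=`
        // rejects every value that would not fit in usize.
        if !bytes.is_finite() || bytes >= usize::MAX as f64 {
            return Err(format!("{} (value too large)", invalid()));
        }
        Ok(bytes as usize)
    } else {
        // Integer input: parsing fails if the digits exceed u64, and
        // checked_mul catches overflow when applying the unit.
        let value = number.parse::<u64>().map_err(|_| invalid())?;
        value
            .checked_mul(multiplier)
            .and_then(|bytes| usize::try_from(bytes).ok())
            .ok_or_else(|| format!("{} (value too large)", invalid()))
    }
}
```

Decimal inputs pass through `f64`, so very large decimal values can lose precision before truncation; integer inputs stay exact because they use `u64::checked_mul`.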
```rust
use std::{
    fmt::{self, Display, Formatter},
    str::FromStr,
};

#[derive(PartialEq, Debug, Clone)]
pub enum PoolType {
```
It seems like we will also need this in R and Python. Should it go in rust/sedona?
Moved this to rust/sedona.
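Because `PoolType` now lives in rust/sedona for reuse from R and Python, a self-contained sketch of such an enum may help. The variant names `Greedy` and `Fair` are assumptions inferred from the "greedy vs fair pool" wording in the file summary; the excerpt above shows only the imports and derive list, so the PR's variants and accepted spellings may differ.

```rust
use std::{
    fmt::{self, Display, Formatter},
    str::FromStr,
};

/// Sketch of a CLI-selectable memory pool type (variant names assumed).
#[derive(PartialEq, Debug, Clone)]
pub enum PoolType {
    Greedy,
    Fair,
}

impl FromStr for PoolType {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Case-insensitive, so "fair", "Fair", and "FAIR" all parse.
        if s.eq_ignore_ascii_case("greedy") {
            Ok(PoolType::Greedy)
        } else if s.eq_ignore_ascii_case("fair") {
            Ok(PoolType::Fair)
        } else {
            Err(format!("Invalid memory pool type '{s}'"))
        }
    }
}

impl Display for PoolType {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        // Emit the lowercase spelling that FromStr accepts, so values round-trip.
        let name = match self {
            PoolType::Greedy => "greedy",
            PoolType::Fair => "fair",
        };
        write!(f, "{name}")
    }
}
```

Implementing `FromStr` and `Display` keeps the enum free of any CLI-framework dependency, which suits sharing it across the Rust CLI and the R and Python bindings; mapping each variant to a concrete pool is left to the caller.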
2010YOUY01 left a comment:
This looks like a good idea. If it works well, we could use it in DataFusion upstream. Looking forward to seeing the results of using it in the spilling queries!
One potential follow-up: we could set configurations through SQL SET ... statements like https://datafusion.apache.org/user-guide/configs.html#runtime-configuration-settings
- Move `PoolType` from sedona-cli to rust/sedona for R/Python reuse
- Fix `parse_size_string` to handle decimal values (e.g. 1.5g, 0.5m)
- Add comprehensive tests for `parse_size_string`
- Fix `SedonaFairSpillPool` doc comment formatting
Branch updated from 5f07a36 to 754e996.
Summary
- Add `SedonaFairSpillPool`, a new memory pool that reserves a configurable fraction of total memory for unspillable consumers, preventing spillable operators from exhausting all available memory (addresses datafusion#17334 in the Sedona context)
- Add `--memory-limit`, `--mem-pool-type`, and `--unspillable-reserve-ratio` CLI arguments to `sedona-cli` for configuring memory pool behavior
- Refactor `SedonaContext::new_local_interactive()` to expose `new_local_interactive_with_runtime_env()`, allowing callers to inject a custom `RuntimeEnv` with pre-configured memory pools

Motivation
When running out-of-core spatial joins, spillable operators (e.g., `SpatialJoinExec`) could consume all available memory, causing unspillable operators (e.g., `RepartitionExec`'s merge consumer) to fail with OOM errors. `SedonaFairSpillPool` mitigates this by reserving a configurable portion (default 20%) of the memory pool exclusively for unspillable allocations.
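The reservation idea can be modeled in a few lines. The following is a minimal single-threaded sketch under our own assumptions: the type `ReservedFairPool` and its methods are hypothetical, it does not implement DataFusion's `MemoryPool` trait, and the PR's actual accounting (for example, how fair shares are computed, or how unspillable usage above the reserve is treated) may differ. Here spillable consumers share the pool minus the reserve, while unspillable consumers may draw on any free bytes, including the reserve.

```rust
/// Simplified model of a fair spill pool with an unspillable reserve.
/// Not DataFusion's `MemoryPool` trait and not the PR's implementation.
pub struct ReservedFairPool {
    pool_size: usize,
    /// Bytes that spillable consumers may never claim.
    unspillable_reserve: usize,
    num_spillable: usize,
    spillable_used: usize,
    unspillable_used: usize,
}

impl ReservedFairPool {
    pub fn new(pool_size: usize, unspillable_reserve_ratio: f64) -> Self {
        assert!(
            (0.0..=1.0).contains(&unspillable_reserve_ratio),
            "ratio must be within 0.0..=1.0"
        );
        Self {
            pool_size,
            unspillable_reserve: (pool_size as f64 * unspillable_reserve_ratio) as usize,
            num_spillable: 0,
            spillable_used: 0,
            unspillable_used: 0,
        }
    }

    pub fn register_spillable(&mut self) {
        self.num_spillable += 1;
    }

    /// Bytes all spillable consumers may hold together: the pool minus the
    /// reserve, or minus current unspillable usage if that is larger.
    fn spillable_capacity(&self) -> usize {
        self.pool_size
            .saturating_sub(self.unspillable_reserve.max(self.unspillable_used))
    }

    /// Grow a spillable consumer that currently holds `held` bytes (tracked by
    /// the caller). Each consumer is capped at an equal share of the capacity.
    pub fn try_grow_spillable(&mut self, held: usize, additional: usize) -> Result<(), String> {
        let capacity = self.spillable_capacity();
        let fair_share = capacity / self.num_spillable.max(1);
        if held.saturating_add(additional) > fair_share
            || self.spillable_used.saturating_add(additional) > capacity
        {
            return Err(format!(
                "cannot grow spillable consumer by {additional} bytes (fair share {fair_share})"
            ));
        }
        self.spillable_used += additional;
        Ok(())
    }

    /// Unspillable consumers may use any free bytes, including the reserve.
    pub fn try_grow_unspillable(&mut self, additional: usize) -> Result<(), String> {
        let free = self
            .pool_size
            .saturating_sub(self.spillable_used + self.unspillable_used);
        if additional > free {
            return Err(format!(
                "cannot grow unspillable consumer by {additional} bytes ({free} free)"
            ));
        }
        self.unspillable_used += additional;
        Ok(())
    }

    pub fn shrink_spillable(&mut self, bytes: usize) {
        self.spillable_used = self.spillable_used.saturating_sub(bytes);
    }

    pub fn shrink_unspillable(&mut self, bytes: usize) {
        self.unspillable_used = self.unspillable_used.saturating_sub(bytes);
    }
}
```

With `ReservedFairPool::new(1000, 0.2)` and two spillable consumers, each spiller is capped at 400 bytes, so 200 bytes remain available for unspillable growth. With a ratio of `0.0`, the two spillers can claim all 1000 bytes and any unspillable allocation fails, which is the failure mode described above.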