20067: fix: race condition in SpillPool caused by buffered stream by martin-augment · Pull Request #219 · martin-augment/datafusion

martin-augment · 2026-01-30T12:27:34Z

20067: To review by AI

Signed-off-by: Ahmed hossam <[email protected]>

gemini-code-assist · 2026-01-30T12:27:45Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical race condition within the SpillPool by refactoring how data streams are handled during spill operations. The core change involves discontinuing the use of a specific buffered stream utility, which was identified as the root cause of the concurrency issue. This ensures more reliable and stable spill management by simplifying the stream processing logic.

Highlights

Race Condition Fix: This pull request resolves a race condition in the SpillPool by removing the use of a buffered stream, which was identified as the underlying cause of the concurrency issue.
Removal of spawn_buffered: The spawn_buffered utility function is no longer used within the SpillManager::read_spill method, and its corresponding import has been removed from the file.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-01-30T12:27:54Z

Walkthrough

The change removes the buffering wrapper from the SpillManager::read_spill_as_stream method in the spill manager module. The spawn_buffered import is eliminated and the stream construction is simplified to return the raw stream directly without buffering. The import list is updated to only include metrics. This is an internal implementation change with no impact to the public API signatures.

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch pr-20067-2026-01-30-12-27-30

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request addresses a race condition in the SpillPool by removing the spawn_buffered call when reading from SpillReaderStream. The SpillReaderStream is designed to read one batch at a time without internal buffering, and the spawn_buffered utility was previously used to add an external buffering layer. Removing this external buffering directly resolves the identified race condition, aligning the behavior with the intended non-buffered nature of the SpillReaderStream for spill file reads.

claude · 2026-01-30T12:29:31Z

Code Review: Fix race condition in SpillPool caused by buffered stream

Summary

This PR addresses a race condition in the SpillPool by removing the use of spawn_buffered in SpillManager::read_spill_as_stream. The change is minimal but has important implications for the concurrency behavior of spill operations.

Analysis

The Change

The PR removes the buffering layer (spawn_buffered) that was previously wrapping the stream returned by read_spill_as_stream in datafusion/physical-plan/src/spill/spill_manager.rs:189.

Before:

Ok(spawn_buffered(stream, self.batch_read_buffer_capacity))

After:

Ok(stream)

Race Condition Context

Based on the codebase analysis:

SpillPool Architecture: The SpillPool (in spill_pool.rs) implements a producer-consumer pattern with careful lock coordination between writers and readers.
The Problem with spawn_buffered: The spawn_buffered function (in common.rs:94) spawns the stream execution to a separate tokio task with an intermediate buffer. This creates an additional layer of asynchronous buffering that can interfere with the SpillPool's lock-based coordination mechanism.
SpillFile Reader: Looking at SpillFile::poll_next in spill_pool.rs:535-546, the code carefully creates reader streams inside its poll implementation while holding locks at specific points. Adding spawn_buffered on top introduces a spawned task that polls the stream independently, potentially causing:
- Timing issues between lock acquisition and stream polling
- Unexpected interaction with the waker registration mechanism
- Concurrent access patterns that weren't anticipated by the SpillPool design
Existing Test: The test spill_reader_stream_yield() in core/tests/execution/coop.rs:274 was specifically added to verify cooperative yielding behavior. This test actually uses spawn_buffered to simulate the old behavior, suggesting the issue is related to buffering interaction with cooperative scheduling.

Code Quality Assessment

✅ Strengths

Minimal change: The fix is surgical and doesn't introduce unnecessary complexity
Correct approach: Removing the buffering layer aligns with the comment in spill/mod.rs:55 which states the stream "will read one batch at a time and will not do any buffering"
Clean removal: Both the usage and the import of spawn_buffered are removed

⚠️ Concerns

Dead code: The field batch_read_buffer_capacity (line 45) is now unused but still exists:
- It's set in the constructor (line 56)
- Has a setter method with_batch_read_buffer_capacity (line 61-67)
- Is still used in multi_level_merge.rs:276 via .with_batch_read_buffer_capacity(buffer_size)
Recommendation: Either remove this field and its associated methods in a follow-up PR, or add a comment explaining why it's kept for future use.
Documentation gap: The comment in spill/mod.rs:55 mentions using spawn_buffered for buffering, but this is no longer applicable:
```
/// It will read one batch at a time and will not do any buffering, 
/// to buffer data use [`crate::common::spawn_buffered`]
```
Recommendation: Update or remove this comment reference to spawn_buffered in a follow-up.
Missing test coverage: While existing tests exist, there doesn't appear to be a specific test that validates the race condition is fixed.

Recommendation: Consider adding a test that specifically exercises the concurrent SpillPool scenario that was failing before this fix (if reproducible).

Performance Considerations

Potential Impact: Removing buffering could theoretically reduce throughput in some scenarios since batches are now read synchronously rather than being pre-fetched in a background task.

However:

The SpillReaderStream already uses cooperative() wrapper for proper yielding
The SpillPool has its own coordination mechanism
The multi_level_merge.rs:276 code suggests that callers can control buffering at a higher level if needed
The race condition fix is more important than potential micro-optimizations

Verdict: The performance trade-off is acceptable and likely negligible in practice.

Security Concerns

No security issues identified. The change reduces complexity which generally improves security posture.

Test Coverage

Existing tests: The codebase has comprehensive spill-related tests:

Multiple tests in spill/mod.rs (lines 390, 454, 485, 520, 778, 780)
Cooperative yielding test in coop.rs:274
Benchmark tests in spill_io.rs

Recommendation: Ensure all existing tests pass with this change. The test at coop.rs:274 is particularly important as it specifically validates yielding behavior with buffered streams.

Overall Assessment

Rating: ✅ Approve with minor follow-up suggestions

Summary:

The fix correctly addresses the race condition by removing the problematic buffering layer
The change is minimal and well-targeted
No new bugs or security issues introduced
Minor cleanup of dead code and documentation would be beneficial in follow-up PRs

Recommended Actions:

✅ Merge this PR as-is to fix the race condition
📝 File follow-up issue to clean up unused batch_read_buffer_capacity field and methods
📝 Update documentation in spill/mod.rs:55 to remove reference to spawn_buffered
🧪 Consider adding a regression test if the race condition is reproducible

Reviewed files:

datafusion/physical-plan/src/spill/spill_manager.rs

Related code examined:

datafusion/physical-plan/src/common.rs (spawn_buffered implementation)
datafusion/physical-plan/src/spill/spill_pool.rs (SpillPool reader/writer coordination)
datafusion/physical-plan/src/spill/mod.rs (SpillReaderStream)
datafusion/core/tests/execution/coop.rs (cooperative yielding tests)

augmentcode · 2026-01-30T12:34:14Z

🤖 Augment PR Summary

Summary: Removes buffered wrapping around spill-file reads to avoid a race condition when consuming spill output.
Changes: SpillManager::read_spill_as_stream now returns the raw cooperative stream (no spawn_buffered).

_{🤖 Was this summary useful? React with 👍 or 👎}

augmentcode

Review completed. 1 suggestions posted.

Comment augment review to trigger a new review at any time.

augmentcode · 2026-01-30T12:34:15Z

datafusion/physical-plan/src/spill/spill_manager.rs

        )));

-        Ok(spawn_buffered(stream, self.batch_read_buffer_capacity))
+        Ok(stream)


read_spill_as_stream no longer uses self.batch_read_buffer_capacity, so with_batch_read_buffer_capacity is now a no-op (e.g. datafusion/physical-plan/src/sorts/multi_level_merge.rs:276) and the struct field/doc comment is now misleading.

Other Locations

datafusion/physical-plan/src/sorts/multi_level_merge.rs:276

_{🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.}

value:useful; category:bug; feedback: The Augment AI reviewer is correct! The batch_read_buffer_capacity field is no more read and it could be either removed or at least marked as deprecated. Prevents maintaining dead code.

martin-augment · 2026-01-30T12:41:12Z

Dead code: The field batch_read_buffer_capacity (line 45) is now unused but still exists:

It's set in the constructor (line 56)

Has a setter method with_batch_read_buffer_capacity (line 61-67)

Is still used in multi_level_merge.rs:276 via .with_batch_read_buffer_capacity(buffer_size)

Recommendation: Either remove this field and its associated methods in a follow-up PR, or add a comment explaining why it's kept for future use.

value:useful; category:bug; feedback: The Claude AI reviewer is correct! The batch_read_buffer_capacity field is no more read and it could be either removed or at least marked as deprecated. Prevents maintaining dead code.

dekuu5 added 2 commits January 29, 2026 20:10

fix: race condition in SpillPool caused by buffered stream

ccde430

Signed-off-by: Ahmed hossam <[email protected]>

Merge branch 'main' into test_SpillPool

1b8ef43

gemini-code-assist bot reviewed Jan 30, 2026

View reviewed changes

augmentcode bot reviewed Jan 30, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

20067: fix: race condition in SpillPool caused by buffered stream#219

20067: fix: race condition in SpillPool caused by buffered stream#219
martin-augment wants to merge 2 commits intomainfrom
pr-20067-2026-01-30-12-27-30

martin-augment commented Jan 30, 2026

Uh oh!

gemini-code-assist bot commented Jan 30, 2026

Uh oh!

coderabbitai bot commented Jan 30, 2026 •

edited

Loading

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

claude bot commented Jan 30, 2026

Uh oh!

augmentcode bot commented Jan 30, 2026

Uh oh!

augmentcode bot left a comment

Uh oh!

augmentcode bot Jan 30, 2026

Uh oh!

martin-augment Jan 30, 2026

Uh oh!

martin-augment commented Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

martin-augment commented Jan 30, 2026

Uh oh!

gemini-code-assist bot commented Jan 30, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

coderabbitai bot commented Jan 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

claude bot commented Jan 30, 2026

Code Review: Fix race condition in SpillPool caused by buffered stream

Summary

Analysis

The Change

Race Condition Context

Code Quality Assessment

✅ Strengths

⚠️ Concerns

Performance Considerations

Security Concerns

Test Coverage

Overall Assessment

Uh oh!

augmentcode bot commented Jan 30, 2026

Uh oh!

augmentcode bot left a comment

Choose a reason for hiding this comment

Uh oh!

augmentcode bot Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment Jan 30, 2026

Choose a reason for hiding this comment

Uh oh!

martin-augment commented Jan 30, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

coderabbitai bot commented Jan 30, 2026 •

edited

Loading