fix(python/sedonadb): Ensure that Python UDFs executing with >1 batch do not cause deadlock#558

Merged
paleolimbot merged 7 commits into apache:main from paleolimbot:fix-udf-execution
Jan 30, 2026

Conversation

Member

@paleolimbot paleolimbot commented Jan 28, 2026

I had intended to post a reprex to GeoPandas regarding crashes when threading but got caught by this issue instead: the way we collected results into Python caused many attempts to acquire the GIL (or perhaps a lock related to tokio), which interfered with UDF execution.

Briefly, before this PR, the Python bindings always collected via a special RecordBatchReader that called block_on() + allow_threads() for every batch, waiting for the next batch in the output SendableRecordBatchStream. To ensure cancellation requests worked, we acquired the GIL every 1 second to check for signals. After this PR, we use a single block_on() + allow_threads() and collect all the batches at once when we know this is what needs to happen anyway. Streaming output still works; it is just only invoked when required.

This constant block_on() + GIL acquisition caused a deadlock when Python UDFs were also trying to acquire the GIL (or perhaps a tokio lock of some kind).

The workaround here is not a full solution but covers the most common case, where a user wants to collect the entire result (e.g., .to_pandas()). Collecting everything at once is also simpler to orchestrate.
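
To make the difference between the two collection strategies concrete, here is a rough sketch in plain Python, with asyncio standing in for tokio and run_until_complete() standing in for block_on(). The names batch_stream, next_or_none, collect_per_batch, and collect_all_at_once are made up for illustration and are not part of sedonadb. The real code is Rust and also releases the GIL with allow_threads(), which this sketch does not model, so it shows only how many times the caller blocks, not the deadlock itself.

```python
import asyncio


async def batch_stream(n_batches):
    """Hypothetical stand-in for the output stream: yields small 'batches'."""
    for i in range(n_batches):
        await asyncio.sleep(0)  # yield control to the event loop once per batch
        yield [i, i + 1]


async def next_or_none(stream):
    """Return the next batch, or None once the stream is exhausted."""
    try:
        return await stream.__anext__()
    except StopAsyncIteration:
        return None


def collect_per_batch(n_batches):
    """Old shape: the caller blocks on the runtime once for every batch."""
    loop = asyncio.new_event_loop()
    batches, blocking_calls = [], 0
    try:
        stream = batch_stream(n_batches)
        while True:
            blocking_calls += 1
            batch = loop.run_until_complete(next_or_none(stream))
            if batch is None:
                break
            batches.append(batch)
    finally:
        loop.close()
    return batches, blocking_calls


def collect_all_at_once(n_batches):
    """New shape: a single blocking call that drains the whole stream."""
    async def collect():
        return [batch async for batch in batch_stream(n_batches)]

    return asyncio.run(collect()), 1


if __name__ == "__main__":
    print(collect_per_batch(3))
    print(collect_all_at_once(3))
```

Both functions return the same batches; the second enters the runtime once instead of once per batch, which is the property the description above relies on.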

@paleolimbot paleolimbot requested a review from Copilot January 29, 2026 15:51
Contributor

Copilot AI left a comment

Pull request overview

This PR resolves a deadlock issue that occurred when executing Python UDFs with multiple record batches. The fix modifies the batch collection strategy to use a single block_on() call instead of repeatedly acquiring the GIL for each batch, while preserving streaming output for cases where it's needed.

Changes:

  • Modified batch collection to gather all batches in a single async operation for .to_pandas() and .execute()
  • Updated signal checking interval from 1 second to 2 seconds and added py.run(cr"pass", None, None) calls
  • Added regression tests to verify multi-batch operations complete without hanging
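
The idea of waking up at a fixed interval to check for cancellation while otherwise blocking on a result can be sketched in pure Python as follows. This is only an analogy under stated assumptions: wait_with_checks and its check_cancelled callback are hypothetical names, concurrent.futures stands in for the Rust runtime, and the py.run(cr"pass", None, None) call mentioned above is not modeled.

```python
import concurrent.futures
import time


def wait_with_checks(future, check_cancelled, interval=2.0):
    """Block on `future`, waking every `interval` seconds to call `check_cancelled`.

    `check_cancelled` may raise to abort the wait. Returns the future's result
    together with the number of checks that ran while waiting.
    """
    checks = 0
    while True:
        try:
            return future.result(timeout=interval), checks
        except concurrent.futures.TimeoutError:
            checks += 1
            check_cancelled()


def slow_task():
    """Hypothetical long-running job standing in for query execution."""
    time.sleep(0.3)
    return "done"


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(slow_task)
        print(wait_with_checks(future, lambda: None, interval=0.1))
```

A longer interval means fewer wake-ups competing with the worker, at the cost of a slower response to a cancellation request.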

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| python/sedonadb/src/dataframe.rs | Introduces to_batches() method and Batches struct for single-pass collection; refactors execute() to collect in one async operation |
| python/sedonadb/python/sedonadb/dataframe.py | Updates to_arrow_table() to use new to_batches() method |
| python/sedonadb/src/runtime.rs | Increases signal check interval and adds py.run(cr"pass") calls before signal checks |
| python/sedonadb/tests/test_udf.py | Adds regression tests for multi-batch UDF execution and collection; corrects expected exception types |
| python/sedonadb/tests/test_dataframe.py | Updates error message assertion to match new wording |
| python/sedonadb/tests/functions/test_wkb.py | Adds GEOS version check for EWKB tests |
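
A regression test for a hang has to fail rather than block forever. One common way to get that behavior, sketched here with only the standard library, is to run the call in a worker thread and give up after a timeout. The helper run_with_timeout and the stand-in fake_multi_batch_collect are hypothetical; the actual tests in test_udf.py are not shown on this page and may be written differently.

```python
import threading


def run_with_timeout(fn, timeout_s):
    """Run fn() in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome = {}

    def target():
        try:
            outcome["value"] = fn()
        except BaseException as exc:  # re-raised in the calling thread below
            outcome["error"] = exc

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_s)
    if worker.is_alive():
        raise TimeoutError(f"call did not finish within {timeout_s} s (possible deadlock)")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


def fake_multi_batch_collect(n_batches=4, rows_per_batch=3):
    """Hypothetical stand-in for collecting a multi-batch UDF result."""
    return [[row * 2 for row in range(rows_per_batch)] for _ in range(n_batches)]


def test_multi_batch_collect_does_not_hang():
    batches = run_with_timeout(fake_multi_batch_collect, timeout_s=5.0)
    assert len(batches) == 4
    assert all(batch == [0, 2, 4] for batch in batches)
```

The daemon flag keeps a stuck worker from blocking interpreter exit, so a deadlocked call shows up as a failed test instead of a frozen test run.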

@paleolimbot paleolimbot marked this pull request as ready for review January 29, 2026 16:06
Member

@zhangfengcdt zhangfengcdt left a comment

minor comments, otherwise, lgtm

#[pymethods]
impl Batches {
    fn __len__(&self) -> usize {
        self.count
Member

Is this the number of rows or the number of batches? It looks like __len__ is expected to return the latter.

Member Author

That's a great point...I removed it since it's not used for anything yet.

return pa.table(self, schema=pa.schema(schema))
# Collects all batches into an object that exposes __arrow_c_stream__()
batches = self._impl.to_batches(schema)
return pa.table(batches)
Member

I assume we do not need to pass schema to pa.table anymore.

Member Author

Yes, passing it via to_batches() ensures it makes it to Rust!

@paleolimbot paleolimbot merged commit 4419338 into apache:main Jan 30, 2026
5 checks passed
@paleolimbot paleolimbot deleted the fix-udf-execution branch January 30, 2026 21:04