fix(python/sedonadb): Ensure that Python UDFs executing with >1 batch do not cause deadlock #558
Conversation
Pull request overview
This PR resolves a deadlock issue that occurred when executing Python UDFs with multiple record batches. The fix modifies the batch collection strategy to use a single block_on() call instead of repeatedly acquiring the GIL for each batch, while preserving streaming output for cases where it's needed.
Changes:
- Modified batch collection to gather all batches in a single async operation for `.to_pandas()` and `.execute()`
- Updated the signal-checking interval from 1 second to 2 seconds and added `py.run(cr"pass", None, None)` calls
- Added regression tests to verify that multi-batch operations complete without hanging
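The single-collection strategy in the first bullet can be sketched in plain Python, using `asyncio.run()` as a stand-in for tokio's `block_on()`. All names here (`batch_stream`, `to_batches`) are illustrative, not the actual sedonadb API:

```python
import asyncio

# Stand-in for DataFusion's SendableRecordBatchStream.
async def batch_stream(n):
    for i in range(n):
        await asyncio.sleep(0)  # simulate awaiting the next batch
        yield f"batch-{i}"

async def _drain(stream):
    # Collect every batch inside one async task.
    return [batch async for batch in stream]

def to_batches(n):
    # One asyncio.run() call -- the analogue of a single block_on() --
    # drains the whole stream, instead of re-entering the runtime
    # (and re-acquiring the GIL) once per batch.
    return asyncio.run(_drain(batch_stream(n)))

print(to_batches(3))  # ['batch-0', 'batch-1', 'batch-2']
```

The point of the sketch is the shape of the call: a single entry into the async runtime that returns everything, rather than a loop that blocks once per batch.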
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/sedonadb/src/dataframe.rs | Introduces to_batches() method and Batches struct for single-pass collection; refactors execute() to collect in one async operation |
| python/sedonadb/python/sedonadb/dataframe.py | Updates to_arrow_table() to use new to_batches() method |
| python/sedonadb/src/runtime.rs | Increases signal check interval and adds py.run(cr"pass") calls before signal checks |
| python/sedonadb/tests/test_udf.py | Adds regression tests for multi-batch UDF execution and collection; corrects expected exception types |
| python/sedonadb/tests/test_dataframe.py | Updates error message assertion to match new wording |
| python/sedonadb/tests/functions/test_wkb.py | Adds GEOS version check for EWKB tests |
zhangfengcdt
left a comment
minor comments, otherwise, lgtm
python/sedonadb/src/dataframe.rs
Outdated
```rust
#[pymethods]
impl Batches {
    fn __len__(&self) -> usize {
        self.count
```
Is this the number of rows or the number of batches? It looks like `__len__` is expected to be the latter.
That's a great point...I removed it since it's not used for anything yet.
```python
        return pa.table(self, schema=pa.schema(schema))
    # Collects all batches into an object that exposes __arrow_c_stream__()
    batches = self._impl.to_batches(schema)
    return pa.table(batches)
```
I assume we do not need to pass schema to pa.table anymore.
Yes, passing it via to_batches() ensures it makes it to Rust!
I had intended to post a reprex to GeoPandas regarding crashes when threading, but I got caught by this issue: the way we collected things into Python caused many attempts to acquire the GIL (or perhaps a lock related to tokio), which interfered with UDF execution.
Briefly, before this PR, the Python bindings always collected via a special `RecordBatchReader` that called `block_on()` + `allow_threads()`, waiting for the next batch in the output `SendableRecordBatchStream`. To ensure cancellation requests worked, we acquired the GIL every 1 second to check for signals. This constant `block_on()` + GIL acquisition caused a deadlock when Python UDFs were also trying to acquire the GIL (or perhaps a tokio lock of some kind).

After this PR, we use one `block_on()` + `allow_threads()` and collect all the batches at once when we know this is what needs to happen anyway. Streaming output still works; it is just now only invoked when required. The workaround here is not a full solution, but it covers the most common case, where a user wants to collect the entire result (e.g., `.to_pandas()`). This is simpler to orchestrate.
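The contention argument above can be made concrete with a counting stand-in for the GIL. This is a minimal sketch, not the sedonadb implementation; `CountingLock` and both collect functions are hypothetical names:

```python
import threading

class CountingLock:
    """Stand-in for the GIL that records how often it is taken."""
    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0
    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1
        return self
    def __exit__(self, *exc):
        self._lock.release()
        return False

def collect_per_batch(stream, gil):
    # Before the fix: re-enter Python (take the "GIL") once per batch,
    # giving a UDF on another thread many chances to contend with us.
    out = []
    for batch in stream:
        with gil:
            out.append(batch)
    return out

def collect_all(stream, gil):
    # After the fix: take it once and drain the whole stream.
    with gil:
        return list(stream)

batches = [f"batch-{i}" for i in range(100)]
before, after = CountingLock(), CountingLock()
collect_per_batch(batches, before)
collect_all(batches, after)
print(before.acquisitions, after.acquisitions)  # 100 vs 1
```

Each acquisition in the per-batch path is a window in which a Python UDF running concurrently can be blocked (or, in the real bindings, deadlock against a tokio lock); collecting everything under one acquisition removes all but one of those windows.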