feat(python/sedonadb): Implement parameter binding#575
paleolimbot merged 9 commits into apache:main
Pull request overview
This PR implements parameter binding for SedonaDB, enabling users to bind arbitrary Python objects as query parameters using placeholders like $1 or $my_param. The implementation includes a new Literal expression type and conversion logic that handles various Python objects including Shapely geometries, GeoPandas objects, and Arrow-compatible types.
Changes:
- Added parameter binding support to SQL queries via the `with_params()` method and the `params` argument to `sql()`
- Implemented the `Literal` expression type with conversion logic for Python objects to Arrow arrays
- Added Rust-side support for importing Arrow scalars and binding parameters to DataFrames
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| python/sedonadb/tests/test_dataframe.py | Tests for parameter binding with positional and named parameters |
| python/sedonadb/tests/expr/test_literal.py | Comprehensive tests for literal expression conversion from various Python types |
| python/sedonadb/src/import_from.rs | Added import_arrow_scalar function to convert Arrow arrays to scalar values with metadata |
| python/sedonadb/src/dataframe.rs | Implemented with_params method to bind positional and named parameters to DataFrames |
| python/sedonadb/python/sedonadb/expr/literal.py | Core literal expression implementation with conversion logic for Python objects |
| python/sedonadb/python/sedonadb/expr/__init__.py | Module initialization for expression API |
| python/sedonadb/python/sedonadb/dataframe.py | Added with_params() method to DataFrame class |
| python/sedonadb/python/sedonadb/context.py | Added params argument to sql() method for parameter binding |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
        return f"{type(obj).__module__}.{type(obj).__name__}"


    SPECIAL_CASED_LITERALS = {
Would this work for subclasses if we match on class names? Also, LinearRing is missing.
Good catch on LinearRing!
I use the class name approach (which doesn't catch subclasses) because in order to do isinstance(obj, shapely.Geometry) we need to import shapely, which is something I'd rather not do at the module level if it can be avoided. If something comes up where there's no alternative we could require certain dependencies for using parameterized queries.
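A minimal sketch of the trade-off discussed above, using a standard-library class as a stand-in for the shapely types. The names `SPECIAL_CASED`, `lookup_exact`, and `lookup_mro` are hypothetical and not taken from the PR; only the qualified-name expression mirrors the diff. The second lookup shows one possible way to match subclasses without importing the defining module, by walking the method resolution order:

```python
from collections import OrderedDict


def qualified_name(obj):
    """Return 'module.ClassName' for the exact type of obj."""
    return f"{type(obj).__module__}.{type(obj).__name__}"


# Hypothetical registry keyed by qualified class name. OrderedDict stands in
# for a shapely geometry class so the sketch has no third-party dependency.
SPECIAL_CASED = {
    "collections.OrderedDict": lambda obj: ("ordered", list(obj.items())),
}


def lookup_exact(obj):
    """Exact-name lookup: a subclass has a different name, so it is not matched."""
    return SPECIAL_CASED.get(qualified_name(obj))


def lookup_mro(obj):
    """Check every class in the MRO so subclasses match, still without an import."""
    for cls in type(obj).__mro__:
        handler = SPECIAL_CASED.get(f"{cls.__module__}.{cls.__qualname__}")
        if handler is not None:
            return handler
    return None


class MyOrdered(OrderedDict):
    """A user-defined subclass, analogous to subclassing a geometry type."""
```

Here `lookup_exact(MyOrdered())` finds nothing while `lookup_mro(MyOrdered())` finds the `OrderedDict` handler, at the cost of a loop over the base classes on every lookup.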
        if len(obj) != 1:
            raise ValueError("Can't create SedonaDB literal from Series with length != 1")

        if obj.dtype.name == "geometry":
Should we also check that the CRS is defined here?
I'm not sure whether dtype.name == "geometry" implies that the series is a GeoPandas GeoSeries, or whether we can assume that the CRS is defined in this case. Otherwise, the PR LGTM.
This one is because of a fun corner case in Pandas land: geo_df.iloc[0] is a Series with dtype geometry, not a GeoSeries (hence checking obj.array.crs instead of obj.crs).
This PR provides the ability to bind arbitrary Python objects as parameters in a query (e.g., `SELECT ST_Envelope($1)`). As discussed in the ticket, most of the work here is actually around the logic to convert arbitrary Python objects, where the heuristic is:
- `obj.__arrow_c_array__()` protocol
- `pyarrow.array([obj])`

In general, dataframe-ish objects with exactly one value and anything convertible to Arrow of length one works (this is vaguely the same logic as how you can put a subquery in any spot where a scalar is expected). It also works nicely for geometry because GeoPandas objects preserve their CRS but shapely objects don't (so `geo_df.geometry[0]` is lossy but `geo_df.geometry` is not).

As a side effect this also kick-starts our expression API with a single expression type, `Literal`.

Closes #111.
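The conversion order in the description can be sketched roughly as follows. This is an illustration and not the code from `literal.py`: the function names `pick_conversion` and `to_length_one_array` and the `FakeArrowExporter` class are hypothetical, and the sketch assumes a pyarrow version whose `pyarrow.array()` accepts objects that implement `__arrow_c_array__`.

```python
def pick_conversion(obj):
    """Name the conversion path for obj, in the order the description lists them."""
    # Objects that export the Arrow C array protocol are consumed directly
    if hasattr(obj, "__arrow_c_array__"):
        return "arrow_c_array"
    # Anything else is wrapped in a one-element list and handed to pyarrow
    return "pyarrow_array"


def to_length_one_array(obj):
    """Convert obj to a pyarrow array and require exactly one value."""
    import pyarrow as pa  # lazy import, so defining this sketch needs no dependencies

    if pick_conversion(obj) == "arrow_c_array":
        # Assumption: this pyarrow version imports __arrow_c_array__ objects in pa.array()
        arr = pa.array(obj)
    else:
        arr = pa.array([obj])

    if len(arr) != 1:
        raise ValueError(f"Expected exactly one value but got {len(arr)}")
    return arr


class FakeArrowExporter:
    """Stand-in that only advertises the protocol method; it is never called here."""

    def __arrow_c_array__(self, requested_schema=None):
        raise NotImplementedError
```

Keeping the length check after both paths is what lets a one-row GeoPandas object and a plain Python scalar be treated the same way once they reach Arrow.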
Geometry objects can also be bound here: