
feat: Implement native Rust ST_Centroid and ST_Length functions #33

Merged

jiayuasu merged 24 commits into apache:main from zhangfengcdt:feature/implement_geo_centroid_udf on Sep 8, 2025

@zhangfengcdt (Member) commented Sep 5, 2025

Implements native Rust versions of ST_Centroid and ST_Length functions using the geo-generic-alg library, providing substantial performance improvements over the existing GEOS-based implementations.

  • Added ST_Centroid implementation (rust/sedona-geo/src/st_centroid.rs)

    • Native Rust implementation using geo-generic-alg
    • Support for Point, LineString, Polygon, and GeometryCollection types
    • Registered as an alternative to the GEOS implementation
  • Added ST_Length implementation (rust/sedona-geo/src/st_length.rs)

    • Native Rust implementation using geo-generic-alg
    • Support for LineString, Polygon, and GeometryCollection types
    • Length calculation that also covers polygon perimeters
  • Updated benchmark tests (benchmarks/test_functions.py)

    • Modified ST_Length tests to use segments_large, collections_simple, and collections_complex tables
    • Enhanced test coverage for performance validation
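To make the two algorithms concrete, here is a minimal std-only sketch of the standard planar formulas behind ST_Length and ST_Centroid. It is not the geo-generic-alg code: `Coord`, `line_length`, `polygon_perimeter`, `polygon_centroid`, and `line_centroid` are hypothetical names, and the sketch assumes Cartesian coordinates and closed rings (first point equals last point). Length is the sum of segment lengths, a polygon's perimeter is the total length of its rings, and a polygon centroid is area-weighted via the shoelace formula.

```rust
/// Planar (x, y) coordinate. Hypothetical stand-in for geo-generic-alg's
/// coordinate types; rings are assumed closed (first point == last point).
type Coord = (f64, f64);

/// Euclidean length of one segment.
fn segment_length(a: Coord, b: Coord) -> f64 {
    (b.0 - a.0).hypot(b.1 - a.1)
}

/// Length of a LineString: sum of its segment lengths.
fn line_length(line: &[Coord]) -> f64 {
    line.windows(2).map(|w| segment_length(w[0], w[1])).sum()
}

/// Polygon perimeter: exterior ring length plus every interior ring length.
fn polygon_perimeter(exterior: &[Coord], interiors: &[Vec<Coord>]) -> f64 {
    line_length(exterior) + interiors.iter().map(|r| line_length(r)).sum::<f64>()
}

/// Shoelace formula: signed area and first moments (area * centroid) of a ring.
fn ring_area_and_moments(ring: &[Coord]) -> (f64, f64, f64) {
    let (mut a2, mut mx, mut my) = (0.0, 0.0, 0.0);
    for w in ring.windows(2) {
        let ((x0, y0), (x1, y1)) = (w[0], w[1]);
        let cross = x0 * y1 - x1 * y0;
        a2 += cross;
        mx += (x0 + x1) * cross;
        my += (y0 + y1) * cross;
    }
    (a2 / 2.0, mx / 6.0, my / 6.0)
}

/// Flip signs if needed so the ring's area is non-negative
/// (makes the result independent of ring orientation).
fn normalized_ring(ring: &[Coord]) -> (f64, f64, f64) {
    let (a, mx, my) = ring_area_and_moments(ring);
    if a < 0.0 { (-a, -mx, -my) } else { (a, mx, my) }
}

/// Area-weighted polygon centroid: holes subtract their area and moments.
/// Returns None for zero area; a fuller implementation would typically fall
/// back to a lower-dimensional (boundary) centroid instead.
fn polygon_centroid(exterior: &[Coord], interiors: &[Vec<Coord>]) -> Option<Coord> {
    let (mut a, mut mx, mut my) = normalized_ring(exterior);
    for hole in interiors {
        let (ha, hx, hy) = normalized_ring(hole);
        a -= ha;
        mx -= hx;
        my -= hy;
    }
    if a == 0.0 { None } else { Some((mx / a, my / a)) }
}

/// Length-weighted LineString centroid: segment midpoints weighted by length.
fn line_centroid(line: &[Coord]) -> Option<Coord> {
    let (mut total, mut sx, mut sy) = (0.0, 0.0, 0.0);
    for w in line.windows(2) {
        let len = segment_length(w[0], w[1]);
        total += len;
        sx += len * (w[0].0 + w[1].0) / 2.0;
        sy += len * (w[0].1 + w[1].1) / 2.0;
    }
    if total == 0.0 { None } else { Some((sx / total, sy / total)) }
}
```

For example, a 4x4 square with a 2x2 hole in one corner has area 12, and its centroid shifts away from the hole to (7/3, 7/3).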

ST_Centroid

Before this PR (GEOS-based implementation):

----------------------------------------------- benchmark 'table=polygons_complex': 2 tests ------------------------------------------------
Name (time in ms)                                Median               Mean            StdDev                Min                Max
--------------------------------------------------------------------------------------------------------------------------------------------
test_st_centroid[polygons_complex-SedonaDB]      7.3047 (1.0)       7.3327 (1.0)      0.1581 (1.0)       7.0635 (1.0)       8.0731 (1.0)
test_st_centroid[polygons_complex-DuckDB]       22.6901 (3.11)     22.9047 (3.12)     0.5560 (3.52)     22.4189 (3.17)     24.6498 (3.05)
--------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------- benchmark 'table=polygons_simple': 2 tests ---------------------------------------------
Name (time in ms)                              Median              Mean            StdDev               Min               Max
---------------------------------------------------------------------------------------------------------------------------------------
test_st_centroid[polygons_simple-DuckDB]       1.4665 (1.0)      1.4766 (1.0)      0.0496 (1.0)      1.4512 (1.0)      2.1513 (1.0)
test_st_centroid[polygons_simple-SedonaDB]     2.3950 (1.63)     2.3920 (1.62)     0.0915 (1.84)     1.7250 (1.19)     2.7119 (1.26)
---------------------------------------------------------------------------------------------------------------------------------------


After this PR (native Rust implementation):

----------------------------------------------- benchmark 'table=polygons_complex': 2 tests ------------------------------------------------
Name (time in ms)                                Median               Mean            StdDev                Min                Max
--------------------------------------------------------------------------------------------------------------------------------------------
test_st_centroid[polygons_complex-SedonaDB]      1.7091 (1.0)       1.7219 (1.0)      0.0624 (1.0)       1.6485 (1.0)       2.1138 (1.0)
test_st_centroid[polygons_complex-DuckDB]       23.4304 (13.71)    23.4766 (13.63)    0.2540 (4.07)     23.0811 (14.00)    24.1968 (11.45)
--------------------------------------------------------------------------------------------------------------------------------------------

---------------------------------------------- benchmark 'table=polygons_simple': 2 tests ---------------------------------------------
Name (time in ms)                              Median              Mean            StdDev               Min               Max
---------------------------------------------------------------------------------------------------------------------------------------
test_st_centroid[polygons_simple-SedonaDB]     0.3223 (1.0)      0.3314 (1.0)      0.0379 (1.0)      0.2905 (1.0)      0.6315 (1.0)
test_st_centroid[polygons_simple-DuckDB]       1.4543 (4.51)     1.4577 (4.40)     0.0514 (1.36)     1.3902 (4.79)     2.1562 (3.41)
---------------------------------------------------------------------------------------------------------------------------------------

ST_Length

Before this PR (GEOS-based implementation):

----------------------------------------------- benchmark 'table=collections_complex': 2 tests ----------------------------------------------
Name (time in ms)                                 Median               Mean            StdDev                Min                Max
---------------------------------------------------------------------------------------------------------------------------------------------
test_st_length[collections_complex-DuckDB]        5.8670 (1.0)       5.9233 (1.0)      0.2211 (1.0)       5.5197 (1.0)       6.6817 (1.0)
test_st_length[collections_complex-SedonaDB]     14.0494 (2.39)     14.4129 (2.43)     0.8906 (4.03)     13.7268 (2.49)     18.2037 (2.72)
---------------------------------------------------------------------------------------------------------------------------------------------

--------------------------------------------- benchmark 'table=collections_simple': 2 tests ---------------------------------------------
Name (time in ms)                               Median              Mean            StdDev               Min                Max
-----------------------------------------------------------------------------------------------------------------------------------------
test_st_length[collections_simple-DuckDB]       0.7602 (1.0)      0.7618 (1.0)      0.0269 (1.0)      0.7120 (1.0)       1.1222 (1.0)
test_st_length[collections_simple-SedonaDB]     9.4402 (12.42)    9.7369 (12.78)    0.7997 (29.74)    9.1862 (12.90)    14.2978 (12.74)
-----------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------- benchmark 'table=segments_large': 2 tests ---------------------------------------------
Name (time in ms)                           Median              Mean            StdDev               Min               Max
------------------------------------------------------------------------------------------------------------------------------------
test_st_length[segments_large-DuckDB]       0.2080 (1.0)      0.2336 (1.0)      0.0436 (1.0)      0.1952 (1.0)      0.5038 (1.0)
test_st_length[segments_large-SedonaDB]     2.7922 (13.42)    2.8142 (12.05)    0.1098 (2.52)     2.6992 (13.83)    3.4997 (6.95)
------------------------------------------------------------------------------------------------------------------------------------

After this PR (native Rust implementation):

--------------------------------------------- benchmark 'table=collections_complex': 2 tests --------------------------------------------
Name (time in ms)                                Median              Mean            StdDev               Min               Max
-----------------------------------------------------------------------------------------------------------------------------------------
test_st_length[collections_complex-DuckDB]       6.1634 (1.0)      6.1403 (1.0)      0.4365 (1.0)      5.3774 (1.0)      7.3927 (1.0)
test_st_length[collections_complex-SedonaDB]     6.4726 (1.05)     6.6360 (1.08)     0.4949 (1.13)     6.0559 (1.13)     9.0267 (1.22)
-----------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------- benchmark 'table=collections_simple': 2 tests ---------------------------------------------
Name (time in ms)                               Median              Mean            StdDev               Min               Max
----------------------------------------------------------------------------------------------------------------------------------------
test_st_length[collections_simple-SedonaDB]     0.6573 (1.0)      0.6722 (1.0)      0.0507 (1.0)      0.5901 (1.0)      0.9739 (1.0)
test_st_length[collections_simple-DuckDB]       0.9263 (1.41)     0.9159 (1.36)     0.1164 (2.29)     0.6735 (1.14)     1.7632 (1.81)
----------------------------------------------------------------------------------------------------------------------------------------

-------------------------------------------- benchmark 'table=segments_large': 2 tests ---------------------------------------------
Name (time in ms)                           Median              Mean            StdDev               Min               Max
------------------------------------------------------------------------------------------------------------------------------------
test_st_length[segments_large-DuckDB]       0.3814 (1.54)     0.3290 (1.29)     0.1390 (3.99)     0.1933 (1.0)      1.9363 (4.15)
test_st_length[segments_large-SedonaDB]     0.2478 (1.0)      0.2553 (1.0)      0.0348 (1.0)      0.2147 (1.11)     0.4661 (1.0)
------------------------------------------------------------------------------------------------------------------------------------

Performance Results

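The native Rust implementations are faster than the previous versions on every benchmarked table while maintaining full compatibility with the existing API. Gains are largest on simpler geometries (median ST_Length time on collections_simple drops from 9.44 ms to 0.66 ms, about 14x; ST_Centroid on polygons_simple drops from 2.40 ms to 0.32 ms, about 7.4x) and smaller on complex ones (ST_Length on collections_complex drops from 14.05 ms to 6.47 ms, about 2.2x). On median time, SedonaDB now leads DuckDB on every table except collections_complex, where the two are roughly on par.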
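The speedup figures can be reproduced from the Median columns of the tables above. A small sketch (the `MEDIANS` table and `speedups` function are illustrative, not part of the PR) divides each SedonaDB median before the change by the median after it, so values above 1.0 mean the new code is faster:

```rust
/// SedonaDB median times in ms, copied from the benchmark tables:
/// (benchmark, before this PR, after this PR).
const MEDIANS: [(&str, f64, f64); 5] = [
    ("st_centroid/polygons_complex", 7.3047, 1.7091),
    ("st_centroid/polygons_simple", 2.3950, 0.3223),
    ("st_length/collections_complex", 14.0494, 6.4726),
    ("st_length/collections_simple", 9.4402, 0.6573),
    ("st_length/segments_large", 2.7922, 0.2478),
];

/// Speedup = before / after for each benchmark.
fn speedups() -> Vec<(&'static str, f64)> {
    MEDIANS
        .iter()
        .map(|&(name, before, after)| (name, before / after))
        .collect()
}
```

This yields roughly 4.3x, 7.4x, 2.2x, 14.4x, and 11.3x for the five tables, in the order listed.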

@paleolimbot (Member) left a comment:

Just a preliminary skim!

@paleolimbot (Member) left a comment:

Thank you! (One nit to handle now or in a follow-up!)

Comment on lines 24 to 30
```rust
#[derive(Error, Debug)]
pub enum IsEmptyError {
    #[error("Invalid geometry type")]
    InvalidGeometryType,
}

pub fn is_geometry_empty<G: GeometryTrait<T = f64>>(geometry: &G) -> Result<bool, IsEmptyError> {
```

Sorry I didn't catch this on the last round - we have SedonaGeometryError already and we should use that!
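The suggestion amounts to returning the crate's shared error type instead of a module-local `IsEmptyError`. A hedged, std-only sketch of that pattern follows: `SedonaGeometryError` here is a hypothetical stand-in (its real variants are not shown in this PR, so the `Invalid(String)` variant is an assumption), and the simplified `Geom` enum is a hypothetical replacement for the real `GeometryTrait` generic.

```rust
use std::fmt;

/// Hypothetical stand-in for the shared SedonaGeometryError; the
/// `Invalid(String)` variant is assumed for illustration only.
#[derive(Debug, PartialEq)]
pub enum SedonaGeometryError {
    Invalid(String),
}

impl fmt::Display for SedonaGeometryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SedonaGeometryError::Invalid(msg) => write!(f, "invalid geometry: {msg}"),
        }
    }
}

impl std::error::Error for SedonaGeometryError {}

/// Hypothetical, simplified geometry model standing in for GeometryTrait.
pub enum Geom {
    Point(Option<(f64, f64)>),
    LineString(Vec<(f64, f64)>),
    Collection(Vec<Geom>),
    Unsupported,
}

/// Same role as is_geometry_empty, but failures use the shared error type.
pub fn is_geometry_empty(geometry: &Geom) -> Result<bool, SedonaGeometryError> {
    match geometry {
        Geom::Point(coord) => Ok(coord.is_none()),
        Geom::LineString(coords) => Ok(coords.is_empty()),
        // A collection is empty when every member is empty.
        Geom::Collection(members) => {
            for m in members {
                if !is_geometry_empty(m)? {
                    return Ok(false);
                }
            }
            Ok(true)
        }
        Geom::Unsupported => Err(SedonaGeometryError::Invalid(
            "unsupported geometry type".to_string(),
        )),
    }
}
```

The benefit of this design is that callers handle one error type across all geometry helpers instead of one per module.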

@paleolimbot (Member) left a comment:

Thank you!

@jiayuasu merged commit 924aa6c into apache:main on Sep 8, 2025
5 checks passed

4 participants