ST_Distance and ST_DWithin based on georust/geo#73
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR implements ST_Distance and ST_DWithin spatial functions using the georust/geo library's Euclidean distance calculations, providing a significant performance improvement over the GEOS-based implementation.
- Adds ST_Distance function for calculating Euclidean distance between geometries
- Adds ST_DWithin function for checking if geometries are within a specified distance
- Includes comprehensive test coverage and benchmarks for both functions
Reviewed Changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| rust/sedona-geo/src/st_distance.rs | Implements ST_Distance function with Euclidean distance calculation |
| rust/sedona-geo/src/st_dwithin.rs | Implements ST_DWithin function for distance-based geometric queries |
| rust/sedona-geo/src/register.rs | Registers the new functions in the scalar kernels list |
| rust/sedona-geo/src/lib.rs | Adds module declarations for the new functions |
| rust/sedona-geo/benches/geo-functions.rs | Adds benchmarks for performance testing of the new functions |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
Looks great! |
Geographies has its own implementation of ST_Distance: sedona-db/c/sedona-s2geography/src/scalar_kernel.rs Lines 85 to 91 in 3d01227 |
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you! I think the GeoTypesExecutor is worth addressing now (array distance inputs could be a follow-up although I think it is not too hard to handle).
rust/sedona-geo/src/st_dwithin.rs
Outdated
| } else { | ||
| return Err(DataFusionError::Execution(format!( | ||
| "Invalid distance: {:?}", | ||
| args[2] | ||
| ))); | ||
| } |
There was a problem hiding this comment.
This will error when distance isn't a constant? I think you could probably handle this case with:
let arg2_array = arg2.to_array(executor.num_iterations())?;
let arg2_f64_array = as_float64_array(&arg2_array)?;
let arg2_iter = arg2_f64_array.iter();
// In the loop
let distance = arg2_iter.next().unwrap()That is possibly slower for the scalar case and could be kept separate if it ends up mattering.
There was a problem hiding this comment.
I am simply replicating what st_dwithin for GEOS does:
sedona-db/c/sedona-geos/src/st_dwithin.rs
Lines 56 to 70 in aa4f80a
I'll address the same problem in GEOS UDF code in this patch.
rust/sedona-geo/src/st_dwithin.rs
Outdated
| let geom_a = item_to_geometry(wkb_a)?; | ||
| let geom_b = item_to_geometry(wkb_b)?; |
There was a problem hiding this comment.
Just double-checking: do we need these conversions, or is the distance metric implemented for the generic case?
Also, it should be more efficient to use the GeoTypesExecutor, which should ensure that scalar inputs on either the right or left are only converted once.
There was a problem hiding this comment.
The generic distance PR was merged 1 hour ago. I'll remove the conversion and call distance_ext directly on WKB values.
This patch implements ST_Distance and ST_DWithin using georust/geo's Euclidean distance function.
Performance comparison with GEOS-based implementation:
GEO:
GEOS:
Benchmarking ternary functions such as ST_DWithin takes forever, it could be a problem with our benchmarking framework.