
ST_Distance and ST_DWithin based on georust/geo#73

Merged: jiayuasu merged 4 commits into apache:main from Kontinuation:geo-distance on Sep 13, 2025

Kontinuation (Member) commented Sep 12, 2025

This patch implements ST_Distance and ST_DWithin using georust/geo's Euclidean distance function.
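To show what a Euclidean point-to-polygon distance involves, here is a minimal plain-Rust sketch. It is not georust/geo's implementation: the `Pt` type and the function names are hypothetical, and polygon holes are ignored. It does show why cost grows with the number of polygon vertices, because every edge of the ring is visited.

```rust
/// A 2D coordinate (hypothetical stand-in for a geometry library's point type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Pt {
    pub x: f64,
    pub y: f64,
}

/// Euclidean distance from point `p` to the segment from `a` to `b`.
pub fn point_segment_distance(p: Pt, a: Pt, b: Pt) -> f64 {
    let (dx, dy) = (b.x - a.x, b.y - a.y);
    let len2 = dx * dx + dy * dy;
    // Parameter of the projection of `p` onto the segment, clamped to [0, 1].
    let t = if len2 == 0.0 {
        0.0
    } else {
        (((p.x - a.x) * dx + (p.y - a.y) * dy) / len2).clamp(0.0, 1.0)
    };
    let (cx, cy) = (a.x + t * dx, a.y + t * dy);
    ((p.x - cx).powi(2) + (p.y - cy).powi(2)).sqrt()
}

/// Even-odd ray-casting test: is `p` inside the ring?
fn point_in_ring(p: Pt, ring: &[Pt]) -> bool {
    if ring.is_empty() {
        return false;
    }
    let mut inside = false;
    let mut j = ring.len() - 1;
    for i in 0..ring.len() {
        let (a, b) = (ring[i], ring[j]);
        // The edge straddles the horizontal line through `p`, so a.y != b.y here.
        if (a.y > p.y) != (b.y > p.y) && p.x < (b.x - a.x) * (p.y - a.y) / (b.y - a.y) + a.x {
            inside = !inside;
        }
        j = i;
    }
    inside
}

/// Distance from `p` to a hole-free polygon given by its exterior ring:
/// 0 if `p` is inside, otherwise the minimum distance to any edge. O(n) in ring size.
pub fn point_polygon_distance(p: Pt, ring: &[Pt]) -> f64 {
    if point_in_ring(p, ring) {
        return 0.0;
    }
    // Pair each vertex with the next one, wrapping around to close the ring.
    ring.iter()
        .zip(ring.iter().cycle().skip(1))
        .map(|(&a, &b)| point_segment_distance(p, a, b))
        .fold(f64::INFINITY, f64::min)
}
```

For a 4x4 square with corners at (0,0) and (4,4), a point at (7,8) is nearest to the corner (4,4), so the sketch returns 5.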

Performance comparison with the GEOS-based implementation:

GEO:

geo-st_distance-ArrayScalar(Point, Polygon(10))
                        time:   [19.120 ms 19.171 ms 19.227 ms]
geo-st_distance-ArrayScalar(Point, Polygon(500))
                        time:   [446.14 ms 448.10 ms 450.64 ms]

GEOS:

geos-st_distance-ArrayScalar(Polygon(10), Polygon(10))
                        time:   [187.51 ms 188.38 ms 189.72 ms]
geos-st_distance-ArrayScalar(Polygon(10), Polygon(500))
                        time:   [2.8651 s 2.8735 s 2.8822 s]

Benchmarking ternary functions such as ST_DWithin takes forever; this may be a problem with our benchmarking framework.

Copilot AI (Contributor) left a comment

Pull Request Overview

This PR implements ST_Distance and ST_DWithin spatial functions using the georust/geo library's Euclidean distance calculations, providing a significant performance improvement over the GEOS-based implementation.

  • Adds ST_Distance function for calculating Euclidean distance between geometries
  • Adds ST_DWithin function for checking if geometries are within a specified distance
  • Includes comprehensive test coverage and benchmarks for both functions
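The row-level semantics of ST_DWithin can be sketched as "distance, then compare". The helpers below are hypothetical, assume an inclusive comparison (distance <= d), and assume that a null distance or a null threshold produces a null result.

```rust
/// Per-row ST_DWithin over an already-computed distance (hypothetical helper).
/// Inclusive comparison; a null distance or threshold yields null.
pub fn st_dwithin(distance: Option<f64>, threshold: Option<f64>) -> Option<bool> {
    match (distance, threshold) {
        (Some(dist), Some(d)) => Some(dist <= d),
        _ => None,
    }
}

/// Apply one constant threshold to a batch of distances.
pub fn st_dwithin_batch(distances: &[Option<f64>], threshold: Option<f64>) -> Vec<Option<bool>> {
    distances.iter().map(|&dist| st_dwithin(dist, threshold)).collect()
}
```

A real kernel may be able to stop early once any pair of components is within `d`; this distance-then-compare sketch does not try to exploit that.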

Reviewed Changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated no comments.

Summary per file:
  • rust/sedona-geo/src/st_distance.rs: Implements ST_Distance function with Euclidean distance calculation
  • rust/sedona-geo/src/st_dwithin.rs: Implements ST_DWithin function for distance-based geometric queries
  • rust/sedona-geo/src/register.rs: Registers the new functions in the scalar kernels list
  • rust/sedona-geo/src/lib.rs: Adds module declarations for the new functions
  • rust/sedona-geo/benches/geo-functions.rs: Adds benchmarks for performance testing of the new functions


jesspav (Collaborator) commented Sep 12, 2025

Looks great!
Do these functions not apply to geographies?

Kontinuation (Member Author) commented:

Looks great! Do these functions not apply to geographies?

Geography types have their own implementation of ST_Distance:

pub fn st_distance_impl() -> ScalarKernelRef {
    S2ScalarKernel::new_ref(
        S2ScalarUDF::Distance,
        vec![ArgMatcher::is_geography(), ArgMatcher::is_geography()],
        SedonaType::Arrow(DataType::Float64),
    )
}
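One way to picture how geometry and geography inputs end up in different implementations is per-argument matching: each kernel lists one matcher per argument, and the kernel whose matchers accept the input types is chosen. The sketch below is a hypothetical mental model, not SedonaDB's dispatch code; `ArgType`, `Kernel`, and `resolve` are illustrative names, and "first match wins" is an assumption.

```rust
/// Hypothetical argument types for the sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ArgType {
    Geometry,
    Geography,
}

/// A matcher decides whether one argument type is acceptable.
pub type Matcher = fn(ArgType) -> bool;

fn is_geometry(t: ArgType) -> bool {
    t == ArgType::Geometry
}

fn is_geography(t: ArgType) -> bool {
    t == ArgType::Geography
}

/// A named implementation plus one matcher per argument.
pub struct Kernel {
    pub name: &'static str,
    pub matchers: Vec<Matcher>,
}

impl Kernel {
    fn matches(&self, args: &[ArgType]) -> bool {
        self.matchers.len() == args.len()
            && self.matchers.iter().zip(args).all(|(m, &a)| m(a))
    }
}

/// Two ST_Distance kernels: planar for geometry, spherical (S2) for geography.
pub fn st_distance_kernels() -> Vec<Kernel> {
    vec![
        Kernel { name: "geo", matchers: vec![is_geometry as Matcher, is_geometry as Matcher] },
        Kernel { name: "s2", matchers: vec![is_geography as Matcher, is_geography as Matcher] },
    ]
}

/// Pick the first kernel whose matchers accept all argument types.
pub fn resolve<'a>(kernels: &'a [Kernel], args: &[ArgType]) -> Option<&'a str> {
    kernels.iter().find(|k| k.matches(args)).map(|k| k.name)
}
```

Under this model, a mixed geometry/geography call matches no kernel and would be rejected.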

paleolimbot (Member) left a comment

Thank you! I think using the GeoTypesExecutor is worth addressing now (array distance inputs could be a follow-up, although I think they are not too hard to handle).

Comment on lines 67 to 72
} else {
    return Err(DataFusionError::Execution(format!(
        "Invalid distance: {:?}",
        args[2]
    )));
}
paleolimbot (Member):

Won't this error when the distance isn't a constant? I think you could probably handle that case with:

let arg2_array = arg2.to_array(executor.num_iterations())?;
let arg2_f64_array = as_float64_array(&arg2_array)?;
let mut arg2_iter = arg2_f64_array.iter();

// In the loop: one Option<f64> per row (None when the distance is null)
let distance = arg2_iter.next().unwrap();

That is possibly slower for the scalar case and could be kept separate if it ends up mattering.
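The suggestion amounts to broadcasting: expand the distance argument to exactly one value per row (repeating a scalar), then walk it in lockstep with the geometries. The std-only sketch below illustrates that idea; `DistanceArg` is a hypothetical stand-in for DataFusion's `ColumnarValue`, and it operates on precomputed pairwise distances to keep the focus on the threshold handling.

```rust
/// Hypothetical stand-in for the distance argument (scalar or per-row array).
pub enum DistanceArg {
    Scalar(Option<f64>),
    Array(Vec<Option<f64>>),
}

/// Expand the argument to one Option<f64> per row, repeating a scalar
/// `num_rows` times (the effect `to_array(num_rows)` has on a scalar).
pub fn per_row_distances(arg: &DistanceArg, num_rows: usize) -> Result<Vec<Option<f64>>, String> {
    match arg {
        DistanceArg::Scalar(d) => Ok(vec![*d; num_rows]),
        DistanceArg::Array(v) if v.len() == num_rows => Ok(v.clone()),
        DistanceArg::Array(v) => Err(format!("expected {num_rows} distances, got {}", v.len())),
    }
}

/// ST_DWithin over precomputed pairwise distances with a per-row threshold.
pub fn dwithin_rows(distances: &[Option<f64>], arg: &DistanceArg) -> Result<Vec<Option<bool>>, String> {
    let thresholds = per_row_distances(arg, distances.len())?;
    let mut thresholds_iter = thresholds.into_iter();
    let mut out = Vec::with_capacity(distances.len());
    for dist in distances {
        // Lengths were checked above, so there is always a threshold for this row.
        let d = thresholds_iter.next().unwrap();
        out.push(match (dist, d) {
            (Some(dist), Some(d)) => Some(*dist <= d),
            _ => None,
        });
    }
    Ok(out)
}
```

Materializing the scalar into a full array is the possible slowdown mentioned above; a separate scalar fast path avoids that allocation.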

Kontinuation (Member Author):

I am simply replicating what st_dwithin for GEOS does:

// Extract the constant scalar value before looping over the input geometries
let distance: Option<f64>;
let arg2 = args[2].cast_to(&DataType::Float64, None)?;
if let ColumnarValue::Scalar(scalar_arg) = &arg2 {
    if scalar_arg.is_null() {
        distance = None;
    } else {
        distance = Some(f64::try_from(scalar_arg.clone())?);
    }
} else {
    return Err(DataFusionError::Execution(format!(
        "Invalid distance: {:?}",
        args[2]
    )));
}

I'll fix the same problem in the GEOS UDF code in this patch.

Comment on lines 92 to 93
let geom_a = item_to_geometry(wkb_a)?;
let geom_b = item_to_geometry(wkb_b)?;
paleolimbot (Member):

Just double-checking: do we need these conversions, or is the distance metric implemented for the generic case?

Also, it should be more efficient to use the GeoTypesExecutor, which should ensure that scalar inputs on either the right or left are only converted once.
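The benefit of converting a scalar side only once can be shown with a small generic sketch: scalars are parsed before the row loop, and only array sides are parsed per row. `Arg` and `binary_map` are hypothetical names, not the GeoTypesExecutor API, and `parse` stands in for a WKB-to-geometry conversion. Array arguments are assumed to have `num_rows` entries.

```rust
/// Hypothetical argument: one value for all rows, or one value per row.
pub enum Arg<T> {
    Scalar(T),
    Array(Vec<T>),
}

/// Compute `f(parse(a_i), parse(b_i))` for each row, parsing a scalar side
/// once up front instead of once per row.
pub fn binary_map<T, G, R>(
    a: &Arg<T>,
    b: &Arg<T>,
    num_rows: usize,
    parse: impl Fn(&T) -> G,
    f: impl Fn(&G, &G) -> R,
) -> Vec<R> {
    let a_scalar = if let Arg::Scalar(v) = a { Some(parse(v)) } else { None };
    let b_scalar = if let Arg::Scalar(v) = b { Some(parse(v)) } else { None };
    let mut out = Vec::with_capacity(num_rows);
    for i in 0..num_rows {
        // Owned slots for per-row conversions; left uninitialized for scalar sides.
        let a_owned;
        let b_owned;
        let ga = match (a, &a_scalar) {
            (_, Some(g)) => g,
            (Arg::Array(v), None) => {
                a_owned = parse(&v[i]);
                &a_owned
            }
            (Arg::Scalar(_), None) => unreachable!("scalar was parsed above"),
        };
        let gb = match (b, &b_scalar) {
            (_, Some(g)) => g,
            (Arg::Array(v), None) => {
                b_owned = parse(&v[i]);
                &b_owned
            }
            (Arg::Scalar(_), None) => unreachable!("scalar was parsed above"),
        };
        out.push(f(ga, gb));
    }
    out
}
```

With a scalar on the left and a three-row array on the right, `parse` runs four times (once for the scalar, three times for the array) rather than six.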

Kontinuation (Member Author):

The generic distance PR was merged 1 hour ago. I'll remove the conversion and call distance_ext directly on WKB values.
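The reason a generic distance removes the conversion step is that it only needs read access to coordinates, so a value backed by WKB bytes can be measured in place. The sketch below illustrates the idea for points only, using a hypothetical `PointAccess` trait; it is not geo's `distance_ext` or any real trait API, and `BytesPoint` reads a bare 16-byte little-endian coordinate pair rather than real WKB with its header.

```rust
/// Hypothetical read-only coordinate access; any storage can implement it.
pub trait PointAccess {
    fn x(&self) -> f64;
    fn y(&self) -> f64;
}

/// Generic Euclidean distance over anything that exposes coordinates,
/// so neither side has to be converted to an owned geometry first.
pub fn distance<A: PointAccess, B: PointAccess>(a: &A, b: &B) -> f64 {
    (a.x() - b.x()).hypot(a.y() - b.y())
}

/// An owned point.
pub struct OwnedPoint {
    pub x: f64,
    pub y: f64,
}

impl PointAccess for OwnedPoint {
    fn x(&self) -> f64 {
        self.x
    }
    fn y(&self) -> f64 {
        self.y
    }
}

/// Zero-copy view over 16 bytes holding little-endian x then y
/// (a simplified stand-in for coordinates inside a WKB buffer).
pub struct BytesPoint<'a>(pub &'a [u8]);

fn read_f64_le(bytes: &[u8], offset: usize) -> f64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    f64::from_le_bytes(buf)
}

impl PointAccess for BytesPoint<'_> {
    fn x(&self) -> f64 {
        read_f64_le(self.0, 0)
    }
    fn y(&self) -> f64 {
        read_f64_le(self.0, 8)
    }
}
```

Both representations go through the same `distance` function, and the byte-backed point is never copied into an owned struct.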

jiayuasu merged commit 9efa952 into apache:main on Sep 13, 2025
12 checks passed