feat(vortex-geo): native Point extension type and GeoDistance scalar function #8372
HarukiMoriarty wants to merge 9 commits into
Adds a GeoArrow-style `Point` extension type (Struct<x,y,[z],[m]>, dimension-ready) and the planar `GeoDistance` scalar function between two point columns.
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
… point GeoDistance computes the planar distance from each point in a column to a single constant query point (e.g. `ST_Distance(column, point)`). The second operand must be a constant: it is decoded once and broadcast over the column rather than materialized to one identical row per output element. Column-to-column distance is unsupported and errors. `try_new_array` now infers the output length from the point column instead of taking it as an explicit parameter.
Signed-off-by: Nemo Yu <zhenghong@spiraldb.com>
…field types
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
…s on construction
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Merging this PR will degrade performance by 12.33%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 20.5 µs | 35.5 µs | -42.34% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.0)] | 111.3 µs | 138.2 µs | -19.46% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.1)] | 111.3 µs | 138 µs | -19.35% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.01)] | 111 µs | 137.6 µs | -19.35% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 161.7 µs | 197.9 µs | -18.3% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 176.8 µs | 213.1 µs | -17% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.1)] | 80.7 µs | 89.7 µs | -10.1% |
| ⚡ | Simulation | decompress_rd[f64, (100000, 0.0)] | 980.4 µs | 845.4 µs | +15.97% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 304.7 ns | 275.6 ns | +10.58% |
Comparing nemo/geo-point (e92be4b) with develop (9383c35)
```rust
let DType::Struct(fields, _) = dtype else {
    vortex_bail!("coordinate storage must be a Struct, was {dtype}");
};
let names: Vec<&str> = fields.names().iter().map(|n| n.as_ref()).collect();
```
Removed. The names are now staged in a stack buffer inside `from_field_names`, so the slice-pattern match still works, and `coordinate_dimension` zips names with fields directly instead of collecting them.
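A minimal sketch of the stack-buffer idea described in the reply, under stated assumptions: `Dimension` and this generic signature of `from_field_names` are hypothetical stand-ins (the PR's real function works on Vortex field names and is not shown here), and the accepted orders `x, y`, `x, y, z`, `x, y, m`, and `x, y, z, m` follow the GeoArrow dimensions. At most four names are copied into a fixed-size array, so the slice pattern can match on the filled prefix without allocating a `Vec`.

```rust
/// Hypothetical point dimensions (GeoArrow xy, xyz, xym, xyzm).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Dimension {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

/// Hypothetical `from_field_names`: stage up to four names in a stack array
/// and classify them with a slice pattern, with no heap allocation.
fn from_field_names<'a, I>(names: I) -> Option<Dimension>
where
    I: IntoIterator<Item = &'a str>,
{
    // A point has at most four ordinates, so four slots are enough.
    let mut buf: [&str; 4] = [""; 4];
    let mut len = 0;
    for name in names {
        if len == buf.len() {
            return None; // more than four fields cannot be a point
        }
        buf[len] = name;
        len += 1;
    }
    // Only the filled prefix takes part in the match.
    match &buf[..len] {
        ["x", "y"] => Some(Dimension::Xy),
        ["x", "y", "z"] => Some(Dimension::Xyz),
        ["x", "y", "m"] => Some(Dimension::Xym),
        ["x", "y", "z", "m"] => Some(Dimension::Xyzm),
        _ => None,
    }
}

fn main() {
    println!("{:?}", from_field_names(["x", "y", "m"]));
    println!("{:?}", from_field_names(["y", "x"]));
}
```

With real field names you would pass something like the iterator from the excerpt above (`fields.names().iter().map(|n| n.as_ref())`) directly instead of collecting it first.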
```rust
vortex_ensure!(
    matches!(
        field,
        DType::Primitive(PType::F64, Nullability::NonNullable)
```
I thought those two fields were nullable?
`z`/`m` are optional fields (present or absent depending on the dimension), not nullable ones: the GeoArrow spec requires coordinate fields to be non-nullable, with "only the outer level allowed to have nulls". So a point can be missing entirely, but a present point can't have a null ordinate.
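To make the optional-versus-nullable distinction concrete, here is a sketch of the storage check using a hypothetical, simplified `DType` (not Vortex's real `DType`, whose `Primitive(PType::F64, Nullability::NonNullable)` variant appears in the excerpt above). The outer struct's nullability is deliberately left unconstrained, since a whole point may be null, while every ordinate that exists must be a non-nullable `f64`.

```rust
/// Hypothetical, simplified nullability flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Nullability {
    NonNullable,
    Nullable,
}

/// Hypothetical, simplified dtype: just enough to describe point storage.
#[derive(Debug, Clone, PartialEq)]
enum DType {
    F64(Nullability),
    Struct(Vec<(String, DType)>, Nullability),
}

/// Accept point storage: the struct itself may be nullable (a missing point),
/// but each ordinate present (x, y, and optionally z/m) must be non-nullable f64.
fn validate_point_storage(dtype: &DType) -> Result<(), String> {
    let DType::Struct(fields, _outer_nullability) = dtype else {
        return Err(format!("coordinate storage must be a Struct, was {dtype:?}"));
    };
    for (name, field) in fields {
        if field != &DType::F64(Nullability::NonNullable) {
            return Err(format!("ordinate `{name}` must be a non-nullable f64, was {field:?}"));
        }
    }
    Ok(())
}

fn main() {
    let ordinate = |name: &str, n| (name.to_string(), DType::F64(n));
    // Nullable outer struct, non-nullable x/y: accepted.
    let xy = DType::Struct(
        vec![ordinate("x", Nullability::NonNullable), ordinate("y", Nullability::NonNullable)],
        Nullability::Nullable,
    );
    // A nullable z ordinate: rejected.
    let xyz_nullable_z = DType::Struct(
        vec![
            ordinate("x", Nullability::NonNullable),
            ordinate("y", Nullability::NonNullable),
            ordinate("z", Nullability::Nullable),
        ],
        Nullability::NonNullable,
    );
    println!("{:?}", validate_point_storage(&xy));
    println!("{:?}", validate_point_storage(&xyz_nullable_z));
}
```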
Oh, maybe you should write it as `Struct<x, y, ?z, ?m>` instead; I forgot that `?` also means possibly null, not just optional. Or you could use `Struct<x, y, {z}, {m}>`.
Co-authored-by: Joe Isaacs <joe.isaacs@live.co.uk>
Signed-off-by: Nemo Yu <83347615+HarukiMoriarty@users.noreply.github.com>
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Summary
This PR adds a native point type to `vortex-geo`. Points are by far the most common geometry in analytical datasets, and a columnar representation makes their coordinates directly accessible without parsing WKB. It also adds a scalar function: point-to-point distance with PostGIS `ST_Distance` semantics (planar/Euclidean, results in CRS units).
API Changes
Adds to `vortex-geo`, all registered through `vortex_geo::initialize`:

- `Point` (`vortex.geo.point`): a location stored as `Struct<x, y, z?, m?>` of non-nullable `f64`, where `z?` is an optional elevation and `m?` an optional measure.
- `Coordinate`: the internal value a point scalar unpacks to.
- `GeoDistance` (`vortex.geo.distance`): per-row distance between two equal-length point columns; either or both operands may be constant, in which case the query point is decoded once and broadcast.

Testing
Unit tests cover dtype validation for every GeoArrow dimension (and rejection of invalid storage), round-tripping a point column through scalar execution back to the original coordinates, WKT display for all four dimensions, and distance over all operand shapes: column-to-constant (either side), column-to-column, and constant-to-constant.
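The behaviours these tests exercise can be sketched in plain Rust. Everything here is hypothetical: this `Coordinate` layout, the `Operand` enum, and `geo_distance` are illustrative stand-ins rather than the PR's code, and the WKT text follows the ISO `POINT`/`POINT Z`/`POINT M`/`POINT ZM` forms, which may differ in detail from the real display. Distance is 2D Euclidean in CRS units, ignoring `z` and `m` as PostGIS `ST_Distance` does for geometries (an assumption about this PR). A constant operand holds one already-decoded point plus a logical length, so every row reads the same point without materializing copies.

```rust
use std::fmt;

/// Hypothetical point value: x/y always present, z and m optional.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Coordinate {
    x: f64,
    y: f64,
    z: Option<f64>,
    m: Option<f64>,
}

impl fmt::Display for Coordinate {
    /// ISO WKT forms: POINT (x y), POINT Z (x y z), POINT M (x y m), POINT ZM (x y z m).
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let (x, y) = (self.x, self.y);
        match (self.z, self.m) {
            (None, None) => write!(f, "POINT ({x} {y})"),
            (Some(z), None) => write!(f, "POINT Z ({x} {y} {z})"),
            (None, Some(m)) => write!(f, "POINT M ({x} {y} {m})"),
            (Some(z), Some(m)) => write!(f, "POINT ZM ({x} {y} {z} {m})"),
        }
    }
}

/// Planar (2D Euclidean) distance in CRS units; z and m are ignored.
fn planar_distance(a: &Coordinate, b: &Coordinate) -> f64 {
    (a.x - b.x).hypot(a.y - b.y)
}

/// One side of a hypothetical distance call.
enum Operand<'a> {
    /// One point per row.
    Column(&'a [Coordinate]),
    /// A single decoded point, logically repeated `len` times.
    Constant(Coordinate, usize),
}

impl Operand<'_> {
    fn len(&self) -> usize {
        match self {
            Operand::Column(points) => points.len(),
            Operand::Constant(_, len) => *len,
        }
    }

    /// The point at `row`; a constant returns its single point for every row.
    fn point(&self, row: usize) -> Coordinate {
        match self {
            Operand::Column(points) => points[row],
            Operand::Constant(point, _) => *point,
        }
    }
}

/// Per-row distance between two equal-length operands of any shape.
fn geo_distance(lhs: &Operand<'_>, rhs: &Operand<'_>) -> Result<Vec<f64>, String> {
    if lhs.len() != rhs.len() {
        return Err(format!("operand lengths differ: {} vs {}", lhs.len(), rhs.len()));
    }
    Ok((0..lhs.len())
        .map(|row| planar_distance(&lhs.point(row), &rhs.point(row)))
        .collect())
}

fn main() {
    let origin = Coordinate { x: 0.0, y: 0.0, z: None, m: None };
    let p = Coordinate { x: 3.0, y: 4.0, z: Some(12.0), m: None };
    let column = [origin, p];

    // Column-to-constant: the query point is broadcast over the column.
    let d = geo_distance(&Operand::Column(&column), &Operand::Constant(origin, 2)).unwrap();
    println!("{d:?}");
    println!("{p}");
}
```

Modelling a constant as one point plus a length keeps column-to-column, column-to-constant, and constant-to-constant on a single code path; a real kernel might instead return a constant result when both sides are constant.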
Supersedes #8342 (same change, moved from my fork to an in-repo branch).