feat(vortex-geo): native Point extension type and GeoDistance scalar function#8342
HarukiMoriarty wants to merge 5 commits into
Adds a GeoArrow-style `Point` extension type (Struct<x,y,[z],[m]>, dimension-ready) and the planar `GeoDistance` scalar function between two point columns.
Signed-off-by: Nemo Yu <zyu379@wisc.edu>
… point
GeoDistance computes the planar distance from each point in a column to a single constant query point (e.g. `ST_Distance(column, point)`). The second operand must be a constant: it is decoded once and broadcast over the column rather than materialized to one identical row per output element. Column-to-column distance is unsupported and errors. `try_new_array` now infers the output length from the point column instead of taking it as an explicit parameter.
Signed-off-by: Nemo Yu <zhenghong@spiraldb.com>
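The broadcast described in this commit can be illustrated with a minimal standalone sketch. It does not use the vortex-geo API: `Point2` and `distance_to_constant` are hypothetical names, and the point column is modelled as two plain `f64` slices rather than Vortex arrays. The query point is read once and reused for every row instead of being expanded into a column of identical rows.

```rust
/// Hypothetical stand-in for a decoded 2D query point; not the vortex-geo type.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point2 {
    x: f64,
    y: f64,
}

/// Planar (Euclidean) distance from every point in a column to one constant
/// query point. The column is given as parallel coordinate slices.
fn distance_to_constant(xs: &[f64], ys: &[f64], query: Point2) -> Vec<f64> {
    assert_eq!(xs.len(), ys.len(), "coordinate columns must have equal length");
    // The output length is inferred from the point column, and `query` is
    // captured once rather than materialized as one row per output element.
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| (x - query.x).hypot(y - query.y))
        .collect()
}
```

Calling `distance_to_constant` with a three-row column returns three distances in the units of the input coordinates, which is the planar behaviour the commit message describes.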
026ecbd to ec95875
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 20.2 µs | 35.2 µs | -42.51% |
| ❌ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 177.2 µs | 213.3 µs | -16.93% |
| ❌ | Simulation | varbinview_large | 113.1 µs | 131.7 µs | -14.13% |
| ❌ | Simulation | decompress_rd[f64, (100000, 0.0)] | 845.2 µs | 980 µs | -13.76% |
| ❌ | Simulation | chunked_varbinview_canonical_into[(100, 100)] | 274.4 µs | 309.3 µs | -11.27% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.0)] | 138 µs | 110.9 µs | +24.41% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.1)] | 137.8 µs | 110.9 µs | +24.24% |
| ⚡ | Simulation | decompress_rd[f64, (10000, 0.01)] | 137.4 µs | 110.6 µs | +24.23% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.1)] | 89.3 µs | 80.2 µs | +11.4% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.0)] | 89.6 µs | 80.8 µs | +10.93% |
| ⚡ | Simulation | decompress_rd[f32, (10000, 0.01)] | 89.3 µs | 80.7 µs | +10.66% |
Comparing HarukiMoriarty:nemo/geo-point (ec95875) with develop (3d7bbfb)
    let xs = storage
        .unmasked_field_by_name("x")?
To be sure, this should check that the struct is non-nullable.
at least a debug assert
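One way to read this suggestion is sketched below with stand-in types. `StructStorage`, its `nullable` flag, and `x_values` are hypothetical and are not the Vortex API. The sketch only shows the idea: reading a child field while ignoring struct-level validity is sound only if the struct itself cannot be null, and a `debug_assert!` records and checks that assumption in debug builds.

```rust
/// Hypothetical stand-in for struct storage; not a Vortex type.
struct StructStorage {
    /// Whether the struct itself may contain null rows.
    nullable: bool,
    /// The `x` child field, stored without struct-level validity applied.
    x: Vec<f64>,
}

/// Returns the `x` field without applying struct-level validity.
fn x_values(storage: &StructStorage) -> &[f64] {
    // Skipping the validity mask is only sound if no row can be null.
    debug_assert!(!storage.nullable, "point storage must be non-nullable");
    &storage.x
}
```

A `debug_assert!` is compiled out of release builds by default, so it adds no cost on the hot path; a hard check that returns an error would be the stricter alternative.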
    //! The coordinate fields, where `?` marks an optional field, are:
    //! - `x` — longitude or easting
    //! - `y` — latitude or northing
    //! - `z?` — elevation
    //! - `m?` — measure: an arbitrary per-point value such as distance along a route or a timestamp
why do we always prefer f64 vs f32?
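The optional fields in the doc comment under review give four possible layouts, matching the GeoArrow dimensions XY, XYZ, XYM, and XYZM. A minimal sketch of that mapping follows. The `Dimension` enum and the two functions are hypothetical names for illustration, not the identifiers used in this PR.

```rust
/// Hypothetical enum for the four coordinate layouts; not the PR's type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Dimension {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

/// Names of the struct fields for each layout, in storage order.
fn field_names(dim: Dimension) -> &'static [&'static str] {
    match dim {
        Dimension::Xy => &["x", "y"],
        Dimension::Xyz => &["x", "y", "z"],
        Dimension::Xym => &["x", "y", "m"],
        Dimension::Xyzm => &["x", "y", "z", "m"],
    }
}

/// Infers the layout from the field names, rejecting any other combination.
fn dimension_from_fields(names: &[&str]) -> Option<Dimension> {
    match names {
        ["x", "y"] => Some(Dimension::Xy),
        ["x", "y", "z"] => Some(Dimension::Xyz),
        ["x", "y", "m"] => Some(Dimension::Xym),
        ["x", "y", "z", "m"] => Some(Dimension::Xyzm),
        _ => None,
    }
}
```

Matching on the exact field list means storage with a missing `y`, reordered fields, or an unknown extra field is rejected rather than silently accepted.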
    // SPDX-License-Identifier: Apache-2.0
    // SPDX-FileCopyrightText: Copyright the Vortex contributors

    //! Coordinate building blocks for geometry extension types: the `Struct<x, y, z?, m?>` storage,
can you fill in the types here?
Summary
This PR adds a native point type to `vortex-geo`. Points are by far the most common geometry in analytical datasets, and a columnar representation makes their coordinates directly accessible without parsing WKB.
It also adds the scalar function: point-to-point distance with PostGIS `ST_Distance` semantics (planar/Euclidean, results in CRS units).
API Changes
Adds to `vortex-geo`, all registered through `vortex_geo::initialize`:
- `Point` (`vortex.geo.point`): a location stored as `Struct<x, y, z?, m?>` of non-nullable `f64`, where `z?` is an optional elevation and `m?` an optional measure.
- `Coordinate`: the internal value a point scalar unpacks to.
- `GeoDistance` (`vortex.geo.distance`): per-row distance between two equal-length point columns; either or both operands may be constant, in which case the query point is decoded once and broadcast.
Testing
Unit tests cover dtype validation for every GeoArrow dimension (and rejection of invalid storage), round-tripping a point column through scalar execution back to the original coordinates, WKT display for all four dimensions, and distance over all operand shapes: column-to-constant (either side), column-to-column, and constant-to-constant.
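As an illustration of the WKT display case, the sketch below formats a point in each of the four dimensions using the standard WKT tags (`POINT`, `POINT Z`, `POINT M`, `POINT ZM`). The `Coord` struct and `to_wkt` function are hypothetical stand-ins, and the exact spacing and number formatting produced by the PR's own display code are assumptions here.

```rust
/// Hypothetical decoded point value; `z` and `m` are optional.
/// This is a stand-in, not the `Coordinate` type added by the PR.
struct Coord {
    x: f64,
    y: f64,
    z: Option<f64>,
    m: Option<f64>,
}

/// Formats a point as WKT, choosing the dimension tag from the fields present.
fn to_wkt(c: &Coord) -> String {
    match (c.z, c.m) {
        (None, None) => format!("POINT ({} {})", c.x, c.y),
        (Some(z), None) => format!("POINT Z ({} {} {})", c.x, c.y, z),
        (None, Some(m)) => format!("POINT M ({} {} {})", c.x, c.y, m),
        (Some(z), Some(m)) => format!("POINT ZM ({} {} {} {})", c.x, c.y, z, m),
    }
}
```

The `M` tag matters for three-value points: without it, a reader of the WKT cannot tell whether the third value is an elevation or a measure.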