feat(vortex-geo): native Point extension type and GeoDistance scalar function #8342
HarukiMoriarty wants to merge 7 commits into
Adds a GeoArrow-style `Point` extension type (Struct<x,y,[z],[m]>, dimension-ready) and the planar `GeoDistance` scalar function between two point columns. Signed-off-by: Nemo Yu <zyu379@wisc.edu>
… point GeoDistance computes the planar distance from each point in a column to a single constant query point (e.g. `ST_Distance(column, point)`). The second operand must be a constant: it is decoded once and broadcast over the column rather than materialized to one identical row per output element. Column-to-column distance is unsupported and errors. `try_new_array` now infers the output length from the point column instead of taking it as an explicit parameter. Signed-off-by: Nemo Yu <zhenghong@spiraldb.com>
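As a rough illustration of the broadcast described in this commit message, here is a minimal sketch in plain Rust. The function name `distance_to_constant` and the slice-based signature are assumptions made for the example and are not the vortex-geo API; `xs` and `ys` stand in for the already-decoded coordinate columns.

```rust
/// Planar (Euclidean) distance from every point in a column to one constant
/// query point. Hypothetical helper: the name and the slice-based signature
/// are illustrative only, not the vortex-geo implementation.
pub fn distance_to_constant(xs: &[f64], ys: &[f64], query: (f64, f64)) -> Vec<f64> {
    assert_eq!(xs.len(), ys.len(), "x and y columns must have equal length");
    // The query point is unpacked once, outside the loop, and reused for
    // every row instead of being materialized as a column of identical rows.
    let (qx, qy) = query;
    // The output length follows from the point column; it is not a parameter.
    xs.iter()
        .zip(ys)
        .map(|(&x, &y)| (x - qx).hypot(y - qy))
        .collect()
}
```

The result is in the same units as the input coordinates, since no projection or spheroid correction is applied.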
026ecbd to ec95875
Merging this PR will not alter performance
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 161.8 µs | 198 µs | -18.25% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 304.7 ns | 275.6 ns | +10.58% |
Comparing HarukiMoriarty:nemo/geo-point (8f4d3a5) with develop (eda4dd0)
    let xs = storage
        .unmasked_field_by_name("x")?
Are you sure? This should check that the struct is non-nullable.
At least add a debug assert.
Maybe you want to add a `parse_storage` helper that produces a typed struct whose constructor validates the storage array? So instead of calling `xy_columns` on the storage array, you call `parse_storage(storage_array, &mut ctx)`, and that gives you a `ParsedCoordinate` struct that holds the primitive arrays you want.
Let me know if that makes sense or not! You can look at the turboquant code for inspiration (though note that we are going to delete that / I need to delete it soon).
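To make the suggestion concrete, here is a minimal sketch of the "validate once, then hand out typed columns" pattern in plain Rust. Everything below is hypothetical: `Field` is a stand-in for one struct field of the storage array, the execution context argument (`&mut ctx`) is omitted, and the real helper would operate on Vortex arrays rather than `Vec<f64>`.

```rust
/// Hypothetical stand-in for one field of the struct storage array.
pub struct Field {
    pub name: &'static str,
    pub nullable: bool,
    pub values: Vec<f64>,
}

/// Typed view handed out only after validation has succeeded.
#[derive(Debug)]
pub struct ParsedCoordinate<'a> {
    pub xs: &'a [f64],
    pub ys: &'a [f64],
}

/// Look up a required field and reject it if it is nullable.
fn required<'a>(fields: &'a [Field], name: &str) -> Result<&'a Field, String> {
    let field = fields
        .iter()
        .find(|f| f.name == name)
        .ok_or_else(|| format!("missing field `{name}`"))?;
    if field.nullable {
        return Err(format!("field `{name}` must be non-nullable"));
    }
    Ok(field)
}

/// Validate the storage up front so callers never see unchecked columns.
pub fn parse_storage(fields: &[Field]) -> Result<ParsedCoordinate<'_>, String> {
    let x = required(fields, "x")?;
    let y = required(fields, "y")?;
    if x.values.len() != y.values.len() {
        return Err("x and y must have the same length".to_string());
    }
    Ok(ParsedCoordinate { xs: &x.values, ys: &y.values })
}
```

With this shape, the non-nullability check asked for earlier in the thread lives in one place instead of at every call site.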
    //! The coordinate fields, where `?` marks an optional field, are:
    //! - `x` — longitude or easting
    //! - `y` — latitude or northing
    //! - `z?` — elevation
    //! - `m?` — measure: an arbitrary per-point value such as distance along a route or a timestamp
why do we always prefer f64 vs f32?
cause GeoArrow and WKB both fix coordinates as float64.
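For readers following the thread, the coordinate layout under discussion can be sketched as a plain Rust value with required `x`/`y` and optional `z`/`m`, all `f64`. The struct below is an illustrative assumption, not the PR's actual `Coordinate` definition; the `Display` impl follows the standard WKT point forms (`POINT`, `POINT Z`, `POINT M`, `POINT ZM`).

```rust
use std::fmt;

/// Hypothetical point value: `x` and `y` are required, `z` and `m` are
/// optional, and every coordinate is an f64 (as in GeoArrow and WKB).
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct Coordinate {
    pub x: f64,
    pub y: f64,
    pub z: Option<f64>,
    pub m: Option<f64>,
}

impl fmt::Display for Coordinate {
    /// Render the point as WKT, choosing the tag from which optional
    /// fields are present.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match (self.z, self.m) {
            (None, None) => write!(f, "POINT ({} {})", self.x, self.y),
            (Some(z), None) => write!(f, "POINT Z ({} {} {})", self.x, self.y, z),
            (None, Some(m)) => write!(f, "POINT M ({} {} {})", self.x, self.y, m),
            (Some(z), Some(m)) => {
                write!(f, "POINT ZM ({} {} {} {})", self.x, self.y, z, m)
            }
        }
    }
}
```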
…field types Signed-off-by: Nemo Yu <zyu379@wisc.edu>
…s on construction Signed-off-by: Nemo Yu <zyu379@wisc.edu>
Summary
This PR adds a native point type to `vortex-geo`. Points are by far the most common geometry in analytical datasets, and a columnar representation makes their coordinates directly accessible without parsing WKB.

It also adds the `GeoDistance` scalar function: point-to-point distance with PostGIS `ST_Distance` semantics (planar/Euclidean, results in CRS units).

API Changes
Adds to `vortex-geo`, all registered through `vortex_geo::initialize`:
- `Point` (`vortex.geo.point`): a location stored as `Struct<x, y, z?, m?>` of non-nullable `f64`, where `z?` is an optional elevation and `m?` an optional measure.
- `Coordinate`: the internal value a point scalar unpacks to.
- `GeoDistance` (`vortex.geo.distance`): per-row distance between two equal-length point columns; either or both operands may be constant, in which case the query point is decoded once and broadcast.

Testing
Unit tests cover dtype validation for every GeoArrow dimension (and rejection of invalid storage), round-tripping a point column through scalar execution back to the original coordinates, WKT display for all four dimensions, and distance over all operand shapes: column-to-constant (either side), column-to-column, and constant-to-constant.
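The operand shapes listed above can be pictured with a small standalone sketch. The `Operand` enum and `geo_distance` function here are hypothetical and written against plain slices, not the vortex-geo types. One simplification to note: a constant operand carries no length in this sketch, so constant-to-constant is assumed to yield a single value.

```rust
/// Hypothetical operand: a full column of points or one constant point.
pub enum Operand<'a> {
    Column { xs: &'a [f64], ys: &'a [f64] },
    Constant { x: f64, y: f64 },
}

impl Operand<'_> {
    /// Columns know their length; constants adapt to the other side.
    fn len(&self) -> Option<usize> {
        match self {
            Operand::Column { xs, .. } => Some(xs.len()),
            Operand::Constant { .. } => None,
        }
    }

    /// Row `i` of a column, or the same point for every row of a constant.
    fn get(&self, i: usize) -> (f64, f64) {
        match self {
            Operand::Column { xs, ys } => (xs[i], ys[i]),
            Operand::Constant { x, y } => (*x, *y),
        }
    }
}

/// Per-row planar distance over any combination of operand shapes.
pub fn geo_distance(lhs: &Operand<'_>, rhs: &Operand<'_>) -> Result<Vec<f64>, String> {
    let len = match (lhs.len(), rhs.len()) {
        (Some(a), Some(b)) if a != b => {
            return Err(format!("length mismatch: {a} vs {b}"));
        }
        (Some(a), _) | (_, Some(a)) => a,
        // Assumption of this sketch: two constants produce one row.
        (None, None) => 1,
    };
    Ok((0..len)
        .map(|i| {
            let (ax, ay) = lhs.get(i);
            let (bx, by) = rhs.get(i);
            (ax - bx).hypot(ay - by)
        })
        .collect())
}
```

Because a constant operand answers `get(i)` with the same point for every row, it is never expanded into a column of identical rows.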