feat(vortex-geo): native Geometry extension type with point support #8376
Draft
HarukiMoriarty wants to merge 1 commit into
Add a single logical extension type `vortex.geo.geometry` for GeoArrow-native geometry. The geometry kind (point, linestring, ...) and the CRS live in the extension metadata; the storage dtype is the kind's GeoArrow separated coordinate layout, and the coordinate dimension is recovered from the storage field names. Only point columns are supported end to end so far; other kinds are rejected at dtype construction until their scalar unpacking and kernels exist.

Add a `vortex.geo.distance` scalar function computing planar (Euclidean) distance. The signature takes geometry operands and execution dispatches on their kinds, with a point x point kernel; operands are type-checked at construction to be non-nullable point columns sharing a CRS.

Signed-off-by: Nemo Yu <zyu379@wisc.edu>
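As a rough illustration of the two mechanisms the commit message describes, the sketch below recovers the coordinate dimension from the field names of a GeoArrow separated (struct) coordinate layout and computes a planar point x point distance over separated `x`/`y` buffers. The names `Dimension`, `dimension_from_field_names`, and `planar_distance` are hypothetical and are not taken from `vortex-geo`; the use of plain `f64` slices instead of Vortex arrays, and the choice to ignore any `z`/`m` values in the distance, are assumptions made to keep the example self-contained.

```rust
/// Coordinate dimension of a geometry column (hypothetical enum for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Dimension {
    Xy,
    Xyz,
    Xym,
    Xyzm,
}

/// Recover the dimension from the storage struct's field names.
/// GeoArrow's separated layout names its coordinate fields x, y, z, m.
/// Anything else is treated as invalid storage and yields `None`.
pub fn dimension_from_field_names(names: &[&str]) -> Option<Dimension> {
    match names {
        ["x", "y"] => Some(Dimension::Xy),
        ["x", "y", "z"] => Some(Dimension::Xyz),
        ["x", "y", "m"] => Some(Dimension::Xym),
        ["x", "y", "z", "m"] => Some(Dimension::Xyzm),
        _ => None,
    }
}

/// Planar (Euclidean) distance between two point columns, row by row.
/// Assumption: only x and y contribute, and all buffers have equal length.
pub fn planar_distance(ax: &[f64], ay: &[f64], bx: &[f64], by: &[f64]) -> Vec<f64> {
    assert_eq!(ax.len(), ay.len());
    assert_eq!(ax.len(), bx.len());
    assert_eq!(ax.len(), by.len());
    ax.iter()
        .zip(ay)
        .zip(bx.iter().zip(by))
        .map(|((ax, ay), (bx, by))| (ax - bx).hypot(ay - by))
        .collect()
}
```

Because the layout is separated, the kernel reads each coordinate from its own contiguous buffer, which is why the sketch takes four slices rather than a slice of point structs.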
Merging this PR will improve performance by 26.14%
| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ⚡ | Simulation | chunked_bool_canonical_into[(1000, 10)] | 35.5 µs | 20.5 µs | +72.69% |
| ⚡ | Simulation | chunked_varbinview_canonical_into[(1000, 10)] | 198.1 µs | 162 µs | +22.32% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(1000, 10)] | 213.5 µs | 177.4 µs | +20.4% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] | 244.4 ns | 215.3 ns | +13.55% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] | 304.7 ns | 275.6 ns | +10.58% |
Comparing nemo/geo-geometry (ee1e166) with develop (8acef3a)
Summary
Alternative approach to #8372. That PR gives each geometry kind its own extension type (`vortex.geo.point`, …); this PR uses a single type `vortex.geo.geometry` for all kinds, with the kind and the CRS carried in the extension metadata.

Spatial functions have the signature `distance(geometry, geometry)` rather than one signature per kind: the operand check is one `ext.is::<Geometry>()`, kernels dispatch on kind at runtime, and a new kind is a metadata value plus a match arm, not a new type and new signatures. The kind lives in metadata because GeoArrow layouts collide (linestring ≡ multipoint, polygon ≡ multilinestring).

Like #8372, point is the first supported kind.
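The dispatch-on-kind design can be sketched roughly as follows. This is a simplified stand-in, not the PR's code: `GeometryKind` and `GeoMetadata` borrow names from the description, but their variants, the `Option<String>` CRS representation, the `DistanceError` type, and the free function `distance` over single `[f64; 2]` points are all assumptions for illustration.

```rust
/// Geometry kind carried in the extension metadata (variants assumed for this sketch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GeometryKind {
    Point,
    LineString,
    MultiPoint,
}

/// Simplified metadata; the CRS is modelled as an optional string (assumption).
#[derive(Debug, Clone, PartialEq)]
pub struct GeoMetadata {
    pub geometry_type: GeometryKind,
    pub crs: Option<String>,
}

/// Hypothetical error type for operand validation and dispatch.
#[derive(Debug, PartialEq)]
pub enum DistanceError {
    CrsMismatch,
    UnsupportedKinds(GeometryKind, GeometryKind),
}

/// One signature for every kind: validate the operands, then dispatch on the
/// kinds found in the metadata. Supporting a new pair means adding a match arm.
pub fn distance(
    lhs_meta: &GeoMetadata,
    lhs: [f64; 2],
    rhs_meta: &GeoMetadata,
    rhs: [f64; 2],
) -> Result<f64, DistanceError> {
    if lhs_meta.crs != rhs_meta.crs {
        return Err(DistanceError::CrsMismatch);
    }
    match (lhs_meta.geometry_type, rhs_meta.geometry_type) {
        // Point x point: planar Euclidean distance.
        (GeometryKind::Point, GeometryKind::Point) => {
            Ok((lhs[0] - rhs[0]).hypot(lhs[1] - rhs[1]))
        }
        // Every other pair is rejected until a kernel exists for it.
        (l, r) => Err(DistanceError::UnsupportedKinds(l, r)),
    }
}
```

Since linestring and multipoint share a storage layout, the `match` has to read the kind from the metadata; the storage type alone could not tell the two arms apart.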
API Changes
Adds to `vortex-geo`, all registered through `vortex_geo::initialize`:

- `Geometry` (`vortex.geo.geometry`). `GeoMetadata` gains a `geometry_type: GeometryKind` field next to `crs`.
- `GeometryValue`: the value a geometry scalar unpacks to; its `Display` emits WKT.
- `GeoDistance` (`vortex.geo.distance`): per-row distance with PostGIS `ST_Distance` semantics.

Testing
Unit tests cover dtype validation for every GeoArrow dimension (plus rejection of invalid storage, non-point kinds, and the `Unspecified` kind), metadata round-tripping with and without a kind, round-tripping a point column through scalar execution back to the original coordinates, WKT display for every kind including dimension tags and `EMPTY`, distance over all operand shapes (column-to-constant on either side, column-to-column, constant-to-constant), and construction-time rejection of non-geometry operands, mismatched CRS, and nullable columns.
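For readers unfamiliar with the WKT conventions those display tests exercise, here is a minimal sketch of a `Display` implementation that emits dimension tags and `EMPTY` for points. `PointValue` is a hypothetical type invented for this example and is not the PR's `GeometryValue`; the exact number formatting the PR uses is not known, so the sketch relies on Rust's default `f64` formatting.

```rust
use std::fmt;

/// Hypothetical point value for this sketch; not the PR's `GeometryValue`.
pub enum PointValue {
    Empty,
    Xy([f64; 2]),
    Xyz([f64; 3]),
    Xym([f64; 3]),
    Xyzm([f64; 4]),
}

impl fmt::Display for PointValue {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Pick the WKT dimension tag and the coordinates to print.
        let (tag, coords): (&str, &[f64]) = match self {
            PointValue::Empty => return write!(f, "POINT EMPTY"),
            PointValue::Xy(c) => ("", c.as_slice()),
            PointValue::Xyz(c) => (" Z", c.as_slice()),
            PointValue::Xym(c) => (" M", c.as_slice()),
            PointValue::Xyzm(c) => (" ZM", c.as_slice()),
        };
        // WKT separates the coordinates of one position with single spaces.
        let joined = coords
            .iter()
            .map(|c| c.to_string())
            .collect::<Vec<_>>()
            .join(" ");
        write!(f, "POINT{} ({})", tag, joined)
    }
}
```

With this sketch, `PointValue::Xyz([1.0, 2.0, 3.0]).to_string()` yields `POINT Z (1 2 3)`, and the `Xym` variant is distinguishable from `Xyz` only through its tag, which is why the tests check the tags explicitly.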