Skip to content

Making sd_format show geospatial values in arrays and structs properly#1

Merged
jiayuasu merged 6 commits intomainfrom
sd-format-for-all-types
Aug 29, 2025
Merged

Making sd_format show geospatial values in arrays and structs properly#1
jiayuasu merged 6 commits intomainfrom
sd-format-for-all-types

Conversation

@Kontinuation
Copy link
Member

This patch revamped the implementation of sd_format to work on any input types, and format the spatial columns recursively while leaving non-spatial columns untouched.

Before applying this patch:

> SELECT [ST_POINT(1,2), ST_POINT(3,4)];
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                         make_array(st_point(Int64(1),Int64(2)),st_point(Int64(3),Int64(4)))                                                                        │
│ list(field { name: "item", data_type: struct([field { name: "geoarrow.wkb", data_type: binary, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {"arrow:extension:name": "geoarrow.wkb", "arrow:exte… │
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ [{geoarrow.wkb: 0101000000000000000000f03f0000000000000040}, {geoarrow.wkb: 010100000000000000000008400000000000001040}]                                                                                           │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

> SELECT struct(ST_POINT(3,4) as a , ST_POINT(3,4) as b)
┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                    named_struct(Utf8("a"),st_point(Int64(3),Int64(4)),Utf8("b"),st_point(Int64(3),Int64(4)))                   │
│                              struct(a struct(geoarrow.wkb binary), b struct(geoarrow.wkb binary))                              │
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ {a: {geoarrow.wkb: 010100000000000000000008400000000000001040}, b: {geoarrow.wkb: 010100000000000000000008400000000000001040}} │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

After applying this patch:

> SELECT [ST_POINT(1,2), ST_POINT(3,4)];
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                        make_array(st_point(Int64(1),Int64(2)),st_point(Int64(3),Int64(4)))                                                                        │
│ list(field { name: "item", data_type: struct([field { name: "geoarrow.wkb", data_type: binary, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {"arrow:extension:name": "geoarrow.wkb", "arrow:ext… │
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
│ [POINT(1 2), POINT(3 4)]                                                                                                                                                                                          │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

> SELECT struct(ST_POINT(3,4) as a , ST_POINT(3,4) as b);
┌───────────────────────────────────────────────────────────────────────────────────────────┐
│ named_struct(Utf8("a"),st_point(Int64(3),Int64(4)),Utf8("b"),st_point(Int64(3),Int64(4))) │
│            struct(a struct(geoarrow.wkb binary), b struct(geoarrow.wkb binary))           │
╞═══════════════════════════════════════════════════════════════════════════════════════════╡
│ {a: POINT(3 4), b: POINT(3 4)}                                                            │
└───────────────────────────────────────────────────────────────────────────────────────────┘

@Kontinuation Kontinuation force-pushed the sd-format-for-all-types branch from a9f7268 to 0c1561d Compare August 29, 2025 11:25
@Kontinuation Kontinuation requested a review from Copilot August 29, 2025 12:50
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR revamps the sd_format function to properly handle geospatial values within complex nested data structures (arrays and structs) by recursively formatting spatial columns while leaving non-spatial columns unchanged.

  • Refactored implementation to use a single kernel that handles all data types rather than separate kernels for default and geometry types
  • Added recursive formatting support for nested structures (structs, lists, and list views)
  • Enhanced type system to properly transform geospatial types to UTF8 while preserving other data types

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
rust/sedona-functions/src/sd_format.rs Complete rewrite of sd_format implementation with recursive type handling and comprehensive test coverage
rust/sedona-expr/src/scalar_udf.rs Added is_any() matcher to support accepting any argument type

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Kontinuation Kontinuation force-pushed the sd-format-for-all-types branch from 8d361c8 to 82990fb Compare August 29, 2025 13:08
@Kontinuation Kontinuation marked this pull request as ready for review August 29, 2025 13:09
Copy link
Member

@paleolimbot paleolimbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

Comment on lines +963 to +977
/// Helper function to verify that actual WKT values match expected values,
/// handling the normalization of comma spacing in WKT output
fn assert_wkt_values_match(actual_array: &StringArray, expected_values: &[Option<&str>]) {
for (i, expected) in expected_values.iter().enumerate() {
match expected {
Some(expected_value) => {
let actual_value = actual_array.value(i);
// Note: WKT output may not have spaces after commas
let normalized_expected = expected_value.replace(", ", ",");
assert_eq!(actual_value, normalized_expected);
}
None => assert!(actual_array.is_null(i)),
}
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there cases where we're generating WKT whose result we can't guess in advance?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. sd_format works with concrete ColumnarValues, so we'll always know the expected result in advance. We normalize the WKTs before comparison just for being robust.

@jiayuasu jiayuasu merged commit 54ce8ae into main Aug 29, 2025
3 checks passed
@Kontinuation Kontinuation deleted the sd-format-for-all-types branch September 2, 2025 01:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants