chore(rust): Remove the wrap/unwrap workaround by paleolimbot · Pull Request #12 · apache/sedona-db

paleolimbot · 2025-09-02T21:33:20Z

Now that DataFusion propagates field metadata through more types of expressions, we can remove the wrap/unwrap workaround! We now have plenty of integration tests (+ SedonaBench) such that we should be able to detect any regressions caused by DataFusion internals that haven't considered metadata yet.

Broadly, the changes are:

Remove all of the functions that wrapped schemas containing extension types or unwrapped them. All schemas and record batches now have the same representation outside the engine and inside the engine.
Removed SedonaType::from_data_type(). Previously a DataType could unambiguously be a geometry type or an Arrow type, but now a DataType is ambiguous (could be an Arrow type or the storage type of a geometry). Most uses of this were changed to SedonaType::from_storage_field(), where the extension metadata is available to let us know if it is an extension type or not.
Removed TryFrom<DataType> for SedonaType: we had a lot of code that looked like DataType::Boolean.try_into().unwrap(). This can now be DataType::Boolean.into() or SedonaType::Arrow(DataType::Boolean). I changed all internal usage to SedaonType::Arrow(DataType::Boolean) because it is more explicit and I was paranoid about dropping extension metadata by accident while doing this change.
Some tests weren't using the ScalarUdfTester and had to be rewritten to use it. It is no longer trivial to call a scalar function and the tester is pretty much required. These are the most verbose of the changes.
I removed SedonaType::data_type() because it is ambiguous. Where we want the underlying storage type we can use .storage_type() but mostly we want to_storage_field() because it doesn't drop metadata.

Three outstanding issues are:

Right/Left Anti/Semi joins drop metadata for some reason and now error where they previously worked. It is possibly a DataFusion bug but I would like to track this down independently of this PR.
Aggregate functions via FFI no longer work. We weren't using them yet and there's a PR to fix this upstream already (I had filed an issue).
Failures for assertions that used assert_scalar_equal() from sedona_testing are now awful (they show WKB instead of a WKT diff). This is because ScalarValue and ArrayRef no longer can be uniquely identified as "geometry" because they require field metadata to provide this context. This is solvable but I'd like to do it in a separate PR.

paleolimbot · 2025-09-04T06:43:05Z

@Kontinuation I'm getting close here but I have the following test failures:

---- exec::tests::test_left_joins::join_type_2_JoinType__LeftSemi stdout ----
Error: Context("type_coercion", NotImplemented("st_intersects([Arrow(Binary), Wkb(Planar, None)]): No kernel matching arguments"))

---- exec::tests::test_left_joins::join_type_3_JoinType__LeftAnti stdout ----
Error: Context("type_coercion", NotImplemented("st_intersects([Arrow(Binary), Wkb(Planar, None)]): No kernel matching arguments"))

---- exec::tests::test_right_joins::join_type_2_JoinType__RightSemi stdout ----
Error: Context("type_coercion", NotImplemented("st_intersects([Wkb(Planar, None), Arrow(Binary)]): No kernel matching arguments"))

---- exec::tests::test_right_joins::join_type_3_JoinType__RightAnti stdout ----
Error: Context("type_coercion", NotImplemented("st_intersects([Wkb(Planar, None), Arrow(Binary)]): No kernel matching arguments"))


failures:
    exec::tests::test_left_joins::join_type_2_JoinType__LeftSemi
    exec::tests::test_left_joins::join_type_3_JoinType__LeftAnti
    exec::tests::test_right_joins::join_type_2_JoinType__RightSemi
    exec::tests::test_right_joins::join_type_3_JoinType__RightAnti

I'll look more closely tomorrow, but is there anything you can think of quickly in our code where we need to do some kind of wrapping or unwrapping specifically for semi and/or anti joins? (It's possible this is a bug in DataFusion, too).

paleolimbot · 2025-09-04T14:48:53Z

rust/sedona-spatial-join/src/exec.rs

    #[tokio::test]
    async fn test_left_joins(
-        #[values(JoinType::Left, JoinType::LeftSemi, JoinType::LeftAnti)] join_type: JoinType,
+        #[values(JoinType::Left, /* JoinType::LeftSemi, JoinType::LeftAnti */)] join_type: JoinType,


A reminder to myself to circle back to this line. These two tests are failing and I'm not sure why yet.

paleolimbot · 2025-09-04T14:49:06Z

rust/sedona-spatial-join/src/exec.rs

    #[tokio::test]
    async fn test_right_joins(
-        #[values(JoinType::Right, JoinType::RightSemi, JoinType::RightAnti)] join_type: JoinType,
+        #[values(JoinType::Right, /* JoinType::RightSemi, JoinType::RightAnti */)]


Also this line

Kontinuation · 2025-09-04T14:56:22Z

I'll look more closely tomorrow, but is there anything you can think of quickly in our code where we need to do some kind of wrapping or unwrapping specifically for semi and/or anti joins? (It's possible this is a bug in DataFusion, too).

There's no special treatment for semi/anti joins. Outer/semi/anti joins are handled uniformly by utils::adjust_indices_by_join_type. The semi/anti join failures are very likely caused by a datafusion bug.

paleolimbot · 2025-09-04T15:05:17Z

Thank you! I'll take a look.

paleolimbot · 2025-09-04T16:10:41Z

rust/sedona-testing/src/compare.rs

-    #[test]
-    #[should_panic(expected = "actual ScalarValue != expected ScalarValue:
-actual ScalarValue has type Wkb(Spherical, None), expected ScalarValue has type Wkb(Planar, None)")]
-    fn value_scalar_not_equal() {
-        assert_value_equal(
-            &create_scalar_value(None, &WKB_GEOGRAPHY),
-            &create_scalar_value(None, &WKB_GEOMETRY),
-        );
-    }
-
-    #[test]
-    #[should_panic(expected = "actual Array != expected Array:
-actual Array has type Wkb(Spherical, None), expected Array has type Wkb(Planar, None)")]
-    fn value_array_not_equal() {
-        assert_value_equal(
-            &create_array_value(&[], &WKB_GEOGRAPHY),
-            &create_array_value(&[], &WKB_GEOMETRY),
-        );
-    }
-


This is the other place we need to circle back to. The assert_value_equal()/assert_array_equal()/assert_scalar_equal() functions used to give nice diffs for geometry arrays, but now they can't detect that the geometry arrays are geometry. We probably need create_array() to return a new struct ArrayWithMetadata that works in ScalarUdfTester::invoke_xxx(). create_scalar() should return a Literal, which can hold extra metadata already.

paleolimbot added 30 commits September 2, 2025 16:10

flag one function that highlights code that needs to change

4356dbc

flag scalar methods

31f8579

mark the array unwrapper

10f54e8

mark wrapper

0a568e3

mark array wrapper

760c948

mark the arg wrapper

5d79983

mark the schema wrapper

0ed9bac

mark the unwrapper

a588308

mark batch wrappers

d4e5470

format

028ba2d

scalar udf tests passing

2f7e477

create and compare tests

b542c12

fix the st_point tests

c151854

undo wrap/unwrap in the aggregate udf

f69e7b3

fix aggregators

dd574f2

fix xyzm

8359fe0

fix setsrid test

c82a456

woo sedona functions passes!

8aa547d

fix intersection aggregator

95be2b8

fix union aggregator

29f7e17

fix sedona-testing tests

720e930

passing tests

41c656b

geoparquet tests

d3a21bf

start on spatial join

a295e70

fix geoarrow-c tests

5de066b

fix sedona-proj tests

cb72659

fix tg tests

1487e10

remove problematic converters

9ef00f0

fix one spatial join test

d138e17

remove unused module

798194a

paleolimbot added 6 commits September 4, 2025 00:29

ffi and show tests

7c653f9

sedona tests passing

4699e97

fix expr tests

448f395

python tests

5770ea0

fmt

cbff4bd

Merge branch 'main' into remove-wrap-unwrap

45f6a7c

paleolimbot added 3 commits September 4, 2025 09:40

fix point zm tests

6510c96

fix one more test and ensure they all pass

85f20d7

clippy

22a9911

paleolimbot commented Sep 4, 2025

View reviewed changes

remove unused projectors

e51bd8d

remove more code

01151d4

paleolimbot added 5 commits September 4, 2025 10:32

remove unnecessary uses of try_into() sedona type

0b68d5a

consolidate use of try_into.unwrap() for sedona types

a9dba5b

a few more problems

02ce35e

a few more

4a10fd1

fix doctest

a0de0f8

paleolimbot commented Sep 4, 2025

View reviewed changes

paleolimbot marked this pull request as ready for review September 4, 2025 16:29

jiayuasu requested a review from Kontinuation September 4, 2025 20:46

jiayuasu merged commit 4b74b56 into apache:main Sep 4, 2025
5 checks passed

paleolimbot deleted the remove-wrap-unwrap branch September 10, 2025 01:58

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

chore(rust): Remove the wrap/unwrap workaround#12

chore(rust): Remove the wrap/unwrap workaround#12
jiayuasu merged 46 commits intoapache:mainfrom
paleolimbot:remove-wrap-unwrap

paleolimbot commented Sep 2, 2025 •

edited

Loading

Uh oh!

paleolimbot commented Sep 4, 2025

Uh oh!

paleolimbot Sep 4, 2025

Uh oh!

paleolimbot Sep 4, 2025

Uh oh!

Kontinuation commented Sep 4, 2025

Uh oh!

paleolimbot commented Sep 4, 2025

Uh oh!

paleolimbot Sep 4, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

paleolimbot commented Sep 2, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

paleolimbot commented Sep 4, 2025

Uh oh!

paleolimbot Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

paleolimbot Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

Kontinuation commented Sep 4, 2025

Uh oh!

paleolimbot commented Sep 4, 2025

Uh oh!

paleolimbot Sep 4, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

paleolimbot commented Sep 2, 2025 •

edited

Loading