Description
Describe the bug
I found this while testing arrow with DataFusion:
- WIP: Upgrade DataFusion to arrow-rs/parquet 57.2.0 datafusion#19355
- Related to Release arrow-rs / parquet Minor version 57.2.0 (December 2025) #8465
There are several queries like this in the DataFusion tests that work with 57.1.0 but do not
work with main (what will be 57.2.0):
CREATE TABLE struct_values (
s1 struct<INT>,
s2 struct<a INT,b VARCHAR>
) AS VALUES
(struct(1), struct(1, 'string1')),
(struct(2), struct(2, 'string2')),
(struct(3), struct(3, 'string3'))
;
They fail with this error:
DataFusion error: Execution error: type mismatch and can't cast to got Struct("c0": Int64, "c1": Utf8) and Struct("a": Int32, "b": Utf8View)
Expected behavior
The tests should pass as they did previously
Additional context
The change came in via 7e637a7 / #8871 from @brancz
The tests pass when I revert this commit:
git revert 7e637a7559837b5a0171b23469e9652f2f83364b
I believe what is going on is that the struct function creates types with placeholder field names (c0, c1, etc.). You can see it like this:
> select arrow_typeof(struct(1));
+--------------------------------+
| arrow_typeof(struct(Int64(1))) |
+--------------------------------+
| Struct("c0": Int64) |
+--------------------------------+
1 row(s) fetched.
Elapsed 0.029 seconds.
> select arrow_typeof(struct(1, 'string1'));
+------------------------------------------------+
| arrow_typeof(struct(Int64(1),Utf8("string1"))) |
+------------------------------------------------+
| Struct("c0": Int64, "c1": Utf8) |
+------------------------------------------------+
In prior versions of arrow, the fields could be matched by position, but now
they are matched by name, so the above queries fail.
A suggested solution would be to try to match the fields by name first and, if that fails, fall back to matching them by position.
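To make the suggestion concrete, here is a minimal sketch of name-first matching with a positional fallback. It is not the arrow-rs cast code: `Field`, `field`, and `match_fields` are hypothetical, simplified stand-ins (a field is just a name plus a type label), and the rule "fall back to position only when the field counts are equal" is an assumption of this sketch.

```rust
/// Hypothetical, simplified stand-in for a struct field: a name plus a type label.
/// (Not the arrow-rs `Field` type.)
#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    data_type: String,
}

/// Hypothetical helper to build a `Field` tersely.
fn field(name: &str, data_type: &str) -> Field {
    Field {
        name: name.to_string(),
        data_type: data_type.to_string(),
    }
}

/// For each target field, return the index of the source field to cast from.
///
/// Pass 1: match every target field to a source field with the same name.
/// Pass 2: if any target name is missing from the source, fall back to
/// matching by position, but only when both structs have the same number
/// of fields (an assumption of this sketch).
fn match_fields(source: &[Field], target: &[Field]) -> Option<Vec<usize>> {
    // Collecting into Option<Vec<_>> yields None as soon as one name is not found.
    let by_name: Option<Vec<usize>> = target
        .iter()
        .map(|t| source.iter().position(|s| s.name == t.name))
        .collect();
    if let Some(indices) = by_name {
        return Some(indices);
    }

    // Positional fallback: the i-th source field feeds the i-th target field.
    if source.len() == target.len() {
        Some((0..target.len()).collect())
    } else {
        None
    }
}
```

With this approach, the case from the error above, `Struct("c0": Int64, "c1": Utf8)` into `Struct("a": Int32, "b": Utf8View)`, would find no name matches and fall back to position, while structs whose names do match but appear in a different order would still be matched by name. Whether each matched pair of types can actually be cast is a separate check that this sketch leaves out.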