-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Description
Current behaviour is per this issue comment: #8181 (comment)
for
array_union:array_union([], []) = [] array_union([], null) = [] array_union(null, []) = [] array_union(null, null) = nullfor
array_except:array_except([], []) = [] array_except([], null) = [] array_except(null, []) = null array_except(null, null) = nullfor
array_intersect:array_intersect([], []) = [] array_intersect([], null) = [] array_intersect(null, []) = [] array_intersect(null, null) = null
I find this confusing as usually with other functions we return null if either input is null. And recent PR:
Now implements that for array inputs (for union and intersect) we return null in a row if either input list for the row is null (scalar null inputs still correspond to above behaviour as that was a separate code path).
Spark behaviour
Consistently return null if either input is null:
>>> spark.sql("select array_except(cast(null as array<int>), array(1))").show()
+----------------------------+
|array_except(NULL, array(1))|
+----------------------------+
| NULL|
+----------------------------+
>>> spark.sql("select array_except(array(1), cast(null as array<int>))").show()
+----------------------------+
|array_except(array(1), NULL)|
+----------------------------+
| NULL|
+----------------------------+
>>> spark.sql("select array_union(cast(null as array<int>), array(1))").show()
+---------------------------+
|array_union(NULL, array(1))|
+---------------------------+
| NULL|
+---------------------------+
>>> spark.sql("select array_union(array(1), cast(null as array<int>))").show()
+---------------------------+
|array_union(array(1), NULL)|
+---------------------------+
| NULL|
+---------------------------+
>>> spark.sql("select array_intersect(cast(null as array<int>), array(1))").show()
+-------------------------------+
|array_intersect(NULL, array(1))|
+-------------------------------+
| NULL|
+-------------------------------+
>>> spark.sql("select array_intersect(array(1), cast(null as array<int>))").show()
+-------------------------------+
|array_intersect(array(1), NULL)|
+-------------------------------+
| NULL|
+-------------------------------+Clickhouse behaviour
Consistently return null if either input is null:
:) select arrayExcept(null, [1]);
SELECT arrayExcept(NULL, [1])
Query id: ca4fab63-30c0-4e45-8358-b9d23e495357
┌─arrayExcept(NULL, [1])─┐
1. │ ᴺᵁᴸᴸ │
└────────────────────────┘
1 row in set. Elapsed: 0.002 sec.
:) select arrayExcept([1], null);
SELECT arrayExcept([1], NULL)
Query id: 4770abf8-0de1-47b7-889c-0bcb400a232f
┌─arrayExcept([1], NULL)─┐
1. │ ᴺᵁᴸᴸ │
└────────────────────────┘
1 row in set. Elapsed: 0.001 sec.
:) select arrayUnion(null, [1]);
SELECT arrayUnion(NULL, [1])
Query id: e8c75415-2e0c-4a65-a7be-69dbd81f4546
┌─arrayUnion(NULL, [1])─┐
1. │ ᴺᵁᴸᴸ │
└───────────────────────┘
1 row in set. Elapsed: 0.001 sec.
:) select arrayUnion([1], null);
SELECT arrayUnion([1], NULL)
Query id: 21aabafd-8995-4a42-b7b9-4eb7e66746fd
┌─arrayUnion([1], NULL)─┐
1. │ ᴺᵁᴸᴸ │
└───────────────────────┘
1 row in set. Elapsed: 0.001 sec.
:) select arrayIntersect(null, [1]);
SELECT arrayIntersect(NULL, [1])
Query id: 3c161ed4-a244-4793-957f-925cdf6e5e0e
┌─arrayIntersect(NULL, [1])─┐
1. │ ᴺᵁᴸᴸ │
└───────────────────────────┘
1 row in set. Elapsed: 0.002 sec.
:) select arrayIntersect([1], null);
SELECT arrayIntersect([1], NULL)
Query id: 9e486d9a-6129-40ea-947f-e8f63e7957e4
┌─arrayIntersect([1], NULL)─┐
1. │ ᴺᵁᴸᴸ │
└───────────────────────────┘
1 row in set. Elapsed: 0.002 sec.DuckDB behaviour
Only supports array_intersect, and nullability depends on first argument:
D select array_intersect(null, [1]);
┌───────────────────────────────────────────┐
│ array_intersect(NULL, main.list_value(1)) │
│ int32 │
├───────────────────────────────────────────┤
│ NULL │
└───────────────────────────────────────────┘
D select array_intersect([1], null);
┌───────────────────────────────────────────┐
│ array_intersect(main.list_value(1), NULL) │
│ int32[] │
├───────────────────────────────────────────┤
│ [] │
└───────────────────────────────────────────┘I believe with Spark & Clickhouse always returning null, and considering for most other functions we would return null, I think we should change the array union/intersect/except functions to return null if either input is null, whether that input is via a scalar null (with datatype null) or array input where the row being processed has a null value.