Skip to content

array setops (union, intersect, except) should return null if either input is null #19682

@Jefffrey

Description

@Jefffrey

Current behaviour is per this issue comment: #8181 (comment)

for array_union:

array_union([], []) = []
array_union([], null) = []
array_union(null, []) = []
array_union(null, null) = null

for array_except:

array_except([], []) = []
array_except([], null) = []
array_except(null, []) = null
array_except(null, null) = null

for array_intersect:

array_intersect([], []) = []
array_intersect([], null) = []
array_intersect(null, []) = []
array_intersect(null, null) = null

I find this confusing as usually with other functions we return null if either input is null. And recent PR:

Now implements that for array inputs (for union and intersect) we return null in a row if either input list for the row is null (scalar null inputs still correspond to above behaviour as that was a separate code path).

Spark behaviour

Consistently return null if either input is null:

>>> spark.sql("select array_except(cast(null as array<int>), array(1))").show()
+----------------------------+
|array_except(NULL, array(1))|
+----------------------------+
|                        NULL|
+----------------------------+

>>> spark.sql("select array_except(array(1), cast(null as array<int>))").show()
+----------------------------+
|array_except(array(1), NULL)|
+----------------------------+
|                        NULL|
+----------------------------+

>>> spark.sql("select array_union(cast(null as array<int>), array(1))").show()
+---------------------------+
|array_union(NULL, array(1))|
+---------------------------+
|                       NULL|
+---------------------------+

>>> spark.sql("select array_union(array(1), cast(null as array<int>))").show()
+---------------------------+
|array_union(array(1), NULL)|
+---------------------------+
|                       NULL|
+---------------------------+

>>> spark.sql("select array_intersect(cast(null as array<int>), array(1))").show()
+-------------------------------+
|array_intersect(NULL, array(1))|
+-------------------------------+
|                           NULL|
+-------------------------------+

>>> spark.sql("select array_intersect(array(1), cast(null as array<int>))").show()
+-------------------------------+
|array_intersect(array(1), NULL)|
+-------------------------------+
|                           NULL|
+-------------------------------+

Clickhouse behaviour

Consistently return null if either input is null:

:) select arrayExcept(null, [1]);

SELECT arrayExcept(NULL, [1])

Query id: ca4fab63-30c0-4e45-8358-b9d23e495357

   ┌─arrayExcept(NULL, [1])─┐
1. │ ᴺᵁᴸᴸ                   │
   └────────────────────────┘

1 row in set. Elapsed: 0.002 sec.

:) select arrayExcept([1], null);

SELECT arrayExcept([1], NULL)

Query id: 4770abf8-0de1-47b7-889c-0bcb400a232f

   ┌─arrayExcept([1], NULL)─┐
1. │ ᴺᵁᴸᴸ                   │
   └────────────────────────┘

1 row in set. Elapsed: 0.001 sec.

:) select arrayUnion(null, [1]);

SELECT arrayUnion(NULL, [1])

Query id: e8c75415-2e0c-4a65-a7be-69dbd81f4546

   ┌─arrayUnion(NULL, [1])─┐
1. │ ᴺᵁᴸᴸ                  │
   └───────────────────────┘

1 row in set. Elapsed: 0.001 sec.

:) select arrayUnion([1], null);

SELECT arrayUnion([1], NULL)

Query id: 21aabafd-8995-4a42-b7b9-4eb7e66746fd

   ┌─arrayUnion([1], NULL)─┐
1. │ ᴺᵁᴸᴸ                  │
   └───────────────────────┘

1 row in set. Elapsed: 0.001 sec.

:) select arrayIntersect(null, [1]);

SELECT arrayIntersect(NULL, [1])

Query id: 3c161ed4-a244-4793-957f-925cdf6e5e0e

   ┌─arrayIntersect(NULL, [1])─┐
1. │ ᴺᵁᴸᴸ                      │
   └───────────────────────────┘

1 row in set. Elapsed: 0.002 sec.

:) select arrayIntersect([1], null);

SELECT arrayIntersect([1], NULL)

Query id: 9e486d9a-6129-40ea-947f-e8f63e7957e4

   ┌─arrayIntersect([1], NULL)─┐
1. │ ᴺᵁᴸᴸ                      │
   └───────────────────────────┘

1 row in set. Elapsed: 0.002 sec.

DuckDB behaviour

Only supports array_intersect, and nullability depends on first argument:

D select array_intersect(null, [1]);
┌───────────────────────────────────────────┐
│ array_intersect(NULL, main.list_value(1)) │
│                   int32                   │
├───────────────────────────────────────────┤
│                   NULL                    │
└───────────────────────────────────────────┘
D select array_intersect([1], null);
┌───────────────────────────────────────────┐
│ array_intersect(main.list_value(1), NULL) │
│                  int32[]                  │
├───────────────────────────────────────────┤
│ []                                        │
└───────────────────────────────────────────┘

I believe with Spark & Clickhouse always returning null, and considering for most other functions we would return null, I think we should change the array union/intersect/except functions to return null if either input is null, whether that input is via a scalar null (with datatype null) or array input where the row being processed has a null value.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions