[SPARK-54918][SQL] Normalize -0.0 to 0.0 in array operations #53695

asugranyes · 2026-01-06T15:12:40Z

What changes were proposed in this pull request?

Add normalization of -0.0 to 0.0 in hash-based array operations: array_distinct, array_union, array_intersect, and array_except.

Changes:

Add normalizeZero() and normalizeZeroCode() to SQLOpenHashSet for interpreted and codegen paths
Apply normalization in all four array operations before hashing
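
For readers unfamiliar with the technique, a minimal sketch of what "normalize before hashing" means follows. The object and method names here (ZeroNormalizationSketch, normalizeDouble, normalizeFloat, distinctNormalized) are hypothetical and are not the normalizeZero() / normalizeZeroCode() code added to SQLOpenHashSet. A java.util.LinkedHashSet of boxed doubles stands in for a hash set keyed on bit patterns.

```scala
import java.util.{LinkedHashSet => JLinkedHashSet}

// Illustrative only: hypothetical names, not the PR's implementation.
object ZeroNormalizationSketch {
  // Primitive == treats -0.0 and 0.0 as equal, so both zeros take the first
  // branch and come out as positive zero. NaN and every other value pass
  // through unchanged.
  def normalizeDouble(d: Double): Double = if (d == 0.0d) 0.0d else d

  def normalizeFloat(f: Float): Float = if (f == 0.0f) 0.0f else f

  // Order-preserving dedup. java.lang.Double.equals/hashCode compare bit
  // patterns, so without the normalization step both zeros would be kept.
  def distinctNormalized(values: Seq[Double]): Seq[Double] = {
    val seen = new JLinkedHashSet[java.lang.Double]()
    values.foreach(v => seen.add(java.lang.Double.valueOf(normalizeDouble(v))))
    val out = Seq.newBuilder[Double]
    val it = seen.iterator()
    while (it.hasNext) out += it.next().doubleValue()
    out.result()
  }
}
```

With this sketch, ZeroNormalizationSketch.distinctNormalized(Seq(0.0, -0.0, 1.0)) yields two elements, 0.0 and 1.0, which matches the array_distinct result this PR aims for.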

Why are the changes needed?

IEEE 754 comparison treats -0.0 and 0.0 as equal, but the two values have different binary representations and therefore different hash codes. As a result, the hash-based array operations treat them as distinct elements when an array contains both:

// Returns [0.0, -0.0, 1.0] instead of [0.0, 1.0]
Seq(Array(0.0, -0.0, 1.0)).toDF("values").selectExpr("array_distinct(values)").show()

// Returns [0.0, -0.0] instead of [0.0]
Seq((Array(0.0), Array(-0.0))).toDF("a", "b").selectExpr("array_union(a, b)").show()

// Returns [] instead of [0.0]
Seq((Array(0.0, 1.0), Array(-0.0, 2.0))).toDF("a", "b").selectExpr("array_intersect(a, b)").show()

// Returns [0.0, 1.0] instead of [1.0]
Seq((Array(0.0, 1.0), Array(-0.0))).toDF("a", "b").selectExpr("array_except(a, b)").show()

Spark already normalizes -0.0 to 0.0 in join keys, window partition keys, and aggregate grouping keys via NormalizeFloatingNumbers. This fix makes array operations consistent.
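
The underlying mismatch can be reproduced outside Spark with plain JVM calls. The sketch below is illustrative only: SignedZeroDemo and bitSensitiveDistinctCount are hypothetical names, and java.util.HashSet is used as a stand-in for any hash set keyed on bit patterns, not as a description of SQLOpenHashSet internals.

```scala
import java.lang.{Double => JDouble}
import java.util.{HashSet => JHashSet}

// Illustrative only: hypothetical names, plain JVM behaviour.
object SignedZeroDemo {
  // Counts distinct values using java.lang.Double's bit-based equals/hashCode.
  def bitSensitiveDistinctCount(values: Seq[Double]): Int = {
    val set = new JHashSet[JDouble]()
    values.foreach(v => set.add(JDouble.valueOf(v)))
    set.size()
  }

  def main(args: Array[String]): Unit = {
    // Primitive comparison follows IEEE 754: the two zeros are equal.
    println(0.0 == -0.0) // true

    // The bit patterns differ in the sign bit, so bit-derived hashes differ.
    println(JDouble.doubleToLongBits(0.0) == JDouble.doubleToLongBits(-0.0)) // false
    println(JDouble.hashCode(0.0) == JDouble.hashCode(-0.0)) // false

    // A bit-sensitive hash set therefore keeps both zeros.
    println(bitSensitiveDistinctCount(Seq(0.0, -0.0))) // 2
  }
}
```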

Does this PR introduce any user-facing change?

Yes. Array operations now correctly treat -0.0 and 0.0 as equal, consistent with SQL semantics and IEEE 754.

How was this patch tested?

Added unit tests in CollectionExpressionsSuite for all four operations with Double and Float types
All existing tests pass

Was this patch authored or co-authored using generative AI tooling?

No

github-actions · 2026-01-06T15:12:51Z

JIRA Issue Information

=== Bug SPARK-54918 ===
Summary: Array operations do not normalize -0.0 to 0.0
Assignee: None
Status: Open
Affected: ["4.2.0"]

This comment was automatically generated by GitHub Actions

github-actions bot added the SQL label Jan 6, 2026

asugranyes force-pushed the SPARK-54918 branch 4 times, most recently from 63cc2a3 to 05e7a1c · January 7, 2026 20:08

asugranyes mentioned this pull request Jan 7, 2026

[SPARK-54698][SQL] Support hashing for all data types for array set like operations #53468

Open

[SPARK-54918][SQL] Normalize -0.0 to 0.0 in array operations

defc3d5

asugranyes force-pushed the SPARK-54918 branch from 05e7a1c to defc3d5 · January 7, 2026 22:09
