@asugranyes
What changes were proposed in this pull request?

Normalize -0.0 to 0.0 in the hash-based array operations array_distinct, array_union, array_intersect, and array_except.

Changes:

  • Add normalizeZero() and normalizeZeroCode() to SQLOpenHashSet for interpreted and codegen paths
  • Apply normalization in all four array operations before hashing
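As a rough illustration of the idea in the list above, the following plain-Scala sketch normalizes each element before it is hashed. It is not the PR's code: the object name, the `normalizeZero` body, and `distinctDoubles` are assumptions made for this example, and a standard `HashSet` of bit patterns stands in for SQLOpenHashSet.

```scala
import scala.collection.mutable

object ZeroNormalizationSketch {
  // Hypothetical helper (an assumption, not the PR's implementation):
  // maps -0.0 to 0.0 and returns every other value, including NaN, unchanged.
  def normalizeZero(d: Double): Double = if (d == 0.0d) 0.0d else d

  // Deduplicates by bit pattern after normalization, keeping the first
  // occurrence of each value in its original order.
  def distinctDoubles(values: Seq[Double]): Seq[Double] = {
    val seen = mutable.HashSet.empty[Long]
    values
      .map(v => normalizeZero(v))
      .filter(v => seen.add(java.lang.Double.doubleToLongBits(v)))
  }
}
```

Because both zeros share one bit pattern after normalization, a call such as `ZeroNormalizationSketch.distinctDoubles(Seq(0.0, -0.0, 1.0))` keeps a single zero. A Float variant would presumably follow the same pattern with `java.lang.Float.floatToIntBits`.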

Why are the changes needed?

Under IEEE 754, -0.0 and 0.0 compare as equal, but their bit patterns differ, so hash codes computed from those bits differ as well. A hash-based set therefore treats them as two distinct values, which causes incorrect results when arrays contain both:

// Returns [0.0, -0.0, 1.0] instead of [0.0, 1.0]
Seq(Array(0.0, -0.0, 1.0)).toDF("values").selectExpr("array_distinct(values)").show()

// Returns [0.0, -0.0] instead of [0.0]
Seq((Array(0.0), Array(-0.0))).toDF("a", "b").selectExpr("array_union(a, b)").show()

// Returns [] instead of [0.0]
Seq((Array(0.0, 1.0), Array(-0.0, 2.0))).toDF("a", "b").selectExpr("array_intersect(a, b)").show()

// Returns [0.0, 1.0] instead of [1.0]
Seq((Array(0.0, 1.0), Array(-0.0))).toDF("a", "b").selectExpr("array_except(a, b)").show()

Spark already normalizes -0.0 to 0.0 in join keys, window partition keys, and aggregate grouping keys via NormalizeFloatingNumbers. This fix makes array operations consistent.
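The root cause can be checked in plain Scala without Spark. The sketch below is only a demonstration of the JVM behavior the description relies on; the object and value names are made up for this example.

```scala
object SignedZeroDemo {
  val pos: Double = 0.0
  val neg: Double = -0.0

  // Primitive comparison follows IEEE 754: the two zeros are equal.
  val primitiveEqual: Boolean = pos == neg

  // Only the sign bit differs, so the raw bit patterns are different.
  val posBits: Long = java.lang.Double.doubleToLongBits(pos)
  val negBits: Long = java.lang.Double.doubleToLongBits(neg)

  // java.lang.Double.hashCode is derived from the bits, so the hashes differ too.
  val posHash: Int = java.lang.Double.hashCode(pos)
  val negHash: Int = java.lang.Double.hashCode(neg)
}
```

Any set that hashes on the bit pattern will place the two zeros in different buckets even though they compare as equal, which matches the behavior shown in the four examples above.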

Does this PR introduce any user-facing change?

Yes. array_distinct, array_union, array_intersect, and array_except now treat -0.0 and 0.0 as equal, consistent with SQL semantics and IEEE 754.

How was this patch tested?

  • Added unit tests in CollectionExpressionsSuite for all four operations with Double and Float types
  • All existing tests pass

Was this patch authored or co-authored using generative AI tooling?

No

github-actions bot commented Jan 6, 2026

JIRA Issue Information

=== Bug SPARK-54918 ===
Summary: Array operations do not normalize -0.0 to 0.0
Assignee: None
Status: Open
Affected: ["4.2.0"]


This comment was automatically generated by GitHub Actions
