
Conversation

@kelvinjian-db
Contributor

What changes were proposed in this pull request?

This PR fixes the +, updated, and removed methods of AttributeMap so that they correctly hash on the attribute's exprId instead of on the Attribute as a whole.

Why are the changes needed?

This change fixes non-determinism in AttributeMap when an entry is added with + whose key attr2 differs from an existing key attr1 (attr1 != attr2) even though attr1.exprId == attr2.exprId; a sketch of the scenario follows.
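
For illustration, a minimal sketch of the scenario, assuming Catalyst's AttributeReference and its withName method (which keeps the exprId while changing the name); this snippet is illustrative and not code from the PR:

    import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
    import org.apache.spark.sql.types.IntegerType

    val attr1 = AttributeReference("a", IntegerType)()  // fresh exprId
    val attr2 = attr1.withName("b")                     // same exprId, different name

    // attr1 != attr2, yet attr1.exprId == attr2.exprId. AttributeMap keys lookups
    // on exprId, so adding attr2 should deterministically replace the entry for attr1.
    val map = AttributeMap(Seq(attr1 -> 1)) + (attr2 -> 2)
    assert(map(attr1) == 2)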

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a new test suite.

Was this patch authored or co-authored using generative AI tooling?

Tests were generated by Claude Code using Sonnet 4.5.

The github-actions bot added the SQL label on Nov 13, 2025.
Comment on lines -51 to 59
  override def + [B1 >: A](kv: (Attribute, B1)): AttributeMap[B1] =
-   AttributeMap(baseMap.values.toMap + kv)
+   new AttributeMap(baseMap + (kv._1.exprId -> kv))

  override def updated[B1 >: A](key: Attribute, value: B1): Map[Attribute, B1] =
-   baseMap.values.toMap + (key -> value)
+   this + (key -> value)

  override def iterator: Iterator[(Attribute, A)] = baseMap.valuesIterator

- override def removed(key: Attribute): Map[Attribute, A] = baseMap.values.toMap - key
+ override def removed(key: Attribute): Map[Attribute, A] = new AttributeMap(baseMap - key.exprId)

Contributor

To be clear for other reviewers: the problem here was that the old code converted the internal baseMap to a regular Map[Attribute, A], which hashes on Attribute objects directly rather than on exprId. A stripped-down illustration follows.
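
A stripped-down model of that point, using hypothetical stand-ins for ExprId and Attribute (not Spark code), to show why the plain-Map round trip misbehaves:

    // Hypothetical stand-ins for Catalyst's ExprId / Attribute, for illustration only.
    case class ExprId(id: Long)
    case class Attr(name: String, exprId: ExprId)

    val attr1 = Attr("a", ExprId(1))
    val attr2 = Attr("b", ExprId(1))   // attr1 != attr2, same exprId

    // Old-style round trip: flattening to a plain Map keyed on the whole attribute
    // yields two entries, because case-class equality also compares the name.
    val plain: Map[Attr, Int] = Map(attr1 -> 1) + (attr2 -> 2)
    assert(plain.size == 2)

    // Rebuilding an exprId-keyed map from that collection keeps whichever entry the
    // plain Map happens to iterate last, which is what made the result order-dependent.
    val rebuilt: Map[ExprId, (Attr, Int)] = plain.map { case (k, v) => (k.exprId, (k, v)) }
    assert(rebuilt.size == 1)

    // Fixed-style update: modify the exprId-keyed map directly, so attr2 simply
    // replaces the existing entry for ExprId(1).
    val fixed: Map[ExprId, (Attr, Int)] =
      Map((attr1.exprId, (attr1, 1))) + ((attr2.exprId, (attr2, 2)))
    assert(fixed(ExprId(1))._2 == 2)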

@andylam-db
Contributor

Do we need to backport to previous versions of Spark? @cloud-fan

@cloud-fan
Contributor

Yeah, we should backport.

