
Conversation

@kelvinjian-db
Contributor

What changes were proposed in this pull request?

This PR fixes the +, updated, and removed methods of AttributeMap so that they correctly hash on the attribute's exprId instead of on the Attribute as a whole.

Why are the changes needed?

This change fixes non-determinism in AttributeMap when an entry is added with + whose key attr2 differs from an existing key attr1 (attr1 != attr2) even though attr1.exprId == attr2.exprId; a sketch of the scenario follows.
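
For illustration, a minimal sketch of the scenario, assuming Catalyst's AttributeReference and its withName method (which keeps the exprId while changing the name); this snippet is illustrative and not code from the PR:

    import org.apache.spark.sql.catalyst.expressions.{AttributeMap, AttributeReference}
    import org.apache.spark.sql.types.IntegerType

    val attr1 = AttributeReference("a", IntegerType)()  // fresh exprId
    val attr2 = attr1.withName("b")                     // same exprId, different name

    // attr1 != attr2, yet attr1.exprId == attr2.exprId. AttributeMap keys lookups
    // on exprId, so adding attr2 should deterministically replace the entry for attr1.
    val map = AttributeMap(Seq(attr1 -> 1)) + (attr2 -> 2)
    assert(map(attr1) == 2)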

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Added a new test suite.

Was this patch authored or co-authored using generative AI tooling?

Tests were generated by Claude Code using Sonnet 4.5.

The github-actions bot added the SQL label on Nov 13, 2025.
Comment on lines -51 to 59
  override def + [B1 >: A](kv: (Attribute, B1)): AttributeMap[B1] =
-   AttributeMap(baseMap.values.toMap + kv)
+   new AttributeMap(baseMap + (kv._1.exprId -> kv))

  override def updated[B1 >: A](key: Attribute, value: B1): Map[Attribute, B1] =
-   baseMap.values.toMap + (key -> value)
+   this + (key -> value)

  override def iterator: Iterator[(Attribute, A)] = baseMap.valuesIterator

- override def removed(key: Attribute): Map[Attribute, A] = baseMap.values.toMap - key
+ override def removed(key: Attribute): Map[Attribute, A] = new AttributeMap(baseMap - key.exprId)

Contributor

To be clear for other reviewers: the problem here was that the old code converted the internal baseMap to a regular Map[Attribute, A], which hashes on Attribute objects directly rather than on exprId. A stripped-down illustration follows.
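
A stripped-down model of that point, using hypothetical stand-ins for ExprId and Attribute (not Spark code), to show why the plain-Map round trip misbehaves:

    // Hypothetical stand-ins for Catalyst's ExprId / Attribute, for illustration only.
    case class ExprId(id: Long)
    case class Attr(name: String, exprId: ExprId)

    val attr1 = Attr("a", ExprId(1))
    val attr2 = Attr("b", ExprId(1))   // attr1 != attr2, same exprId

    // Old-style round trip: flattening to a plain Map keyed on the whole attribute
    // yields two entries, because case-class equality also compares the name.
    val plain: Map[Attr, Int] = Map(attr1 -> 1) + (attr2 -> 2)
    assert(plain.size == 2)

    // Rebuilding an exprId-keyed map from that collection keeps whichever entry the
    // plain Map happens to iterate last, which is what made the result order-dependent.
    val rebuilt: Map[ExprId, (Attr, Int)] = plain.map { case (k, v) => (k.exprId, (k, v)) }
    assert(rebuilt.size == 1)

    // Fixed-style update: modify the exprId-keyed map directly, so attr2 simply
    // replaces the existing entry for ExprId(1).
    val fixed: Map[ExprId, (Attr, Int)] =
      Map((attr1.exprId, (attr1, 1))) + ((attr2.exprId, (attr2, 2)))
    assert(fixed(ExprId(1))._2 == 2)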

@andylam-db
Contributor

Do we need to backport to previous versions of Spark? @cloud-fan

@cloud-fan
Contributor

Yeah, we should backport.

