-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Fix NULL handling in ScalarValue::partial_cmp (closes #19579) #19587
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix NULL handling in ScalarValue::partial_cmp (closes #19579) #19587
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes NULL handling in ScalarValue::partial_cmp to align with SQL three-valued logic semantics. Previously, the method returned concrete orderings when NULL values were involved, which was incorrect and could lead to subtle bugs in higher-level operations.
Key changes:
- Modified
ScalarValue::partial_cmpto returnNonefor any comparison involving NULL values - Updated
ScalarValue::try_cmpto properly error on NULL comparisons - Fixed vector ordering tests to verify that NULL comparisons return
None
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| datafusion/common/src/scalar/mod.rs | Core NULL comparison logic: added early return for NULL values in partial_cmp, removed old NULL comparison code, updated try_cmp test to expect errors for NULL comparisons, added comprehensive test for NULL partial ordering |
| datafusion/common/src/utils/mod.rs | Updated vector ordering tests to verify NULL comparisons return None instead of asserting ordering |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
2010YOUY01
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, the implementation looks good to me.
Let's fix the tests first and then see what else pops up.
| ); | ||
| assert_eq!( | ||
| assert!( | ||
| ScalarValue::try_cmp(&ScalarValue::Int32(None), &ScalarValue::Int32(Some(2))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great to update the doc comments for try_cmp, now it only says it errors for incompatible types, but it's also throwing error for input nulls after the change
| ScalarValue::Int32(Some(1)), | ||
| ] | ||
| ]) | ||
| .is_none() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TLDR: I suggest we follow the PostgreSQL's behavior and return true here.
By definition it should return Null
SQL Null behavior reference:
| /// ## Boolean Tri-state logic |
However postgres and DuckDB all has 'Null equals Null' behavior if Null is inside a composite type
postgres=# SELECT ARRAY[2, NULL, 0] < ARRAY[2, NULL, 1];
?column?
----------
t
(1 row)
D SELECT [2, NULL, 0] < [2, NULL, 1] AS result;
┌─────────┐
│ result │
│ boolean │
├─────────┤
│ true │
└─────────┘Postgres explains the rationale here
https://www.postgresql.org/docs/current/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON
I’ve read that section three times now, and I’ll be honest — I still have no idea what they’re talking about 😅
DuckDB said they're following Postgres behavior
https://duckdb.org/docs/stable/sql/data_types/list#comparison-and-ordering
| } | ||
| (Dictionary(_, _), _) => None, | ||
| (Null, Null) => Some(Ordering::Equal), | ||
| // Null is handled by the early return above, but we need this for exhaustiveness |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we do something like
(Null, Null) | (Null, _) | (_, Null) => unreachable!("Nulls are already handled before entering this matching armto be more defensive
Which issue does this PR close?
Nullbehavior forScalarValue::partial_cmp()#19579Rationale for this change
ScalarValue::partial_cmppreviously returned a concrete ordering when NULLvalues were involved (e.g. comparing a non-NULL value with NULL, or NULL with
NULL). This behavior was unexpected and inconsistent with SQL three-valued
logic, where NULL values are not comparable.
The incorrect ordering could propagate to higher-level utilities such as
try_cmpand vector ordering, leading to subtle and hard-to-detect bugs.What changes are included in this PR?
ScalarValue::partial_cmpto returnNonewhenever either operandis NULL
ScalarValue::try_cmpreturns an error for NULL comparisonsNULL comparison semantics
Are these changes tested?
Yes.
ScalarValue::partial_cmpAre there any user-facing changes?
Yes.
Comparisons involving NULL
ScalarValues no longer produce an ordering and nowbehave consistently with SQL semantics. This may affect users relying on the
previous (incorrect) ordering behavior.