Skip to content

Conversation

@Brijesh-Thakkar
Copy link

Which issue does this PR close?

Rationale for this change

ScalarValue::partial_cmp previously returned a concrete ordering when NULL
values were involved (e.g. comparing a non-NULL value with NULL, or NULL with
NULL). This behavior was unexpected and inconsistent with SQL three-valued
logic, where NULL values are not comparable.

The incorrect ordering could propagate to higher-level utilities such as
try_cmp and vector ordering, leading to subtle and hard-to-detect bugs.

What changes are included in this PR?

  • Updated ScalarValue::partial_cmp to return None whenever either operand
    is NULL
  • Ensured ScalarValue::try_cmp returns an error for NULL comparisons
  • Aligned vector ordering utilities to behave consistently with the updated
    NULL comparison semantics
  • Added tests to validate the corrected behavior

Are these changes tested?

Yes.

  • Added a new unit test covering NULL comparison behavior in
    ScalarValue::partial_cmp
  • All existing unit tests and doctests pass

Are there any user-facing changes?

Yes.

Comparisons involving NULL ScalarValues no longer produce an ordering and now
behave consistently with SQL semantics. This may affect users relying on the
previous (incorrect) ordering behavior.

Copilot AI review requested due to automatic review settings December 31, 2025 19:18
@github-actions github-actions bot added the common Related to common crate label Dec 31, 2025
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes NULL handling in ScalarValue::partial_cmp to align with SQL three-valued logic semantics. Previously, the method returned concrete orderings when NULL values were involved, which was incorrect and could lead to subtle bugs in higher-level operations.

Key changes:

  • Modified ScalarValue::partial_cmp to return None for any comparison involving NULL values
  • Updated ScalarValue::try_cmp to properly error on NULL comparisons
  • Fixed vector ordering tests to verify that NULL comparisons return None

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
datafusion/common/src/scalar/mod.rs Core NULL comparison logic: added early return for NULL values in partial_cmp, removed old NULL comparison code, updated try_cmp test to expect errors for NULL comparisons, added comprehensive test for NULL partial ordering
datafusion/common/src/utils/mod.rs Updated vector ordering tests to verify NULL comparisons return None instead of asserting ordering

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Contributor

@2010YOUY01 2010YOUY01 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, the implementation looks good to me.

Let's fix the tests first and then see what else pops up.

);
assert_eq!(
assert!(
ScalarValue::try_cmp(&ScalarValue::Int32(None), &ScalarValue::Int32(Some(2)))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great to update the doc comments for try_cmp, now it only says it errors for incompatible types, but it's also throwing error for input nulls after the change

ScalarValue::Int32(Some(1)),
]
])
.is_none()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TLDR: I suggest we follow the PostgreSQL's behavior and return true here.

By definition it should return Null
SQL Null behavior reference:

/// ## Boolean Tri-state logic

However postgres and DuckDB all has 'Null equals Null' behavior if Null is inside a composite type

postgres=# SELECT ARRAY[2, NULL, 0] < ARRAY[2, NULL, 1];
 ?column?
----------
 t
(1 row)

D SELECT [2, NULL, 0] < [2, NULL, 1] AS result;
┌─────────┐
│ result  │
│ boolean │
├─────────┤
│ true    │
└─────────┘

Postgres explains the rationale here
https://www.postgresql.org/docs/current/functions-comparisons.html#COMPOSITE-TYPE-COMPARISON
I’ve read that section three times now, and I’ll be honest — I still have no idea what they’re talking about 😅

DuckDB said they're following Postgres behavior
https://duckdb.org/docs/stable/sql/data_types/list#comparison-and-ordering

}
(Dictionary(_, _), _) => None,
(Null, Null) => Some(Ordering::Equal),
// Null is handled by the early return above, but we need this for exhaustiveness
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we do something like

(Null, Null) | (Null, _) | (_, Null) => unreachable!("Nulls are already handled before entering this matching arm

to be more defensive

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

common Related to common crate

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected Null behavior for ScalarValue::partial_cmp()

2 participants