20097: Improve performance of `CASE WHEN x THEN y ELSE NULL` expressions #225

martin-augment · 2026-02-02T08:20:01Z

20097: To review by AI

coderabbitai · 2026-02-02T08:20:11Z

Walkthrough

This change modifies the CASE expression evaluation logic to handle scenarios where no else clause is present. The implementation introduces a scatter-based approach for single-branch CASE expressions without an else, consolidates optional else-branch handling across multiple evaluation paths, and reduces unnecessary else-expression evaluation. The code now branches on the presence or absence of an else expression, directing evaluation to either the existing merge-based logic (when else is present) or a new scatter-based result path (when else is absent). A new import for the scatter utility is added to support this change.


gemini-code-assist · 2026-02-02T08:20:18Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance and robustness of CASE expressions, particularly those with a single WHEN THEN clause and an implicit or explicit NULL in the ELSE branch. By leveraging a new scatter utility and refining the evaluation logic, it avoids unnecessary computations and ensures correct behavior when an ELSE clause is omitted, leading to more efficient query execution.

Highlights

Performance Optimization: Optimized the evaluation of CASE WHEN x THEN y ELSE NULL expressions by introducing a scatter utility, avoiding the creation and merging of large NULL arrays.
Robustness Improvement: Enhanced the evaluate method to explicitly return a NULL scalar when all WHEN conditions are false and no ELSE clause is provided, preventing potential panics.
EvalMethod Refinement: Adjusted the EvalMethod selection logic to correctly identify single WHEN THEN clauses even when an explicit ELSE clause is absent, implying ELSE NULL.
New Utility: Integrated the scatter utility from datafusion_physical_expr_common::utils to efficiently distribute values based on a boolean array.
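
To make the scatter step concrete, here is a minimal sketch of the semantics described above, written against plain `Vec<Option<T>>` values rather than Arrow arrays. The name `scatter_model` and the slice-based signature are assumptions for illustration only; the real utility in datafusion_physical_expr_common::utils operates on a boolean array and an Arrow array. The sketch assumes `truthy` holds exactly one value per true mask slot and that NULL mask slots behave like false.

```rust
/// Simplified, hypothetical model of a scatter operation.
/// `mask` uses `Option<bool>` (None stands for SQL NULL).
/// `truthy` is the compacted THEN result: one entry per `Some(true)` in `mask`.
fn scatter_model<T: Clone>(mask: &[Option<bool>], truthy: &[Option<T>]) -> Vec<Option<T>> {
    let mut next = truthy.iter();
    mask.iter()
        .map(|m| match m {
            // A true mask slot consumes the next compacted THEN value.
            Some(true) => next.next().cloned().flatten(),
            // False and NULL mask slots both produce NULL in the output.
            _ => None,
        })
        .collect()
}
```

With a mask of true, false, NULL, true and two THEN values, the output has four rows: the two THEN values land in the first and last positions, and the middle rows are NULL. No separate all-NULL array is built or merged.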


claude · 2026-02-02T08:21:57Z

PR Review: Improve performance of `CASE WHEN x THEN y ELSE NULL` expressions

Summary

This PR optimizes the performance of CASE WHEN ... THEN ... ELSE NULL expressions by using the scatter function to avoid unnecessary evaluation of the else branch when it's NULL. The optimization is well-designed and addresses a meaningful performance gap.

Code Quality ✅

Strengths:

Clean separation of concerns with pattern matching on else_expr (case.rs:965-998)
Proper use of existing utilities (scatter from datafusion_physical_expr_common::utils)
Maintains consistent error handling patterns
Code is well-structured and readable

Minor observations:

The change at line 663 removes the requirement for else_expr.is_some() in the ExpressionOrExpression evaluation path, correctly allowing it to handle both cases with and without an else clause

Logic & Correctness ✅

The implementation correctly handles three scenarios:

1. No else clause (case.rs:966-968): Uses scatter to place then values at true positions and nulls elsewhere
   - Correctly converts then_value to an array with the right size (when_value.true_count())
   - The scatter function properly handles the mask to insert nulls at false positions
2. With else clause (case.rs:970-997): Falls back to the original merge approach
   - Maintains all the original logic for type casting and handling different ColumnarValue combinations
3. All-false case (case.rs:1149-1156): Returns a null scalar when no conditions match and there's no else clause
   - Correctly uses ScalarValue::try_new_null with the proper return type
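
The first two scenarios can be sketched as a single match on the optional else values. This is a simplified model over `Vec<Option<T>>`, not the code in case.rs: `merge_model` and `single_branch_case` are hypothetical names, and both the then and else inputs are assumed to be already evaluated on their filtered rows (compacted).

```rust
/// Hypothetical stand-in for the merge-based path: true slots take the next
/// THEN value, all other slots (false or NULL) take the next ELSE value.
fn merge_model<T: Clone>(
    mask: &[Option<bool>],
    then_vals: &[Option<T>],
    else_vals: &[Option<T>],
) -> Vec<Option<T>> {
    let mut then_iter = then_vals.iter();
    let mut else_iter = else_vals.iter();
    mask.iter()
        .map(|m| match m {
            Some(true) => then_iter.next().cloned().flatten(),
            _ => else_iter.next().cloned().flatten(),
        })
        .collect()
}

/// Hypothetical model of a single-branch CASE: branch on whether ELSE exists.
fn single_branch_case<T: Clone>(
    mask: &[Option<bool>],
    then_vals: &[Option<T>],
    else_vals: Option<&[Option<T>]>,
) -> Vec<Option<T>> {
    match else_vals {
        // No ELSE: scatter THEN values to true slots, NULL everywhere else.
        None => {
            let mut then_iter = then_vals.iter();
            mask.iter()
                .map(|m| match m {
                    Some(true) => then_iter.next().cloned().flatten(),
                    _ => None,
                })
                .collect()
        }
        // ELSE present: keep the merge-based combination.
        Some(else_vals) => merge_model(mask, then_vals, else_vals),
    }
}
```

In this model the `None` arm never touches an else input at all, which mirrors the reviewer's point that the no-else path skips the else evaluation rather than materializing NULLs for it.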

Performance Considerations ✅

Optimization rationale:

When else_expr is None, the old code would evaluate not(&when_value), filter the batch, and then try to evaluate a non-existent else expression
The old code (line 967 in the original) called self.else_expr.as_ref().unwrap(), which would panic if else_expr were None
So this change is a bug fix as well as an optimization: the old code assumed else_expr was always Some in this path

Performance improvement:

Avoids evaluating not(&when_value) when else is NULL
Skips filtering the else batch
Uses scatter which is more efficient than merge with a null array
Only evaluates the then expression on filtered rows (existing behavior preserved)

Potential Issues & Questions

Critical: Test Coverage ⚠️

The PR doesn't include tests for the new code path. Specifically:

Missing test: CASE WHEN condition THEN value END (no ELSE clause)
Missing test: Edge cases with all-true, all-false, and mixed boolean masks with no else clause

Recommendation: Add tests covering:

// Test case with no else clause
let when = binary(col("a", &schema)?, Operator::Gt, lit(5i32), &schema)?;
let then = col("b", &schema)?;
let expr = Arc::new(CaseExpr::try_new(None, vec![(when, then)], None)?);
// Verify correct null placement
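
As a companion to the test outline above, the expected results for the mask edge cases can be pinned down with a row-wise reference model. This is a sketch in plain Rust, not DataFusion test code: `case_a_gt_5_then_b` is a hypothetical helper that mirrors `CASE WHEN a > 5 THEN b END` over `Option<i32>` columns, where `None` stands for SQL NULL.

```rust
/// Hypothetical row-wise reference for `CASE WHEN a > 5 THEN b END`.
/// A NULL in `a` makes the condition NULL, which is treated as not matching.
fn case_a_gt_5_then_b(a: &[Option<i32>], b: &[Option<i32>]) -> Vec<Option<i32>> {
    a.iter()
        .zip(b)
        .map(|(a, b)| match a {
            // Condition is true: the row takes the THEN value (which may be NULL).
            Some(v) if *v > 5 => *b,
            // Condition is false or NULL and there is no ELSE: the row is NULL.
            _ => None,
        })
        .collect()
}
```

A real regression test would build a RecordBatch, evaluate the CaseExpr, and compare the resulting array with the values this kind of reference produces for all-true, all-false, mixed, and NULL-containing masks.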

Question: Evaluation Method Classification

At case.rs:663, the condition changed from:

} else if body.when_then_expr.len() == 1 && body.else_expr.is_some() {

to:

} else if body.when_then_expr.len() == 1 {

This means CASE WHEN x THEN y END now uses ExpressionOrExpression instead of potentially NoExpression.

Question: Is there a performance difference between these evaluation methods for the no-else case? Should we consider a specialized eval method for this pattern?
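
The effect of the condition change on method selection can be sketched as follows. This models only the single branch quoted above; the enum `EvalMethodSketch`, its `Fallback` variant, and both function names are hypothetical, and the earlier branches of the real selection logic (for example ScalarOrScalar) are left out.

```rust
/// Hypothetical, reduced model of the evaluation-method choice.
#[derive(Debug, PartialEq)]
enum EvalMethodSketch {
    ExpressionOrExpression,
    /// Stands in for whatever method the real code falls through to.
    Fallback,
}

/// Old condition: a single WHEN/THEN pair AND an explicit ELSE.
fn select_old(when_then_len: usize, has_else: bool) -> EvalMethodSketch {
    if when_then_len == 1 && has_else {
        EvalMethodSketch::ExpressionOrExpression
    } else {
        EvalMethodSketch::Fallback
    }
}

/// New condition: a single WHEN/THEN pair, with or without ELSE.
fn select_new(when_then_len: usize, _has_else: bool) -> EvalMethodSketch {
    if when_then_len == 1 {
        EvalMethodSketch::ExpressionOrExpression
    } else {
        EvalMethodSketch::Fallback
    }
}
```

The only input whose result differs between the two functions is a single WHEN/THEN pair with no ELSE, which is exactly the `CASE WHEN x THEN y END` shape the question is about.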

Minor: Documentation

The scatter function is well-documented at utils.rs:53-58, but consider adding a comment in expr_or_expr explaining when scatter is used vs merge.

Security Concerns ✅

No security issues identified. The change doesn't introduce:

Unsafe code
Unchecked array accesses
Integer overflow risks
Type confusion

Overall Assessment

Verdict: Approve with recommendations ✅

This is a solid optimization that also fixes a potential panic (the unwrap on None). The code quality is good and follows DataFusion conventions.

Required before merge:

Add test coverage for CASE expressions without ELSE clauses

Nice to have:

Benchmark demonstrating the performance improvement
Consider adding a code comment explaining the scatter vs merge decision

Great work on identifying this optimization opportunity!

Review generated with assistance from Claude Code

gemini-code-assist

Code Review

This pull request introduces a performance improvement for CASE WHEN ... THEN ... END expressions by adding a specialized evaluation path for expressions without an ELSE clause. The changes correctly handle the absence of an ELSE expression in several parts of the CASE expression evaluation logic, preventing potential panics and using the efficient scatter utility to construct the result. The overall implementation is sound and enhances performance for this common SQL pattern. I have one suggestion for improving code consistency and robustness.

gemini-code-assist · 2026-02-02T08:24:58Z

datafusion/physical-expr/src/expressions/case.rs

+                let else_expr =
+                    try_cast(Arc::clone(else_expr), &batch.schema(), return_type.clone())
+                        .unwrap_or_else(|_| Arc::clone(else_expr));


Using unwrap_or_else here to ignore a potential error from try_cast seems risky. If try_cast fails, it likely indicates a bug in the planner where types were not correctly coerced. By ignoring the error and using the original expression, a subsequent error might occur during merge which would be harder to debug.

Other parts of this file handle the result of try_cast using ? to propagate the error (e.g., lines 745, 848, 936). For consistency and robustness, it would be better to do the same here.

Suggested change

-                let else_expr =
-                    try_cast(Arc::clone(else_expr), &batch.schema(), return_type.clone())
-                        .unwrap_or_else(|_| Arc::clone(else_expr));
+                let else_expr = try_cast(Arc::clone(else_expr), &batch.schema(), return_type.clone())?;

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The type of the else expression is expected to match the batch's schema, so there is no need to fall back.
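
The difference between the two error-handling styles discussed in this thread can be shown with a small stand-alone sketch. Everything here is hypothetical: `Ty`, `try_cast_sketch`, and the rule that Utf8 cannot be cast to Int32 are invented for illustration and are not DataFusion's try_cast.

```rust
/// Hypothetical data types for the sketch.
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int32,
    Utf8,
}

/// Hypothetical cast check: succeeds unless asked to turn Utf8 into Int32.
fn try_cast_sketch(from: &Ty, to: &Ty) -> Result<Ty, String> {
    if *from == Ty::Utf8 && *to == Ty::Int32 {
        Err(format!("cannot cast {:?} to {:?}", from, to))
    } else {
        Ok(to.clone())
    }
}

/// Fallback style: a failed cast is swallowed and the original type is kept,
/// so the mismatch only shows up later, far from its cause.
fn cast_with_fallback(from: &Ty, to: &Ty) -> Ty {
    try_cast_sketch(from, to).unwrap_or_else(|_| from.clone())
}

/// Propagation style: a failed cast is returned to the caller immediately.
fn cast_with_propagation(from: &Ty, to: &Ty) -> Result<Ty, String> {
    let casted = try_cast_sketch(from, to)?;
    Ok(casted)
}
```

With the fallback style, a failing cast silently yields the uncast type; with `?`, the caller sees the cast error at the point where it happened, which is the behavior the suggestion asks for.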

augmentcode · 2026-02-02T08:30:26Z

🤖 Augment PR Summary

Summary: Improves physical evaluation of single-branch CASE expressions, especially when the ELSE branch is effectively NULL.

Changes:

Broadened EvalMethod::ExpressionOrExpression selection to cover single WHEN/THEN without requiring an ELSE.
Added a scatter-based fast path to materialize THEN values into the full output while filling non-matching rows with NULL.
Handled all-false/all-null masks when there is no ELSE by returning a typed NULL scalar.

Technical Notes: Keeps the existing merge-based path when an ELSE expression is present, while avoiding unnecessary evaluation/filtering when it is absent.


augmentcode

Review completed. 2 suggestions posted.


augmentcode · 2026-02-02T08:30:27Z

datafusion/physical-expr/src/expressions/case.rs

            {
                EvalMethod::ScalarOrScalar
-            } else if body.when_then_expr.len() == 1 && body.else_expr.is_some() {
+            } else if body.when_then_expr.len() == 1 {


This change makes EvalMethod::ExpressionOrExpression apply even when else_expr is None (i.e. CASE WHEN … THEN … END). If that’s intentional, consider updating the ExpressionOrExpression doc comment to reflect that ELSE is now optional for this eval method.


value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! The docstring of ExpressionOrExpression mentions both the then and the else expressions. The proposed change makes the else expression optional and this requires an update of the docstring too.

augmentcode · 2026-02-02T08:30:27Z

datafusion/physical-expr/src/expressions/case.rs

+        match &self.else_expr {
+            None => {
+                let then_array = then_value.to_array(when_value.true_count())?;
+                scatter(&when_value, then_array.as_ref()).map(ColumnarValue::Array)


The new else_expr == None fast-path relies on scatter to reconstruct the full output, but I don’t see a test that specifically exercises this path when the then expression is not eligible for InfallibleExprOrNull. Consider adding a regression test for CASE WHEN <cond> THEN <non-infallible expr> END (or ELSE NULL) to cover mask/null semantics.


value:useful; category:bug; feedback: The Augment AI reviewer is correct! There are neither old unit tests for this case nor new ones. Adding them prevents new functionality from landing without test coverage, which could lead to unnoticed regressions in the future.

martin-augment · 2026-02-02T13:42:24Z

@CodeRabbit review

coderabbitai · 2026-02-02T13:42:33Z

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

martin-augment · 2026-02-02T13:45:22Z

Critical: Test Coverage ⚠️

The PR doesn't include tests for the new code path. Specifically:

Missing test: CASE WHEN condition THEN value END (no ELSE clause)

Missing test: Edge cases with all-true, all-false, and mixed boolean masks with no else clause

Recommendation: Add tests covering:
// Test case with no else clause
let when = binary(col("a", &schema)?, Operator::Gt, lit(5i32), &schema)?;
let then = col("b", &schema)?;
let expr = Arc::new(CaseExpr::try_new(None, vec![(when, then)], None)?);
// Verify correct null placement

value:useful; category:bug; feedback: The Claude AI reviewer is correct! There are neither old unit tests for this case nor new ones. Adding them prevents new functionality from landing without test coverage, which could lead to unnoticed regressions in the future.

Improve performance of CASE WHEN x THEN y ELSE NULL expressions

0d64069
