Conversation

@martin-augment
Owner

20097: To review by AI

@coderabbitai

coderabbitai bot commented Feb 2, 2026

Walkthrough

This change modifies the CASE expression evaluation logic to handle scenarios where no else clause is present. The implementation introduces a scatter-based approach for single-branch CASE expressions without an else, consolidates optional else-branch handling across multiple evaluation paths, and reduces unnecessary else-expression evaluation. The code now branches on the presence or absence of an else expression, directing evaluation to either the existing merge-based logic (when else is present) or a new scatter-based result path (when else is absent). A new import for the scatter utility is added to support this change.

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the performance and robustness of CASE expressions, particularly those with a single WHEN THEN clause and an implicit or explicit NULL in the ELSE branch. By leveraging a new scatter utility and refining the evaluation logic, it avoids unnecessary computations and ensures correct behavior when an ELSE clause is omitted, leading to more efficient query execution.

Highlights

  • Performance Optimization: Optimized the evaluation of CASE WHEN x THEN y ELSE NULL expressions by introducing a scatter utility, avoiding the creation and merging of large NULL arrays.
  • Robustness Improvement: Enhanced the evaluate method to explicitly return a NULL scalar when all WHEN conditions are false and no ELSE clause is provided, preventing potential panics.
  • EvalMethod Refinement: Adjusted the EvalMethod selection logic to correctly identify single WHEN THEN clauses even when an explicit ELSE clause is absent, implying ELSE NULL.
  • New Utility: Integrated the scatter utility from datafusion_physical_expr_common::utils to efficiently distribute values based on a boolean array.

@claude

claude bot commented Feb 2, 2026

PR Review: Improve performance of CASE WHEN x THEN y ELSE NULL expressions

Summary

This PR optimizes the performance of CASE WHEN ... THEN ... ELSE NULL expressions by using the scatter function to avoid unnecessary evaluation of the else branch when it's NULL. The optimization is well-designed and addresses a meaningful performance gap.

Code Quality ✅

Strengths:

  • Clean separation of concerns with pattern matching on else_expr (case.rs:965-998)
  • Proper use of existing utilities (scatter from datafusion_physical_expr_common::utils)
  • Maintains consistent error handling patterns
  • Code is well-structured and readable

Minor observations:

  • The change at line 663 removes the requirement for else_expr.is_some() in the ExpressionOrExpression evaluation path, correctly allowing it to handle both cases with and without an else clause

Logic & Correctness ✅

The implementation correctly handles three scenarios:

  1. No else clause (case.rs:966-968): Uses scatter to place then values at true positions and nulls elsewhere

    • Correctly converts then_value to an array with the right size (when_value.true_count())
    • The scatter function properly handles the mask to insert nulls at false positions
  2. With else clause (case.rs:970-997): Falls back to the original merge approach

    • Maintains all the original logic for type casting and handling different ColumnarValue combinations
  3. All-false case (case.rs:1149-1156): Returns a null scalar when no conditions match and there's no else clause

    • Correctly uses ScalarValue::try_new_null with the proper return type
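The null-placement behaviour described in scenarios 1 and 3 can be illustrated with a small model in plain Rust. This is only a sketch and not DataFusion's `scatter`: `scatter_model` is a hypothetical helper that works on slices instead of Arrow arrays, and it assumes that a NULL mask entry is treated like `false`.

```rust
/// Simplified, hypothetical model of scatter semantics (not the DataFusion API).
/// `mask` has one entry per output row; `None` models a NULL condition and is
/// assumed to behave like `false`.
/// `truthy` holds one value per `Some(true)` entry, in row order.
fn scatter_model<T: Clone>(mask: &[Option<bool>], truthy: &[T]) -> Vec<Option<T>> {
    let mut next_value = truthy.iter();
    mask.iter()
        .map(|m| match m {
            // Matching row: consume the next THEN value.
            Some(true) => next_value.next().cloned(),
            // Non-matching or NULL condition: the row becomes NULL.
            _ => None,
        })
        .collect()
}
```

With a mixed mask the THEN values land on the matching rows and every other row is NULL; with an all-false mask the whole result is NULL, which is the situation the typed NULL scalar in scenario 3 covers. In this model a `truthy` slice that is too short silently yields NULL, which a real implementation would more likely reject.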

Performance Considerations ✅

Optimization rationale:

  • With else_expr set to None, the old body of this path would evaluate not(&when_value), filter the batch, and then call self.else_expr.as_ref().unwrap() (line 967 in the original), which panics on None
  • The old code therefore assumed else_expr was always Some on this path; the old selection condition (else_expr.is_some(), see case.rs:663) is what kept that assumption true
  • Relaxing the selection condition without this change would have exposed the panic, so the new match on else_expr is required for correctness as well as being an optimization

Performance improvement:

  • Avoids evaluating not(&when_value) when else is NULL
  • Skips filtering the else batch
  • Uses scatter which is more efficient than merge with a null array
  • Only evaluates the then expression on filtered rows (existing behavior preserved)
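To make the scatter-versus-merge point concrete, here is a minimal sketch in plain Rust. Both functions are hypothetical models operating on slices, not the Arrow or DataFusion kernels; the point is only that merging with an all-NULL else side produces the same rows as scattering, while needing an extra else-side buffer.

```rust
/// Hypothetical model of the merge-based path: `truthy` has one value per
/// `true` row and `falsy` has one (possibly NULL) value per `false` row.
fn merge_model<T: Clone>(mask: &[bool], truthy: &[T], falsy: &[Option<T>]) -> Vec<Option<T>> {
    let mut t = truthy.iter();
    let mut f = falsy.iter();
    mask.iter()
        .map(|&m| {
            if m {
                t.next().cloned()
            } else {
                // The else side may itself contain NULLs.
                f.next().cloned().flatten()
            }
        })
        .collect()
}

/// Hypothetical model of the scatter-based path: no else-side input at all.
fn scatter_model<T: Clone>(mask: &[bool], truthy: &[T]) -> Vec<Option<T>> {
    let mut t = truthy.iter();
    mask.iter()
        .map(|&m| if m { t.next().cloned() } else { None })
        .collect()
}

/// Builds the all-NULL else side that the merge path would need when the
/// CASE expression has no ELSE clause.
fn null_else_side<T: Clone>(mask: &[bool]) -> Vec<Option<T>> {
    let false_count = mask.iter().filter(|&&m| !m).count();
    vec![None; false_count]
}
```

The scatter model skips `null_else_side` entirely, which corresponds to the avoided negation, filtering, and NULL-array construction listed above. Actual speedups would need a benchmark to confirm.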

Potential Issues & Questions

Critical: Test Coverage ⚠️

The PR doesn't include tests for the new code path. Specifically:

  • Missing test: CASE WHEN condition THEN value END (no ELSE clause)
  • Missing test: Edge cases with all-true, all-false, and mixed boolean masks with no else clause

Recommendation: Add tests covering:

// Test case with no else clause
let when = binary(col("a", &schema)?, Operator::Gt, lit(5i32), &schema)?;
let then = col("b", &schema)?;
let expr = Arc::new(CaseExpr::try_new(None, vec![(when, then)], None)?);
// Verify correct null placement

Question: Evaluation Method Classification

At case.rs:663, the condition changed from:

} else if body.when_then_expr.len() == 1 && body.else_expr.is_some() {

to:

} else if body.when_then_expr.len() == 1 {

This means CASE WHEN x THEN y END now uses ExpressionOrExpression instead of potentially NoExpression.
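The effect of relaxing the condition can be sketched in isolation. This is a simplified model, not the real selection code: `EvalMethodModel` is a hypothetical two-variant enum, and `Fallback` stands in for whichever other evaluation method the surrounding branches would choose.

```rust
/// Hypothetical stand-in for the real EvalMethod enum; `Fallback` represents
/// any method chosen by the branches that follow this one.
#[derive(Debug, PartialEq)]
enum EvalMethodModel {
    ExpressionOrExpression,
    Fallback,
}

/// Old predicate: a single WHEN/THEN pair and an explicit ELSE are both required.
fn select_before(when_then_len: usize, has_else: bool) -> EvalMethodModel {
    if when_then_len == 1 && has_else {
        EvalMethodModel::ExpressionOrExpression
    } else {
        EvalMethodModel::Fallback
    }
}

/// New predicate: a single WHEN/THEN pair is enough, with or without ELSE.
fn select_after(when_then_len: usize, _has_else: bool) -> EvalMethodModel {
    if when_then_len == 1 {
        EvalMethodModel::ExpressionOrExpression
    } else {
        EvalMethodModel::Fallback
    }
}
```

Only the single-branch, no-ELSE input changes classification between the two functions; every other input is routed exactly as before.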

Question: Is there a performance difference between these evaluation methods for the no-else case? Should we consider a specialized eval method for this pattern?

Minor: Documentation

The scatter function is well-documented at utils.rs:53-58, but consider adding a comment in expr_or_expr explaining when scatter is used vs merge.

Security Concerns ✅

No security issues identified. The change doesn't introduce:

  • Unsafe code
  • Unchecked array accesses
  • Integer overflow risks
  • Type confusion

Overall Assessment

Verdict: Approve with recommendations

This is a solid optimization that also fixes a potential panic (the unwrap on None). The code quality is good and follows DataFusion conventions.

Required before merge:

  • Add test coverage for CASE expressions without ELSE clauses

Nice to have:

  • Benchmark demonstrating the performance improvement
  • Consider adding a code comment explaining the scatter vs merge decision

Great work on identifying this optimization opportunity!


Review generated with assistance from Claude Code

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a performance improvement for CASE WHEN ... THEN ... END expressions by adding a specialized evaluation path for expressions without an ELSE clause. The changes correctly handle the absence of an ELSE expression in several parts of the CASE expression evaluation logic, preventing potential panics and using the efficient scatter utility for constructing the result. The overall implementation is sound and enhances performance for this common SQL pattern. I have one suggestion for improving code consistency and robustness.

Comment on lines +977 to +979
let else_expr =
try_cast(Arc::clone(else_expr), &batch.schema(), return_type.clone())
.unwrap_or_else(|_| Arc::clone(else_expr));

medium

Using unwrap_or_else here to ignore a potential error from try_cast seems risky. If try_cast fails, it likely indicates a bug in the planner where types were not correctly coerced. By ignoring the error and using the original expression, a subsequent error might occur during merge which would be harder to debug.

Other parts of this file handle the result of try_cast using ? to propagate the error (e.g., lines 745, 848, 936). For consistency and robustness, it would be better to do the same here.
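The difference between the two styles can be shown with a small self-contained sketch. Everything here is hypothetical: `DataType`, `Expr`, and `try_cast_model` are toy stand-ins rather than the DataFusion types, and the casting rules are invented purely for illustration.

```rust
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
struct Expr {
    name: String,
    data_type: DataType,
}

/// Toy cast: identical types pass through, Int32 widens to Int64, and every
/// other combination is an error.
fn try_cast_model(expr: &Expr, target: &DataType) -> Result<Expr, String> {
    match (&expr.data_type, target) {
        (from, to) if from == to => Ok(expr.clone()),
        (DataType::Int32, DataType::Int64) => Ok(Expr {
            name: format!("CAST({})", expr.name),
            data_type: DataType::Int64,
        }),
        (from, to) => Err(format!("cannot cast {:?} to {:?}", from, to)),
    }
}

/// Lenient style: a failed cast is swallowed and the uncast expression is
/// used, so a type mismatch only surfaces later, further from its cause.
fn prepare_else_lenient(expr: &Expr, target: &DataType) -> Expr {
    try_cast_model(expr, target).unwrap_or_else(|_| expr.clone())
}

/// Strict style: the cast error is propagated to the caller with `?`.
fn prepare_else_strict(expr: &Expr, target: &DataType) -> Result<Expr, String> {
    let cast = try_cast_model(expr, target)?;
    Ok(cast)
}
```

With a Utf8 expression and an Int64 target, the lenient version returns the Utf8 expression unchanged, whereas the strict version returns the cast error immediately.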

Suggested change
- let else_expr =
-     try_cast(Arc::clone(else_expr), &batch.schema(), return_type.clone())
-         .unwrap_or_else(|_| Arc::clone(else_expr));
+ let else_expr = try_cast(Arc::clone(else_expr), &batch.schema(), return_type.clone())?;

Owner Author

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! The type of the else expression is expected to match the batch's schema, so there is no need to fall back to the original expression.

@augmentcode

augmentcode bot commented Feb 2, 2026

🤖 Augment PR Summary

Summary: Improves physical evaluation of single-branch CASE expressions, especially when the ELSE branch is effectively NULL.

Changes:

  • Broadened EvalMethod::ExpressionOrExpression selection to cover single WHEN/THEN without requiring an ELSE.
  • Added a scatter-based fast path to materialize THEN values into the full output while filling non-matching rows with NULL.
  • Handled all-false/all-null masks when there is no ELSE by returning a typed NULL scalar.

Technical Notes: Keeps the existing merge-based path when an ELSE expression is present, while avoiding unnecessary evaluation/filtering when it is absent.

@augmentcode augmentcode bot left a comment

Review completed. 2 suggestions posted.

  {
      EvalMethod::ScalarOrScalar
- } else if body.when_then_expr.len() == 1 && body.else_expr.is_some() {
+ } else if body.when_then_expr.len() == 1 {

This change makes EvalMethod::ExpressionOrExpression apply even when else_expr is None (i.e. CASE WHEN … THEN … END). If that’s intentional, consider updating the ExpressionOrExpression doc comment to reflect that ELSE is now optional for this eval method.

Owner Author

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! The docstring of ExpressionOrExpression mentions both the then and the else expressions. The proposed change makes the else expression optional and this requires an update of the docstring too.

match &self.else_expr {
    None => {
        let then_array = then_value.to_array(when_value.true_count())?;
        scatter(&when_value, then_array.as_ref()).map(ColumnarValue::Array)

The new else_expr == None fast-path relies on scatter to reconstruct the full output, but I don’t see a test that specifically exercises this path when the then expression is not eligible for InfallibleExprOrNull. Consider adding a regression test for CASE WHEN <cond> THEN <non-infallible expr> END (or ELSE NULL) to cover mask/null semantics.

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! Neither the existing unit tests nor the new ones cover this case. Requiring tests here prevents new functionality from landing untested, which could lead to unnoticed regressions in the future.

@martin-augment
Owner Author

@CodeRabbit review

@coderabbitai

coderabbitai bot commented Feb 2, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@martin-augment
Owner Author

Critical: Test Coverage ⚠️

The PR doesn't include tests for the new code path. Specifically:

  • Missing test: CASE WHEN condition THEN value END (no ELSE clause)
  • Missing test: Edge cases with all-true, all-false, and mixed boolean masks with no else clause

Recommendation: Add tests covering:

// Test case with no else clause
let when = binary(col("a", &schema)?, Operator::Gt, lit(5i32), &schema)?;
let then = col("b", &schema)?;
let expr = Arc::new(CaseExpr::try_new(None, vec![(when, then)], None)?);
// Verify correct null placement

value:useful; category:bug; feedback: The Claude AI reviewer is correct! Neither the existing unit tests nor the new ones cover this case. Requiring tests here prevents new functionality from landing untested, which could lead to unnoticed regressions in the future.

3 participants