@martin-augment
Owner

20070: To review by AI

@coderabbitai

coderabbitai bot commented Jan 30, 2026

Walkthrough

This pull request refactors error handling for function type coercion in DataFusion. Core changes include introducing a new generic generate_signature_error_message function in utils.rs that accepts the UDFCoercionExt trait to provide more detailed error messages. Error messages are updated across the codebase to display specific function call signatures with argument types instead of generic planning-time mismatch messages. Import statements are adjusted to reflect new error propagation paths using the ? operator and simplified error handling. Test files are updated to reflect the new error message format, displaying explicit function signatures (e.g., "Failed to coerce function call 'func(Type1, Type2)'") rather than previous generic coercion descriptions.
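
As a rough illustration of the message shape quoted above, the sketch below builds the leading part of such an error. `describe_call` is a hypothetical helper, not the function added in utils.rs, and argument types are passed as plain strings only to keep the example self-contained.

```rust
/// Hypothetical helper: renders the leading part of the new error format.
/// The real `generate_signature_error_message` works on DataFusion types;
/// plain strings are an assumption made for this sketch.
fn describe_call(func_name: &str, arg_types: &[&str]) -> String {
    format!(
        "Failed to coerce function call '{}({})'",
        func_name,
        arg_types.join(", ")
    )
}
```

Calling `describe_call("func", &["Type1", "Type2"])` yields the example string from the walkthrough.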

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the user experience when encountering type coercion errors with functions, particularly User-Defined Functions (UDFs). By standardizing and enriching the error messages, it provides clearer feedback to users about why their function calls failed due to type mismatches. The new approach distinguishes between user-defined and built-in functions, offering specific guidance like candidate signatures for the latter, and presenting the original, unadulterated error for the former, thereby streamlining the debugging process and reducing user frustration.

Highlights

  • Standardized UDF Coercion Errors: Implemented a new mechanism to generate more consistent and informative error messages for User-Defined Function (UDF) type coercion failures.
  • Improved Error Clarity: Error messages now explicitly state the function call with its provided argument types and, for non-user-defined functions, suggest candidate signatures to guide users.
  • Dedicated User-Defined Error Handling: For UDFs with 'UserDefined' signatures, the original error message is preserved and presented clearly, stripping internal backtraces and prefixes.
  • Refactored Error Generation Logic: Centralized the error message generation into a new 'generate_signature_error_message' utility function, reducing redundancy and improving maintainability.
  • Extensive Test Updates: Updated numerous integration and SQL logic tests to reflect the new, standardized error message formats, ensuring consistency across the codebase.

@claude

claude bot commented Jan 30, 2026

Code Review: Standardize UDF coercion error messages

Thank you for this PR! This is a solid improvement to error messaging consistency across DataFusion. Here's my review:

✅ Strengths

  1. Excellent refactoring: The consolidation of error message generation into a single generate_signature_error_message function in datafusion/expr/src/utils.rs:975-1026 significantly improves maintainability and consistency.

  2. Comprehensive test coverage: All three test cases cover important scenarios:

    • User-defined signatures (test_generate_signature_error_msg_user_defined)
    • Parameter names (test_generate_signature_error_msg_with_parameter_names)
    • Without parameter names (test_generate_signature_error_msg_without_parameter_names)
  3. Thorough test updates: 27 files of test expectations updated, demonstrating the comprehensive nature of the change and ensuring no regressions.

  4. Better user experience: The new error format is more actionable:

    • Shows the actual failed function call: 'function_name(arg_types)'
    • Provides clear guidance: "You might need to add explicit type casts"
    • Lists candidate signatures clearly
  5. Smart handling of UserDefined signatures: The code correctly distinguishes between user-defined signatures (where the original error is more informative) and standard signatures (where showing valid options is better) at datafusion/expr/src/utils.rs:993.
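
The distinction in points 4 and 5 can be sketched roughly as follows. `SignatureKind` and `coercion_message` are hypothetical stand-ins, and the exact wording and layout of the candidate list are assumptions; only the quoted phrases come from the PR.

```rust
/// Hypothetical stand-in for the two cases described above.
enum SignatureKind {
    /// The UDF coerces its own arguments, so its error explains the failure best.
    UserDefined { original_error: String },
    /// Fixed signatures that can be listed as candidates.
    Standard { candidates: Vec<String> },
}

/// Builds an illustrative message for a failed call such as `f(Int64)`.
fn coercion_message(call: &str, kind: &SignatureKind) -> String {
    let header = format!("Failed to coerce function call '{call}'");
    match kind {
        // Surface the original error instead of guessing at valid signatures.
        SignatureKind::UserDefined { original_error } => {
            format!("{header}: {original_error}")
        }
        // Show the hint and every candidate signature on its own line.
        SignatureKind::Standard { candidates } => format!(
            "{header}. You might need to add explicit type casts.\n\tCandidate functions:\n\t{}",
            candidates.join("\n\t")
        ),
    }
}
```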

🔍 Observations

  1. Error handling in lines 997-1001: The string manipulation to strip the "Error during planning: " prefix uses .unwrap(), which could panic if the prefix is not present. Consider using unwrap_or or pattern matching:

    let original_error = match original_error {
        err @ DataFusionError::Plan(_) => {
            let msg = err.strip_backtrace();
            // Fall back to the full message if the prefix is not found
            msg.strip_prefix("Error during planning: ")
                .unwrap_or(&msg)
                .to_string()
        }
        err => err.strip_backtrace(),
    };
  2. Removed unused imports: Good cleanup of DataFusionError and plan_datafusion_err in expr_schema.rs:9-10 and exec_err in functions.rs:49.

  3. Simplified error propagation: The change in type_coercion/functions.rs:316-319 from custom error wrapping to direct propagation (? operator) is cleaner and lets the centralized error handler do its job.
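
To make the third observation concrete, here is a minimal sketch contrasting local wrapping with direct propagation. `PlanError` and both `coerce_with_*` functions are hypothetical, and the "before" shape is an assumption, since the original wrapping code is not shown in this thread.

```rust
/// Hypothetical planning error used only for this sketch.
#[derive(Debug, PartialEq)]
struct PlanError(String);

/// Stand-in for a UDF's `coerce_types`: accepts only "Utf8" here.
fn coerce_types(arg_type: &str) -> Result<String, PlanError> {
    if arg_type == "Utf8" {
        Ok(arg_type.to_string())
    } else {
        Err(PlanError(format!("expected Utf8, got {arg_type}")))
    }
}

/// Local wrapping: each call site decorates the error in its own way.
fn coerce_with_wrapping(arg_type: &str) -> Result<String, PlanError> {
    coerce_types(arg_type)
        .map_err(|e| PlanError(format!("coercion failed: {}", e.0)))
}

/// Direct propagation: `?` returns the error untouched, leaving the
/// formatting to a single caller further up the stack.
fn coerce_with_propagation(arg_type: &str) -> Result<String, PlanError> {
    let coerced = coerce_types(arg_type)?;
    Ok(coerced)
}
```

With propagation, the central formatter receives the UDF's own message and decides once how to present it.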

🤔 Suggestions

  1. Documentation: Consider adding a doc comment explaining when to use each error format (UserDefined vs standard) in the function documentation.

  2. TODO comment at line 996: The TODO comment asks "is there a better way to strip backtrace & the planning prefix?" - This is a valid concern. Consider:

    • Creating a helper method to safely strip error prefixes
    • Or using a more robust pattern matching approach
  3. Test assertion robustness: Tests use .starts_with() for assertions. While this works, it might mask issues if the end of the error message changes unexpectedly. Consider using .contains() for key phrases or exact matching where possible.
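
One possible shape for the helper mentioned in suggestion 2 is sketched below. `strip_prefix_or_keep` is a hypothetical name, not an existing DataFusion function.

```rust
/// Hypothetical helper: removes `prefix` when present and otherwise returns
/// the message unchanged, so the error path cannot panic.
fn strip_prefix_or_keep<'a>(message: &'a str, prefix: &str) -> &'a str {
    message.strip_prefix(prefix).unwrap_or(message)
}
```

A call site would pass the backtrace-stripped message together with "Error during planning: " and convert the result with `.to_string()`.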

🔒 Security & Performance

  • No security concerns identified: The changes are purely formatting/messaging.
  • Performance: Negligible impact - error paths are not hot paths, and the string formatting is minimal.

📊 Overall Assessment

This is a well-executed PR that improves user experience through better error messages. The refactoring reduces code duplication and centralizes error formatting logic. The comprehensive test updates demonstrate thoroughness.

Recommendation: ✅ Approve with minor suggestion to address the .unwrap() call for defensive programming.

The single improvement I'd suggest is making the error prefix stripping more defensive, but this is a minor issue that doesn't block the PR.

@augmentcode

augmentcode bot commented Jan 30, 2026

🤖 Augment PR Summary

Summary: This PR standardizes type-coercion error messages for UDF-like functions across planning paths.

Changes:

  • Refactors `utils::generate_signature_error_message` to build a consistent "Failed to coerce function call 'name(types)'" message and return a `DataFusionError`.
  • Adds special handling for TypeSignature::UserDefined to surface the original coercion error (instead of trying to infer candidate signatures).
  • Updates `expr_schema::verify_function_arguments` to delegate error formatting to the new helper.
  • Simplifies user-defined coercion in `type_coercion::functions` by propagating `coerce_types` errors directly.
  • Adjusts analyzer, SQL integration, and sqllogictest expectations to match the new standardized wording/format.

Technical Notes: Candidate signatures are rendered via TypeSignature::to_string_repr_with_names to preserve parameter names where available.

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

err @ DataFusionError::Plan(_) => err
.strip_backtrace()
.strip_prefix("Error during planning: ")
.unwrap()

strip_prefix("Error during planning: ").unwrap() can panic if the string format ever differs (or if the prefix is already stripped), turning an error-reporting path into a crash. Consider avoiding unwrap here (e.g., extract the inner Plan message directly) so planning failures remain non-panicking.
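
A minimal sketch of extracting the inner message directly is shown here. `DemoError` is a hypothetical stand-in with two variants, not the real `DataFusionError`, and backtrace handling is left out on purpose.

```rust
/// Minimal stand-in for an error enum with a planning variant (hypothetical;
/// the real `DataFusionError` has many more variants).
#[derive(Debug)]
enum DemoError {
    Plan(String),
    Execution(String),
}

impl std::fmt::Display for DemoError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        match self {
            DemoError::Plan(msg) => write!(f, "Error during planning: {msg}"),
            DemoError::Execution(msg) => write!(f, "Execution error: {msg}"),
        }
    }
}

/// Takes the inner planning message from the variant itself, so no prefix
/// has to be stripped from the rendered string and nothing can panic.
fn inner_message(err: &DemoError) -> String {
    match err {
        DemoError::Plan(msg) => msg.clone(),
        other => other.to_string(),
    }
}
```

Matching on the variant avoids depending on the exact text produced by the `Display` implementation.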

Owner Author

value:useful; category:bug; feedback: The Augment AI reviewer is correct! Error reporting should never cause crashes. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@datafusion/expr/src/utils.rs`:
- Around line 994-1002: The match arm handling DataFusionError::Plan can panic
because strip_prefix(...).unwrap() assumes the prefix exists; update the
DataFusionError::Plan branch (where original_error is matched) to call
strip_backtrace(), then attempt strip_prefix("Error during planning: ") safely
by using map/and_then or an if let to take the stripped result when present and
otherwise fall back to the unstripped string (i.e., replace the
.strip_prefix(...).unwrap().to_string() with logic that uses
.strip_prefix(...).map(|s| s.to_string()).unwrap_or_else(|| <original stripped
value>.to_string())). Keep the other branch using err.strip_backtrace()
unchanged and ensure the final original_error remains a String.

Comment on lines +994 to +1002
let original_error = match original_error {
// Since we're returning a Plan error we don't want any double nesting.
// TODO: is there a better way to strip backtrace & the planning prefix?
err @ DataFusionError::Plan(_) => err
.strip_backtrace()
.strip_prefix("Error during planning: ")
.unwrap()
.to_string(),
err => err.strip_backtrace(),

⚠️ Potential issue | 🟠 Major

Avoid potential panic when stripping the planning prefix.

strip_prefix(...).unwrap() will panic if the prefix is absent (e.g., a DataFusionError::Plan carrying a raw message without the standard prefix). Safer to fall back to the unstripped message.

🔧 Suggested fix
-        let original_error = match original_error {
-            // Since we're returning a Plan error we don't want any double nesting.
-            // TODO: is there a better way to strip backtrace & the planning prefix?
-            err @ DataFusionError::Plan(_) => err
-                .strip_backtrace()
-                .strip_prefix("Error during planning: ")
-                .unwrap()
-                .to_string(),
-            err => err.strip_backtrace(),
-        };
+        let original_error = match original_error {
+            // Since we're returning a Plan error we don't want any double nesting.
+            // TODO: is there a better way to strip backtrace & the planning prefix?
+            err @ DataFusionError::Plan(_) => {
+                let msg = err.strip_backtrace();
+                msg.strip_prefix("Error during planning: ")
+                    .unwrap_or(&msg)
+                    .to_string()
+            }
+            err => err.strip_backtrace(),
+        };

Owner Author

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Error reporting should never cause crashes. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request standardizes the error messages for UDF type coercion, which is a great improvement for user experience. The core of the change is centralizing error message generation in a new utils::generate_signature_error_message function. This provides more informative and consistent errors. The related refactoring to simplify error propagation using the ? operator is also a good cleanup. I have one suggestion to improve the robustness of the new error handling logic.

Comment on lines +997 to +1001
err @ DataFusionError::Plan(_) => err
.strip_backtrace()
.strip_prefix("Error during planning: ")
.unwrap()
.to_string(),

Priority: high

The use of .unwrap() here could cause a panic if the prefix is not found in the error string. While it seems safe with the current implementation of DataFusionError::Plan, it's better to avoid unwrap() in production code to make it more robust against future changes. The TODO comment also indicates this might be a concern. I suggest handling the None case gracefully.

            err @ DataFusionError::Plan(_) => {
                let s = err.strip_backtrace();
                if let Some(unprefixed) = s.strip_prefix("Error during planning: ") {
                    unprefixed.to_string()
                } else {
                    s
                }
            },

Owner Author

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Error reporting should never cause crashes. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

@martin-augment
Owner Author

  1. Error handling in lines 997-1001: The string manipulation to strip the "Error during planning: " prefix uses .unwrap(), which could panic if the prefix is not present. Consider using unwrap_or or pattern matching:
    let original_error = match original_error {
        err @ DataFusionError::Plan(_) => {
            let msg = err.strip_backtrace();
            // Fall back to the full message if the prefix is not found
            msg.strip_prefix("Error during planning: ")
                .unwrap_or(&msg)
                .to_string()
        }
        err => err.strip_backtrace(),
    };

value:useful; category:bug; feedback: The Claude AI reviewer is correct! Error reporting should never cause crashes. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.
