20070: Standardize UDF coercion error messages #221

martin-augment · 2026-01-30T12:43:07Z

20070: To review by AI

coderabbitai · 2026-01-30T12:43:31Z

Walkthrough

This pull request refactors error handling for function type coercion in DataFusion. Core changes include introducing a new generic generate_signature_error_message function in utils.rs that accepts the UDFCoercionExt trait to provide more detailed error messages. Error messages are updated across the codebase to display specific function call signatures with argument types instead of generic planning-time mismatch messages. Import statements are adjusted to reflect new error propagation paths using the ? operator and simplified error handling. Test files are updated to reflect the new error message format, displaying explicit function signatures (e.g., "Failed to coerce function call 'func(Type1, Type2)'") rather than previous generic coercion descriptions.

gemini-code-assist · 2026-01-30T12:43:41Z

Summary of Changes

This pull request significantly enhances the user experience when encountering type coercion errors with functions, particularly User-Defined Functions (UDFs). By standardizing and enriching the error messages, it provides clearer feedback to users about why their function calls failed due to type mismatches. The new approach distinguishes between user-defined and built-in functions, offering specific guidance like candidate signatures for the latter, and presenting the original, unadulterated error for the former, thereby streamlining the debugging process and reducing user frustration.

Highlights

Standardized UDF Coercion Errors: Implemented a new mechanism to generate more consistent and informative error messages for User-Defined Function (UDF) type coercion failures.
Improved Error Clarity: Error messages now explicitly state the function call with its provided argument types and, for non-user-defined functions, suggest candidate signatures to guide users.
Dedicated User-Defined Error Handling: For UDFs with 'UserDefined' signatures, the original error message is preserved and presented clearly, stripping internal backtraces and prefixes.
Refactored Error Generation Logic: Centralized the error message generation into a new 'generate_signature_error_message' utility function, reducing redundancy and improving maintainability.
Extensive Test Updates: Updated numerous integration and SQL logic tests to reflect the new, standardized error message formats, ensuring consistency across the codebase.

claude · 2026-01-30T12:44:36Z

Code Review: Standardize UDF coercion error messages

Thank you for this PR! This is a solid improvement to error messaging consistency across DataFusion. Here's my review:

✅ Strengths

Excellent refactoring: The consolidation of error message generation into a single generate_signature_error_message function in datafusion/expr/src/utils.rs:975-1026 significantly improves maintainability and consistency.
Comprehensive test coverage: All three test cases cover important scenarios:
- User-defined signatures (test_generate_signature_error_msg_user_defined)
- Parameter names (test_generate_signature_error_msg_with_parameter_names)
- Without parameter names (test_generate_signature_error_msg_without_parameter_names)
Thorough test updates: 27 files of test expectations updated, demonstrating the comprehensive nature of the change and ensuring no regressions.
Better user experience: The new error format is more actionable:
- Shows the actual failed function call: 'function_name(arg_types)'
- Provides clear guidance: "You might need to add explicit type casts"
- Lists candidate signatures clearly
Smart handling of UserDefined signatures: The code correctly distinguishes between user-defined signatures (where the original error is more informative) and standard signatures (where showing valid options is better) at datafusion/expr/src/utils.rs:993.

🔍 Observations

Error handling in lines 997-1001: The string manipulation that strips the "Error during planning: " prefix uses .unwrap(), which panics if the prefix is not present. Consider falling back to the unstripped message instead:

let original_error = match original_error {
    err @ DataFusionError::Plan(_) => {
        // Bind the stripped message first so the borrow below stays valid.
        let msg = err.strip_backtrace();
        msg.strip_prefix("Error during planning: ")
            .unwrap_or(&msg)  // Fallback if prefix not found
            .to_string()
    }
    err => err.strip_backtrace(),
};

Removed unused imports: Good cleanup of DataFusionError and plan_datafusion_err in expr_schema.rs:9-10 and exec_err in functions.rs:49.
Simplified error propagation: The change in type_coercion/functions.rs:316-319 from custom error wrapping to direct propagation (? operator) is cleaner and lets the centralized error handler do its job.
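To illustrate the propagation point, here is a minimal sketch of wrapping versus direct propagation. Everything in it is hypothetical: `CoerceError`, the stand-in `coerce_types`, and the wording of the wrapped message are assumptions for illustration, not the code in type_coercion/functions.rs.

```rust
/// Hypothetical error type standing in for the real planner error.
#[derive(Debug, PartialEq)]
struct CoerceError(String);

/// Hypothetical stand-in for a UDF's coercion hook.
fn coerce_types(arg_types: &[&str]) -> Result<Vec<String>, CoerceError> {
    if arg_types.is_empty() {
        return Err(CoerceError("expected at least one argument".to_string()));
    }
    Ok(arg_types.iter().map(|t| t.to_string()).collect())
}

/// Wrapping style: the call site adds its own context to the inner error.
fn coerce_wrapped(arg_types: &[&str]) -> Result<Vec<String>, CoerceError> {
    coerce_types(arg_types)
        .map_err(|e| CoerceError(format!("User-defined coercion failed with {:?}", e.0)))
}

/// Propagation style: `?` returns the inner error unchanged, leaving all
/// formatting to one central place further up the call chain.
fn coerce_propagated(arg_types: &[&str]) -> Result<Vec<String>, CoerceError> {
    let coerced = coerce_types(arg_types)?;
    Ok(coerced)
}
```

With propagation, the caller sees the original message verbatim, so a central formatter can decide once how to present it.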

🤔 Suggestions

Documentation: Consider adding a doc comment explaining when to use each error format (UserDefined vs standard) in the function documentation.
TODO comment at line 996: The TODO comment asks "is there a better way to strip backtrace & the planning prefix?" - This is a valid concern. Consider:
- Creating a helper method to safely strip error prefixes
- Or using a more robust pattern matching approach
Test assertion robustness: Tests use .starts_with() for assertions. While this works, it might mask issues if the end of the error message changes unexpectedly. Consider using .contains() for key phrases or exact matching where possible.
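As a rough sketch of the assertion suggestion, a test could pin the stable prefix and also check a key phrase from later in the message. The helper names and the sample message layout below are assumptions made for illustration; only the quoted phrases come from this review.

```rust
/// Hypothetical sample message; the exact layout of the real error is an
/// assumption here.
fn sample_error_message() -> String {
    "Failed to coerce function call 'my_func(Int32, Utf8)'. \
     You might need to add explicit type casts."
        .to_string()
}

/// Checks the prefix and a key phrase from later in the message, so that a
/// change to the tail of the message is also caught.
fn check_message(msg: &str) -> bool {
    msg.starts_with("Failed to coerce function call 'my_func(Int32, Utf8)'")
        && msg.contains("You might need to add explicit type casts")
}
```

A prefix-only check would accept a message whose guidance text had been dropped; the combined check rejects it.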

🔒 Security & Performance

No security concerns identified: The changes are purely formatting/messaging.
Performance: Negligible impact - error paths are not hot paths, and the string formatting is minimal.

📊 Overall Assessment

This is a well-executed PR that improves user experience through better error messages. The refactoring reduces code duplication and centralizes error formatting logic. The comprehensive test updates demonstrate thoroughness.

Recommendation: ✅ Approve with minor suggestion to address the .unwrap() call for defensive programming.

The single improvement I'd suggest is making the error prefix stripping more defensive, but this is a minor issue that doesn't block the PR.

augmentcode · 2026-01-30T12:48:39Z

🤖 Augment PR Summary

Summary: This PR standardizes type-coercion error messages for UDF-like functions across planning paths.

Changes:

Refactors `utils::generate_signature_error_message` to build a consistent "Failed to coerce function call 'name(types)'" message and return a `DataFusionError`.
Adds special handling for TypeSignature::UserDefined to surface the original coercion error (instead of trying to infer candidate signatures).
Updates `expr_schema::verify_function_arguments` to delegate error formatting to the new helper.
Simplifies user-defined coercion in `type_coercion::functions` by propagating `coerce_types` errors directly.
Adjusts analyzer, SQL integration, and sqllogictest expectations to match the new standardized wording/format.

Technical Notes: Candidate signatures are rendered via TypeSignature::to_string_repr_with_names to preserve parameter names where available.
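The two message shapes summarized above can be sketched roughly as follows. This is not the actual `generate_signature_error_message`: the function name, its parameters, and the exact punctuation and layout are assumptions, with signatures passed as pre-rendered strings to keep the example self-contained.

```rust
/// Hypothetical sketch of the two message shapes.
/// `candidates` is `None` for a user-defined signature, in which case the
/// original coercion error is surfaced instead of a list of signatures.
fn sketch_signature_error(
    func_name: &str,
    arg_types: &[&str],
    candidates: Option<&[&str]>,
    original_error: &str,
) -> String {
    // Render the failed call as 'name(Type1, Type2)'.
    let call = format!("{func_name}({})", arg_types.join(", "));
    match candidates {
        None => format!("Failed to coerce function call '{call}': {original_error}"),
        Some(sigs) => {
            // One candidate signature per line, indented with a tab.
            let listed = sigs
                .iter()
                .map(|s| format!("\t{s}"))
                .collect::<Vec<_>>()
                .join("\n");
            format!(
                "Failed to coerce function call '{call}'. \
                 You might need to add explicit type casts.\n\
                 \tCandidate functions:\n{listed}"
            )
        }
    }
}
```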

augmentcode

Review completed. 1 suggestion posted.

augmentcode · 2026-01-30T12:48:40Z

datafusion/expr/src/utils.rs

+            err @ DataFusionError::Plan(_) => err
+                .strip_backtrace()
+                .strip_prefix("Error during planning: ")
+                .unwrap()


strip_prefix("Error during planning: ").unwrap() can panic if the string format ever differs (or if the prefix is already stripped), turning an error-reporting path into a crash. Consider avoiding unwrap here (e.g., extract the inner Plan message directly) so planning failures remain non-panicking.
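One way to read "extract the inner Plan message directly" is sketched below using a stand-in enum. `SketchError` is hypothetical and mirrors only the shape under discussion; the real `DataFusionError` has more variants, and real code would likely still need to deal with backtraces, which this sketch ignores.

```rust
use std::fmt;

/// Hypothetical stand-in for the relevant part of `DataFusionError`.
#[derive(Debug)]
enum SketchError {
    Plan(String),
    Execution(String),
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Plan(msg) => write!(f, "Error during planning: {msg}"),
            SketchError::Execution(msg) => write!(f, "Execution error: {msg}"),
        }
    }
}

/// Returns the message to embed in the outer error without ever panicking.
fn inner_message(err: &SketchError) -> String {
    match err {
        // Take the payload directly instead of rendering the error and then
        // stripping the prefix back off the rendered string.
        SketchError::Plan(msg) => msg.clone(),
        // Other variants keep their full rendered form.
        other => other.to_string(),
    }
}
```

Because the prefix is never added in the first place, there is no `strip_prefix` call that could return `None`.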

value:useful; category:bug; feedback: The Augment AI reviewer is correct! Error reporting should not cause crashes, ever. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@datafusion/expr/src/utils.rs`:
- Around line 994-1002: The match arm handling DataFusionError::Plan can panic
because strip_prefix(...).unwrap() assumes the prefix exists; update the
DataFusionError::Plan branch (where original_error is matched) to call
strip_backtrace(), then attempt strip_prefix("Error during planning: ") safely
by using map/and_then or an if let to take the stripped result when present and
otherwise fall back to the unstripped string (i.e., replace the
.strip_prefix(...).unwrap().to_string() with logic that uses
.strip_prefix(...).map(|s| s.to_string()).unwrap_or_else(|| <original stripped
value>.to_string())). Keep the other branch using err.strip_backtrace()
unchanged and ensure the final original_error remains a String.

coderabbitai · 2026-01-30T12:50:52Z

datafusion/expr/src/utils.rs

+        let original_error = match original_error {
+            // Since we're returning a Plan error we don't want any double nesting.
+            // TODO: is there a better way to strip backtrace & the planning prefix?
+            err @ DataFusionError::Plan(_) => err
+                .strip_backtrace()
+                .strip_prefix("Error during planning: ")
+                .unwrap()
+                .to_string(),
+            err => err.strip_backtrace(),


⚠️ Potential issue | 🟠 Major

Avoid potential panic when stripping the planning prefix.

strip_prefix(...).unwrap() will panic if the prefix is absent (e.g., a DataFusionError::Plan carrying a raw message without the standard prefix). Safer to fall back to the unstripped message.

🔧 Suggested fix

- let original_error = match original_error {
-     // Since we're returning a Plan error we don't want any double nesting.
-     // TODO: is there a better way to strip backtrace & the planning prefix?
-     err @ DataFusionError::Plan(_) => err
-         .strip_backtrace()
-         .strip_prefix("Error during planning: ")
-         .unwrap()
-         .to_string(),
-     err => err.strip_backtrace(),
- };
+ let original_error = match original_error {
+     // Since we're returning a Plan error we don't want any double nesting.
+     // TODO: is there a better way to strip backtrace & the planning prefix?
+     err @ DataFusionError::Plan(_) => {
+         let msg = err.strip_backtrace();
+         msg.strip_prefix("Error during planning: ")
+             .unwrap_or(&msg)
+             .to_string()
+     }
+     err => err.strip_backtrace(),
+ };

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Error reporting should not cause crashes, ever. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

gemini-code-assist

Code Review

This pull request standardizes the error messages for UDF type coercion, which is a great improvement for user experience. The core of the change is centralizing error message generation in a new utils::generate_signature_error_message function. This provides more informative and consistent errors. The related refactoring to simplify error propagation using the ? operator is also a good cleanup. I have one suggestion to improve the robustness of the new error handling logic.

gemini-code-assist · 2026-01-30T12:56:37Z

datafusion/expr/src/utils.rs

+            err @ DataFusionError::Plan(_) => err
+                .strip_backtrace()
+                .strip_prefix("Error during planning: ")
+                .unwrap()
+                .to_string(),


The use of .unwrap() here could cause a panic if the prefix is not found in the error string. While it seems safe with the current implementation of DataFusionError::Plan, it's better to avoid unwrap() in production code to make it more robust against future changes. The TODO comment also indicates this might be a concern. I suggest handling the None case gracefully.

err @ DataFusionError::Plan(_) => {
    let s = err.strip_backtrace();
    if let Some(unprefixed) = s.strip_prefix("Error during planning: ") {
        unprefixed.to_string()
    } else {
        s
    }
},

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Error reporting should not cause crashes, ever. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

martin-augment · 2026-01-30T13:52:11Z

Error handling in lines 997-1001: The string manipulation that strips the "Error during planning: " prefix uses .unwrap(), which panics if the prefix is not present. Consider falling back to the unstripped message instead:
let original_error = match original_error {
    err @ DataFusionError::Plan(_) => {
        // Bind the stripped message first so the borrow below stays valid.
        let msg = err.strip_backtrace();
        msg.strip_prefix("Error during planning: ")
            .unwrap_or(&msg)  // Fallback if prefix not found
            .to_string()
    }
    err => err.strip_backtrace(),
};

value:useful; category:bug; feedback: The Claude AI reviewer is correct! Error reporting should not cause crashes, ever. It is good to attempt to clean up the message, but it should be done in a safer way, as in the proposed change.

Jefffrey added 2 commits January 30, 2026 09:36

Standardize UDF coercion error messages

f873568

fix tests

cf5484e

