Conversation

@martin-augment
Owner

20073: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant performance enhancement for the chr scalar function by implementing a dedicated and optimized execution path for scalar inputs. This change ensures that single-value character conversions are processed more efficiently. Additionally, the PR includes updated benchmarks to accurately measure the performance gains for scalar operations and to clearly distinguish them from array-based operations, providing better insights into the function's performance characteristics.

Highlights

  • Performance Optimization: The chr scalar function now has an optimized execution path specifically for scalar inputs, improving performance for single-value character conversions.
  • Benchmarking Improvements: New benchmarks have been added to specifically measure the performance of the chr function with scalar inputs, and existing benchmarks have been clarified to focus on array-based inputs.
  • Code Refactoring: The invoke_with_args method for the ChrFunc has been refactored to explicitly handle ColumnarValue::Scalar and ColumnarValue::Array inputs, leading to cleaner and more efficient code.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Jan 30, 2026

Walkthrough

The changes introduce dedicated benchmark paths for the chr function and refactor its implementation to provide more explicit argument handling. The benchmark file adds separate scalar and array benchmark routes with shared configuration, replacing direct RNG usage with a seeded helper function. The chr function implementation replaces its previous wrapper logic with custom argument handling that validates Int64 arguments, processes nulls appropriately, and converts valid integers to Unicode scalar values, while adding explicit type checking and error paths for invalid conversions. Public signatures and export declarations remain unchanged.

@claude

claude bot commented Jan 30, 2026

Code Review: Optimize scalar path for chr function

Summary

This PR optimizes the chr function by implementing a dedicated scalar path in invoke_with_args, avoiding the overhead of array construction for single-value operations. This is a solid performance optimization that follows established patterns in the codebase.


✅ Strengths

  1. Follows Established Patterns: The scalar optimization approach matches other functions in the codebase like octet_length and bit_length (datafusion/functions/src/string/octet_length.rs:89-100), demonstrating consistency with the project's architecture.

  2. Performance Improvements:

    • Adds dedicated scalar benchmark (chr/scalar) to measure the optimization
    • Renames array benchmark from chr to chr/array for clarity
    • The scalar path avoids unnecessary array allocation overhead
  3. Correct Null Handling: Properly handles null scalars by returning a null ScalarValue of the correct return type (datafusion/functions/src/string/chr.rs:127-131).

  4. Error Handling: Maintains proper error handling for invalid Unicode code points in both scalar and array paths.

  5. Benchmark Improvements: Fixed the RNG initialization to use seedable_rng() consistently, ensuring reproducible benchmark results.


🔍 Issues & Concerns

1. Code Duplication (Medium Priority)

The Unicode validation logic is duplicated between the scalar path (lines 143-151) and the array path (lines 50-57 in the chr helper function):

// Scalar path
if let Ok(u) = u32::try_from(code_point)
    && let Some(c) = core::char::from_u32(u)
{
    Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
        c.to_string(),
    ))))
} else {
    exec_err!("invalid Unicode scalar value: {code_point}")
}

This creates maintenance burden - bug fixes or improvements must be applied twice. Consider extracting this into a helper function.
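One possible shape for such a helper is sketched below. The name `code_point_to_char` and the plain `String` error type are assumptions made for illustration; in the PR itself the error would presumably still be produced with `exec_err!`, and the scalar and array paths would each call the helper instead of repeating the check.

```rust
use std::convert::TryFrom;

/// Hypothetical shared helper (not part of the PR): converts an i64 code
/// point to a `char`. Negative values fail the `u32` conversion; values
/// above U+10FFFF and surrogate code points are rejected by `char::from_u32`.
fn code_point_to_char(code_point: i64) -> Result<char, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}
```

With a helper like this, the scalar path can wrap the returned `char` in a `ScalarValue::Utf8`, while the array path can append it to its builder, so the validation rule lives in one place.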

2. Redundant Data Type Check (Low Priority)

Both the scalar path (line 136) and the array path (line 155) use internal_err! for unexpected data types, so the error types are consistent. However, the data type check in the array path seems redundant, since the signature already enforces Int64.

3. Performance Consideration: String Allocation (Low Priority)

The scalar path uses c.to_string() (line 147) which allocates a new String. For single character conversion, this is acceptable, but consider that the array path uses encode_utf8 with a stack buffer for better performance. The performance difference is negligible for scalar values, so this is not critical.
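The difference between the two approaches can be illustrated with the standard library alone. This is a minimal sketch, not the PR's code: both function names are hypothetical, and the output `String` stands in for whatever builder the array path appends to.

```rust
/// Appends `c` to `out` through a 4-byte stack buffer, so no temporary
/// `String` is allocated for the character itself.
fn push_char_via_stack_buffer(out: &mut String, c: char) {
    let mut buf = [0u8; 4]; // a char occupies at most 4 bytes in UTF-8
    out.push_str(c.encode_utf8(&mut buf));
}

/// Allocating variant for comparison: builds a fresh `String` per call,
/// which is what `c.to_string()` does in the scalar path.
fn char_to_owned_string(c: char) -> String {
    c.to_string()
}
```

For a single scalar result the owned `String` is needed anyway, which is why the extra allocation matters little there; the stack buffer pays off when many characters are appended to one shared buffer.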

4. Benchmark Typo Fixed (Note)

Good catch on fixing the typo: cot_fn → chr_fn (line 39). This was a bug in the original benchmark.


🧪 Test Coverage

Positive: Existing tests adequately cover:

  • ✅ Normal Unicode conversion (including emoji, special chars)
  • ✅ Null handling
  • ✅ Error cases (negative values, out-of-range values, surrogate code points)
  • ✅ Edge cases (empty arrays)

Gap: No explicit test for the scalar path through invoke_with_args. The existing tests only test the chr helper function directly. Consider adding a test that exercises the scalar path:

#[test]
fn test_chr_scalar_path() {
    let func = ChrFunc::new();
    let args = vec![ColumnarValue::Scalar(ScalarValue::Int64(Some(65)))];
    let result = func.invoke_with_args(ScalarFunctionArgs {
        args,
        // ... other fields
    }).unwrap();
    
    match result {
        ColumnarValue::Scalar(ScalarValue::Utf8(Some(s))) => assert_eq!(s, "A"),
        _ => panic!("Expected scalar string result"),
    }
}

🔒 Security

No security concerns identified. The PR maintains existing input validation for Unicode code points and doesn't introduce new attack vectors.


📊 Performance

The optimization is sound and should provide measurable improvements for scalar workloads. The benchmark addition will help track this over time.

Recommendation: After merging, run the benchmark to quantify the improvement:

cargo bench --bench chr

✏️ Minor Suggestions

  1. Consider removing the redundant data type check at lines 154-159 since the type signature already enforces Int64
  2. The take_function_args pattern with destructuring is clean: let [arg] = take_function_args(...)?;

📝 Overall Assessment

Recommendation: Approve with minor suggestions

This is a solid optimization that improves performance without compromising correctness. The code follows established patterns in the DataFusion codebase. The main suggestion is to reduce code duplication by extracting the Unicode validation logic into a shared helper function.

Rating: ⭐⭐⭐⭐ (4/5)

  • Code quality: Good
  • Testing: Adequate (could add scalar path test)
  • Performance: Improved
  • Maintainability: Could be better (code duplication)

🤖 Review generated by Claude Code

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request optimizes the scalar path for the chr function by adding a specialized implementation in invoke_with_args. This avoids creating intermediate arrays for scalar inputs, which should improve performance. The changes include adding a new benchmark for the scalar path and updating the existing array benchmark. The implementation is sound, but I've identified an area of code duplication that could be refactored to improve maintainability.

Comment on lines +143 to +151
if let Ok(u) = u32::try_from(code_point)
    && let Some(c) = core::char::from_u32(u)
{
    Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
        c.to_string(),
    ))))
} else {
    exec_err!("invalid Unicode scalar value: {code_point}")
}

medium

This character conversion and error handling logic is very similar to the logic in the chr array function (lines 50-57). To improve maintainability and avoid code duplication, consider extracting this logic into a small, private helper function that can be shared between both implementations. The compiler should be able to inline it, so there should be no performance regression.

Owner Author

value:good-to-have; category:bug; feedback:The Gemini AI reviewer is correct! It would be good to extract a helper function for the conversion of the i64 to a character/string and reuse it for both scalars and arrays. It would prevent double maintenance of the code.

@augmentcode

augmentcode bot commented Jan 30, 2026

🤖 Augment PR Summary

Summary: This PR optimizes the chr scalar UDF path to avoid the array-based wrapper overhead.

Changes:

  • Replaced `make_scalar_function` usage with explicit scalar vs array dispatch in ChrFunc::invoke_with_args
  • Added a scalar-specific implementation using ScalarValue and strict Unicode scalar validation
  • Kept the existing array implementation for batch inputs (via the existing chr(&[ArrayRef]) helper)
  • Updated Criterion benchmarks to include a new chr/scalar benchmark and renamed the existing one to chr/array

Technical Notes: Uses take_function_args for argument validation and returns Utf8 consistently for both scalar and array execution paths.
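The scalar versus array dispatch described above can be sketched without DataFusion. Everything here is a simplified stand-in: `Value`, `Output`, `to_char`, and `chr_dispatch` are hypothetical names, `Vec<Option<i64>>` replaces an Arrow array, and errors are plain strings rather than DataFusion errors.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for `ColumnarValue` holding Int64 input.
#[derive(Debug, PartialEq)]
enum Value {
    Scalar(Option<i64>),
    Array(Vec<Option<i64>>),
}

/// Hypothetical stand-in for `ColumnarValue` holding Utf8 output.
#[derive(Debug, PartialEq)]
enum Output {
    Scalar(Option<String>),
    Array(Vec<Option<String>>),
}

/// Validates a code point and converts it to a `char`.
fn to_char(code_point: i64) -> Result<char, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}

/// Dispatches on the input shape: scalars are converted directly,
/// arrays are converted element by element.
fn chr_dispatch(arg: Value) -> Result<Output, String> {
    match arg {
        // Scalar fast path: a null stays null, no intermediate array is built.
        Value::Scalar(None) => Ok(Output::Scalar(None)),
        Value::Scalar(Some(cp)) => Ok(Output::Scalar(Some(to_char(cp)?.to_string()))),
        // Array path: nulls pass through, the first invalid value aborts.
        Value::Array(values) => values
            .into_iter()
            .map(|v| v.map(|cp| to_char(cp).map(String::from)).transpose())
            .collect::<Result<Vec<_>, _>>()
            .map(Output::Array),
    }
}
```

Both branches return the same logical type (a string or a null), which mirrors the note that Utf8 is returned consistently for scalar and array execution.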

@augmentcode augmentcode bot left a comment

Review completed. 1 suggestion posted.

Comment augment review to trigger a new review at any time.

@@ -119,7 +119,47 @@ impl ScalarUDFImpl for ChrFunc {
    }

    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
The new scalar fast-path in ChrFunc::invoke_with_args isn’t covered by the existing unit tests (they only exercise the internal array helper chr). Consider adding a test that invokes the UDF with scalar inputs (valid/invalid/null) to guard this optimized branch against regressions.

Owner Author

value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! There are only unit tests for the ColumnarValue::Array branch. It would be good to add some SQL Logic Tests for both scalar and array inputs. They would prevent regressions in the future.
