20073: perf: Optimize scalar path for chr function #217

martin-augment · 2026-01-30T11:18:49Z

20073: To review by AI

gemini-code-assist · 2026-01-30T11:19:04Z

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant performance enhancement for the chr scalar function by implementing a dedicated and optimized execution path for scalar inputs. This change ensures that single-value character conversions are processed more efficiently. Additionally, the PR includes updated benchmarks to accurately measure the performance gains for scalar operations and to clearly distinguish them from array-based operations, providing better insights into the function's performance characteristics.

Highlights

Performance Optimization: The chr scalar function now has an optimized execution path specifically for scalar inputs, improving performance for single-value character conversions.
Benchmarking Improvements: New benchmarks have been added to specifically measure the performance of the chr function with scalar inputs, and existing benchmarks have been clarified to focus on array-based inputs.
Code Refactoring: The invoke_with_args method for the ChrFunc has been refactored to explicitly handle ColumnarValue::Scalar and ColumnarValue::Array inputs, leading to cleaner and more efficient code.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

coderabbitai · 2026-01-30T11:19:12Z

Walkthrough

The changes introduce dedicated benchmark paths for the chr function and refactor its implementation to provide more explicit argument handling. The benchmark file adds separate scalar and array benchmark routes with shared configuration, replacing direct RNG usage with a seeded helper function. The chr function implementation replaces its previous wrapper logic with custom argument handling that validates Int64 arguments, processes nulls appropriately, and converts valid integers to Unicode scalar values, while adding explicit type checking and error paths for invalid conversions. Public signatures and export declarations remain unchanged.

claude · 2026-01-30T11:20:20Z

Code Review: Optimize scalar path for chr function

Summary

This PR optimizes the chr function by implementing a dedicated scalar path in invoke_with_args, avoiding the overhead of array construction for single-value operations. This is a solid performance optimization that follows established patterns in the codebase.

✅ Strengths

Follows Established Patterns: The scalar optimization approach matches other functions in the codebase like octet_length and bit_length (datafusion/functions/src/string/octet_length.rs:89-100), demonstrating consistency with the project's architecture.
Performance Improvements:
- Adds dedicated scalar benchmark (chr/scalar) to measure the optimization
- Renames array benchmark from chr to chr/array for clarity
- The scalar path avoids unnecessary array allocation overhead
Correct Null Handling: Properly handles null scalars by returning a null ScalarValue of the correct return type (datafusion/functions/src/string/chr.rs:127-131).
Error Handling: Maintains proper error handling for invalid Unicode code points in both scalar and array paths.
Benchmark Improvements: Fixed the RNG initialization to use seedable_rng() consistently, ensuring reproducible benchmark results.
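A fixed seed makes benchmark runs comparable. The sketch below shows the property that matters without any crates: the same seed always yields the same input data. The generator is a hypothetical stand-in, a 64-bit linear congruential generator using Knuth's MMIX constants. It is not the benchmark's `seedable_rng()`.

```rust
/// Hypothetical 64-bit linear congruential generator (Knuth's MMIX
/// constants); a stand-in for a seeded RNG, not the benchmark's code.
struct Lcg(u64);

impl Lcg {
    fn next_u64(&mut self) -> u64 {
        self.0 = self
            .0
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.0
    }
}

/// Builds `n` code points in 65..=90 ('A'..='Z') from a fixed seed, so every
/// run benchmarks exactly the same input.
fn bench_input(seed: u64, n: usize) -> Vec<i64> {
    let mut rng = Lcg(seed);
    // The high bits of an LCG are better distributed than the low bits.
    (0..n).map(|_| 65 + ((rng.next_u64() >> 33) % 26) as i64).collect()
}
```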

🔍 Issues & Concerns

1. Code Duplication (Medium Priority)

The Unicode validation logic is duplicated between the scalar path (lines 143-151) and the array path (lines 50-57 in the chr helper function):

```rust
// Scalar path
if let Ok(u) = u32::try_from(code_point)
    && let Some(c) = core::char::from_u32(u)
{
    Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
        c.to_string(),
    ))))
} else {
    exec_err!("invalid Unicode scalar value: {code_point}")
}
```

This creates a maintenance burden: bug fixes or improvements must be applied twice. Consider extracting this logic into a helper function.
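Here is a minimal, std-only sketch of that extraction, in which a scalar fast path and an array path share one helper. All names here are hypothetical simplifications, and errors are plain `String`s rather than `exec_err!`:

- `Input` and `Output` stand in for DataFusion's `ColumnarValue`/`ScalarValue` and Arrow arrays.
- `code_point_to_char` is the shared helper.
- `invoke_chr` is the dispatching function.

```rust
use std::convert::TryFrom;

/// Simplified, hypothetical stand-ins for ColumnarValue with Int64 input
/// and Utf8 output (the real types wrap ScalarValue and Arrow arrays).
#[derive(Debug)]
enum Input {
    Scalar(Option<i64>),
    Array(Vec<Option<i64>>),
}

#[derive(Debug, PartialEq)]
enum Output {
    Scalar(Option<String>),
    Array(Vec<Option<String>>),
}

/// Shared validation: the i64 must fit in u32 and be a Unicode scalar value
/// (not a surrogate, not above 0x10FFFF).
#[inline]
fn code_point_to_char(code_point: i64) -> Result<char, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}

/// Dispatches on the input shape; both branches reuse the same helper.
fn invoke_chr(input: Input) -> Result<Output, String> {
    match input {
        // Scalar fast path: one conversion, no intermediate array.
        Input::Scalar(None) => Ok(Output::Scalar(None)),
        Input::Scalar(Some(cp)) => {
            Ok(Output::Scalar(Some(code_point_to_char(cp)?.to_string())))
        }
        // Array path: convert element-wise, keeping nulls as nulls and
        // stopping at the first invalid code point.
        Input::Array(values) => values
            .into_iter()
            .map(|v| v.map(|cp| code_point_to_char(cp).map(String::from)).transpose())
            .collect::<Result<Vec<_>, _>>()
            .map(Output::Array),
    }
}
```

Because both branches call the same helper, a change to the validation rule lands in one place. Whether the compiler inlines the helper in the real crate is an expectation, not something measured here.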

2. Inconsistent Error Types (Low Priority)

The scalar path uses internal_err! for unexpected data types (line 136), and the array path also uses internal_err! (line 155), so the two paths are in fact consistent. However, the data type check in the array path seems redundant, since the signature already enforces Int64.

3. Performance Consideration: String Allocation (Low Priority)

The scalar path uses c.to_string() (line 147), which allocates a new String. For a single-character conversion this is acceptable. By contrast, the array path uses encode_utf8 with a stack buffer, which avoids a per-character allocation. Because only one value is converted per scalar call, the difference is negligible, so this is not critical.
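The trade-off in point 3 can be illustrated with plain std Rust. `char::encode_utf8` writes into a caller-provided 4-byte buffer and returns a `&mut str`, so appending many characters needs no per-character `String`. `to_string`, on the other hand, allocates. This is a sketch, not the DataFusion code, and `append_chars` and `scalar_to_string` are hypothetical names.

```rust
/// Array-style conversion: appends each character to `out` through one reused
/// 4-byte stack buffer, so no String is allocated per character.
/// Returns the first invalid code point as the error.
fn append_chars(code_points: &[u32], out: &mut String) -> Result<(), u32> {
    let mut buf = [0u8; 4]; // UTF-8 needs at most 4 bytes per char
    for &cp in code_points {
        let c = char::from_u32(cp).ok_or(cp)?;
        out.push_str(c.encode_utf8(&mut buf));
    }
    Ok(())
}

/// Scalar-style conversion: one heap allocation for the single result,
/// which is negligible when it happens once per call.
fn scalar_to_string(cp: u32) -> Option<String> {
    char::from_u32(cp).map(|c| c.to_string())
}
```

With a plain `String` destination, `out.push(c)` would avoid the allocation without an explicit buffer. The buffer form is useful when the destination only accepts a `&str` per value.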

4. Benchmark Typo Fixed (Note)

Good catch on fixing the typo: cot_fn → chr_fn (line 39). This was a bug in the original benchmark.

🧪 Test Coverage

Positive: Existing tests adequately cover:

✅ Normal Unicode conversion (including emoji, special chars)
✅ Null handling
✅ Error cases (negative values, out-of-range, surrogate pairs)
✅ Edge cases (empty arrays)
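The error cases above follow from the two-step check in the scalar snippet: `u32::try_from`, then `char::from_u32`. A small std-only sketch makes the boundaries explicit. `is_valid_chr_input` and `naive_cast` are hypothetical names. The second function exists only to show why `try_from` is preferable to an `as u32` cast, which silently truncates an out-of-range i64 and can wrap it around to a valid character.

```rust
use std::convert::TryFrom;

/// True if `code_point` passes the u32::try_from + char::from_u32 check:
/// it must be non-negative, fit in u32, be at most 0x10FFFF, and not be
/// a surrogate (0xD800..=0xDFFF).
fn is_valid_chr_input(code_point: i64) -> bool {
    u32::try_from(code_point).ok().and_then(char::from_u32).is_some()
}

/// Counter-example only: `as u32` keeps the low 32 bits, so an
/// out-of-range input can wrap around to a valid character.
fn naive_cast(code_point: i64) -> Option<char> {
    char::from_u32(code_point as u32)
}
```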

Gap: No explicit test for the scalar path through invoke_with_args. The existing tests only test the chr helper function directly. Consider adding a test that exercises the scalar path:

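```rust
// Sketch: assumes this lives in chr.rs's test module, where `use super::*;`
// brings ChrFunc and the DataFusion types into scope.
#[test]
fn test_chr_scalar_path() {
    let func = ChrFunc::new();
    let args = vec![ColumnarValue::Scalar(ScalarValue::Int64(Some(65)))];
    let result = func
        .invoke_with_args(ScalarFunctionArgs {
            args,
            // ... remaining fields (e.g. arg_fields, number_rows, return_field);
            // the exact set depends on the DataFusion version
        })
        .unwrap();

    // 65 is 'A'; a scalar input should produce a scalar Utf8 output.
    match result {
        ColumnarValue::Scalar(ScalarValue::Utf8(Some(s))) => assert_eq!(s, "A"),
        _ => panic!("Expected scalar string result"),
    }
}
```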

🔒 Security

No security concerns identified. The PR maintains existing input validation for Unicode code points and doesn't introduce new attack vectors.

📊 Performance

The optimization is sound and should provide measurable improvements for scalar workloads. The benchmark addition will help track this over time.

Recommendation: After merging, run the benchmark to quantify the improvement:

```shell
cargo bench --bench chr
```

✏️ Minor Suggestions

Consider removing the redundant data type check at lines 154-159, since the type signature already enforces Int64.
The take_function_args pattern with destructuring is clean: let [arg] = take_function_args(...)?;
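The destructuring idiom above can be imitated with the standard library alone, because `Vec<T>` converts into `[T; N]` via `TryFrom` when the length matches. This is a hypothetical analogue for illustration (`take_args` and `chr_arity_check` are made-up names). It is not DataFusion's `take_function_args`, whose exact signature and error message may differ.

```rust
use std::convert::TryFrom;

/// Hypothetical std-only analogue of `let [arg] = take_function_args(...)?`:
/// turns a Vec of arguments into a fixed-size array, or returns an error
/// naming the function and the actual argument count.
fn take_args<const N: usize, T>(name: &str, args: Vec<T>) -> Result<[T; N], String> {
    let len = args.len();
    <[T; N]>::try_from(args)
        .map_err(|_| format!("{name} function requires {} argument(s), got {len}", N))
}

fn chr_arity_check(args: Vec<i64>) -> Result<i64, String> {
    // The array pattern binds exactly one argument.
    let [arg] = take_args::<1, i64>("chr", args)?;
    Ok(arg)
}
```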

📝 Overall Assessment

Recommendation: Approve with minor suggestions

This is a solid optimization that improves performance without compromising correctness. The code follows established patterns in the DataFusion codebase. The main suggestion is to reduce code duplication by extracting the Unicode validation logic into a shared helper function.

Rating: ⭐⭐⭐⭐ (4/5)

Code quality: Good
Testing: Adequate (could add scalar path test)
Performance: Improved
Maintainability: Could be better (code duplication)

🤖 Review generated by Claude Code

gemini-code-assist

Code Review

This pull request optimizes the scalar path for the chr function by adding a specialized implementation in invoke_with_args. This avoids creating intermediate arrays for scalar inputs, which should improve performance. The changes include adding a new benchmark for the scalar path and updating the existing array benchmark. The implementation is sound, but I've identified an area of code duplication that could be refactored to improve maintainability.

gemini-code-assist · 2026-01-30T11:21:11Z

datafusion/functions/src/string/chr.rs

```diff
+                if let Ok(u) = u32::try_from(code_point)
+                    && let Some(c) = core::char::from_u32(u)
+                {
+                    Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
+                        c.to_string(),
+                    ))))
+                } else {
+                    exec_err!("invalid Unicode scalar value: {code_point}")
+                }
```


This character conversion and error handling logic is very similar to the logic in the chr array function (lines 50-57). To improve maintainability and avoid code duplication, consider extracting this logic into a small, private helper function that can be shared between both implementations. The compiler should be able to inline it, so there should be no performance regression.

value:good-to-have; category:bug; feedback:The Gemini AI reviewer is correct! It would be good to extract a helper function that converts the i64 to a character/string and reuse it for both scalars and arrays. That would avoid maintaining the same logic in two places.

augmentcode · 2026-01-30T11:26:12Z

🤖 Augment PR Summary

Summary: This PR optimizes the chr scalar UDF path to avoid the array-based wrapper overhead.

Changes:

Replaced `make_scalar_function` usage with explicit scalar vs array dispatch in ChrFunc::invoke_with_args
Added a scalar-specific implementation using ScalarValue and strict Unicode scalar validation
Kept the existing array implementation for batch inputs (via the existing chr(&[ArrayRef]) helper)
Updated Criterion benchmarks to include a new chr/scalar benchmark and renamed the existing one to chr/array

Technical Notes: Uses take_function_args for argument validation and returns Utf8 consistently for both scalar and array execution paths.

augmentcode

Review completed. 1 suggestion posted.

augmentcode · 2026-01-30T11:26:13Z

datafusion/functions/src/string/chr.rs

```diff
@@ -119,7 +119,47 @@ impl ScalarUDFImpl for ChrFunc {
     }

     fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
```


The new scalar fast-path in ChrFunc::invoke_with_args isn’t covered by the existing unit tests (they only exercise the internal array helper chr). Consider adding a test that invokes the UDF with scalar inputs (valid/invalid/null) to guard this optimized branch against regressions.

value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! There are only unit tests for the ColumnarValue::Array branch. It would be good to add some SQL Logic Tests for both scalar and array inputs. They would prevent regressions in the future.

perf: Optimize scalar path for chr function

93bfdc2

gemini-code-assist bot reviewed Jan 30, 2026

augmentcode bot reviewed Jan 30, 2026

3 participants
