-
Notifications
You must be signed in to change notification settings - Fork 0
20073: perf: Optimize scalar path for chr function #217
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant performance enhancement for the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThe changes introduce dedicated benchmark paths for the chr function and refactor its implementation to provide more explicit argument handling. The benchmark file adds separate scalar and array benchmark routes with shared configuration, replacing direct RNG usage with a seeded helper function. The chr function implementation replaces its previous wrapper logic with custom argument handling that validates Int64 arguments, processes nulls appropriately, and converts valid integers to Unicode scalar values, while adding explicit type checking and error paths for invalid conversions. Public signatures and export declarations remain unchanged. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review: Optimize scalar path for chr functionSummaryThis PR optimizes the ✅ Strengths
🔍 Issues & Concerns1. Code Duplication (Medium Priority)The Unicode validation logic is duplicated between the scalar path (lines 143-151) and the array path (lines 50-57 in the // Scalar path
if let Ok(u) = u32::try_from(code_point)
&& let Some(c) = core::char::from_u32(u)
{
Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
c.to_string(),
))))
} else {
exec_err!("invalid Unicode scalar value: {code_point}")
}This creates maintenance burden - bug fixes or improvements must be applied twice. Consider extracting this into a helper function. 2. Inconsistent Error Types (Low Priority)The scalar path uses 3. Performance Consideration: String Allocation (Low Priority)The scalar path uses 4. Benchmark Typo Fixed (Note)Good catch on fixing the typo: 🧪 Test CoveragePositive: Existing tests adequately cover:
Gap: No explicit test for the scalar path through #[test]
fn test_chr_scalar_path() {
let func = ChrFunc::new();
let args = vec![ColumnarValue::Scalar(ScalarValue::Int64(Some(65)))];
let result = func.invoke_with_args(ScalarFunctionArgs {
args,
// ... other fields
}).unwrap();
match result {
ColumnarValue::Scalar(ScalarValue::Utf8(Some(s))) => assert_eq!(s, "A"),
_ => panic!("Expected scalar string result"),
}
}🔒 SecurityNo security concerns identified. The PR maintains existing input validation for Unicode code points and doesn't introduce new attack vectors. 📊 PerformanceThe optimization is sound and should provide measurable improvements for scalar workloads. The benchmark addition will help track this over time. Recommendation: After merging, run the benchmark to quantify the improvement: cargo bench --bench chr✏️ Minor Suggestions
📝 Overall AssessmentRecommendation: Approve with minor suggestions This is a solid optimization that improves performance without compromising correctness. The code follows established patterns in the DataFusion codebase. The main suggestion is to reduce code duplication by extracting the Unicode validation logic into a shared helper function. Rating: ⭐⭐⭐⭐ (4/5)
🤖 Review generated by Claude Code |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request optimizes the scalar path for the chr function by adding a specialized implementation in invoke_with_args. This avoids creating intermediate arrays for scalar inputs, which should improve performance. The changes include adding a new benchmark for the scalar path and updating the existing array benchmark. The implementation is sound, but I've identified an area of code duplication that could be refactored to improve maintainability.
| if let Ok(u) = u32::try_from(code_point) | ||
| && let Some(c) = core::char::from_u32(u) | ||
| { | ||
| Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some( | ||
| c.to_string(), | ||
| )))) | ||
| } else { | ||
| exec_err!("invalid Unicode scalar value: {code_point}") | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This character conversion and error handling logic is very similar to the logic in the chr array function (lines 50-57). To improve maintainability and avoid code duplication, consider extracting this logic into a small, private helper function that can be shared between both implementations. The compiler should be able to inline it, so there should be no performance regression.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:good-to-have; category:bug; feedback:The Gemini AI reviewer is correct! It would be good to extract a helper function for the conversion of the i64 to a character/string and reuse it for both scalars and arrays. It would prevent double maintenance of the code.
🤖 Augment PR SummarySummary: This PR optimizes the Changes:
Technical Notes: Uses 🤖 Was this summary useful? React with 👍 or 👎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @@ -119,7 +119,47 @@ impl ScalarUDFImpl for ChrFunc { | |||
| } | |||
|
|
|||
| fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> { | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new scalar fast-path in ChrFunc::invoke_with_args isn’t covered by the existing unit tests (they only exercise the internal array helper chr). Consider adding a test that invokes the UDF with scalar inputs (valid/invalid/null) to guard this optimized branch against regressions.
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:good-to-have; category:bug; feedback:The Augment AI reviewer is correct! There are only unit tests for the ColumnarValue::Array branch. It would be good to add some SQL Logic Tests for both scalar and array inputs. They would prevent regressions in the future.
20073: To review by AI