-
Notifications
You must be signed in to change notification settings - Fork 0
20114: feat: unify left and right functions and benches #228
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Warning Rate limit exceeded
⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. ✨ Finishing touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request streamlines the implementation and benchmarking of the Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request unifies the implementation of left and right string functions by extracting common logic into a new common.rs module. It also consolidates their respective benchmarks into a single left_right.rs benchmark file. This is a great refactoring that reduces code duplication and improves maintainability. I've suggested a small improvement to the new benchmark file to further reduce duplication.
| // Benchmark with positive n (no optimization needed) | ||
| let mut bench_name = if is_string_view { | ||
| "string_view_array positive n" | ||
| } else { | ||
| "string_array positive n" | ||
| }; | ||
| let return_type = if is_string_view { | ||
| DataType::Utf8View | ||
| } else { | ||
| DataType::Utf8 | ||
| }; | ||
|
|
||
| let args = create_args(size, 32, false, is_string_view); | ||
| group.bench_function(BenchmarkId::new(bench_name, size), |b| { | ||
| let arg_fields = args | ||
| .iter() | ||
| .enumerate() | ||
| .map(|(idx, arg)| { | ||
| Field::new(format!("arg_{idx}"), arg.data_type(), true).into() | ||
| }) | ||
| .collect::<Vec<_>>(); | ||
| let config_options = Arc::new(ConfigOptions::default()); | ||
|
|
||
| b.iter(|| { | ||
| black_box( | ||
| function | ||
| .invoke_with_args(ScalarFunctionArgs { | ||
| args: args.clone(), | ||
| arg_fields: arg_fields.clone(), | ||
| number_rows: size, | ||
| return_field: Field::new( | ||
| "f", | ||
| return_type.clone(), | ||
| true, | ||
| ) | ||
| .into(), | ||
| config_options: Arc::clone(&config_options), | ||
| }) | ||
| .expect("should work"), | ||
| ) | ||
| }) | ||
| }); | ||
|
|
||
| // Benchmark with negative n (triggers optimization) | ||
| bench_name = if is_string_view { | ||
| "string_view_array negative n" | ||
| } else { | ||
| "string_array negative n" | ||
| }; | ||
| let return_type = if is_string_view { | ||
| DataType::Utf8View | ||
| } else { | ||
| DataType::Utf8 | ||
| }; | ||
|
|
||
| let args = create_args(size, 32, true, is_string_view); | ||
| group.bench_function(BenchmarkId::new(bench_name, size), |b| { | ||
| let arg_fields = args | ||
| .iter() | ||
| .enumerate() | ||
| .map(|(idx, arg)| { | ||
| Field::new(format!("arg_{idx}"), arg.data_type(), true).into() | ||
| }) | ||
| .collect::<Vec<_>>(); | ||
| let config_options = Arc::new(ConfigOptions::default()); | ||
|
|
||
| b.iter(|| { | ||
| black_box( | ||
| function | ||
| .invoke_with_args(ScalarFunctionArgs { | ||
| args: args.clone(), | ||
| arg_fields: arg_fields.clone(), | ||
| number_rows: size, | ||
| return_field: Field::new( | ||
| "f", | ||
| return_type.clone(), | ||
| true, | ||
| ) | ||
| .into(), | ||
| config_options: Arc::clone(&config_options), | ||
| }) | ||
| .expect("should work"), | ||
| ) | ||
| }) | ||
| }); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for benchmarking with positive and negative n is very similar and can be refactored to reduce duplication. You can introduce a loop over [false, true] for use_negative and move the common benchmarking logic inside.
let return_type = if is_string_view {
DataType::Utf8View
} else {
DataType::Utf8
};
for use_negative in [false, true] {
let bench_name = format!(
"{} {}",
if is_string_view {
"string_view_array"
} else {
"string_array"
},
if use_negative { "negative n" } else { "positive n" }
);
let args = create_args(size, 32, use_negative, is_string_view);
group.bench_function(BenchmarkId::new(bench_name, size), |b| {
let arg_fields = args
.iter()
.enumerate()
.map(|(idx, arg)| {
Field::new(format!("arg_{idx}"), arg.data_type(), true).into()
})
.collect::<Vec<_>>();
let config_options = Arc::new(ConfigOptions::default());
b.iter(|| {
black_box(
function
.invoke_with_args(ScalarFunctionArgs {
args: args.clone(),
arg_fields: arg_fields.clone(),
number_rows: size,
return_field: Field::new(
"f",
return_type.clone(),
true,
)
.into(),
config_options: Arc::clone(&config_options),
})
.expect("should work"),
)
})
});
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! There is a code duplication in the benchmark test that could be avoided by adding a new for loop for the negative/positive flag.Prevent maintaining the same code twice.
Pull Request Review: Unify left and right functions and benchesSummaryThis PR refactors the ✅ StrengthsCode Quality
Testing
Performance Considerations
🔍 Issues & Concerns1. Bug: Incorrect left() documentation comment (Lines 91-93 in
|
🤖 Augment PR SummarySummary: This PR deduplicates the Unicode string slicing implementations and benchmarks for Changes:
Technical Notes: The new implementation centralizes UTF-8 codepoint-based slicing logic and provides specialized handling for edge cases such as 🤖 Was this summary useful? React with 👍 or 👎 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| let range = F::slice(string, n); | ||
| let result_bytes = &string.as_bytes()[range.clone()]; | ||
|
|
||
| let byte_view = ByteView::from(view); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
general_left_right_view calls ByteView::from(view) for every Utf8View element, but other code paths in this repo only use ByteView::from when the view length is > 12 (non-inline). If view is an inline view (short strings), this may decode invalid offset/buffer metadata or even panic; consider avoiding ByteView::from for inline views.
Severity: medium
🤖 Was this useful? React with 👍 or 👎, or 🚀 if it prevented an incident/outage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot confirm/reject this claim. The provided information is not enough to decide.
value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct! There is a code duplication in the benchmark test that could be avoided by adding a new for loop for the negative/positive flag.Prevent maintaining the same code twice. |
value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! The referred code is within the "Less" branch of |
20114: To review by AI