20073: perf: Optimize scalar path for chr function #217
base: main
```diff
@@ -24,9 +24,9 @@ use arrow::datatypes::DataType;
 use arrow::datatypes::DataType::Int64;
 use arrow::datatypes::DataType::Utf8;
 
-use crate::utils::make_scalar_function;
 use datafusion_common::cast::as_int64_array;
-use datafusion_common::{Result, exec_err};
+use datafusion_common::utils::take_function_args;
+use datafusion_common::{Result, ScalarValue, exec_err, internal_err};
 use datafusion_expr::{ColumnarValue, Documentation, Volatility};
 use datafusion_expr::{ScalarFunctionArgs, ScalarUDFImpl, Signature};
 use datafusion_macros::user_doc;
```
```diff
@@ -119,7 +119,47 @@ impl ScalarUDFImpl for ChrFunc {
     }
 
     fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
-        make_scalar_function(chr, vec![])(&args.args)
+        let return_type = args.return_field.data_type();
+        let [arg] = take_function_args(self.name(), args.args)?;
+
+        match arg {
+            ColumnarValue::Scalar(scalar) => {
+                if scalar.is_null() {
+                    return Ok(ColumnarValue::Scalar(ScalarValue::try_from(
+                        return_type,
+                    )?));
+                }
+
+                let code_point = match scalar {
+                    ScalarValue::Int64(Some(v)) => v,
+                    _ => {
+                        return internal_err!(
+                            "Unexpected data type {:?} for function chr",
+                            scalar.data_type()
+                        );
+                    }
+                };
+
+                if let Ok(u) = u32::try_from(code_point)
+                    && let Some(c) = core::char::from_u32(u)
+                {
+                    Ok(ColumnarValue::Scalar(ScalarValue::Utf8(Some(
+                        c.to_string(),
+                    ))))
+                } else {
+                    exec_err!("invalid Unicode scalar value: {code_point}")
+                }
```
Comment on lines +143 to +151

This character conversion and error handling logic is very similar to the logic in the

Reply (Owner, Author):

value:good-to-have; category:bug; feedback: The Gemini AI reviewer is correct! It would be good to extract a helper function that converts the i64 to a character/string and reuse it for both scalars and arrays. That would avoid maintaining the same logic in two places.
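A minimal sketch of the helper extraction described in the reply. It uses plain std types so that it compiles on its own. `code_point_to_char`, `chr_scalar` and `chr_array` are hypothetical names, not DataFusion APIs. `Option<i64>` and `&[Option<i64>]` stand in for `ScalarValue::Int64` and `Int64Array`. The conversion mirrors the scalar branch in the diff (`u32::try_from` followed by `char::from_u32`). It does not reproduce any extra checks the existing array `chr` function may perform.

```rust
/// Hypothetical shared helper: maps an i64 code point to a `char`.
/// Rejects negative values, values above u32::MAX, surrogates
/// (U+D800..=U+DFFF) and anything above U+10FFFF.
fn code_point_to_char(code_point: i64) -> Result<char, String> {
    u32::try_from(code_point)
        .ok()
        .and_then(char::from_u32)
        .ok_or_else(|| format!("invalid Unicode scalar value: {code_point}"))
}

/// Scalar path: `Option<i64>` stands in for `ScalarValue::Int64(..)`.
fn chr_scalar(value: Option<i64>) -> Result<Option<String>, String> {
    match value {
        None => Ok(None), // NULL in, NULL out
        Some(cp) => code_point_to_char(cp).map(|c| Some(c.to_string())),
    }
}

/// Array path: a slice of nullable values stands in for an `Int64Array`.
/// Collecting into `Result<Vec<_>, _>` stops at the first invalid value.
fn chr_array(values: &[Option<i64>]) -> Result<Vec<Option<String>>, String> {
    values
        .iter()
        .map(|v| match v {
            None => Ok(None),
            Some(cp) => code_point_to_char(*cp).map(|c| Some(c.to_string())),
        })
        .collect()
}
```

Both paths call `code_point_to_char`, so the accepted range and the error message can only change in one place. A real array implementation would typically append into an Arrow string builder instead of allocating a `String` per row. This sketch omits that detail.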
```diff
+            }
+            ColumnarValue::Array(array) => {
+                if !matches!(array.data_type(), Int64) {
+                    return internal_err!(
+                        "Unexpected data type {:?} for function chr",
+                        array.data_type()
+                    );
+                }
+                Ok(ColumnarValue::Array(chr(&[array])?))
+            }
+        }
     }
 
     fn documentation(&self) -> Option<&Documentation> {
```
The new scalar fast path in `ChrFunc::invoke_with_args` isn’t covered by the existing unit tests, which only exercise the internal array helper `chr`. Consider adding a test that invokes the UDF with scalar inputs (valid, invalid and null) to guard this optimized branch against regressions.
value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct! There are unit tests only for the `ColumnarValue::Array` branch. It would be good to add some SQL Logic Tests for both scalar and array inputs, which would prevent future regressions.
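The scalar cases the reviewers ask about (valid, invalid and null) can be sketched as ordinary unit tests. To keep the example self-contained, it does not call `ChrFunc::invoke_with_args`, because building `ScalarFunctionArgs` depends on the DataFusion version. Instead, `Scalar` is a hypothetical stand-in for the two `ScalarValue` variants involved. The hypothetical `chr_scalar_fast_path` reproduces the order of checks in the scalar branch of the diff: NULL first, then the data type, then the code-point conversion. A real test would build the UDF's arguments and assert on the returned `ColumnarValue`. The SQL Logic Tests mentioned in the reply would be written separately as `.slt` files.

```rust
/// Hypothetical stand-in for the `ScalarValue` variants the fast path touches.
#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
    Utf8(Option<String>),
}

impl Scalar {
    fn is_null(&self) -> bool {
        matches!(self, Scalar::Int64(None) | Scalar::Utf8(None))
    }
}

/// Mirrors the scalar branch: NULL check, then type check, then conversion.
fn chr_scalar_fast_path(arg: Scalar) -> Result<Scalar, String> {
    if arg.is_null() {
        // A NULL input of any type yields a typed NULL of the return type (Utf8).
        return Ok(Scalar::Utf8(None));
    }
    let code_point = match arg {
        Scalar::Int64(Some(v)) => v,
        // Corresponds to the `internal_err!` arm in the diff.
        other => return Err(format!("Unexpected data type {other:?} for function chr")),
    };
    match u32::try_from(code_point).ok().and_then(char::from_u32) {
        Some(c) => Ok(Scalar::Utf8(Some(c.to_string()))),
        // Corresponds to the `exec_err!` arm in the diff.
        None => Err(format!("invalid Unicode scalar value: {code_point}")),
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn scalar_valid_code_point() {
        assert_eq!(
            chr_scalar_fast_path(Scalar::Int64(Some(65))),
            Ok(Scalar::Utf8(Some("A".to_string())))
        );
    }

    #[test]
    fn scalar_invalid_code_point() {
        assert!(chr_scalar_fast_path(Scalar::Int64(Some(-1))).is_err());
        // Surrogate code points are not Unicode scalar values.
        assert!(chr_scalar_fast_path(Scalar::Int64(Some(0xD800))).is_err());
    }

    #[test]
    fn scalar_null() {
        assert_eq!(
            chr_scalar_fast_path(Scalar::Int64(None)),
            Ok(Scalar::Utf8(None))
        );
    }
}
```

The null test uses an `Int64` NULL. In the diff, the NULL check runs before the type check, so a NULL of another type also returns a Utf8 NULL rather than an internal error.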