20006: feat(spark): Adds negative spark function #213
base: main
Conversation
Summary of Changes
Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request adds a new Spark-compatible `negative` function to the DataFusion Spark module.
Walkthrough
This pull request introduces a new unary negation function (SparkNegative) to the DataFusion Spark module. A new `negative` module is added and registered in mod.rs.
Pull Request Review: Spark Negative Function Implementation

Summary
This PR implements the Spark-compatible `negative` function.

Code Quality & Best Practices ✅
Strengths:

Issues to Address:
1. Inconsistent Error Handling Pattern
Code Review
This pull request introduces the Spark-compatible `negative` mathematical function. The implementation correctly handles various numeric types, including integers, floats, and decimals, applying wrapping negation to integers and decimals to match Spark's legacy-mode behavior. The code includes comprehensive unit tests for scalar and array inputs, covering edge cases such as MIN/MAX integer values, NaN and Infinity for floats, and null handling. The integration into mod.rs is also done correctly.
```rust
/// TODOs:
/// - Spark's ANSI-compliant dialect, when off (i.e. `spark.sql.ansi.enabled=false`),
///   negating the minimal value of a signed integer wraps around.
///   For example: negative(i32::MIN) returns i32::MIN (wraps instead of error).
///   This is the current implementation (legacy mode only).
/// - Spark's ANSI mode (when `spark.sql.ansi.enabled=true`) should throw an
///   ARITHMETIC_OVERFLOW error on integer overflow instead of wrapping.
///   This is not yet implemented - all operations currently use wrapping behavior.
```
The TODOs clearly outline the current behavior (Spark legacy mode with wrapping negation) and the future work required for ANSI-compliant integer overflow handling. It's good that this is explicitly documented. Please ensure there's a tracking issue for implementing the ANSI mode behavior to throw ARITHMETIC_OVERFLOW on integer overflow.
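For readers unfamiliar with the two modes, here is a standalone Rust illustration (not taken from the PR) of the difference the TODO describes: wrapping negation maps `i32::MIN` back to itself, while a checked negation detects the overflow that ANSI mode would report as ARITHMETIC_OVERFLOW.

```rust
fn main() {
    // Legacy-mode semantics: wrapping negation never fails.
    assert_eq!(5_i32.wrapping_neg(), -5);
    assert_eq!(i32::MIN.wrapping_neg(), i32::MIN); // wraps back to i32::MIN

    // ANSI mode would need an overflow check, e.g. via checked_neg():
    assert_eq!(5_i32.checked_neg(), Some(-5));
    assert_eq!(i32::MIN.checked_neg(), None); // overflow detected; could map to ARITHMETIC_OVERFLOW
}
```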
Actionable comments posted: 1
🧹 Nitpick comments (2)
datafusion/spark/src/function/math/negative.rs (2)
74-76: Consider defensive bounds check in `return_type`.
While `Signature::numeric(1, ...)` should ensure exactly one argument, accessing `arg_types[0]` without a bounds check could panic if the framework's validation is bypassed or if the signature is changed in the future.
♻️ Suggested defensive check
```diff
 fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-    Ok(arg_types[0].clone())
+    arg_types
+        .first()
+        .cloned()
+        .ok_or_else(|| {
+            DataFusionError::Internal("negative requires 1 argument".to_string())
+        })
 }
```
283-299: Consider adding Float16 scalar tests.
The Float32 and Float64 scalar tests are comprehensive, but Float16 is supported in the implementation (lines 159, 184-186) without corresponding test coverage. Adding Float16 test cases would ensure consistency.
🧪 Suggested Float16 test additions
```rust
// Add to test_negative_scalar_floats
use half::f16;

test_negative_scalar!(Float16, f16::from_f32(5.5), f16::from_f32(-5.5));
test_negative_scalar!(Float16, f16::from_f32(-5.5), f16::from_f32(5.5));
test_negative_scalar!(Float16, f16::INFINITY, f16::NEG_INFINITY);
test_negative_scalar!(Float16, f16::NEG_INFINITY, f16::INFINITY);
```
```rust
ColumnarValue::Array(array) => match array.data_type() {
    DataType::Null => Ok(args[0].clone()),

    // Signed integers - use wrapping negation (Spark legacy mode behavior)
    DataType::Int8 => wrapping_negative_array!(array, Int8Array),
    DataType::Int16 => wrapping_negative_array!(array, Int16Array),
    DataType::Int32 => wrapping_negative_array!(array, Int32Array),
    DataType::Int64 => wrapping_negative_array!(array, Int64Array),

    // Floating point - simple negation (no overflow possible)
    DataType::Float16 => simple_negative_array!(array, Float16Array),
    DataType::Float32 => simple_negative_array!(array, Float32Array),
    DataType::Float64 => simple_negative_array!(array, Float64Array),

    // Decimal types - wrapping negation
    DataType::Decimal128(_, _) => {
        wrapping_negative_array!(array, Decimal128Array)
    }
    DataType::Decimal256(_, _) => {
        wrapping_negative_array!(array, Decimal256Array)
    }

    dt => internal_err!("Not supported datatype for Spark NEGATIVE: {dt}"),
},
```
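The scalar branch referenced throughout the review (`wrapping_negative_scalar!`) is not shown on this page; the following hand-expanded sketch for a single type is an assumption about its shape, not the PR's code.

```rust
use datafusion_common::ScalarValue;

// Hand-expanded sketch of the scalar path for Int32 only: SQL NULL (None)
// passes through untouched, and Some(x) is negated with wrapping semantics.
fn negate_i32_scalar(v: Option<i32>) -> ScalarValue {
    ScalarValue::Int32(v.map(|x| x.wrapping_neg()))
}
```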
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if other Spark math functions handle unsigned integers
rg -n "UInt8|UInt16|UInt32|UInt64" datafusion/spark/src/function/math/ --type rust
```

Repository: martin-augment/datafusion
Length of output: 1365

🏁 Script executed:

```bash
#!/bin/bash
# Read the full negative.rs file to see the complete implementation
cat -n datafusion/spark/src/function/math/negative.rs | head -200
```

Repository: martin-augment/datafusion
Length of output: 8881

🏁 Script executed:

```bash
#!/bin/bash
# Check the Signature::numeric definition to understand what types it includes
rg -A 5 "fn numeric" datafusion/expr/src/signature.rs | head -20
```

Repository: martin-augment/datafusion
Length of output: 141

🏁 Script executed:

```bash
#!/bin/bash
# Find where Signature::numeric is defined
rg -n "fn numeric" --type rust
```

Repository: martin-augment/datafusion
Length of output: 595

🏁 Script executed:

```bash
#!/bin/bash
# Check the Signature::numeric implementation
sed -n '1192,1220p' datafusion/expr-common/src/signature.rs
```

Repository: martin-augment/datafusion
Length of output: 993

🏁 Script executed:

```bash
#!/bin/bash
# Check what TypeSignature::Numeric includes
rg -A 20 "Numeric\(" datafusion/expr-common/src/signature.rs | grep -A 20 "impl.*TypeSignature\|fn matches_numeric"
```

Repository: martin-augment/datafusion
Length of output: 1008

🏁 Script executed:

```bash
#!/bin/bash
# Search for where TypeSignature::Numeric is handled in type checking
rg -B 5 -A 10 "TypeSignature::Numeric" datafusion/expr-common/src/type_coercion/signature.rs | head -50
```

Repository: martin-augment/datafusion
Length of output: 162

🏁 Script executed:

```bash
#!/bin/bash
# Find files that validate TypeSignature::Numeric
fd -t f "signature" --type-list=rust | xargs rg -l "TypeSignature::Numeric" | head -5
```

Repository: martin-augment/datafusion
Length of output: 451

🏁 Script executed:

```bash
#!/bin/bash
# Check how TypeSignature::Numeric is validated in function type checking
rg -B 3 -A 15 "TypeSignature::Numeric" datafusion/expr/src/type_coercion/functions.rs | head -80
```

Repository: martin-augment/datafusion
Length of output: 2849

🏁 Script executed:

```bash
#!/bin/bash
# Find NativeType definition and is_numeric method
rg -B 5 -A 15 "fn is_numeric" --type rust
```

Repository: martin-augment/datafusion
Length of output: 2902
Add support for unsigned integer types (UInt8, UInt16, UInt32, UInt64).
The function signature accepts unsigned integer types via Signature::numeric(), but the implementation rejects them with "Not supported datatype for Spark NEGATIVE". This is inconsistent with other Spark math functions (abs, rint), which explicitly handle unsigned integers. Add cases for UInt8, UInt16, UInt32, and UInt64 using the wrapping_negative_array! macro (lines 149-172) and the wrapping_negative_scalar! macro (lines 173+) to match the pattern for signed integers.
🤖 Prompt for AI Agents
In `@datafusion/spark/src/function/math/negative.rs` around lines 149 - 172, The
ColumnarValue::Array branch for Spark NEGATIVE lacks handling for unsigned
integer types, causing unsigned numerics to hit the "Not supported datatype"
path; update the match arms in the array branch (the match on array.data_type())
to add DataType::UInt8, DataType::UInt16, DataType::UInt32 and DataType::UInt64
and call wrapping_negative_array! for each, and mirror those additions in the
scalar handling code (the ColumnarValue::Scalar branch) by adding
DataType::UInt8/16/32/64 cases that use wrapping_negative_scalar! so unsigned
integers are handled consistently with signed integers.
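To make the suggestion concrete, here is a standalone sketch of what wrapping negation computes over an unsigned Arrow array. It uses arrow-rs's `PrimitiveArray::unary` directly rather than the PR's `wrapping_negative_array!` macro, so treat it as an illustration of the semantics, not the actual patch.

```rust
use arrow::array::{Array, UInt32Array};

// Wrapping negation over unsigned values: 0 stays 0, any other x maps to
// 2^32 - x (e.g. 1 -> u32::MAX). Nulls propagate through `unary`.
fn negate_u32_array(array: &UInt32Array) -> UInt32Array {
    array.unary(|x: u32| x.wrapping_neg())
}

fn main() {
    let input = UInt32Array::from(vec![Some(0), Some(1), None]);
    let out = negate_u32_array(&input);
    assert_eq!(out.value(0), 0);
    assert_eq!(out.value(1), u32::MAX);
    assert!(out.is_null(2)); // nulls pass through unchanged
}
```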
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
```rust
        ))
    })?;
    let result: $ARRAY_TYPE = array.unary(|x| x.wrapping_neg());
    Ok(ColumnarValue::Array(Arc::new(result)))
```
Decimal array negation loses precision and scale metadata
High Severity
The wrapping_negative_array! macro calls array.unary(|x| x.wrapping_neg()) without calling .with_data_type(input.data_type().clone()) afterward. Arrow's unary() method does not preserve the original data type metadata - for decimal arrays, this means precision and scale are lost and reset to defaults. The similar make_wrapping_abs_function! macro in datafusion/functions/src/math/abs.rs explicitly calls .with_data_type() for this reason. Decimal array results will have incorrect precision/scale, potentially causing type mismatches or incorrect query results.
Additional Locations (1)
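As a standalone illustration of this failure mode and the fix (using arrow-rs's `unary`, `with_data_type`, and `with_precision_and_scale`, which do exist; the function below is this comment's sketch, not the PR's macro):

```rust
use arrow::array::{Array, Decimal128Array};

// Re-attaching the input's data type after `unary` preserves the decimal's
// precision and scale, which `unary` alone resets to the type's defaults.
fn negate_decimal128(array: &Decimal128Array) -> Decimal128Array {
    array
        .unary(|x: i128| x.wrapping_neg())
        .with_data_type(array.data_type().clone())
}

fn main() {
    let input = Decimal128Array::from(vec![Some(12345_i128), None])
        .with_precision_and_scale(10, 2)
        .expect("valid precision/scale");
    let negated = negate_decimal128(&input);
    assert_eq!(negated.data_type(), input.data_type()); // Decimal128(10, 2) preserved
    assert_eq!(negated.value(0), -12345_i128);
}
```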
🤖 Augment PR Summary
Summary: Adds a Spark-compatible `negative` function to the DataFusion Spark module.
Technical Notes: Behavior is intentionally "legacy Spark" for signed integer overflow (wraps), with a TODO to add ANSI-mode overflow errors later.
🤖 Was this summary useful? React with 👍 or 👎
```rust
impl SparkNegative {
    pub fn new() -> Self {
        Self {
            signature: Signature::numeric(1, Volatility::Immutable),
```
Signature::numeric likely accepts unsigned integer types (e.g. UInt32), but spark_negative doesn’t handle them and will error at runtime. Consider tightening the signature (or adding coercion/handling) so callers don’t get a late "Not supported datatype" failure for numeric-but-unsigned inputs.
🤖 Was this useful? React with 👍 or 👎
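One hedged option for the tightening this comment suggests: enumerate exactly the types `spark_negative` handles so that unsigned inputs are rejected at plan time. `Signature::uniform` is a real DataFusion constructor; the specific type list below is this sketch's assumption. Note that parameterized Decimal128/Decimal256 types do not fit a fixed list like this, which may be why the PR chose `Signature::numeric`.

```rust
use arrow::datatypes::DataType;
use datafusion_expr::{Signature, Volatility};

// Restrict the accepted argument types to the ones the kernel implements.
fn tightened_signature() -> Signature {
    Signature::uniform(
        1,
        vec![
            DataType::Int8,
            DataType::Int16,
            DataType::Int32,
            DataType::Int64,
            DataType::Float16,
            DataType::Float32,
            DataType::Float64,
        ],
        Volatility::Immutable,
    )
}
```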
```rust
make_udf_function!(width_bucket::SparkWidthBucket, width_bucket);
make_udf_function!(trigonometry::SparkCsc, csc);
make_udf_function!(trigonometry::SparkSec, sec);
make_udf_function!(negative::SparkNegative, negative);
```
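For context, a hedged sketch of how such a macro-generated accessor is typically consumed; the assumption here is that `make_udf_function!` exposes an accessor returning a `ScalarUDF` handle, per the crate's convention.

```rust
use datafusion::logical_expr::ScalarUDF;
use datafusion::prelude::SessionContext;

// Registering the UDF makes it callable from SQL, e.g. SELECT negative(col) FROM t;
fn register_negative(ctx: &SessionContext, udf: &ScalarUDF) {
    ctx.register_udf(udf.clone());
}
```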
20006: To review by AI