
Conversation

@martin-augment
Owner

20006: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request adds a new negative scalar function to the datafusion-spark module, enabling the computation of the unary minus for numeric expressions. The implementation supports a wide range of data types, from integers to floating-point and decimal values, and specifically addresses integer overflow by employing wrapping negation to align with Spark's default (non-ANSI) behavior. This enhancement expands the mathematical functions available within the Spark compatibility layer.

Highlights

  • New Spark Function: Introduces negative, a Spark-compatible function for unary negation.
  • Numeric Type Support: Implements negation for various numeric types including integers, floating-point numbers, and decimals.
  • Spark Legacy Behavior: Integer negation uses wrapping behavior, consistent with Spark's non-ANSI (legacy) mode for MIN_VALUE overflow (a short sketch follows this list).
  • Comprehensive Testing: Includes unit tests covering scalar and array inputs, null handling, and edge cases such as signed-integer `MIN` values, NaN, and Infinity.
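
As a quick illustration of the wrapping behavior mentioned above, here is a minimal, self-contained Rust sketch (not code from this PR) of what wrapping negation does at the signed-integer boundary:

fn main() {
    // Ordinary values negate as expected.
    assert_eq!((-5i32).wrapping_neg(), 5);
    // i32::MIN has no positive counterpart in two's complement, so wrapping
    // negation returns i32::MIN itself instead of raising an overflow error.
    assert_eq!(i32::MIN.wrapping_neg(), i32::MIN);
}
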
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

  • Code Review (/gemini review): Performs a code review for the current pull request in its current state.
  • Pull Request Summary (/gemini summary): Provides a summary of the current pull request in its current state.
  • Comment (@gemini-code-assist): Responds in comments when explicitly tagged, both in pull request comments and review comments.
  • Help (/gemini help): Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the root of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai bot commented Jan 26, 2026

Walkthrough

This pull request introduces a new unary negation function (SparkNegative) to the DataFusion Spark module. A new module negative.rs implements the UDF with support for multiple numeric types including integers, floating-point numbers, and decimals. The implementation uses wrapping semantics for integer negation to match Spark behavior, preserves NULL and NaN handling, and includes comprehensive test coverage. The new function is registered in the math module's function registry and exported as a public UDF.


Comment @coderabbitai help to get the list of available commands and usage tips.

@claude

claude bot commented Jan 26, 2026

Pull Request Review: Spark Negative Function Implementation

Summary

This PR implements the negative function for Spark compatibility, which returns the negation of a numeric input (equivalent to unary minus). The implementation is well-structured with comprehensive tests.

Code Quality & Best Practices ✅

Strengths:

  1. Apache License Header: Properly included at the top of the file
  2. Clear Documentation: Good doc comments explaining Spark compatibility and behavior
  3. Comprehensive Tests: Excellent test coverage for scalars, arrays, edge cases, and null handling
  4. Macros for Code Reuse: Smart use of macros to reduce repetition
  5. Consistent Patterns: Follows the codebase patterns seen in other math functions

Issues to Address:

1. Inconsistent Error Handling Pattern ⚠️

Location: negative.rs:139-143

The code uses internal_err! for argument count validation:

if args.len() != 1 {
    return internal_err!(
        "negative takes exactly 1 argument, but got: {}",
        args.len()
    );
}

Issue: Other functions in the codebase use the assert_eq_or_internal_err! macro (see modulus.rs:30) or the take_function_args helper (see trigonometry.rs:69). Either pattern is more consistent with the rest of the codebase.

Recommendation: Use the assert_eq_or_internal_err! macro or take_function_args helper for consistency:

use datafusion_common::assert_eq_or_internal_err;

pub fn spark_negative(args: &[ColumnarValue]) -> Result<ColumnarValue, DataFusionError> {
    assert_eq_or_internal_err!(args.len(), 1, "negative takes exactly 1 argument");
    // ...
}

2. Missing Unsigned Integer Support ⚠️

Location: negative.rs:138-203

Issue: The function doesn't handle unsigned integer types (UInt8, UInt16, UInt32, UInt64). Looking at abs.rs, unsigned integers are explicitly handled in Spark functions. For unsigned types, negation should either:

  • Return an error for non-zero values (type mismatch)
  • Cast to signed type (type promotion)
  • Follow Spark's specific behavior

Recommendation: Check Spark's documentation for negative() behavior with unsigned integers and add appropriate handling. If Spark doesn't support unsigned integers, document this limitation.

3. Float16 Support May Be Unnecessary ℹ️

Location: negative.rs:157, 182

Observation: The code supports Float16, but I don't see Float16 usage in other Spark functions like modulus.rs or trigonometry.rs. Spark typically uses Float32/Float64.

Recommendation: Verify if Spark actually supports Float16. If not, remove it to match the signature pattern of other functions.

Potential Bugs 🐛

1. Signature May Be Too Permissive

Location: negative.rs:55

signature: Signature::numeric(1, Volatility::Immutable),

Issue: Signature::numeric() accepts all numeric types, but the implementation doesn't handle unsigned integers. This creates a mismatch between what the signature promises and what the function delivers.

Recommendation: Either:

  1. Implement unsigned integer support, OR
  2. Use a more specific signature that only accepts signed types (see the sketch below)
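
A possible sketch of option 2, hedged: it assumes DataFusion's Signature::uniform constructor (used by other scalar UDFs) and only covers the non-parameterized types. Decimal inputs carry precision and scale parameters, so a real fix would pair this with coercion or a user-defined signature. This is illustrative, not the PR's code:

use arrow::datatypes::DataType;
use datafusion_expr::{Signature, Volatility};

// Hypothetical: accept exactly one argument drawn from the signed integer and
// float types the implementation already handles. Decimal types cannot be
// listed here because their precision/scale are parameters.
fn signed_numeric_signature() -> Signature {
    Signature::uniform(
        1,
        vec![
            DataType::Int8,
            DataType::Int16,
            DataType::Int32,
            DataType::Int64,
            DataType::Float16,
            DataType::Float32,
            DataType::Float64,
        ],
        Volatility::Immutable,
    )
}
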

Performance Considerations 🚀

Overall: Good Performance

The implementation uses efficient Arrow operations:

  • unary() operations on arrays (vectorized)
  • ✅ No unnecessary allocations
  • ✅ Proper use of wrapping arithmetic for integers

Minor Optimization Opportunity:

Location: negative.rs:169-170

The error messages construct strings at runtime:

format!("Expected {}, got different type", stringify!($ARRAY_TYPE))

This is in the error path, so performance impact is minimal, but consider using static strings or the unsupported_data_type_exec_err helper used in trigonometry.rs:85-89.

Security Concerns 🔒

Wrapping Overflow Behavior

Location: negative.rs:35-39

The TODO comment is important:

// TODOs:
//  - Spark's ANSI mode (when `spark.sql.ansi.enabled=true`) should throw an
//    ARITHMETIC_OVERFLOW error on integer overflow instead of wrapping.
//    This is not yet implemented - all operations currently use wrapping behavior.

Assessment:

  • ✅ The wrapping behavior matches Spark's legacy mode (ANSI disabled)
  • ⚠️ ANSI mode support is deferred but documented
  • ✅ The behavior is explicit and predictable (not undefined)

Recommendation: This is acceptable for an initial implementation, but:

  1. Consider adding a GitHub issue for ANSI mode support
  2. Add a test comment showing the expected ANSI behavior (a small sketch follows below)
  3. Ensure this is consistent with other Spark functions in the codebase
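
To make the expected ANSI behavior concrete, here is a small, self-contained Rust sketch (illustrative only, not part of the PR or its tests): checked_neg returns None exactly where wrapping would occur, which is the point at which ANSI mode would surface an ARITHMETIC_OVERFLOW-style error.

// Illustrative only: how ANSI-style negation could detect the overflow case
// that the current implementation wraps.
fn ansi_negate_i32(x: i32) -> Result<i32, String> {
    x.checked_neg()
        .ok_or_else(|| format!("ARITHMETIC_OVERFLOW: cannot negate {x}"))
}

fn main() {
    assert_eq!(ansi_negate_i32(7), Ok(-7));
    // i32::MIN is the single value where negation overflows.
    assert!(ansi_negate_i32(i32::MIN).is_err());
}
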

Test Coverage ✅

Excellent Coverage:

  • ✅ Scalar values (integers, floats, decimals)
  • ✅ Array values
  • ✅ Null handling
  • ✅ Edge cases (MIN, MAX values)
  • ✅ Special float values (NaN, Infinity)
  • ✅ Wrong argument count
  • ✅ Wrapping behavior for MIN values

Missing Tests:

  1. Mixed Scalar/Array inputs: Tests use either all scalars or all arrays, but don't test scalar operations being broadcast
  2. Unsigned integer error handling: If unsigned types should error, add tests showing this
  3. Type coercion: What happens with mixed-type inputs?

Integration Tests

Recommendation: Add SQL-level integration tests to verify the function works end-to-end in Spark SQL queries. Check if there's a test suite for Spark functions where this should be added.

Summary

Overall Assessment: This is a solid implementation with good test coverage and clear code. The main issues are:

  1. Missing unsigned integer support (or explicit rejection)
  2. Minor inconsistency with error handling patterns
  3. Signature may be too permissive

Recommended Actions:

  1. ✅ Add unsigned integer handling or explicitly reject them
  2. ✅ Use assert_eq_or_internal_err! or take_function_args for consistency
  3. ✅ Verify Float16 support in Spark
  4. ℹ️ Consider adding SQL-level integration tests
  5. ℹ️ Create a follow-up issue for ANSI mode support if not already tracked

Great work overall! The code is well-tested and follows good practices. 🎉


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the negative Spark-compatible mathematical function. The implementation correctly handles various numeric types, including integers, floats, and decimals, applying wrapping negation for integers and decimals to align with Spark's legacy mode behavior. The code includes comprehensive unit tests for scalar and array inputs, covering edge cases like MIN/MAX integer values, NaN, and Infinity for floats, and null handling. The integration into the mod.rs file is also correctly done.

Comment on lines +33 to +40
/// TODOs:
/// - Spark's ANSI-compliant dialect, when off (i.e. `spark.sql.ansi.enabled=false`),
/// negating the minimal value of a signed integer wraps around.
/// For example: negative(i32::MIN) returns i32::MIN (wraps instead of error).
/// This is the current implementation (legacy mode only).
/// - Spark's ANSI mode (when `spark.sql.ansi.enabled=true`) should throw an
/// ARITHMETIC_OVERFLOW error on integer overflow instead of wrapping.
/// This is not yet implemented - all operations currently use wrapping behavior.


medium

The TODOs clearly outline the current behavior (Spark legacy mode with wrapping negation) and the future work required for ANSI-compliant integer overflow handling. It's good that this is explicitly documented. Please ensure there's a tracking issue for implementing the ANSI mode behavior to throw ARITHMETIC_OVERFLOW on integer overflow.

Comment on lines +159 to +160
DataType::Float16 => simple_negative_array!(array, Float16Array),
DataType::Float32 => simple_negative_array!(array, Float32Array),


medium

The Float16 data type is handled in the spark_negative function, but there are no dedicated test cases for Float16Array or ScalarValue::Float16 in the tests module. Adding specific tests for Float16 would improve test coverage and ensure correctness for this type.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@datafusion/spark/src/function/math/negative.rs`:
- Around line 149-172: The ColumnarValue::Array branch for Spark NEGATIVE lacks
handling for unsigned integer types, causing unsigned numerics to hit the "Not
supported datatype" path; update the match arms in the array branch (the match
on array.data_type()) to add DataType::UInt8, DataType::UInt16, DataType::UInt32
and DataType::UInt64 and call wrapping_negative_array! for each, and mirror
those additions in the scalar handling code (the ColumnarValue::Scalar branch)
by adding DataType::UInt8/16/32/64 cases that use wrapping_negative_scalar! so
unsigned integers are handled consistently with signed integers.
🧹 Nitpick comments (2)
datafusion/spark/src/function/math/negative.rs (2)

74-76: Consider defensive bounds check in return_type.

While Signature::numeric(1, ...) should ensure exactly one argument, accessing arg_types[0] without a bounds check could panic if the framework's validation is bypassed or if the signature is changed in the future.

♻️ Suggested defensive check
     fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-        Ok(arg_types[0].clone())
+        arg_types
+            .first()
+            .cloned()
+            .ok_or_else(|| DataFusionError::Internal("negative requires 1 argument".to_string()))
     }

283-299: Consider adding Float16 scalar tests.

The Float32 and Float64 scalar tests are comprehensive, but Float16 is supported in the implementation (lines 159, 184-186) without corresponding test coverage. Adding Float16 test cases would ensure consistency.

🧪 Suggested Float16 test additions
// Add to test_negative_scalar_floats
use half::f16;

test_negative_scalar!(Float16, f16::from_f32(5.5), f16::from_f32(-5.5));
test_negative_scalar!(Float16, f16::from_f32(-5.5), f16::from_f32(5.5));
test_negative_scalar!(Float16, f16::INFINITY, f16::NEG_INFINITY);
test_negative_scalar!(Float16, f16::NEG_INFINITY, f16::INFINITY);

Comment on lines +149 to +172
ColumnarValue::Array(array) => match array.data_type() {
    DataType::Null => Ok(args[0].clone()),

    // Signed integers - use wrapping negation (Spark legacy mode behavior)
    DataType::Int8 => wrapping_negative_array!(array, Int8Array),
    DataType::Int16 => wrapping_negative_array!(array, Int16Array),
    DataType::Int32 => wrapping_negative_array!(array, Int32Array),
    DataType::Int64 => wrapping_negative_array!(array, Int64Array),

    // Floating point - simple negation (no overflow possible)
    DataType::Float16 => simple_negative_array!(array, Float16Array),
    DataType::Float32 => simple_negative_array!(array, Float32Array),
    DataType::Float64 => simple_negative_array!(array, Float64Array),

    // Decimal types - wrapping negation
    DataType::Decimal128(_, _) => {
        wrapping_negative_array!(array, Decimal128Array)
    }
    DataType::Decimal256(_, _) => {
        wrapping_negative_array!(array, Decimal256Array)
    }

    dt => internal_err!("Not supported datatype for Spark NEGATIVE: {dt}"),
},


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if other Spark math functions handle unsigned integers
rg -n "UInt8|UInt16|UInt32|UInt64" datafusion/spark/src/function/math/ --type rust

Repository: martin-augment/datafusion

Length of output: 1365


🏁 Script executed:

#!/bin/bash
# Read the full negative.rs file to see the complete implementation
cat -n datafusion/spark/src/function/math/negative.rs | head -200

Repository: martin-augment/datafusion

Length of output: 8881


🏁 Script executed:

#!/bin/bash
# Check the Signature::numeric definition to understand what types it includes
rg -A 5 "fn numeric" datafusion/expr/src/signature.rs | head -20

Repository: martin-augment/datafusion

Length of output: 141


🏁 Script executed:

#!/bin/bash
# Find where Signature::numeric is defined
rg -n "fn numeric" --type rust

Repository: martin-augment/datafusion

Length of output: 595


🏁 Script executed:

#!/bin/bash
# Check the Signature::numeric implementation
sed -n '1192,1220p' datafusion/expr-common/src/signature.rs

Repository: martin-augment/datafusion

Length of output: 993


🏁 Script executed:

#!/bin/bash
# Check what TypeSignature::Numeric includes
rg -A 20 "Numeric\(" datafusion/expr-common/src/signature.rs | grep -A 20 "impl.*TypeSignature\|fn matches_numeric"

Repository: martin-augment/datafusion

Length of output: 1008


🏁 Script executed:

#!/bin/bash
# Search for where TypeSignature::Numeric is handled in type checking
rg -B 5 -A 10 "TypeSignature::Numeric" datafusion/expr-common/src/type_coercion/signature.rs | head -50

Repository: martin-augment/datafusion

Length of output: 162


🏁 Script executed:

#!/bin/bash
# Find files that validate TypeSignature::Numeric
fd -t f "signature" --type-list=rust | xargs rg -l "TypeSignature::Numeric" | head -5

Repository: martin-augment/datafusion

Length of output: 451


🏁 Script executed:

#!/bin/bash
# Check how TypeSignature::Numeric is validated in function type checking
rg -B 3 -A 15 "TypeSignature::Numeric" datafusion/expr/src/type_coercion/functions.rs | head -80

Repository: martin-augment/datafusion

Length of output: 2849


🏁 Script executed:

#!/bin/bash
# Find NativeType definition and is_numeric method
rg -B 5 -A 15 "fn is_numeric" --type rust

Repository: martin-augment/datafusion

Length of output: 2902


Add support for unsigned integer types (UInt8, UInt16, UInt32, UInt64).

The function signature accepts unsigned integer types via Signature::numeric(), but the implementation rejects them with "Not supported datatype for Spark NEGATIVE". This is inconsistent with other Spark math functions (abs, rint) which explicitly handle unsigned integers. Add cases for UInt8, UInt16, UInt32, and UInt64 using wrapping_negative_array! macro (lines 149-172) and wrapping_negative_scalar! macro (lines 173+) to match the pattern for signed integers.

🤖 Prompt for AI Agents
In `@datafusion/spark/src/function/math/negative.rs` around lines 149 - 172, The
ColumnarValue::Array branch for Spark NEGATIVE lacks handling for unsigned
integer types, causing unsigned numerics to hit the "Not supported datatype"
path; update the match arms in the array branch (the match on array.data_type())
to add DataType::UInt8, DataType::UInt16, DataType::UInt32 and DataType::UInt64
and call wrapping_negative_array! for each, and mirror those additions in the
scalar handling code (the ColumnarValue::Scalar branch) by adding
DataType::UInt8/16/32/64 cases that use wrapping_negative_scalar! so unsigned
integers are handled consistently with signed integers.
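
Before wiring in the unsigned arms exactly as described, it may be worth double-checking the semantics the reviewers question above: Rust's wrapping_neg is well defined for unsigned integers, but it maps any non-zero value to a large positive one. A self-contained sketch (not the PR's code) of that behavior:

fn main() {
    // wrapping_neg on unsigned types computes 0.wrapping_sub(x), so any
    // non-zero input wraps to a large value rather than becoming "negative".
    assert_eq!(1u32.wrapping_neg(), u32::MAX);
    assert_eq!(0u32.wrapping_neg(), 0);
}

Whether that matches Spark (which has no native unsigned types), or whether unsigned inputs should instead be rejected or cast, is the open question raised earlier in this review.
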


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

))
})?;
let result: $ARRAY_TYPE = array.unary(|x| x.wrapping_neg());
Ok(ColumnarValue::Array(Arc::new(result)))


Decimal array negation loses precision and scale metadata

High Severity

The wrapping_negative_array! macro calls array.unary(|x| x.wrapping_neg()) without calling .with_data_type(input.data_type().clone()) afterward. Arrow's unary() method does not preserve the original data type metadata - for decimal arrays, this means precision and scale are lost and reset to defaults. The similar make_wrapping_abs_function! macro in datafusion/functions/src/math/abs.rs explicitly calls .with_data_type() for this reason. Decimal array results will have incorrect precision/scale, potentially causing type mismatches or incorrect query results.
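
A minimal, self-contained sketch of the issue and the fix Bugbot describes, assuming the arrow crate's Decimal128Array, unary, and with_data_type APIs (this mirrors the pattern in abs.rs rather than quoting this PR's macro):

use arrow::array::{Array, Decimal128Array};

fn main() {
    // Decimal128(10, 2): values are stored as scaled i128 integers, e.g. 12345 == 123.45.
    let input = Decimal128Array::from(vec![Some(12345i128), None])
        .with_precision_and_scale(10, 2)
        .unwrap();

    // unary() alone yields an array with the type's default precision/scale,
    // losing the input's (10, 2).
    let lossy: Decimal128Array = input.unary(|x: i128| x.wrapping_neg());
    assert_ne!(lossy.data_type(), input.data_type());

    // Re-attaching the input's data type, as abs.rs does, keeps the metadata.
    let fixed: Decimal128Array = input
        .unary(|x: i128| x.wrapping_neg())
        .with_data_type(input.data_type().clone());
    assert_eq!(fixed.data_type(), input.data_type());
}
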

Additional Locations (1)


@augmentcode

augmentcode bot commented Jan 26, 2026

🤖 Augment PR Summary

Summary: Adds Spark-compatible negative(expr) (unary minus) to the DataFusion Spark function library.

Changes:

  • Registers a new math module negative and exposes it via expr_fn::negative and the math functions() list.
  • Implements SparkNegative as a ScalarUDFImpl with numeric signature and same-as-input return type.
  • Supports columnar and scalar execution for signed integers (wrapping negation), floats (simple negation), and decimals (wrapping negation).
  • Adds unit tests covering integer wrapping on MIN values, float infinities/NaN handling, decimal negation, NULL propagation, and argument-count validation.

Technical Notes: Behavior is intentionally “legacy Spark” for signed integer overflow (wraps) with a TODO to add ANSI-mode overflow errors later.

🤖 Was this summary useful? React with 👍 or 👎


@augmentcode augmentcode bot left a comment


Review completed. 2 suggestions posted.


Comment augment review to trigger a new review at any time.

impl SparkNegative {
    pub fn new() -> Self {
        Self {
            signature: Signature::numeric(1, Volatility::Immutable),


Signature::numeric likely accepts unsigned integer types (e.g. UInt32), but spark_negative doesn’t handle them and will error at runtime. Consider tightening the signature (or adding coercion/handling) so callers don’t get a late "Not supported datatype" failure for numeric-but-unsigned inputs.


🤖 Was this useful? React with 👍 or 👎

make_udf_function!(width_bucket::SparkWidthBucket, width_bucket);
make_udf_function!(trigonometry::SparkCsc, csc);
make_udf_function!(trigonometry::SparkSec, sec);
make_udf_function!(negative::SparkNegative, negative);


Most Spark math functions appear to have sqllogictest coverage under datafusion/sqllogictest/test_files/spark/math; it might be worth adding a negative.slt to lock in Spark parity (NULL / NaN / decimal / integer-min wrapping behavior).


🤖 Was this useful? React with 👍 or 👎
