20006: feat(spark): Adds negative spark function #213
base: main
Conversation
Summary of Changes
Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed: this pull request adds a new Spark-compatible `negative` function to the DataFusion Spark module.
Walkthrough
This pull request introduces a new unary negation function (SparkNegative) to the DataFusion Spark module. A new `negative` module is added and registered in mod.rs.
Pull Request Review: Spark Negative Function Implementation

Summary
This PR implements the Spark-compatible `negative` function.

Code Quality & Best Practices ✅
Strengths:

Issues to Address:
1. Inconsistent Error Handling Pattern
Code Review
This pull request introduces the Spark-compatible `negative` mathematical function. The implementation correctly handles various numeric types, including integers, floats, and decimals, applying wrapping negation to integers and decimals to match Spark's legacy-mode behavior. The code includes comprehensive unit tests for scalar and array inputs, covering edge cases such as MIN/MAX integer values, NaN and Infinity for floats, and null handling. The integration into mod.rs is also done correctly.
```rust
/// TODOs:
/// - Spark's ANSI-compliant dialect, when off (i.e. `spark.sql.ansi.enabled=false`),
///   negating the minimal value of a signed integer wraps around.
///   For example: negative(i32::MIN) returns i32::MIN (wraps instead of error).
///   This is the current implementation (legacy mode only).
/// - Spark's ANSI mode (when `spark.sql.ansi.enabled=true`) should throw an
///   ARITHMETIC_OVERFLOW error on integer overflow instead of wrapping.
///   This is not yet implemented - all operations currently use wrapping behavior.
```
The TODOs clearly outline the current behavior (Spark legacy mode with wrapping negation) and the future work required for ANSI-compliant integer overflow handling. It's good that this is explicitly documented. Please ensure there's a tracking issue for implementing the ANSI mode behavior to throw ARITHMETIC_OVERFLOW on integer overflow.
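For readers unfamiliar with the two modes, here is a standalone Rust illustration (not taken from the PR) of the difference the TODO describes: wrapping negation maps `i32::MIN` back to itself, while a checked negation detects the overflow that ANSI mode would report as ARITHMETIC_OVERFLOW.

```rust
fn main() {
    // Legacy-mode semantics: wrapping negation never fails.
    assert_eq!(5_i32.wrapping_neg(), -5);
    assert_eq!(i32::MIN.wrapping_neg(), i32::MIN); // wraps back to i32::MIN

    // ANSI mode would need an overflow check, e.g. via checked_neg():
    assert_eq!(5_i32.checked_neg(), Some(-5));
    assert_eq!(i32::MIN.checked_neg(), None); // overflow detected; could map to ARITHMETIC_OVERFLOW
}
```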
Actionable comments posted: 1
🧹 Nitpick comments (2)
datafusion/spark/src/function/math/negative.rs (2)
74-76: Consider defensive bounds check in `return_type`.
While `Signature::numeric(1, ...)` should ensure exactly one argument, accessing `arg_types[0]` without a bounds check could panic if the framework's validation is bypassed or if the signature is changed in the future.
♻️ Suggested defensive check
```diff
 fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
-    Ok(arg_types[0].clone())
+    arg_types
+        .first()
+        .cloned()
+        .ok_or_else(|| {
+            DataFusionError::Internal("negative requires 1 argument".to_string())
+        })
 }
```
283-299: Consider adding Float16 scalar tests.
The Float32 and Float64 scalar tests are comprehensive, but Float16 is supported in the implementation (lines 159, 184-186) without corresponding test coverage. Adding Float16 test cases would ensure consistency.
🧪 Suggested Float16 test additions
```rust
// Add to test_negative_scalar_floats
use half::f16;

test_negative_scalar!(Float16, f16::from_f32(5.5), f16::from_f32(-5.5));
test_negative_scalar!(Float16, f16::from_f32(-5.5), f16::from_f32(5.5));
test_negative_scalar!(Float16, f16::INFINITY, f16::NEG_INFINITY);
test_negative_scalar!(Float16, f16::NEG_INFINITY, f16::INFINITY);
```
```rust
ColumnarValue::Array(array) => match array.data_type() {
    DataType::Null => Ok(args[0].clone()),

    // Signed integers - use wrapping negation (Spark legacy mode behavior)
    DataType::Int8 => wrapping_negative_array!(array, Int8Array),
    DataType::Int16 => wrapping_negative_array!(array, Int16Array),
    DataType::Int32 => wrapping_negative_array!(array, Int32Array),
    DataType::Int64 => wrapping_negative_array!(array, Int64Array),

    // Floating point - simple negation (no overflow possible)
    DataType::Float16 => simple_negative_array!(array, Float16Array),
    DataType::Float32 => simple_negative_array!(array, Float32Array),
    DataType::Float64 => simple_negative_array!(array, Float64Array),

    // Decimal types - wrapping negation
    DataType::Decimal128(_, _) => {
        wrapping_negative_array!(array, Decimal128Array)
    }
    DataType::Decimal256(_, _) => {
        wrapping_negative_array!(array, Decimal256Array)
    }

    dt => internal_err!("Not supported datatype for Spark NEGATIVE: {dt}"),
},
```
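The scalar branch referenced throughout the review (`wrapping_negative_scalar!`) is not shown on this page; the following hand-expanded sketch for a single type is an assumption about its shape, not the PR's code.

```rust
use datafusion_common::ScalarValue;

// Hand-expanded sketch of the scalar path for Int32 only: SQL NULL (None)
// passes through untouched, and Some(x) is negated with wrapping semantics.
fn negate_i32_scalar(v: Option<i32>) -> ScalarValue {
    ScalarValue::Int32(v.map(|x| x.wrapping_neg()))
}
```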
🧩 Analysis chain

🏁 Script executed:

```bash
#!/bin/bash
# Check if other Spark math functions handle unsigned integers
rg -n "UInt8|UInt16|UInt32|UInt64" datafusion/spark/src/function/math/ --type rust
```

Repository: martin-augment/datafusion
Length of output: 1365

🏁 Script executed:

```bash
#!/bin/bash
# Read the full negative.rs file to see the complete implementation
cat -n datafusion/spark/src/function/math/negative.rs | head -200
```

Repository: martin-augment/datafusion
Length of output: 8881

🏁 Script executed:

```bash
#!/bin/bash
# Check the Signature::numeric definition to understand what types it includes
rg -A 5 "fn numeric" datafusion/expr/src/signature.rs | head -20
```

Repository: martin-augment/datafusion
Length of output: 141

🏁 Script executed:

```bash
#!/bin/bash
# Find where Signature::numeric is defined
rg -n "fn numeric" --type rust
```

Repository: martin-augment/datafusion
Length of output: 595

🏁 Script executed:

```bash
#!/bin/bash
# Check the Signature::numeric implementation
sed -n '1192,1220p' datafusion/expr-common/src/signature.rs
```

Repository: martin-augment/datafusion
Length of output: 993

🏁 Script executed:

```bash
#!/bin/bash
# Check what TypeSignature::Numeric includes
rg -A 20 "Numeric\(" datafusion/expr-common/src/signature.rs | grep -A 20 "impl.*TypeSignature\|fn matches_numeric"
```

Repository: martin-augment/datafusion
Length of output: 1008

🏁 Script executed:

```bash
#!/bin/bash
# Search for where TypeSignature::Numeric is handled in type checking
rg -B 5 -A 10 "TypeSignature::Numeric" datafusion/expr-common/src/type_coercion/signature.rs | head -50
```

Repository: martin-augment/datafusion
Length of output: 162

🏁 Script executed:

```bash
#!/bin/bash
# Find files that validate TypeSignature::Numeric
fd -t f "signature" --type-list=rust | xargs rg -l "TypeSignature::Numeric" | head -5
```

Repository: martin-augment/datafusion
Length of output: 451

🏁 Script executed:

```bash
#!/bin/bash
# Check how TypeSignature::Numeric is validated in function type checking
rg -B 3 -A 15 "TypeSignature::Numeric" datafusion/expr/src/type_coercion/functions.rs | head -80
```

Repository: martin-augment/datafusion
Length of output: 2849

🏁 Script executed:

```bash
#!/bin/bash
# Find NativeType definition and is_numeric method
rg -B 5 -A 15 "fn is_numeric" --type rust
```

Repository: martin-augment/datafusion
Length of output: 2902
Add support for unsigned integer types (UInt8, UInt16, UInt32, UInt64).
The function signature accepts unsigned integer types via Signature::numeric(), but the implementation rejects them with "Not supported datatype for Spark NEGATIVE". This is inconsistent with other Spark math functions (abs, rint), which explicitly handle unsigned integers. Add cases for UInt8, UInt16, UInt32, and UInt64 using the wrapping_negative_array! macro (lines 149-172) and the wrapping_negative_scalar! macro (lines 173+) to match the pattern for signed integers.
🤖 Prompt for AI Agents
In `@datafusion/spark/src/function/math/negative.rs` around lines 149 - 172, The
ColumnarValue::Array branch for Spark NEGATIVE lacks handling for unsigned
integer types, causing unsigned numerics to hit the "Not supported datatype"
path; update the match arms in the array branch (the match on array.data_type())
to add DataType::UInt8, DataType::UInt16, DataType::UInt32 and DataType::UInt64
and call wrapping_negative_array! for each, and mirror those additions in the
scalar handling code (the ColumnarValue::Scalar branch) by adding
DataType::UInt8/16/32/64 cases that use wrapping_negative_scalar! so unsigned
integers are handled consistently with signed integers.
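To make the suggestion concrete, here is a standalone sketch of what wrapping negation computes over an unsigned Arrow array. It uses arrow-rs's `PrimitiveArray::unary` directly rather than the PR's `wrapping_negative_array!` macro, so treat it as an illustration of the semantics, not the actual patch.

```rust
use arrow::array::{Array, UInt32Array};

// Wrapping negation over unsigned values: 0 stays 0, any other x maps to
// 2^32 - x (e.g. 1 -> u32::MAX). Nulls propagate through `unary`.
fn negate_u32_array(array: &UInt32Array) -> UInt32Array {
    array.unary(|x: u32| x.wrapping_neg())
}

fn main() {
    let input = UInt32Array::from(vec![Some(0), Some(1), None]);
    let out = negate_u32_array(&input);
    assert_eq!(out.value(0), 0);
    assert_eq!(out.value(1), u32::MAX);
    assert!(out.is_null(2)); // nulls pass through unchanged
}
```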
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
```rust
        ))
    })?;
    let result: $ARRAY_TYPE = array.unary(|x| x.wrapping_neg());
    Ok(ColumnarValue::Array(Arc::new(result)))
```
Decimal array negation loses precision and scale metadata
High Severity
The wrapping_negative_array! macro calls array.unary(|x| x.wrapping_neg()) without calling .with_data_type(input.data_type().clone()) afterward. Arrow's unary() method does not preserve the original data type metadata - for decimal arrays, this means precision and scale are lost and reset to defaults. The similar make_wrapping_abs_function! macro in datafusion/functions/src/math/abs.rs explicitly calls .with_data_type() for this reason. Decimal array results will have incorrect precision/scale, potentially causing type mismatches or incorrect query results.
Additional Locations (1)
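As a standalone illustration of this failure mode and the fix (using arrow-rs's `unary`, `with_data_type`, and `with_precision_and_scale`, which do exist; the function below is this comment's sketch, not the PR's macro):

```rust
use arrow::array::{Array, Decimal128Array};

// Re-attaching the input's data type after `unary` preserves the decimal's
// precision and scale, which `unary` alone resets to the type's defaults.
fn negate_decimal128(array: &Decimal128Array) -> Decimal128Array {
    array
        .unary(|x: i128| x.wrapping_neg())
        .with_data_type(array.data_type().clone())
}

fn main() {
    let input = Decimal128Array::from(vec![Some(12345_i128), None])
        .with_precision_and_scale(10, 2)
        .expect("valid precision/scale");
    let negated = negate_decimal128(&input);
    assert_eq!(negated.data_type(), input.data_type()); // Decimal128(10, 2) preserved
    assert_eq!(negated.value(0), -12345_i128);
}
```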
🤖 Augment PR Summary
Summary: Adds a Spark-compatible `negative` function to the DataFusion Spark module.
Technical Notes: Behavior is intentionally "legacy Spark" for signed integer overflow (wraps), with a TODO to add ANSI-mode overflow errors later.
🤖 Was this summary useful? React with 👍 or 👎
```rust
impl SparkNegative {
    pub fn new() -> Self {
        Self {
            signature: Signature::numeric(1, Volatility::Immutable),
```
Signature::numeric likely accepts unsigned integer types (e.g. UInt32), but spark_negative doesn’t handle them and will error at runtime. Consider tightening the signature (or adding coercion/handling) so callers don’t get a late "Not supported datatype" failure for numeric-but-unsigned inputs.
🤖 Was this useful? React with 👍 or 👎
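One hedged option for the tightening this comment suggests: enumerate exactly the types `spark_negative` handles so that unsigned inputs are rejected at plan time. `Signature::uniform` is a real DataFusion constructor; the specific type list below is this sketch's assumption. Note that parameterized Decimal128/Decimal256 types do not fit a fixed list like this, which may be why the PR chose `Signature::numeric`.

```rust
use arrow::datatypes::DataType;
use datafusion_expr::{Signature, Volatility};

// Restrict the accepted argument types to the ones the kernel implements.
fn tightened_signature() -> Signature {
    Signature::uniform(
        1,
        vec![
            DataType::Int8,
            DataType::Int16,
            DataType::Int32,
            DataType::Int64,
            DataType::Float16,
            DataType::Float32,
            DataType::Float64,
        ],
        Volatility::Immutable,
    )
}
```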
```rust
make_udf_function!(width_bucket::SparkWidthBucket, width_bucket);
make_udf_function!(trigonometry::SparkCsc, csc);
make_udf_function!(trigonometry::SparkSec, sec);
make_udf_function!(negative::SparkNegative, negative);
```
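For context, a hedged sketch of how such a macro-generated accessor is typically consumed; the assumption here is that `make_udf_function!` exposes an accessor returning a `ScalarUDF` handle, per the crate's convention.

```rust
use datafusion::logical_expr::ScalarUDF;
use datafusion::prelude::SessionContext;

// Registering the UDF makes it callable from SQL, e.g. SELECT negative(col) FROM t;
fn register_negative(ctx: &SessionContext, udf: &ScalarUDF) {
    ctx.register_udf(udf.clone());
}
```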
20006: To review by AI