
20006: feat(spark): Adds negative spark function #214

Open

martin-augment wants to merge 17 commits into main from pr-20006-2026-01-28-09-28-53

Conversation

@martin-augment (Owner)

20006: To review by AI

@gemini-code-assist

Summary of Changes

Hello @martin-augment, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new Spark-compatible negative scalar function, which computes the negation (unary minus) of its input. The implementation supports a wide range of numeric and interval data types, ensuring consistency with Spark's legacy behavior for integer overflow by using wrapping negation. This enhancement expands the mathematical functions available in the Spark dialect and includes comprehensive test coverage to validate its behavior across various inputs and edge cases.

Highlights

  • New Spark Function: A new Spark-compatible negative scalar function has been introduced, which computes the negation (unary minus) of its input.
  • Data Type Support: The function supports a wide range of numeric data types including signed integers (Int8, Int16, Int32, Int64), floating-point numbers (Float16, Float32, Float64), decimal types (Decimal32, Decimal64, Decimal128, Decimal256), and interval types (IntervalYearMonth, IntervalDayTime, IntervalMonthDayNano).
  • Integer Overflow Behavior: For signed integers, the implementation uses wrapping negation, aligning with Spark's legacy mode behavior where negating the minimum value of a signed integer wraps around (e.g., negative(i32::MIN) returns i32::MIN); a short sketch follows this list. ANSI mode error handling for overflow is noted as not yet implemented.
  • Comprehensive Testing: Extensive SQL logic tests have been added to cover various scenarios, including different data types, NULL values, expressions, nested calls, and edge cases like minimum integer values, infinity, and NaN for floating-point numbers.
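
As a minimal illustration of that wrapping behavior (plain Rust, standard library only; not code from the PR):

fn main() {
    // Ordinary values negate as expected.
    assert_eq!(5i32.wrapping_neg(), -5);
    // i32::MIN has no positive counterpart in i32, so negation wraps
    // back to i32::MIN: Spark's legacy (non-ANSI) behavior.
    assert_eq!(i32::MIN.wrapping_neg(), i32::MIN);
    // Floats use plain negation; wrapping is not involved.
    assert_eq!(-f64::INFINITY, f64::NEG_INFINITY);
}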

coderabbitai bot commented Jan 28, 2026

Walkthrough

The PR adds a new NEGATIVE unary function to DataFusion's Spark implementation. It includes a new module with a SparkNegative UDF struct implementing scalar negation across multiple data types (integers, floats, decimals, intervals, and arrays). Integer and interval types use wrapping behavior, while floating-point types use standard negation. The function returns NULL for NULL inputs and errors for unsupported types. The implementation integrates with DataFusion's ScalarUDFImpl interface. A comprehensive test suite covers various data types and edge cases including wrap-around behavior and special floating-point values.
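
For orientation, here is a hedged sketch of the general shape such a UDF takes. Trait and method names follow recent DataFusion versions and may differ slightly across releases; the Int32-only invoke body is illustrative, and the PR's negative.rs is the authoritative implementation.

use std::any::Any;

use arrow::datatypes::DataType;
use datafusion_common::{not_impl_err, Result, ScalarValue};
use datafusion_expr::{
    ColumnarValue, ScalarFunctionArgs, ScalarUDFImpl, Signature, Volatility,
};

#[derive(Debug)]
struct SparkNegative {
    signature: Signature,
}

impl SparkNegative {
    fn new() -> Self {
        Self {
            // Accept exactly one argument of any type; unsupported types
            // are rejected at invocation time, as the PR does.
            signature: Signature::any(1, Volatility::Immutable),
        }
    }
}

impl ScalarUDFImpl for SparkNegative {
    fn as_any(&self) -> &dyn Any {
        self
    }
    fn name(&self) -> &str {
        "negative"
    }
    fn signature(&self) -> &Signature {
        &self.signature
    }
    fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
        // negative(x) keeps the input type (Int32 -> Int32, and so on).
        Ok(arg_types[0].clone())
    }
    fn invoke_with_args(&self, args: ScalarFunctionArgs) -> Result<ColumnarValue> {
        // Illustrative: only the Int32 scalar arm, with wrapping negation
        // and NULL propagation via Option::map.
        match &args.args[0] {
            ColumnarValue::Scalar(ScalarValue::Int32(v)) => Ok(ColumnarValue::Scalar(
                ScalarValue::Int32(v.map(|x| x.wrapping_neg())),
            )),
            _ => not_impl_err!("sketch covers only Int32 scalars"),
        }
    }
}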


claude bot commented Jan 28, 2026

Pull Request Review: feat(spark): Adds negative spark function

Thank you for implementing the negative function for Spark compatibility! I've reviewed the code and found it to be well-implemented overall. Here are my detailed findings:

Strengths

  1. Comprehensive Type Coverage: The implementation handles all relevant numeric types (Int8, Int16, Int32, Int64, Float16, Float32, Float64, Decimal types, and Interval types).

  2. Correct Wrapping Behavior: Properly implements wrapping negation for signed integers using .wrapping_neg(), which matches Spark's legacy mode behavior (non-ANSI).

  3. Proper NULL Handling: Correctly returns NULL for NULL inputs and preserves nullability in array operations.

  4. Special Value Handling: Correctly handles NaN and infinity for floating-point types.

  5. Excellent Test Coverage: The SQL logic tests (negative.slt) are comprehensive, covering:

    • All numeric types
    • Edge cases (MIN values, infinity, NaN)
    • NULL handling
    • Expression composition
    • Wrap-around behavior
  6. Good Documentation: Clear comments explaining the ANSI mode behavior and linking to the relevant GitHub issue (Support ANSI mode for negate function apache/datafusion#20034).

  7. Consistent Code Style: Follows the patterns established in other Spark math functions.


🐛 Issues Found

1. Missing Unsigned Integer Handling

Location: datafusion/spark/src/function/math/negative.rs:90-95

DataType::Null
| DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64 => Ok(args[0].clone()),

Issue: For unsigned integer types, the function simply returns the input unchanged. This is incorrect - Spark's negative() should negate unsigned integers by converting them to signed types.

Expected Behavior:

  • negative(5::tinyint unsigned) should return -5
  • The result type should be promoted to a signed type with sufficient range

Recommendation: Either implement proper unsigned integer negation with type promotion, or add a comment explaining why these types are not supported if that's intentional.


2. Missing Unit Tests

Location: datafusion/spark/src/function/math/negative.rs

Issue: The implementation lacks Rust unit tests (no #[cfg(test)] module). While SQL logic tests exist, unit tests provide:

  • Faster feedback during development
  • More granular testing of edge cases
  • Better documentation of expected behavior
  • Easier debugging of specific scenarios

Recommendation: Add a #[cfg(test)] module with unit tests similar to those in modulus.rs or hex.rs; a sketch follows this list. Example tests should include:

  • Basic integer negation (array and scalar)
  • Floating-point negation with NaN/Infinity
  • Decimal negation
  • Interval negation
  • NULL handling
  • Wrapping behavior for MIN values
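
A hedged sketch of what such a module could look like, using only the standard library (real tests would exercise the SparkNegative invoke path rather than raw primitives):

#[cfg(test)]
mod tests {
    #[test]
    fn wrapping_negation_of_min_values() {
        // Negating MIN wraps back to MIN (Spark legacy mode).
        assert_eq!(i8::MIN.wrapping_neg(), i8::MIN);
        assert_eq!(i64::MIN.wrapping_neg(), i64::MIN);
    }

    #[test]
    fn float_negation_of_special_values() {
        // Negating NaN stays NaN; negating infinity flips its sign.
        assert!((-f64::NAN).is_nan());
        assert_eq!(-f64::INFINITY, f64::NEG_INFINITY);
    }
}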

3. Incomplete Argument Validation

Location: datafusion/spark/src/function/math/negative.rs:87

let [_] = take_function_args("negative", args)?;

Issue: The underscore pattern ignores the extracted argument but then accesses args[0] directly. While functional, this is inconsistent with patterns in other functions.

Recommendation: Either use the extracted argument or use a more explicit validation pattern like:

if args.len() != 1 {
    return exec_err!("negative expects exactly one argument");
}
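
For the first option, a small hedged sketch of binding the extracted argument; take_function_args is the helper the PR already imports, while the wrapper function here is purely illustrative:

use datafusion_common::utils::take_function_args;
use datafusion_common::Result;
use datafusion_expr::ColumnarValue;

// Destructure exactly one argument and use the binding directly,
// instead of validating with `[_]` and then indexing args[0].
fn single_arg(args: &[ColumnarValue]) -> Result<&ColumnarValue> {
    let [arg] = take_function_args("negative", args)?;
    Ok(arg)
}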

⚠️ Code Quality Suggestions

4. Code Duplication

The array and scalar handling code has significant duplication. Each data type is handled twice with nearly identical logic. Consider:

  • Extracting common negation logic into helper functions
  • Using macros to reduce boilerplate (if appropriate for the codebase)

However, I note that this pattern is consistent with other functions in the codebase, so this may be an intentional design choice for clarity.
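
As a hedged illustration of the macro route (the macro name is invented, not from the codebase):

// One macro stamps out the repeated "wrapping negation with NULL
// propagation" logic that otherwise appears once per integer type.
macro_rules! wrapping_negate {
    ($val:expr) => {
        $val.map(|x| x.wrapping_neg())
    };
}

fn main() {
    let a: Option<i32> = Some(7);
    let b: Option<i64> = None;
    assert_eq!(wrapping_negate!(a), Some(-7));
    assert_eq!(wrapping_negate!(b), None);
}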


5. Unused Import

Location: datafusion/spark/src/function/math/negative.rs:21

use bigdecimal::num_traits::WrappingNeg;

Issue: This import appears to be unused. The code uses the built-in .wrapping_neg() method on primitive types, not the trait from bigdecimal.

Recommendation: Remove the unused import or add a comment explaining why it's needed if there's a non-obvious reason.


📊 Performance Considerations

The implementation is efficient:

  • Uses unary() operations for arrays, which is optimal (see the sketch below)
  • No unnecessary allocations
  • Direct arithmetic operations for scalars

No performance concerns identified.
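
A minimal sketch of that unary kernel, assuming the arrow crate; it applies a per-element closure while preserving the null mask:

use arrow::array::{Array, Int32Array, PrimitiveArray};
use arrow::compute::unary;
use arrow::datatypes::Int32Type;

fn main() {
    let input = Int32Array::from(vec![Some(1), None, Some(i32::MIN)]);
    // Wrapping negation per element; nulls stay null, and the only
    // allocation is the output buffer.
    let negated: PrimitiveArray<Int32Type> = unary(&input, |x: i32| x.wrapping_neg());
    assert_eq!(negated.value(0), -1);
    assert!(negated.is_null(1));
    assert_eq!(negated.value(2), i32::MIN);
}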


🔒 Security Considerations

The implementation is safe:

  • No unsafe code beyond the inherent array operations
  • Proper wrapping behavior prevents undefined overflow
  • No risk of panic or undefined behavior

No security concerns identified.


📝 Minor Suggestions

  1. Formatting: The export_functions! macro call at lines 66-70 has inconsistent formatting compared to others in the file. Consider:

    export_functions!((negative, "Returns the negation of expr (unary minus).", arg1));
  2. Interval Type Documentation: Add a brief comment explaining why interval types support negation (it's not immediately obvious why you'd negate a time interval).


Summary

Overall Assessment: This is a solid implementation with good test coverage. The main issues are:

  1. Critical: Incorrect handling of unsigned integers (should be fixed)
  2. Important: Missing unit tests (should be added)
  3. Minor: Unused import and small style inconsistencies

Recommendation: Request changes to address the unsigned integer handling and add unit tests before merging.


📋 Checklist for Author

  • Fix unsigned integer handling (or document why it's not supported)
  • Add Rust unit tests in a #[cfg(test)] module
  • Remove unused bigdecimal::num_traits::WrappingNeg import
  • Consider consistent formatting in the export_functions! macro

Great work on this implementation! The comprehensive SQL tests are particularly impressive. 🎉

@gemini-code-assist bot left a comment

Code Review

This pull request adds a Spark-compatible negative function. The implementation is mostly correct, covering various numeric, decimal, and interval types, and includes comprehensive tests. However, I've identified a significant issue with the handling of unsigned integer types, where the function either returns an incorrect value or a non-user-friendly error. My review comments focus on correcting this behavior to ensure the function is robust and provides clear feedback on unsupported operations.

Comment on lines +92 to +96
DataType::Null
| DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64 => Ok(args[0].clone()),


high

The current implementation for unsigned integer types (UInt*) in array form incorrectly returns the original value. For example, negative(1) would return 1. Since negating a positive unsigned integer is not a valid operation within its type, this should result in an error. This change corrects the behavior to return an execution error, which is more appropriate and aligns with expectations for such an operation.

Suggested change
DataType::Null
| DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64 => Ok(args[0].clone()),
DataType::Null => Ok(args[0].clone()),
DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64 => {
    return exec_err!("The operator 'negative' is not defined for unsigned integers.");
},

@martin-augment (Owner Author):

value:good-but-wont-fix; category:bug; feedback: The Gemini AI reviewer is correct! Doing nothing for unsigned integers would be surprising, but none of the other DB solutions support this either: Spark does not support unsigned integers, and Postgres/DuckDB do not support a negative() function.

})),
)),

dt => not_impl_err!("Not supported datatype for Spark NEGATIVE: {dt}"),


high

The scalar implementation for negative does not handle unsigned integer types, causing it to fall through to a not_impl_err!. This is inconsistent with the array implementation and not user-friendly. It should return a proper execution error. This change adds handling for unsigned scalar values to return an execution error, making the behavior consistent for both scalar and array inputs.

            ScalarValue::UInt8(Some(_))
            | ScalarValue::UInt16(Some(_))
            | ScalarValue::UInt32(Some(_))
            | ScalarValue::UInt64(Some(_)) => {
                return exec_err!("The operator 'negative' is not defined for unsigned integers.");
            }
            dt => not_impl_err!("Not supported datatype for Spark NEGATIVE: {dt}"),

@martin-augment (Owner Author):

value:useful; category:bug; feedback: The Gemini AI reviewer is correct! Unsigned integers are accepted for arrays but not for scalars. This is inconsistent, and scalar support should be added; doing so prevents an error when unsigned integers are passed as scalar values.

use arrow::datatypes::{DataType, IntervalDayTime, IntervalMonthDayNano};
use bigdecimal::num_traits::WrappingNeg;
use datafusion_common::utils::take_function_args;
use datafusion_common::{Result, ScalarValue, not_impl_err};


medium

To return a user-facing execution error for invalid inputs, exec_err should be imported from datafusion_common. This will be used to handle unsupported operations on unsigned integers.

Suggested change
use datafusion_common::{Result, ScalarValue, not_impl_err};
use datafusion_common::{exec_err, not_impl_err, Result, ScalarValue};

augmentcode bot commented Jan 28, 2026

🤖 Augment PR Summary

Summary: Adds a Spark-compatible negative (unary minus) scalar function to the DataFusion Spark function library.

Changes:

  • Introduced SparkNegative UDF in spark/src/function/math/negative.rs that computes negation for numeric values using Arrow array/scalar kernels.
  • Registered and exported the new function from the Spark math module (math/mod.rs) so it is available via the Spark function registry and expr_fn.
  • Expanded Spark sqllogictest coverage for negative across ints, floats (including ±0.0, NaN, ±Infinity), decimals, NULL propagation, nested calls, and expression usage.
  • Added wrap-around test cases for negating minimum signed integer values (legacy Spark behavior).

Technical Notes: Integer negation uses wrapping semantics to match Spark legacy (non-ANSI) behavior; ANSI-mode overflow errors are documented as not implemented yet.


@augmentcode bot left a comment

Review completed. 1 suggestion posted.


match &args[0] {
    ColumnarValue::Array(array) => match array.data_type() {
        DataType::Null

For UInt* arrays this returns the input unchanged, so negative becomes a no-op for unsigned columns, while unsigned scalars fall through to not_impl_err!, meaning behavior depends on whether the argument is a scalar or an array.


@martin-augment (Owner Author):

value:useful; category:bug; feedback: The Augment AI reviewer is correct! Unsigned integers are accepted for arrays but not for scalars. This is inconsistent, and scalar support should be added; doing so prevents an error when unsigned integers are passed as scalar values.

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@datafusion/spark/src/function/math/negative.rs`:
- Around line 92-96: The scalar match in negative.rs is missing arms for
unsigned integer ScalarValue variants, causing UInt scalars to fall through to
not_impl_err! while UInt arrays are returned unchanged; add match arms for
ScalarValue::UInt8, ScalarValue::UInt16, ScalarValue::UInt32, and
ScalarValue::UInt64 to the scalar branch (the same section that currently
handles Int*/Float*/Null) and return Ok(args[0].clone()) for each so scalar
UInts behave consistently with the array branch (mirror how abs.rs handles UInt
scalar cases).
🧹 Nitpick comments (1)
datafusion/sqllogictest/test_files/spark/math/negative.slt (1)

105-116: Add interval test cases to match new interval support.
Lines 105-116 extend coverage to decimals, but there are still no tests for IntervalYearMonth/IntervalDayTime/IntervalMonthDayNano even though the implementation supports them. Adding a few interval cases would prevent regressions and validate the interval branch.
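
Until such tests land, a hedged Rust sketch of what negating one of these interval types amounts to (component-wise wrapping negation; the helper name is illustrative):

use arrow::datatypes::IntervalDayTime;

// Negate an interval by negating each component with wrapping semantics.
fn negate_interval(v: IntervalDayTime) -> IntervalDayTime {
    IntervalDayTime::new(v.days.wrapping_neg(), v.milliseconds.wrapping_neg())
}

fn main() {
    let day_and_a_half = IntervalDayTime::new(1, 43_200_000);
    let negated = negate_interval(day_and_a_half);
    assert_eq!(negated.days, -1);
    assert_eq!(negated.milliseconds, -43_200_000);
}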

Comment on lines +92 to +96
DataType::Null
| DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64 => Ok(args[0].clone()),

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain (condensed): verification scripts read negative.rs (the array branch around lines 85-105 and the scalar branch from line 192 onward) and searched the other Spark math UDFs for UInt handling, confirming that the scalar branch has no UInt arms.

Add missing UInt cases to scalar branch for consistent behavior.

Array branch (lines 92-96) passes unsigned integer types unchanged, but the scalar branch (lines 192+) lacks any ScalarValue::UInt* cases and falls through to not_impl_err!, creating inconsistent behavior. When a UInt array flows through, it returns unchanged; a UInt scalar fails instead. For consistency with other math functions like abs.rs, add scalar UInt handling:

ScalarValue::Int64(Some(v)) => {
    let result = v.wrapping_neg();
    Ok(ColumnarValue::Scalar(ScalarValue::Int64(Some(result))))
}

+ // Unsigned integers - return unchanged
+ ScalarValue::UInt8(_)
+ | ScalarValue::UInt16(_)
+ | ScalarValue::UInt32(_)
+ | ScalarValue::UInt64(_) => Ok(args[0].clone()),
+
// Floating point - simple negation
ScalarValue::Float16(Some(v)) => {

@martin-augment (Owner Author):

value:useful; category:bug; feedback: The CodeRabbit AI reviewer is correct! Unsigned integers are accepted for arrays but not for scalars. This is inconsistent, and scalar support should be added; doing so prevents an error when unsigned integers are passed as scalar values.

@martin-augment (Owner Author)

1. Missing Unsigned Integer Handling

Location: datafusion/spark/src/function/math/negative.rs:90-95

DataType::Null
| DataType::UInt8
| DataType::UInt16
| DataType::UInt32
| DataType::UInt64 => Ok(args[0].clone()),

Issue: For unsigned integer types, the function simply returns the input unchanged. This is incorrect - Spark's negative() should negate unsigned integers by converting them to signed types.

Expected Behavior:

  • negative(5::tinyint unsigned) should return -5
  • The result type should be promoted to a signed type with sufficient range

Recommendation: Either implement proper unsigned integer negation with type promotion, or add a comment explaining why these types are not supported if that's intentional.

value:annoying; category:bug; feedback: The Claude AI reviewer is not correct! Neither Apache Spark nor PostgreSQL/DuckDB supports negation of "unsigned" types in its SQL.

@martin-augment (Owner Author)

2. Missing Unit Tests

Location: datafusion/spark/src/function/math/negative.rs

Issue: The implementation lacks Rust unit tests (no #[cfg(test)] module). While SQL logic tests exist, unit tests provide:

  • Faster feedback during development
  • More granular testing of edge cases
  • Better documentation of expected behavior
  • Easier debugging of specific scenarios

Recommendation: Add a #[cfg(test)] module with unit tests similar to those in modulus.rs or hex.rs. Example tests should include:

  • Basic integer negation (array and scalar)
  • Floating-point negation with NaN/Infinity
  • Decimal negation
  • Interval negation
  • NULL handling
  • Wrapping behavior for MIN values

value:incorrect-but-reasonable; category:bug; feedback: The Claude AI reviewer is not correct! The Apache DataFusion team prefers tests to be implemented in SQL logic test (.slt) form rather than as unit tests. SLT tests do not need re-compilation, so they are faster to implement and iterate on.

@martin-augment (Owner Author)

105-116: Add interval test cases to match new interval support.
Lines 105-116 extend coverage to decimals, but there are still no tests for IntervalYearMonth/IntervalDayTime/IntervalMonthDayNano even though the implementation supports them. Adding a few interval cases would prevent regressions and validate the interval branch.

value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct! Support for interval types was added after a request from a maintainer, but there are no SLT tests for these types. Adding tests would prevent future regressions that might otherwise go unnoticed.
