
Conversation

@Yuvraj-cyborg
Contributor

Which issue does this PR close?

Closes #19348

Rationale for this change

Previously, pow() on decimal types would error for negative and non-integer exponents with messages like:

  • Arrow error: Arithmetic overflow: Unsupported exp value: -5
  • Compute error: Cannot use non-integer exp

This was a regression from the earlier behavior, where decimals were cast to float before pow(): the efficient integer-based algorithm for computing power on scaled integers cannot handle these cases.

What changes are included in this PR?

  • Modified pow_decimal_int to fall back to pow_decimal_float for negative exponents
  • Modified pow_decimal_float to use an efficient integer path for non-negative integer exponents, otherwise fall back to f64 computation
  • Added a pow_decimal_float_fallback function (see the sketch below) that:
      • converts the decimal to f64
      • computes powf(exp)
      • converts back to the original decimal type with proper scaling
  • Added a decimal_from_i128 helper to convert i128 results back to generic decimal types (needed for Decimal256 support)
  • Updated sqllogictests to expect success for negative/non-integer exponents
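
For illustration, a minimal self-contained sketch of that fallback for the Decimal128 case; the function name, signature, and error handling here are simplified stand-ins, not the exact code in this PR:

// `value` is the scaled integer representation of the decimal and `scale` its
// decimal scale; returns None where the real code would return an error.
fn pow_decimal_float_fallback_i128(value: i128, scale: i8, exp: f64) -> Option<i128> {
    // 1. Convert the scaled integer to its real value as f64 (precision loss accepted).
    let base = value as f64 / 10f64.powi(scale as i32);

    // 2. Compute the power in floating point.
    let result = base.powf(exp);
    if !result.is_finite() {
        return None;
    }

    // 3. Rescale back to the original decimal scale and round.
    let rescaled = (result * 10f64.powi(scale as i32)).round();
    if rescaled.abs() > i128::MAX as f64 {
        return None;
    }
    Some(rescaled as i128)
}

fn main() {
    // power(4.00000, -1) at scale 5 -> 0.25000, i.e. scaled integer 25000
    assert_eq!(pow_decimal_float_fallback_i128(400_000, 5, -1.0), Some(25_000));
    // power(4, 0.5) at scale 0 -> 2
    assert_eq!(pow_decimal_float_fallback_i128(4, 0, 0.5), Some(2));
}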

Are these changes tested?

Yes:

  • Unit tests for pow_decimal_float_fallback covering negative exponents, fractional exponents, and cube roots
  • Updated SQL logic tests in decimal.slt

Are there any user-facing changes?

Yes. The following queries now work instead of returning errors:

-- Negative exponent
SELECT power(4::decimal(38, 5), -1);  -- Returns 0.25

-- Non-integer exponent  
SELECT power(2.5, 4.2);  -- Returns 46.9

-- Square root via power
SELECT power(4::decimal, 0.5);  -- Returns 2

github-actions bot added the sqllogictest (SQL Logic Tests (.slt)) and functions (Changes to functions implementation) labels on Dec 17, 2025
@Yuvraj-cyborg
Contributor Author

@Jefffrey could you review this? Let me know what changes are needed, if any.

    decimal_from_i128::<T>(result_rounded as i128)
}

fn decimal_from_i128<T>(value: i128) -> Result<T, ArrowError>
Member

Is this intentionally not supporting Decimal256 (i.e. i256)?

Contributor Author

Practically, the f64 computation already loses precision past ~15-16 significant digits (values above ~10^15), doesn't it? That's why I thought it wasn't needed.

If you'd like, I can add a Decimal256 path too.
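
For illustration (not from this PR): f64 has a 53-bit significand, roughly 15-16 significant decimal digits, so larger integers no longer round-trip exactly:

fn main() {
    // 19 significant digits: more than f64's 53-bit significand can hold exactly.
    let x: i128 = 1_234_567_890_123_456_789;
    assert_ne!(x as f64 as i128, x);
}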

Contributor

@Jefffrey left a comment

I do wonder, with how many different paths we're introducing here to account for all edge cases, how other systems handle power for numeric/decimal types 🤔

Do they just cast to float anyway, or do they have a similar complexity of checks?

Comment on lines 219 to 238
    if value >= i32::MIN as i128 && value <= i32::MAX as i128 {
        return Ok(T::from(value as i32));
    }

    let is_negative = value < 0;
    let abs_value = value.unsigned_abs();

    let billion = 1_000_000_000u128;
    let mut result = T::from(0);
    let mut multiplier = T::from(1);
    let billion_t = T::from(1_000_000_000);

    let mut remaining = abs_value;
    while remaining > 0 {
        let chunk = (remaining % billion) as i32;
        remaining /= billion;

        let chunk_value = T::from(chunk).mul_checked(multiplier).map_err(|_| {
            ArrowError::ArithmeticOverflow(format!(
                "Overflow while converting {value} to decimal type"
            ))
        })?;

Contributor

I'm really not quite sure what is happening here 🤔

Contributor Author

This converts an i128 result back to the generic type T (which could be i32, i64, i128, or i256). Since we can't directly convert i128 → T for all of those types, we split the i128 into chunks of 10^9 (a billion), each of which fits in an i32, and then reconstruct T as chunk[n] * (10^9)^n + chunk[n-1] * (10^9)^(n-1) + and so on.
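
For illustration, a self-contained sketch of that chunked reconstruction; the trait bounds and the from_i128_chunked name are hypothetical stand-ins for the arrow decimal native type and checked ops used in the PR:

use std::ops::{Add, Mul, Neg};

// Rebuild a wide integer of type T from an i128 in base-10^9 chunks, so only
// i32 -> T conversion is ever required.
fn from_i128_chunked<T>(value: i128) -> T
where
    T: From<i32> + Add<Output = T> + Mul<Output = T> + Neg<Output = T> + Copy,
{
    let is_negative = value < 0;
    let mut remaining = value.unsigned_abs();

    let billion = 1_000_000_000u128;
    let billion_t = T::from(1_000_000_000);

    let mut result = T::from(0);
    let mut multiplier = T::from(1); // (10^9)^n for the current chunk

    while remaining > 0 {
        let chunk = (remaining % billion) as i32;
        remaining /= billion;

        // result += chunk * (10^9)^n
        result = result + T::from(chunk) * multiplier;
        if remaining > 0 {
            multiplier = multiplier * billion_t;
        }
    }

    if is_negative { -result } else { result }
}

fn main() {
    // An i128 round-trips through the chunked path unchanged.
    let v: i128 = -123_456_789_012_345_678_901_234_567;
    assert_eq!(from_i128_chunked::<i128>(v), v);
}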

Contributor

I would prefer a simplified implementation that has no need for this manual conversion from i128 back to i32/i64

Contributor Author

@Jefffrey I used NumCast for this, please review. But i256 doesn't implement NumCast, so I did that part manually.

Let me know what you think of this.

cc: @martin-g

@Yuvraj-cyborg
Contributor Author

I do wonder, with how many different paths we're introducing here to account for all edge cases, how other systems handle power for numeric/decimal types 🤔

Do they just cast to float anyway, or do they have a similar complexity of checks?

In PostgreSQL they cast the decimal to float64, compute pow(), and cast back; they accept the precision loss.

MySQL takes the same approach: it converts DECIMAL to DOUBLE for POW().

@Yuvraj-cyborg force-pushed the pow-with-negative branch 5 times, most recently from 4dda5d8 to 9f79025 on December 23, 2025
@Yuvraj-cyborg
Contributor Author

Hey @Jefffrey, could you review this?

Contributor

@Jefffrey left a comment

Seeing all this logic introduced, I'm beginning to question whether there is actual benefit to having a native log implementation 🤔

Perhaps we should just revert to casting it to float and accept the accuracy loss

Thoughts @theirix ?

"Cannot make unscale factor for {scale} and {exp}"
))
})
<T as From<i32>>::from(10)
Contributor

This looks a little ugly; I recommend checking the existing traits to see if they already have what we need.

See:

@theirix
Contributor

theirix commented Jan 3, 2026

Seeing all this logic introduced, I'm beginning to question whether there is actual benefit to having a native log implementation 🤔

Perhaps we should just revert to casting it to float and accept the accuracy loss

Thoughts @theirix ?

Fair enough, the logic becomes more convoluted.

The original idea was to introduce common decimal operations. Scale-preserving operations like abs, round, gcd, etc., are easy to implement and support. Some other operations with a natural mapping to decimals (like log10, pow10) adjust scales and do not have a natural analogue in the arrow buffer, leading to more complex logic. These operations are typical for data analytics, and applications could benefit from them. So ten-based operations can be calculated precisely, while for the rest and for more complicated operations, of course, it is fine to lose precision using a native float implementation.

First, we should reuse the arrow's foundational primitives as much as possible. If there is an OP_checked, it's better to piggyback on it. A few num traits were recently added to decimals in arrow-buffer, making it easier for us.

Second, I believe more logic should be isolated in calculate_binary_decimal_math, especially for handling different scales, to shift responsibility from UDF implementers (like pow) to middleware. It is in progress, and I'll submit it shortly.
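
For context on the exact-versus-float split, a minimal sketch of why pow with a non-negative integer exponent can stay exact on the scaled integer: (v * 10^-s)^n = v^n * 10^-(s*n), so the result is v^n at scale s*n. This is illustrative only (a hypothetical pow_scaled_exact), not the PR's actual routing or overflow handling:

// Exact pow on a scaled decimal for a non-negative integer exponent:
// the unscaled value is raised to n and the scale is multiplied by n.
fn pow_scaled_exact(value: i128, scale: u32, exp: u32) -> Option<(i128, u32)> {
    let result_scale = scale.checked_mul(exp)?;
    let mut result: i128 = 1;
    for _ in 0..exp {
        // On overflow the caller would fall back to the float path.
        result = result.checked_mul(value)?;
    }
    Some((result, result_scale))
}

fn main() {
    // 2.5 squared: unscaled 25 at scale 1 -> unscaled 625 at scale 2, i.e. 6.25
    assert_eq!(pow_scaled_exact(25, 1, 2), Some((625, 2)));
}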

@Jefffrey
Contributor

Jefffrey commented Jan 3, 2026

Seeing all this logic introduced, I'm beginning to question whether there is actual benefit to having a native log implementation 🤔
Perhaps we should just revert to casting it to float and accept the accuracy loss
Thoughts @theirix ?

Fair enough, the logic becomes more convoluted.

The original idea was to introduce common decimal operations. Scale-preserving operations like abs, round, gcd, etc., are easy to implement and support. Some other operations with a natural mapping to decimals (like log10, pow10) adjust scales and do not have a natural analogue in the arrow buffer, leading to more complex logic. These operations are typical for data analytics, and applications could benefit from them. So ten-based operations can be calculated precisely, while for the rest and for more complicated operations, of course, it is fine to lose precision using a native float implementation.

First, we should reuse the arrow's foundational primitives as much as possible. If there is an OP_checked, it's better to piggyback on it. A few num traits were recently added to decimals in arrow-buffer, making it easier for us.

Second, I believe more logic should be isolated in calculate_binary_decimal_math, especially for handling different scales, to shift responsibility from UDF implementers (like pow) to middleware. It is in progress, and I'll submit it shortly.

That makes sense. I guess what we could also do to alleviate this complexity (and ensure less performance impact) would be the following (sketched below):

  • At invoke time of the function, only use native decimal operations when we have a scalar exponent
  • Otherwise fall back to casting to float

This can be done in follow-up PRs of course, but it at least sets a roadmap for us.
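
A hypothetical sketch of that routing decision; the Exponent enum and choose_pow_path are made-up illustrations, not DataFusion's actual types or dispatch:

// Keep the exact decimal path only when the exponent is a known scalar that the
// integer algorithm can handle, and fall back to float otherwise.
enum Exponent {
    Scalar(f64),
    Column, // per-row exponents
}

fn choose_pow_path(exp: &Exponent) -> &'static str {
    match exp {
        Exponent::Scalar(e) if *e >= 0.0 && e.fract() == 0.0 => "native decimal path",
        _ => "cast-to-float path",
    }
}

fn main() {
    assert_eq!(choose_pow_path(&Exponent::Scalar(3.0)), "native decimal path");
    assert_eq!(choose_pow_path(&Exponent::Scalar(-1.5)), "cast-to-float path");
    assert_eq!(choose_pow_path(&Exponent::Column), "cast-to-float path");
}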

@theirix
Contributor

theirix commented Jan 3, 2026

Seeing all this logic introduced, I'm beginning to question whether there is actual benefit to having a native log implementation 🤔
Perhaps we should just revert to casting it to float and accept the accuracy loss
Thoughts @theirix ?

Fair enough, the logic becomes more convoluted.
The original idea was to introduce common decimal operations. Scale-preserving operations like abs, round, gcd, etc., are easy to implement and support. Some other operations with a natural mapping to decimals (like log10, pow10) adjust scales and do not have a natural analogue in the arrow buffer, leading to more complex logic. These operations are typical for data analytics, and applications could benefit from them. So ten-based operations can be calculated precisely, while for the rest and for more complicated operations, of course, it is fine to lose precision using a native float implementation.
First, we should reuse the arrow's foundational primitives as much as possible. If there is an OP_checked, it's better to piggyback on it. A few num traits were recently added to decimals in arrow-buffer, making it easier for us.
Second, I believe more logic should be isolated in calculate_binary_decimal_math, especially for handling different scales, to shift responsibility from UDF implementers (like pow) to middleware. It is in progress, and I'll submit it shortly.

That makes sense. I guess what we could also do to alleviate this complexity (and ensure less performance impact) would be:

  • At invoke time of the function, only use native decimal operations when we have a scalar exponent
  • Otherwise fall back to casting to float

Sounds like a plan. The routing would be better based on the type signature rather than at eval time.

This can be done in follow-up PRs of course, but it at least sets a roadmap for us.

@Yuvraj-cyborg
Contributor Author

cc: @Jefffrey @theirix

