19926: fix: increase ROUND decimal precision to prevent overflow truncation#204
19926: fix: increase ROUND decimal precision to prevent overflow truncation#204martin-augment wants to merge 4 commits intomainfrom
Conversation
Summary of ChangesHello @martin-augment, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves an issue in the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
WalkthroughThe changes refactor the round function implementation to compute return field types using a new ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Code Review: ROUND Decimal Precision FixThis PR addresses overflow issues in the ROUND function for decimal types by reducing the output scale to reclaim precision for the integer part. Here's my analysis: ✅ Positive Aspects
🔍 Code Quality Issues1. Duplicated Code (round.rs:146-177)The scale calculation logic is duplicated across all 4 decimal types: if dp_is_scalar {
let new_scale = (*scale).min(decimal_places.max(0) as i8);
Decimal32(*precision, new_scale)
} else {
Decimal32(*precision, *scale)
}Recommendation: Extract this into a helper function: fn calculate_decimal_scale(
precision: u8,
scale: i8,
decimal_places: i32,
dp_is_scalar: bool,
constructor: impl Fn(u8, i8) -> DataType
) -> DataType {
if dp_is_scalar {
let new_scale = scale.min(decimal_places.max(0) as i8);
constructor(precision, new_scale)
} else {
constructor(precision, scale)
}
}2. Duplicated Scale Calculation (round.rs:231-258)Same scale calculation is repeated in the scalar fast path. Should share the same logic with 3. Complex Conditional Logic (round.rs:461-486)The Recommendation: Add more detailed comments explaining:
🐛 Potential Bugs1. Int64 to i32 Cast May Overflow (round.rs:133)Some(ScalarValue::Int64(Some(v))) => (*v as i32, true),Issue: If Recommendation: Use checked conversion: Some(ScalarValue::Int64(Some(v))) => {
if let Ok(v_i32) = i32::try_from(*v) {
(v_i32, true)
} else {
(0, false) // or return an error
}
}2. Inconsistent Null Handling (round.rs:134)When Question: Should 3. Missing Validation (round.rs:148, 156, 164, 172)let new_scale = (*scale).min(decimal_places.max(0) as i8);Issue: If Recommendation: Add validation: if decimal_places > i8::MAX as i32 {
return exec_err!("decimal_places {} exceeds maximum scale {}", decimal_places, i8::MAX);
}⚡ Performance Considerations
🔒 Security ConcernsNo security issues identified. The overflow checks are properly implemented using 📋 Test Coverage ConcernsMissing Test Cases:
Recommendation: Add sqllogictest cases: -- Test column-based decimal_places
SELECT round(cast(col1 as decimal(10,3)), col2) FROM table;
-- Test negative decimal_places boundary
SELECT round(cast('999.99' as decimal(5,2)), -3);
-- Test very large decimal_places
SELECT round(cast('123.456' as decimal(10,5)), 1000);📝 Documentation
🎯 SummaryOverall Assessment: This is a solid fix that addresses a real overflow issue. The core logic is sound and the test coverage demonstrates the key scenarios work correctly. Priority Issues to Address:
Recommendation: Request changes to address the Int64 cast issue and code duplication before merging. Great work identifying and fixing this overflow issue! The approach of reducing scale to reclaim precision is the correct solution. 🎉 |
There was a problem hiding this comment.
Code Review
This pull request fixes an overflow issue in the round function for decimal types by adjusting the output scale. The changes are mainly in datafusion/functions/src/math/round.rs and seem logically correct. The updated tests in .slt files properly cover the new behavior and add more coverage, which is great. My main feedback is regarding code duplication in several places, which could be refactored to improve maintainability. I've left specific suggestions on how to address this.
| Decimal32(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal32(*precision, new_scale) | ||
| } else { | ||
| Decimal32(*precision, *scale) | ||
| } | ||
| } | ||
| Decimal64(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal64(*precision, new_scale) | ||
| } else { | ||
| Decimal64(*precision, *scale) | ||
| } | ||
| } | ||
| Decimal128(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal128(*precision, new_scale) | ||
| } else { | ||
| Decimal128(*precision, *scale) | ||
| } | ||
| } | ||
| Decimal256(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal256(*precision, new_scale) | ||
| } else { | ||
| Decimal256(*precision, *scale) | ||
| } | ||
| } |
There was a problem hiding this comment.
There's significant code duplication in these match arms for handling different decimal types. This can make future maintenance harder. Consider refactoring this logic to reduce repetition, for example by using a macro.
Here's a suggestion using a macro:
macro_rules! adjust_decimal_type {
($DecimalTy:ident, $precision:expr, $scale:expr, $dp_is_scalar:expr, $decimal_places:expr) => {
if $dp_is_scalar {
let new_scale = (*$scale).min($decimal_places.max(0) as i8);
DataType::$DecimalTy(*$precision, new_scale)
} else {
DataType::$DecimalTy(*$precision, *$scale)
}
};
}
// Then in the match statement:
let return_type = match input_type {
Float32 => Float32,
Decimal32(precision, scale) => adjust_decimal_type!(Decimal32, precision, scale, dp_is_scalar, decimal_places),
Decimal64(precision, scale) => adjust_decimal_type!(Decimal64, precision, scale, dp_is_scalar, decimal_places),
Decimal128(precision, scale) => adjust_decimal_type!(Decimal128, precision, scale, dp_is_scalar, decimal_places),
Decimal256(precision, scale) => adjust_decimal_type!(Decimal256, precision, scale, dp_is_scalar, decimal_places),
_ => Float64,
};This would make the code more concise and easier to maintain.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Gemini AI reviewer is correct! Using a macro or a helper function would reduce the amount of duplicated code. Prevents maintaining duplicated code
| ScalarValue::Decimal128(Some(v), precision, scale) => { | ||
| let rounded = round_decimal(*v, *scale, dp)?; | ||
| // Reduce scale to reclaim integer precision | ||
| let new_scale = (*scale).min(dp.max(0) as i8); | ||
| let rounded = round_decimal(*v, *scale, new_scale, dp)?; | ||
| let scalar = | ||
| ScalarValue::Decimal128(Some(rounded), *precision, *scale); | ||
| ScalarValue::Decimal128(Some(rounded), *precision, new_scale); | ||
| Ok(ColumnarValue::Scalar(scalar)) | ||
| } | ||
| ScalarValue::Decimal256(Some(v), precision, scale) => { | ||
| let rounded = round_decimal(*v, *scale, dp)?; | ||
| let new_scale = (*scale).min(dp.max(0) as i8); | ||
| let rounded = round_decimal(*v, *scale, new_scale, dp)?; | ||
| let scalar = | ||
| ScalarValue::Decimal256(Some(rounded), *precision, *scale); | ||
| ScalarValue::Decimal256(Some(rounded), *precision, new_scale); | ||
| Ok(ColumnarValue::Scalar(scalar)) | ||
| } | ||
| ScalarValue::Decimal64(Some(v), precision, scale) => { | ||
| let rounded = round_decimal(*v, *scale, dp)?; | ||
| let new_scale = (*scale).min(dp.max(0) as i8); | ||
| let rounded = round_decimal(*v, *scale, new_scale, dp)?; | ||
| let scalar = | ||
| ScalarValue::Decimal64(Some(rounded), *precision, *scale); | ||
| ScalarValue::Decimal64(Some(rounded), *precision, new_scale); | ||
| Ok(ColumnarValue::Scalar(scalar)) | ||
| } | ||
| ScalarValue::Decimal32(Some(v), precision, scale) => { | ||
| let rounded = round_decimal(*v, *scale, dp)?; | ||
| let new_scale = (*scale).min(dp.max(0) as i8); | ||
| let rounded = round_decimal(*v, *scale, new_scale, dp)?; | ||
| let scalar = | ||
| ScalarValue::Decimal32(Some(rounded), *precision, *scale); | ||
| ScalarValue::Decimal32(Some(rounded), *precision, new_scale); | ||
| Ok(ColumnarValue::Scalar(scalar)) | ||
| } |
There was a problem hiding this comment.
Similar to another comment, there is a lot of duplicated logic for handling different decimal types here. This could be refactored to improve maintainability, for instance, by using a macro.
Here's a possible implementation:
macro_rules! round_decimal_scalar {
($v:expr, $precision:expr, $scale:expr, $dp:expr, $DecimalScalar:ident) => {{{
let new_scale = (*$scale).min($dp.max(0) as i8);
let rounded = round_decimal(*$v, *$scale, new_scale, $dp)?;
let scalar = ScalarValue::$DecimalScalar(Some(rounded), *$precision, new_scale);
Ok(ColumnarValue::Scalar(scalar))
}}};
}
// Then in the match statement:
match value_scalar {
// ...
ScalarValue::Decimal128(Some(v), precision, scale) => {
round_decimal_scalar!(v, precision, scale, dp, Decimal128)
}
ScalarValue::Decimal256(Some(v), precision, scale) => {
round_decimal_scalar!(v, precision, scale, dp, Decimal256)
}
// ... and so on for other decimal types
}There was a problem hiding this comment.
value:useful; category:bug; feedback:The Gemini AI reviewer is correct! Using a macro or a helper function would reduce the amount of duplicated code. Prevents maintaining duplicated code
🤖 Augment PR SummarySummary: Updates the Changes:
Technical Notes: Scale is only reduced when 🤖 Was this summary useful? React with 👍 or 👎 |
| | dt @ Decimal64(_, _) => dt, | ||
| Decimal32(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); |
There was a problem hiding this comment.
decimal_places.max(0) as i8 can wrap for large values (e.g. 1000 -> negative), potentially producing an invalid negative decimal scale; it seems safer to clamp/compare in i32 before converting to i8.
Other Locations
datafusion/functions/src/math/round.rs:156datafusion/functions/src/math/round.rs:164datafusion/functions/src/math/round.rs:172datafusion/functions/src/math/round.rs:233datafusion/functions/src/math/round.rs:240datafusion/functions/src/math/round.rs:247datafusion/functions/src/math/round.rs:254
🤖 Was this useful? React with 👍 or 👎
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! The decimal_places is an input provided by the user, so it should not be trusted blindly. Every conversion to a smaller numeric type should be done in a checked way and too big values should be reported as errors instead of using the overflown value.
| Ok(Arc::new(Field::new( | ||
| self.name(), | ||
| return_type, | ||
| input_field.is_nullable(), |
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Augment AI reviewer is correct! decimal_places=NULL should lead to a NULL result too. This is how other DBs also implement it - Postgres, DuckDB, Apache Spark. Prevents surprising behavior from Apache DataFusion that is different than other similar products.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 3 potential issues.
Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.
| if args.scalar_arguments.len() > 1 { | ||
| match args.scalar_arguments[1] { | ||
| Some(ScalarValue::Int32(Some(v))) => (*v, true), | ||
| Some(ScalarValue::Int64(Some(v))) => (*v as i32, true), |
There was a problem hiding this comment.
Unsafe Int64 truncation in decimal_places extraction
Medium Severity
The scalar_arguments[1] extraction casts Int64 values to i32 using the unsafe as i32 operator, which silently truncates values outside the i32 range. For example, passing 5000000000i64 would be truncated to 705032704, causing incorrect scale calculation in return_field_from_args. This should use .try_into() to properly handle overflow.
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The decimal_places is an input provided by the user, so it should not be trusted blindly. Every conversion to a smaller numeric type should be done in a checked way and too big values should be reported as errors instead of using the overflown value.
| decimal_places: Option<ArrayRef>, | ||
| ) -> Result<ArrayRef, DataFusionError> { | ||
| let number_rows = value.len(); | ||
| let return_type = value.data_type().clone(); |
There was a problem hiding this comment.
Test helper incorrectly uses input type as return
Medium Severity
The round_arrays test helper uses value.data_type().clone() as the return type, which is incorrect for decimal inputs where the scale should be reduced to match decimal_places. This works for float tests but would produce wrong results for decimal array tests. The helper should compute the correct return type by reducing the scale when decimal_places is a scalar constant.
There was a problem hiding this comment.
value:good-to-have; category:bug; feedback:The Bugbot AI reviewer is correct! The return type might be different for decimal input types which scale is reduced due to carry-over. Any test for decimal that uses this helper will face the issue. Prevents possible test failure if such a test is introduced.
| | dt @ Decimal64(_, _) => dt, | ||
| Decimal32(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); |
There was a problem hiding this comment.
Integer overflow in decimal_places cast to i8
High Severity
The code casts decimal_places.max(0) as i8 without bounds checking, causing integer wrapping when decimal_places > 127. For example, round(value, 200) would compute new_scale = scale.min(200 as i8) = scale.min(-56), producing a negative scale and creating invalid decimal types. This leads to incorrect results or runtime errors. The cast appears at lines 148, 156, 164, 172, 233, 240, 247, and 254.
Additional Locations (2)
There was a problem hiding this comment.
value:useful; category:bug; feedback:The Bugbot AI reviewer is correct! The decimal_places is an input provided by the user, so it should not be trusted blindly. Every conversion to a smaller numeric type should be done in a checked way and too big values should be reported as errors instead of using the overflown value.
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@datafusion/functions/src/math/round.rs`:
- Around line 140-177: The decimal branch logic for Decimal32/64/128/256
computes new_scale by casting decimal_places to i8 which can wrap; before
casting clamp decimal_places to the range [-128, 127] or better to the existing
*scale bounds so it cannot overflow (e.g., let clamped =
decimal_places.max(i32::from(-*scale)).min(i32::from(*scale)) or clamp to
i8::MIN..=i8::MAX) and then cast clamped as i8; update the new_scale computation
in the match arms for Decimal32, Decimal64, Decimal128 and Decimal256 and apply
the same clamp for the other similar block referenced in the comment (lines
~232-258).
| // Calculate return type based on input type | ||
| // For decimals: reduce scale to decimal_places (reclaims precision for integer part) | ||
| // This matches Spark/DuckDB behavior where ROUND adjusts the scale | ||
| // BUT only if dp is a constant - otherwise keep original scale | ||
| let return_type = match input_type { | ||
| Float32 => Float32, | ||
| dt @ Decimal128(_, _) | ||
| | dt @ Decimal256(_, _) | ||
| | dt @ Decimal32(_, _) | ||
| | dt @ Decimal64(_, _) => dt, | ||
| Decimal32(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal32(*precision, new_scale) | ||
| } else { | ||
| Decimal32(*precision, *scale) | ||
| } | ||
| } | ||
| Decimal64(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal64(*precision, new_scale) | ||
| } else { | ||
| Decimal64(*precision, *scale) | ||
| } | ||
| } | ||
| Decimal128(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal128(*precision, new_scale) | ||
| } else { | ||
| Decimal128(*precision, *scale) | ||
| } | ||
| } | ||
| Decimal256(precision, scale) => { | ||
| if dp_is_scalar { | ||
| let new_scale = (*scale).min(decimal_places.max(0) as i8); | ||
| Decimal256(*precision, new_scale) | ||
| } else { | ||
| Decimal256(*precision, *scale) | ||
| } | ||
| } |
There was a problem hiding this comment.
Clamp decimal_places before casting to i8 to avoid scale wraparound.
Casting large positive decimal_places directly to i8 can wrap (e.g., 1000 → -24), yielding incorrect output scales/metadata. Use an i32 clamp against the existing scale and then cast.
🛠️ Proposed fix (apply to all decimal variants shown)
@@
- let return_type = match input_type {
+ let dp_clamped = decimal_places.max(0);
+ let return_type = match input_type {
@@
- Decimal32(precision, scale) => {
+ Decimal32(precision, scale) => {
if dp_is_scalar {
- let new_scale = (*scale).min(decimal_places.max(0) as i8);
+ let new_scale = (*scale as i32).min(dp_clamped) as i8;
Decimal32(*precision, new_scale)
} else {
Decimal32(*precision, *scale)
}
}
@@
- Decimal64(precision, scale) => {
+ Decimal64(precision, scale) => {
if dp_is_scalar {
- let new_scale = (*scale).min(decimal_places.max(0) as i8);
+ let new_scale = (*scale as i32).min(dp_clamped) as i8;
Decimal64(*precision, new_scale)
} else {
Decimal64(*precision, *scale)
}
}
@@
- Decimal128(precision, scale) => {
+ Decimal128(precision, scale) => {
if dp_is_scalar {
- let new_scale = (*scale).min(decimal_places.max(0) as i8);
+ let new_scale = (*scale as i32).min(dp_clamped) as i8;
Decimal128(*precision, new_scale)
} else {
Decimal128(*precision, *scale)
}
}
@@
- Decimal256(precision, scale) => {
+ Decimal256(precision, scale) => {
if dp_is_scalar {
- let new_scale = (*scale).min(decimal_places.max(0) as i8);
+ let new_scale = (*scale as i32).min(dp_clamped) as i8;
Decimal256(*precision, new_scale)
} else {
Decimal256(*precision, *scale)
}
}
@@
- let dp = if let ScalarValue::Int32(Some(dp)) = dp_scalar {
+ let dp = if let ScalarValue::Int32(Some(dp)) = dp_scalar {
*dp
} else {
@@
- ScalarValue::Decimal128(Some(v), precision, scale) => {
+ ScalarValue::Decimal128(Some(v), precision, scale) => {
// Reduce scale to reclaim integer precision
- let new_scale = (*scale).min(dp.max(0) as i8);
+ let new_scale = (*scale as i32).min(dp.max(0)) as i8;
let rounded = round_decimal(*v, *scale, new_scale, dp)?;
let scalar =
ScalarValue::Decimal128(Some(rounded), *precision, new_scale);
Ok(ColumnarValue::Scalar(scalar))
}
@@
- ScalarValue::Decimal256(Some(v), precision, scale) => {
- let new_scale = (*scale).min(dp.max(0) as i8);
+ ScalarValue::Decimal256(Some(v), precision, scale) => {
+ let new_scale = (*scale as i32).min(dp.max(0)) as i8;
let rounded = round_decimal(*v, *scale, new_scale, dp)?;
let scalar =
ScalarValue::Decimal256(Some(rounded), *precision, new_scale);
Ok(ColumnarValue::Scalar(scalar))
}
@@
- ScalarValue::Decimal64(Some(v), precision, scale) => {
- let new_scale = (*scale).min(dp.max(0) as i8);
+ ScalarValue::Decimal64(Some(v), precision, scale) => {
+ let new_scale = (*scale as i32).min(dp.max(0)) as i8;
let rounded = round_decimal(*v, *scale, new_scale, dp)?;
let scalar =
ScalarValue::Decimal64(Some(rounded), *precision, new_scale);
Ok(ColumnarValue::Scalar(scalar))
}
@@
- ScalarValue::Decimal32(Some(v), precision, scale) => {
- let new_scale = (*scale).min(dp.max(0) as i8);
+ ScalarValue::Decimal32(Some(v), precision, scale) => {
+ let new_scale = (*scale as i32).min(dp.max(0)) as i8;
let rounded = round_decimal(*v, *scale, new_scale, dp)?;
let scalar =
ScalarValue::Decimal32(Some(rounded), *precision, new_scale);
Ok(ColumnarValue::Scalar(scalar))
}Also applies to: 232-258
🤖 Prompt for AI Agents
In `@datafusion/functions/src/math/round.rs` around lines 140 - 177, The decimal
branch logic for Decimal32/64/128/256 computes new_scale by casting
decimal_places to i8 which can wrap; before casting clamp decimal_places to the
range [-128, 127] or better to the existing *scale bounds so it cannot overflow
(e.g., let clamped =
decimal_places.max(i32::from(-*scale)).min(i32::from(*scale)) or clamp to
i8::MIN..=i8::MAX) and then cast clamped as i8; update the new_scale computation
in the match arms for Decimal32, Decimal64, Decimal128 and Decimal256 and apply
the same clamp for the other similar block referenced in the comment (lines
~232-258).
There was a problem hiding this comment.
value:useful; category:bug; feedback:The CodeRabbit AI reviewer is correct! The decimal_places is an input provided by the user, so it should not be trusted blindly. Every conversion to a smaller numeric type should be done in a checked way and too big values should be reported as errors instead of using the overflown value.
value:useful; category:bug; feedback:The Claude AI reviewer is correct! The |
value:useful; category:bug; feedback:The Claude AI reviewer is correct! decimal_places=NULL should lead to a NULL result too. This is how other DBs also implement it - Postgres, DuckDB, Apache Spark. Prevents surprising behavior from Apache DataFusion that is different than other similar products. |
19926: To review by AI