8719: [Variant] Add try_value/value for VariantArray #9
martin-augment wants to merge 2 commits into main from
Conversation
`try_value` will return `Result<Variant, ArrowError>`, and `value` will unwrap the result of `try_value`.
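A minimal sketch of how the two accessors relate, assuming only the signatures implied by the description above (the stub type and body are placeholders, not the actual implementation in `variant_array.rs`):

```rust
use arrow_schema::ArrowError;
use parquet_variant::Variant;

// Hypothetical stand-in for the real VariantArray, used only to show the
// relationship between the fallible and infallible accessors.
struct VariantArraySketch;

impl VariantArraySketch {
    /// Fallible accessor: reports conversion problems as an `ArrowError`.
    fn try_value(&self, _index: usize) -> Result<Variant<'_, '_>, ArrowError> {
        // The real logic inspects the `value`/`typed_value` columns; elided here.
        Ok(Variant::Null)
    }

    /// Infallible accessor: unwraps `try_value`, so it panics if the
    /// conversion fails.
    fn value(&self, index: usize) -> Variant<'_, '_> {
        self.try_value(index).unwrap()
    }
}
```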
Walkthrough

Added an error-handling conversion macro and updated APIs to return generic Result types instead of ArrowError-specific ones. Changed value access patterns and introduced safe casting behavior in variant_get operations.

Changes
Code Review

Summary

This PR adds `try_value`/`value` accessors to `VariantArray`.

✅ Strengths
🔍 Issues & Concerns

Critical: Inconsistent CastOptions usage

Location:

The code is checking
This suggests the code is using Arrow's
Recommendation:
Medium: Potential panic in production code

Location:

```rust
if value.is_some_and(|v| !matches!(data_type, DataType::Struct(_)) && v.is_valid(index)) {
    panic!("Invalid variant, conflicting value and typed_value");
}
```

Issue: This panics on invalid data, but the function returns `Result`.

Recommendation: Return an error instead:

```rust
return Err(ArrowError::InvalidArgumentError(
    "Invalid variant: conflicting value and typed_value".to_string()
));
```

Medium: Macro error handling could be clearer

Location:

The new `generic_conversion_single_value_with_result!` macro maps errors as:

```rust
Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}")))
```

Recommendation: Consider adding more context:

```rust
Err(e) => Err(ArrowError::CastError(format!(
    "Cast failed for {} at index {}: {}",
    stringify!($t), $index, e
)))
```

Low: Inconsistent Result type aliases

Location: Multiple files

The PR changes return types from ArrowError-specific `Result`s to generic `Result` aliases.

Observation: This is actually good - it follows Rust conventions. Just ensure it's consistent across the entire module.

🎯 Performance Considerations
🔒 Security Concerns

No security issues identified. The error handling improvements actually enhance robustness by:
📋 Test Coverage Assessment

Excellent coverage ✅ The tests cover:
Missing test: Consider adding a test for the conflicting value/typed_value scenario (line 945) once it returns an error instead of panicking.

📝 Minor Suggestions
✅ Recommended Actions

Before merge:
After these fixes, this will be a solid improvement to the codebase!

Overall Assessment

This is a well-designed PR that meaningfully improves error handling in the variant compute code. The test coverage is thorough and the API design follows Rust best practices. The main concerns are around CastOptions consistency and replacing a panic with proper error handling.

Rating: Approve with minor changes requested 👍
```rust
///
/// Panics if
/// * the index is out of bounds
/// * the array value is null
```
The docs for try_value don’t match the implementation: it returns Ok(Variant::Null) when the value is null and does check validity, so it shouldn’t “panic if the array value is null” nor claim it “does not check for nulls.” Consider updating the docs to reflect the actual behavior.
value:annoying; category:bug; feedback:Actually the call to typed_value_to_variant() does panic at https://github.com/klion26/arrow-rs/blob/a6a362af67771e2399ea899b2d76e2abd87a39f0/parquet-variant-compute/src/variant_array.rs#L945 if the data_type is not Struct and the value is null (https://github.com/klion26/arrow-rs/blob/a6a362af67771e2399ea899b2d76e2abd87a39f0/arrow-array/src/array/mod.rs#L268)
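Taken together, a possible doc rewording that reflects both observations (a sketch only; the final wording is the author's call):

```rust
/// Returns the `Variant` at `index`, or `Ok(Variant::Null)` when the value
/// is null and no typed value is present.
///
/// # Errors
/// Returns an error if the typed value at `index` cannot be converted into
/// a `Variant`.
///
/// # Panics
/// Panics if `index` is out of bounds, or (currently) inside
/// `typed_value_to_variant` when `typed_value` is a non-struct column with
/// a null entry, as noted in the feedback above.
```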
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
parquet-variant-compute/src/variant_array.rs (2)
1530-1541: Test bug: out-of-bounds index 3 (len=3).

This assertion will panic:

```rust
assert!(!variant_array.is_null(3));
```

Replace 3 with 2:

```diff
- assert!(!variant_array.is_null(3));
+ assert!(!variant_array.is_null(2));
```
1041-1057: Replace timestamp microsecond unwraps with proper error handling to match Time64 pattern.

Lines 1041-1057 use `.unwrap()` on `DateTime::from_timestamp_micros()`, which will panic on invalid timestamps. The file already establishes error handling for similar temporal types (Time64Microsecond at line 1028 uses `generic_conversion_single_value_with_result!`).

```diff
 DataType::Timestamp(TimeUnit::Microsecond, Some(_)) => {
-    generic_conversion_single_value!(
+    generic_conversion_single_value_with_result!(
         TimestampMicrosecondType,
         as_primitive,
-        |v| DateTime::from_timestamp_micros(v).unwrap(),
+        |v| DateTime::from_timestamp_micros(v)
+            .ok_or_else(|| format!("Invalid timestamp microsecond: {v}")),
         typed_value,
         index
     )
 }
 DataType::Timestamp(TimeUnit::Microsecond, None) => {
-    generic_conversion_single_value!(
+    generic_conversion_single_value_with_result!(
         TimestampMicrosecondType,
         as_primitive,
-        |v| DateTime::from_timestamp_micros(v).unwrap().naive_utc(),
+        |v| DateTime::from_timestamp_micros(v)
+            .map(|dt| dt.naive_utc())
+            .ok_or_else(|| format!("Invalid timestamp microsecond: {v}")),
         typed_value,
         index
     )
 }
```
🧹 Nitpick comments (5)
parquet-variant-compute/src/variant_get.rs (2)
145-151: Use append_null on conversion error (safe=true) instead of appending Variant::Null.

Mapping try_value errors to a real null avoids misinterpreting Variant::Null as a present value. It also aligns with how typed builders typically treat cast failures.
Suggested change:
```diff
-        } else {
-            builder.append_value(target.try_value(i).unwrap_or(Variant::Null))?;
+        } else {
+            match target.try_value(i) {
+                Ok(v) => builder.append_value(v)?,
+                Err(_) => builder.append_null()?,
+            }
         }
```

Please confirm append_null() produces the intended nulls for all target as_type branches in the row builder.
70-81: Safe-mode consistency: treat non-struct child as Missing/NotShredded when safe=true.

You already gate the first downcast error on CastOptions.safe. Do the same when the named child exists but isn't a struct. In safe mode this should likely degrade to Missing/NotShredded for the parent step, not error.
Example patch:
```diff
-        let struct_array = field.as_struct_opt().ok_or_else(|| {
-            ArrowError::InvalidArgumentError(format!(
-                "Expected Struct array while following path, got {}",
-                field.data_type(),
-            ))
-        })?;
+        let Some(struct_array) = field.as_struct_opt() else {
+            if !cast_options.safe {
+                return Err(ArrowError::InvalidArgumentError(format!(
+                    "Expected Struct array while following path, got {}",
+                    field.data_type(),
+                )));
+            }
+            // Degrade gracefully (treat as missing/not shredded at this level)
+            return Ok(missing_path_step());
+        };
```

Verify expected behavior on nested paths where typed_value has a leaf (non-struct) but the path continues; safe=true should yield nulls instead of errors.
Also applies to: 89-98
parquet-variant-compute/src/type_conversion.rs (1)
305-317: Include value context in cast error for easier debugging.

generic_conversion_single_value_with_result currently emits “Cast failed: {e}”. Consider adding the concrete type name and index to aid triage.
For example:
```diff
- Err(e) => Err(ArrowError::CastError(format!("Cast failed: {e}"))),
+ Err(e) => Err(ArrowError::CastError(format!(
+     "Cast failed at index {idx} (array type: {ty}): {e}",
+     idx = $index,
+     ty = <$t as ArrowPrimitiveType>::DATA_TYPE
+ ))),
```

Confirm ArrowPrimitiveType::DATA_TYPE is available here; otherwise pass type via an additional macro arg.
parquet-variant-compute/src/variant_array.rs (2)
332-341: value() unwrap may panic by design; consider offering a fallible accessor.

The docs are clear; still, many callers benefit from a non-panicking API. Consider adding value_or_null(index) -> Option<Variant> or exposing try_value publicly in docs as the preferred method.
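One possible shape for such an accessor (a sketch only; `value_or_null` is the name floated in the comment above, and the `try_value` signature is assumed from this PR's description, so treat the details as illustrative):

```rust
use parquet_variant::Variant;
use parquet_variant_compute::VariantArray;

/// Like `VariantArray::value`, but maps conversion errors to `None` instead
/// of panicking. Written as a free function only because this sketch lives
/// outside the defining crate.
fn value_or_null(array: &VariantArray, index: usize) -> Option<Variant<'_, '_>> {
    array.try_value(index).ok()
}
```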
525-527: Iterator uses value() and can panic on conversion errors.

Either add a fallible iterator (e.g., try_iter() yielding Result<Option<Variant>>) or switch to try_value internally and treat errors as None.
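And a hedged sketch of the fallible-iterator alternative (`try_iter` is the reviewer's suggested name, not an existing method; `len`/`is_null` are assumed to be available via the `Array` trait):

```rust
use arrow_array::Array;
use arrow_schema::ArrowError;
use parquet_variant::Variant;
use parquet_variant_compute::VariantArray;

/// Yields `Ok(None)` for null slots, `Ok(Some(v))` for convertible values,
/// and `Err(..)` when conversion fails, instead of panicking mid-iteration.
fn try_iter<'a>(
    array: &'a VariantArray,
) -> impl Iterator<Item = Result<Option<Variant<'a, 'a>>, ArrowError>> + 'a {
    (0..array.len()).map(move |i| {
        if array.is_null(i) {
            Ok(None)
        } else {
            array.try_value(i).map(Some)
        }
    })
}
```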
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- parquet-variant-compute/src/type_conversion.rs (2 hunks)
- parquet-variant-compute/src/variant_array.rs (19 hunks)
- parquet-variant-compute/src/variant_get.rs (3 hunks)
🧰 Additional context used
🧬 Code graph analysis (3)
parquet-variant-compute/src/variant_array.rs (3)
parquet-variant/src/variant.rs (24)
try_new (58-66), try_new (294-297), new (316-321), try_from (83-85), try_from (1661-1665), try_from (1671-1675), try_from (1681-1685), from (75-77), from (1486-1488), from (1492-1497), from (1501-1503), from (1507-1509), from (1513-1515), from (1519-1521), from (1525-1532), from (1536-1543), from (1546-1553), from (1557-1565), from (1569-1571), from (1575-1577), from (1581-1583), from (1587-1589), from (1593-1595), metadata (1436-1442)
parquet-variant/src/variant/decimal.rs (1)
try_new (91-91)
parquet-variant/src/variant/metadata.rs (3)
try_new (69-84), try_new (183-185), new (197-199)
parquet-variant-compute/src/variant_get.rs (3)
parquet-variant-compute/src/variant_array.rs (6)
value (339-341), from (457-459), from (463-465), from (725-727), from (731-733), from (889-894)
parquet-variant/src/variant.rs (11)
from (75-77), from (1486-1488), from (1492-1497), from (1501-1503), from (1507-1509), from (1513-1515), from (1519-1521), from (1525-1532), from (1536-1543), from (1546-1553), from (1557-1565)
arrow-cast/src/cast/mod.rs (1)
default(82-87)
parquet-variant-compute/src/type_conversion.rs (2)
parquet-variant-compute/src/variant_array.rs (7)
value (339-341), from (457-459), from (463-465), from (725-727), from (731-733), from (889-894), v (1426-1426)
parquet-variant/src/variant.rs (11)
from (75-77), from (1486-1488), from (1492-1497), from (1501-1503), from (1507-1509), from (1513-1515), from (1519-1521), from (1525-1532), from (1536-1543), from (1546-1553), from (1557-1565)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (11)
- GitHub Check: Test on Windows
- GitHub Check: Test on Mac
- GitHub Check: Verify MSRV (Minimum Supported Rust Version)
- GitHub Check: Test
- GitHub Check: PySpark Integration Test (stable)
- GitHub Check: claude-review
- GitHub Check: Test
- GitHub Check: Build wasm32
- GitHub Check: Check Compilation
- GitHub Check: codex
- GitHub Check: Rustdocs are clean
🔇 Additional comments (6)
parquet-variant-compute/src/variant_get.rs (1)
3590-3638: Nice coverage for invalid time casting (safe vs unsafe).

Tests clearly assert the new behavior and messages; no issues spotted.
parquet-variant-compute/src/type_conversion.rs (2)
279-288: Macros returning Result are the right direction.

Changing non_generic_conversion_single_value to return Ok(...) integrates cleanly with callers.
30-40: The CastOptions types are appropriately scoped and pose no duplication concern.

The local `CastOptions` in `type_conversion.rs` is actively used elsewhere in the codebase (e.g., in `arrow_to_variant.rs` and exported via `lib.rs`). The separate `CastOptions` in `arrow-cast` is a different type with different fields (`safe` vs `strict`) in a different crate. These are not duplicates and do not create ambiguity; they are clearly scoped to their respective modules and serve different purposes. No removal or renaming is necessary.

Likely an incorrect or invalid review comment.
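To make the name collision concrete, a small illustration (import aliases only; the field names follow the description above, and the local struct's exact definition is an assumption since it is outside this diff):

```rust
// arrow-cast's options, used when casting Arrow arrays; it exposes `safe`.
use arrow_cast::CastOptions as ArrowCastOptions;
// parquet-variant-compute's own options: a distinct type that shares the name.
use parquet_variant_compute::CastOptions as VariantCastOptions;

// Aliased imports keep both types unambiguous at the call site.
fn describe(arrow_opts: &ArrowCastOptions, _variant_opts: &VariantCastOptions) {
    println!("arrow-cast safe = {}", arrow_opts.safe);
}
```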
parquet-variant-compute/src/variant_array.rs (3)
343-383: try_value() Result-based API looks good.

Returns Variant::Null when both columns are absent, matching the spec requirement.
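A usage-level illustration of that contract (a sketch; array construction is elided and the helper name is hypothetical):

```rust
use arrow_schema::ArrowError;
use parquet_variant::Variant;
use parquet_variant_compute::VariantArray;

/// For a row `i` whose `value` and `typed_value` are both absent, the
/// accessor is expected to report `Variant::Null` rather than an error.
fn assert_absent_is_null(array: &VariantArray, i: usize) -> Result<(), ArrowError> {
    assert_eq!(array.try_value(i)?, Variant::Null);
    Ok(())
}
```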
1103-1111: Result-returning helpers and canonicalization look solid.

cast_to_binary_view_arrays and canonicalize_* now return Result; logic for decimal/timestamp restrictions is consistent with the spec.
Also applies to: 1116-1211, 1213-1219
1565-1625: Good negative tests via invalid_variant_array_test macro.

Covers time and decimal precision overflows with clear messages.
Findings
Questions
value:incorrect-but-reasonable; category:bug; feedback:The AI is confused by the reuse of the same name for two different structs. The changed code uses
value:good-to-have; category:bug; feedback:The AI reviewer is correct that a software library should avoid panicking as much as possible! It is always better to return an Err and let the user of the API deal with it. Panicking would halt the whole application.
value:good-to-have; category:bug; feedback:The AI reviewer is correct that the error message would be more useful with the actual values of the type, its index, and the cause of the failure.
value:good-to-have; category:bug; feedback:The AI reviewer is correct that the codebase would be easier to maintain if the same conventions were followed in all source code files. The inconsistencies lead to confusion and questions.
value:annoying; category:bug; feedback:The AI reviewer is hallucinating. These files are not changed in this Pull Request.
value:valid-but-wont-fix; category:bug; feedback:The AI reviewer is correct! This is an API breaking change, but the PR author documents this change in the PR description, which will be used for the release changelog. It is allowed to make such API breaking changes in Arrow-rs because they bump the major part of the version with each release, e.g. 57.0.0 -> 58.0.0.
8719: To review by AI