8759: AvroError enum for arrow-avro crate #16

Open
martin-augment wants to merge 4 commits into main from pr-8759-2025-11-10-13-41-50
Conversation

@martin-augment
Owner

@martin-augment martin-augment commented Nov 10, 2025

8759: To review by AI


Note

Adds a new errors module with AvroError and refactors reader/writer code to use it end-to-end, converting to ArrowError only at public boundaries.

  • Errors:
    • Introduce errors::AvroError enum (e.g., General, NYI, EOF, InvalidArgument, ParseError, SchemaError, External, NeedMoreData*) and Result<T> alias.
    • Implement conversions: From<io::Error/Utf8Error/FromUtf8Error/TryFromIntError/ArrowError> for AvroError, and From<AvroError> for io::Error/ArrowError.
    • Export via pub mod errors; in lib.rs.
  • Reader (reader/*):
    • Replace ArrowError with AvroError in internal APIs (HeaderDecoder, BlockDecoder, AvroCursor, RecordDecoder, union handling, projection/skipping, helpers).
    • Classify incomplete data using AvroError::{EOF,NeedMoreData*}; map to ArrowError only in public returns (Decoder::flush, ReaderBuilder::build*_, Reader::read).
    • Update error messages and tests accordingly.
  • Writer (writer/*):
    • Refactor encoders and formats to return Result<_, AvroError>; map I/O and validation failures to AvroError.
    • Keep external writer entry points unchanged aside from internal error mapping.
  • API surface:
    • Internal functions now return Result<_, AvroError>; public constructors/readers still expose ArrowError by converting from AvroError.
    • Minor error text harmonization (e.g., magic mismatch, UUID/decimal validation).

Written by Cursor Bugbot for commit b40bb9c. This will update automatically on new commits.

@coderabbitai

coderabbitai bot commented Nov 10, 2025

Walkthrough

This PR introduces centralized error handling for the arrow-avro crate by creating a new AvroError enum and Result type alias in errors.rs. Multiple reader and writer modules are refactored to replace ArrowError with AvroError in public and internal signatures throughout the crate, enabling consistent error propagation and handling.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Error infrastructure: arrow-avro/src/errors.rs, arrow-avro/src/lib.rs | New centralized error module with AvroError enum (11 variants) and Result<T, E = AvroError> type alias. Implements Display, Error, and From conversions for standard types. Module exported from crate root. |
| Reader modules - cursor and block: arrow-avro/src/reader/block.rs, arrow-avro/src/reader/cursor.rs | Public and internal method signatures updated to return Result with AvroError. Methods get_u8, get_bool, read_vlq, get_int, get_long, get_bytes, get_float, get_double, and get_fixed refactored. Error variants ArrowError::ParseError and ArrowError::EOF replaced with AvroError equivalents. |
| Reader modules - header and coordination: arrow-avro/src/reader/header.rs, arrow-avro/src/reader/mod.rs | Method signatures changed from Result<T, ArrowError> to Result for read_header, Header::compression, Header::schema, HeaderDecoder::decode, and builder methods. is_incomplete_data accepts &AvroError. Error mapping and propagation updated throughout decoding pipeline. |
| Reader module - record decoding: arrow-avro/src/reader/record.rs | RecordDecoder, Decoder, UnionDecoder, Skipper, and Projector method signatures refactored to return Result without ArrowError. All ArrowError variants (SchemaError, ParseError, InvalidArgument, NYI) replaced with AvroError equivalents. |
| Writer modules - encoding: arrow-avro/src/writer/encoder.rs | FieldEncoder::make_encoder, FieldPlan::build, RecordEncoder::build, RecordEncoder::encode, and per-type encoder methods (BooleanEncoder, IntEncoder, StructEncoder, ListEncoder, MapEncoder, UnionEncoder, etc.) updated to return Result. Error handling converted from ArrowError to AvroError. |
| Writer module - format handling: arrow-avro/src/writer/format.rs | AvroFormat trait method start_stream and implementations for AvroOcfFormat and AvroSoeFormat updated to return Result<()>. Helper functions write_string and write_bytes refactored. Invalid compression now returns AvroError::InvalidArgument instead of ArrowError. |

@claude

claude bot commented Nov 10, 2025

Pull Request Review: AvroError Enum for arrow-avro Crate

Summary

This PR introduces a dedicated AvroError enum to replace the generic ArrowError usage throughout the arrow-avro crate. This is a well-motivated refactoring that improves error handling specificity and provides better error context for Avro-specific operations.


Code Quality & Best Practices

✅ Strengths

  1. Well-designed error hierarchy: The AvroError enum (errors.rs:30-61) provides comprehensive error variants that cover all Avro-specific error cases:

    • EOF, ParseError, SchemaError for Avro-specific errors
    • ArrowError variant for preserving Arrow errors
    • External for wrapping third-party errors
    • NeedMoreData variants for streaming scenarios
  2. Proper error trait implementations:

    • Display trait (errors.rs:63-87) with clear, informative messages
    • Error::source() (errors.rs:89-96) correctly chains error sources
    • Bidirectional conversions between AvroError and ArrowError (errors.rs:138-146)
  3. Consistent refactoring: All error handling throughout the crate has been systematically updated:

    • Reader modules (cursor.rs, block.rs, header.rs, record.rs)
    • Writer modules (encoder.rs, format.rs)
    • Proper use of ? operator with new error types
  4. Improved error semantics:

    • EOF errors are now distinct from ParseError (e.g., cursor.rs:52, 95, 104)
    • Better error detection in reader/mod.rs:503-507 with is_incomplete_data()

⚠️ Areas for Improvement

1. Potential error information loss (errors.rs:142)

AvroError::ArrowError(inner) => ArrowError::from_external_error(inner),

When converting AvroError::ArrowError back to ArrowError, the boxed ArrowError is wrapped as an external error rather than being unwrapped. This could lead to nested error boxing.

Suggestion: Consider unwrapping the boxed ArrowError:

AvroError::ArrowError(inner) => *inner,
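
To make the suggestion concrete, here is a minimal, self-contained sketch. Both enums are reduced stand-ins for arrow_schema::ArrowError and the PR's AvroError (assumptions for illustration, not the crate's real definitions); only the `*inner` arm mirrors the proposed change.

```rust
use std::error::Error;
use std::fmt;

// Stand-in for arrow_schema::ArrowError (assumption: reduced to three variants).
#[allow(dead_code)]
#[derive(Debug)]
enum ArrowError {
    SchemaError(String),
    AvroError(String),
    ExternalError(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for ArrowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ArrowError::SchemaError(m) => write!(f, "Schema error: {m}"),
            ArrowError::AvroError(m) => write!(f, "Avro error: {m}"),
            ArrowError::ExternalError(e) => write!(f, "External error: {e}"),
        }
    }
}

// Stand-in for the PR's AvroError (assumption: reduced to three variants).
#[allow(dead_code)]
#[derive(Debug)]
enum AvroError {
    ParseError(String),
    ArrowError(Box<ArrowError>),
    External(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for AvroError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AvroError::ParseError(m) => write!(f, "Parse error: {m}"),
            AvroError::ArrowError(e) => write!(f, "Arrow: {e}"),
            AvroError::External(e) => write!(f, "External: {e}"),
        }
    }
}

impl From<AvroError> for ArrowError {
    fn from(e: AvroError) -> Self {
        match e {
            AvroError::External(inner) => ArrowError::ExternalError(inner),
            // Unbox instead of re-wrapping, so the original variant survives.
            AvroError::ArrowError(inner) => *inner,
            other => ArrowError::AvroError(other.to_string()),
        }
    }
}
```

With this arm, a SchemaError that passes through AvroError::ArrowError comes back out as a SchemaError, so callers that match on the concrete variant keep working.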

2. Error message consistency (errors.rs:78)

The ParseError variant uses "Parse error:" prefix while most errors in the codebase used "Parser error:".

Impact: This is a breaking change in error message format that could affect code parsing error strings (see test update in header.rs:313).

3. Missing #[derive(Clone)]

The AvroError enum doesn't implement Clone, which may limit its usability in some scenarios where errors need to be cloned.

Note: This might be intentional due to External(Box<dyn Error>), but consider if a custom Clone implementation would be beneficial.


Potential Bugs & Issues

⚠️ Mixed error handling in mod.rs

Location: reader/mod.rs:838

fn flush_and_reset(&mut self) -> Result<Option<RecordBatch>> {
    // ... returns Result<_, AvroError>
    batch.map_err(ArrowError::from)  // Line 838
}

The function signature returns Result<_, AvroError> but converts to ArrowError at the call site. This suggests inconsistent error type expectations.

Similar issue at: reader/mod.rs:1178, 1255

Recommendation: Review the error type boundaries between the internal Avro code and the public API to ensure consistent error type propagation.

✅ Correct error propagation

The is_incomplete_data() function (mod.rs:503-507) correctly identifies incomplete data scenarios using the new error variants, which is a significant improvement.


Performance Considerations

✅ No significant performance impact

  1. Boxing overhead: The ArrowError variant uses Box<ArrowError> (errors.rs:43), which adds one level of indirection. This is acceptable given that errors are typically on cold paths.

  2. Error size: The largest variant is likely External(Box<dyn Error>) or ArrowError(Box<ArrowError>), both pointer-sized. The enum should have reasonable size.

  3. Conversion overhead: The From implementations are straightforward and shouldn't add measurable overhead.

💡 Suggestion

Consider adding #[inline] to the From implementations if profiling shows they're frequently called, though this is likely premature optimization.


Security Concerns

✅ No security issues identified

  1. Integer overflow protection: The TryFromIntError conversion (errors.rs:99-103) properly handles integer overflow cases
  2. UTF-8 validation: UTF-8 errors are properly wrapped (errors.rs:111-120)
  3. No unsafe code: All changes use safe Rust

Test Coverage

⚠️ Limited test coverage for new error types

Observed:

  • Only one test update in header.rs:313 to reflect changed error message format
  • No dedicated tests for the new AvroError enum and its conversions
  • No tests for error source chaining

📋 Recommendations

  1. Add unit tests for errors.rs:

    • Test all From implementations
    • Test Display output for each variant
    • Test Error::source() chaining
    • Test bidirectional AvroError ↔ ArrowError conversions
  2. Add integration tests:

    • Verify error types propagate correctly through the reader/writer pipeline
    • Test that is_incomplete_data() correctly identifies all relevant error variants
  3. Example test:

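use arrow_schema::ArrowError;
use crate::errors::AvroError;

#[test]
fn test_avro_error_conversions() {
    // io::Error converts into the External variant via From<io::Error>.
    let io_err = std::io::Error::new(std::io::ErrorKind::UnexpectedEof, "test");
    let avro_err: AvroError = io_err.into();
    assert!(matches!(avro_err, AvroError::External(_)));

    // External is forwarded through ArrowError::from_external_error.
    let arrow_err: ArrowError = avro_err.into();
    assert!(matches!(arrow_err, ArrowError::ExternalError(_)));
}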

Additional Comments

📝 Documentation

The module-level documentation (errors.rs:18) could be expanded to include:

  • When to use each error variant
  • Examples of error creation and handling
  • Guidelines for error conversion

🔄 Breaking Changes

This PR introduces breaking changes:

  1. Error message format changes ("Parser error" → "Parse error")
  2. Public API changes if AvroError is exported
  3. Error type changes in function signatures

Ensure these are documented in the CHANGELOG/release notes.

✅ Overall Assessment

This is a high-quality refactoring that improves the error handling architecture of the arrow-avro crate. The implementation is thorough and consistent. With the suggested improvements to error conversion and test coverage, this will be a solid enhancement to the codebase.

Recommended actions before merge:

  1. Fix the error double-boxing issue in AvroError → ArrowError conversion
  2. Add comprehensive unit tests for the errors module
  3. Verify all existing tests pass
  4. Run clippy and rustfmt
  5. Document breaking changes in release notes

Overall Rating: ⭐⭐⭐⭐ (4/5) - Strong implementation with minor improvements needed

@cursor cursor bot left a comment

Bug: Wrong Index Corrupts Sync Marker Data

The sync marker write index calculation is incorrect. The code uses 16 - to_decode instead of 16 - self.bytes_remaining to determine where to write bytes in the 16-byte sync marker array. This causes bytes to be written to the wrong positions. For example, when bytes_remaining = 16 and to_decode = 5, bytes are written to positions 11-15 instead of 0-4, corrupting the sync marker data.

arrow-avro/src/reader/block.rs#L114-L116

let to_decode = buf.len().min(self.bytes_remaining);
let write = &mut self.in_progress.sync[16 - to_decode..];
write[..to_decode].copy_from_slice(&buf[..to_decode]);
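
Assuming the finding is accurate, the corrected offset can be sketched as below. `SyncState` is a hypothetical, simplified stand-in for the block decoder's sync-marker state, not the crate's actual type; the point is that the write offset comes from how many bytes were already written (`16 - bytes_remaining`), not from the size of the current chunk.

```rust
/// Hypothetical, simplified stand-in for the decoder's sync-marker state.
struct SyncState {
    sync: [u8; 16],
    bytes_remaining: usize,
}

impl SyncState {
    fn new() -> Self {
        Self { sync: [0; 16], bytes_remaining: 16 }
    }

    /// Copies as many marker bytes as `buf` provides and returns the count consumed.
    fn decode(&mut self, buf: &[u8]) -> usize {
        let to_decode = buf.len().min(self.bytes_remaining);
        // Start where the previous call stopped, not at `16 - to_decode`.
        let start = 16 - self.bytes_remaining;
        self.sync[start..start + to_decode].copy_from_slice(&buf[..to_decode]);
        self.bytes_remaining -= to_decode;
        to_decode
    }
}
```

Feeding the marker in two chunks (5 bytes, then 11) fills positions 0-4 first and 5-15 second, which is the behaviour the comment describes as expected.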


    fn from(e: AvroError) -> Self {
        match e {
            AvroError::External(inner) => ArrowError::from_external_error(inner),
            AvroError::ArrowError(inner) => ArrowError::from_external_error(inner),
Bug: Avro Error Conversion: Prevent Redundant Wrapping

When converting AvroError::ArrowError(inner) to ArrowError, the code calls ArrowError::from_external_error(inner) which wraps the boxed ArrowError as an external error. This creates unnecessary error nesting and loses the original error type. The correct approach is to unbox and return the ArrowError directly with *inner instead of wrapping it as an external error.

Owner Author

value:good-to-have; category:bug; feedback: The Bugbot AI reviewer is correct that there is no need to wrap an ArrowError in another ArrowError. Prevents a needless allocation and duplication.

@augmentcode augmentcode bot left a comment

Review completed. 3 suggestions posted.

 err,
-ArrowError::ParseError(msg)
-if msg.contains("Unexpected EOF")
+AvroError::EOF(_) | AvroError::NeedMoreData(_) | AvroError::NeedMoreDataRange(_)
The new is_incomplete_data only matches EOF and NeedMoreData*, but some sites still return AvroError::ParseError for incomplete input (e.g., read_header returns ParseError("Unexpected EOF while reading Avro header")), which will no longer be treated as incomplete. Consider aligning those call sites to use EOF or broadening this check.
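
A minimal sketch of the first option (aligning the call site to use EOF) follows. The enum is a reduced stand-in for the PR's AvroError with assumed payload types, and `read_magic` is a hypothetical helper, not the crate's read_header; only `is_incomplete_data` follows the shape shown in the diff.

```rust
// Reduced stand-in for the PR's AvroError (assumption: payload types are illustrative).
#[allow(dead_code)]
#[derive(Debug)]
enum AvroError {
    EOF(String),
    NeedMoreData(usize),
    NeedMoreDataRange(std::ops::Range<usize>),
    ParseError(String),
}

// Same shape as the check in the diff: only "need more bytes" variants qualify.
fn is_incomplete_data(err: &AvroError) -> bool {
    matches!(
        err,
        AvroError::EOF(_) | AvroError::NeedMoreData(_) | AvroError::NeedMoreDataRange(_)
    )
}

// Hypothetical helper: reads the 4-byte Avro magic from the start of `buf`.
fn read_magic(buf: &[u8]) -> Result<[u8; 4], AvroError> {
    if buf.len() < 4 {
        // Truncated input is reported as EOF, not ParseError, so callers can retry.
        return Err(AvroError::EOF(
            "Unexpected EOF while reading Avro header".to_string(),
        ));
    }
    let mut magic = [0u8; 4];
    magic.copy_from_slice(&buf[..4]);
    Ok(magic)
}
```

Classifying truncation by variant rather than by message text keeps the check stable even if the wording of the error message changes later.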

Owner Author

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct that this should be an AvroError::EOF error instead of an AvroError::ParseError. This will help is_incomplete_data() to detect it!

 let writer_schema = hdr
 .schema()
-.map_err(|e| ArrowError::ExternalError(Box::new(e)))?
+.map_err(|e| AvroError::External(Box::new(e)))?
hdr.schema() already returns AvroError; wrapping it in AvroError::External(Box::new(e)) loses the original variant and source. Propagating the AvroError directly preserves error classification and message.

Owner Author

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct that the error is already an AvroError instance, so there is no need to wrap it in another AvroError. Prevents useless complexity.

 return out
 .write_all(&src_be[extra..])
-.map_err(|e| ArrowError::IoError(format!("write decimal fixed: {e}"), e));
+.map_err(|e| AvroError::General(format!("write decimal fixed: {e}")));
Mapping write_all errors to AvroError::General drops the underlying io::Error source; preserving it (e.g., via ? or an External variant) would retain diagnostics (also applies to similar write_all calls below).
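
The difference can be sketched with a reduced stand-in for AvroError (two variants only, an assumption for illustration) and two hypothetical helpers, `write_lossy` and `write_preserving`. `FailingWriter` exists only to force an I/O error.

```rust
use std::error::Error;
use std::fmt;
use std::io::{self, Write};

// Reduced stand-in for the PR's AvroError (assumption: two variants only).
#[derive(Debug)]
enum AvroError {
    General(String),
    External(Box<dyn Error + Send + Sync>),
}

impl fmt::Display for AvroError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AvroError::General(m) => write!(f, "Avro error: {m}"),
            AvroError::External(e) => write!(f, "External: {e}"),
        }
    }
}

impl Error for AvroError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            AvroError::External(e) => Some(e.as_ref()),
            AvroError::General(_) => None,
        }
    }
}

impl From<io::Error> for AvroError {
    fn from(e: io::Error) -> Self {
        AvroError::External(Box::new(e))
    }
}

// Formats the io::Error into a string: the message survives, the source does not.
fn write_lossy(out: &mut impl Write, bytes: &[u8]) -> Result<(), AvroError> {
    out.write_all(bytes)
        .map_err(|e| AvroError::General(format!("write decimal fixed: {e}")))
}

// Uses `?` and From<io::Error>: the io::Error stays reachable via source().
fn write_preserving(out: &mut impl Write, bytes: &[u8]) -> Result<(), AvroError> {
    out.write_all(bytes)?;
    Ok(())
}

// Test-only writer that always fails with BrokenPipe.
struct FailingWriter;

impl Write for FailingWriter {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "pipe closed"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

With `write_preserving`, a caller can walk `source()` and downcast to `io::Error` to inspect the `ErrorKind`; with `write_lossy`, only the formatted text remains.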

Owner Author

value:good-to-have; category:bug; feedback: The Augment AI reviewer is correct that formatting the error through its Display implementation discards its source (std::io::Error). Preserving the source avoids losing information that might help resolve the root cause of an error.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
arrow-avro/src/reader/mod.rs (1)

679-707: Blocker: AvroError → ArrowError conversion is relying on implicit From/Into; add explicit mapper and use it at all boundaries.

Several places assume ArrowError::from(AvroError)/e.into()/? will work. That impl can’t live in this crate (orphan rules), so this will fail or be brittle. Define a local converter and use map_err(...) where public API returns ArrowError.

Apply the following changes:

  1. Add a small helper (just below is_incomplete_data):
 fn is_incomplete_data(err: &AvroError) -> bool {
   matches!(err, AvroError::EOF(_) | AvroError::NeedMoreData(_) | AvroError::NeedMoreDataRange(_))
 }
+
+// Convert AvroError to ArrowError for public APIs that must return ArrowError.
+fn to_arrow_error(err: AvroError) -> ArrowError {
+  match err {
+    AvroError::ArrowError(e) => *e,
+    // Default: surface as parse error with message
+    other => ArrowError::ParseError(other.to_string()),
+  }
+}
  2. Decoder::decode: avoid e.into() and ? on AvroError:
-                    Err(e) => return Err(e.into()),
+                    Err(e) => return Err(to_arrow_error(e)),
 ...
-            match self.handle_prefix(&data[total_consumed..])? {
+            match self
+                .handle_prefix(&data[total_consumed..])
+                .map_err(to_arrow_error)? {
  3. Decoder::flush:
-        batch.map_err(ArrowError::from)
+        batch.map_err(to_arrow_error)
  4. ReaderBuilder::build:
-        let header = read_header(&mut reader)?;
-        let decoder = self.make_decoder(Some(&header), self.reader_schema.as_ref())?;
+        let header = read_header(&mut reader).map_err(to_arrow_error)?;
+        let decoder = self
+            .make_decoder(Some(&header), self.reader_schema.as_ref())
+            .map_err(to_arrow_error)?;
  5. ReaderBuilder::build_decoder:
-        self.make_decoder(None, self.reader_schema.as_ref())
-            .map_err(ArrowError::from)
+        self.make_decoder(None, self.reader_schema.as_ref())
+            .map_err(to_arrow_error)
  6. Reader::read (all AvroError-returning calls):
-                let consumed = self.block_decoder.decode(buf)?;
+                let consumed = self.block_decoder.decode(buf).map_err(to_arrow_error)?;
 ...
-                    self.block_data = if let Some(ref codec) = self.header.compression()? {
-                        codec.decompress(&block.data)?
+                    self.block_data = if let Some(ref codec) = self.header.compression().map_err(to_arrow_error)? {
+                        codec.decompress(&block.data).map_err(to_arrow_error)?
                     } else {
                         block.data
                     };
 ...
-                let (consumed, records_decoded) = self
-                    .decoder
-                    .decode_block(&self.block_data[self.block_cursor..], self.block_count)?;
+                let (consumed, records_decoded) = self
+                    .decoder
+                    .decode_block(&self.block_data[self.block_cursor..], self.block_count)
+                    .map_err(to_arrow_error)?;
 ...
-        self.decoder.flush_block().map_err(ArrowError::from)
+        self.decoder.flush_block().map_err(to_arrow_error)

This makes conversions explicit and compilation-safe.

Also applies to: 834-839, 1149-1151, 1171-1179, 1219-1256

🧹 Nitpick comments (2)
arrow-avro/src/reader/mod.rs (2)

1013-1020: Don’t wrap an AvroError into AvroError::External here.

Header::schema() already returns Result<_, AvroError>, with parse context. Wrapping that AvroError as External loses semantics.

Use ? directly:

-            let writer_schema = hdr
-                .schema()
-                .map_err(|e| AvroError::External(Box::new(e)))?
+            let writer_schema = hdr
+                .schema()?
                 .ok_or_else(|| {
                     AvroError::ParseError("No Avro schema present in file header".into())
                 })?;

1045-1069: Small tidy: avoid re-calling store.fingerprints() and reuse the snapshot.

You collect let fingerprints = store.fingerprints(); but iterate for fingerprint in store.fingerprints(). Iterate the snapshot to avoid redundant calls and ensure a consistent snapshot.

-        let mut cache = IndexMap::with_capacity(fingerprints.len().saturating_sub(1));
-        let mut active_decoder: Option<RecordDecoder> = None;
-        for fingerprint in store.fingerprints() {
+        let mut cache = IndexMap::with_capacity(fingerprints.len().saturating_sub(1));
+        let mut active_decoder: Option<RecordDecoder> = None;
+        for fingerprint in fingerprints.iter().copied() {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bd40a52 and b40bb9c.

📒 Files selected for processing (9)
  • arrow-avro/src/errors.rs (1 hunks)
  • arrow-avro/src/lib.rs (1 hunks)
  • arrow-avro/src/reader/block.rs (3 hunks)
  • arrow-avro/src/reader/cursor.rs (2 hunks)
  • arrow-avro/src/reader/header.rs (6 hunks)
  • arrow-avro/src/reader/mod.rs (18 hunks)
  • arrow-avro/src/reader/record.rs (81 hunks)
  • arrow-avro/src/writer/encoder.rs (87 hunks)
  • arrow-avro/src/writer/format.rs (7 hunks)
🔇 Additional comments (11)
arrow-avro/src/writer/format.rs (5)

21-21: LGTM! Import updated for new error model.

The import correctly brings in the new AvroError and Result type alias, aligning with the centralized error handling introduced in this PR.


39-44: Public API change: trait method signature updated.

The return type change from Result<(), ArrowError> to Result<()> (using the new AvroError-based type alias) is a breaking change for any external implementors of this trait. This is expected as part of the error model migration.


116-128: LGTM! SOE compression validation updated correctly.

The error construction now uses AvroError::InvalidArgument with a clear, descriptive message. The validation logic remains unchanged and correctly rejects compression for Single-Object Encoding format.


136-145: LGTM! Helper functions updated consistently.

Both write_string and write_bytes now use the new Result<()> type alias with direct error propagation. The explicit Ok(()) at line 144 is clear and correct.


58-97: From<io::Error> for AvroError is correctly implemented.

Verification confirms that From<io::Error> for AvroError exists in arrow-avro/src/errors.rs, converting io::Error to AvroError::External. The direct use of ? on write_all() calls (lines 77, 95) will correctly propagate errors through the From trait implementation. Code is correct as written.

arrow-avro/src/reader/mod.rs (1)

503-508: Confirm AvroError variants exist (EOF/NeedMoreData/NeedMoreDataRange).

is_incomplete_data matches these variants. If any were renamed/removed in errors.rs, the guard will never trigger and decoding will wrongly fail instead of asking for more bytes. Please confirm the variants’ exact names/fields.

arrow-avro/src/reader/record.rs (4)

24-24: LGTM: Clean import of new error types.

The import of AvroError and Result from the errors module enables the consistent error handling refactoring throughout this file.

Also applies to: 35-38


140-143: LGTM: Appropriate use of AvroError variants.

The code uses distinct AvroError variants appropriately:

  • ParseError for data parsing issues (e.g., invalid record type, decimal overflow)
  • SchemaError for schema validation issues (e.g., union branch count limits)
  • NYI for not-yet-implemented features (e.g., sparse unions)
  • InvalidArgument for configuration errors (e.g., unsupported RunEndEncoded width)

Error messages are descriptive and include relevant context.

Also applies to: 372-375, 388-391, 455-460, 466-471, 483-486, 496-500


4323-4338: LGTM: Test functions updated to use Result type.

Test functions now return Result<()> (which resolves to Result<(), AvroError>), enabling the use of the ? operator for cleaner error propagation in tests. This is consistent with the centralized error handling approach.

Also applies to: 4342-4366, 4370-4395, 4399-4427, 4430-4442


118-144: LGTM: Public API updated to use centralized error types with proper error conversion support.

The public methods try_new_with_options, decode, and flush now return Result<T> using the crate's error type alias. The error conversions at line 178 use .map_err(Into::into) which is supported by the confirmed From<ArrowError> for AvroError implementation at arrow-avro/src/errors.rs:123.

arrow-avro/src/writer/encoder.rs (1)

21-21: LGTM! Clean error handling migration.

The refactoring from ArrowError to AvroError throughout this encoder module is well-executed and consistent. Key observations:

  • All helper functions (write_long, write_int, write_len_prefixed, write_bool, write_sign_extended) correctly return Result<()> using the new error type
  • Error variant choices are appropriate: InvalidArgument for validation/overflow, SchemaError for type mismatches, NYI for unimplemented features, General for I/O errors
  • Overflow checks in time conversion functions (lines 1225-1227, 1238-1240) properly use checked_mul with descriptive error messages
  • All encoder constructors follow the try_new pattern returning Result<Self>
  • Comprehensive test coverage validates the error type changes (tests explicitly check for AvroError::InvalidArgument and AvroError::SchemaError)

The migration maintains the existing logic while improving error handling consistency across the crate.

Also applies to: 53-67, 70-72, 75-79, 82-85, 131-184, 191-199, 227-646, 740-836, 850-1082, 1143-1969

Comment on lines +138 to +144
impl From<AvroError> for ArrowError {
    fn from(e: AvroError) -> Self {
        match e {
            AvroError::External(inner) => ArrowError::from_external_error(inner),
            AvroError::ArrowError(inner) => ArrowError::from_external_error(inner),
            other => ArrowError::AvroError(other.to_string()),
        }
@coderabbitai coderabbitai bot Nov 10, 2025

⚠️ Potential issue | 🟠 Major

Preserve original ArrowError variant when converting back

AvroError::ArrowError(inner) currently re-emits as ArrowError::ExternalError, so any downstream code that previously matched on precise ArrowError variants (e.g., SchemaError, ComputeError) will now only see ExternalError, breaking existing error handling. Just return the boxed ArrowError you already own to keep behavior identical to the pre-refactor code.

 impl From<AvroError> for ArrowError {
     fn from(e: AvroError) -> Self {
         match e {
             AvroError::External(inner) => ArrowError::from_external_error(inner),
-            AvroError::ArrowError(inner) => ArrowError::from_external_error(inner),
+            AvroError::ArrowError(inner) => *inner,
             other => ArrowError::AvroError(other.to_string()),
         }
     }
 }

Owner Author

value:good-to-have; category:bug; feedback: The CodeRabbit AI reviewer is correct that there is no need to wrap an ArrowError in another ArrowError. Prevents a needless allocation and duplication.

@martin-augment, thank you for confirming! I'm glad the analysis is helpful. The suggestion to return *inner directly will indeed avoid the unnecessary re-wrapping and preserve the original ArrowError variant for downstream consumers. Feel free to apply the suggested change when you're ready to address it.

@github-actions

Findings

  • arrow-avro/src/errors.rs:141 – impl From<AvroError> for ArrowError currently maps the AvroError::ArrowError(inner) case to ArrowError::ExternalError. This loses the original ArrowError variant (e.g. SchemaError, InvalidArgumentError) and will surface as an ExternalError to every caller of the reader APIs where we previously returned the concrete variant. That breaks existing matching logic in callers/tests that expect the specific ArrowError variants. Please return *inner instead so the original Arrow error propagates unchanged.

@martin-augment
Owner Author

1. Potential error information loss (errors.rs:142)

AvroError::ArrowError(inner) => ArrowError::from_external_error(inner),

When converting AvroError::ArrowError back to ArrowError, the boxed ArrowError is wrapped as an external error rather than being unwrapped. This could lead to nested error boxing.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct that there is no need to wrap an ArrowError in another ArrowError. Prevents a needless allocation and duplication.

@martin-augment
Owner Author

  • arrow-avro/src/errors.rs:141 – impl From<AvroError> for ArrowError currently maps the AvroError::ArrowError(inner) case to ArrowError::ExternalError. This loses the original ArrowError variant (e.g. SchemaError, InvalidArgumentError) and will surface as an ExternalError to every caller of the reader APIs where we previously returned the concrete variant. That breaks existing matching logic in callers/tests that expect the specific ArrowError variants. Please return *inner instead so the original Arrow error propagates unchanged.

value:good-to-have; category:bug; feedback: The Codex AI reviewer is correct that there is no need to wrap an ArrowError in another ArrowError. Prevents a needless allocation and duplication.

@martin-augment
Owner Author

Mixed error handling in mod.rs

Location: reader/mod.rs:838

fn flush_and_reset(&mut self) -> Result<Option<RecordBatch>> {
    // ... returns Result<_, AvroError>
    batch.map_err(ArrowError::from)  // Line 838
}

The function signature returns Result<_, AvroError> but converts to ArrowError at the call site. This suggests inconsistent error type expectations.

value:good-to-have; category:bug; feedback: The Claude AI reviewer is correct that some methods wrongly still use ArrowError in their return type instead of the new specialized AvroError type. The finding prevents code inconsistency.
