Add tests that arrow IPC data is validated #7096
Add tests for invalid arrays
@@ -1744,27 +1745,73 @@ mod tests {
});
}

fn roundtrip_ipc(rb: &RecordBatch) -> RecordBatch {
/// Write the record batch to an in-memory buffer in IPC File format
I just refactored these helpers into smaller chunks

/// Return the first record batch read from the IPC File buffer
/// using the FileDecoder API
fn read_ipc_with_decoder(buf: Vec<u8>) -> Result<RecordBatch, ArrowError> {
Every time I find myself using `FileDecoder`, I end up copy/pasting the example.
I am starting to think it would be worth having something like the following available for reuse
(arrow-rs/arrow/examples/zero_copy_ipc.rs, lines 84 to 96 in 468e992):
/// Incrementally decodes [`RecordBatch`]es from an IPC file stored in a Arrow
/// [`Buffer`] using the [`FileDecoder`] API.
///
/// This is a wrapper around the example in the `FileDecoder` which handles the
/// low level interaction with the Arrow IPC format.
struct IPCBufferDecoder {
    /// Memory (or memory mapped) Buffer with the data
    buffer: Buffer,
    /// Decoder that reads Arrays that refers to the underlying buffers
    decoder: FileDecoder,
    /// Location of the batches within the buffer
    batches: Vec<Block>,
}
(given I am about to go copy/paste it again into comet...)
@@ -2492,4 +2539,109 @@ mod tests {
assert_eq!(decoded_batch.expect("Failed to read RecordBatch"), batch);
});
}

#[test]
fn test_validation_of_invalid_list_array() {
Any thoughts on how I can make an invalid `PrimitiveArray` would be most appreciated.
Everything I tried in order to mismatch the len and the actual data resulted in panics in bounds checks (even with `try_new_unchecked`).
I think it is probably a good thing that it is so hard to create invalid arrays, but it would be nice to test this path.
ArrayData will normally let you do various inadvisable things
One thing I tried to do was make null buffers that had different sizes than the underlying values -- that actually failed the ArrayData creation checks, but when it was re-read via IPC the resulting arrays were fine.
Maybe it was due to padding or something. I'll play around with it some more
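The arithmetic behind the padding guess can be sketched as follows. This is a standalone illustration with hypothetical helper names (`bitmap_bytes`, `padded_len`, `bitmap_covers`), not code from arrow-rs, and it assumes buffers are padded to a multiple of 8 bytes when written. Whether the reader actually sees the padded length is not established in this thread.

```rust
/// Number of bytes needed to hold `len` validity bits (one bit per slot).
/// Hypothetical helper for illustration only.
fn bitmap_bytes(len: usize) -> usize {
    (len + 7) / 8
}

/// Round `n` up to the next multiple of `alignment`.
/// `alignment` must be non-zero; 8 is assumed in the discussion above.
fn padded_len(n: usize, alignment: usize) -> usize {
    ((n + alignment - 1) / alignment) * alignment
}

/// True if a validity buffer of `buffer_bytes` bytes can describe `len` slots.
fn bitmap_covers(buffer_bytes: usize, len: usize) -> bool {
    buffer_bytes >= bitmap_bytes(len)
}
```

For example, a 20-slot array needs 3 bytes of validity bitmap, so a 2-byte null buffer is too short. If that buffer were padded to 8 bytes on write and the reader used the padded size, the same length check would pass after the round trip.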
let err = read_ipc(&buf).unwrap_err();
assert_eq!(err.to_string(), expected_err);

// TODO verify there is no error when validation is disabled
I plan to add a few more lines here when we disable validation
Which issue does this PR close?
Rationale for this change
To test disabling validation in the IPC reader, I first need to show that validation is actually occurring.
It is also probably good in general to have tests showing that we validate data when it is read.
What changes are included in this PR?
Add tests that show that when invalid arrow data is written to IPC files, the StreamReader, FileReader and FileDecoder detect the invalid data and return errors
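As a rough picture of the kind of check an invalid list array should trip, here is a simplified, standalone model of list offset validation. The function name `validate_list_offsets` and its error messages are hypothetical and are not taken from arrow-rs; the real validation covers more cases than this sketch.

```rust
/// Simplified model of a list array: `offsets[i]..offsets[i + 1]` is the
/// range of child values that belongs to list element `i`.
/// Hypothetical helper for illustration; not the arrow-rs implementation.
fn validate_list_offsets(offsets: &[i32], values_len: usize) -> Result<(), String> {
    // This model requires at least one offset, even for an empty array.
    let first = match offsets.first() {
        Some(first) => *first,
        None => return Err("offsets buffer is empty".to_string()),
    };
    if first < 0 {
        return Err(format!("first offset {} is negative", first));
    }
    // Each offset must be greater than or equal to the previous one.
    for (i, pair) in offsets.windows(2).enumerate() {
        if pair[1] < pair[0] {
            return Err(format!(
                "offset at position {} ({}) is smaller than the previous offset ({})",
                i + 1,
                pair[1],
                pair[0]
            ));
        }
    }
    // The last offset is non-negative here, so the cast cannot wrap.
    let last = offsets[offsets.len() - 1] as usize;
    if last > values_len {
        return Err(format!(
            "last offset {} exceeds child values length {}",
            last, values_len
        ));
    }
    Ok(())
}
```

With this model, offsets `[0, 2, 5]` over 5 child values pass, while `[0, 3, 2]` (decreasing) and `[0, 2, 9]` (pointing past the child values) are rejected. Invalid offsets of this kind are one way to build data that a validating reader should refuse.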
Are there any user-facing changes?
This is entirely tests, no changes in functionality