Releases: apache/arrow-rs
arrow 54.1.0
Changelog
54.1.0 (2025-01-29)
Implemented enhancements:
- Create GitHub releases automatically on tagging #7041
- Add required methods to access inner builder for NullBufferBuilder #7002 [arrow]
- Re-export NullBufferBuilder in the arrow crate #6975 [arrow]
- arrow-string function should support binary input as well #6923 [arrow]
- MMap support for IPC files #6709 [arrow]
- fix: mark (Large)ListView as nested and support in equal data type #6995 [arrow] (rluvaton)
- Expose min/max values for Decimal128/256 and improve docs #6992 [arrow] (alamb)
- [Parquet] Improve speed of dictionary encoding NaN float values #6953 [parquet] (adamreeve)
- Optimize BooleanBufferBuilder for non nullable columns #6973 [arrow]
- arrow::compute::concat should merge dictionary type when concatenating list of dictionaries #6888 [arrow]
- Improve error message for unsupported cast between struct and other types #6724 [arrow]
- implement regexp_match, regexp_scalar_match and regexp_array_match for StringViewArray #6717 [arrow]
- Speed up Parquet utf8 validation #6667 [parquet]
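Issue #6952 / PR #6953 concern dictionary encoding of NaN floats. The changelog does not describe the fix, so the following is only a std-only sketch of the general pitfall: because NaN != NaN, an equality-based dictionary lookup never finds an earlier NaN. Keying the lookup on the raw bit pattern (f64::to_bits) lets identical NaN payloads share one entry. The function name dictionary_encode is hypothetical and this is not the parquet crate's code.

```rust
use std::collections::HashMap;

/// Dictionary-encode f64 values, keyed on the raw IEEE 754 bit pattern.
/// An equality-based lookup would treat every NaN as new because NaN != NaN;
/// keying on `to_bits()` makes identical NaN payloads share one entry.
/// Side effect: 0.0 and -0.0 get separate entries, preserving exact bits.
fn dictionary_encode(values: &[f64]) -> (Vec<f64>, Vec<u32>) {
    let mut lookup: HashMap<u64, u32> = HashMap::new();
    let mut dictionary: Vec<f64> = Vec::new();
    let mut keys = Vec::with_capacity(values.len());
    for &v in values {
        // Insert a new dictionary entry only the first time a bit pattern is seen.
        let key = *lookup.entry(v.to_bits()).or_insert_with(|| {
            dictionary.push(v);
            (dictionary.len() - 1) as u32
        });
        keys.push(key);
    }
    (dictionary, keys)
}
```

With this keying, a column of repeated NaNs produces a single dictionary entry instead of one entry per value.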
Fixed bugs:
- Regression: Concatenating sliced ListArrays is broken #7034
- PrimitiveDictionaryBuilder with specific value data type and capacity #7011 [arrow]
- Arrow IPC Writer Panics for sliced nested arrays #6997 [arrow]
- RecordBatch with no columns cannot be roundtripped through Parquet #6988 [parquet]
- StringView: Using the Interleave kernel (and potentially others) results in many repeated buffers in variadic_buffers #6780 [arrow]
- fix prefetch of page index #6999 [parquet] (adriangb)
- fix: Parquet column writer Dictionary(_, Decimal128) and Dictionary(_, Decimal256) #6987 [parquet] (korowa)
- Writing floating point values containing NaN to Parquet is slow when using dictionary encoding #6952 [parquet] [arrow]
- Public API using private types: Buffer::from_bytes takes unexported Bytes #6754 [parquet] [arrow] [arrow-flight]
- Some MSRVs are inaccurate #6741 [parquet] [arrow] [arrow-flight]
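Several of the fixes above (#7034, #6997) involve sliced list arrays, whose offsets no longer start at zero. As a rough illustration of why that matters, here is a std-only sketch of rebasing a window of list offsets to zero and computing the child-value range the sliced rows reference. The function rebase_offsets is hypothetical; arrow-rs's actual concat and IPC writer code is not reproduced here.

```rust
use std::ops::Range;

/// Rebase the offsets of a sliced list array so they start at zero.
///
/// `offsets` is the window of `len + 1` offsets belonging to the sliced rows
/// (it must contain at least one entry). Returns the zero-based offsets plus
/// the range of child values those rows actually reference.
fn rebase_offsets(offsets: &[i32]) -> (Vec<i32>, Range<usize>) {
    let first = offsets[0];
    let last = offsets[offsets.len() - 1];
    // Shift every offset down by the first one so the slice starts at 0.
    let rebased = offsets.iter().map(|&o| o - first).collect();
    (rebased, first as usize..last as usize)
}
```

A writer that emitted the original offsets without also slicing the child values to the returned range would describe values that are not there, which is the general class of bug these entries address.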
Documentation updates:
- docs: add to bit slice iterator docs that the start value is inclusive and end value is exclusive #7022 [arrow] (rluvaton)
- Fix duplicate link references in README #7020 (Jefffrey)
- Enhance ListViewArray related docs #7007 [arrow] (Jefffrey)
- Document data type support and examples to predicates *like, starts_with, ends_with, contains #7003 [arrow] (alamb)
- Minor: improve documentation on timezone representations #7000 [arrow] (alamb)
- Add additional documentation for UTC representation of timestamps #6994 [arrow] (Abdullahsab3)
- Improve ParquetRecordBatchStreamBuilder docs / examples #6948 [parquet] (alamb)
- Document the ParquetRecordBatchStream buffering #6947 [parquet] (alamb)
- Minor: improve zip kernel docs, add examples #6928 [arrow] (alamb)
- Add doctest example for Buffer::from_bytes #6920 [arrow] (kylebarron)
- [object store] Add planned object_store release schedule to crate readme #6904 (alamb)
- Avoid panics? #6737 [parquet]
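PR #7022 documents that the bit slice iterator's start value is inclusive and its end value exclusive. A std-only sketch of that half-open convention: scan an LSB-first bitmap (the bit order Arrow uses for validity bitmaps) and return each run of set bits as (start, end), where end is one past the last set bit. The function set_bit_runs is hypothetical and only mirrors the range convention, not arrow-buffer's implementation.

```rust
/// Return contiguous runs of set bits as half-open `(start, end)` pairs:
/// `start` is the first set bit of a run (inclusive) and `end` is one past
/// the last set bit (exclusive). Bits are read LSB-first within each byte.
fn set_bit_runs(data: &[u8], len: usize) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut run_start: Option<usize> = None;
    for i in 0..len {
        let set = data[i / 8] & (1u8 << (i % 8)) != 0;
        match (set, run_start) {
            (true, None) => run_start = Some(i), // a run begins here
            (false, Some(s)) => {
                runs.push((s, i)); // `i` is the first unset bit: exclusive end
                run_start = None;
            }
            _ => {}
        }
    }
    // Close a run that extends to the end of the bitmap.
    if let Some(s) = run_start {
        runs.push((s, len));
    }
    runs
}
```

With this convention, end - start is always the number of set bits in the run.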
Merged pull requests:
- Create GitHub releases automatically on tagging #7042 (kou)
- Fix concat for sliced ListArrays #7037 [arrow] (alamb)
- Minor: Clarify NullBufferBuilder::new capacity parameter #7016 [arrow] (alamb)
- Add is_valid and truncate methods to NullBufferBuilder #7013 [arrow] (Chen-Yuan-Lai)
- fix: use the values builder capacity for the hash map in PrimitiveDictionaryBuilder::new_from_builders #7012 [arrow] (rluvaton)
- Refactor ipc reading code into methods on ArrayReader #7006 [arrow] (alamb)
- Minor: make it clear Predicate is crate private #7001 [arrow] (alamb)
- fix: Panic on reencoding offsets in arrow-ipc with sliced nested arrays #6998 [arrow] (HawaiianSpork)
- Add check for empty schema in parquet::schema::types::from_thrift_helper #6990 [parquet] ([etseidl](https://github.com/...
53.4.0
Changelog
53.4.0 (2025-01-14)
Merged pull requests:
- fix clippy (#6791) (#6940)
- fix: decimal conversion loses value on lower precision (#6836) (#6936)
- perf: Use Cow in get_format_string in FFI_ArrowSchema (#6853) (#6937)
- fix: Encoding of List offsets was incorrect when slice offsets begin …
- [arrow-cast] Support cast numeric to string view (alternate) (#6816) (#…
- Enable matching temporal as from_type to Utf8View (#6872) (#6956)
- [arrow-cast] Support cast boolean from/to string view (#6822) (#6957)
- [53.0.0_maintenance] Fix CI (#6964)
- Add Array::shrink_to_fit(&mut self) to 53.4.0 (#6790) (#6817) (#6962)
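The decimal fix in this release (#6836) concerns converting to a lower precision. A std-only sketch of the check involved, assuming Decimal128 values are stored as unscaled i128 integers and the scale is unchanged: a value fits precision p only if its magnitude is at most 10^p - 1. The functions below are hypothetical; how arrow-cast reports an overflow (error or null) depends on its cast options, which this sketch does not model, so it simply returns None.

```rust
/// Largest unscaled magnitude representable with `precision` decimal digits.
/// Valid for precision up to 38, since 10^38 still fits in an i128.
fn max_for_precision(precision: u32) -> i128 {
    10_i128.pow(precision) - 1
}

/// Change a Decimal128 value's precision (scale unchanged), returning `None`
/// when the value needs more digits than the target precision allows,
/// instead of silently producing a wrong value.
fn cast_precision(value: i128, to_precision: u32) -> Option<i128> {
    let max = max_for_precision(to_precision);
    (-max..=max).contains(&value).then_some(value)
}
```

For example, 123.45 stored with scale 2 is the unscaled value 12345, which fits precision 5 but not precision 4.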
Update version to 54.0.0, add CHANGELOG (#6894)
Changelog
54.0.0 (2024-12-18)
Breaking changes:
- avoid redundant parsing of repeated value in RleDecoder #6834 [parquet] (jp0317)
- Handling nullable DictionaryArray in CSV parser #6830 [arrow] (edmondop)
- fix(flightsql): remove Any encoding of DoPutUpdateResult #6825 [arrow] [arrow-flight] (davisp)
- arrow-ipc: Default to not preserving dict IDs #6788 [arrow] (brancz)
- Remove some very old deprecated functions #6774 [parquet] [arrow] (alamb)
- update to pyo3 0.23.0 #6745 [arrow] (psvri)
- Remove APIs deprecated since v 4.4.0 #6722 [arrow] [arrow-flight] (findepi)
- Return None when Parquet page indexes are not present in file #6639 [parquet] (etseidl)
- Add ParquetError::NeedMoreData, mark ParquetError as non_exhaustive #6630 [parquet] (etseidl)
- Remove APIs deprecated since v 2.0.0 #6609 [arrow] (findepi)
Implemented enhancements:
- Parquet schema hint doesn't support integer types upcasting #6891 [parquet]
- Parquet UTF-8 max statistics are overly pessimistic #6867 [parquet]
- Add builder support for Int8 keys #6844 [arrow]
- Formalize the name of the nested Field in a list #6784 [parquet] [arrow] [arrow-flight]
- Allow disabling the writing of Parquet Offset Index #6778 [parquet]
- parquet::record::make_row is not exposed to users, leaving no option to users to manually create Row objects #6761 [parquet]
- Avoid from_num_days_from_ce_opt calls in timestamp_s_to_datetime if we don't need #6746 [arrow]
- Support Temporal -> Utf8View casting #6734 [arrow]
- Add Option To Coerce List Type on Parquet Write #6733 [parquet] [arrow]
- Support Numeric -> Utf8View casting #6714 [arrow]
- Support Utf8View <=> boolean casting #6713 [arrow]
Fixed bugs:
- Buffer::bit_slice loses length with byte-aligned offsets #6895 [arrow]
- parquet arrow writer doesn't track memory size correctly for fixed sized lists #6839 [parquet]
- Casting Decimal128 to Decimal128 with smaller precision produces incorrect results in some cases #6833 [arrow]
- Should empty nullable dictionary be parsed as null from arrow-csv? #6821 [arrow]
- Array take doesn't make fields nullable #6809
- Arrow Flight Encodes a Slice's List Offsets If the slice offset starts with zero #6803 [arrow]
- Parquet readers incorrectly interpret legacy nested lists #6756 [parquet]
- filter_bits under-allocates resulting boolean buffer #6750 [arrow]
- Multi-language support issues with Arrow FlightSQL client's execute_update and execute_ingest methods #6545 [arrow] [arrow-flight]
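Two of the bugs above (#6895, Buffer::bit_slice losing length; #6750, filter_bits under-allocating) are about sizing and copying bitmaps. A std-only sketch, assuming LSB-first bit order as in Arrow bitmaps: the output needs ceil(len / 8) bytes, and bit i of the result comes from bit offset + i of the input. The helpers bytes_for_bits and bit_slice here are hypothetical stand-ins, not arrow-buffer's methods.

```rust
/// Number of bytes needed to hold `bits` bits, rounded up.
fn bytes_for_bits(bits: usize) -> usize {
    (bits + 7) / 8
}

/// Copy `len` bits starting at bit `offset` into a fresh LSB-first buffer.
/// The output is sized from `len`, whether or not `offset` is byte-aligned.
fn bit_slice(data: &[u8], offset: usize, len: usize) -> Vec<u8> {
    let mut out = vec![0u8; bytes_for_bits(len)];
    for i in 0..len {
        let src = offset + i;
        if data[src / 8] & (1u8 << (src % 8)) != 0 {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}
```

This bit-by-bit loop favors clarity; a real implementation would copy whole bytes or words when the offset is aligned.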
Documentation updates:
- Should we document at what rate deprecated APIs are removed? #6851 [parquet] [arrow]
- Fix docstring for Format::with_header in arrow-csv #6856 [arrow] (kylebarron)
- Add deprecation / API removal policy #6852 [parquet] [arrow] (alamb)
- Minor: add example for creating SchemaDescriptor #6841 [parquet] (alamb)
- chore: enrich panic context when BooleanBuffer fails to create #6810 [arrow] (tisonkun)
Closed issues:
- [FlightSQL] GetCatalogsBuilder does not sort the catalog names #6807 [arrow] [arrow-flight]
- Add a lint to automatically check for unused dependencies #6796 [arrow] [arrow-flight]
Merged pull requests:
- doc: add comment for timezone string #6899 [arrow] (xxchan)
- docs: fix typo #6890 [arrow] (rluvaton)
- Minor: Fix deprecation notice for arrow_to_parquet_schema #6889 [parquet] (etseidl)
- Add Field::with_dict_is_ordered #6885 [arrow] (alamb)
- Deprecate "max statistics size" property in WriterProperties #6884 [parquet] (etseidl)
- Add deprecation warnings for everything related to dict_id #6873 [[parquet](https://github.com...
Prepare for 53.3.0 release (#6739)
Changelog
53.3.0 (2024-11-17)
Implemented enhancements:
- PartialEq of GenericByteViewArray (StringViewArray / ByteViewArray) that compares on equality rather than logical value #6679 [arrow]
- Need a mechanism to handle schema changes due to dictionary hydration in FlightSQL server implementations #6672 [arrow] [arrow-flight]
- Support encoding Utf8View columns to JSON #6642 [arrow]
- Implement append_n for BooleanBuilder #6634 [arrow]
- Some take optimizations #6621 [arrow]
- Error Instead of Panic On Attempting to Write More Than 32769 Row Groups #6591 [parquet]
- Make casting from a timestamp without timezone to a timestamp with timezone configurable #6555
- Add record_batch! macro for easy record batch creation #6553 [arrow]
- Support Binary --> Utf8View casting #6531 [arrow]
- downcast_primitive_array and downcast_dictionary_array are not hygienic wrt imports #6400 [arrow]
- Implement interleave_record_batch #6731 [arrow] (waynexia)
- feat: record_batch! macro #6588 [arrow] (ByteBaker)
Fixed bugs:
- Signed decimal e-notation parsing bug #6728 [arrow]
- Add support for Utf8View -> numeric in can_cast_types #6715
- IPC file writer produces incorrect footer when not preserving dict ID #6710 [arrow]
- parquet from_thrift_helper incorrectly checks index #6693 [parquet]
- Primitive REPEATED fields not contained in LIST annotated groups aren't read as lists by record reader #6648 [parquet]
- DictionaryHandling does not recurse into Map fields #6644 [arrow] [arrow-flight]
- Array writer output empty when no record is written #6613 [arrow]
- Archery Integration Test with c# failing on main #6577 [arrow]
- Potential unsoundness in filter_run_end_array #6569 [arrow]
- Parquet reader can generate incorrect validity buffer information for nested structures #6510 [parquet]
- arrow-array ffi: FFI_ArrowArray.null_count is always interpreted as unsigned and initialized during conversion from C to Rust. #6497 [arrow]
Documentation updates:
- Minor: Document pattern for accessing views in StringView #6673 [arrow] (alamb)
- Improve Array::is_nullable documentation #6615 [arrow] (findepi)
- Minor: improve docs for ByteViewArray->ByteArray From impl #6610 [arrow] (alamb)
Performance improvements:
Closed issues:
- Incorrect like results for pattern starting/ending with % percent and containing escape characters #6702 [arrow]
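Issue #6702 and PR #6703 concern LIKE patterns that combine leading or trailing % with escape characters. To make the semantics concrete, here is a minimal recursive LIKE matcher in std Rust, assuming backslash as the escape character: % matches any sequence, _ matches exactly one character, and an escaped character matches itself literally. The functions like and like_at are hypothetical; arrow-string's kernels work differently and are far faster.

```rust
/// Minimal SQL LIKE matcher (backslash assumed as the escape character).
/// `%` matches any sequence, `_` matches one character, and `\x` matches
/// the character `x` literally, so `\%` matches a literal percent sign.
fn like(text: &str, pattern: &str) -> bool {
    let t: Vec<char> = text.chars().collect();
    let p: Vec<char> = pattern.chars().collect();
    like_at(&t, &p)
}

fn like_at(t: &[char], p: &[char]) -> bool {
    match p.first() {
        None => t.is_empty(),
        // Try every possible length for the `%` wildcard.
        Some('%') => (0..=t.len()).any(|i| like_at(&t[i..], &p[1..])),
        Some('_') => !t.is_empty() && like_at(&t[1..], &p[1..]),
        // Escaped character: compare the next pattern char literally.
        Some('\\') if p.len() > 1 => t.first() == Some(&p[1]) && like_at(&t[1..], &p[2..]),
        Some(c) => t.first() == Some(c) && like_at(&t[1..], &p[1..]),
    }
}
```

The backtracking on % is exponential in the worst case, which is acceptable for a sketch but not for a compute kernel.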
Merged pull requests:
- Fix signed decimal e-notation parsing #6729 [arrow] (gruuya)
- Clean up some arrow-flight tests and duplicated code #6725 [arrow] [arrow-flight] (itsjunetime)
- Update PR template section about API breaking changes #6723 (findepi)
- Support for casting StringViewArray to DecimalArray #6720 [arrow] (tlm365)
- File writer preserve dict bug #6711 [arrow] (brancz)
- Add filter_kernel benchmark for run array #6706 [arrow] (delamarch3)
- Fix string view ILIKE checks with NULL values #6705 [arrow] (findepi)
- Implement logical_null_count for more array types #6704 [arrow] (findepi)
- Fix LIKE with escapes #6703 [arrow] (findepi)
- Speed up filter_bytes #6699 [arrow] (Dandandan)
- Minor: fix misleading comment in byte view #6695 [arrow] (jayzhan211)
- minor fix on checking index #6694 [parquet] (jp0317)
- Undo run end filter performance regression #6691 [arrow] (delamarch3)
- Reimplement PartialEq of GenericByteViewArray compares by logical value #6689 [arrow] (tlm365)
- feat: expose known_schema from FlightDataEncoder #6688 [arrow] [arrow-flight] (nathanielc)
- Update hashbrown requirement from 0.14.2 to 0.15.1 #6684 [parquet] [arrow] (dependabot[bot])
- Support Duration in JSON Reader #6683 [[arrow](https://github.com...
Prepare for 53.2.0 release (#6603)
Changelog
53.2.0 (2024-10-21)
Implemented enhancements:
- Implement arrow_json encoder for Decimal128 & Decimal256 DataTypes #6605 [arrow]
- Support DataType::FixedSizeList in make_builder within struct_builder.rs #6594 [arrow]
- Support DataType::Dictionary in make_builder within struct_builder.rs #6589 [arrow]
- Interval parsing from string - accept "mon" and "mons" token #6548 [arrow]
- AsyncArrowWriter API to get the total size of a written parquet file #6530 [parquet]
- append_many for Dictionary builders #6529 [arrow]
- Missing tonic GRPC_STATUS with tonic 0.12.1 #6515 [arrow] [arrow-flight]
- Add example of how to use parquet metadata reader APIs for a local cache #6504 [parquet]
- Remove reliance on raw-entry feature of Hashbrown #6498 [parquet] [arrow] [arrow-flight]
- Improve page index metadata loading in SerializedFileReader::new_with_options #6491 [parquet]
- Release arrow-rs / parquet minor version 53.1.0 (October 2024) #6340 [arrow]
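The append_many request (#6529) and the related PRs adding append_many and append_nulls to dictionary builders are about appending a repeated value efficiently. A std-only sketch of the idea: look the value up in the dictionary once, then push its key n times. DictBuilder and its methods are hypothetical and do not reproduce arrow's StringDictionaryBuilder API.

```rust
use std::collections::HashMap;

/// Hypothetical string dictionary builder (not arrow's StringDictionaryBuilder).
#[derive(Default)]
struct DictBuilder {
    lookup: HashMap<String, u32>, // value -> key
    values: Vec<String>,          // dictionary values, indexed by key
    keys: Vec<Option<u32>>,       // one slot per row; None is a null
}

impl DictBuilder {
    /// Return the key for `value`, adding it to the dictionary if new.
    fn key_for(&mut self, value: &str) -> u32 {
        if let Some(&k) = self.lookup.get(value) {
            return k;
        }
        let k = self.values.len() as u32;
        self.values.push(value.to_string());
        self.lookup.insert(value.to_string(), k);
        k
    }

    /// Append `value` `n` times with a single dictionary lookup.
    fn append_many(&mut self, value: &str, n: usize) {
        let k = self.key_for(value);
        self.keys.extend(std::iter::repeat(Some(k)).take(n));
    }

    /// Append `n` null slots.
    fn append_nulls(&mut self, n: usize) {
        self.keys.extend(std::iter::repeat(None).take(n));
    }
}
```

Compared with calling a single-value append in a loop, this avoids hashing the same value once per row.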
Fixed bugs:
- Compilation fail where c_char = u8 #6571 [arrow]
- Arrow flight CI test failing on master #6568 [arrow] [arrow-flight]
Documentation updates:
Closed issues:
Merged pull requests:
- Minor: more comments for RecordBatch.get_array_memory_size() #6607 [arrow] (2010YOUY01)
- Implement arrow_json encoder for Decimal128 & Decimal256 #6606 [arrow] (phillipleblanc)
- Add support for building FixedSizeListBuilder in struct_builder's mak… #6595 [arrow] (kszlim)
- Add limited support for dictionary builders in make_builders for stru… #6593 [arrow] (kszlim)
- Fix CI with new valid certificates and add script for future usage #6585 [arrow] [arrow-flight] (itsjunetime)
- Update proc-macro2 requirement from =1.0.87 to =1.0.88 #6579 [arrow] [arrow-flight] (dependabot[bot])
- Fix clippy complaints #6573 [parquet] [arrow] [arrow-flight] (itsjunetime)
- Use c_char instead of i8 to compile on platforms where c_char = u8 #6572 [arrow] (itsjunetime)
- Bump pyspark from 3.3.1 to 3.3.2 in /parquet/pytest #6564 [parquet] (dependabot[bot])
- unsafe improvements #6551 [arrow] (ssbr)
- Update README.md #6550 [arrow] [arrow-flight] (Abdullahsab3)
- Fix string '0' cast to decimal with scale 0 #6547 [arrow] (findepi)
- Add finish to AsyncArrowWriter::finish #6543 [parquet] (etseidl)
- Add append_nulls to dictionary builders #6542 [arrow] (adriangb)
- Improve UnionArray::is_nullable #6540 [arrow] (tustvold)
- Allow to read parquet binary column as UTF8 type #6539 [parquet] (goldmedal)
- Use HashTable instead of raw_entry_mut #6537 [parquet] [arrow] (tustvold)
- Add append_many to dictionary arrays to allow adding repeated values #6534 [arrow] (adriangb)
- Adds documentation and example recommending Vec<ArrayRef> over ChunkedArray #6527 [arrow] (efredine)
- Update proc-macro2 requirement from =1.0.86 to =1.0.87 #6526 [arrow] [arrow-flight] (dependabot[bot])
- Add ColumnChunkMetadataBuilder clear APIs #6523 [parquet] (alamb)
- Update sysinfo requirement from 0.31.2 to 0.32.0 #6521 [parquet] (dependabot[bot])
- Update Tonic to 0.12.3 #6517 [arrow] [arrow-flight] (cisaacson)
- Detect missing page indexes while reading Parquet metadata #6507 [parquet] (etseidl)
- Use ParquetMetaDataReader to load page indexes in SerializedFileReader::new_with_options #6506 [parquet] (etseidl)
- Improve parquet MetadataFetch and AsyncFileReader docs [...
Prepare for 53.1.0 release (CHANGELOG and version) (#6501)
Changelog
53.1.0 (2024-10-02)
Implemented enhancements:
- Write null counts in Parquet statistics when they are known to be zero #6502 [parquet]
- Make it easier to find / work with ByteView #6478 [arrow]
- Update lexical-core version due to soundness issues with current version #6468
- Add builder style API for manipulating ParquetMetaData #6465 [parquet]
- ArrayData.align_buffers should support Struct data type / child data #6461 [arrow]
- Add a method to return the number of skipped rows in a RowSelection #6428 [parquet]
- Bump lexical-core to 1.0 #6397 [arrow]
- Add union_extract kernel #6386 [arrow]
- implement regexp_is_match_utf8 and regexp_is_match_utf8_scalar for StringViewArray #6370 [arrow]
- Add support for BinaryView in arrow_string::length #6358 [arrow]
- Add as_union to AsArray #6351
- Ability to append non contiguous strings to StringBuilder #6347 [arrow]
- Add Catalog DB Schema subcommands to flight_sql_client #6331 [arrow] [arrow-flight]
- Add support for Utf8View in arrow_string::length #6305 [arrow]
- Reading FIXED_LEN_BYTE_ARRAY columns with nulls is inefficient #6296 [parquet]
- Optionally verify 32-bit CRC checksum when decoding parquet pages #6289 [parquet]
- Speed up pad_nulls for FixedLenByteArrayBuffer #6297 [parquet] (etseidl)
- Improve performance of set_bits by avoiding to set individual bits #6288 [arrow] (kazuyukitanimura)
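Issue #6289 asks for optional verification of the 32-bit CRC stored in Parquet page headers. The Parquet format specification describes this as the standard CRC-32 (polynomial 0x04C11DB7, as in gzip), computed over the page's on-disk bytes excluding the page header. Below is a bitwise std-only CRC-32 using the reflected form of that polynomial (0xEDB88320), plus a hypothetical verify_page helper; this is a sketch, not the parquet crate's decoder, which would normally use a table-driven or hardware-accelerated CRC.

```rust
/// Bitwise CRC-32 (reflected polynomial 0xEDB88320, the gzip/zlib variant).
fn crc32(data: &[u8]) -> u32 {
    let mut crc = 0xFFFF_FFFFu32; // initial value
    for &byte in data {
        crc ^= byte as u32;
        for _ in 0..8 {
            // mask is all ones when the low bit is set, zero otherwise.
            let mask = (crc & 1).wrapping_neg();
            crc = (crc >> 1) ^ (0xEDB8_8320 & mask);
        }
    }
    !crc // final XOR
}

/// Hypothetical check: compare a page's stored checksum with one computed
/// over the page bytes that follow its header.
fn verify_page(page_bytes: &[u8], expected_crc: u32) -> bool {
    crc32(page_bytes) == expected_crc
}
```

The standard check value for this CRC variant is crc32(b"123456789") == 0xCBF43926.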
Fixed bugs:
- BitIterator panics when retrieving length #6480 [arrow]
- Flight data retrieved via Python client (wrapping C++) cannot be used by Rust Arrow #6471 [arrow]
- CI integration test failing: Archery test With other arrows #6448 [parquet] [arrow] [arrow-flight]
- IPC not respecting not preserving dict ID #6443 [parquet] [arrow] [arrow-flight]
- Failing CI: Prost requires Rust 1.71.1 #6436 [arrow] [arrow-flight]
- Invalid struct arrays in IPC data causes panic during read #6416 [arrow]
- REE Dicts cannot be encoded/decoded with streaming IPC #6398 [arrow]
- Reading json map with non-nullable value schema doesn't error if values are actually null #6391
- StringViewBuilder with deduplication does not clear observed values #6384 [arrow]
- Cast from Decimal(p, s) to dictionary-encoded Decimal(p, s) loses precision and scale #6381 [arrow]
- LocalFileSystem list operation returns objects in wrong order #6375
- compute::binary_mut returns Err(PrimitiveArray<T>) only with certain arrays #6374 [arrow]
- Exporting Binary/Utf8View from arrow-rs to pyarrow fails #6366 [arrow]
- warning: methods as_any and next_batch are never used in parquet crate #6143 [parquet]
Documentation updates:
- chore: add docs, part of #37 #6496 [parquet] [arrow] [arrow-flight] (ByteBaker)
- Minor: improve ChunkedReader docs #6477 [parquet] (alamb)
- Minor: Add some missing documentation to fix CI errors #6445 [arrow] (etseidl)
- Fix doc "bit width" to "byte width" #6434 [arrow] (kylebarron)
- chore: add docs, part of #37 #6433 [arrow] (ByteBaker)
- chore: add docs, part of #37 #6424 [arrow] (ByteBaker)
- Rephrase doc comment #6421 [parquet] [arrow] [arrow-flight] (waynexia)
- Remove "NOT YET FULLY SUPPORTED" comment from DataType::Utf8View/BinaryView #6380 [arrow] (alamb)
- Improve GenericStringBuilder documentation #6372 [arrow] (alamb)
Closed issues:
- Columnar json writer for arrow-json #6411
- Primitive binary/unary are not as fast as they could be #6364 [arrow]
- Different numeric type may be able to compare #6357
Merged pull requests:
53.0.0
Changelog
53.0.0 (2024-08-31)
Breaking changes:
- parquet_derive: Match fields by name, support reading selected fields rather than all #6269 (double-free)
- Update parquet object_store dependency to 0.11.0 #6264 [parquet] (alamb)
- parquet Statistics - deprecate has_* APIs and add _opt functions that return Option<T> #6216 [parquet] (Michael-J-Ward)
- Expose bulk ingest in flight sql client and server #6201 [arrow] [arrow-flight] (djanderson)
- Upgrade protobuf definitions to flightsql 17.0 (#6133) #6169 [arrow-flight] (alamb)
- Remove automatic buffering in ipc::reader::FileReader for consistent buffering #6132 [arrow] (V0ldek)
- No longer write Parquet column metadata after column chunks *and* in the footer #6117 [parquet] (etseidl)
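The statistics changes in this release (deprecating has_* methods in favour of _opt accessors returning Option<T>, #6216; null_count not distinguishing 0 from "not specified", #6215) come down to modelling "not written" explicitly. A std-only sketch with a hypothetical ColumnStats struct: None means the writer recorded nothing, which is different from a known value such as a null count of zero, and pruning logic must treat unknown bounds conservatively. This does not reproduce the parquet crate's Statistics types.

```rust
/// Hypothetical column statistics: `None` means "not written by the writer",
/// which is distinct from a known value such as a null count of zero.
#[derive(Debug, Default)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
}

impl ColumnStats {
    /// A row group can be skipped for `col == target` only when both bounds
    /// are known and the target lies outside them.
    fn can_skip_eq(&self, target: i64) -> bool {
        match (self.min, self.max) {
            (Some(min), Some(max)) => target < min || target > max,
            _ => false, // unknown bounds: the data must be read
        }
    }

    /// True only when the null count is known to be zero.
    fn known_no_nulls(&self) -> bool {
        self.null_count == Some(0)
    }
}
```

Representing absence with Option makes it impossible to confuse "zero nulls" with "no information", which a plain integer field cannot express.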
Implemented enhancements:
- Derive PartialEq and Eq for parquet::arrow::ProjectionMask #6329 [parquet]
- Allow converting empty pyarrow.RecordBatch to arrow::RecordBatch #6318 [arrow]
- Parquet writer should not write any min/max data to ColumnIndex when all values are null #6315 [parquet]
- Parquet: Add union method to RowSelection #6307 [parquet]
- Support writing UTC adjusted time arrow array to parquet #6277 [parquet]
- A better way to resize the buffer for the snappy encode/decode #6276 [parquet]
- parquet_derive: support reading selected columns from parquet file #6268
- Tests for invalid parquet files #6261 [parquet]
- Implement date_part for Duration #6245 [arrow]
- Avoid unnecessary null buffer construction when converting arrays to a different type #6243 [parquet] [arrow]
- Add parquet_opendal in related projects #6235
- Look into optimizing reading FixedSizeBinary arrays from parquet #6219 [parquet] [arrow]
- Add benchmarks for BYTE_STREAM_SPLIT encoded Parquet FIXED_LEN_BYTE_ARRAY data #6203 [parquet]
- Make it easy to write parquet to object_store -- Implement AsyncFileWriter for a type that implements obj_store::MultipartUpload for AsyncArrowWriter #6200 [parquet]
- Remove test duplication in parquet statistics tests #6185 [parquet]
- Support BinaryView Types in C Schema FFI #6170 [arrow]
- speedup take_byte_view kernel #6167 [arrow]
- Add support for StringView and BinaryView statistics in StatisticsConverter #6164 [parquet]
- Support casting BinaryView --> Utf8 and LargeUtf8 #6162 [arrow]
- Implement filter kernel specially for FixedSizeByteArray #6153 [arrow]
- Use LevelHistogram throughout Parquet metadata #6134 [parquet]
- Support DoPutStatementIngest from Arrow Flight SQL 17.0 #6124 [arrow] [arrow-flight]
- ColumnMetaData should no longer be written inline with data #6115 [parquet]
- Implement date_part for Interval #6113 [arrow]
- Implement Into<Arc<dyn Array>> for ArrayData #6104
- Allow flushing or non-buffered writes from arrow::ipc::writer::StreamWriter #6099 [arrow]
- Default block_size for StringViewArray #6094 [arrow]
- Remove Statistics::has_min_max_set and ValueStatistics::has_min_max_set and use Option instead #6093 [parquet]
- Upgrade arrow-flight to tonic 0.12 #6072
- Improve speed of row converter by skipping utf8 checks #6058 [arrow]
- Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types #6048 [parquet]
- Release arrow-rs / parquet minor version 52.2.0 (August 2024) #5998 [parquet] [arrow]
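Issue #6307 requests a union method on RowSelection. As a rough std-only sketch of what a union of two run-length row selections means, the code below uses a hypothetical Selector struct loosely modelled on a (row_count, skip) run; it expands both selections to per-row masks, ORs them, and compresses the result back into runs. This is for clarity only and is not the parquet crate's implementation, which would more likely merge runs directly without materialising a mask.

```rust
/// Hypothetical run-length selector: `row_count` consecutive rows that are
/// either all skipped (`skip == true`) or all selected.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Selector {
    row_count: usize,
    skip: bool,
}

/// Expand selectors into one boolean per row (true = selected).
fn to_mask(selectors: &[Selector]) -> Vec<bool> {
    selectors
        .iter()
        .flat_map(|s| std::iter::repeat(!s.skip).take(s.row_count))
        .collect()
}

/// Compress a per-row mask back into run-length selectors.
fn from_mask(mask: &[bool]) -> Vec<Selector> {
    let mut out: Vec<Selector> = Vec::new();
    for &selected in mask {
        // Extend the previous run if it has the same selected/skipped state.
        let same_run = matches!(out.last(), Some(last) if last.skip != selected);
        if same_run {
            if let Some(last) = out.last_mut() {
                last.row_count += 1;
            }
        } else {
            out.push(Selector { row_count: 1, skip: !selected });
        }
    }
    out
}

/// Rows selected by either input; assumes both describe the same rows.
fn union(a: &[Selector], b: &[Selector]) -> Vec<Selector> {
    let merged: Vec<bool> = to_mask(a)
        .iter()
        .zip(to_mask(b).iter())
        .map(|(x, y)| *x || *y)
        .collect();
    from_mask(&merged)
}
```

For instance, selecting the first two of five rows in one selection and the last two in another yields select 2, skip 1, select 2.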
Fixed bugs:
- Invalid ColumnIndex written in parquet #6310 [parquet]
- comparison_kernels benchmarks panic #6283 [arrow]
- Printing schema metadata includes possibly incorrect compression level #6270 [parquet]
- Don't panic when creating Field from FFI_ArrowSchema with no name #6251 [arrow]
- lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6226 [arrow]
- Parquet Statistics null_count does not distinguish between 0 and not specified #6215 [parquet]
- Using a take kernel on a dense union can result in reaching "unreachable" code #6206 [arrow]
- Adding sub day seconds to Date64 is ignored. #6198 [[arrow](https://githu...