Releases: apache/arrow-rs

arrow 54.1.0

29 Jan 13:41
3bf29a2

Changelog

54.1.0 (2025-01-29)

Implemented enhancements:

  • Create GitHub releases automatically on tagging #7041
  • Add required methods to access inner builder for NullBufferBuilder #7002 [arrow]
  • Re-export NullBufferBuilder in the arrow crate #6975 [arrow]
  • arrow-string function should support binary input as well #6923 [arrow]
  • MMap support for IPC files #6709 [arrow]
  • fix: mark (Large)ListView as nested and support in equal data type #6995 [arrow] (rluvaton)
  • Expose min/max values for Decimal128/256 and improve docs #6992 [arrow] (alamb)
  • [Parquet] Improve speed of dictionary encoding NaN float values #6953 [parquet] (adamreeve)
  • Optimize BooleanBufferBuilder for non nullable columns #6973 [arrow]
  • arrow::compute::concat should merge dictionary type when concatenating list of dictionaries #6888 [arrow]
  • Improve error message for unsupported cast between struct and other types #6724 [arrow]
  • implement regexp_match, regexp_scalar_match and regexp_array_match for StringViewArray #6717 [arrow]
  • Speed up Parquet utf8 validation #6667 [parquet]

Fixed bugs:

  • Regression: Concatenating sliced ListArrays is broken #7034
  • PrimitiveDictionaryBuilder with specific value data type and capacity #7011 [arrow]
  • Arrow IPC Writer Panics for sliced nested arrays #6997 [arrow]
  • RecordBatch with no columns cannot be roundtripped through Parquet #6988 [parquet]
  • StringView: Using the Interleave kernel (and potentially others) results in many repeated buffers in variadic_buffers #6780 [arrow]
  • fix prefetch of page index #6999 [parquet] (adriangb)
  • fix: Parquet column writer Dictionary(_, Decimal128) and Dictionary(_, Decimal256) #6987 [parquet] (korowa)
  • Writing floating point values containing NaN to Parquet is slow when using dictionary encoding #6952 [parquet] [arrow]
  • Public API using private types: Buffer::from_bytes takes unexported Bytes #6754 [parquet] [arrow] [arrow-flight]
  • Some MSRVs are inaccurate #6741 [parquet] [arrow] [arrow-flight]

Documentation updates:

Merged pull requests:

53.4.0

27 Jan 12:08
d3fcb4b

Changelog

53.4.0 (2025-01-14)

Merged pull requests:

  • fix clippy (#6791) (#6940)
  • fix: decimal conversion loses value on lower precision (#6836) (#6936)
  • perf: Use Cow in get_format_string in FFI_ArrowSchema (#6853) (#6937)
  • fix: Encoding of List offsets was incorrect when slice offsets begin …
  • [arrow-cast] Support cast numeric to string view (alternate) (#6816) (#…
  • Enable matching temporal as from_type to Utf8View (#6872) (#6956)
  • [arrow-cast] Support cast boolean from/to string view (#6822) (#6957)
  • [53.0.0_maintenance] Fix CI (#6964)
  • Add Array::shrink_to_fit(&mut self) to 53.4.0 (#6790) (#6817) (#6962)

Update version to 54.0.0, add CHANGELOG (#6894)

27 Jan 12:10
2887cc1

Changelog

54.0.0 (2024-12-18)

Breaking changes:

Implemented enhancements:

  • Parquet schema hint doesn't support integer types upcasting #6891 [parquet]
  • Parquet UTF-8 max statistics are overly pessimistic #6867 [parquet]
  • Add builder support for Int8 keys #6844 [arrow]
  • Formalize the name of the nested Field in a list #6784 [parquet] [arrow] [arrow-flight]
  • Allow disabling the writing of Parquet Offset Index #6778 [parquet]
  • parquet::record::make_row is not exposed to users, leaving no option to users to manually create Row objects #6761 [parquet]
  • Avoid from_num_days_from_ce_opt calls in timestamp_s_to_datetime if we don't need #6746 [arrow]
  • Support Temporal -> Utf8View casting #6734 [arrow]
  • Add Option To Coerce List Type on Parquet Write #6733 [parquet] [arrow]
  • Support Numeric -> Utf8View casting #6714 [arrow]
  • Support Utf8View <=> boolean casting #6713 [arrow]

Fixed bugs:

  • Buffer::bit_slice loses length with byte-aligned offsets #6895 [arrow]
  • parquet arrow writer doesn't track memory size correctly for fixed sized lists #6839 [parquet]
  • Casting Decimal128 to Decimal128 with smaller precision produces incorrect results in some cases #6833 [arrow]
  • Should empty nullable dictionary be parsed as null from arrow-csv? #6821 [arrow]
  • Array take doesn't make fields nullable #6809
  • Arrow Flight Encodes a Slice's List Offsets If the slice offset starts with zero #6803 [arrow]
  • Parquet readers incorrectly interpret legacy nested lists #6756 [parquet]
  • filter_bits under-allocates resulting boolean buffer #6750 [arrow]
  • Multi-language support issues with Arrow FlightSQL client's execute_update and execute_ingest methods #6545 [arrow] [arrow-flight]

Documentation updates:

Closed issues:

Merged pull requests:

Prepare for 53.3.0 release (#6739)

27 Jan 12:09
f5b51ff

Changelog

53.3.0 (2024-11-17)

Implemented enhancements:

  • PartialEq of GenericByteViewArray (StringViewArray / ByteViewArray) that compares on equality rather than logical value #6679 [arrow]
  • Need a mechanism to handle schema changes due to dictionary hydration in FlightSQL server implementations #6672 [arrow] [arrow-flight]
  • Support encoding Utf8View columns to JSON #6642 [arrow]
  • Implement append_n for BooleanBuilder #6634 [arrow]
  • Some take optimizations #6621 [arrow]
  • Error Instead of Panic On Attempting to Write More Than 32769 Row Groups #6591 [parquet]
  • Make casting from a timestamp without timezone to a timestamp with timezone configurable #6555
  • Add record_batch! macro for easy record batch creation #6553 [arrow]
  • Support Binary --> Utf8View casting #6531 [arrow]
  • downcast_primitive_array and downcast_dictionary_array are not hygienic wrt imports #6400 [arrow]
  • Implement interleave_record_batch #6731 [arrow] (waynexia)
  • feat: record_batch! macro #6588 [arrow] (ByteBaker)

Fixed bugs:

  • Signed decimal e-notation parsing bug #6728 [arrow]
  • Add support for Utf8View -> numeric in can_cast_types #6715
  • IPC file writer produces incorrect footer when not preserving dict ID #6710 [arrow]
  • parquet from_thrift_helper incorrectly checks index #6693 [parquet]
  • Primitive REPEATED fields not contained in LIST annotated groups aren't read as lists by record reader #6648 [parquet]
  • DictionaryHandling does not recurse into Map fields #6644 [arrow] [arrow-flight]
  • Array writer output empty when no record is written #6613 [arrow]
  • Archery Integration Test with c# failing on main #6577 [arrow]
  • Potential unsoundness in filter_run_end_array #6569 [arrow]
  • Parquet reader can generate incorrect validity buffer information for nested structures #6510 [parquet]
  • arrow-array ffi: FFI_ArrowArray.null_count is always interpreted as unsigned and initialized during conversion from C to Rust. #6497 [arrow]

Documentation updates:

Performance improvements:

Closed issues:

  • Incorrect like results for pattern starting/ending with % percent and containing escape characters #6702 [arrow]

Merged pull requests:

Prepare for 53.2.0 release (#6603)

27 Jan 12:09
10c4059

Changelog

53.2.0 (2024-10-21)

Implemented enhancements:

  • Implement arrow_json encoder for Decimal128 & Decimal256 DataTypes #6605 [arrow]
  • Support DataType::FixedSizeList in make_builder within struct_builder.rs #6594 [arrow]
  • Support DataType::Dictionary in make_builder within struct_builder.rs #6589 [arrow]
  • Interval parsing from string - accept "mon" and "mons" token #6548 [arrow]
  • AsyncArrowWriter API to get the total size of a written parquet file #6530 [parquet]
  • append_many for Dictionary builders #6529 [arrow]
  • Missing tonic GRPC_STATUS with tonic 0.12.1 #6515 [arrow] [arrow-flight]
  • Add example of how to use parquet metadata reader APIs for a local cache #6504 [parquet]
  • Remove reliance on raw-entry feature of Hashbrown #6498 [parquet] [arrow] [arrow-flight]
  • Improve page index metadata loading in SerializedFileReader::new_with_options #6491 [parquet]
  • Release arrow-rs / parquet minor version 53.1.0 (October 2024) #6340 [arrow]

Fixed bugs:

Documentation updates:

Closed issues:

Merged pull requests:

Prepare for 53.1.0 release (CHANGELOG and version) (#6501)

27 Jan 12:09
065c7b8

Changelog

53.1.0 (2024-10-02)

Implemented enhancements:

  • Write null counts in Parquet statistics when they are known to be zero #6502 [parquet]
  • Make it easier to find / work with ByteView #6478 [arrow]
  • Update lexical-core version due to soundness issues with current version #6468
  • Add builder style API for manipulating ParquetMetaData #6465 [parquet]
  • ArrayData.align_buffers should support Struct data type / child data #6461 [arrow]
  • Add a method to return the number of skipped rows in a RowSelection #6428 [parquet]
  • Bump lexical-core to 1.0 #6397 [arrow]
  • Add union_extract kernel #6386 [arrow]
  • implement regexp_is_match_utf8 and regexp_is_match_utf8_scalar for StringViewArray #6370 [arrow]
  • Add support for BinaryView in arrow_string::length #6358 [arrow]
  • Add as_union to AsArray #6351
  • Ability to append non contiguous strings to StringBuilder #6347 [arrow]
  • Add Catalog DB Schema subcommands to flight_sql_client #6331 [arrow] [arrow-flight]
  • Add support for Utf8View in arrow_string::length #6305 [arrow]
  • Reading FIXED_LEN_BYTE_ARRAY columns with nulls is inefficient #6296 [parquet]
  • Optionally verify 32-bit CRC checksum when decoding parquet pages #6289 [parquet]
  • Speed up pad_nulls for FixedLenByteArrayBuffer #6297 [parquet] (etseidl)
  • Improve performance of set_bits by avoiding to set individual bits #6288 [arrow] (kazuyukitanimura)

Fixed bugs:

  • BitIterator panics when retrieving length #6480 [arrow]
  • Flight data retrieved via Python client (wrapping C++) cannot be used by Rust Arrow #6471 [arrow]
  • CI integration test failing: Archery test With other arrows #6448 [parquet] [arrow] [arrow-flight]
  • IPC not respecting not preserving dict ID #6443 [parquet] [arrow] [arrow-flight]
  • Failing CI: Prost requires Rust 1.71.1 #6436 [arrow] [arrow-flight]
  • Invalid struct arrays in IPC data causes panic during read #6416 [arrow]
  • REE Dicts cannot be encoded/decoded with streaming IPC #6398 [arrow]
  • Reading json map with non-nullable value schema doesn't error if values are actually null #6391
  • StringViewBuilder with deduplication does not clear observed values #6384 [arrow]
  • Cast from Decimal(p, s) to dictionary-encoded Decimal(p, s) loses precision and scale #6381 [arrow]
  • LocalFileSystem list operation returns objects in wrong order #6375
  • compute::binary_mut returns Err(PrimitiveArray<T>) only with certain arrays #6374 [arrow]
  • Exporting Binary/Utf8View from arrow-rs to pyarrow fails #6366 [arrow]
  • warning: methods as_any and next_batch are never used in parquet crate #6143 [parquet]

Documentation updates:

Closed issues:

  • Columnar json writer for arrow-json #6411
  • Primitive binary/unary are not as fast as they could be #6364 [arrow]
  • Different numeric type may be able to compare #6357

Merged pull requests:

  • fix: override size_hint for BitIterator to return the exact remaining size #6495 [arrow] (Beihao-Zhou)
  • Minor: Fix path in format command in CONTRIBUTING.md #6494 (etseidl)
  • Write null counts in Parquet statistics when they are known [#6490](htt...

53.0.0

27 Jan 12:09
ffd216d

Changelog

53.0.0 (2024-08-31)

Breaking changes:

Implemented enhancements:

  • Derive PartialEq and Eq for parquet::arrow::ProjectionMask #6329 [parquet]
  • Allow converting empty pyarrow.RecordBatch to arrow::RecordBatch #6318 [arrow]
  • Parquet writer should not write any min/max data to ColumnIndex when all values are null #6315 [parquet]
  • Parquet: Add union method to RowSelection #6307 [parquet]
  • Support writing UTC adjusted time arrow array to parquet #6277 [parquet]
  • A better way to resize the buffer for the snappy encode/decode #6276 [parquet]
  • parquet_derive: support reading selected columns from parquet file #6268
  • Tests for invalid parquet files #6261 [parquet]
  • Implement date_part for Duration #6245 [arrow]
  • Avoid unnecessary null buffer construction when converting arrays to a different type #6243 [parquet] [arrow]
  • Add parquet_opendal in related projects #6235
  • Look into optimizing reading FixedSizeBinary arrays from parquet #6219 [parquet] [arrow]
  • Add benchmarks for BYTE_STREAM_SPLIT encoded Parquet FIXED_LEN_BYTE_ARRAY data #6203 [parquet]
  • Make it easy to write parquet to object_store -- Implement AsyncFileWriter for a type that implements obj_store::MultipartUpload for AsyncArrowWriter #6200 [parquet]
  • Remove test duplication in parquet statistics tests #6185 [parquet]
  • Support BinaryView Types in C Schema FFI #6170 [arrow]
  • speedup take_byte_view kernel #6167 [arrow]
  • Add support for StringView and BinaryView statistics in StatisticsConverter #6164 [parquet]
  • Support casting BinaryView --> Utf8 and LargeUtf8 #6162 [arrow]
  • Implement filter kernel specially for FixedSizeByteArray #6153 [arrow]
  • Use LevelHistogram throughout Parquet metadata #6134 [parquet]
  • Support DoPutStatementIngest from Arrow Flight SQL 17.0 #6124 [arrow] [arrow-flight]
  • ColumnMetaData should no longer be written inline with data #6115 [parquet]
  • Implement date_part for Interval #6113 [arrow]
  • Implement Into<Arc<dyn Array>> for ArrayData #6104
  • Allow flushing or non-buffered writes from arrow::ipc::writer::StreamWriter #6099 [arrow]
  • Default block_size for StringViewArray #6094 [arrow]
  • Remove Statistics::has_min_max_set and ValueStatistics::has_min_max_set and use Option instead #6093 [parquet]
  • Upgrade arrow-flight to tonic 0.12 #6072
  • Improve speed of row converter by skipping utf8 checks #6058 [arrow]
  • Extend support for BYTE_STREAM_SPLIT to FIXED_LEN_BYTE_ARRAY, INT32, and INT64 primitive types #6048 [parquet]
  • Release arrow-rs / parquet minor version 52.2.0 (August 2024) #5998 [parquet] [arrow]

Fixed bugs:

  • Invalid ColumnIndex written in parquet #6310 [parquet]
  • comparison_kernels benchmarks panic #6283 [arrow]
  • Printing schema metadata includes possibly incorrect compression level #6270 [parquet]
  • Don't panic when creating Field from FFI_ArrowSchema with no name #6251 [arrow]
  • lexsort_to_indices should not fallback to non-lexical sort if the datatype is not supported #6226 [arrow]
  • Parquet Statistics null_count does not distinguish between 0 and not specified #6215 [parquet]
  • Using a take kernel on a dense union can result in reaching "unreachable" code #6206 [arrow]
  • Adding sub day seconds to Date64 is ignored. #6198 [[arrow](https://githu...