bug: incorrect Parquet INT96 values from ArrowReader #2299
Description
Describe the bug
iceberg-rust reads INT96 timestamps incorrectly, resulting in ~1170 year offset for dates outside the nanosecond i64 range (~1677-2262).
Example:
- Correct (Iceberg Java): 3332-12-14 11:33:10.965
- iceberg-rust: 2163-11-05 13:24:03.545896

This affects migrated tables whose Parquet files were written with INT96 timestamps (common for Spark/Hive migrations via `add_files` or `importSparkTable`).
Root Cause
INT96 in Parquet
INT96 is 12 bytes: 8 bytes of nanoseconds-within-day + 4 bytes of Julian day number.
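As a self-contained illustration of that layout (the helper function below is mine, not an arrow-rs or iceberg-rust API), the 12 bytes split like this:

```rust
/// Splits a raw Parquet INT96 value into its two components
/// (illustrative helper, not a real arrow-rs/iceberg-rust API).
fn decode_int96(raw: [u8; 12]) -> (u64, u32) {
    // Bytes 0..8: nanoseconds elapsed within the day, little-endian.
    let time_of_day_nanos = u64::from_le_bytes(raw[0..8].try_into().unwrap());
    // Bytes 8..12: Julian day number, little-endian.
    let julian_day = u32::from_le_bytes(raw[8..12].try_into().unwrap());
    (time_of_day_nanos, julian_day)
}

fn main() {
    // Midnight at the Unix epoch (1970-01-01) is Julian day 2_440_588.
    let mut raw = [0u8; 12];
    raw[8..12].copy_from_slice(&2_440_588u32.to_le_bytes());
    assert_eq!(decode_int96(raw), (0, 2_440_588));
}
```

Note that the day count and the intra-day nanoseconds are separate fields; the overflow only appears when a reader collapses them into one nanoseconds-since-epoch i64.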
What happens today
- arrow-rs defaults INT96 to `Timestamp(Nanosecond, None)` (parquet/src/arrow/schema/primitive.rs:122). For dates outside ~1677-2262, nanoseconds-since-epoch overflows i64, producing garbage values.
- iceberg-rust's `RecordBatchTransformer` later casts to `Timestamp(Microsecond)` to match the Iceberg schema, but by then the data has already been corrupted by the overflow.
- arrow-rs PR #7285 added support for reading INT96 as other TimeUnits: if you pass `Timestamp(Microsecond)` via `ArrowReaderOptions::with_schema()`, arrow-rs converts correctly without overflow.
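The overflow is easy to demonstrate with plain i64 arithmetic (the day count below is an assumed illustrative value near year 3332, not taken from the affected file):

```rust
fn main() {
    // Roughly 1362 years' worth of days past 1970-01-01, i.e. around year 3332.
    // (Assumed illustrative value, not derived from the bug report's file.)
    let days: i64 = 497_700;

    // Nanoseconds-since-epoch: overflows i64 for anything past ~2262-04-11.
    assert_eq!(days.checked_mul(86_400_000_000_000), None);

    // Microseconds-since-epoch: fits with enormous headroom
    // (i64 micros covers roughly +/- 292,000 years).
    assert_eq!(days.checked_mul(86_400_000_000), Some(43_001_280_000_000_000));
}
```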
Why iceberg-rust doesn't pass the right schema hint
In reader.rs, the schema is only overridden via `ArrowReaderOptions::with_schema()` when Parquet files lack field IDs (branches 2/3 of the schema resolution strategy). Even then, the overridden schema is derived from the Parquet file metadata, which has `Timestamp(Nanosecond)` for INT96 columns, not from the Iceberg table schema, which correctly specifies `Timestamp(Microsecond)`.
For files with embedded field IDs (branch 1), no schema override is passed at all.
How Iceberg Java handles this
Iceberg Java avoids this entirely by using a custom INT96 column reader that bypasses parquet-mr's default decoding. The reader factory receives the Iceberg expected schema as the authority via readerFuncWithSchema.apply(expectedSchema, fileType) (Parquet.java:1366-1371).
When BaseParquetReaders.primitive() encounters INT96, it dispatches to a TimestampInt96Reader that reads the raw 12 bytes and converts safely:
```java
// GenericParquetReaders.java:172-191
final ByteBuffer byteBuffer =
    column.nextBinary().toByteBuffer().order(ByteOrder.LITTLE_ENDIAN);
final long timeOfDayNanos = byteBuffer.getLong();
final int julianDay = byteBuffer.getInt();
return Instant.ofEpochMilli(TimeUnit.DAYS.toMillis(julianDay - UNIX_EPOCH_JULIAN))
    .plusNanos(timeOfDayNanos)
    .atOffset(ZoneOffset.UTC);
```

This avoids overflow by keeping days and nanos separate: it never tries to cram the full value into a single i64 of nanoseconds-since-epoch.
iceberg-rust can't easily replicate this custom column reader approach since it delegates to arrow-rs for Parquet reading. The equivalent fix is to pass the correct schema hint so arrow-rs decodes INT96 as microseconds.
Proposed Fix
When building the Arrow schema to pass to `ArrowReaderOptions::with_schema()`, overlay the Iceberg table schema's timestamp types onto the Parquet-derived schema. For any column where:
- the Parquet physical type is INT96, and
- the Iceberg type is Timestamp or Timestamptz,

replace `Timestamp(Nanosecond, ...)` with `Timestamp(Microsecond, ...)` in the schema hint. This triggers arrow-rs's INT96 conversion logic from PR #7285.
This is the same approach DataFusion uses via its coerce_int96_to_resolution() function (datafusion PR #15537), except the source of truth for the target TimeUnit is the Iceberg schema rather than a user config.
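The overlay step can be modeled with simplified stand-ins for Arrow's DataType and the Parquet physical type (the enums and function below are illustrative, not the real arrow-rs or iceberg-rust APIs):

```rust
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit { Nanosecond, Microsecond }

#[derive(Debug, Clone, PartialEq)]
enum DataType { Timestamp(TimeUnit, Option<String>), Other }

#[derive(Debug, Clone, Copy, PartialEq)]
enum PhysicalType { Int96, Other }

/// For each column whose physical type is INT96 and whose Arrow type came out
/// as Timestamp(Nanosecond, _), rewrite the schema hint to microseconds so the
/// reader converts during decoding instead of overflowing.
fn overlay_int96_hint(columns: &mut [(PhysicalType, DataType)]) {
    for (physical, dt) in columns.iter_mut() {
        if *physical == PhysicalType::Int96 {
            if let DataType::Timestamp(TimeUnit::Nanosecond, tz) = dt.clone() {
                *dt = DataType::Timestamp(TimeUnit::Microsecond, tz);
            }
        }
    }
}

fn main() {
    let mut cols = vec![
        (PhysicalType::Int96, DataType::Timestamp(TimeUnit::Nanosecond, Some("UTC".into()))),
        (PhysicalType::Other, DataType::Other),
    ];
    overlay_int96_hint(&mut cols);
    // The INT96 column's hint is now microseconds; other columns are untouched.
    assert_eq!(cols[0].1, DataType::Timestamp(TimeUnit::Microsecond, Some("UTC".into())));
    assert_eq!(cols[1].1, DataType::Other);
}
```

The real implementation would walk the Arrow `Schema` built from Parquet metadata alongside the Iceberg schema, but the matching logic is the same shape as above.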
Files to modify
- crates/iceberg/src/arrow/reader.rs: after building the Arrow schema from Parquet metadata, walk the INT96 timestamp columns and replace their types with the Iceberg schema's timestamp type.
- This applies to all three branches of the schema resolution strategy (with/without field IDs, with/without name mapping).
Related
- arrow-rs #7285: Support different TimeUnits and timezones when reading Timestamps from INT96
- datafusion #15537: INT96 handling in DataFusion
- datafusion-comet #3856: Downstream issue in Comet