Summary
We observe very slow compaction times across a number of different tables, and we'd like to fix it.
Desired Result
It should take 1-3 seconds to compact 22MB, yet I consistently see at least 60 seconds.
Examples
We ran compaction on a table where it only needed to compact 5 files totaling 315KB of data. According to the metrics from the library, the batch fetch duration was 85,514ms. The total job time (which we measure ourselves) was 93,092ms.
Here is a summary of 10 other tables compacted over the course of an hour.
| Datetime | Input Files | Input Size (bytes) | Fetch Time | Total Time |
|---|---|---|---|---|
| 2026-01-09T18:18:43 | 8 | 153,606 | 12,777ms | 37,641ms |
| 2026-01-09T18:16:31 | 231 | 11,704,639,665 | 1,324,772ms | 536,109ms |
| 2026-01-09T18:13:56 | 21 | 255,614,268 | 95,795ms | 110,706ms |
| 2026-01-09T18:10:59 | 8 | 11,359,652 | 74,555ms | 95,978ms |
| 2026-01-09T18:07:26 | 8 | 144,834 | 32,370ms | 50,014ms |
| 2026-01-09T17:49:55 | 5 | 2,408,452 | 684,052ms | 692,934ms |
| 2026-01-09T17:48:06 | 5 | 1,781,963 | 241,175ms | 251,488ms |
| 2026-01-09T17:34:13 | 5 | 17,838,726 | 140,258ms | 152,979ms |
| 2026-01-09T17:33:56 | 9 | 199,839 | 91,041ms | 102,836ms |
| 2026-01-09T17:21:05 | 5 | 1,465,254 | 416,035ms | 421,645ms |
Reproduction
I can reproduce long times from my laptop on my home network (600Mbps download, 38Mbps upload). I edited the REST catalog example in the iceberg-compaction library to run against my table in R2 Data Catalog. I generated fake data tables with varying numbers of files and rows per file, and loaded them into test tables. I tried adding updates/deletes, and also tried a greater number of columns. I ran with configurations similar to our production system. I hard-coded a timer in datafusion/mod.rs::rewrite_files that times the reads, and confirmed that read time dominates the total time locally.
| Test | Columns | Files | Rows per file | Size per file | Deletes? | Time to Read | Time to compact |
|---|---|---|---|---|---|---|---|
| Test 1 | 4 | 5 | 6000 | 68KB | No | 9s | 17-18s |
| Test 2 | 8 | 5 | 6000 | 138KB | No | 12s | 21-22s |
| Test 3 | 8 | 10 | 100,000 | 2.3MB | No | 34-35s | 60s |
| Test 4 | 8 | 5 | 6000 | 23MB / 138MB | Yes | 26s | 47s |
| Test 5 | 8 | 10 | 100,000 | 2.3MB | Yes | 43s | 75s |
| Test 6 | 32 | 5 | 1000 | 125KB | No | 35s | 44s |
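The hard-coded timer mentioned above was nothing more elaborate than a stopwatch around the read phase. A minimal sketch of the pattern (the function below is an illustrative stand-in, not the real `rewrite_files` code):

```rust
use std::time::Instant;

// Illustrative stand-in for the read phase inside rewrite_files;
// the real code reads the input Parquet files via the ArrowReader.
fn read_input_files(num_files: usize) -> usize {
    (0..num_files).count() // simulate touching each file
}

fn main() {
    let start = Instant::now();
    let files_read = read_input_files(5);
    let read_time = start.elapsed();
    println!("read {} files in {:?}", files_read, read_time);
}
```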
Root Cause
Iceberg-rust's `ArrowReader` only reads one byte range at a time, and each range read issues a separate HTTP request to the same object. It reads one range per column in the table, so as the number of columns grows, so does the number of HTTP requests needed to read the whole file.
Algorithmically, the number of HTTP requests is O(N × M), where N is the number of columns and M is the number of files to read. Prefetching whole files reduces the number of HTTP requests to O(M).
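To make the request-count argument concrete, here is a tiny model (a sketch, not iceberg-rust code) of how the HTTP request count scales under the two strategies:

```rust
/// One HTTP request per column per file (current per-range reads): O(N × M).
fn requests_per_range(columns: u64, files: u64) -> u64 {
    columns * files
}

/// One HTTP request per file (prefetching the whole file): O(M).
fn requests_prefetch(_columns: u64, files: u64) -> u64 {
    files
}

fn main() {
    // Test 6 from the reproduction table: 32 columns, 5 files.
    assert_eq!(requests_per_range(32, 5), 160);
    assert_eq!(requests_prefetch(32, 5), 5);
    println!(
        "per-range: {} requests, prefetch: {} requests",
        requests_per_range(32, 5),
        requests_prefetch(32, 5)
    );
}
```

This matches the observed pattern in the tests: Test 6, with the most columns, saw the largest speedup from prefetching.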
Proof: Static analysis of code
`ArrowFileReader` only implements `get_bytes` on the `ArrowAsyncReader` trait (iceberg-rust/arrow/reader.rs). `ArrowAsyncReader::get_byte_ranges` therefore falls back to sequentially calling `get_bytes` (arrow-rs/parquet/arrow/async_reader.rs).
Proof: The data from local testing (see previous section).
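The fallback behavior described above can be sketched with a trait default method. This is illustrative, std-only Rust modeling the shape of the code paths (the real traits live in arrow-rs and iceberg-rust and are async):

```rust
use std::ops::Range;

// Illustrative stand-in for the reader trait: `get_byte_ranges` has a
// default implementation that just loops over `get_bytes`, issuing one
// request per range.
trait ByteReader {
    fn get_bytes(&mut self, range: Range<u64>) -> Vec<u8>;

    fn get_byte_ranges(&mut self, ranges: Vec<Range<u64>>) -> Vec<Vec<u8>> {
        // Sequential fallback: one call (one HTTP round trip) per range.
        ranges.into_iter().map(|r| self.get_bytes(r)).collect()
    }
}

struct CountingReader {
    requests: usize,
}

impl ByteReader for CountingReader {
    fn get_bytes(&mut self, range: Range<u64>) -> Vec<u8> {
        self.requests += 1; // each range costs a round trip
        vec![0; (range.end - range.start) as usize]
    }
}

fn main() {
    let mut reader = CountingReader { requests: 0 };
    // Reading 8 column ranges from one file costs 8 requests.
    let ranges: Vec<Range<u64>> = (0..8).map(|i| i * 100..(i + 1) * 100).collect();
    reader.get_byte_ranges(ranges);
    assert_eq!(reader.requests, 8);
    println!("requests issued: {}", reader.requests);
}
```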
Solutions
I prototyped two solutions, and both work: each eliminates the repeated per-column reads. I'll summarize both approaches below. I recommend Option A for now because it is cheaper to implement and performs better for compaction; Option B could still be submitted upstream as a general-purpose improvement.
- (Recommended) Option A: Pre-fetch files in iceberg-compaction
- (+) Aligns with compaction use case, where we know we always want to download 100% of the file.
- (+) 2x faster than Option B in our tests
- (+) Less effort than committing a fix to iceberg-rust
- (-) Reads more data into memory at once (although we were already expecting to load the full file into memory anyway).
- Option B: Implement `get_file_ranges` on `ArrowFileReader` to actually read multiple ranges at once
- (+) This is a general-purpose improvement that can speed up queries as well
- (-) Twice as slow as Option A in our tests (probably because it reads once for the metadata and then again for the data).
- (-) More effort: the fix must be implemented and landed in iceberg-rust before it can be adopted here.
How to implement Option A: Pre-fetch Files in Iceberg-compaction
I added the optimization in iceberg_file_task_scan.rs. The `ArrowReaderBuilder` does not offer a way to pipe in bytes or files directly; the only interface for providing files is the iceberg-rust `FileIO` object (link to code), which you get from the table. So I use the `FileIO` instance passed into the function to fetch the file, then create a new `FileIO` backed by a `Storage::Memory` implementation and load the file into that in-memory storage.
This may not be the original intent of the `Storage::Memory` enum variant, but it fits our purpose exactly. Instead of having the `ArrowReader` trigger calls to a remote object store through a `FileIO` with `Storage::S3`, we want to hand it bytes already held in memory, without calling the object store at all. `Storage::Memory` is a perfect fit for that.
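The prefetch-then-serve-from-memory pattern can be sketched with a std-only in-memory store standing in for iceberg-rust's `FileIO` with `Storage::Memory` (the type and method names below are illustrative, not the real iceberg-rust API):

```rust
use std::collections::HashMap;

// Illustrative stand-in for a FileIO backed by Storage::Memory.
struct MemoryFileIo {
    files: HashMap<String, Vec<u8>>,
}

impl MemoryFileIo {
    fn new() -> Self {
        Self { files: HashMap::new() }
    }

    // Stage prefetched bytes under the same path the reader will ask for.
    fn write(&mut self, path: &str, bytes: Vec<u8>) {
        self.files.insert(path.to_string(), bytes);
    }

    // The reader now gets its bytes from memory: no object-store round trips.
    fn read(&self, path: &str) -> Option<&[u8]> {
        self.files.get(path).map(|v| v.as_slice())
    }
}

// Hypothetical remote fetch: in the real code this is the FileIO instance
// passed into the scan, fetching the whole object in one request.
fn fetch_from_object_store(_path: &str) -> Vec<u8> {
    vec![1, 2, 3, 4] // stand-in for the full Parquet file
}

fn main() {
    let path = "s3://bucket/data/file.parquet";

    // 1. Prefetch the entire file with a single request.
    let bytes = fetch_from_object_store(path);

    // 2. Stage it in the in-memory FileIO under the same path.
    let mut memory_io = MemoryFileIo::new();
    memory_io.write(path, bytes);

    // 3. Hand the in-memory FileIO to the reader; subsequent per-column
    //    range reads are served from RAM instead of HTTP.
    assert_eq!(memory_io.read(path), Some(&[1u8, 2, 3, 4][..]));
    println!("prefetched {} bytes", memory_io.read(path).unwrap().len());
}
```

The key design point is that the staged path matches the path the `ArrowReader` will request, so no other code needs to change.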
Performance Tests
Note that I skipped Test 1 when evaluating the improvement.
| Test | Columns | Files | Rows per file | Size per file | Deletes? | Old Read | Old Total | Improved Read | Improved Total | Read Speed Up |
|---|---|---|---|---|---|---|---|---|---|---|
| Test 1 | 4 | 5 | 6000 | 68KB | No | 9s | 17-18s | n/a | n/a | n/a |
| Test 2 | 8 | 5 | 6000 | 138KB | No | 12s | 21-22s | 2.3s | 12s | 5x |
| Test 3 | 8 | 10 | 100,000 | 2.3MB | No | 34-35s | 60s | 9s | 29s | 4x |
| Test 4 | 8 | 5 | 6000 | 23MB / 138MB | Yes | 26s | 47s | 3s | 24s | 8x |
| Test 5 | 8 | 10 | 100,000 | 2.3MB | Yes | 43s | 75s | 9.5s | 42s | 4x |
| Test 6 | 32 | 5 | 1000 | 125KB | No | 35s | 44s | 2.2s | 12s | 16x |