
Performance Issues with Compaction #119

@nagraham


Summary

We observe very slow compaction times across a number of different tables, and we'd like to fix them.

Desired Result

Compacting 22MB should take roughly 1-3 seconds, yet I consistently see at least 60 seconds.
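As a back-of-the-envelope sanity check on that budget (my own arithmetic, not a measurement from the library), raw transfer of 22MB over the 600Mbps home link used in the reproduction below should take well under a second, so a 1-3 second target already leaves generous room for request overhead:

```rust
// Ideal transfer time for `megabytes` of data over an `mbps` link,
// ignoring request overhead, TLS handshakes, and TCP ramp-up.
fn ideal_seconds(megabytes: f64, mbps: f64) -> f64 {
    let bytes = megabytes * 1024.0 * 1024.0;
    let bytes_per_sec = mbps * 1_000_000.0 / 8.0;
    bytes / bytes_per_sec
}

fn main() {
    // 22MB over the 600Mbps link from the reproduction section.
    println!("ideal: {:.2}s", ideal_seconds(22.0, 600.0)); // ~0.31s
}
```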

Examples

We ran compaction on a table where it only needed to compact 5 files totaling 315KB of data. According to the metrics from the library, the batch fetch duration was 85,514ms. The total job time (which we measure ourselves) was 93,092ms.

Here is a summary of 10 other tables compacted over the course of an hour.

| Datetime | Input Files | Input Size (bytes) | Fetch Time | Total Time |
|---|---|---|---|---|
| 2026-01-09T18:18:43 | 8 | 153,606 | 12,777ms | 37,641ms |
| 2026-01-09T18:16:31 | 231 | 11,704,639,665 | 1,324,772ms | 536,109ms |
| 2026-01-09T18:13:56 | 21 | 255,614,268 | 95,795ms | 110,706ms |
| 2026-01-09T18:10:59 | 8 | 11,359,652 | 74,555ms | 95,978ms |
| 2026-01-09T18:07:26 | 8 | 144,834 | 32,370ms | 50,014ms |
| 2026-01-09T17:49:55 | 5 | 2,408,452 | 684,052ms | 692,934ms |
| 2026-01-09T17:48:06 | 5 | 1,781,963 | 241,175ms | 251,488ms |
| 2026-01-09T17:34:13 | 5 | 17,838,726 | 140,258ms | 152,979ms |
| 2026-01-09T17:33:56 | 9 | 199,839 | 91,041ms | 102,836ms |
| 2026-01-09T17:21:05 | 5 | 1,465,254 | 416,035ms | 421,645ms |

Reproduction

I can reproduce the long times from my laptop on my home network (600Mbps download, 38Mbps upload). I edited the REST catalog example in the iceberg-compaction library to run against my table in R2 Data Catalog. I generated fake data tables with N files and M rows each and loaded them into test tables. I also tried adding updates/deletes and increasing the number of columns, and I ran with configurations similar to our production system. I hard-coded a timer into rewrite_files in datafusion/mod.rs to time the reads, and confirmed that read time dominates the total time locally.

| Test | Columns | Files | Rows per file | Size per file | Deletes? | Time to Read | Time to Compact |
|---|---|---|---|---|---|---|---|
| Test 1 | 4 | 5 | 6000 | 68KB | No | 9s | 17-18s |
| Test 2 | 8 | 5 | 6000 | 138KB | No | 12s | 21-22s |
| Test 3 | 8 | 10 | 100,000 | 2.3MB | No | 34-35s | 60s |
| Test 4 | 8 | 5 | 6000 | 23MB / 138MB | Yes | 26s | 47s |
| Test 5 | 8 | 10 | 100,000 | 2.3MB | Yes | 43s | 75s |
| Test 6 | 32 | 5 | 1000 | 125KB | No | 35s | 44s |

Root Cause

Iceberg-rust's ArrowReader reads only one range of bytes at a time, and each read issues a separate HTTP request to the same object. It reads one range per column in the table, so as the number of columns grows, so does the number of HTTP requests needed to read the whole file.

Algorithmically, the number of HTTP requests is O(N x M), where N is the number of columns and M is the number of files to read. Prefetching whole files reduces the number of HTTP requests to O(M).
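A toy model of the two request patterns (my own illustration, not iceberg-rust code) makes the gap concrete:

```rust
// Current behavior: one ranged GET per column per file -> O(N x M).
fn requests_per_column(files: usize, columns: usize) -> usize {
    files * columns
}

// With whole-file prefetching: one GET per file -> O(M).
fn requests_prefetched(files: usize, _columns: usize) -> usize {
    files
}

fn main() {
    // Test 6 above: 32 columns across 5 files.
    assert_eq!(requests_per_column(5, 32), 160);
    assert_eq!(requests_prefetched(5, 32), 5);
    println!("160 requests vs 5 requests");
}
```

This matches the local tests: Test 6, with the most columns, shows the largest read-time blowup despite having the smallest files.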

Evidence:

  • Static analysis of the code.
  • The data from local testing (see the previous section).

Solutions

I prototyped two solutions, both of which work; each eliminates the per-column HTTP requests. I'll summarize both approaches. I recommend Option A for now because it's cheaper to implement and performs better for compaction; Option B could still be submitted as a general-purpose improvement.

  • (Recommended) Option A: Pre-fetch files in iceberg-compaction
    • (+) Aligns with the compaction use case, where we know we always want to download 100% of the file.
    • (+) 2x faster than Option B in our tests.
    • (+) Less effort than landing a fix in iceberg-rust.
    • (-) Reads more data into memory at once (although we were already expecting to load the full file into memory anyway).
  • Option B: Implement get_file_ranges on ArrowFileReader to actually read multiple ranges at once
    • (+) A general-purpose improvement that can speed up queries as well.
    • (-) Twice as slow as Option A in our tests (probably because it reads once for the metadata and again for the data).
    • (-) More effort: it has to be implemented and accepted in iceberg-rust, then adopted here.
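For a flavor of what Option B could do internally, here is a sketch (my own, not the iceberg-rust implementation) of coalescing nearby column byte ranges so that one HTTP request can serve several column reads; `max_gap` is an assumed tuning knob for how much wasted in-between data we tolerate per request:

```rust
// Merge sorted (start, end) byte ranges whose gaps are at most `max_gap`,
// so each merged range can be fetched with a single ranged GET.
fn coalesce_ranges(mut ranges: Vec<(u64, u64)>, max_gap: u64) -> Vec<(u64, u64)> {
    ranges.sort_by_key(|r| r.0);
    let mut out: Vec<(u64, u64)> = Vec::new();
    for (start, end) in ranges {
        match out.last_mut() {
            // Close enough to the previous range: extend it instead of
            // paying for another HTTP round trip.
            Some(last) if start <= last.1 + max_gap => last.1 = last.1.max(end),
            _ => out.push((start, end)),
        }
    }
    out
}

fn main() {
    // Two adjacent column chunks plus one far-away chunk -> 2 requests, not 3.
    let merged = coalesce_ranges(vec![(0, 10), (12, 20), (100, 110)], 4);
    assert_eq!(merged, vec![(0, 20), (100, 110)]);
}
```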

How to implement Option A: Pre-fetch Files in Iceberg-compaction

I added the optimization in iceberg_file_task_scan.rs. The ArrowReaderBuilder has no way to pipe in bytes or files directly; the only interface for providing files is the iceberg-rust FileIO object (link to code), which you get from the table. So I use the FileIO instance passed into the function to fetch the file, then create a new FileIO backed by the Storage::Memory implementation and load the fetched file into that memory storage.

This might not be the original intent of the Storage::Memory enum variant, but it fits our purpose well. Instead of having the ArrowReader trigger calls to a remote object store through a FileIO backed by S3 storage, we hand it bytes already held in memory; it never needs to call the object store at all. Thus, Storage::Memory is a natural fit.
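Reduced to a self-contained sketch, the shape of the approach looks like this (a toy in-memory store standing in for iceberg-rust's FileIO with Storage::Memory; all names here are illustrative, not the library's API):

```rust
use std::collections::HashMap;

// Toy stand-in for a memory-backed FileIO: files fetched once in bulk are
// served from RAM, so the reader's per-column range reads no longer cost
// an HTTP round trip each.
struct MemoryStore {
    objects: HashMap<String, Vec<u8>>,
}

impl MemoryStore {
    fn new() -> Self {
        Self { objects: HashMap::new() }
    }

    // Load a fully fetched file, keyed by its original object-store path.
    fn put(&mut self, path: &str, bytes: Vec<u8>) {
        self.objects.insert(path.to_string(), bytes);
    }

    // Serve a byte range [start, end) straight from memory.
    fn read_range(&self, path: &str, start: usize, end: usize) -> Option<&[u8]> {
        self.objects
            .get(path)
            .map(|b| &b[start.min(b.len())..end.min(b.len())])
    }
}

fn main() {
    // Pretend these bytes came from one bulk GET against the object store.
    let fetched: Vec<u8> = (0u8..100).collect();
    let mut store = MemoryStore::new();
    store.put("s3://bucket/data/file-0.parquet", fetched);

    // A "column read" now hits memory instead of issuing an HTTP request.
    let chunk = store
        .read_range("s3://bucket/data/file-0.parquet", 10, 14)
        .unwrap();
    assert_eq!(chunk, &[10, 11, 12, 13]);
}
```

The trade-off named above is visible here: the whole file sits in memory for the duration of the read, which compaction was effectively going to do anyway.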

Performance Tests

Note that I skipped Test 1 when evaluating the improvement.

| Test | Columns | Files | Rows per file | Size per file | Deletes? | Old Read | Old Total | Improved Read | Improved Total | Read Speedup |
|---|---|---|---|---|---|---|---|---|---|---|
| Test 1 | 4 | 5 | 6000 | 68KB | No | 9s | 17-18s | n/a | n/a | n/a |
| Test 2 | 8 | 5 | 6000 | 138KB | No | 12s | 21-22s | 2.3s | 12s | 5x |
| Test 3 | 8 | 10 | 100,000 | 2.3MB | No | 34-35s | 60s | 9s | 29s | 4x |
| Test 4 | 8 | 5 | 6000 | 23MB / 138MB | Yes | 26s | 47s | 3s | 24s | 8x |
| Test 5 | 8 | 10 | 100,000 | 2.3MB | Yes | 43s | 75s | 9.5s | 42s | 4x |
| Test 6 | 32 | 5 | 1000 | 125KB | No | 35s | 44s | 2.2s | 12s | 16x |
