XL provides two distinct I/O implementations with different performance characteristics. Understanding when to use each is critical for optimal performance.
| Dataset Size | Styling Needs | Read Mode | Write Mode | Memory | Performance |
|---|---|---|---|---|---|
| < 10k rows | Any | In-memory | In-memory | ~10MB | Excellent |
| 10k-50k rows | Full styling | In-memory | In-memory | ~25-50MB | Good |
| 50k-100k rows | Full styling | In-memory | In-memory | ~50-100MB | Fair |
| 100k+ rows | Minimal | Streaming (sequential) | Streaming | ~10-50MB (SST dependent) | Excellent |
| 100k+ rows | Full styling | In-memory for now | In-memory for now | O(n) | See roadmap P7.5 |
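The decision table above can be sketched as a small helper. This is a hypothetical illustration of the selection rule, not part of the XL API; `IOMode` and `chooseMode` are invented names:

```scala
// Hypothetical helper encoding the decision table above; not part of the XL API.
enum IOMode:
  case InMemory, Streaming

def chooseMode(rows: Int, needsFullStyling: Boolean): IOMode =
  if rows < 100_000 then IOMode.InMemory        // full features, acceptable memory
  else if needsFullStyling then IOMode.InMemory // streaming lacks rich styles (see roadmap P7.5)
  else IOMode.Streaming                         // 100k+ rows, minimal styling: constant memory
```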
Use For: Small‑to‑medium workbooks (<100k rows) or any workload that needs full styling and metadata.
Characteristics:
- Builds full XML trees in memory.
- Full feature support (SST, styles, merged cells, RichText).
- Deterministic output (byte‑identical structure on re‑write).
- Compact XML with DEFLATED compression by default; pretty‑printed STORED output is available for debugging via `WriterConfig.debug`.
API:

```scala
import com.tjclp.xl.io.ExcelIO
import cats.effect.IO
import cats.effect.unsafe.implicits.global

val excel = ExcelIO.instance[IO]

// Write
excel.write(workbook, path).unsafeRunSync()

// Read
excel.read(path).map { wb =>
  // Process workbook
}.unsafeRunSync()
```

Memory Profile (approximate):
- 10k rows: ~10MB
- 50k rows: ~50MB
- 100k rows: ~100MB
- 500k rows: ~500MB (not recommended)
Performance:
- Write: ~100k rows/sec
- Read: ~55k rows/sec
Use For: Large data generation (100k+ rows) when you can live with minimal styling and inline strings.
Characteristics:
- O(1) constant memory for worksheet data (~10MB regardless of row count).
- fs2‑data‑xml event streaming; no intermediate XML trees.
- Limitations: No SST support (inline strings only), minimal styles (no rich formatting or merged cells).
API:

```scala
import com.tjclp.xl.*
import com.tjclp.xl.io.Excel
import com.tjclp.xl.io.RowData
import cats.effect.unsafe.implicits.global
import fs2.Stream

val excel = Excel.forIO

// Generate 1M rows
Stream.range(1, 1_000_001)
  .map(i => RowData(i, Map(
    0 -> CellValue.Text(s"Row $i"),
    1 -> CellValue.Number(BigDecimal(i))
  )))
  .through(excel.writeStream(path, "Data"))
  .compile.drain
  .unsafeRunSync()
// Memory: ~10MB constant!
```

Memory Profile:
- 100k rows: ~10MB (constant)
- 1M rows: ~10MB (constant)
- 10M rows: ~10MB (constant)
Performance:
- Write: ~88k rows/sec in benchmarks.
- In practice, ~4–5x faster and vastly more memory‑efficient than typical in‑memory POI usage for large datasets.
Tradeoffs:
- ✅ Constant memory for worksheet data.
- ✅ Excellent throughput at high row counts.
- ❌ No SST (larger files if many duplicate strings).
- ❌ Minimal styles only (no rich formatting at scale).
Use For: Sequential processing of large worksheets (filtering, aggregation, ETL) where you don’t need full styling information.
Characteristics:
- O(1) memory for worksheet data by streaming XML via `fs2.io.readInputStream` + fs2‑data‑xml. `SharedStrings` (if present) are parsed once up front, so memory use scales with the number of unique strings, not total rows.
- Emits `Stream[F, RowData]` with 1‑based row indices and 0‑based column indices.
API:

```scala
import com.tjclp.xl.io.Excel
import cats.effect.IO
import cats.effect.unsafe.implicits.global

val excel = Excel.forIO

excel.readStream(path)
  .filter(_.rowIndex > 1) // skip header
  .evalMap(row => IO.println(row))
  .compile.drain
  .unsafeRunSync()
```

Tradeoffs:
- ✅ True streaming (no materialization of full sheets).
- ✅ Great for ETL and analytics pipelines.
- ❌ Does not expose full formatting/metadata; for that, use the in‑memory APIs.
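To make the ETL use case concrete, here is a minimal sketch of a sum-a-column aggregation over streamed rows. The `Row` case class below is a simplified local stand-in mirroring the shape described for `RowData` (1-based row index, 0-based column map), and an `Iterator` stands in for the fs2 stream; neither is the library's actual type:

```scala
// Simplified stand-in for RowData, for illustration only.
final case class Row(rowIndex: Int, cells: Map[Int, BigDecimal])

// Sum one column while skipping the header row, processing rows one at a
// time (the Iterator plays the role of the fs2 Stream: no full-sheet
// materialization, so memory stays constant).
def sumColumn(rows: Iterator[Row], col: Int): BigDecimal =
  rows
    .filter(_.rowIndex > 1)    // skip header (row indices are 1-based)
    .flatMap(_.cells.get(col)) // missing cells contribute nothing
    .foldLeft(BigDecimal(0))(_ + _)
```

With the real API the same pipeline would hang off `excel.readStream(path)` instead of an `Iterator`.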
Bad (N intermediate sheets):

```scala
(1 to 10000).foldLeft(sheet) { (s, i) =>
  s.put(ref"A$i", CellValue.Number(i))
}
// Creates 10,000 intermediate Sheet instances
```

Good (single putAll):

```scala
val cells = (1 to 10000).map(i => Cell(ref"A$i", CellValue.Number(i)))
sheet.put(cells)
// Creates 1 Sheet instance
```

Best (batch put with codecs):

```scala
sheet.put(
  (1 to 10000).map(i => ref"A$i" -> i)*
)
// Type-safe, auto-format inference, single allocation
```

Bad (eager evaluation):
```scala
var s = sheet
for i <- 1 to 1000 do
  s = s.put(ref"A$i", s"Row $i") // 1000 Sheet instances
  s = s.merge(ref"A$i:B$i")      // 1000 more Sheet instances
```

Good (deferred with Patch):

```scala
import com.tjclp.xl.dsl.*

val patch = (1 to 1000).foldLeft(Patch.empty: Patch) { (p, i) =>
  p ++ (ref"A$i" := s"Row $i") ++ ref"A$i:B$i".merge
}
sheet.put(patch) // Execute once at the end
```

Impact: Reduces intermediate allocations by 50-70%.
Manual (verbose, error‑prone):

```scala
import com.tjclp.xl.*
import java.time.LocalDate

val dateStyle = CellStyle.default.numFmt(NumFmt.Date)
val decimalStyle = CellStyle.default.numFmt(NumFmt.Number)

sheet
  .put(ref"A1", CellValue.Text("Revenue"))
  .put(ref"B1", CellValue.DateTime(LocalDate.of(2025, 11, 10)))
  .withCellStyle(ref"B1", dateStyle) // Must manually set format
  .put(ref"C1", CellValue.Number(BigDecimal("1000000.50")))
  .withCellStyle(ref"C1", decimalStyle) // Must manually set format
```

Batch put with codecs (clean, automatic):

```scala
import com.tjclp.xl.*
import com.tjclp.xl.codec.syntax.*
import java.time.LocalDate

sheet.put(
  ref"A1" -> "Revenue",                  // String
  ref"B1" -> LocalDate.of(2025, 11, 10), // Auto: date format
  ref"C1" -> BigDecimal("1000000.50")    // Auto: number format
)
```

Benefits:
- Auto‑inferred number formats via `CellCodec`.
- Type‑safe (compile error if unsupported type).
- Single bulk update instead of many tiny ones.
- Cleaner code.
Bad (materializes in memory first):

```scala
val allRows = database.fetchAll() // Load 1M rows into memory
val cells = allRows.map(row => /* convert to cells */)
sheet.put(cells) // Then write
```

Good (stream directly):

```scala
database.stream() // fs2.Stream[IO, Row]
  .map(row => RowData(/* convert */))
  .through(excel.writeStream(path, "Data"))
  .compile.drain
  .unsafeRunSync()
```

Impact: Constant memory regardless of source size.
Current (after lazy optimizations):

```scala
// Already optimized! range.cells returns an Iterator
sheet.fillBy(range"A1:Z100") { (col, row) =>
  CellValue.Text(s"${col.toLetter}${row.index1}")
}
// No intermediate Vector allocation
```

Before optimization (old code):

```scala
val newCells = range.cells.map(...).toVector // ❌ Unnecessary
sheet.put(newCells)
```

Impact: 20-30% memory reduction for large ranges (already fixed in the latest version).
XL In-Memory:
- Write: ~1.1s @ ~100MB memory
- Read: ~1.8s @ ~100MB memory
XL Streaming:
- Write: ~1.1s @ ~10MB memory (constant)
- Read: ~1.8s @ ~10MB memory (constant, P6.6 complete)
Apache POI SXSSF (streaming):
- Write: ~5s @ ~800MB memory
- Read: ~8s @ ~1GB memory
Improvement:
- XL write: 4.5x faster, 80x less memory
- XL read: 4.4x faster, 80x less memory
```scala
// ❌ Bad: O(n) sheet copies
val result = data.foldLeft(sheet) { (s, item) =>
  s.put(ref"A${item.id}", item.name)
}

// ✅ Good: Single sheet copy
val cells = data.map(item => Cell(ref"A${item.id}", CellValue.Text(item.name)))
sheet.put(cells)
```

Impact: 10-20x faster for large datasets.
```scala
// ❌ Bad: Recomputes 4 times (was even worse before optimization)
for _ <- 1 to 4 do
  println(sheet.usedRange) // Traverses all cells each time

// ✅ Good: Compute once
val range = sheet.usedRange
for _ <- 1 to 4 do
  println(range)
```

Note: `usedRange` is now single-pass (75% faster after lazy optimizations), but it still recomputes on every call.
```scala
// ✅ Good: Streaming read for sequential ETL (constant memory)
excel.readStream(largeFile)

// ✅ Good: In-memory read when you need styles/metadata
excel.read(largeFile)
```

Status: Streaming read is fixed (P6.6); choose based on feature needs, not memory concerns.
- < 1000 rows: Any approach works fine
- Simple scripts: Clarity > performance
- Prototyping: Optimize later
- 10k-100k rows: Use batching (putAll, putMixed)
- Complex styling: Use Patch composition
- Repeated operations: Cache usedRange, style registries
- 100k+ rows: Use streaming write
- Limited memory: Constant-memory critical
- Production ETL: Performance matters
Rule of Thumb: 1 cell ≈ 100 bytes in memory (JVM overhead)
- 10k cells: ~1 MB
- 100k cells: ~10 MB
- 1M cells: ~100 MB
- 10M cells: ~1 GB (use streaming!)
Factors that increase memory:
- Rich styles: +50 bytes per styled cell
- Merged cells: +200 bytes per merge
- Rich text: +100 bytes per additional run
- Formulas: +150 bytes per formula cell
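The rule of thumb and the per-item costs above can be combined into a rough estimator. The byte costs come directly from the lists above; `estimateBytes` itself is a hypothetical helper, not an XL API:

```scala
// Rough memory estimator from the rule of thumb above (hypothetical helper, not an XL API).
def estimateBytes(
    cells: Long,             // ~100 bytes each (JVM overhead)
    styledCells: Long = 0,   // +50 bytes per styled cell
    merges: Long = 0,        // +200 bytes per merge
    extraRichRuns: Long = 0, // +100 bytes per additional rich-text run
    formulaCells: Long = 0   // +150 bytes per formula cell
): Long =
  cells * 100 + styledCells * 50 + merges * 200 + extraRichRuns * 100 + formulaCells * 150
```

For example, 1M plain cells comes out at ~100 MB, matching the table above.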
- ✅ Streaming write: O(1) memory, ~88k rows/sec
- ✅ Streaming read: O(1) memory via `fs2.io.readInputStream` (+ SST materialized once)
- ✅ Lazy optimizations: 30-75% speedup (memoized canonicalKey, single-pass usedRange)
- ✅ Fix streaming read to O(1) memory
- ✅ Add memory regression tests
- ✅ DEFLATED compression by default (5-10x smaller files)
- ✅ Configurable compression mode
- ✅ Two-phase streaming writer with full SST/styles
- ✅ O(1) memory for writes with rich formatting
- ✅ Disk-backed SST for very large datasets
- ⬜ Parallel XML serialization (4-8 threads)
- ⬜ JMH benchmark suite vs POI
- ⬜ Automatic query optimization (builder pattern)
```scala
// Use simple in-memory API
import com.tjclp.xl.*
import com.tjclp.xl.io.ExcelIO
import cats.effect.IO
import cats.effect.unsafe.implicits.global

val sheet = Sheet("Data")
  .put(ref"A1", "Title")
  .put(ref"B1", 100)

ExcelIO.instance.write[IO](Workbook(Vector(sheet)), path).unsafeRunSync()
```

```scala
// Use batching
import com.tjclp.xl.codec.syntax.*

val cells = (1 to 100_000).map(i =>
  ref"A$i" -> s"Row $i"
)
val sheet = Sheet("Data").put(cells*)

ExcelIO.instance.write[IO](Workbook(Vector(sheet)), path).unsafeRunSync()
```

```scala
// Use streaming write
import com.tjclp.xl.io.Excel
import fs2.Stream

Stream.range(1, 1_000_001)
  .map(i => RowData(i, Map(0 -> CellValue.Text(s"Row $i"))))
  .through(Excel.forIO.writeStream(path, "Data"))
  .compile.drain
  .unsafeRunSync()
```

```scala
// Use streaming read for constant memory (O(1))
Excel.forIO.readStream(path)
  .evalMap(row => IO.println(row))
  .compile.drain
  .unsafeRunSync()
```

- STATUS.md - Current performance numbers
- io-modes.md - Architecture decisions
- examples.md - Code samples