feat(streaming): SST deduplication and style registry in streaming writes #223

@arcaputo3

Description

Summary

Streaming writes currently emit inline strings with no shared string table (SST) deduplication and apply default styles only. Add two-phase streaming write support for shared strings and cell formatting.

Current State

  • Streaming reads: production-ready, O(1) worksheet memory via SAX
  • Streaming writes: functional but limited — inline strings only, no rich formatting
  • StreamingXmlWriter uses BoundsAccumulator for dimensions but no SST/style phases

Proposed Approach (from smart-streaming design doc)

Phase 1 — SST dedup:

  • First pass: collect unique strings, build SST index
  • Second pass: emit worksheet XML with SST references instead of inline strings
  • Result: smaller files, Excel-standard SST usage
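
A minimal sketch of the two-pass idea, in Python purely for illustration. All names here (`build_sst`, `emit_rows`, `column_letter`, `shared_strings_xml`) are hypothetical and not the project's API, and the sketch assumes cell values are only strings or numbers and that the row source can be iterated twice:

```python
from xml.sax.saxutils import escape


def column_letter(i):
    """Convert a 0-based column index to a letter reference: A, ..., Z, AA, ..."""
    letters = ""
    i += 1
    while i:
        i, rem = divmod(i - 1, 26)
        letters = chr(ord("A") + rem) + letters
    return letters


def build_sst(rows):
    """First pass: map each unique string to its first-seen index."""
    index = {}
    for row in rows:
        for value in row:
            if isinstance(value, str) and value not in index:
                index[value] = len(index)
    return index


def emit_rows(rows, sst):
    """Second pass: yield <row> XML, referencing strings by SST index (t="s")."""
    for r, row in enumerate(rows, start=1):
        cells = []
        for c, value in enumerate(row):
            ref = f"{column_letter(c)}{r}"
            if isinstance(value, str):
                cells.append(f'<c r="{ref}" t="s"><v>{sst[value]}</v></c>')
            else:
                # Assumption: anything that is not a string is a plain number.
                cells.append(f'<c r="{ref}"><v>{value}</v></c>')
        yield f'<row r="{r}">{"".join(cells)}</row>'


def shared_strings_xml(sst):
    """Serialize the table; dict insertion order matches the assigned indices."""
    items = "".join(f"<si><t>{escape(s)}</t></si>" for s in sst)
    return f'<sst uniqueCount="{len(sst)}">{items}</sst>'


rows = [["a", 1], ["b", "a"]]
sst = build_sst(rows)
sheet_rows = list(emit_rows(rows, sst))
```

The trade-off is that the first pass needs either a replayable row source or a buffer, while the SST itself stays proportional to the number of unique strings rather than the number of cells.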

Phase 2 — Style registry:

  • StyledRowData type carrying cell values + style IDs
  • Build StyleRegistry during streaming, emit styles.xml
  • Enable formatted streaming output (number formats, fonts, fills, borders)
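
One way the registry could work, sketched in Python for illustration only. `StyledRowData` and `StyleRegistry` borrow the names from the list above, but their fields and methods here are assumptions, as are `CellStyle` and `styled_row`; serializing the registry to styles.xml is omitted:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CellStyle:
    """Hypothetical style description; frozen so instances are hashable."""
    number_format: str = "General"
    bold: bool = False
    fill_rgb: Optional[str] = None


@dataclass
class StyledRowData:
    """Cell values plus the style ID assigned to each cell."""
    values: list
    style_ids: list


class StyleRegistry:
    """Assigns a stable integer ID to each distinct style as rows stream by."""

    def __init__(self):
        # Assumption: ID 0 is reserved for the default style.
        self._ids = {CellStyle(): 0}

    def register(self, style):
        # Returns the existing ID for a known style, otherwise the next free one.
        return self._ids.setdefault(style, len(self._ids))

    def styles(self):
        # Dict insertion order matches ID order.
        return list(self._ids)


def styled_row(registry, cells):
    """Build a StyledRowData from (value, CellStyle) pairs."""
    values = [value for value, _ in cells]
    style_ids = [registry.register(style) for _, style in cells]
    return StyledRowData(values, style_ids)


registry = StyleRegistry()
row = styled_row(
    registry,
    [("x", CellStyle(bold=True)), (1.5, CellStyle()), ("y", CellStyle(bold=True))],
)
```

Because IDs are assigned on first sight and never change, rows can be written with their style IDs immediately, and the registry only needs to be serialized once after the last row.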

Phase 3 — Merged cells:

  • Emit <mergeCells> from streaming writers (currently in-memory only)
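
Since `<mergeCells>` follows `<sheetData>` in worksheet XML, a streaming writer could collect merge ranges while rows are written and emit them once the row data is closed. A minimal Python sketch, with `merge_cells_xml` as a hypothetical helper name:

```python
def merge_cells_xml(ranges):
    """Serialize merge ranges such as "A1:B2"; returns "" when there are none."""
    if not ranges:
        return ""
    refs = "".join(f'<mergeCell ref="{r}"/>' for r in ranges)
    return f'<mergeCells count="{len(ranges)}">{refs}</mergeCells>'


merged = merge_cells_xml(["A1:B2", "C3:D3"])
```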

Impact

Enables creating large files (100k+ rows) with proper formatting, which currently requires the full in-memory path.

Consolidates Linear TJC-319, TJC-320, TJC-486.

Labels: enhancement (New feature or request)
