Skip to content

SQL UPDATE Support for DataFusion Integration #2205

@wirybeaver

Description

@wirybeaver

What's the feature are you trying to implement?

Implement SQL UPDATE functionality for iceberg-datafusion integration, enabling row-level updates with WHERE clause filtering. This completes the essential DML operations alongside INSERT and the upcoming MERGE INTO support #2201.

SQL Example

  UPDATE orders
  SET status = 'shipped', shipped_date = current_date()
  WHERE status = 'pending' AND payment_confirmed = true;

Overall Architecture

UPDATE table SET col1 = val1, col2 = val2 WHERE condition
   ↓
TableProvider::update() [NEW]
   ↓
IcebergTableScan (with WHERE filters)
   ↓
[Optional if partitioned: Project partition + Repartition + Sort for partitioned tables]
   ↓
IcebergUpdateWriteExec [NEW] - Apply assignments, write new files, track deleted
files
   ↓
CoalescePartitionsExec (reuse existing)
   ↓
IcebergUpdateCommitExec [NEW] - Commit via RowDelta transaction
   ↓
RecordBatch(count: UInt64)

Strategy: Copy-on-Write (COW)

  1. Scan table with WHERE filters to find matching rows
  2. Apply UPDATE assignments (evaluate expressions)
  3. Write modified rows to new data files
  4. Mark original files as deleted
  5. Commit atomically with RowDelta (add new files + remove old files)

The following tasks are already completed on the PoC branch with 6 commits. Will raise formal PRs one after another as the fork repo doesn't support stacking PRs.

  • feat(transaction): add RowDelta transaction action for row-level modi… #2203, shared with MERGE INTO
  • IcebergUpdateWriteExec - write phase with conditional assignment application
  • Add performance optimizations: Partition-aware updates (only scan/rewrite affected partitions) + File-level filtering using manifest statistics
  • IcebergUpdateCommitExec - commit phase using RowDelta transaction
  • TableProvider::update() trait implementation (Datafusion Hook)
  • Integration tests and validation

Willingness to contribute

I can contribute to this feature independently

Metadata

Metadata

Assignees

No one assigned

    Labels

    epicEpic issue

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions