-
Notifications
You must be signed in to change notification settings - Fork 421
Open
Labels
epicEpic issueEpic issue
Description
What's the feature are you trying to implement?
Implement SQL UPDATE functionality for iceberg-datafusion integration, enabling row-level updates with WHERE clause filtering. This completes the essential DML operations alongside INSERT and the upcoming MERGE INTO support #2201.
SQL Example
UPDATE orders
SET status = 'shipped', shipped_date = current_date()
WHERE status = 'pending' AND payment_confirmed = true;
Overall Architecture
UPDATE table SET col1 = val1, col2 = val2 WHERE condition
↓
TableProvider::update() [NEW]
↓
IcebergTableScan (with WHERE filters)
↓
[Optional if partitioned: Project partition + Repartition + Sort for partitioned tables]
↓
IcebergUpdateWriteExec [NEW] - Apply assignments, write new files, track deleted
files
↓
CoalescePartitionsExec (reuse existing)
↓
IcebergUpdateCommitExec [NEW] - Commit via RowDelta transaction
↓
RecordBatch(count: UInt64)
Strategy: Copy-on-Write (COW)
- Scan table with WHERE filters to find matching rows
- Apply UPDATE assignments (evaluate expressions)
- Write modified rows to new data files
- Mark original files as deleted
- Commit atomically with RowDelta (add new files + remove old files)
The following tasks are already completed on the PoC branch with 6 commits. Will raise formal PRs one after another as the fork repo doesn't support stacking PRs.
- feat(transaction): add RowDelta transaction action for row-level modi… #2203, shared with
MERGE INTO - IcebergUpdateWriteExec - write phase with conditional assignment application
- Add performance optimizations: Partition-aware updates (only scan/rewrite affected partitions) + File-level filtering using manifest statistics
- IcebergUpdateCommitExec - commit phase using RowDelta transaction
- TableProvider::update() trait implementation (Datafusion Hook)
- Integration tests and validation
Willingness to contribute
I can contribute to this feature independently
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
epicEpic issueEpic issue