
refactor(undo): iterate affected files child-first via recursive CTE #59

Merged
mfreed merged 1 commit into main from refactor/undo-child-first-iteration
May 31, 2026


@mfreed mfreed commented May 31, 2026

QueryUndoAffectedFiles previously returned affected files ordered by file_id ASC, which approximates parent-first iteration (UUIDv7 file IDs sort by creation time, and directories are created before their contents). This ordering was the precondition for the cascade-artifact bug fixed previously: under parent-first iteration, each child UPSERT's bump_parent_mtime cascade landed on the already-restored parent and wrote a fresh history row that was newer than the parent's own restore row. The earlier post-hoc "newest history row" capture in Step 3 then resolved to the cascade artifact instead of the parent's own restore snapshot, silently corrupting undo-of-undo for directory renames.

The inline version_id capture made correctness order-independent, so this rewrite is defense in depth, not a correctness fix. It switches the affected-files query to a recursive CTE that walks each affected file's parent_id chain upward and sorts by depth DESC (deepest first, with file_id ASC as the tiebreaker among siblings). The ancestor lookup is two-tier: for ancestors in the affected set, use the parent_id from the entry's snapshot (affected.version_id -> history row); for untouched ancestors, use the current source.parent_id. The snapshot path is what makes child-first ordering work for the rm-rf-then-undo case, where all affected source rows are absent and a source-only walk would default everything to depth 0.
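
The ordering rule above can be sketched in plain Go, without a database. This is an illustrative model only, not the query in query.go: the `affectedEntry` type, the `depthOf` and `orderChildFirst` helpers, and the short file IDs are all hypothetical, and the real implementation does this walk inside a recursive CTE. The sketch shows the two-tier parent lookup (snapshot first, then source) and the depth DESC, file_id ASC sort.

```go
package main

import (
	"fmt"
	"sort"
)

// affectedEntry is a hypothetical stand-in for one row of the affected set.
type affectedEntry struct {
	FileID         string // UUIDv7 in the real schema; short sortable strings here
	SnapshotParent string // parent_id taken from the entry's snapshot ("" = no parent)
}

// depthOf counts how many ancestors can be reached from id. For each step it
// prefers the snapshot parent (affected ancestors) and falls back to the
// current source parent (untouched ancestors).
func depthOf(id string, snapshot, source map[string]string) int {
	depth := 0
	seen := map[string]bool{id: true} // guards against cycles in bad data
	cur := id
	for {
		parent, ok := snapshot[cur]
		if !ok {
			parent, ok = source[cur]
		}
		if !ok || parent == "" || seen[parent] {
			return depth
		}
		seen[parent] = true
		depth++
		cur = parent
	}
}

// orderChildFirst sorts file IDs by depth DESC, then file ID ASC.
// With useSnapshot=false it models a source-only walk.
func orderChildFirst(affected []affectedEntry, source map[string]string, useSnapshot bool) []string {
	snapshot := map[string]string{}
	if useSnapshot {
		for _, e := range affected {
			snapshot[e.FileID] = e.SnapshotParent
		}
	}
	ids := make([]string, 0, len(affected))
	depth := map[string]int{}
	for _, e := range affected {
		ids = append(ids, e.FileID)
		depth[e.FileID] = depthOf(e.FileID, snapshot, source)
	}
	sort.Slice(ids, func(i, j int) bool {
		if depth[ids[i]] != depth[ids[j]] {
			return depth[ids[i]] > depth[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

// sampleAffected models "rm -rf d" where d contains a, b, and sub/c.
func sampleAffected() []affectedEntry {
	return []affectedEntry{
		{FileID: "01-d", SnapshotParent: ""},
		{FileID: "02-a", SnapshotParent: "01-d"},
		{FileID: "03-b", SnapshotParent: "01-d"},
		{FileID: "04-sub", SnapshotParent: "01-d"},
		{FileID: "05-c", SnapshotParent: "04-sub"},
	}
}

func main() {
	noSource := map[string]string{} // after rm -rf, every affected source row is absent
	fmt.Println("snapshot-aware:", orderChildFirst(sampleAffected(), noSource, true))
	fmt.Println("source-only:   ", orderChildFirst(sampleAffected(), noSource, false))
}
```

In this model the snapshot-aware walk places `05-c` first and `01-d` last. The source-only walk finds no parents at all, so every entry sits at depth 0 and the file_id ASC tiebreaker reproduces the old parent-first order.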

Result: cascades from child restores either land on absent parents (cascade UPDATE matches zero rows, no row written) or on pre-restore parents (cascade row written but superseded by the parent's own subsequent UPSERT as newest). The parent's own restore row is always "newest" for the parent's file_id at end-of-transaction. If the inline version_id capture ever regresses, child-first iteration preserves the correctness invariant. Cascade-noise reduction in the common rm-rf-then-undo pattern is a bonus.
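
The effect of iteration order on the "newest history row" can be illustrated with a toy in-memory model. Everything here is an assumption made for illustration: the `store` type, its `restore` and `newestKind` methods, and the single-level cascade are simplified stand-ins for the UPSERT and bump_parent_mtime behavior described above, not the project's code.

```go
package main

import "fmt"

// historyRow is a hypothetical, simplified history record.
type historyRow struct {
	FileID string
	Kind   string // "restore" (the file's own UPSERT) or "cascade" (mtime bump from a child)
}

// store is a toy model of the source table plus its history.
type store struct {
	parentOf map[string]string // file ID -> parent file ID
	present  map[string]bool   // which source rows currently exist
	history  []historyRow
}

// restore models one undo UPSERT: it writes the file's own history row, then
// cascades to the parent only if the parent's source row exists.
func (s *store) restore(id string) {
	s.present[id] = true
	s.history = append(s.history, historyRow{FileID: id, Kind: "restore"})
	if p := s.parentOf[id]; p != "" && s.present[p] {
		s.history = append(s.history, historyRow{FileID: p, Kind: "cascade"})
	}
}

// newestKind returns the kind of the most recent history row for id.
func (s *store) newestKind(id string) string {
	for i := len(s.history) - 1; i >= 0; i-- {
		if s.history[i].FileID == id {
			return s.history[i].Kind
		}
	}
	return ""
}

// cascadeCount returns how many cascade rows were written for id.
func (s *store) cascadeCount(id string) int {
	n := 0
	for _, r := range s.history {
		if r.FileID == id && r.Kind == "cascade" {
			n++
		}
	}
	return n
}

// simulate restores files in the given order. presentBefore lists source rows
// that exist before the undo starts (empty for the rm-rf-then-undo case).
func simulate(order []string, presentBefore ...string) *store {
	s := &store{
		parentOf: map[string]string{"a": "d", "b": "d"},
		present:  map[string]bool{},
	}
	for _, id := range presentBefore {
		s.present[id] = true
	}
	for _, id := range order {
		s.restore(id)
	}
	return s
}

func main() {
	parentFirst := simulate([]string{"d", "a", "b"})
	childFirst := simulate([]string{"a", "b", "d"})
	preRestore := simulate([]string{"a", "b", "d"}, "d")

	fmt.Println("parent-first newest for d:", parentFirst.newestKind("d"))
	fmt.Println("child-first newest for d: ", childFirst.newestKind("d"))
	fmt.Println("pre-restore parent newest:", preRestore.newestKind("d"))
}
```

In this model, parent-first iteration leaves a cascade row as the newest entry for `d`, while child-first iteration leaves `d`'s own restore row newest in both cases: with an absent parent no cascade rows are written at all, and with a pre-restore parent the cascade rows are written but superseded.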

Function comment in query.go has the full design rationale, including the simpler form this rewrite replaces (kept inline as a reference).

Adds TestQueryUndoAffectedFiles_TopologicalOrder (unit), plus TestSynth_UndoIterationOrder_ChildBeforeParent and TestSynth_UndoChildFirst_MinimizesCascadeArtifacts (integration). The cascade-noise test asserts that rm-rf-then-undo adds delta=6 history rows to d; before this rewrite the same scenario added 9 (3 extra cascade rows from Step-2 child UPSERTs onto the just-restored parent).

@mfreed mfreed merged commit cc7ca03 into main May 31, 2026
2 checks passed
@mfreed mfreed deleted the refactor/undo-child-first-iteration branch May 31, 2026 04:40