perf+fix: diff-based attribution, unified rebase path, daemon guard fixes#817
Merged
- Add is_valid_oid/is_zero_oid guards to the RefUpdated handler in rewrite_events_from_semantic_events, preventing errors when the generic analyzer emits RefUpdated events with zero OIDs (e.g. git branch creation).
- Swap reflog fallback order in parse_update_ref_heads: consult the target ref's own reflog before HEAD's reflog, since update-ref does not modify HEAD and the HEAD reflog could contain an unrelated entry with the same OID from a concurrent operation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
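The guard in the first bullet can be pictured with a minimal sketch. The PR names `is_valid_oid` and `is_zero_oid` but does not show their bodies, so the implementations below, and the hypothetical `should_process_ref_update` wrapper, are assumptions for illustration only.

```rust
/// Assumed behavior: true when the string is non-empty and consists only of '0'.
/// Git reports an all-zero OID for the "old" side of a ref that is being created.
fn is_zero_oid(oid: &str) -> bool {
    !oid.is_empty() && oid.bytes().all(|b| b == b'0')
}

/// Assumed behavior: a full hex object ID (40 digits for SHA-1, 64 for SHA-256).
fn is_valid_oid(oid: &str) -> bool {
    matches!(oid.len(), 40 | 64) && oid.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Hypothetical wrapper: only treat a ref update as a possible rewrite when
/// both sides are real, non-zero object IDs.
fn should_process_ref_update(old_oid: &str, new_oid: &str) -> bool {
    is_valid_oid(old_oid)
        && is_valid_oid(new_oid)
        && !is_zero_oid(old_oid)
        && !is_zero_oid(new_oid)
}
```

With a guard of this shape, a branch creation (old side all zeros) is skipped before any commit walking starts.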
The update-ref and reset-as-rebase handlers were passing onto_head=None to build_rebase_commit_mappings, causing it to walk all commits back to the merge_base between old_head and new_head. For Graphite restacks with long shared history this meant scanning hundreds of irrelevant commits. Derive the onto hint from the first parent of new_head, which constrains the commit walk to just the rebased commits — matching the behavior of the regular RebaseComplete handler which gets onto_head from stable_rebase_heads_from_worktree. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Only attempt rebase commit mapping for non-ancestor resets (e.g. Graphite restacking), not for backward resets like git reset --soft HEAD~1 where new_head is an ancestor of old_head. This matches the wrapper's post_reset_hook which checks is_ancestor before calling apply_wrapper_plumbing_rewrite_if_possible. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ransfer

Replace the expensive char-level byte diffing (via AttributionTracker::update_attributions) and VirtualAttributions wrapper with a lightweight line-level diff approach using the existing imara-diff Myers algorithm. This eliminates both the VA construction overhead and the O(n*m) char-level transform, replacing them with O(n+m) line-level positional transfer.

Key changes:
- Add diff_based_line_attribution_transfer() using capture_diff_slices for positional mapping
- Extract compute_line_attrs_for_changed_file() to deduplicate fast/slow path logic
- Remove VirtualAttributions wrapper construction from rebase v2 path
- Remove dead code: transform_changed_files_to_final_state, content_has_intersection_with_author_map
- Fix metrics leak bug: subtract prompt_line_metrics before deleted-file early return
- Fix clippy warnings in benchmark tests

Benchmarks show ~4x improvement for per-commit transform and ~3.4x for full pipeline.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
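The idea of positional transfer can be sketched as follows: lines that the diff reports as unchanged keep the attribution of their old counterpart, and every other new line gets a fallback value. This is not the PR's `diff_based_line_attribution_transfer()`. To stay self-contained it swaps in a naive LCS table (quadratic time) where the real code uses imara-diff's Myers implementation, and the names `lcs_pairs`, `transfer_line_attrs`, and `fallback` are hypothetical.

```rust
/// Stand-in for the real diff: returns (old_index, new_index) pairs of lines
/// that are unchanged, using a plain LCS table. Assumption: the PR uses
/// imara-diff here; this quadratic version is only for illustration.
fn lcs_pairs(old: &[&str], new: &[&str]) -> Vec<(usize, usize)> {
    let (n, m) = (old.len(), new.len());
    // table[i][j] = LCS length of old[i..] and new[j..]
    let mut table = vec![vec![0usize; m + 1]; n + 1];
    for i in (0..n).rev() {
        for j in (0..m).rev() {
            table[i][j] = if old[i] == new[j] {
                table[i + 1][j + 1] + 1
            } else {
                table[i + 1][j].max(table[i][j + 1])
            };
        }
    }
    // Walk the table from the top-left to recover one optimal alignment.
    let (mut i, mut j) = (0, 0);
    let mut pairs = Vec::new();
    while i < n && j < m {
        if old[i] == new[j] {
            pairs.push((i, j));
            i += 1;
            j += 1;
        } else if table[i + 1][j] >= table[i][j + 1] {
            i += 1; // old line was deleted
        } else {
            j += 1; // new line was inserted
        }
    }
    pairs
}

/// Carry per-line attributions from the old content to the new content.
/// Unmatched (inserted or modified) lines receive `fallback`.
fn transfer_line_attrs(
    old: &[&str],
    old_attrs: &[Option<String>],
    new: &[&str],
    fallback: Option<&str>,
) -> Vec<Option<String>> {
    assert_eq!(old.len(), old_attrs.len(), "one attribution per old line");
    let mut out = vec![fallback.map(str::to_owned); new.len()];
    for (i, j) in lcs_pairs(old, new) {
        out[j] = old_attrs[i].clone();
    }
    out
}
```

Once the matched pairs are known, the transfer itself is a single pass over them, which is where the line-level approach saves work compared with a char-level transform.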
…dated handler

Address Devin review feedback:

1. Remove first_parent_oid onto_head hint from both Reset and RefUpdated handlers. For multi-commit rebases (D--B'--C'), first_parent_oid(C') returns B' (an intermediate commit), not D (the actual onto target). This causes build_rebase_commit_mappings to truncate the new commits list, creating mismatched commit mappings and losing authorship data.
2. Add missing is_ancestor_commit guard to RefUpdated handler. The Reset handler already had this guard to skip backward ref updates, but RefUpdated did not. Without it, backward ref moves (e.g. git update-ref to an older commit) would emit spurious rebase_complete events with incorrect commit mappings.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…eritance Address Devin review feedback: the old transform_changed_files_to_final_state preserved attributions and file contents when a file was deleted mid-rebase, so that if the file reappeared in a later commit, positional diff-based transfer could inherit from the pre-deletion state. Both fast and slow paths now keep current_attributions and current_file_contents intact on deletion. The slow path re-adds the subtracted metrics to maintain balance (the file is excluded from serialized notes via existing_files check). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…emplate

The fast path metadata template was pre-built once with initial prompt metrics (accepted_lines, overridden_lines), so all rebased commits shared identical metrics regardless of attribution changes. Now both fast and slow paths track prompt_line_metrics per commit and serialize fresh metadata.

Also adds two regression tests:
- test_rebase_prompt_metrics_update_per_commit: verifies accepted_lines differs between commits when AI lines increase
- test_rebase_file_delete_recreate_preserves_attribution: verifies attributions survive a delete-recreate cycle within a rebase sequence

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove the use_line_lookup_fast_path branching that maintained two parallel transform and serialization paths. Both paths already used identical diff-based attribution transfer logic; the only difference was serialization strategy (cached string fragments vs structured AuthorshipLog). The unified path uses the structured AuthorshipLog approach for all cases, eliminating a class of divergence bugs (frozen metrics, missing state updates, deletion handling inconsistencies) that required separate fixes for each path.

Removed:
- use_line_lookup_fast_path flag and all branching on it
- cached_file_attestation_text per-file string cache
- serialize_file_attestation (fast-path string fragment serializer)
- serialize_attestation_from_line_attrs (fast-path line attrs serializer)
- Pre-caching of attestation text from initial state

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Split the monolithic `collect_changed_file_contents_for_commits` into three phases:

1. `run_diff_tree_for_commits`: fast metadata-only phase using `git diff-tree --stdin` to discover changed files and blob OIDs
2. `batch_read_blob_contents_parallel`: reads blob contents using up to 4 concurrent `git cat-file --batch` processes (chunks of 200 OIDs each), using the established smol async pattern
3. `assemble_changed_contents`: pure data transformation

Also adds a large-scale benchmark test (`benchmark_large_scale_mixed`) with 200 files (mixed 1k/5k lines), 100+ commits, and structured timing output showing git time vs overhead percentage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
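The chunk-and-fan-out shape of phase 2 can be sketched roughly as below. This is not the PR's `batch_read_blob_contents_parallel`: it uses scoped OS threads from the standard library instead of the smol async pattern, and the `read_chunk` callback is a hypothetical stand-in for one `git cat-file --batch` invocation. Only the chunk size (200) and the concurrency cap (4) come from the commit message.

```rust
use std::collections::HashMap;
use std::thread;

const CHUNK_SIZE: usize = 200; // OIDs per batch, as stated in the commit message
const MAX_WORKERS: usize = 4; // concurrency cap, as stated in the commit message

/// Split `oids` into chunks and read them with at most MAX_WORKERS workers.
/// `read_chunk` is a hypothetical callback standing in for one
/// `git cat-file --batch` process; it returns (oid, content) pairs.
fn read_blobs_parallel<F>(oids: &[String], read_chunk: F) -> HashMap<String, Vec<u8>>
where
    F: Fn(&[String]) -> Vec<(String, Vec<u8>)> + Sync,
{
    let chunks: Vec<&[String]> = oids.chunks(CHUNK_SIZE).collect();
    // At least one worker so the stride below is never zero.
    let workers = MAX_WORKERS.min(chunks.len()).max(1);
    let mut out = HashMap::new();

    thread::scope(|s| {
        let handles: Vec<_> = (0..workers)
            .map(|w| {
                let chunks = &chunks;
                let read_chunk = &read_chunk;
                // Worker `w` handles chunks w, w + workers, w + 2 * workers, ...
                s.spawn(move || {
                    let mut local = Vec::new();
                    for chunk in chunks.iter().skip(w).step_by(workers) {
                        local.extend(read_chunk(chunk));
                    }
                    local
                })
            })
            .collect();
        for handle in handles {
            out.extend(handle.join().expect("blob reader worker panicked"));
        }
    });
    out
}
```

Static striding keeps the sketch short; whether the real implementation assigns chunks statically or pulls them from a queue is not stated in the PR.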
…ently exec_git_stdin wrote all stdin data before reading stdout. When the child process produces enough output to fill the OS pipe buffer (~64KB), it blocks on write. But the parent is still blocked writing to stdin, causing a deadlock. This manifests with large git cat-file --batch and git diff-tree --stdin calls (e.g. 65+ files × 50+ commits). Fix by spawning stdin writes in a separate thread so wait_with_output() can drain stdout concurrently. Broken pipe errors are tolerated since the child may exit before consuming all input. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
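The fix described above follows a common pattern with `std::process`, sketched here under assumptions: the function name `run_with_stdin` and its signature are hypothetical and are not the project's `exec_git_stdin`.

```rust
use std::io::{self, Write};
use std::process::{Command, Output, Stdio};
use std::thread;

/// Run `program`, feed it `input` on stdin, and collect stdout/stderr.
/// The write happens on a separate thread so the parent can drain the
/// child's output at the same time, which avoids the pipe-buffer deadlock.
fn run_with_stdin(program: &str, args: &[&str], input: Vec<u8>) -> io::Result<Output> {
    let mut child = Command::new(program)
        .args(args)
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    let mut stdin = child.stdin.take().expect("stdin was configured as piped");
    let writer = thread::spawn(move || match stdin.write_all(&input) {
        // The child may exit before reading everything; that is not an error here.
        Err(e) if e.kind() != io::ErrorKind::BrokenPipe => Err(e),
        _ => Ok(()),
        // `stdin` is dropped when the closure returns, which sends EOF to the child.
    });

    // Drains stdout and stderr concurrently with the writer thread.
    let output = child.wait_with_output()?;
    writer.join().expect("stdin writer thread panicked")?;
    Ok(output)
}
```

For example, `run_with_stdin("git", &["cat-file", "--batch"], oid_list)` would no longer stall once the output exceeds the pipe buffer, assuming `oid_list` holds newline-separated object IDs.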
With the -z flag, git diff-tree --stdin separates the commit SHA header with a null byte (\0), not a newline (\n). The parser was looking for newlines, finding none, and silently treating every commit as having zero changed files. This meant the diff-tree optimization path never actually detected file changes during rebase authorship rewrite. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
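A parser for NUL-separated output might look like the sketch below. It assumes the raw `-z` layout in which each commit header is a bare hex OID token, each change record is a `:`-prefixed metadata token followed by one path token, and rename or copy records carry two path tokens. The function names are hypothetical and this is not the project's parser.

```rust
/// True when the token looks like a full commit OID (SHA-1 or SHA-256 length).
fn is_commit_header(token: &str) -> bool {
    matches!(token.len(), 40 | 64) && token.bytes().all(|b| b.is_ascii_hexdigit())
}

/// Parse `git diff-tree --stdin -z` style output into (commit, changed paths).
/// Assumption: tokens are separated by NUL bytes, never by newlines.
fn parse_diff_tree_z(raw: &[u8]) -> Vec<(String, Vec<String>)> {
    let mut out: Vec<(String, Vec<String>)> = Vec::new();
    let mut tokens = raw
        .split(|b| *b == 0)
        .map(|t| String::from_utf8_lossy(t).into_owned());

    while let Some(token) = tokens.next() {
        if token.starts_with(':') {
            // Metadata token, e.g. ":100644 100644 <old> <new> M"; paths follow.
            let status = token.rsplit(' ').next().unwrap_or("");
            let has_two_paths = status.starts_with('R') || status.starts_with('C');
            let first = tokens.next();
            // For renames and copies, record the destination path.
            let path = if has_two_paths { tokens.next() } else { first };
            if let (Some(path), Some((_, files))) = (path, out.last_mut()) {
                files.push(path);
            }
        } else if is_commit_header(&token) {
            out.push((token, Vec::new()));
        }
        // Anything else (such as the empty token after the final NUL) is skipped.
    }
    out
}
```

Because path tokens are consumed immediately after their metadata token, a file whose name happens to look like an OID is not mistaken for a commit header.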
Instead of calling serialize_to_string() per commit (which rebuilds the entire JSON), pre-cache each file's attestation text at initialization and only re-serialize changed files via serialize_attestation_from_line_attrs. Note assembly becomes pure string concatenation of cached fragments + a pre-split metadata template with commit SHA substitution. This restores the fast serialization optimization that was removed in the "unify fast/slow rebase paths" refactoring, while keeping the more correct diff-based attribution transfer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Remove unused upsert_file_attestation (replaced by fast cached serialization), remove dead loop_attestation_ms timing variable, and collapse nested if-let chains per clippy suggestions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Addresses review feedback from #813 and adds significant performance, correctness, and simplification improvements to the rebase authorship rewriting pipeline.
Performance: diff-based line attribution transfer
- Replace char-level byte diffing (via `AttributionTracker::update_attributions`) and `VirtualAttributions` wrapper with lightweight line-level diffing using the existing imara-diff Myers algorithm

Performance: fast cached serialization
- … `serialize_to_string()` per commit

Fix: prevent pipe deadlock in exec_git_stdin

Fix: diff-tree --stdin parsing for -z null-terminated output
- With the `-z` flag, commit SHA headers in `git diff-tree --stdin` output are null-terminated, not newline-terminated

Refactor: unified rebase path
- Remove the `use_line_lookup_fast_path` branching that maintained two parallel transform and serialization paths

Fix: per-commit prompt metrics
- Prompt metrics (`accepted_lines`, `overridden_lines`) now update per commit instead of being frozen from initial state

Fix: file deletion/reappearance inheritance

Fix: daemon event handler guards
- Add `is_valid_oid`/`is_zero_oid` guards to RefUpdated handler (matching Reset handler)
- Add `is_ancestor_commit` guard to RefUpdated handler, preventing spurious rebase events on backward ref updates
- Remove `first_parent_oid` onto_head hint: incorrect for multi-commit rebases
- Swap reflog fallback order in `parse_update_ref_heads` (target ref before HEAD)

Other
- … `update-ref` and rebase-like reset detection in daemon mode for Graphite compatibility
- Remove unused code (`upsert_file_attestation`, unused timing variables)

Test plan
🤖 Generated with Claude Code