
Optimize checkpoint performance 2-5x for realistic workloads #823

Open

svarlamov wants to merge 3 commits into main from perf/checkpoint-2x-speedup

Conversation

@svarlamov (Member) commented Mar 27, 2026

Summary

  • Eliminate redundant checkpoint reads: Cache checkpoints in ResolvedCheckpointExecution to avoid 3-4 redundant read_all_checkpoints() calls per operation
  • O(n+m) attribution gap filling: Rewrite attribute_unattributed_ranges using merged-interval sweep instead of per-character overlap checks (was O(n*m))
  • Incremental JSONL append: Skip full file rewrite when prior checkpoints are already pruned — just append the new checkpoint line
  • Eliminate redundant diff computation: Derive line stats from the attribution diff, removing a second compute_file_line_stats diff pass
  • Content-addressed blob dedup: Skip writing blobs that already exist on disk
  • Reduce async overhead: Increase sync threshold to 30 files to avoid Arc wrapping, semaphore, and smol::unblock overhead for typical workloads
  • Eliminate unnecessary clones: Move entries/checkpoints instead of cloning in the hot path
  • Optimize serialization: Use BufWriter for checkpoint JSONL writes
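
The merged-interval sweep mentioned above can be illustrated with a simplified, hypothetical sketch (the names `merge_ranges` and `unattributed_gaps` are stand-ins, not the PR's actual functions): instead of testing every character against every attributed range, the ranges are merged once and the gaps fall out of a single cursor sweep.

```rust
/// Merge possibly-overlapping half-open ranges in one pass after sorting.
fn merge_ranges(mut ranges: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    ranges.sort_unstable_by_key(|r| r.0);
    let mut merged: Vec<(usize, usize)> = Vec::new();
    for (start, end) in ranges {
        match merged.last_mut() {
            // Overlapping or adjacent: extend the previous merged range.
            Some(last) if start <= last.1 => last.1 = last.1.max(end),
            _ => merged.push((start, end)),
        }
    }
    merged
}

/// Emit the gaps in [0, len) not covered by `attributed` with one sweep,
/// O(n + m) overall, instead of per-character overlap checks (O(n * m)).
fn unattributed_gaps(attributed: Vec<(usize, usize)>, len: usize) -> Vec<(usize, usize)> {
    let mut gaps = Vec::new();
    let mut cursor = 0;
    for (start, end) in merge_ranges(attributed) {
        if start > cursor {
            gaps.push((cursor, start));
        }
        cursor = cursor.max(end);
    }
    if cursor < len {
        gaps.push((cursor, len));
    }
    gaps
}

fn main() {
    // Ranges [2,5) and [4,8) overlap; [10,12) stands alone; file length 15.
    let gaps = unattributed_gaps(vec![(4, 8), (2, 5), (10, 12)], 15);
    assert_eq!(gaps, vec![(0, 2), (8, 10), (12, 15)]);
    println!("{gaps:?}");
}
```
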

Benchmark results

Realistic A/B comparison (partial edits, accumulated checkpoints):

| Scenario | Baseline | Optimized | Speedup |
| --- | --- | --- | --- |
| 500 lines, 20% edit, 20 CPs | 63ms | 49ms | 1.28x |
| 1000 lines, 10% edit, 20 CPs | 100ms | 52ms | 1.91x |
| 1000 lines, 30% edit, 20 CPs | 127ms | 55ms | 2.30x |
| 2000 lines, 10% edit, 0 CPs | 107ms | 52ms | 2.07x |
| 2000 lines, 10% edit, 20 CPs | 259ms | 58ms | 4.49x |
| 3000 lines, 5% edit, 20 CPs | 343ms | 61ms | 5.61x |

The improvement scales with file size and accumulated checkpoint count — the most impactful scenarios for agent workflows.
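
The incremental JSONL append and BufWriter changes from the summary can be sketched roughly as follows. This is an illustrative stand-in, not the PR's code: `append_checkpoint_line` and the string payload substitute for the real checkpoint type and serializer.

```rust
use std::fs::OpenOptions;
use std::io::{BufWriter, Write};
use std::path::Path;

/// Fast path: when no earlier checkpoint needed pruning, append one
/// serialized JSONL line instead of rewriting the whole checkpoints file.
fn append_checkpoint_line(path: &Path, json_line: &str) -> std::io::Result<()> {
    let file = OpenOptions::new().create(true).append(true).open(path)?;
    // BufWriter coalesces the line and trailing newline into one flush.
    let mut w = BufWriter::new(file);
    w.write_all(json_line.as_bytes())?;
    w.write_all(b"\n")?;
    w.flush()
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("checkpoints_demo.jsonl");
    let _ = std::fs::remove_file(&path);
    append_checkpoint_line(&path, r#"{"id":1}"#)?;
    append_checkpoint_line(&path, r#"{"id":2}"#)?;
    // Each call appended exactly one line; no rewrite of earlier lines.
    assert_eq!(std::fs::read_to_string(&path)?.lines().count(), 2);
    Ok(())
}
```
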

Test plan

  • All 3032 integration tests pass
  • All 116 checkpoint-specific integration tests pass
  • All 243 blame integration tests pass
  • All 24 attribution unit tests pass
  • A/B benchmark proves 2x+ speedup for realistic workloads (1000+ line files with accumulated checkpoints)
  • CI checks pass on ubuntu

🤖 Generated with Claude Code


@devin-ai-integration bot (Contributor) left a comment

Devin Review found 2 potential issues.

View 5 additional findings in Devin Review.

Comment on lines +405 to +426
```rust
let len = checkpoints.len();
if len >= 2 {
    // Collect new file names into owned strings to avoid borrow conflict
    let new_files: HashSet<String> = checkpoints[len - 1]
        .entries
        .iter()
        .map(|e| e.file.clone())
        .collect();
    let prev = &mut checkpoints[len - 2];
    let mut any_pruned = false;
    for entry in &mut prev.entries {
        if new_files.contains(&entry.file) && !entry.attributions.is_empty() {
            entry.attributions.clear();
            any_pruned = true;
        }
    }
    if !any_pruned {
        // Fast path: just append the new checkpoint to the file
        return self.append_single_checkpoint(checkpoints.last().unwrap());
    }
    // Pruning changed the second-to-last checkpoint, fall through to full rewrite
}
```

🟡 append_checkpoint_with_existing only prunes second-to-last checkpoint, missing stale attributions in earlier ones

The fast-path optimization in append_checkpoint_with_existing only checks checkpoints[len-2] (the second-to-last checkpoint) for attributions that need clearing. If the newly appended checkpoint touches a file whose most recent entry is in an earlier checkpoint (not the second-to-last), that earlier checkpoint's attributions are never pruned and the fast append path is incorrectly taken.

Reproduction scenario
  1. C0 checkpoints file A → disk: [C0(A with attrs)]
  2. C1 checkpoints file B → fast path appends, disk: [C0(A with attrs), C1(B with attrs)]
  3. C2 checkpoints file A → checks C1 (has B, not A), any_pruned=false → fast path appends, disk: [C0(A WITH stale attrs), C1(B with attrs), C2(A with attrs)]

C0's attributions for file A should have been cleared since C2 is now the newest for file A. The old append_checkpoint (src/git/repo_storage.rs:371-378) correctly handled this by calling prune_old_char_attributions on ALL checkpoints before every write.

Impact is limited to storage bloat (stale char-level attributions persist until a full rewrite happens), since build_previous_file_state_maps at src/commands/checkpoint.rs:1452 only uses the latest entry per file for computation.

Prompt for agents
In src/git/repo_storage.rs, the append_checkpoint_with_existing method at line 388 only checks the second-to-last checkpoint (checkpoints[len-2]) for pruning needs. It should instead check ALL earlier checkpoints for entries matching the new checkpoint's files. One approach: after stripping the transcript (line 392-398), iterate backwards from checkpoints[len-2] to checkpoints[0], checking if any entry's file is in new_files and has non-empty attributions. If any such entry is found, fall through to the full prune_old_char_attributions + write_all_checkpoints path. Only take the fast append path if NO earlier checkpoint has attributions that need clearing for the new checkpoint's files.
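
A minimal sketch of the suggested fix, with simplified stand-in types (the real ones live in src/git/repo_storage.rs): before taking the fast append path, scan all earlier checkpoints, not just the second-to-last, for entries that still carry attributions for a file the new checkpoint touches.

```rust
use std::collections::HashSet;

struct Entry {
    file: String,
    attributions: Vec<u32>, // simplified stand-in for char-range attributions
}

struct Checkpoint {
    entries: Vec<Entry>,
}

/// Returns true only if NO earlier checkpoint holds attributions for any
/// file the newest (last) checkpoint touches — i.e. the fast append is safe.
fn fast_append_is_safe(checkpoints: &[Checkpoint]) -> bool {
    let Some((newest, earlier)) = checkpoints.split_last() else {
        return true;
    };
    let new_files: HashSet<&str> =
        newest.entries.iter().map(|e| e.file.as_str()).collect();
    // Scan every earlier checkpoint, not just checkpoints[len - 2].
    earlier.iter().all(|cp| {
        cp.entries
            .iter()
            .all(|e| !new_files.contains(e.file.as_str()) || e.attributions.is_empty())
    })
}

fn main() {
    let cp = |file: &str, attrs: Vec<u32>| Checkpoint {
        entries: vec![Entry { file: file.into(), attributions: attrs }],
    };
    // Reproduction scenario: C0 attributes A, C1 attributes B, C2 touches A.
    // C0 still needs pruning, so the fast path must be refused.
    let history = vec![cp("A", vec![1]), cp("B", vec![2]), cp("A", vec![3])];
    assert!(!fast_append_is_safe(&history));
}
```
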

Comment on lines +366 to +414
```rust
match &op {
    DiffOp::Delete {
        old_index, old_len, ..
    } => {
        computation.line_stats.deletions += *old_len as u32;
        for i in *old_index..(*old_index + *old_len) {
            if let Some(line) = old_lines.get(i)
                && !line.text.trim().is_empty()
            {
                computation.line_stats.deletions_sloc += 1;
            }
        }
    }
    DiffOp::Insert {
        new_index, new_len, ..
    } => {
        computation.line_stats.additions += *new_len as u32;
        for i in *new_index..(*new_index + *new_len) {
            if let Some(line) = new_lines.get(i)
                && !line.text.trim().is_empty()
            {
                computation.line_stats.additions_sloc += 1;
            }
        }
    }
    DiffOp::Replace {
        old_index,
        old_len,
        new_index,
        new_len,
    } => {
        computation.line_stats.deletions += *old_len as u32;
        computation.line_stats.additions += *new_len as u32;
        for i in *old_index..(*old_index + *old_len) {
            if let Some(line) = old_lines.get(i)
                && !line.text.trim().is_empty()
            {
                computation.line_stats.deletions_sloc += 1;
            }
        }
        for i in *new_index..(*new_index + *new_len) {
            if let Some(line) = new_lines.get(i)
                && !line.text.trim().is_empty()
            {
                computation.line_stats.additions_sloc += 1;
            }
        }
    }
    DiffOp::Equal { .. } => unreachable!(),
```

🟡 DiffLineStats uses different diff post-processing than the replaced compute_file_line_stats, changing reported metrics

The new DiffLineStats accumulated in compute_diffs counts lines from capture_diff_slices (src/authorship/imara_diff_utils.rs:132), which does NOT call diff.postprocess_lines(). The old compute_file_line_stats (src/commands/checkpoint.rs:2012) used compute_line_changes (src/authorship/imara_diff_utils.rs:178), which DOES call diff.postprocess_lines(&input) at line 185.

The postprocess_lines method adjusts hunk boundaries to produce more natural diffs matching git's output, which can change which lines are counted as added/deleted vs equal. As a result, the additions, deletions, additions_sloc, and deletions_sloc values persisted in checkpoint.line_stats and emitted as checkpoint metrics can differ from the old behavior for the same file changes.


svarlamov and others added 3 commits March 27, 2026 16:56
Key optimizations:
- Eliminate redundant read_all_checkpoints calls (3-4 per operation → 1)
  by caching checkpoints in ResolvedCheckpointExecution
- Optimize attribute_unattributed_ranges from O(n*m) to O(n+m) using
  merged-interval sweep instead of per-character overlap checks
- Incremental JSONL append: skip full file rewrite when prior checkpoints
  are already pruned, just append the new checkpoint line
- Derive line stats from the attribution diff computation, eliminating a
  redundant second diff pass in compute_file_line_stats
- Content-addressed blob dedup: skip writing blobs that already exist
- Increase sync threshold to 30 files to avoid async task spawning
  overhead (Arc wrapping, semaphore, smol::unblock) for typical workloads
- Eliminate unnecessary clones of entries and checkpoints in the hot path
- Fast-path skip for hash migration when no 7-char hashes exist
- Use BufWriter for checkpoint serialization

Benchmark results (realistic partial-edit scenarios, A/B vs baseline):
  500 lines, 20% edit, 20 CPs:    63ms → 49ms  (1.28x)
  1000 lines, 10% edit, 20 CPs:  100ms → 52ms  (1.91x)
  1000 lines, 30% edit, 20 CPs:  127ms → 55ms  (2.30x)
  2000 lines, 10% edit, 20 CPs:  259ms → 58ms  (4.49x)
  3000 lines, 5% edit, 20 CPs:   343ms → 61ms  (5.61x)

All 3032 integration tests pass with no regressions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@svarlamov force-pushed the perf/checkpoint-2x-speedup branch from 63a5b26 to 9d7e120 on March 27, 2026 17:03