Commit 455611c

Defer state/block pruning until after block cascade completes (#240)
## Motivation

During the devnet4 run (2026-03-13), all three ethlambda nodes entered an **infinite re-processing loop** at slot ~15276, generating ~3.5GB of logs each and consuming 100% CPU for hours. This PR fixes the root cause by deferring heavy state/block pruning until after a block processing cascade completes, so parent states survive long enough for their children to be processed.

## Root Cause

The infinite loop is caused by **fallback pruning running inside the block processing cascade**, deleting states that pending children still need.

### The three interacting mechanisms

**1. Asymmetric retention creates a state-header gap**

When finalization stalls, fallback pruning keeps only `STATES_TO_KEEP=900` states but `BLOCKS_TO_KEEP=21600` headers. Block headers exist in the DB without their states.

**2. Chain walk reaches protected checkpoints**

When a block arrives with a missing parent, `process_or_pend_block` walks ancestor headers looking for one whose parent has state. Protected checkpoints (justified/finalized) always have state, so the walk can reach blocks thousands of slots behind head.

**3. Mid-cascade pruning deletes just-computed states**

`on_block_core` calls `update_checkpoints` after every block, which runs `prune_old_states`. States for old slots (far behind head) are immediately deleted — even if they were just computed milliseconds ago by the same cascade.

### The loop

```
┌─────────────────────────────────────────────────────────────────┐
▼                                                                 │
1. Chain walk finds block 15266 (parent=4dda, justified)          │
   → parent state exists (protected) → enqueue for processing     │
                                                                  │
2. Cascade processes 15266 → 15269 → ... → 15276                  │
   → states computed and stored                                   │
                                                                  │
3. Each on_block_core calls update_checkpoints                    │
   → fallback pruning runs → states for slots 15266-15276         │
     are IMMEDIATELY deleted (slot < head - 900)                  │
                                                                  │
4. collect_pending_children(15276) finds block 15278              │
   → process_or_pend_block(15278)                                 │
   → has_state(parent=15276) → FALSE (just pruned!)               │
   → stores as pending                                            │
                                                                  │
5. Chain walk for 15278 re-discovers 15266                        │
   → parent 4dda still has state (protected)                      │
   → enqueue 15266 ───────────────────────────────────────────────┘
```

### How it was triggered in devnet4

1. 9 validators, 7 clients. Finalization stalled at slot 15261 due to a fork at slot 15264 (qlean diverged).
2. At ~10:13:40 UTC, qlean's alternate fork blocks arrived at ethlambda via gossip.
3. The chain walk for these blocks traversed ~2000 slots back to the justified checkpoint.
4. The cascade re-processed blocks 15266→15276, but fallback pruning deleted each state immediately.
5. All three ethlambda nodes (validators 6, 7, 8) entered the loop simultaneously.

## Solution

**Defer heavy pruning (states + blocks) until after the block cascade completes.**

### Before (pruning runs per-block, mid-cascade)

```
on_block
└─ while queue:
   └─ process_or_pend_block
      └─ on_block_core
         └─ update_checkpoints
            ├─ write metadata          ← immediate
            ├─ prune_live_chain        ← immediate
            ├─ prune_gossip_signatures ← immediate
            ├─ prune_old_states        ← DELETES PARENT STATES MID-CASCADE
            └─ prune_old_blocks        ← DELETES BLOCK DATA MID-CASCADE
```

### After (pruning deferred to end of cascade)

```
on_block
├─ while queue:
│  └─ process_or_pend_block
│     └─ on_block_core
│        └─ update_checkpoints
│           ├─ write metadata          ← immediate
│           ├─ prune_live_chain        ← immediate (fork choice correctness)
│           ├─ prune_gossip_signatures ← immediate (cheap)
│           └─ (no state/block pruning)
└─ store.prune_old_data()              ← runs ONCE after cascade
```

### Split of `update_checkpoints`

| Operation | Where it runs | Why |
|-----------|--------------|-----|
| Write head/justified/finalized metadata | `update_checkpoints` (per-block) | Checkpoints must be current for fork choice |
| `prune_live_chain` | `update_checkpoints` (per-block) | Affects fork choice traversal |
| `prune_gossip_signatures` | `update_checkpoints` (per-block) | Cheap, correctness-related |
| `prune_attestation_data_by_root` | `update_checkpoints` (per-block) | Cheap, correctness-related |
| `prune_old_states` | **`prune_old_data`** (after cascade) | Heavy, causes infinite loop if mid-cascade |
| `prune_old_blocks` | **`prune_old_data`** (after cascade) | Heavy, coupled with state pruning |

### Why this fixes the loop

With deferred pruning, the devnet4 scenario plays out safely:

1. Cascade processes 15266 → 15269 → ... → 15276 → **states are KEPT** (no pruning mid-cascade)
2. `collect_pending_children(15276)` finds 15278 → `has_state(parent=15276)` → **TRUE** (state still exists)
3. 15278 processes successfully, cascade continues through children
4. Queue empties, `while` loop ends
5. `prune_old_data()` runs once — deletes old states
6. Cascade is already done — no one re-triggers it

### Cross-client validation

We surveyed how other lean consensus clients handle this (Lighthouse, Zeam, Ream, Qlean, Lantern, Grandine). **None of them prune states mid-cascade.** Common patterns:

- **Zeam**: Canonicality-based pruning, only after finalization or after long stalls (14,400 slots). Never during block processing.
- **Ream**: Prunes one state per tick (not during block import).
- **Grandine**: Never prunes states (in-memory forever).
- **Lighthouse**: Background migrator thread, decoupled from block import.

## Changes

- **`crates/storage/src/store.rs`**: Split `update_checkpoints` — extract `prune_old_states`/`prune_old_blocks` into a new `prune_old_data()` method. Lightweight pruning (live chain, signatures, attestation data) stays in `update_checkpoints`.
- **`crates/blockchain/src/lib.rs`**: Call `store.prune_old_data()` once after the `on_block` while loop completes.
- **Tests**: Updated `fallback_pruning_*` tests to call `prune_old_data()` explicitly.

## How to Test

1. `make test` — all 125 tests pass, including 27 fork choice spec tests.
2. Deploy to a devnet with a multi-client setup where finalization stalls and alternate fork blocks arrive.
3. Verify ethlambda nodes do not enter re-processing loops (no repeated "Block imported successfully" for the same slot in the logs).
4. Monitor memory during long finalization stalls — temporary state accumulation during cascades is bounded by cascade size.
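The failure mode and the fix can be reproduced with a toy model. The sketch below is illustrative only: slots stand in for blocks, a `HashSet` of slots stands in for stored states, and the chain walk is modeled as re-enqueueing the ancestor chain from the protected checkpoint. None of these are the real ethlambda types, and `KEEP` merely plays the role of `STATES_TO_KEEP`.

```rust
use std::collections::{HashSet, VecDeque};

// Toy retention window standing in for STATES_TO_KEEP (not the real value).
const KEEP: u64 = 3;

/// Process the ancestor chain (checkpoint, head] as a cascade. A slot can be
/// processed only if its parent slot's state is present. Returns the number of
/// loop iterations needed to drain the queue, or None if the iteration cap is
/// hit (i.e. the cascade never terminates).
fn cascade(checkpoint: u64, head: u64, prune_mid_cascade: bool) -> Option<u32> {
    // Only the protected checkpoint has state initially.
    let mut states: HashSet<u64> = HashSet::from([checkpoint]);
    // Chain walk: enqueue every descendant up to head.
    let mut queue: VecDeque<u64> = (checkpoint + 1..=head).collect();
    for iterations in 1..=10_000u32 {
        let Some(slot) = queue.pop_front() else {
            return Some(iterations); // queue drained: cascade finished
        };
        if states.contains(&(slot - 1)) {
            states.insert(slot); // state computed and stored
            if prune_mid_cascade {
                // Fallback pruning per block: keep only states within KEEP of
                // head, plus the protected checkpoint. This deletes the state
                // we just computed whenever the slot is far behind head.
                states.retain(|s| *s + KEEP > head || *s == checkpoint);
            }
        } else {
            // Parent state missing: the chain walk re-discovers the ancestor
            // chain from the protected checkpoint and enqueues it again.
            queue.extend(checkpoint + 1..=slot);
        }
    }
    None // cap hit: infinite re-processing loop
}
```

With `prune_mid_cascade = true` the queue never drains, because every state behind the retention window is deleted the moment it is computed; with pruning deferred, the same cascade finishes, after which a single `prune_old_data`-style pass is safe.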
1 parent 50527b2 commit 455611c

2 files changed: +34 −34

crates/blockchain/src/lib.rs

Lines changed: 5 additions & 0 deletions

```diff
@@ -299,6 +299,11 @@ impl BlockChainServer {
         while let Some(block) = queue.pop_front() {
             self.process_or_pend_block(block, &mut queue);
         }
+
+        // Prune old states and blocks AFTER the entire cascade completes.
+        // Running this mid-cascade would delete states that pending children
+        // still need, causing re-processing loops when fallback pruning is active.
+        self.store.prune_old_data();
     }
 
     /// Try to process a single block. If its parent state is missing, store it
```

crates/storage/src/store.rs

Lines changed: 29 additions & 34 deletions

```diff
@@ -470,53 +470,41 @@ impl Store {
         batch.put_batch(Table::Metadata, entries).expect("put");
         batch.commit().expect("commit");
 
-        // Prune after successful checkpoint update
+        // Lightweight pruning that should happen immediately on finalization advance:
+        // live chain index, signatures, and attestation data. These are cheap and
+        // affect fork choice correctness (live chain) or attestation processing.
+        // Heavy state/block pruning is deferred to prune_old_data().
         if let Some(finalized) = checkpoints.finalized
             && finalized.slot > old_finalized_slot
         {
             let pruned_chain = self.prune_live_chain(finalized.slot);
-
-            // Prune signatures and attestation data for finalized slots
             let pruned_sigs = self.prune_gossip_signatures(finalized.slot);
             let pruned_att_data = self.prune_attestation_data_by_root(finalized.slot);
-            // Prune old states before blocks: state pruning uses headers for slot lookup
-            let protected_roots = [finalized.root, self.latest_justified().root];
-            let pruned_states = self.prune_old_states(&protected_roots);
-            let pruned_blocks = self.prune_old_blocks(&protected_roots);
-
-            if pruned_chain > 0
-                || pruned_sigs > 0
-                || pruned_att_data > 0
-                || pruned_states > 0
-                || pruned_blocks > 0
-            {
+
+            if pruned_chain > 0 || pruned_sigs > 0 || pruned_att_data > 0 {
                 info!(
                     finalized_slot = finalized.slot,
-                    pruned_chain,
-                    pruned_sigs,
-                    pruned_att_data,
-                    pruned_states,
-                    pruned_blocks,
-                    "Pruned finalized data"
-                );
-            }
-        } else {
-            // Fallback pruning when finalization is stalled.
-            // When finalization doesn't advance, the normal pruning path above never
-            // triggers. Prune old states and blocks on every head update to keep
-            // storage bounded. The prune methods are no-ops when within retention limits.
-            let protected_roots = [self.latest_finalized().root, self.latest_justified().root];
-            let pruned_states = self.prune_old_states(&protected_roots);
-            let pruned_blocks = self.prune_old_blocks(&protected_roots);
-            if pruned_states > 0 || pruned_blocks > 0 {
-                info!(
-                    pruned_states,
-                    pruned_blocks, "Fallback pruning (finalization stalled)"
+                    pruned_chain, pruned_sigs, pruned_att_data, "Pruned finalized data"
                 );
             }
         }
     }
 
+    /// Prune old states and blocks to keep storage bounded.
+    ///
+    /// This is separated from `update_checkpoints` so callers can defer heavy
+    /// pruning until after a batch of blocks has been fully processed. Running
+    /// this mid-cascade would delete states that pending children still need,
+    /// causing infinite re-processing loops when fallback pruning is active.
+    pub fn prune_old_data(&mut self) {
+        let protected_roots = [self.latest_finalized().root, self.latest_justified().root];
+        let pruned_states = self.prune_old_states(&protected_roots);
+        let pruned_blocks = self.prune_old_blocks(&protected_roots);
+        if pruned_states > 0 || pruned_blocks > 0 {
+            info!(pruned_states, pruned_blocks, "Pruned old states and blocks");
+        }
+    }
+
     // ============ Blocks ============
 
     /// Get block data for fork choice: root -> (slot, parent_root).
@@ -1486,6 +1474,12 @@ mod tests {
         let head_root = root(total_states as u64 - 1);
         store.update_checkpoints(ForkCheckpoints::head_only(head_root));
 
+        // update_checkpoints no longer prunes states/blocks inline — the caller
+        // must invoke prune_old_data() separately (after a block cascade completes).
+        assert_eq!(count_entries(backend.as_ref(), Table::States), total_states);
+
+        store.prune_old_data();
+
         // 3005 headers total. Top 3000 by slot are kept in the retention window,
         // leaving 5 candidates. 2 are protected (finalized + justified),
         // so 3 are pruned → 3005 - 3 = 3002 states remaining.
@@ -1530,6 +1524,7 @@ mod tests {
         // Use the last inserted root as head
         let head_root = root(STATES_TO_KEEP as u64 - 1);
         store.update_checkpoints(ForkCheckpoints::head_only(head_root));
+        store.prune_old_data();
 
         // Nothing should be pruned (within retention window)
         assert_eq!(
```