Bonsai full sync hangs indefinitely at 2016 DoS attack blocks (~2.3M) on mainnet

Steps to Reproduce

1. Start Besu: besu --network=mainnet --sync-mode=FULL --data-storage-format=BONSAI --data-path=/path/to/data
2. Start a consensus client (e.g., Teku with checkpoint sync) pointed at Besu's engine API.
3. Wait for sync to reach block ~2,306,000.
Expected behavior: Sync continues through the September 2016 DoS attack blocks at reduced but non-zero throughput.
Actual behavior: The importBlock thread enters a livelock in Address.addressHash()'s Guava LoadingCache, contending with the parallel transaction processing worker threads. Sync throughput drops from ~6 Mg/s to 0.458 Mg/s, then to zero. The node stops importing blocks entirely and all RPC endpoints become unresponsive.
Frequency: 100% deterministic on every full sync attempt with Bonsai storage.
Performance Degradation Timeline
Full sync was performed on a DigitalOcean droplet (4 vCPU, 8 GB RAM, volume storage) with Besu v26.3.0-SNAPSHOT (main branch at fe2cbfe812, which includes the fix for #9963) and Teku v26.4.0 using checkpoint sync. Default Bonsai configuration — no special flags.
| Time (UTC) | Block | Blocks/10min | Mg/s | Notes |
|---|---|---|---|---|
| 13:29 | 0 | — | — | Sync started |
| 13:40 | 80,000 | 70,000 | 4 | Frontier era, mostly empty |
| 15:00 | 520,000 | 38,000 | 5–6 | |
| 17:00 | 890,000 | 35,000 | 6–9 | |
| 19:20 | 1,260,000 | 22,000 | 5–7 | Post-Homestead |
| 22:00 | 1,660,000 | 17,000 | 6–7 | Pre-DAO activity |
| 01:10 | 1,920,000 | 13,000 | 5–8 | DAO fork |
| 05:50 | 2,220,000 | 11,000 | 5–6 | Approaching DoS zone |
| 06:43 | 2,280,000 | 4,600 | 4–6 | Entering DoS zone |
| 07:11 | 2,306,000 | — | 6.2 | Last normal speed |
| 07:15 | 2,306,200 | — | 0.458 | Throughput cliff: 13x drop |
| 07:15–08:05 | 2,306,351 | 0 | 0 | Completely stalled, 50+ min |
Total time genesis to stall: 17 hours 46 minutes.
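For readers unfamiliar with the Mg/s unit (million gas executed per second), the figures above can be cross-checked against the block-import rate. A minimal sketch of the conversion; the 150k average gas per block used in the example is a hypothetical round number for illustration, not a value measured during this sync:

```java
public class SyncThroughput {
    // Converts a block-import rate (blocks per 10 minutes) and an assumed
    // average gas per block into execution throughput in Mg/s.
    static double mgPerSecond(long blocksPerTenMinutes, long avgGasPerBlock) {
        double gasPerSecond = (blocksPerTenMinutes / 600.0) * avgGasPerBlock;
        return gasPerSecond / 1_000_000.0;
    }

    public static void main(String[] args) {
        // e.g. 22,000 blocks per 10 min at a hypothetical 150k gas/block
        System.out.printf("%.1f Mg/s%n", mgPerSecond(22_000, 150_000));
    }
}
```

With those inputs the conversion lands at 5.5 Mg/s, in the 5–7 range the table reports for that era.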
Root Cause
The attack contract
Block 2,306,351 (September 22, 2016) contains 13 transactions. Transaction 5 (0x4135170d...) calls contract 0xd6a64d7e8c8a94fa5068ca33229d88436a743b14 with 4,300,000 gas (92% of the block gas limit) and zero-length input data.
This is a known Shanghai DoS attack contract (bytecode contains the literal string fromshanghai):
- 8,832 bytes of bytecode with 110 EXTCODESIZE and 105 EXTCODECOPY operations targeting 50+ hardcoded addresses
- Pre-EIP-150 gas costs: EXTCODESIZE = 20 gas, EXTCODECOPY = 20 gas base
- With 4.3M gas, the contract executes thousands of loop iterations, each touching 50+ addresses
- This produces tens of thousands of state lookups in a single transaction
The same contract appears in multiple blocks in this range (e.g., block 2,316,320 with 990K gas), as well as similar attack contracts.
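The scale follows directly from the gas math. A quick upper-bound check (the simplifying assumption that all gas is spent on EXTCODE* operations is mine, for illustration; other opcodes also consume gas, so real counts are lower):

```java
public class AttackGasMath {
    // Pre-EIP-150, both EXTCODESIZE and EXTCODECOPY (base) cost 20 gas.
    static final long PRE_EIP150_EXTCODE_GAS = 20;

    // Upper bound on EXTCODESIZE/EXTCODECOPY executions, i.e. external-code
    // state lookups, that a given transaction gas budget can fund.
    static long maxExtcodeOps(long txGas) {
        return txGas / PRE_EIP150_EXTCODE_GAS;
    }

    public static void main(String[] args) {
        // 4.3M gas at 20 gas per op: up to 215,000 lookups in a single tx
        System.out.println(maxExtcodeOps(4_300_000L) + " lookups max");
    }
}
```

So a single 4.3M-gas transaction can fund up to 215,000 cheap external-code lookups, consistent with the "tens of thousands of state lookups" observed.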
Why Bonsai stalls
Pre-Byzantium blocks (before block 4,370,000 on mainnet) require intermediate state roots in transaction receipts. This means FrontierTransactionReceiptFactory.create() calls BonsaiWorldState.frontierRootHash() for every transaction, which triggers a full trie reconstruction via calculateRootHash() → applyUpdatesAndComputeRoot() → commitAccountTrieAndComputeRoot().
Each call to frontierRootHash():
1. Deep-copies the entire world state accumulator (accumulator.copy())
2. Clears storage for self-destructed accounts
3. Reconstructs every account's storage trie from flat storage
4. Reconstructs the entire account trie, walking all BranchNodes
5. Commits the trie (Keccak256 hashing of every node)
On DoS attack blocks with transactions that touch thousands of accounts, the trie grows large and each frontierRootHash() call becomes progressively more expensive. By the time the node reaches the attack transactions, a single frontierRootHash() call can take minutes of CPU time.
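The degradation within a block is quadratic, not linear. A toy cost model (not Besu code; the linear-in-accumulator-size assumption for each root hash call is mine) makes the shape clear:

```java
public class RootHashCostModel {
    // Toy model: each transaction adds `touchedPerTx` accounts to the
    // block's accumulator, and each frontierRootHash() call does work
    // proportional to the accumulator size at that point. Total work over
    // a block is therefore quadratic in the transaction count, which is
    // why the later calls in a heavy block take far longer than the first.
    static long totalWork(int txCount, int touchedPerTx) {
        long accumulated = 0;
        long total = 0;
        for (int tx = 1; tx <= txCount; tx++) {
            accumulated += touchedPerTx; // accumulator only grows within a block
            total += accumulated;        // one root-hash pass per transaction
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalWork(13, 1000));  // a 13-tx block
        System.out.println(totalWork(130, 1000)); // 10x the txs, ~100x the work
    }
}
```

Under this model a restart is cheap for the same reason the report describes later: the accumulator starts empty, so the first calls do almost no work.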
Two manifestations of the same bug
With default configuration (parallel processing enabled): The parallel transaction preprocessing threads and the sequential fallback import thread both call Address.addressHash() simultaneously, contending on the same Guava LoadingCache segment lock. This causes a livelock — zero progress indefinitely.
With parallel processing disabled (--bonsai-parallel-tx-processing-enabled=false and --bonsai-parallel-state-root-computation-enabled=false): No livelock, but individual DoS blocks take 30–60+ minutes of sustained CPU at 250%+. The node becomes completely unresponsive (HTTP and IPC RPC both time out). After a restart, the node resumes from the same block and advances a few thousand blocks before hitting the next heavy block and stalling again.
Critically, restarting the node unblocks it temporarily — the world state accumulator starts empty for the block, making the first frontierRootHash() calls cheap. But as more transactions are processed within the block, the accumulator grows and each subsequent call becomes more expensive.
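The contention mode can be pictured, in much simplified form, with plain JDK primitives. This sketch is not the actual Guava internals or Besu code; it only shows the shape of the problem: many threads funneling every cache miss through one lock, so a slow load function stalls all of them at once, as seen on Address.addressHash()'s LoadingCache segment:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.locks.ReentrantLock;

public class CacheContentionSketch {
    // One lock guarding the whole cache, standing in for a cache segment.
    private static final ReentrantLock segmentLock = new ReentrantLock();
    private static final Map<Integer, Integer> cache = new HashMap<>();

    static int lockedGetOrLoad(int key) {
        segmentLock.lock(); // every thread, hit or miss, serializes here
        try {
            return cache.computeIfAbsent(key, CacheContentionSketch::slowHash);
        } finally {
            segmentLock.unlock();
        }
    }

    static int slowHash(int k) { // stand-in for an expensive load function
        int h = k;
        for (int i = 0; i < 200_000; i++) h = 31 * h + i;
        return h;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(4);
        for (int t = 0; t < 4; t++) {
            pool.submit(() -> {
                // parallel workers and the import thread hammer the same keys
                for (int k = 0; k < 1_000; k++) lockedGetOrLoad(k);
                done.countDown();
            });
        }
        done.await();
        pool.shutdown();
        System.out.println("entries=" + cache.size());
    }
}
```

This sketch always terminates; the point is that under load every thread spends most of its time parked on the shared lock, which in the real system combines with the expensive frontierRootHash() work to produce the observed zero-progress state.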
Thread Dumps
1. Default config — livelock on Guava cache (block 2,306,351)
The import thread is stuck on Address.addressHash(), contending with parallel worker threads:
```
"EthScheduler-Services-1406 (importBlock)" cpu=8847177.42ms WAITING (parking)
  - parking to wait for <0x00000000889001c0> (a j.u.c.l.ReentrantLock$NonfairSync)
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2113)
  at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4946)
  at o.h.b.datatypes.Address.addressHash(Address.java:248)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState.get(BonsaiWorldState.java:412)
  [... StackedUpdater nesting ×3 ...]
  at o.h.b.evm.operation.ExtCodeSizeOperation.execute(ExtCodeSizeOperation.java:63)
  at o.h.b.ethereum.mainnet.parallelization.MainnetParallelBlockProcessor
      .lambda$getTransactionProcessingResult$1(MainnetParallelBlockProcessor.java:118)
  at java.util.Optional.orElseGet(Optional.java:364) ← sequential fallback path
```
2. No parallel — trie put (block ~2,320,700, 40+ min stuck)
```
"EthScheduler-Services-24 (importBlock)" cpu=4286272.55ms RUNNABLE
  at com.google.common.cache.LocalCache$Segment.recordRead(LocalCache.java:2538)
  at o.h.b.ethereum.trie.pathbased.bonsai.cache.BonsaiCachedMerkleTrieLoader
      .getAccountStateTrieNode(BonsaiCachedMerkleTrieLoader.java:143)
  at o.h.b.ethereum.trie.StoredNode.load(StoredNode.java:133)
  at o.h.b.ethereum.trie.patricia.PutVisitor.visit(PutVisitor.java:74)
  at o.h.b.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:85) [×4 levels]
  at o.h.b.ethereum.trie.StoredMerkleTrie.put(StoredMerkleTrie.java:114)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState
      .updateTheAccounts(BonsaiWorldState.java:203)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState
      .commitAccountTrieAndComputeRoot(BonsaiWorldState.java:175)
  [... same frontierRootHash path ...]
```
3. No parallel — Keccak hashing (block ~2,315,700, 30+ min stuck)
```
"EthScheduler-Services-80 (importBlock)" cpu=640196.32ms RUNNABLE
  at org.bouncycastle.crypto.digests.KeccakDigest.<init>(Unknown Source)
  at o.h.b.crypto.Hash.keccak256(Hash.java:88)
  at o.h.b.ethereum.trie.patricia.BranchNode.getHash(BranchNode.java:163)
  at o.h.b.ethereum.trie.CommitVisitor.maybeStoreNode(CommitVisitor.java:77)
  at o.h.b.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:59)
  at o.h.b.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95) [×5 levels]
  at o.h.b.ethereum.trie.StoredMerkleTrie.commit(StoredMerkleTrie.java:149)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState
      .commitAccountTrieAndComputeRoot(BonsaiWorldState.java:178)
  [... same frontierRootHash path ...]
```
All three thread dumps converge on the same code path: frontierRootHash() → commitAccountTrieAndComputeRoot().
Minimal Reproducer (Unit Test)
The following test simulates a block with 500 transactions, calling frontierRootHash() after each one (as FrontierTransactionReceiptFactory does for pre-Byzantium blocks). It compares performance with parallel state root computation enabled vs disabled, using RocksDB-backed storage.
```java
class BonsaiFrontierRootHashPerformanceTest {

  private static final int NUM_ACCOUNTS = 500;
  private static final int SLOTS_PER_ACCOUNT = 50;
  private static final double MAX_PARALLEL_SLOWDOWN_FACTOR = 1.5;

  @TempDir Path tempDir;

  @Test
  void parallelTrieShouldNotAddExcessiveOverheadToFrontierRootHash() {
    final long sequentialNanos =
        measureFrontierRootHash(tempDir.resolve("sequential"), false);
    final long parallelNanos =
        measureFrontierRootHash(tempDir.resolve("parallel"), true);
    final double slowdownFactor = (double) parallelNanos / sequentialNanos;

    // Actual on NVMe SSD: sequential=2253ms, parallel=4281ms, factor=1.90x
    // On network-attached storage (cloud VPS): factor is significantly worse.
    assertThat(slowdownFactor)
        .as("Parallel trie should not make frontierRootHash() more than %.1fx slower "
                + "(sequential=%dms, parallel=%dms, factor=%.2f)",
            MAX_PARALLEL_SLOWDOWN_FACTOR,
            sequentialNanos / 1_000_000, parallelNanos / 1_000_000, slowdownFactor)
        .isLessThanOrEqualTo(MAX_PARALLEL_SLOWDOWN_FACTOR);
  }

  private long measureFrontierRootHash(final Path dbPath, final boolean parallelEnabled) {
    final Blockchain blockchain = mock(Blockchain.class);
    final PathBasedExtraStorageConfiguration extraConfig =
        ImmutablePathBasedExtraStorageConfiguration.builder()
            .parallelStateRootComputationEnabled(parallelEnabled)
            .parallelTxProcessingEnabled(false)
            .build();
    final StorageProvider storageProvider = createRocksDBStorageProvider(dbPath);
    final BonsaiWorldStateKeyValueStorage storage =
        (BonsaiWorldStateKeyValueStorage)
            storageProvider.createWorldStateStorage(
                DataStorageConfiguration.DEFAULT_BONSAI_CONFIG);
    final BonsaiWorldStateProvider archive =
        new BonsaiWorldStateProvider(storage, blockchain, extraConfig,
            new BonsaiCachedMerkleTrieLoader(new NoOpMetricsSystem()),
            null, EvmConfiguration.DEFAULT, throwingWorldStateHealerSupplier(),
            new CodeCache());
    final MutableWorldState worldState = archive.getWorldState();

    // Populate initial state: 500 accounts × 50 storage slots
    final WorldUpdater setup = worldState.updater();
    for (int i = 0; i < NUM_ACCOUNTS; i++) {
      final MutableAccount account = setup.createAccount(accountAddress(i));
      account.setBalance(Wei.of(1));
      for (int s = 0; s < SLOTS_PER_ACCOUNT; s++) {
        account.setStorageValue(UInt256.valueOf(s), UInt256.valueOf(s + 1));
      }
    }
    setup.commit();
    worldState.persist(null);

    // Warm up
    final WorldUpdater warmup = worldState.updater();
    warmup.getAccount(accountAddress(0)).setBalance(Wei.of(999));
    warmup.commit();
    worldState.frontierRootHash();

    // Measure 500 frontierRootHash() calls (simulating 500 txs in a block)
    final long start = System.nanoTime();
    Hash prev = null;
    for (int i = 0; i < NUM_ACCOUNTS; i++) {
      final WorldUpdater updater = worldState.updater();
      final MutableAccount account = updater.getAccount(accountAddress(i));
      account.setBalance(Wei.of(1000 + i));
      account.setStorageValue(UInt256.valueOf(0), UInt256.valueOf(i + 1));
      updater.commit();
      final Hash hash = worldState.frontierRootHash();
      assertThat(hash).isNotNull();
      if (prev != null) assertThat(hash).isNotEqualTo(prev);
      prev = hash;
    }
    final long elapsed = System.nanoTime() - start;
    try {
      storageProvider.close();
    } catch (Exception e) {
      // ignore
    }
    return elapsed;
  }

  // createRocksDBStorageProvider creates a RocksDB-backed StorageProvider
  // using KeyValueStorageProviderBuilder + RocksDBKeyValueStorageFactory
  // pointed at the given dataPath. (Full implementation in the test file.)

  private static Address accountAddress(final int index) {
    return Address.fromHexStringStrict(String.format("0x%040x", 0x1000 + index));
  }
}
```
The parallel trie adds 1.90x overhead even on fast local storage. On the DigitalOcean VPS with network-attached storage where the full sync was tested, the overhead is dramatically worse — enough to cause the complete stall described above.
Workaround
Disabling both parallel features makes the sync possible but extremely slow (~30–60 min per heavy block, requiring periodic restarts):
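Concretely, the workaround invocation combines the reproduction command with the two flags named in the "Two manifestations" section above:

```shell
besu --network=mainnet \
  --sync-mode=FULL \
  --data-storage-format=BONSAI \
  --bonsai-parallel-tx-processing-enabled=false \
  --bonsai-parallel-state-root-computation-enabled=false \
  --data-path=/path/to/data
```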
Impact: besu --sync-mode=FULL --data-storage-format=BONSAI cannot sync through the 2016 DoS attack blocks (~2.28M–2.46M) on ETH mainnet with default configuration. With parallel processing disabled, it can technically progress but requires manual restarts every ~5,000–8,000 blocks and individual blocks can take 30–60+ minutes, rendering the node unresponsive during that time. Any node operator attempting full sync from genesis on any pre-Byzantium network hits this wall.
Root cause scope: The frontierRootHash() per-transaction trie reconstruction is the fundamental issue. The parallel processing features exacerbate it (livelock vs. slow) but are not the root cause. FOREST storage does not have this problem because frontierRootHash() returns the already-computed root hash in O(1).
Verified On

- Commit fe2cbfe812 (default configuration, parallel processing enabled)
- Commit fe2cbfe812 (parallel processing disabled)

Both tested on Ubuntu 22.04, OpenJDK 21.0.10, DigitalOcean 4 vCPU / 8 GB RAM / network-attached volume storage.
Versions

- Besu v26.3.0-SNAPSHOT (main branch at fe2cbfe812, includes the fix for #9963, "Bonsai clearStorage() causes full sync to hang indefinitely at block 347,481")
- Teku v26.4.0

Additional Information

Related issue: #9963 (clearStorage() infinite loop at block 347,481) — same entry point (frontierRootHash()), different bottleneck. The fix for #9963 is included in the tested version.