Bonsai full sync hangs indefinitely at 2016 DoS attack blocks (~2.3M) on mainnet #10155

@diega

Description

Bonsai full sync hangs indefinitely at 2016 DoS attack blocks (~2.3M) on mainnet

Steps to Reproduce

  1. Start Besu with default Bonsai storage in full sync mode:
    besu --network=mainnet --sync-mode=FULL --data-storage-format=BONSAI \
         --data-path=/path/to/data
  2. Start a consensus client (e.g., Teku with checkpoint sync) pointed at Besu's engine API.
  3. Wait for sync to reach block ~2,306,000.

Expected behavior: Sync continues through the September 2016 DoS attack blocks at reduced but non-zero throughput.

Actual behavior: The importBlock thread enters a livelock in Address.addressHash()'s Guava LoadingCache, contending with the parallel transaction processing worker threads. Sync throughput drops from ~6 Mg/s to 0.458 Mg/s, then to zero. The node stops importing blocks entirely and all RPC endpoints become unresponsive.

Frequency: 100% deterministic on every full sync attempt with Bonsai storage.

Performance Degradation Timeline

Full sync was performed on a DigitalOcean droplet (4 vCPU, 8 GB RAM, volume storage) with Besu v26.3.0-SNAPSHOT (main branch at fe2cbfe812, which includes the fix for #9963) and Teku v26.4.0 using checkpoint sync. Default Bonsai configuration — no special flags.

Time (UTC)   Block      Blocks/10min  Mg/s   Notes
13:29        0                               Sync started
13:40        80,000     70,000        4      Frontier era, mostly empty
15:00        520,000    38,000        5–6
17:00        890,000    35,000        6–9
19:20        1,260,000  22,000        5–7    Post-Homestead
22:00        1,660,000  17,000        6–7    Pre-DAO activity
01:10        1,920,000  13,000        5–8    DAO fork
05:50        2,220,000  11,000        5–6    Approaching DoS zone
06:43        2,280,000  4,600         4–6    Entering DoS zone
07:11        2,306,000                6.2    Last normal speed
07:15        2,306,200                0.458  Throughput cliff: 13x drop
07:15–08:05  2,306,351  0             0      Completely stalled, 50+ min

Total time genesis to stall: 17 hours 46 minutes.

Root Cause

The attack contract

Block 2,306,351 (September 22, 2016) contains 13 transactions. Transaction 5 (0x4135170d...) calls contract 0xd6a64d7e8c8a94fa5068ca33229d88436a743b14 with 4,300,000 gas (92% of the block gas limit) and zero-length input data.

This is a known Shanghai DoS attack contract (bytecode contains the literal string fromshanghai):

  • 8,832 bytes of bytecode with 110 EXTCODESIZE and 105 EXTCODECOPY operations targeting 50+ hardcoded addresses
  • Pre-EIP-150 gas costs: EXTCODESIZE = 20 gas, EXTCODECOPY = 20 gas base
  • With 4.3M gas, the contract executes thousands of loop iterations, each touching 50+ addresses
  • This produces tens of thousands of state lookups in a single transaction

The same contract appears in multiple blocks in this range (e.g., block 2,316,320 with 990K gas), as well as similar attack contracts.
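
As a back-of-envelope check on those figures, the pre-EIP-150 prices bound the number of external code lookups a single such call can pay for. This is an upper bound only; loop bookkeeping and per-word copy costs in the real bytecode reduce the realistic count, but it stays consistent with the description above:

// Rough upper bound on the EXTCODESIZE/EXTCODECOPY lookups a single 4.3M-gas
// call can afford at pre-EIP-150 prices. Overheads are ignored, so the real
// figure is lower, landing in the tens-of-thousands range described above.
public final class DosGasEstimate {
  public static void main(final String[] args) {
    final long txGas = 4_300_000L;   // gas supplied to the attack call (92% of the block limit)
    final long extCodeOpCost = 20L;  // pre-EIP-150 EXTCODESIZE / EXTCODECOPY base cost

    final long upperBoundLookups = txGas / extCodeOpCost; // = 215,000
    System.out.printf("upper bound: %,d external code lookups in one transaction%n",
        upperBoundLookups);
  }
}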

Why Bonsai stalls

Pre-Byzantium blocks (before block 4,370,000 on mainnet) require intermediate state roots in transaction receipts. This means FrontierTransactionReceiptFactory.create() calls BonsaiWorldState.frontierRootHash() for every transaction, which triggers a full trie reconstruction via calculateRootHash() → applyUpdatesAndComputeRoot() → commitAccountTrieAndComputeRoot().

Each call to frontierRootHash():

  1. Deep-copies the entire world state accumulator (accumulator.copy())
  2. Clears storage for self-destructed accounts
  3. Reconstructs every account's storage trie from flat storage
  4. Reconstructs the entire account trie, walking all BranchNodes
  5. Commits the trie (Keccak256 hashing of every node)

On DoS attack blocks with transactions that touch thousands of accounts, the trie grows large and each frontierRootHash() call becomes progressively more expensive. By the time the node reaches the attack transactions, a single frontierRootHash() call can take minutes of CPU time.
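
To make that path concrete, here is a minimal sketch of why pre-Byzantium receipts force one frontierRootHash() call per transaction. The interfaces and method names below are illustrative, not Besu's actual classes; the underlying fact is that before Byzantium (EIP-658), a receipt carries the intermediate state root rather than a status flag, so the block processor must ask the world state for a root after every transaction:

import java.util.ArrayList;
import java.util.List;

// Simplified sketch of the pre-Byzantium receipt flow. Only frontierRootHash()
// corresponds to a real method named in this issue; everything else is an
// illustrative stand-in for the block processor / receipt factory machinery.
final class PreByzantiumReceiptFlowSketch {

  interface WorldStateSketch {
    void applyTransaction(TransactionSketch tx); // mutate the world state accumulator
    byte[] frontierRootHash();                   // full trie rebuild + hash on Bonsai
  }

  interface TransactionSketch {}

  record ReceiptSketch(byte[] intermediateStateRoot, long cumulativeGasUsed) {}

  List<ReceiptSketch> processBlockTransactions(
      final WorldStateSketch worldState, final List<TransactionSketch> transactions) {
    final List<ReceiptSketch> receipts = new ArrayList<>();
    long cumulativeGas = 0;
    for (final TransactionSketch tx : transactions) {
      worldState.applyTransaction(tx);
      cumulativeGas += 21_000; // placeholder; real gas accounting omitted
      // The expensive part on Bonsai: this call deep-copies the accumulator and
      // rebuilds/hashes the trie (steps 1-5 above), once per transaction.
      receipts.add(new ReceiptSketch(worldState.frontierRootHash(), cumulativeGas));
    }
    return receipts;
  }
}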

Two manifestations of the same bug

With default configuration (parallel processing enabled): The parallel transaction preprocessing threads and the sequential fallback import thread both call Address.addressHash() simultaneously, contending on the same Guava LoadingCache segment lock. This causes a livelock — zero progress indefinitely.

With parallel processing disabled (--bonsai-parallel-tx-processing-enabled=false and --bonsai-parallel-state-root-computation-enabled=false): No livelock, but individual DoS blocks take 30–60+ minutes of sustained CPU at 250%+. The node becomes completely unresponsive (HTTP and IPC RPC both time out). After a restart, the node resumes from the same block and advances a few thousand blocks before hitting the next heavy block and stalling again.

Critically, restarting the node unblocks it temporarily — the world state accumulator starts empty for the block, making the first frontierRootHash() calls cheap. But as more transactions are processed within the block, the accumulator grows and each subsequent call becomes more expensive.
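
A toy cost model illustrates the effect (the constants are arbitrary illustration values, not measurements; only the shape of the growth matters): if every frontierRootHash() call must cover all accounts touched in the block so far, per-block work grows roughly quadratically with the transaction count rather than linearly.

// Toy model of per-block frontierRootHash() cost. Each call covers everything
// already sitting in the accumulator, so total work over a block grows roughly
// quadratically with the transaction count. Constants are illustration values.
public final class AccumulatorGrowthSketch {
  public static void main(final String[] args) {
    final int transactionsPerBlock = 13;      // block 2,306,351 contains 13 transactions
    final int accountsTouchedPerTx = 10_000;  // assumed figure for a DoS-style transaction

    long accountsInAccumulator = 0;
    long cumulativeWork = 0; // "work" = accounts the trie rebuild must walk
    for (int tx = 1; tx <= transactionsPerBlock; tx++) {
      accountsInAccumulator += accountsTouchedPerTx;
      cumulativeWork += accountsInAccumulator; // frontierRootHash() after this tx
    }

    final long linearBaseline = (long) transactionsPerBlock * accountsTouchedPerTx;
    System.out.printf("cumulative model: %,d account visits vs %,d if each call were local%n",
        cumulativeWork, linearBaseline);
  }
}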

Thread Dumps

1. Default config — livelock on Guava cache (block 2,306,351)

The import thread is stuck on Address.addressHash(), contending with parallel worker threads:

"EthScheduler-Services-1406 (importBlock)" cpu=8847177.42ms WAITING (parking)
  - parking to wait for <0x00000000889001c0> (a j.u.c.l.ReentrantLock$NonfairSync)
  at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2113)
  at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4946)
  at o.h.b.datatypes.Address.addressHash(Address.java:248)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState.get(BonsaiWorldState.java:412)
  [... StackedUpdater nesting ×3 ...]
  at o.h.b.evm.operation.ExtCodeSizeOperation.execute(ExtCodeSizeOperation.java:63)
  at o.h.b.ethereum.mainnet.parallelization.MainnetParallelBlockProcessor
      .lambda$getTransactionProcessingResult$1(MainnetParallelBlockProcessor.java:118)
  at java.util.Optional.orElseGet(Optional.java:364)   ← sequential fallback path
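
The top frames show the importBlock thread parked on a Guava cache segment lock (lockedGetOrLoad) while loading an address hash. A minimal standalone sketch of that memoization pattern, with the keccak call stubbed out (this illustrates the pattern in the trace, not Besu's actual Address implementation):

import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;

// Sketch of the memoization pattern implicated above: a Guava LoadingCache
// keyed by address, computing the hash on a miss. Every miss goes through
// LocalCache$Segment.lockedGetOrLoad(), which acquires the segment's lock,
// so heavy concurrent miss traffic (parallel preprocessing threads plus the
// sequential fallback on the import thread) contends on those locks.
public final class AddressHashCacheSketch {

  private static final LoadingCache<String, byte[]> HASH_CACHE =
      CacheBuilder.newBuilder()
          .maximumSize(100_000)
          .build(new CacheLoader<>() {
            @Override
            public byte[] load(final String addressHex) {
              return fakeKeccak256(addressHex); // stand-in for the real keccak256 of the address
            }
          });

  static byte[] addressHash(final String addressHex) {
    // On a miss, getUnchecked() ends up in Segment.lockedGetOrLoad(), the frame
    // the stuck importBlock thread is waiting in.
    return HASH_CACHE.getUnchecked(addressHex);
  }

  private static byte[] fakeKeccak256(final String input) {
    return input.getBytes(java.nio.charset.StandardCharsets.UTF_8); // placeholder only
  }
}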

2. No parallel — trie put (block ~2,320,700, 40+ min stuck)

"EthScheduler-Services-24 (importBlock)" cpu=4286272.55ms RUNNABLE
  at com.google.common.cache.LocalCache$Segment.recordRead(LocalCache.java:2538)
  at o.h.b.ethereum.trie.pathbased.bonsai.cache.BonsaiCachedMerkleTrieLoader
      .getAccountStateTrieNode(BonsaiCachedMerkleTrieLoader.java:143)
  at o.h.b.ethereum.trie.StoredNode.load(StoredNode.java:133)
  at o.h.b.ethereum.trie.patricia.PutVisitor.visit(PutVisitor.java:74)
  at o.h.b.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:85) [×4 levels]
  at o.h.b.ethereum.trie.StoredMerkleTrie.put(StoredMerkleTrie.java:114)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState
      .updateTheAccounts(BonsaiWorldState.java:203)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState
      .commitAccountTrieAndComputeRoot(BonsaiWorldState.java:175)
  [... same frontierRootHash path ...]

3. No parallel — Keccak hashing (block ~2,315,700, 30+ min stuck)

"EthScheduler-Services-80 (importBlock)" cpu=640196.32ms RUNNABLE
  at org.bouncycastle.crypto.digests.KeccakDigest.<init>(Unknown Source)
  at o.h.b.crypto.Hash.keccak256(Hash.java:88)
  at o.h.b.ethereum.trie.patricia.BranchNode.getHash(BranchNode.java:163)
  at o.h.b.ethereum.trie.CommitVisitor.maybeStoreNode(CommitVisitor.java:77)
  at o.h.b.ethereum.trie.CommitVisitor.visit(CommitVisitor.java:59)
  at o.h.b.ethereum.trie.patricia.BranchNode.accept(BranchNode.java:95) [×5 levels]
  at o.h.b.ethereum.trie.StoredMerkleTrie.commit(StoredMerkleTrie.java:149)
  at o.h.b.ethereum.trie.pathbased.bonsai.worldview.BonsaiWorldState
      .commitAccountTrieAndComputeRoot(BonsaiWorldState.java:178)
  [... same frontierRootHash path ...]

All three thread dumps converge on the same code path: frontierRootHash() → commitAccountTrieAndComputeRoot().

Minimal Reproducer (Unit Test)

The following test simulates a block with 500 transactions, calling frontierRootHash() after each one (as FrontierTransactionReceiptFactory does for pre-Byzantium blocks). It compares performance with parallel state root computation enabled vs disabled, using RocksDB-backed storage.

class BonsaiFrontierRootHashPerformanceTest {

  private static final int NUM_ACCOUNTS = 500;
  private static final int SLOTS_PER_ACCOUNT = 50;
  private static final double MAX_PARALLEL_SLOWDOWN_FACTOR = 1.5;

  @TempDir Path tempDir;

  @Test
  void parallelTrieShouldNotAddExcessiveOverheadToFrontierRootHash() {
    final long sequentialNanos =
        measureFrontierRootHash(tempDir.resolve("sequential"), false);
    final long parallelNanos =
        measureFrontierRootHash(tempDir.resolve("parallel"), true);
    final double slowdownFactor = (double) parallelNanos / sequentialNanos;
    // Actual on NVMe SSD: sequential=2253ms, parallel=4281ms, factor=1.90x
    // On network-attached storage (cloud VPS): factor is significantly worse.
    assertThat(slowdownFactor)
        .as("Parallel trie should not make frontierRootHash() more than %.1fx slower "
            + "(sequential=%dms, parallel=%dms, factor=%.2f)",
            MAX_PARALLEL_SLOWDOWN_FACTOR,
            sequentialNanos / 1_000_000, parallelNanos / 1_000_000, slowdownFactor)
        .isLessThanOrEqualTo(MAX_PARALLEL_SLOWDOWN_FACTOR);
  }

  private long measureFrontierRootHash(final Path dbPath, final boolean parallelEnabled) {
    final Blockchain blockchain = mock(Blockchain.class);
    final PathBasedExtraStorageConfiguration extraConfig =
        ImmutablePathBasedExtraStorageConfiguration.builder()
            .parallelStateRootComputationEnabled(parallelEnabled)
            .parallelTxProcessingEnabled(false)
            .build();
    final StorageProvider storageProvider = createRocksDBStorageProvider(dbPath);
    final BonsaiWorldStateKeyValueStorage storage =
        (BonsaiWorldStateKeyValueStorage)
            storageProvider.createWorldStateStorage(
                DataStorageConfiguration.DEFAULT_BONSAI_CONFIG);
    final BonsaiWorldStateProvider archive =
        new BonsaiWorldStateProvider(storage, blockchain, extraConfig,
            new BonsaiCachedMerkleTrieLoader(new NoOpMetricsSystem()),
            null, EvmConfiguration.DEFAULT, throwingWorldStateHealerSupplier(),
            new CodeCache());
    final MutableWorldState worldState = archive.getWorldState();

    // Populate initial state: 500 accounts × 50 storage slots
    final WorldUpdater setup = worldState.updater();
    for (int i = 0; i < NUM_ACCOUNTS; i++) {
      final MutableAccount account = setup.createAccount(accountAddress(i));
      account.setBalance(Wei.of(1));
      for (int s = 0; s < SLOTS_PER_ACCOUNT; s++)
        account.setStorageValue(UInt256.valueOf(s), UInt256.valueOf(s + 1));
    }
    setup.commit();
    worldState.persist(null);

    // Warm up
    WorldUpdater warmup = worldState.updater();
    warmup.getAccount(accountAddress(0)).setBalance(Wei.of(999));
    warmup.commit();
    worldState.frontierRootHash();

    // Measure 500 frontierRootHash() calls (simulating 500 txs in a block)
    final long start = System.nanoTime();
    Hash prev = null;
    for (int i = 0; i < NUM_ACCOUNTS; i++) {
      final WorldUpdater updater = worldState.updater();
      final MutableAccount account = updater.getAccount(accountAddress(i));
      account.setBalance(Wei.of(1000 + i));
      account.setStorageValue(UInt256.valueOf(0), UInt256.valueOf(i + 1));
      updater.commit();
      final Hash hash = worldState.frontierRootHash();
      assertThat(hash).isNotNull();
      if (prev != null) assertThat(hash).isNotEqualTo(prev);
      prev = hash;
    }
    final long elapsed = System.nanoTime() - start;
    try { storageProvider.close(); } catch (Exception e) { /* ignore */ }
    return elapsed;
  }

  // createRocksDBStorageProvider creates a RocksDB-backed StorageProvider
  // using KeyValueStorageProviderBuilder + RocksDBKeyValueStorageFactory
  // pointed at the given dataPath. (Full implementation in the test file.)

  private static Address accountAddress(final int index) {
    return Address.fromHexStringStrict(String.format("0x%040x", 0x1000 + index));
  }
}

Test result on local NVMe SSD:

frontierRootHash x500 (RocksDB): sequential=2253ms, parallel=4281ms, factor=1.90

The parallel trie adds 1.90x overhead even on fast local storage. On the DigitalOcean VPS with network-attached storage where the full sync was tested, the overhead is dramatically worse — enough to cause the complete stall described above.

Workaround

Disabling both parallel features makes the sync possible but extremely slow (~30–60 min per heavy block, requiring periodic restarts):

bonsai-parallel-tx-processing-enabled=false
bonsai-parallel-state-root-computation-enabled=false
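
Equivalently, the two options can be passed as command-line flags alongside the original invocation from the reproduction steps:

besu --network=mainnet --sync-mode=FULL --data-storage-format=BONSAI \
     --data-path=/path/to/data \
     --bonsai-parallel-tx-processing-enabled=false \
     --bonsai-parallel-state-root-computation-enabled=false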

Verified On

Config                      Commit      Stuck Block  Behavior
Default (parallel enabled)  fe2cbfe812  2,306,351    Livelock: 0 progress, 50+ min
Parallel disabled           fe2cbfe812  ~2,315,700   30–60 min per heavy block, needs restarts

Both tested on Ubuntu 22.04, OpenJDK 21.0.10, DigitalOcean 4 vCPU / 8 GB RAM / network-attached volume storage.

Versions

Besu v26.3.0-SNAPSHOT (main branch at fe2cbfe812), Teku v26.4.0, OpenJDK 21.0.10, Ubuntu 22.04.

Additional Information

  • Impact: besu --sync-mode=FULL --data-storage-format=BONSAI cannot sync through the 2016 DoS attack blocks (~2.28M–2.46M) on ETH mainnet with default configuration. With parallel processing disabled, it can technically progress but requires manual restarts every ~5,000–8,000 blocks, and individual blocks can take 30–60+ minutes, rendering the node unresponsive during that time. Any node operator attempting a full sync from genesis on a network with pre-Byzantium history hits this wall.
  • Root cause scope: The frontierRootHash() per-transaction trie reconstruction is the fundamental issue. The parallel processing features exacerbate it (livelock vs. slow) but are not the root cause. FOREST storage does not have this problem because frontierRootHash() returns the already-computed root hash in O(1).
  • Related: Bonsai clearStorage() causes full sync to hang indefinitely at block 347,481 (#9963): same entry point (frontierRootHash()), different bottleneck. The fix for #9963 is included in the tested version.
