
fix: incremental frontierRootHash to avoid O(N²) on pre-Byzantium blocks#10220

Open
diega wants to merge 3 commits into besu-eth:main from diega:frontier-root-hash-tracker

Conversation


@diega diega commented Apr 10, 2026

PR description

Bonsai full sync hangs indefinitely at the 2016 DoS attack blocks (~2.3M) on mainnet because frontierRootHash() copies the entire accumulator and rebuilds the Merkle trie from scratch on every per-transaction call, giving O(N²) total cost per block.

Commit 1: copy constructor refactor

Replaces the two-phase cloneFromUpdater copy pattern in PathBasedWorldStateUpdateAccumulator with a proper copy constructor chain. Each level copies its own fields with compile-time type safety. This is a prerequisite for commit 2, which adds a subclass-specific field (frontierDirtyAddresses) that needs to be copied correctly.
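The copy-constructor chain can be sketched in miniature. The classes below are illustrative stand-ins, not Besu's actual accumulator hierarchy: each level's copy constructor copies only its own fields and delegates upward via super(other), so a subclass-specific field cannot be silently skipped the way a generic cloneFromUpdater could skip it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for the accumulator hierarchy (names are illustrative).
class BaseAccumulator {
    protected final Map<String, Long> accountsToUpdate;

    BaseAccumulator() {
        this.accountsToUpdate = new HashMap<>();
    }

    // Copy constructor: this level copies only its own fields.
    BaseAccumulator(final BaseAccumulator other) {
        this.accountsToUpdate = new HashMap<>(other.accountsToUpdate);
    }
}

class FrontierAccumulator extends BaseAccumulator {
    // Subclass-specific state that a generic clone method could miss.
    final Set<String> frontierDirtyAddresses;

    FrontierAccumulator() {
        this.frontierDirtyAddresses = new HashSet<>();
    }

    // The chain guarantees, at compile time, that every level is copied.
    FrontierAccumulator(final FrontierAccumulator other) {
        super(other);
        this.frontierDirtyAddresses = new HashSet<>(other.frontierDirtyAddresses);
    }

    FrontierAccumulator copy() {
        return new FrontierAccumulator(this);
    }
}

public class CopyConstructorSketch {
    public static void main(final String[] args) {
        final FrontierAccumulator original = new FrontierAccumulator();
        original.accountsToUpdate.put("0xabc", 42L);
        original.frontierDirtyAddresses.add("0xabc");

        final FrontierAccumulator copy = original.copy();
        copy.frontierDirtyAddresses.add("0xdef");

        // Deep-copied collections: mutating the copy leaves the original intact.
        System.out.println(original.frontierDirtyAddresses.size()); // 1
        System.out.println(copy.frontierDirtyAddresses.size());     // 2
    }
}
```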

Commit 2: incremental frontier root computation

Two trackers make the frontier receipt path incremental instead of rebuilding from scratch on every call. They share a lifecycle (built once per block, reset on persist()), are paired via FrontierRootHashTracker.StorageRootUpdater, and cover the two layers of state that the frontier path touches.

FrontierRootHashTracker handles the account trie. The accumulator's commit() override captures addresses from getUpdatedAccounts() and getDeletedAccountAddresses() into a frontierDirtyAddresses set (after super.commit() has populated accountsToUpdate, so every dirty address is guaranteed to resolve). On each frontierRootHash() call the tracker lazily creates the account trie, applies only the addresses dirtied since the previous call, and clears them on success.
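The incremental lifecycle described above can be sketched with toy types (a plain map stands in for the account trie, and its hashCode() for the Merkle root; this is not Besu's FrontierRootHashTracker): the tracker lazily builds once, then on each subsequent call applies only the addresses dirtied since the previous call and clears them on success.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the incremental pattern; names and types are stand-ins.
class AccountTrieTracker {
    private final Map<String, Long> worldState;       // authoritative account state
    private Map<String, Long> trie;                   // lazily-built cached "trie"
    private final Set<String> dirtyAddresses = new HashSet<>();

    AccountTrieTracker(final Map<String, Long> worldState) {
        this.worldState = worldState;
    }

    // Called from commit(): record which accounts changed in this transaction.
    void markDirty(final String address) {
        dirtyAddresses.add(address);
    }

    // Incremental root: O(dirty) per call instead of O(all accounts).
    long rootHash() {
        if (trie == null) {
            trie = new HashMap<>(worldState);         // first call builds once
        } else {
            for (final String address : dirtyAddresses) {
                trie.put(address, worldState.get(address)); // apply only deltas
            }
        }
        dirtyAddresses.clear();                       // applied successfully
        return trie.hashCode();                       // stand-in for a Merkle root
    }

    // Reset on persist(): the per-block lifecycle ends here.
    void reset() {
        trie = null;
        dirtyAddresses.clear();
    }
}

public class AccountTrackerSketch {
    public static void main(final String[] args) {
        final Map<String, Long> state = new HashMap<>();
        state.put("a", 1L);
        final AccountTrieTracker tracker = new AccountTrieTracker(state);
        final long r1 = tracker.rootHash();
        state.put("a", 2L);       // a later transaction mutates the account
        tracker.markDirty("a");
        final long r2 = tracker.rootHash();
        System.out.println(r1 != r2); // the delta was applied to the cached trie
    }
}
```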

FrontierStorageRootTracker handles the storage tries. It keeps each dirty account's storage trie alive and applies only the slot deltas introduced since the last call. There are two implementations:

  • CachingFrontierStorageRootTracker: the incremental path described above.
  • NO_OP: wired in when worldStateConfig.isTrieDisabled() is true. The frontier receipt path is unreachable in that mode (snap sync downloads receipts from peers rather than recomputing them), so this sentinel just makes the invariant explicit and keeps the caching impl branch-free on the config flag.
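The two-implementation split, and the reason the tracker must remember the last value it applied per slot (a later transaction can revert a slot to its block-start value, which a naive "unchanged vs block start" check would skip), can be sketched as follows. Names mirror the PR's description but the bodies are illustrative stand-ins, not the real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: interface with a NO_OP sentinel plus a caching implementation that
// tracks the last value applied per slot, so reverts are re-applied correctly.
interface StorageRootTracker {
    long storageRoot(String account, Map<Long, Long> currentSlots);

    // Wired in when the trie is disabled: the frontier path is unreachable there,
    // so the sentinel makes the invariant explicit.
    StorageRootTracker NO_OP = (account, slots) -> {
        throw new IllegalStateException("frontier path unreachable with trie disabled");
    };
}

class CachingStorageRootTracker implements StorageRootTracker {
    // Per account: the slot values this tracker last wrote into the cached "trie".
    private final Map<String, Map<Long, Long>> lastApplied = new HashMap<>();

    @Override
    public long storageRoot(final String account, final Map<Long, Long> currentSlots) {
        final Map<Long, Long> applied =
            lastApplied.computeIfAbsent(account, a -> new HashMap<>());
        // Compare against the last value APPLIED, not the value at block start:
        // a revert back to the block-start value must still be re-applied.
        currentSlots.forEach((slot, value) -> {
            if (!value.equals(applied.get(slot))) {
                applied.put(slot, value);
            }
        });
        return applied.hashCode(); // stand-in for the storage trie root
    }
}

public class StorageTrackerSketch {
    public static void main(final String[] args) {
        final CachingStorageRootTracker tracker = new CachingStorageRootTracker();
        final Map<Long, Long> slots = new HashMap<>();
        slots.put(0L, 1L);
        final long r1 = tracker.storageRoot("acct", slots);
        slots.put(0L, 2L);                 // tx 1 modifies the slot
        tracker.storageRoot("acct", slots);
        slots.put(0L, 1L);                 // tx 2 reverts to the block-start value
        final long r3 = tracker.storageRoot("acct", slots);
        System.out.println(r1 == r3);      // the revert was re-applied correctly
    }
}
```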

Why it's safe to operate on the live accumulator instead of copying it: the previous accumulator.copy() was introduced to prevent setStorageRoot() mutations from corrupting the live state. But setStorageRoot() sets deterministic values computed from the trie: the same values that persist() will compute at end-of-block. The frontier path writes no trie nodes to storage. The accumulator is only read and mutated in the same way that persist() would later.

Commit 3: BonsaiTrieFactory

Commit 2 gives the frontier path its own trie lifecycle, but the underlying tries still go through the normal createTrie() path, which returns ParallelStoredMerklePatriciaTrie when parallel computation is enabled. This adds ForkJoinPool scheduling overhead on every per-transaction call without benefit: the frontier path is inherently sequential because each receipt depends on the prior transaction's state root.

This commit introduces BonsaiTrieFactory with a TrieMode enum (ALWAYS_SEQUENTIAL vs PARALLELIZE_ALLOWED) that centralizes the trie implementation decision. The frontier path passes ALWAYS_SEQUENTIAL for both account and storage tries, eliminating the parallel overhead. The normal persist() / block computation path continues to use PARALLELIZE_ALLOWED.
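The centralized decision can be sketched like this; the enum constants follow the PR's description, while the trie types are placeholders rather than Besu's MerklePatriciaTrie classes. The point of the design is that callers state intent (sequential vs parallelizable) and the factory alone combines that with the config flag.

```java
// Sketch of a mode-based trie factory; types and names are illustrative.
enum TrieMode { ALWAYS_SEQUENTIAL, PARALLELIZE_ALLOWED }

interface Trie {
    long root();
}

final class SequentialTrie implements Trie {
    public long root() { return 0L; }
}

final class ParallelTrie implements Trie {
    public long root() { return 0L; } // would fan out over a ForkJoinPool
}

class TrieFactory {
    private final boolean parallelEnabled;

    TrieFactory(final boolean parallelEnabled) {
        this.parallelEnabled = parallelEnabled;
    }

    // Centralizes the decision: the frontier path always passes ALWAYS_SEQUENTIAL;
    // the block-computation path passes PARALLELIZE_ALLOWED and still honors
    // the configuration flag.
    Trie createTrie(final TrieMode mode) {
        if (mode == TrieMode.PARALLELIZE_ALLOWED && parallelEnabled) {
            return new ParallelTrie();
        }
        return new SequentialTrie();
    }
}

public class TrieFactorySketch {
    public static void main(final String[] args) {
        final TrieFactory factory = new TrieFactory(true); // parallel flag on
        System.out.println(
            factory.createTrie(TrieMode.ALWAYS_SEQUENTIAL) instanceof SequentialTrie);
        System.out.println(
            factory.createTrie(TrieMode.PARALLELIZE_ALLOWED) instanceof ParallelTrie);
    }
}
```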

Post-Byzantium receipts use a status code instead of a state root, so the frontier path is only active for pre-Byzantium blocks. Neither commit changes behavior for post-Byzantium blocks or the normal block computation path.

Empirical validation

Full sync of mainnet with Bonsai + --bonsai-parallel-tx-processing-enabled, driven by Teku 26.2.0 on the same machine via the engine API:

Build → Result:

  • main: Silently stalls at block ~347k inside clearStorage → entriesFrom → TreeMap.put. No OOM, no exception; CPU stays pegged on the importBlock thread until it quiesces. A thread dump shows ~25 min of accumulated CPU time in that single path.
  • main + these commits + #10235 (Caffeine Address.hashCache): Cleared the full 2,283,416–2,700,000 "from Shanghai" DoS window with 0 state-root mismatches and 0 stalls. Throughput drops to ~3–10 Mg/s inside the window (the blocks are legitimately heavy) and recovers to 55–80 Mg/s once out, as expected.

The 347k stall is not inside the DoS zone; it's caused by the O(N²) frontier receipt path itself, triggered by any pre-Byzantium block that SELFDESTRUCTs a contract with non-trivial storage (common in the Frontier/Homestead era). The DoS blocks are where the issue becomes spectacular, but the bug is active much earlier. The two fixes are complementary: either one alone leaves one of the two stalls standing; only the combination clears the window end to end.

Verifying the performance improvement locally

The O(N²) scaling can be reproduced by measuring individual frontierRootHash() call times across a simulated block (500 accounts × 20 storage slots, no persist() between calls):

// Setup: create 500 accounts each with 20 storage slots, commit, persist
// ...
for (int i = 0; i < 500; i++) {
    WorldUpdater updater = worldState.updater();
    updater.getAccount(addresses[i]).setBalance(Wei.of(1000 + i));
    updater.commit();
    updater.markTransactionBoundary();

    long start = System.nanoTime();
    worldState.frontierRootHash();
    long elapsed = System.nanoTime() - start;
    // Before fix: elapsed grows linearly with i (O(N²) total)
    // After fix: elapsed stays constant
}

Before this PR: the last call takes 8–11x longer than the first.
After this PR: the last call is the same speed or faster than the first.

The parallel trie overhead can be verified by comparing frontierRootHash() wall time with parallelStateRootComputationEnabled=true vs false using RocksDB-backed storage. Before this PR: ~1.9x overhead. After: ~1.0x.

Fixed Issue(s)

fixes #10155

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@diega force-pushed the frontier-root-hash-tracker branch from 89ba8ec to a55b9d8 on April 13, 2026 at 23:02
…or hierarchy

Replace the two-phase copy pattern (construct empty + cloneFromUpdater)
with a proper copy constructor chain. Each level copies its own fields
with compile-time type safety, eliminating the need for a separate
public cloneFromUpdater method that could be called without copying
subclass-specific state.

Signed-off-by: Diego López León <dieguitoll@gmail.com>
@diega force-pushed the frontier-root-hash-tracker branch from a55b9d8 to 6955d74 on April 13, 2026 at 23:27
@diega force-pushed the frontier-root-hash-tracker branch from 6955d74 to 91b010c on April 16, 2026 at 04:12
diega added 2 commits April 16, 2026 01:39
…antium blocks

On pre-Byzantium blocks, FrontierTransactionReceiptFactory calls
frontierRootHash() for every transaction to include intermediate state
roots in receipts. The previous implementation copied the entire
accumulator and rebuilt the Merkle trie from scratch on each call, giving
O(N) cost per call and O(N²) total for a block with N transactions. On
the 2016 DoS attack blocks (~2.3M on mainnet), this caused full sync to
stall indefinitely.

Two trackers now cache the relevant trie state per block and apply only
the deltas since the last call:

- FrontierRootHashTracker: keeps the account trie alive, re-applies only
  accounts dirtied since the previous call.

- FrontierStorageRootTracker (interface + CachingFrontierStorageRootTracker
  impl + NO_OP for trie-disabled snap sync): keeps each dirty account's
  storage trie alive and applies only the slot deltas since the previous
  call.

The storage tracker can't rely on the accumulator's isUnchanged flag as
a simple skip: that flag compares a slot's updated value against its
value at block start, not against whatever the cached trie last wrote.
When a later tx restores the original value after an earlier tx modified
it, isUnchanged is true but the cache still holds the earlier write. The
tracker records the last value applied per slot so the revert is
re-applied; isUnchanged is used only as the first-write shortcut when
the slot was never cached.

Signed-off-by: Diego López León <dieguitoll@gmail.com>
…head in frontier path

Adds a Usage-based trie construction policy that selects the appropriate
MerkleTrie implementation based on execution context:

- BLOCK_COMPUTATION: throughput-oriented, respects parallelStateRootComputationEnabled
- FRONTIER_INCREMENTAL: latency-sensitive, always uses StoredMerklePatriciaTrie

This eliminates the ParallelStoredMerklePatriciaTrie ForkJoinPool overhead
from per-transaction frontierRootHash() calls. The factory replaces the
inline createTrie() logic and is used by both the normal block computation
path (unchanged behavior) and the incremental frontier path via
FrontierRootHashTracker (now guaranteed sequential for both account and
storage tries).

Signed-off-by: Diego López León <dieguitoll@gmail.com>
@diega force-pushed the frontier-root-hash-tracker branch from 91b010c to 4dc9c1d on April 16, 2026 at 04:39


Development

Successfully merging this pull request may close these issues.

Bonsai full sync hangs indefinitely at 2016 DoS attack blocks (~2.3M) on mainnet
