
fix: incremental frontierRootHash to avoid O(N²) on pre-Byzantium blocks#10220

Open
diega wants to merge 3 commits into besu-eth:main from diega:frontier-root-hash-tracker

Conversation


@diega diega commented Apr 10, 2026

PR description

Bonsai full sync hangs indefinitely at the 2016 DoS attack blocks (~2.3M) on mainnet because frontierRootHash() copies the entire accumulator and rebuilds the Merkle trie from scratch on every per-transaction call, giving O(N²) total cost per block.

Commit 1: copy constructor refactor

Replaces the two-phase cloneFromUpdater copy pattern in PathBasedWorldStateUpdateAccumulator with a proper copy constructor chain. Each level copies its own fields with compile-time type safety. This is a prerequisite for commit 2, which adds a subclass-specific field (frontierDirtyAddresses) that needs to be copied correctly.
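The copy-constructor chain can be sketched in miniature. The classes below are illustrative stand-ins, not Besu's actual accumulator hierarchy: each level's copy constructor copies only its own fields and delegates upward via super(other), so a subclass-specific field cannot be silently skipped the way a generic cloneFromUpdater could skip it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified stand-ins for the accumulator hierarchy (names are illustrative).
class BaseAccumulator {
    protected final Map<String, Long> accountsToUpdate;

    BaseAccumulator() {
        this.accountsToUpdate = new HashMap<>();
    }

    // Copy constructor: this level copies only its own fields.
    BaseAccumulator(final BaseAccumulator other) {
        this.accountsToUpdate = new HashMap<>(other.accountsToUpdate);
    }
}

class FrontierAccumulator extends BaseAccumulator {
    // Subclass-specific state that a generic clone method could miss.
    final Set<String> frontierDirtyAddresses;

    FrontierAccumulator() {
        this.frontierDirtyAddresses = new HashSet<>();
    }

    // The chain guarantees, at compile time, that every level is copied.
    FrontierAccumulator(final FrontierAccumulator other) {
        super(other);
        this.frontierDirtyAddresses = new HashSet<>(other.frontierDirtyAddresses);
    }

    FrontierAccumulator copy() {
        return new FrontierAccumulator(this);
    }
}

public class CopyConstructorSketch {
    public static void main(final String[] args) {
        final FrontierAccumulator original = new FrontierAccumulator();
        original.accountsToUpdate.put("0xabc", 42L);
        original.frontierDirtyAddresses.add("0xabc");

        final FrontierAccumulator copy = original.copy();
        copy.frontierDirtyAddresses.add("0xdef");

        // Deep-copied collections: mutating the copy leaves the original intact.
        System.out.println(original.frontierDirtyAddresses.size()); // 1
        System.out.println(copy.frontierDirtyAddresses.size());     // 2
    }
}
```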

Commit 2: incremental frontier root computation

Two trackers make the frontier receipt path incremental instead of rebuilding from scratch on every call. They share a lifecycle (built once per block, reset on persist()), are paired via FrontierRootHashTracker.StorageRootUpdater, and cover the two layers of state that the frontier path touches.

FrontierRootHashTracker handles the account trie. The accumulator's commit() override captures addresses from getUpdatedAccounts() and getDeletedAccountAddresses() into a frontierDirtyAddresses set (after super.commit() has populated accountsToUpdate, so every dirty address is guaranteed to resolve). On each frontierRootHash() call the tracker lazily creates the account trie, applies only the addresses dirtied since the previous call, and clears them on success.
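The incremental lifecycle described above can be sketched with toy types (a plain map stands in for the account trie, and its hashCode() for the Merkle root; this is not Besu's FrontierRootHashTracker): the tracker lazily builds once, then on each subsequent call applies only the addresses dirtied since the previous call and clears them on success.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the incremental pattern; names and types are stand-ins.
class AccountTrieTracker {
    private final Map<String, Long> worldState;       // authoritative account state
    private Map<String, Long> trie;                   // lazily-built cached "trie"
    private final Set<String> dirtyAddresses = new HashSet<>();

    AccountTrieTracker(final Map<String, Long> worldState) {
        this.worldState = worldState;
    }

    // Called from commit(): record which accounts changed in this transaction.
    void markDirty(final String address) {
        dirtyAddresses.add(address);
    }

    // Incremental root: O(dirty) per call instead of O(all accounts).
    long rootHash() {
        if (trie == null) {
            trie = new HashMap<>(worldState);         // first call builds once
        } else {
            for (final String address : dirtyAddresses) {
                trie.put(address, worldState.get(address)); // apply only deltas
            }
        }
        dirtyAddresses.clear();                       // applied successfully
        return trie.hashCode();                       // stand-in for a Merkle root
    }

    // Reset on persist(): the per-block lifecycle ends here.
    void reset() {
        trie = null;
        dirtyAddresses.clear();
    }
}

public class AccountTrackerSketch {
    public static void main(final String[] args) {
        final Map<String, Long> state = new HashMap<>();
        state.put("a", 1L);
        final AccountTrieTracker tracker = new AccountTrieTracker(state);
        final long r1 = tracker.rootHash();
        state.put("a", 2L);       // a later transaction mutates the account
        tracker.markDirty("a");
        final long r2 = tracker.rootHash();
        System.out.println(r1 != r2); // the delta was applied to the cached trie
    }
}
```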

FrontierStorageRootTracker handles the storage tries. It keeps each dirty account's storage trie alive and applies only the slot deltas introduced since the last call. There are two implementations:

  • CachingFrontierStorageRootTracker: the incremental path described above.
  • NO_OP: wired in when worldStateConfig.isTrieDisabled() is true. The frontier receipt path is unreachable in that mode (snap sync downloads receipts from peers rather than recomputing them), so this sentinel just makes the invariant explicit and keeps the caching impl branch-free on the config flag.
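The two-implementation split, and the reason the tracker must remember the last value it applied per slot (a later transaction can revert a slot to its block-start value, which a naive "unchanged vs block start" check would skip), can be sketched as follows. Names mirror the PR's description but the bodies are illustrative stand-ins, not the real classes.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: interface with a NO_OP sentinel plus a caching implementation that
// tracks the last value applied per slot, so reverts are re-applied correctly.
interface StorageRootTracker {
    long storageRoot(String account, Map<Long, Long> currentSlots);

    // Wired in when the trie is disabled: the frontier path is unreachable there,
    // so the sentinel makes the invariant explicit.
    StorageRootTracker NO_OP = (account, slots) -> {
        throw new IllegalStateException("frontier path unreachable with trie disabled");
    };
}

class CachingStorageRootTracker implements StorageRootTracker {
    // Per account: the slot values this tracker last wrote into the cached "trie".
    private final Map<String, Map<Long, Long>> lastApplied = new HashMap<>();

    @Override
    public long storageRoot(final String account, final Map<Long, Long> currentSlots) {
        final Map<Long, Long> applied =
            lastApplied.computeIfAbsent(account, a -> new HashMap<>());
        // Compare against the last value APPLIED, not the value at block start:
        // a revert back to the block-start value must still be re-applied.
        currentSlots.forEach((slot, value) -> {
            if (!value.equals(applied.get(slot))) {
                applied.put(slot, value);
            }
        });
        return applied.hashCode(); // stand-in for the storage trie root
    }
}

public class StorageTrackerSketch {
    public static void main(final String[] args) {
        final CachingStorageRootTracker tracker = new CachingStorageRootTracker();
        final Map<Long, Long> slots = new HashMap<>();
        slots.put(0L, 1L);
        final long r1 = tracker.storageRoot("acct", slots);
        slots.put(0L, 2L);                 // tx 1 modifies the slot
        tracker.storageRoot("acct", slots);
        slots.put(0L, 1L);                 // tx 2 reverts to the block-start value
        final long r3 = tracker.storageRoot("acct", slots);
        System.out.println(r1 == r3);      // the revert was re-applied correctly
    }
}
```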

Why it's safe to operate on the live accumulator instead of copying it: the previous accumulator.copy() was introduced to prevent setStorageRoot() mutations from corrupting the live state. But setStorageRoot() sets deterministic values computed from the trie: the same values that persist() will compute at end-of-block. The frontier path writes no trie nodes to storage. The accumulator is only read and mutated in the same way that persist() would later.

Commit 3: BonsaiTrieFactory

Commit 2 gives the frontier path its own trie lifecycle, but the underlying tries still go through the normal createTrie() path, which returns ParallelStoredMerklePatriciaTrie when parallel computation is enabled. This adds ForkJoinPool scheduling overhead on every per-transaction call without benefit: the frontier path is inherently sequential because each receipt depends on the prior transaction's state root.

This commit introduces BonsaiTrieFactory with a TrieMode enum (ALWAYS_SEQUENTIAL vs PARALLELIZE_ALLOWED) that centralizes the trie implementation decision. The frontier path passes ALWAYS_SEQUENTIAL for both account and storage tries, eliminating the parallel overhead. The normal persist() / block computation path continues to use PARALLELIZE_ALLOWED.
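The centralized decision can be sketched like this; the enum constants follow the PR's description, while the trie types are placeholders rather than Besu's MerklePatriciaTrie classes. The point of the design is that callers state intent (sequential vs parallelizable) and the factory alone combines that with the config flag.

```java
// Sketch of a mode-based trie factory; types and names are illustrative.
enum TrieMode { ALWAYS_SEQUENTIAL, PARALLELIZE_ALLOWED }

interface Trie {
    long root();
}

final class SequentialTrie implements Trie {
    public long root() { return 0L; }
}

final class ParallelTrie implements Trie {
    public long root() { return 0L; } // would fan out over a ForkJoinPool
}

class TrieFactory {
    private final boolean parallelEnabled;

    TrieFactory(final boolean parallelEnabled) {
        this.parallelEnabled = parallelEnabled;
    }

    // Centralizes the decision: the frontier path always passes ALWAYS_SEQUENTIAL;
    // the block-computation path passes PARALLELIZE_ALLOWED and still honors
    // the configuration flag.
    Trie createTrie(final TrieMode mode) {
        if (mode == TrieMode.PARALLELIZE_ALLOWED && parallelEnabled) {
            return new ParallelTrie();
        }
        return new SequentialTrie();
    }
}

public class TrieFactorySketch {
    public static void main(final String[] args) {
        final TrieFactory factory = new TrieFactory(true); // parallel flag on
        System.out.println(
            factory.createTrie(TrieMode.ALWAYS_SEQUENTIAL) instanceof SequentialTrie);
        System.out.println(
            factory.createTrie(TrieMode.PARALLELIZE_ALLOWED) instanceof ParallelTrie);
    }
}
```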

Post-Byzantium receipts use a status code instead of a state root, so the frontier path is only active for pre-Byzantium blocks. Neither commit changes behavior for post-Byzantium blocks or the normal block computation path.

Empirical validation

Full sync of mainnet with Bonsai + --bonsai-parallel-tx-processing-enabled, driven by Teku 26.2.0 on the same machine via the engine API:

Build → Result:

  • main: Silently stalls at block ~347k inside clearStorage → entriesFrom → TreeMap.put. No OOM, no exception; CPU stays pegged on the importBlock thread until it quiesces. A thread dump shows ~25 min of accumulated CPU time in that single path.
  • main + these commits + #10235 (Caffeine Address.hashCache): Cleared the full 2,283,416–2,700,000 "from Shanghai" DoS window with 0 state-root mismatches and 0 stalls. Throughput drops to ~3–10 Mg/s inside the window (the blocks are legitimately heavy) and recovers to 55–80 Mg/s once out, as expected.

The 347k stall is not inside the DoS zone; it's caused by the O(N²) frontier receipt path itself, triggered by any pre-Byzantium block that SELFDESTRUCTs a contract with non-trivial storage (common in the Frontier/Homestead era). The DoS blocks are where the issue becomes spectacular, but the bug is active much earlier. The two fixes are complementary: either one alone leaves one of the two stalls standing; only the combination clears the window end to end.

Verifying the performance improvement locally

The O(N²) scaling can be reproduced by measuring individual frontierRootHash() call times across a simulated block (500 accounts × 20 storage slots, no persist() between calls):

// Setup: create 500 accounts each with 20 storage slots, commit, persist
// ...
for (int i = 0; i < 500; i++) {
    WorldUpdater updater = worldState.updater();
    updater.getAccount(addresses[i]).setBalance(Wei.of(1000 + i));
    updater.commit();
    updater.markTransactionBoundary();

    long start = System.nanoTime();
    worldState.frontierRootHash();
    long elapsed = System.nanoTime() - start;
    // Before fix: elapsed grows linearly with i (O(N²) total)
    // After fix: elapsed stays constant
}

Before this PR: the last call takes 8–11x longer than the first.
After this PR: the last call is the same speed or faster than the first.

The parallel trie overhead can be verified by comparing frontierRootHash() wall time with parallelStateRootComputationEnabled=true vs false using RocksDB-backed storage. Before this PR: ~1.9x overhead. After: ~1.0x.

Fixed Issue(s)

fixes #10155

Thanks for sending a pull request! Have you done the following?

  • Checked out our contribution guidelines?
  • Considered documentation and added the doc-change-required label to this PR if updates are required.
  • Considered the changelog and included an update if required.
  • For database changes (e.g. KeyValueSegmentIdentifier) considered compatibility and performed forwards and backwards compatibility tests

Locally, you can run these tests to catch failures early:

  • spotless: ./gradlew spotlessApply
  • unit tests: ./gradlew build
  • acceptance tests: ./gradlew acceptanceTest
  • integration tests: ./gradlew integrationTest
  • reference tests: ./gradlew ethereum:referenceTests:referenceTests

@diega force-pushed the frontier-root-hash-tracker branch from 89ba8ec to a55b9d8 on April 13, 2026 at 23:02
…or hierarchy

Replace the two-phase copy pattern (construct empty + cloneFromUpdater)
with a proper copy constructor chain. Each level copies its own fields
with compile-time type safety, eliminating the need for a separate
public cloneFromUpdater method that could be called without copying
subclass-specific state.

Signed-off-by: Diego López León <dieguitoll@gmail.com>
@diega force-pushed the frontier-root-hash-tracker branch from a55b9d8 to 6955d74 on April 13, 2026 at 23:27
@diega force-pushed the frontier-root-hash-tracker branch from 6955d74 to 91b010c on April 16, 2026 at 04:12
diega added 2 commits April 16, 2026 01:39
…antium blocks

On pre-Byzantium blocks, FrontierTransactionReceiptFactory calls
frontierRootHash() for every transaction to include intermediate state
roots in receipts. The previous implementation copied the entire
accumulator and rebuilt the Merkle trie from scratch on each call, giving
O(N) cost per call and O(N²) total for a block with N transactions. On
the 2016 DoS attack blocks (~2.3M on mainnet), this caused full sync to
stall indefinitely.

Two trackers now cache the relevant trie state per block and apply only
the deltas since the last call:

- FrontierRootHashTracker: keeps the account trie alive, re-applies only
  accounts dirtied since the previous call.

- FrontierStorageRootTracker (interface + CachingFrontierStorageRootTracker
  impl + NO_OP for trie-disabled snap sync): keeps each dirty account's
  storage trie alive and applies only the slot deltas since the previous
  call.

The storage tracker can't rely on the accumulator's isUnchanged flag as
a simple skip: that flag compares a slot's updated value against its
value at block start, not against whatever the cached trie last wrote.
When a later tx restores the original value after an earlier tx modified
it, isUnchanged is true but the cache still holds the earlier write. The
tracker records the last value applied per slot so the revert is
re-applied; isUnchanged is used only as the first-write shortcut when
the slot was never cached.

Signed-off-by: Diego López León <dieguitoll@gmail.com>
…head in frontier path

Adds a Usage-based trie construction policy that selects the appropriate
MerkleTrie implementation based on execution context:

- BLOCK_COMPUTATION: throughput-oriented, respects parallelStateRootComputationEnabled
- FRONTIER_INCREMENTAL: latency-sensitive, always uses StoredMerklePatriciaTrie

This eliminates the ParallelStoredMerklePatriciaTrie ForkJoinPool overhead
from per-transaction frontierRootHash() calls. The factory replaces the
inline createTrie() logic and is used by both the normal block computation
path (unchanged behavior) and the incremental frontier path via
FrontierRootHashTracker (now guaranteed sequential for both account and
storage tries).

Signed-off-by: Diego López León <dieguitoll@gmail.com>
@diega force-pushed the frontier-root-hash-tracker branch from 91b010c to 4dc9c1d on April 16, 2026 at 04:39


Development

Successfully merging this pull request may close these issues.

Bonsai full sync hangs indefinitely at 2016 DoS attack blocks (~2.3M) on mainnet
