fix: incremental frontierRootHash to avoid O(N²) on pre-Byzantium blocks#10220
Open
diega wants to merge 3 commits intobesu-eth:mainfrom
Open
fix: incremental frontierRootHash to avoid O(N²) on pre-Byzantium blocks#10220diega wants to merge 3 commits intobesu-eth:mainfrom
diega wants to merge 3 commits intobesu-eth:mainfrom
Conversation
89ba8ec to
a55b9d8
Compare
…or hierarchy Replace the two-phase copy pattern (construct empty + cloneFromUpdater) with a proper copy constructor chain. Each level copies its own fields with compile-time type safety, eliminating the need for a separate public cloneFromUpdater method that could be called without copying subclass-specific state. Signed-off-by: Diego López León <dieguitoll@gmail.com>
a55b9d8 to
6955d74
Compare
6955d74 to
91b010c
Compare
…antium blocks On pre-Byzantium blocks, FrontierTransactionReceiptFactory calls frontierRootHash() for every transaction to include intermediate state roots in receipts. The previous implementation copied the entire accumulator and rebuilt the Merkle trie from scratch on each call, giving O(N) cost per call and O(N²) total for a block with N transactions. On the 2016 DoS attack blocks (~2.3M on mainnet), this caused full sync to stall indefinitely. Two trackers now cache the relevant trie state per block and apply only the deltas since the last call: - FrontierRootHashTracker: keeps the account trie alive, re-applies only accounts dirtied since the previous call. - FrontierStorageRootTracker (interface + CachingFrontierStorageRootTracker impl + NO_OP for trie-disabled snap sync): keeps each dirty account's storage trie alive and applies only the slot deltas since the previous call. The storage tracker can't rely on the accumulator's isUnchanged flag as a simple skip: that flag compares a slot's updated value against its value at block start, not against whatever the cached trie last wrote. When a later tx restores the original value after an earlier tx modified it, isUnchanged is true but the cache still holds the earlier write. The tracker records the last value applied per slot so the revert is re-applied; isUnchanged is used only as the first-write shortcut when the slot was never cached. Signed-off-by: Diego López León <dieguitoll@gmail.com>
…head in frontier path Adds a Usage-based trie construction policy that selects the appropriate MerkleTrie implementation based on execution context: - BLOCK_COMPUTATION: throughput-oriented, respects parallelStateRootComputationEnabled - FRONTIER_INCREMENTAL: latency-sensitive, always uses StoredMerklePatriciaTrie This eliminates the ParallelStoredMerklePatriciaTrie ForkJoinPool overhead from per-transaction frontierRootHash() calls. The factory replaces the inline createTrie() logic and is used by both the normal block computation path (unchanged behavior) and the incremental frontier path via FrontierRootHashTracker (now guaranteed sequential for both account and storage tries). Signed-off-by: Diego López León <dieguitoll@gmail.com>
91b010c to
4dc9c1d
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
PR description
Bonsai full sync hangs indefinitely at the 2016 DoS attack blocks (~2.3M) on mainnet because
frontierRootHash()copies the entire accumulator and rebuilds the Merkle trie from scratch on every per-transaction call, giving O(N²) total cost per block.Commit 1: copy constructor refactor
Replaces the two-phase
cloneFromUpdatercopy pattern inPathBasedWorldStateUpdateAccumulatorwith a proper copy constructor chain. Each level copies its own fields with compile-time type safety. This is a prerequisite for commit 2, which adds a subclass-specific field (frontierDirtyAddresses) that needs to be copied correctly.Commit 2: incremental frontier root computation
Two trackers make the frontier receipt path incremental instead of rebuilding from scratch on every call. They share lifecycle (built once per block, reset on
persist()), are paired viaFrontierRootHashTracker.StorageRootUpdater, and cover the two layers of the state that the frontier path touches.FrontierRootHashTrackerhandles the account trie. The accumulator'scommit()override captures addresses fromgetUpdatedAccounts()andgetDeletedAccountAddresses()into afrontierDirtyAddressesset (aftersuper.commit()has populatedaccountsToUpdate, so every dirty address is guaranteed to resolve). On eachfrontierRootHash()call the tracker lazily creates the account trie, applies only the addresses dirtied since the previous call, and clears them on success.FrontierStorageRootTrackerhandles the storage tries. It keeps each dirty account's storage trie alive and applies only the slot deltas introduced since the last call. Two implementations:CachingFrontierStorageRootTracker: the incremental path described above.NO_OP: wired in whenworldStateConfig.isTrieDisabled()is true. The frontier receipt path is unreachable in that mode (snap sync downloads receipts from peers rather than recomputing them), so this sentinel just makes the invariant explicit and keeps the caching impl branch-free on the config flag.Why it's safe to operate on the live accumulator instead of copying it: the previous
accumulator.copy()was introduced to preventsetStorageRoot()mutations from corrupting the live state. ButsetStorageRoot()sets deterministic values computed from the trie: the same values thatpersist()will compute at end-of-block. The frontier path writes no trie nodes to storage. The accumulator is only read and mutated in the same way thatpersist()would later.Commit 3: BonsaiTrieFactory
Commit 2 gives the frontier path its own trie lifecycle, but the underlying tries still went through the normal
createTrie()path, which returnsParallelStoredMerklePatriciaTriewhen parallel computation is enabled. This adds ForkJoinPool scheduling overhead on every per-transaction call without benefit: the frontier path is inherently sequential because each receipt depends on the prior transaction's state root.This commit introduces
BonsaiTrieFactorywith aTrieModeenum (ALWAYS_SEQUENTIALvsPARALLELIZE_ALLOWED) that centralizes the trie implementation decision. The frontier path passesALWAYS_SEQUENTIALfor both account and storage tries, eliminating the parallel overhead. The normalpersist()/ block computation path continues to usePARALLELIZE_ALLOWED.Post-Byzantium receipts use a status code instead of a state root, so the frontier path is only active for pre-Byzantium blocks. Neither commit changes behavior for post-Byzantium blocks or the normal block computation path.
Empirical validation
Full sync of mainnet with Bonsai +
--bonsai-parallel-tx-processing-enabled, driven by a Teku 26.2.0 on the same machine via the engine API:mainclearStorage → entriesFrom → TreeMap.put. No OOM, no exception; CPU pegged on the importBlock thread until it quiesces. A thread dump shows ~25 min of accumulated CPU time in that single path.main+ these commits + #10235 (CaffeineAddress.hashCache)The 347k stall is not inside the DoS zone; it's caused by the O(N²) frontier receipt path itself, triggered by any pre-Byzantium block that SELFDESTRUCTs a contract with non-trivial storage (common in the Frontier/Homestead era). The DoS blocks are where the issue becomes spectacular, but the bug is active much earlier. The two fixes are complementary: either one alone leaves one of the two stalls standing; only the combination clears the window end to end.
Verifying the performance improvement locally
The O(N²) scaling can be reproduced by measuring individual
frontierRootHash()call times across a simulated block (500 accounts × 20 storage slots, nopersist()between calls):Before this PR: the last call takes 8–11x longer than the first.
After this PR: the last call is the same speed or faster than the first.
The parallel trie overhead can be verified by comparing
frontierRootHash()wall time withparallelStateRootComputationEnabled=truevsfalseusing RocksDB-backed storage. Before this PR: ~1.9x overhead. After: ~1.0x.Fixed Issue(s)
fixes #10155
Thanks for sending a pull request! Have you done the following?
doc-change-requiredlabel to this PR if updates are required.Locally, you can run these tests to catch failures early:
./gradlew spotlessApply./gradlew build./gradlew acceptanceTest./gradlew integrationTest./gradlew ethereum:referenceTests:referenceTests