-
Notifications
You must be signed in to change notification settings - Fork 102
perf(core): snapshots #3211
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
perf(core): snapshots #3211
Conversation
Co-authored-by: Juan Bono <[email protected]>
…ases (#3072) **Motivation** Original PR was missing ReDB and InMemory implementations. **Description** Merges the single-update method as a special case of the batch one. Part of #3043 --------- Co-authored-by: Juan Bono <[email protected]> Co-authored-by: Ivan Litteri <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds flat account info and storage snapshots with undo/replay support to improve read performance and handle chain re-orgs.
- Introduce write‐log tables and processing functions in all StoreEngine backends (RedB, LMDBX, in‐memory).
- Extend the StoreEngine API with undo/replay and genesis setup methods.
- Hook snapshot reconstruction into sync and fork‐choice logic.
Reviewed Changes
Copilot reviewed 25 out of 25 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
crates/storage/store_db/redb.rs | Added RDB tables, undo/replay methods and log processing hooks |
crates/storage/store_db/libmdbx.rs | Added LMDBX table definitions, undo/replay helpers and batch writes |
crates/storage/store_db/in_memory.rs | Added in‐memory snapshot state, undo/replay and log tracking |
crates/storage/api.rs | Extended StoreEngine trait with snapshot and genesis methods |
crates/blockchain/fork_choice.rs | Integrated reconstruct_snapshots_for_new_canonical_chain call |
Comments suppressed due to low confidence (3)
crates/storage/api.rs:23
- New snapshot undo/replay methods lack unit or integration tests. Add tests for
undo_writes_until_canonical
,replay_writes_until_head
, and genesis setup to cover edge cases like deep re-orgs.
async fn undo_writes_until_canonical(&self) -> Result<(), StoreError>;
crates/storage/api.rs:23
- [nitpick] Public API methods
undo_writes_until_canonical
,replay_writes_until_head
, and genesis setup lack doc comments explaining their semantics and requirements. Please add documentation to clarify expected behavior.
async fn undo_writes_until_canonical(&self) -> Result<(), StoreError>;
crates/storage/store.rs:66
- [nitpick] The field
storage_log_updates
parallelsaccount_info_log_updates
, but its name omits theaccount_
prefix. Consider renaming toaccount_storage_log_updates
for consistency.
pub storage_log_updates: Vec<AccountStorageLogEntry>,
crates/storage/store_db/in_memory.rs
Outdated
|
||
// We update this here to ensure it's the previous block according | ||
// to the logs found. | ||
BlockNumHash(block_num, snapshot_hash) = parent_block; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line attempts to assign to block_num
and snapshot_hash
via a destructuring pattern but doesn’t actually update those variables. Replace it with let (block_num, snapshot_hash) = parent_block;
and update current_snapshot
accordingly.
BlockNumHash(block_num, snapshot_hash) = parent_block; | |
let (block_num, snapshot_hash) = parent_block; | |
current_snapshot = BlockNumHash(block_num, snapshot_hash); |
Copilot uses AI. Check for mistakes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both variables are mutable
let mut block_num = current_snapshot.0;
let mut snapshot_hash = current_snapshot.1;
No significant difference was registered for any benchmark run. Detailed ResultsBenchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
|
No significant difference was registered for any benchmark run. Detailed ResultsBenchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
|
No significant difference was registered for any benchmark run. Detailed ResultsBenchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, all of my comments can be addressed in a separate PR.
genesis_accounts: &[(Address, u64, U256, H256, bool)], | ||
) -> Result<(), StoreError>; | ||
|
||
/// Gets the block hash for the current snapshot |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: maybe clarify that the result is an option because there may be no current snapshot (if there is, there will always be an associated block hash).
|
||
#[cfg(feature = "libmdbx")] | ||
impl From<anyhow::Error> for StoreError { | ||
fn from(err: anyhow::Error) -> Self { | ||
StoreError::LibmdbxError(err) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems convoluted and counterintuitive. anyhow:Error is very generic. How can we make sure it would always be a libmdbx error? why do we need this?
#[cfg(feature = "redb")] | ||
pub type AccountInfoLogEntryRLP = Rlp<AccountInfoLogEntry>; | ||
#[cfg(feature = "redb")] | ||
pub type AccountStorageLogEntryRLP = Rlp<AccountStorageLogEntry>; | ||
#[cfg(feature = "redb")] | ||
pub type BlockNumHashRLP = Rlp<BlockNumHash>; | ||
#[cfg(feature = "redb")] | ||
pub type AccountAddressRLP = Rlp<Address>; | ||
#[cfg(feature = "redb")] | ||
pub type AccountInfoRLP = Rlp<AccountInfo>; | ||
#[cfg(feature = "redb")] | ||
pub type AccountStorageKeyRLP = Rlp<H256>; | ||
#[cfg(feature = "redb")] | ||
pub type AccountStorageValueRLP = Rlp<U256>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is redb specific, why do we need it in the storage rlp directory? Can we not have it in the redb file instead?
// the log for those blocks | ||
// 4. Commit the transaction | ||
// | ||
// This function assumes that `new_canonical_blocks` is sorted by block number. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is "new canonical blocks"?
// TODO: build_account_info_logs and build_account_storage_logs became a bit | ||
// redundant. Just pass the account_updates whole and process inside the | ||
// apply_updates function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sounds very sensible. Let's add an issue for this.
let mut state_logs = state_log_cursor.seek_closest(to_snapshot)?; | ||
|
||
let mut account_updates = vec![]; | ||
while let Some(((final_block, parent_block), log_entry)) = state_logs { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I see we repeat this pattern quite a lot (seek closest + while). Shall we refactor it to an iterator se we can use it like this? self.get_logs(cursor, snapshot).map(...)
.
Not for this PR and also optional, might be more complex than it seems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking merge of this PR until we run the numbers along with pruning
No significant difference was registered for any benchmark run. Detailed ResultsBenchmark Results: BubbleSort
Benchmark Results: ERC20Approval
Benchmark Results: ERC20Mint
Benchmark Results: ERC20Transfer
Benchmark Results: Factorial
Benchmark Results: FactorialRecursive
Benchmark Results: Fibonacci
Benchmark Results: FibonacciRecursive
Benchmark Results: ManyHashes
Benchmark Results: MstoreBench
Benchmark Results: Push
|
Benchmark for d86adfaClick to view benchmark
|
Benchmark Block Execution Results Comparison Against Main
|
Moving to draft |
I think we can close this PR since it didn't give us the results we expected. What do you think, @Oppen? |
Account Info and Account Storage Snapshot Feature
Goal
Improve database read access performance for block execution, while keeping write access overhead in check.
MVP Requirements
Missing for MVP
V2 Requirements
AccountUpdate
s for simplicity.Later Work
XOR
of old and new values instead of the values themselves in the log entries. The storage should become about half, while the code for rewinding and replaying would change only in the direction of iteration, asXOR
is a symmetric operation.closes #1997