11 changes: 8 additions & 3 deletions crates/storage/provider/src/providers/database/provider.rs
@@ -306,6 +306,8 @@ impl<TX: DbTx + DbTxMut + 'static, N: NodeTypesForProvider> DatabaseProvider<TX,

debug!(target: "providers::db", block_count = %blocks.len(), "Writing blocks and execution data to storage");

+ let mut aggregated_hashed_post_state_sorted = HashedPostStateSorted::default();

// TODO: Do performant / batched writes for each type of object
// instead of a loop over all blocks,
// meaning:
@@ -325,13 +327,16 @@ impl<TX: DbTx + DbTxMut + 'static, N: NodeTypesForProvider> DatabaseProvider<TX,
// Must be written after blocks because of the receipt lookup.
self.write_state(&execution_output, OriginalValuesKnown::No)?;

- // insert hashes and intermediate merkle nodes
- self.write_hashed_state(&hashed_state)?;
+ // aggregate hashed post states for batch insert
+ aggregated_hashed_post_state_sorted.extend_ref(&hashed_state);

self.write_trie_changesets(block_number, &trie_updates, None)?;
self.write_trie_updates_sorted(&trie_updates)?;
}

+ // batch insert hash post states
+ self.write_hashed_state(&aggregated_hashed_post_state_sorted)?;

// update history indices
self.update_history_indices(first_number..=last_block_number)?;

@@ -1711,7 +1716,7 @@ impl<TX: DbTxMut + DbTx + 'static, N: NodeTypesForProvider> StateWriter
) -> ProviderResult<()> {
// Write storage changes
tracing::trace!("Writing storage changes");
- let mut storages_cursor = self.tx_ref().cursor_dup_write::<tables::PlainStorageState>()?;
+ let mut storages_cursor = self.tx_ref().cursor_dup_read::<tables::PlainStorageState>()?;
let mut storage_changeset_cursor =
self.tx_ref().cursor_dup_write::<tables::StorageChangeSets>()?;
for (block_index, mut storage_changes) in reverts.storage.into_iter().enumerate() {
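The change above moves the per-block write_hashed_state call out of the loop: each block's hashed state is merged into one aggregated structure, which is written once after the loop. As a rough, self-contained illustration of that accumulate-then-batch pattern (the aggregate_blocks helper and the u64 key/value types here are stand-ins, not reth's actual HashedPostStateSorted::extend_ref API):

```rust
use std::collections::BTreeMap;

/// Accumulate per-block sorted (key, value) updates, with later blocks
/// winning on duplicate keys, and return one key-ordered batch suitable
/// for a single write at the end.
fn aggregate_blocks(per_block: &[Vec<(u64, u64)>]) -> Vec<(u64, u64)> {
    let mut aggregated = BTreeMap::new();
    for updates in per_block {
        // Instead of writing each block's state inside the loop, merge it in;
        // BTreeMap::extend overwrites existing keys with the newer values.
        aggregated.extend(updates.iter().copied());
    }
    // One batched, key-ordered flush (the write itself is a hypothetical sink).
    aggregated.into_iter().collect()
}

fn main() {
    let per_block = vec![
        vec![(1, 10), (3, 30)], // block N
        vec![(2, 20), (3, 31)], // block N+1 overwrites key 3
    ];
    assert_eq!(aggregate_blocks(&per_block), vec![(1, 10), (2, 20), (3, 31)]);
}
```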
55 changes: 18 additions & 37 deletions crates/trie/common/src/utils.rs
@@ -1,14 +1,15 @@
use alloc::vec::Vec;
+ use itertools::Itertools;

/// Helper function to extend a sorted vector with another sorted vector.
/// Values from `other` take precedence for duplicate keys.
///
- /// This function efficiently merges two sorted vectors by:
- /// 1. Iterating through the target vector with mutable references
- /// 2. Using a peekable iterator for the other vector
- /// 3. For each target item, processing other items that come before or equal to it
- /// 4. Collecting items from other that need to be inserted
- /// 5. Appending and re-sorting only if new items were added
+ /// **Both `target` and `other` MUST be sorted by key** for this function to work correctly.
+ /// If not, the result will be incorrect and may not maintain sorted order.
+ ///
+ /// This function efficiently merges two sorted vectors using `itertools::merge_join_by`:
+ /// 1. Merges in one pass (O(n+m)) by comparing keys
+ /// 2. For duplicate keys, `other` values take precedence
pub(crate) fn extend_sorted_vec<K, V>(target: &mut Vec<(K, V)>, other: &[(K, V)])
where
K: Clone + Ord,
@@ -18,36 +19,16 @@ where
return;
}

-    let mut other_iter = other.iter().peekable();
-    let mut to_insert = Vec::new();
-
-    // Iterate through target and update/collect items from other
-    for target_item in target.iter_mut() {
-        while let Some(other_item) = other_iter.peek() {
-            use core::cmp::Ordering;
-            match other_item.0.cmp(&target_item.0) {
-                Ordering::Less => {
-                    // Other item comes before current target item, collect it
-                    to_insert.push(other_iter.next().unwrap().clone());
-                }
-                Ordering::Equal => {
-                    // Same key, update target with other's value
-                    target_item.1 = other_iter.next().unwrap().1.clone();
-                    break;
-                }
-                Ordering::Greater => {
-                    // Other item comes after current target item, keep target unchanged
-                    break;
-                }
-            }
-        }
-    }
-
-    // Append collected new items, as well as any remaining from `other` which are necessarily also
-    // new, and sort if needed
-    if !to_insert.is_empty() || other_iter.peek().is_some() {
-        target.extend(to_insert);
-        target.extend(other_iter.cloned());
-        target.sort_unstable_by(|a, b| a.0.cmp(&b.0));
-    }
+    *target = target
+        .iter()
+        .merge_join_by(other.iter(), |(k1, _), (k2, _)| k1.cmp(k2))
+        .map(|entry| {
+            // Extract the item: for duplicates, use other's value
+            match entry {
+                itertools::EitherOrBoth::Left(item) |
+                itertools::EitherOrBoth::Right(item) |
+                itertools::EitherOrBoth::Both(_, item) => item.clone(),
+            }
+        })
+        .collect();
Collaborator:
The previous implementation was specifically designed to avoid having to do a big collect like this; the resulting memory allocation from this collect dwarfs any ostensible speedup you get from not having to sort. I just did a bench comparing your implementation to the previous and this new one is about 2x slower for synthetic datasets:

[benchmark comparison image]

You can see the bench here if you're curious

@duyquang6 (Contributor, Author) commented Nov 28, 2025:

Very useful bench, @mediocregopher.
When I first benchmarked the old version with the aggregated hashed state, the results were not good. That is why I thought something was wrong with extend_ref or extend_sorted_vec.

When I used your bench to compare against my own custom merge version (not using merge_join_by), the new one only shines when the target size is smaller than the other size; overall, the old version still wins. That gives me a hint on how to use this function better: keeping the target size similar to or larger than the other size favors the old version.

Case where the new version shines:
[benchmark images]

But overall (sizes similar, or target size > other size), the old version is still better:
[benchmark images]

@duyquang6 (Contributor, Author) commented Nov 28, 2025:

I dumped the raw data of HashedPostStateSorted while benchmarking with native-transfer.

Here are the bench results for extend_ref; the new version is better in this test case:
[benchmark image]

You can double-check the bench here; the hashed state raw data is already attached.
Could it be that the raw data has properties the benchmark doesn't fully cover 🤔?

Collaborator:

@duyquang6 these are interesting results. Your benches for extend_sorted_vec_comparison/t10_o1000 conflict with what I originally saw in mine, but now I'm able to replicate yours, so there's some inconsistency there that I still need to figure out.

If yours is faster for t10_o1000 I expect it's because it's doing the full allocation up-front, whereas mine is likely doing two larger allocations at the end with the extend calls.

What do you think about trying out something like:

    // Where "50" is a made up number that needs to be tuned
    if other.len() > target.len() * 50 {
        return extend_sorted_vec_custom(target, other);
    }

    extend_sorted_vec(target, other)

This way maybe we better cover all cases.
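The suggested size-ratio dispatch could be sketched roughly as below, std-only and with placeholder names (extend_rebuild stands in for the merge_join_by-style one-pass rebuild, extend_append_sort for an append-and-sort path; the 50 cutoff is the made-up number from the comment that would need tuning):

```rust
use std::cmp::Ordering;

/// Made-up threshold from the review comment; needs tuning on real data.
const REBUILD_RATIO: usize = 50;

/// One-pass rebuild into a fresh allocation (stand-in for the
/// merge_join_by-based version). `other` wins on duplicate keys.
fn extend_rebuild(target: &mut Vec<(u64, u64)>, other: &[(u64, u64)]) {
    let mut merged = Vec::with_capacity(target.len() + other.len());
    let (mut i, mut j) = (0, 0);
    while i < target.len() && j < other.len() {
        match target[i].0.cmp(&other[j].0) {
            Ordering::Less => { merged.push(target[i]); i += 1; }
            Ordering::Greater => { merged.push(other[j]); j += 1; }
            Ordering::Equal => { merged.push(other[j]); i += 1; j += 1; }
        }
    }
    merged.extend_from_slice(&target[i..]);
    merged.extend_from_slice(&other[j..]);
    *target = merged;
}

/// Append-and-sort variant. A stable sort keeps `other`'s entries after
/// `target`'s for equal keys, so dedup can move the later (winning) value
/// into the kept slot.
fn extend_append_sort(target: &mut Vec<(u64, u64)>, other: &[(u64, u64)]) {
    target.extend_from_slice(other);
    target.sort_by(|a, b| a.0.cmp(&b.0));
    // dedup_by passes (later, earlier) and drops the first argument on true.
    target.dedup_by(|later, earlier| {
        if later.0 == earlier.0 {
            earlier.1 = later.1; // duplicate key: other's value wins
            true
        } else {
            false
        }
    });
}

/// Dispatch on the size ratio, per the review suggestion.
fn extend_sorted(target: &mut Vec<(u64, u64)>, other: &[(u64, u64)]) {
    if other.len() > target.len() * REBUILD_RATIO {
        extend_rebuild(target, other);
    } else {
        extend_append_sort(target, other);
    }
}

fn main() {
    // Small target, large other: takes the rebuild path (60 > 1 * 50).
    let mut small = vec![(1, 1)];
    let big: Vec<(u64, u64)> = (2..62).map(|k| (k, k)).collect();
    extend_sorted(&mut small, &big);
    assert_eq!(small.len(), 61);

    // Comparable sizes: takes the append-and-sort path.
    let mut t = vec![(1, 1), (3, 3)];
    extend_sorted(&mut t, &[(2, 2), (3, 30)]);
    assert_eq!(t, vec![(1, 1), (2, 2), (3, 30)]);
}
```

Both branches preserve the contract (sorted output, other wins on duplicates), so only the cost model changes with the ratio.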
