
Conversation


@mralj mralj commented Mar 3, 2025

📝 Summary

rbuilder can now work with any node via IPC.
This means that any node can provide the state for rbuilder, while revm is still used as the EVM.
The node is required to expose the following RPC calls:

Calls not available in the ETH JSON RPC spec:

  1. rbuilder_calculateStateRoot for state root calculation
  2. rbuilder_getCodeByHash given a bytecode hash, returns the bytecode

Calls optimised for rbuilder, but which have a counterpart in the ETH JSON RPC spec:

  1. rbuilder_getBlockHash gets the block hash for a given block number (similar to eth_getBlockByNumber, but returns only the block hash)
  2. rbuilder_getAccount gets account info (similar to eth_getProof, but without the proof data)

To use rbuilder with a node via IPC, config.toml must contain the following (example):

[ipc_provider]
ipc_path = "/root/execution-data/nethermind.ipc"
mempool_server_url = "ws://localhost:8546"
request_timeout_ms = 75

Implementation details

IPC

This implementation was initially intended to introduce a remote state provider. By remote, I mean that the idea was that state could be provided via HTTP/WS/IPC. Unfortunately, due to implementation issues and constraints, I decided to implement state provisioning via IPC only.
I don't think this has any practical downside, especially since the state provider must be fast. There is a non-trivial number of state-read calls (~300/s), so using this over the network while keeping near-disk-read latency would be unrealistic.

Code-wise, the issues and constraints above stem mainly from the fact that the traits for reading state are sync. Initially, I relied on tokio and alloy.rs to fetch this remote state, but that implementation had many issues.
Firstly, each call to fetch any data (e.g. fetching an account or bytecode) had to be wrapped in a function call like this:

    /// Runs a future in a sync context.
    // StateProvider(Factory) traits require a sync context, but calls to the remote provider are async.
    // What's more, rbuilder itself runs in an async context, so we have the situation
    // async -> sync -> async.
    // This helper function allows execution in such an environment.
    fn run<F, R>(&self, f: F) -> R
    where
        F: Future<Output = R>,
    {
        tokio::task::block_in_place(|| self.runtime_handle.block_on(f))
    }

This adds extra overhead on the Tokio runtime and doesn't play well with some parts of the codebase, specifically mutex locking. Without going too deep into the details in this PR description: we would end up in scenarios where the Tokio runtime's entire I/O was blocked, or (in some scenarios) parking_lot mutexes would deadlock.

The candidate solutions (a monitoring thread + async mutexes) seemed hacky and suboptimal.
This is why, in the end, I reached for a sync (but concurrent) IPC solution, which I implemented from scratch here. It's called REIPC (from request/response IPC).
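The core pattern of a sync-but-concurrent request/response client can be sketched roughly as follows (a minimal stdlib illustration, not the actual REIPC code; all names are hypothetical). Callers block on their own channel while a single reader thread routes responses back by request id:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical sketch of a sync-but-concurrent RPC client.
struct SyncRpcClient {
    next_id: Mutex<u64>,
    // Map of in-flight request ids to the channel of the waiting caller.
    pending: Arc<Mutex<HashMap<u64, Sender<String>>>>,
    // Stand-in for the write half of the IPC socket.
    transport_tx: Sender<(u64, String)>,
}

impl SyncRpcClient {
    /// `resp_rx` stands in for the read half of the IPC socket.
    fn new(transport_tx: Sender<(u64, String)>, resp_rx: Receiver<(u64, String)>) -> Arc<Self> {
        let client = Arc::new(SyncRpcClient {
            next_id: Mutex::new(0),
            pending: Arc::new(Mutex::new(HashMap::new())),
            transport_tx,
        });
        // Single reader thread: demultiplex responses to the waiting callers.
        let pending = Arc::clone(&client.pending);
        thread::spawn(move || {
            for (id, resp) in resp_rx {
                if let Some(tx) = pending.lock().unwrap().remove(&id) {
                    let _ = tx.send(resp);
                }
            }
        });
        client
    }

    /// Blocking call: register interest under a fresh id, write the request,
    /// then wait for the reader thread to deliver the matching response.
    fn call(&self, request: &str) -> String {
        let id = {
            let mut n = self.next_id.lock().unwrap();
            *n += 1;
            *n
        };
        let (tx, rx) = mpsc::channel();
        self.pending.lock().unwrap().insert(id, tx);
        self.transport_tx.send((id, request.to_string())).unwrap();
        rx.recv().unwrap()
    }
}
```

The real REIPC of course talks to an actual IPC socket and handles JSON-RPC framing and error payloads; this sketch only shows the demultiplexing idea that makes blocking calls safe from many threads at once.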

This solution was tested against a Nethermind node. While Nethermind will make some IPC improvements to reduce latency and increase throughput, here are the initial request/response latencies:

[Screenshot: initial request/response latency measurements (CleanShot 2025-03-01)]

Dashmap & QuickCache in IPC provider

We need caches because otherwise the number of IPC calls would be very high (thousands per second).
I also reached for concurrent caches because both StateProviderFactory and StateProvider need to be Send + Sync.

QuickCache is used so that I don't have to implement concurrent cache invalidation by hand :)
Here is some info on QuickCache vs Moka (another popular caching crate); TL;DR: for our simple use case, QuickCache seems a better fit.
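To illustrate why the caches must be concurrent (Send + Sync shared across threads), here is a minimal stdlib stand-in; the real code uses quick_cache and DashMap, and the names below are hypothetical:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Minimal stand-in for a concurrent cache keyed, say, by a hash.
/// Arc + Mutex make it Send + Sync, so many threads can share it.
#[derive(Clone, Default)]
struct SharedCache {
    inner: Arc<Mutex<HashMap<u64, Vec<u8>>>>,
}

impl SharedCache {
    /// Return the cached value, or compute and store it on a miss.
    /// Holding the lock across `fetch` means each key is fetched once,
    /// saving repeated round trips over IPC.
    fn get_or_insert_with<F: FnOnce() -> Vec<u8>>(&self, key: u64, fetch: F) -> Vec<u8> {
        let mut map = self.inner.lock().unwrap();
        map.entry(key).or_insert_with(fetch).clone()
    }
}
```

Crates like quick_cache add what this sketch lacks: bounded capacity with eviction, and finer-grained locking so readers don't serialize on a single mutex.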

On enum StateProviderFactories

The reason cli.rs passes the StateProviderFactories enum to config.new_builder is that I wanted rbuilder's state provider to be chosen via config at runtime.
This is why, AFAIK, static dispatch is not an option. So I was left with the choice of refactoring to dynamic dispatch (akin to StateProviderBox) OR the enum solution.
I chose the enum solution for the following reasons:

  1. It seemed to me that the code diff would be smaller (a smaller change to implement)
  2. AFAIK it's faster. Enum matching compiles to a JMP vs a CALL (in the case of dynamic dispatch), which the compiler can optimize better (especially in this scenario), and given branch prediction I'd guess it'll be almost free (plus no vtable/pointer loading).
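As a rough illustration of the trade-off (hypothetical names, not the actual rbuilder types), the enum alternative to a Box&lt;dyn Trait&gt; looks like this:

```rust
/// Hypothetical sketch: one trait, two backends.
trait StateProvider {
    fn block_hash(&self, number: u64) -> u64;
}

struct RethProvider;
struct IpcProvider;

impl StateProvider for RethProvider {
    fn block_hash(&self, number: u64) -> u64 {
        number // placeholder logic for illustration
    }
}

impl StateProvider for IpcProvider {
    fn block_hash(&self, number: u64) -> u64 {
        number.wrapping_mul(2) // placeholder logic for illustration
    }
}

/// The enum alternative to `Box<dyn StateProvider>`: the variant is chosen
/// at runtime (e.g. from config), but every call site is a static match,
/// so calls compile to direct jumps rather than vtable lookups.
enum StateProviders {
    Reth(RethProvider),
    Ipc(IpcProvider),
}

impl StateProviders {
    fn block_hash(&self, number: u64) -> u64 {
        match self {
            StateProviders::Reth(p) => p.block_hash(number),
            StateProviders::Ipc(p) => p.block_hash(number),
        }
    }
}
```

The cost of the enum approach is boilerplate: every trait method must be forwarded by hand (or by a macro) for each variant, which is what the dyn-dispatch refactor would avoid.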

On MempoolSource enum

The reason I chose WebSockets for streaming transactions when rbuilder uses the IPC state provider is that REIPC currently doesn't support ETH subscriptions/streams.
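A minimal sketch of the idea (hypothetical names, not the actual rbuilder enum): the mempool source is configured separately from the state provider, so transaction streaming can use a subscription-capable transport even when state comes over IPC:

```rust
/// Hypothetical sketch of a mempool-source selector.
enum MempoolSource {
    /// Subscribe over WebSocket (REIPC has no subscription support).
    WebSocket(String),
    /// Use the node's default transport.
    Default,
}

impl MempoolSource {
    /// URL to connect to, if an explicit one was configured.
    fn url(&self) -> Option<&str> {
        match self {
            MempoolSource::WebSocket(url) => Some(url),
            MempoolSource::Default => None,
        }
    }
}
```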

✅ I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

@mralj mralj marked this pull request as ready for review March 3, 2025 15:12
@mralj mralj self-assigned this Mar 3, 2025
@mralj mralj requested review from SozinM and julio4 March 3, 2025 15:13
let provider = ProviderBuilder::new().on_ipc(ipc).await?;
let mempool = config
.mempool_source
.ok_or_else(|| eyre::eyre!("No TX source configured"))?;

No txpool source configured


And if we fail on empty mempool - let's fail on config parsing stage if possible?

Collaborator Author

And if we fail on empty mempool - let's fail on config parsing stage if possible?

I have followed the logic of the already existing code. My understanding is that failing during config parsing would be undesirable because rbuilder can still receive orders (transactions) via the bundle API.

}

let key: U256 = storage_key.into();
let storage = self

Could it return None?


I see this in REIPC:

            ResponsePayload::Success(_) => {
                match resp.try_success_as::<T>() {
                    Some(Ok(r)) => Ok(r),
                    Some(Err(e)) => Err(e.into()),
                    // the response was received successfully, but it contains Err payload
                    // we shouldn't have  ended up here
                    None => Err(RpcError::JsonErrPayloadMisinterpretedAsSuccess),
                }
            }


So i assume you return an error on empty response

Collaborator Author

We discussed this during call, but I'll also leave the comment here.

I'm not handling the None case in all of the calls you pointed out because I wrote the API on the Nethermind client so that it will either succeed and return some value, or return an error.
Meaning there isn't any scenario in which the response from the server would be something like Ok(None).
But I agree with your comments; this was bad design, which has now been fixed :)


Regarding the comment about REIPC: the error is not returned on an empty response.
The None branch in the code above is hit when the response was marked as Success but deserialises as an Error. I'm not sure in which scenario this happens; I don't think it should ever happen.

@@ -51,3 +58,75 @@ pub trait RootHasher: std::fmt::Debug + Send + Sync {
/// State root for changes outcome on top of parent block.
fn state_root(&self, outcome: &ExecutionOutcome) -> Result<B256, RootHashError>;
}

/// All supported state provider factories

I think we could open an issue to use Box so we could simplify it later

Collaborator Author

Not sure I follow; how would Boxing help/simplify this one?

@mralj mralj requested a review from SozinM March 6, 2025 14:32
@mralj mralj force-pushed the mralj/ipc-state-provider-reipc branch from f3e9863 to d98c019 Compare March 11, 2025 10:30
Comment on lines +11 to +12
use dashmap::DashMap;
use quick_cache::sync::Cache;
Member

@julio4 julio4 Mar 11, 2025

Seems like both dashmap and quick_cache are used to handle caches; is there a reason to use different ones?
Edit: I see it's for concurrent-cache invalidation, which is only needed for state_provider_by_hash?

Collaborator Author

Correct, and it's only needed for that one, since it's the only cache which lives as long as StateProviderFactory.

Comment on lines 134 to 137
let span =
trace_span!("header", id = rand::random::<u64>(), block_hash = %block_hash.to_string());
let _guard = span.enter();
trace!("header: get");
Member

Consider using tracing instrument attribute feature to reduce boilerplate
See https://docs.rs/tracing/latest/tracing/attr.instrument.html

Collaborator Author

I've encapsulated this via rpc_call :)

Comment on lines 139 to 146
let header = self
.ipc_provider
.call::<_, Option<<alloy_network::Ethereum as alloy_network::Network>::BlockResponse>>(
"eth_getBlockByHash",
(block_hash, false),
)
.map_err(ipc_to_provider_error)?
.map(|b| b.header.inner);
Member

RPC calls seem to be generic over parameters and return type, with the same error-handling mechanism.
Consider extracting this into a function, so it's possible to call let res: U = self.rpc_call("eth_method", params)?; with a small wrapper like:

fn rpc_call<T, U>(&self, method: &str, params: T) -> ProviderResult<Option<U>> {
    self.ipc_provider
        .call::<T, Option<U>>(method, params)
        .map_err(ipc_to_provider_error)
}

Member

It could also encapsulate the tracing/span directly

Collaborator Author

Great catch, done!

@mralj
Collaborator Author

mralj commented Mar 11, 2025

closing in favour of flashbots#489

@mralj mralj closed this Mar 11, 2025