IPC state provider #489

mralj · 2025-03-11T17:07:52Z

📝 Summary

rbuilder can now work with any node via IPC.
This means that any node can provide the state for rbuilder, revm is still used as the EVM.
Node requirements are that it exposes the following RPC calls:

Calls not available in the ETH JSON RPC spec:

rbuilder_calculateStateRoot for state root calculation
rbuilder_getCodeByHash given Bytecode hash, returns Bytecode

Calls optimised for rbuilder, but have a counterpart in the ETH JSON RPC spec:

rbuilder_getBlockHash gets the block hash given block number (similar to eth_getBlockByNumber, but just returns block hash)
rbuilder_getAccount gets account info (similar to eth_getProof, but w/o proof stuff)

To use rbuilder with node via IPC, the config.toml must have the following config (example):

[ipc_provider]
ipc_path = "/root/execution-data/nethermind.ipc"
mempool_server_url = "ws://localhost:8546"
request_timeout_ms = 75

Implementation details

IPC

This implementation was initially intended to introduce a remote state provider. By remote, I mean that the idea was that state could be provided via HTTP/WS/IPC. Unfortunately, due to implementation issues/constraints, I've decided only to implement state provisioning via IPC.
I don't think this has any practical downside, especially since the state provider must be fast. There is a non-trivial number of calls to read state (~300/s), meaning it would be unrealistic to use this over the network and have near disk read latency.

Code-wise, issues above and constraints stem mainly from the fact that the traits to read state are sync. Initially, I relied on tokio and alloy.rs to fetch this remote state, but this implementation had many issues.
Firstly, each call to fetch any data (e.g. fetching account or Bytecode) had to be wrapped in a function call like this:

    /// Runs fututre in sync context
    // StateProvider(Factory) traits require sync context, but calls to remote provider are async
    // What's more, rbuilder is executed in async context, so we have a situation
    // async -> sync -> async
    // This helper function allows execution in such environment
    fn run<F, R>(&self, f: F) -> R
    where
        F: Future<Output = R>,
    {
        tokio::task::block_in_place(|| self.runtime_handle.block_on(f))
    }

This adds additional overhead on Tokio runtime and doesn't play well with some parts of the codebase, specifically mutex locking. Not to go too deep into explaining issues around this in the PR description, but we would end up in scenarios where the whole Tokio runtime I/O would be blocked or we would dead-lock parking-lot mutexes (in some scenarios).

The solutions (monitoring thread + async mutex) seemed hacky and suboptimal.
This is why, in the end, I reached for sync (but concurrent) IPC solution, which I implemented from scratch here. It's called REIPC (coming from request/response IPC).

This solution was tested using Nethermind node, and while Nethermind will do some improvements on IPC to reduce latency and increase throughput, here are the initial request&response latencies:

Dashmap & QuickCache in IPC provider

We need caches because otherwise the number of IPC calls would be pretty high (in thousands per sec).
I also reached for concurrent caches because both StateProviderFactory and StateProvider need to be Send + Sync.

QuickCache is used so that I don't have to implement concurrent-cache invalidation by hand :)
Here is some info on QuickCache vs Moka (other popular caching crate), TL;DR; for our simple case QuickCache seems a better fit.

On `enum StateProviderFactories`

~~The reason cli.rs passes StateProviderFactories enum to config.new_builder is because I wanted the state provider for rbuilder to be chosen via config at the runtime.~~
~~This is why, AFAIK, static dispatch is not an option. So I was left with the choice of refactoring to dynamic dispatch (akin to StateProviderBox) OR the enum solution.~~
~~I chose the enum solution for following reasons:~~
~~1. It seemed to me that code diff would be smaller (smaller change to implement)~~
2. AFIAK it's faster. Enum matching will compile to JMP vs CALL (in case of dynamic dispatch), which compiler will be able to optimize better (especially in this scenario), and given branch predicitoning I guess that it'll be almost free (+ no need for vtable/pointer loading).]

UPDATE: this was properly handled in this commit thanks to suggestions from @ZanCorDX

On `MempoolSource` enum

The reason I chose to use WebSockets for streaming transactions when rbuilder uses IPC state provider is because, currently, REIPC doesn't support ETH subscriptions/streams.

✅ I have completed the following steps:

Run make lint
Run make test
Added tests (if applicable)

SozinM · 2025-03-12T04:06:38Z

crates/rbuilder/src/provider/ipc_state_provider.rs

+    /// Gets block header given block hash
+    fn header_by_number(&self, num: u64) -> ProviderResult<Option<Header>> {
+        let block = rpc_call::<
+            _,


I'm in favor of using concrete type here. This will help in case we will do any refactoring and accidentally change parameters. This way we would be able to find problem instantly.

Sorry, I completely missed the comment 🤦‍♂️
I disagree; the type here is inferred from the function arguments, i.e., (num, false). ' If we mess it up, rust will yell at us.
That being said, I don't feel strongly about this if @ZanCorDX agrees with you, I'll refactor :)

hm, will it yell if we change params to (num, false, false) for example? If yes - all good
But i assume it will send them to rpc and get error because there is extra false param

Correct, it won't :)
So the idea is that if we have written this out in 2 places (in the type and in the args) there is smaller chance of us making mistake in both places when refactoring?

ie. If we change one and not the other compiler will give us error, and we'll stop to think about i (hopefully not just copying args to type, because then compiler will tell you everything is ok and you'll still get rpc error xD)

But all in all I agree this reduces the chance of developer error, I'll fix it :)

@SozinM done :)
Sorry it took me so long!

fbe5087

dvush · 2025-03-12T08:46:01Z

Great! Thanks, I'll review it this week. (cc @ZanCorDX don't merge before pls)

## 📝 Summary When testing #489, I noticed that when processing new orders, if one of them fails, we stop further processing. This PR addresses this issue. ## 💡 Motivation and Context The effect of the above-explained behaviour, in the worst case scenario, is that the whole slot can _fail_, meaning if the processing of the new order fails for the very first order, the whole slot ends up being empty, and in non-worse scenario we loose % of transactions, depending on when the processing was stopped. ### When processing new order fails? The processing of the new orders will fail only if the _StateProvider_ was unable to _provide_ the Nonce, i.e. the _StateProvider_ errors. This is edge-case, but this can happen. I don't see any downsides with this approach, so this is why I opened PR instead of an issue. (I'm not saying hat there aren't any, just that _I_ don't see them 🙂) ### Process new simulations Similar argument can be made for the processing new simulations, specifically for the following: ```Rust pub fn submit_simulation_tasks_results( &mut self, results: Vec<SimulatedResult>, ) -> Result<(), ProviderError> { for result in results { // NOTE: we can refactor the following line to // let _ = self.process_simulation_task_result(result); self.process_simulation_task_result(result)?; } Ok(()) } ``` The code above only errors in case of the _nonce issues_, and in that scenario we are stopping processing of all simulation tasks. I haven't pushed this change because I wanted so se how you feel about the this, especially since simulations _seem more serious_ than orders. --- ## ✅ I have completed the following steps: * [x] Run `make lint` * [x] Run `make test` * [ ] Added tests (if applicable)

dvush

Awesome, pretty clear and isolated change. I've been running it locally and it works fine.

The only thing that I notices is root hash is much slower on this version. I can give more data tomorrow when I run the default rbuilder. But if feels like its 2-3x slower.

I think our root hash implementation (eth-sparse-mpt) is quite good for the block building purposes and it will be much better than generic root hash when we finish all the possible improvements and it can be used for integrations with other clients through API. The only thing that it needs is ability to fetch proofs using path. Instead of default eth_getProof that asks for address we need to fetch proof by hash(address) and hash(slot) for the given hash(address)/address

dvush · 2025-03-27T16:31:51Z

crates/rbuilder/src/provider/ipc_state_provider.rs

+    ipc_provider: RpcProvider,
+
+    code_cache: Arc<DashMap<B256, Bytecode>>,
+    state_provider_by_hash: Arc<Cache<BlockHash, Arc<IpcStateProvider>>>,


Can you please clarify for me the difference between DashMap and Cache here?

code_cache can live as long as rbuilder lives, i.e., it doesn't have to be invalidated because it will stay the same no matter the current state.

On the other hand, we need to clean state_provider_by_hash after we are done with the StateProvider for some block, because after slot is finished, this data becomes useless and just grows.
The reason I went with Cache here instead of Dashmap is because I didn't want to write concurrent cache invalidation myself :)

mralj · 2025-03-27T18:38:18Z

I think our root hash implementation (eth-sparse-mpt) is quite good for the block building purposes and it will be much better than generic root hash when we finish all the possible improvements and it can be used for integrations with other clients through API. The only thing that it needs is ability to fetch proofs using path. Instead of default eth_getProof that asks for address we need to fetch proof by hash(address) and hash(slot) for the given hash(address)/address.

Thanks for pointing this out, I'll discuss with my (NMC) team and see how we can leverage this, I'll also add some logging to our node to see how much milli seconds State Root Calculation actually takes and will get back to you on numbers :)

mralj requested review from dvush, ZanCorDX, ferranbt and metachris as code owners March 11, 2025 17:07

mralj mentioned this pull request Mar 11, 2025

Mralj/ipc state provider reipc NethermindEth/rbuilder#8

Closed

3 tasks

mralj force-pushed the mralj/ipc-state-provider-reipc branch from 2624095 to a9c6535 Compare March 11, 2025 18:06

SozinM reviewed Mar 12, 2025

View reviewed changes

This comment was marked as resolved.

Sign in to view

mralj force-pushed the mralj/ipc-state-provider-reipc branch from 1b2b61b to 2dc4f41 Compare March 14, 2025 11:01

mralj mentioned this pull request Mar 15, 2025

continue processing orders on error #506

Merged

3 tasks

mralj force-pushed the mralj/ipc-state-provider-reipc branch 2 times, most recently from 8fece37 to 71283e6 Compare March 24, 2025 09:30

mralj force-pushed the mralj/ipc-state-provider-reipc branch from 71283e6 to 7514d49 Compare March 27, 2025 15:16

dvush approved these changes Mar 27, 2025

View reviewed changes

mralj added 13 commits March 28, 2025 16:40

ipc state provider

b7dd977

state provider factory from config

79bb256

duration config fix

6c5f00f

ipc state provider code cleanup

13495de

rename: provider_factory -> reth_provider_factory

820a4d7

rename: tx_pool -> mempool

3354f52

bugfix

8f9df6a

be explicit in config that request_timeout is in ms

2c33f17

make clippy happy :)

8248216

post-review fixes

36a0776

[post-review] added rpc_call to reduce duplication

ef83d5f

remove unnecessary StateProviderFactories enum

9f5868a

post-rebase bugfixes

14af20a

add explicit typing to RPC calls

eed8474

mralj force-pushed the mralj/ipc-state-provider-reipc branch from fbe5087 to eed8474 Compare March 28, 2025 15:46

ZanCorDX merged commit 9aed6b7 into flashbots:develop Mar 28, 2025
2 of 4 checks passed

mralj deleted the mralj/ipc-state-provider-reipc branch April 2, 2025 10:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

IPC state provider #489

IPC state provider #489

Uh oh!

mralj commented Mar 11, 2025 •

edited

Loading

Uh oh!

SozinM Mar 12, 2025

Uh oh!

mralj Mar 24, 2025 •

edited

Loading

Uh oh!

SozinM Mar 26, 2025

Uh oh!

mralj Mar 26, 2025 •

edited

Loading

Uh oh!

mralj Mar 27, 2025

Uh oh!

dvush commented Mar 12, 2025

Uh oh!

This comment was marked as resolved.

dvush left a comment

Uh oh!

dvush Mar 27, 2025

Uh oh!

mralj Mar 27, 2025 •

edited

Loading

Uh oh!

mralj commented Mar 27, 2025

Uh oh!

Uh oh!

Uh oh!

IPC state provider #489

IPC state provider #489

Uh oh!

Conversation

mralj commented Mar 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📝 Summary

Implementation details

IPC

Dashmap & QuickCache in IPC provider

On enum StateProviderFactories

On MempoolSource enum

✅ I have completed the following steps:

Uh oh!

SozinM Mar 12, 2025

Choose a reason for hiding this comment

Uh oh!

mralj Mar 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

SozinM Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

mralj Mar 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mralj Mar 27, 2025

Choose a reason for hiding this comment

Uh oh!

dvush commented Mar 12, 2025

Uh oh!

This comment was marked as resolved.

dvush left a comment

Choose a reason for hiding this comment

Uh oh!

dvush Mar 27, 2025

Choose a reason for hiding this comment

Uh oh!

mralj Mar 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

mralj commented Mar 27, 2025

Uh oh!

Uh oh!

Uh oh!

mralj commented Mar 11, 2025 •

edited

Loading

On `enum StateProviderFactories`

On `MempoolSource` enum

mralj Mar 24, 2025 •

edited

Loading

mralj Mar 26, 2025 •

edited

Loading

mralj Mar 27, 2025 •

edited

Loading