IPC state provider #489
/// Gets block header given block number
fn header_by_number(&self, num: u64) -> ProviderResult<Option<Header>> {
    let block = rpc_call::<
        _,
I'm in favor of using a concrete type here. This will help in case we do any refactoring and accidentally change the parameters. This way, we would be able to find the problem instantly.
Sorry, I completely missed the comment 🤦‍♂️
I disagree; the type here is inferred from the function arguments, i.e., `(num, false)`. If we mess it up, Rust will yell at us.
That being said, I don't feel strongly about this; if @ZanCorDX agrees with you, I'll refactor :)
Hm, will it yell if we change the params to `(num, false, false)`, for example? If yes, all good.
But I assume it will just send them to the RPC and get an error back because there is an extra `false` param.
Correct, it won't :)
So the idea is that if we have written this out in two places (in the type and in the args), there is a smaller chance of us making a mistake in both places when refactoring? I.e., if we change one and not the other, the compiler will give us an error and we'll stop to think about it (hopefully not just copying the args into the type, because then the compiler will tell you everything is ok and you'll still get an RPC error xD).
But all in all, I agree this reduces the chance of developer error, I'll fix it :)
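To make the trade-off concrete, here is a minimal sketch of the pattern being debated; `rpc_call` below is a simplified stand-in, not the actual rbuilder helper:

```rust
// Simplified stand-in for the generic RPC helper under discussion; the real
// `rpc_call` in this PR has a different signature and error handling.
fn rpc_call<Params, Resp: Default>(method: &str, params: Params) -> Resp {
    let _ = (method, params); // pretend we serialize `params` and send the request
    Resp::default()
}

fn header_by_number(num: u64) -> Option<u64> {
    // With `_` the params type is inferred from the arguments, so accidentally
    // passing an extra `false` still compiles and only fails on the RPC side:
    //   rpc_call::<_, Option<u64>>("eth_getBlockByNumber", (num, false, false));

    // Spelling the params type out turns the same mistake into a compile error,
    // which is the reviewer's point:
    rpc_call::<(u64, bool), Option<u64>>("eth_getBlockByNumber", (num, false))
}
```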
Great! Thanks, I'll review it this week. (cc @ZanCorDX don't merge before pls)
## 📝 Summary

When testing #489, I noticed that when processing new orders, if one of them fails, we stop further processing. This PR addresses this issue.

## 💡 Motivation and Context

The effect of the behaviour explained above, in the worst-case scenario, is that the whole slot can _fail_: if processing fails for the very first order, the whole slot ends up empty. In the non-worst case, we lose a percentage of transactions, depending on when the processing was stopped.

### When does processing a new order fail?

Processing of new orders fails only if the _StateProvider_ was unable to _provide_ the nonce, i.e. the _StateProvider_ errors. This is an edge case, but it can happen. I don't see any downsides with this approach, which is why I opened a PR instead of an issue. (I'm not saying that there aren't any, just that _I_ don't see them 🙂)

### Process new simulations

A similar argument can be made for processing new simulations, specifically for the following:

```Rust
pub fn submit_simulation_tasks_results(
    &mut self,
    results: Vec<SimulatedResult>,
) -> Result<(), ProviderError> {
    for result in results {
        // NOTE: we can refactor the following line to
        // let _ = self.process_simulation_task_result(result);
        self.process_simulation_task_result(result)?;
    }
    Ok(())
}
```

The code above only errors in the case of _nonce issues_, and in that scenario we stop processing all simulation tasks. I haven't pushed this change because I wanted to see how you feel about it, especially since simulations _seem more serious_ than orders.

---

## ✅ I have completed the following steps:

* [x] Run `make lint`
* [x] Run `make test`
* [ ] Added tests (if applicable)
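For the non-aborting variant mentioned in the NOTE above, a sketch reusing the names from the quoted snippet could look like the following; the logging call is illustrative (assuming `tracing` is available), not code from the PR:

```rust
// Sketch of the "keep processing on failure" variant discussed above, reusing the
// names from the quoted snippet; error handling details are illustrative.
pub fn submit_simulation_tasks_results(&mut self, results: Vec<SimulatedResult>) {
    for result in results {
        // A single failing result no longer aborts the whole batch; log and continue.
        if let Err(err) = self.process_simulation_task_result(result) {
            tracing::warn!(?err, "Failed to process simulation result, skipping");
        }
    }
}
```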
Awesome, pretty clear and isolated change. I've been running it locally and it works fine.
The only thing that I noticed is that root hash is much slower on this version. I can give more data tomorrow when I run the default rbuilder, but it feels like it's 2-3x slower.
I think our root hash implementation (eth-sparse-mpt) is quite good for block-building purposes, and it will be much better than the generic root hash once we finish all the possible improvements; it can then be used for integrations with other clients through an API. The only thing it needs is the ability to fetch proofs by path: instead of the default eth_getProof, which asks for an address, we need to fetch the proof by hash(address) and hash(slot) for the given hash(address)/address.
ipc_provider: RpcProvider,
code_cache: Arc<DashMap<B256, Bytecode>>,
state_provider_by_hash: Arc<Cache<BlockHash, Arc<IpcStateProvider>>>,
Can you please clarify for me the difference between DashMap and Cache here?
`code_cache` can live as long as `rbuilder` lives, i.e., it doesn't have to be invalidated because it will stay the same no matter the current state.
On the other hand, we need to clean `state_provider_by_hash` after we are done with the StateProvider for some block, because after the slot is finished, this data becomes useless and just grows.
The reason I went with `Cache` here instead of `DashMap` is that I didn't want to write concurrent cache invalidation myself :)
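To make the distinction concrete, here is a rough sketch of the two structures as described above; the stand-in types and the capacity value are picked for the example, not taken from the PR:

```rust
use std::sync::Arc;

use dashmap::DashMap;
use quick_cache::sync::Cache;

// Stand-ins so the sketch is self-contained; the real types come from alloy/reth.
type B256 = [u8; 32];
type BlockHash = B256;
struct Bytecode(Vec<u8>);
struct IpcStateProvider;

// `code_cache` is an unbounded concurrent map: bytecode for a given hash never
// changes, so it never needs invalidation. `state_provider_by_hash` is a bounded
// cache, so per-block providers from finished slots get evicted automatically.
struct Caches {
    code_cache: Arc<DashMap<B256, Bytecode>>,
    state_provider_by_hash: Arc<Cache<BlockHash, Arc<IpcStateProvider>>>,
}

impl Caches {
    fn new() -> Self {
        Self {
            code_cache: Arc::new(DashMap::new()),
            // Capacity is arbitrary for the sketch, not the PR's value.
            state_provider_by_hash: Arc::new(Cache::new(128)),
        }
    }
}
```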
Thanks for pointing this out, I'll discuss it with my (NMC) team and see how we can leverage this. I'll also add some logging to our node to see how many milliseconds state root calculation actually takes and will get back to you with numbers :)
## 📝 Summary

`rbuilder` can now work with any node via IPC. This means that any node can provide the state for `rbuilder`; `revm` is still used as the EVM.

The node is required to expose the following RPC calls.

Calls not available in the ETH JSON RPC spec:

* `rbuilder_calculateStateRoot`: for state root calculation
* `rbuilder_getCodeByHash`: given a bytecode hash, returns the bytecode

Calls optimised for `rbuilder`, but with a counterpart in the ETH JSON RPC spec:

* `rbuilder_getBlockHash`: gets the block hash for a given block number (similar to `eth_getBlockByNumber`, but returns just the block hash)
* `rbuilder_getAccount`: gets account info (similar to `eth_getProof`, but without the proof parts)
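As an illustration only, a call to one of these methods might look like the sketch below; the method name comes from the list above, but the parameter and result encodings are assumptions, not the node's documented contract:

```rust
use serde_json::json;

fn main() {
    // Hypothetical request shape for rbuilder_getBlockHash; the actual parameter
    // and result encodings are not specified in this description.
    let request = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "rbuilder_getBlockHash",
        "params": ["0x112a880"] // block number (assumed hex-encoded)
    });
    println!("{request}");
    // Assumed response shape: {"jsonrpc":"2.0","id":1,"result":"0x<block hash>"}
}
```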
To use `rbuilder` with a node via IPC, the `config.toml` must have the following config (example):

## Implementation details
### IPC
This implementation was initially intended to introduce a remote state provider. By remote, I mean that the idea was that state could be provided via HTTP/WS/IPC. Unfortunately, due to implementation issues/constraints, I've decided only to implement state provisioning via IPC.
I don't think this has any practical downside, especially since the state provider must be fast. There is a non-trivial number of calls to read state (~300/s), meaning it would be unrealistic to use this over the network and still get near-disk-read latency.
Code-wise, the issues and constraints above stem mainly from the fact that the traits to read state are `sync`. Initially, I relied on tokio and alloy.rs to fetch this remote state, but that implementation had many issues.

Firstly, each call to fetch any data (e.g. fetching an account or bytecode) had to be wrapped in a blocking helper along these lines:
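(A sketch of the general shape, assuming a tokio runtime handle; not the code from that earlier iteration.)

```rust
use std::future::Future;

use tokio::runtime::Handle;

// A sync trait method can't `.await`, so every async fetch gets funnelled through
// something like this, blocking the calling thread until the future resolves.
fn block_on_sync<F, T>(handle: &Handle, fut: F) -> T
where
    F: Future<Output = T>,
{
    // block_in_place tells tokio this worker thread is about to block, and
    // block_on drives the async request to completion from sync code.
    tokio::task::block_in_place(|| handle.block_on(fut))
}
```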
This adds additional overhead on the Tokio runtime and doesn't play well with some parts of the codebase, specifically mutex locking. Not to go too deep into the issues around this in the PR description, but we would end up in scenarios where the whole Tokio runtime's I/O would be blocked, or we would deadlock parking_lot mutexes (in some scenarios).
The solutions (monitoring thread + async mutex) seemed hacky and suboptimal.
This is why, in the end, I reached for a sync (but concurrent) IPC solution, which I implemented from scratch here. It's called REIPC (from request/response IPC).
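Purely as an illustration of the sync request/response idea (not the actual REIPC implementation, which keeps a single connection and matches responses to in-flight requests by id):

```rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;

// Rough illustration of a blocking request/response call over a node's IPC socket,
// assuming newline-delimited JSON-RPC. This sketch opens a connection per call,
// which the real implementation avoids.
fn ipc_request(socket_path: &str, request_json: &str) -> std::io::Result<String> {
    let mut stream = UnixStream::connect(socket_path)?;
    stream.write_all(request_json.as_bytes())?;
    stream.write_all(b"\n")?;

    let mut response = String::new();
    BufReader::new(stream).read_line(&mut response)?;
    Ok(response)
}
```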
This solution was tested using a Nethermind node, and while Nethermind will make some improvements on IPC to reduce latency and increase throughput, here are the initial request & response latencies:
### DashMap & QuickCache in the IPC provider
We need caches because otherwise the number of IPC calls would be pretty high (in thousands per sec).
I also reached for concurrent caches because both `StateProviderFactory` and `StateProvider` need to be `Send + Sync`. QuickCache is used so that I don't have to implement concurrent cache invalidation by hand :)

Here is some info on QuickCache vs Moka (another popular caching crate); TL;DR: for our simple use case, QuickCache seems a better fit.
### On `enum StateProviderFactories`

The reason `cli.rs` passes the `StateProviderFactories` enum to `config.new_builder` is that I wanted the state provider for `rbuilder` to be chosen via config at runtime. This is why, AFAIK, static dispatch is not an option, so I was left with the choice of refactoring to dynamic dispatch (akin to `StateProviderBox`) OR the `enum` solution.

I chose the `enum` solution for the following reasons:
1. It seemed to me that the code diff would be smaller (a smaller change to implement).
2. AFAIK it's faster. Enum matching will compile to a `JMP` vs a `CALL` (in the case of dynamic dispatch), which the compiler can optimize better (especially in this scenario), and given branch prediction I guess it'll be almost free (+ no need for vtable/pointer loading).

UPDATE: this was properly handled in this commit thanks to suggestions from @ZanCorDX
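As an illustration of the enum-dispatch approach described above (trait and variant names are simplified stand-ins, not the actual rbuilder types):

```rust
// Illustration of enum dispatch chosen at runtime but resolved with a plain
// `match` instead of a trait-object vtable call.
trait ProviderFactory {
    fn latest_block_number(&self) -> u64;
}

struct RethProviderFactory;
struct IpcProviderFactory;

impl ProviderFactory for RethProviderFactory {
    fn latest_block_number(&self) -> u64 {
        0 // would read from the local reth database
    }
}

impl ProviderFactory for IpcProviderFactory {
    fn latest_block_number(&self) -> u64 {
        0 // would go through the IPC state provider
    }
}

// Picked at runtime from the config.
enum StateProviderFactories {
    Reth(RethProviderFactory),
    Ipc(IpcProviderFactory),
}

impl ProviderFactory for StateProviderFactories {
    fn latest_block_number(&self) -> u64 {
        match self {
            StateProviderFactories::Reth(f) => f.latest_block_number(),
            StateProviderFactories::Ipc(f) => f.latest_block_number(),
        }
    }
}
```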
### On the `MempoolSource` enum

The reason I chose to use WebSockets for streaming transactions when `rbuilder` uses the IPC state provider is that, currently, REIPC doesn't support ETH subscriptions/streams.

## ✅ I have completed the following steps:

* `make lint`
* `make test`