Optimize block time to head #5413

Open
2 of 5 tasks
dapplion opened this issue Apr 25, 2023 · 0 comments
Labels
prio-medium Resolve this some time soon (tm). scope-performance Performance issue and ideas to improve performance.

On unstable, metrics show that the time to process a block (from when it is first seen in gossip until it is set as head) ranges from 600 to 900 ms. This includes calling the execution client to notify the payload. Running the state transition function for a block takes ~50 ms and computing the head another ~50 ms. So the long processing times are due to resources shared with other work, such as processing attestations.

Block process timeline

| Step | Avg time |
| --- | --- |
| Gossip validation | 30 ms |
| Send notify payload | 10 ms |
| State transition | 45 ms |
| State hashing | 25 ms |
| Verify block sigs | 25 ms |
| notify new payload idle | 300 ms |
| import attestations | 10 ms |
| recompute head | 100 ms |
| persist LC data | 25 ms |
| total | 570 ms |
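
To make the weight of each step easier to compare, here is a minimal sketch that sums the timings from the table and ranks the steps by their share of the total. The numbers are copied from the table; the names `steps`, `totalMs` and `shareOfTotal` are illustrative only and are not taken from the codebase.

```typescript
// Average step timings in milliseconds, copied from the timeline table.
const steps: Array<[string, number]> = [
  ["Gossip validation", 30],
  ["Send notify payload", 10],
  ["State transition", 45],
  ["State hashing", 25],
  ["Verify block sigs", 25],
  ["notify new payload idle", 300],
  ["import attestations", 10],
  ["recompute head", 100],
  ["persist LC data", 25],
];

// Sum of all steps; should match the "total" row of the table.
const totalMs = steps.reduce((sum, step) => sum + step[1], 0);

// Fraction of the total spent in a given step (hypothetical helper).
function shareOfTotal(name: string): number {
  for (const step of steps) {
    if (step[0] === name) return step[1] / totalMs;
  }
  throw new Error(`unknown step: ${name}`);
}

// Print the steps, largest contributor first.
const ranked = steps.slice().sort((a, b) => b[1] - a[1]);
for (const step of ranked) {
  const percent = ((100 * step[1]) / totalMs).toFixed(1);
  console.log(`${step[0]}: ${step[1]} ms (${percent}%)`);
}
console.log(`total: ${totalMs} ms`);
```

On these averages, waiting for the new payload response and recomputing the head together account for 400 of the 570 ms.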

CPU profiles show that attestation processing interlaces with block processing. It is currently unclear whether the 300 ms wait for the new payload response is caused by that or not. See two examples, one with an idle loop and one with a busy loop, while waiting for the notify new payload response:

image

image

Research / gather more data

  • What is the actual notify new payload time from the point of view of the execution client?
    • On unstable mainnet, the median time is about 100 ms (the quantile values below appear to be in microseconds)
rpc_duration_engine_newPayloadV2_success {quantile="0.5"} 104116.5
rpc_duration_engine_newPayloadV2_success {quantile="0.75"} 132396.75
rpc_duration_engine_newPayloadV2_success {quantile="0.95"} 225751.89999999988
rpc_duration_engine_newPayloadV2_success {quantile="0.99"} 364260.3900000002
rpc_duration_engine_newPayloadV2_success {quantile="0.999"} 1.0846435890000006e+06
rpc_duration_engine_newPayloadV2_success {quantile="0.9999"} 1.090077e+06
  • If significantly less than 300 ms, add a snooper to watch traffic and understand where the delay is
  • At what point into the slot is the new payload HTTP request actually dispatched from the OS?
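
As a rough aid for reading the quantile lines above, the sketch below parses them and converts the values to milliseconds. It assumes the raw values are in microseconds, which is an inference from the stated ~100 ms figure and not something the metric name confirms. `parseQuantileLine` and `QuantileSample` are hypothetical names, and the regex tolerates the space before `{` as it appears in the pasted lines.

```typescript
interface QuantileSample {
  metric: string;
  quantile: number;
  valueMs: number;
}

// ASSUMPTION: raw values are microseconds (inferred, not confirmed).
const MICROS_PER_MS = 1000;

// Parse one line such as: name {quantile="0.5"} 104116.5
function parseQuantileLine(line: string): QuantileSample | null {
  const match = /^(\w+)\s*\{quantile="([^"]+)"\}\s+(\S+)$/.exec(line.trim());
  if (match === null) return null;
  const quantile = Number(match[2]);
  const rawValue = Number(match[3]);
  if (isNaN(quantile) || isNaN(rawValue)) return null;
  return { metric: match[1], quantile, valueMs: rawValue / MICROS_PER_MS };
}

// The six lines quoted in the issue.
const lines = [
  'rpc_duration_engine_newPayloadV2_success {quantile="0.5"} 104116.5',
  'rpc_duration_engine_newPayloadV2_success {quantile="0.75"} 132396.75',
  'rpc_duration_engine_newPayloadV2_success {quantile="0.95"} 225751.89999999988',
  'rpc_duration_engine_newPayloadV2_success {quantile="0.99"} 364260.3900000002',
  'rpc_duration_engine_newPayloadV2_success {quantile="0.999"} 1.0846435890000006e+06',
  'rpc_duration_engine_newPayloadV2_success {quantile="0.9999"} 1.090077e+06',
];

const samples: QuantileSample[] = [];
for (const line of lines) {
  const sample = parseQuantileLine(line);
  if (sample !== null) samples.push(sample);
}

for (const sample of samples) {
  console.log(`q=${sample.quantile}: ${sample.valueMs.toFixed(1)} ms`);
}
```

Under that unit assumption, the median reported by the execution client is roughly 104 ms, well below the ~300 ms idle time measured on the beacon node side, while the 0.99 quantile (about 364 ms) is of the same order.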

Optimization targets

  • Algorithmic optimization of recompute head
  • Stack recompute head with new payload waiting - investigated, found that it doesn't improve timing
  • Complete block import before the new payload response arrives, setting the block as optimistic head, and mark it as VALID later when the response comes back - likely to cause missed attestations
  • Block the network processor queues when importing recent blocks, to allocate all cycles to the block processor without interlaced attestations
  • Optimize persisting LC data, or schedule it for later so the import completes faster
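
The idea of blocking the network processor queues during block import could be sketched as a queue that buffers jobs while paused and drains them on resume. This is a simplified, synchronous illustration under assumed names (`PausableQueue`, `attestationQueue`); it is not the existing network processor, and a real version would wrap an asynchronous import and would need to bound how long attestations are delayed.

```typescript
// A queue that runs jobs immediately unless paused, in which case it buffers them.
class PausableQueue<T> {
  private paused = false;
  private readonly pending: T[] = [];
  private readonly handler: (job: T) => void;

  constructor(handler: (job: T) => void) {
    this.handler = handler;
  }

  push(job: T): void {
    if (this.paused) {
      this.pending.push(job);
    } else {
      this.handler(job);
    }
  }

  pause(): void {
    this.paused = true;
  }

  // Unpause and process buffered jobs in arrival order.
  resume(): void {
    this.paused = false;
    while (!this.paused && this.pending.length > 0) {
      const job = this.pending.shift() as T;
      this.handler(job);
    }
  }

  get backlog(): number {
    return this.pending.length;
  }
}

// Demo: attestations arriving during a block import are deferred until it ends.
const processed: string[] = [];
const attestationQueue = new PausableQueue<string>((att) => {
  processed.push(att);
});

attestationQueue.push("att-1"); // no import in progress: handled immediately

attestationQueue.pause(); // block import starts
processed.push("block-import"); // stands in for the actual import work
attestationQueue.push("att-2"); // buffered
attestationQueue.push("att-3"); // buffered
const backlogDuringImport = attestationQueue.backlog;
attestationQueue.resume(); // block import done: drain the buffer

console.log(processed.join(", "));
```

The trade-off in this design is that attestation latency grows by up to the duration of the import, so whether it helps overall would need to be measured.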

CC @tuyennhv
