feat(benchmark): add `benchmark_test` test type #1945

LouisTsai-Csie · 2025-07-24T07:59:30Z

🗒️ Description

As EIP-7825 is introduced in Fusaka upgrade, most of the legacy test case would fail. This issue add two test wrappers, benchmark_test and benchmark_state_test, to replace pure blockchain_test and state_test test type.

🔗 Related Issues or PRs

Issue #1896

✅ Checklist

All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
```
uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
```
All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
All: Considered adding an entry to CHANGELOG.md.
All: Considered updating the online docs in the ./docs/ directory. (will come in a separate PR)
All: Set appropriate labels for the changes (only maintainers can apply labels).

LouisTsai-Csie · 2025-08-14T16:30:25Z

There are some issue in generating the fixture. I compare to the newly created fixture, and the size is much larger than the original one. This should not happen and there should be the same content, so the same size. But this is not a big problem now.

The major issue now is to resolve the failing test in CI, which I could not reproduce now locally.

CPerezz · 2025-08-29T12:59:04Z

This can come in handy for benchmark tests as basically they force the consumption of all the gas available. And that condition forces us to implement padding techniques to consume EXACTLY all the gas available in a block.

When in reality, for a benchmark, we don't care about this at all.
PRs affected:

LouisTsai-Csie · 2025-08-29T16:09:39Z

@CPerezz I think this is still necessary for Nethermind team (Increasing gas limit) and zkEVM team (proving the entire block)? For gas limit testing, I am not sure if they can only run 1 tx and then derive the entire block execution time from it

CPerezz · 2025-08-30T12:07:43Z

@CPerezz I think this is still necessary for Nethermind team (Increasing gas limit) and zkEVM team (proving the entire block)? For gas limit testing, I am not sure if they can only run 1 tx and then derive the entire block execution time from it

But you can emit a warning if needed. Why does it need to be a failure not spending ALL the gas exactly? I agree it has to be within a bound. Sure. But to the unit in precision is really different. Specially when you have to account for mem expansion and other costs. It's almost impossible to not need padding.

I'm not advocating to remove this completely. But to relax it maybe. Or at least, it would be useful to know why does it need to fail specifically? When and Why was this introduced?

LouisTsai-Csie · 2025-08-30T15:58:47Z

@CPerezz Thank you for explanation, it is very clear! I will review the features included again and discuss with the team.

As you see this is still a draft and we welcome any feedback, we also want to know what does stateless client team need for benchmarking, what's your consideration when benchmarking?

CPerezz · 2025-09-01T05:33:49Z

@LouisTsai-Csie So I'm just speaking in regards of "State bottlenecks" project. Which is within the stateless-consensus team. Our goal is to measure how different client impls behave when under heavy load and different state sizes among other things.

For that, we need these kind of benchmarks. But it results quite tricky to match perfectly the gas spent. And it's not required at all to be spent. 1% of wiggle room is enough to consider the benchmark useful even if it doesn't spend all the gas of the block.

src/ethereum_test_specs/benchmark.py

marioevz

After going through the current implementation and thinking about it I think this PR is mostly on the right track.

My suggestions would be:

We have a single new spec benchmark_tests that receives setup_txs and workload_txs, or a generator.
We have multiple generator subclasses all of which subclass BenchmarkCodeGenerator and an implement generate_setup_txs and generate_workload_txs (and perhaps deploy_contracts).
Internally benchmark_tests takes setup_txs (or calls generator.generate_setup_txs()) and, if any, generates a first setup block, and then takes workload_txs (or calls generator.generate_workload_txs()) and puts them in the a different block.

src/ethereum_test_specs/benchmark_state.py

src/ethereum_test_tools/benchmark_code_generator.py

LouisTsai-Csie · 2025-09-11T12:59:06Z

I refactor the helper function and add the context manager feature.

During the update, some question and todo came to my mind:

Where would be the best place for the benchmark_code_generator.py file? Now it is under ethereum_test_benchmark? I originally put it under ethereum_test_tools, but i keep facing circular import issue between ethereum_test_tools <-> ethereum_test_spec package
I have not yet removed the benchmark_state_test fixture, I will do so after we confirm it is not necessary with geth team
Should we also add metadata here? like how it does in the PR feat(execute): Add identifiers to sent txs #2056

marioevz · 2025-09-11T18:11:54Z

Regarding the questions you have:

Where would be the best place for the benchmark_code_generator.py file? Now it is under ethereum_test_benchmark? I originally put it under ethereum_test_tools, but i keep facing circular import issue between ethereum_test_tools <-> ethereum_test_spec package
I think having the ethereum_test_benchmark package is great, because we are going to keep growing the tools we use for benchmarking in the repo,

Maybe we could move the abstract class BenchmarkCodeGenerator to src/ethereum_test_specs/benchmark.py (while leaving JumpLoopGenerator and ExtCallGenerator in src/ethereum_test_benchmark/benchmark_code_generator.py) because you can use it as an input field to BenchmarkTest/BenchmarkStateTest and you can avoid the circular dependency in that case.

I have not yet removed the benchmark_state_test fixture, I will do so after we confirm it is not necessary with geth team

Sgtm, I'm still open to be convinced that we indeed need it.

Should we also add metadata here? like how it does in the PR

That might be out of scope for this PR and we should leave that for the PR that touches the execute command to better align it with the new formats.

marioevz

Looking really good! I think the code generators are fantastic, and the only part I feel we should take out and move into another PR is the BenchmarkManager.

src/ethereum_test_specs/benchmark.py

tests/benchmark/test_worst_compute.py

src/ethereum_test_benchmark/benchmark_code_generator.py

src/ethereum_test_specs/benchmark.py

CPerezz · 2025-09-16T09:30:19Z

Unsure if this is somehow related. But JIC mentioning it here.

In #2090 we arrived to the following conclusion:

State-related tests might execute only in 2 ways:

You use stubed-contracts feat(execute): Support for contract address stubs #2073 because the state already has the contracts/accounts deployed.

You deploy the contracts/accounts and then proceed as in 1.

For that reason, we realized that benchmark-state-tests always end up being executed in execute mode. Never in fill.

Therefore, the way I found to profit off of this dual mode is to allow fix-mode to take care of the pre-state deployment/generation (making sure it doesn't run in case it identifies that the state is about to deploy already is).
_Notice here, that things like the gas_benchmark_value are useful as they let us understand how much gas we want to spend in execute-mode and deploy as many contracts/accounts as necessary to enable such attack using CREATE2 deterministic addressing for example.
Then, execute-mode runs as a usual benchmark-test. Though things like #2155 would come in handy to make our life easier.

LMK what you think @LouisTsai-Csie @fselmo .

If this approach doesn't make sense. Could you let me know what;'s the best way to bring all Bloatnet benchmarks into EEST?

fselmo

I honestly don't have a lot to add here, this looks amazing 🔥. Really elegant approach. I added a lot of words (that's just my nature 😆) but there's really just some minor miscalculations that we should have sanity checks for anyhow. Otherwise this is looking excellent!

Major question I have is whether this will all still work if we rip out the phase manager and leave it for another PR. I think we can... is there a reason to keep it?

src/ethereum_test_specs/benchmark.py

src/ethereum_test_vm/bytecode.py

tests/benchmark/test_worst_blocks.py

src/ethereum_test_specs/benchmark.py

src/ethereum_test_benchmark/benchmark_code_generator.py

src/ethereum_test_specs/benchmark.py

fselmo · 2025-09-23T16:15:23Z

@LouisTsai-Csie this looks good and I've approved and marked all comments as resolved. Please add a CHANGELOG entry for this before merging. We should add some documentation for this as well. Is this important to get in quick (defer docs to another followup PR) or should we do this here before merge?

fselmo

Approving again 🙂. See my previous comment about documentation. Are you willing to add it here or should we merge this and address separately? We should also add a CHANGELOG entry here.

…ration

LouisTsai-Csie · 2025-09-24T17:06:01Z

I add a changelog entry, and I prefer to add the documentation in the follow-up PR, as there might be some small changes in the interface! Thanks @fselmo

Note: please feel free to edit the changelog and update the description directly!

* feat: wrap blockchain test for benchmark * feat: wrap state test for benchmark * feat(benchmark): add code generator to generate transaction * fix: resolve typing issue * refactor: update benchmark code generator and test wrapper * fix: udpate example changes * refactor: resolve typing and update func interface * refactor: remove benchmark state test wrapper * fix: pydantic model validation for benchmark manager * refactor synatx and parameter * refactor: remove benchmark manager feature * refactor: update logic and add benchmark tests * refactor: enforce single property requirement in blockchain test generation * refactor: update Bytecode serialization schema to use format_ser_schema * refactor: update import paths * refactor: update serialization schema * refactor: remove unused parameters * doc: add changelog entry * fix typo

LouisTsai-Csie self-assigned this Jul 24, 2025

LouisTsai-Csie added feature:benchmark type:feat type: Feature labels Jul 24, 2025

LouisTsai-Csie mentioned this pull request Aug 5, 2025

test: add max block size test using access lists #1932

Merged

8 tasks

LouisTsai-Csie force-pushed the benchmark-test-type branch 2 times, most recently from 641036c to af00ec2 Compare August 8, 2025 10:07

LouisTsai-Csie marked this pull request as ready for review August 11, 2025 09:52

LouisTsai-Csie force-pushed the benchmark-test-type branch from af00ec2 to de7f485 Compare August 14, 2025 12:50

LouisTsai-Csie marked this pull request as draft August 14, 2025 16:30

LouisTsai-Csie mentioned this pull request Sep 2, 2025

feat(bloatnet): Add first multi-opcode benchmarks for Bloatnet #2090

Closed

6 tasks

marioevz reviewed Sep 8, 2025

View reviewed changes

src/ethereum_test_specs/benchmark.py Outdated Show resolved Hide resolved

LouisTsai-Csie force-pushed the benchmark-test-type branch from de7f485 to 688e861 Compare September 9, 2025 06:57

marioevz reviewed Sep 9, 2025

View reviewed changes

src/ethereum_test_specs/benchmark_state.py Outdated Show resolved Hide resolved

src/ethereum_test_tools/benchmark_code_generator.py Outdated Show resolved Hide resolved

LouisTsai-Csie force-pushed the benchmark-test-type branch from f99f318 to 30b4f76 Compare September 11, 2025 12:51

LouisTsai-Csie marked this pull request as ready for review September 11, 2025 14:45

marioevz reviewed Sep 11, 2025

View reviewed changes

LouisTsai-Csie force-pushed the benchmark-test-type branch 2 times, most recently from e803c98 to 1da86fa Compare September 15, 2025 09:54

danceratopz mentioned this pull request Sep 15, 2025

All Core Devs - Testing (ACDT) #53 | Sep 15 2025 ethereum/pm#1719

Closed

3 tasks

fselmo reviewed Sep 16, 2025

View reviewed changes

LouisTsai-Csie force-pushed the benchmark-test-type branch from 1da86fa to 50ec823 Compare September 17, 2025 08:02

LouisTsai-Csie force-pushed the benchmark-test-type branch from 96b5b42 to a0e1038 Compare September 22, 2025 04:10

LouisTsai-Csie requested a review from fselmo September 23, 2025 16:01

fselmo approved these changes Sep 23, 2025

View reviewed changes

LouisTsai-Csie added 18 commits September 25, 2025 00:52

feat: wrap blockchain test for benchmark

a83ee49

feat: wrap state test for benchmark

09c09cb

feat(benchmark): add code generator to generate transaction

87bd45d

fix: resolve typing issue

8ca027f

refactor: update benchmark code generator and test wrapper

d76104f

fix: udpate example changes

51d6817

refactor: resolve typing and update func interface

99f22d7

refactor: remove benchmark state test wrapper

67a07d7

fix: pydantic model validation for benchmark manager

2e34a6a

refactor synatx and parameter

6470b46

refactor: remove benchmark manager feature

56e3b28

refactor: update logic and add benchmark tests

d88f680

refactor: enforce single property requirement in blockchain test gene…

80281e9

…ration

refactor: update Bytecode serialization schema to use format_ser_schema

0a0c149

refactor: update import paths

f5ca3e5

refactor: update serialization schema

c4e8fbd

refactor: remove unused parameters

1df840b

doc: add changelog entry

e2f462b

LouisTsai-Csie force-pushed the benchmark-test-type branch from a0e1038 to e2f462b Compare September 24, 2025 17:02

fix typo

0e597d5

fselmo merged commit a2f2841 into ethereum:main Sep 24, 2025
16 checks passed

jochem-brouwer mentioned this pull request Oct 16, 2025

Some benchmark tests will not execute the exact gas as used in the --gas-benchmark-value: the equality assertion should be removed ethereum/execution-specs#1621

Open

spencer-tb mentioned this pull request Oct 9, 2025

tracker(fork): osaka fork mega meta issue ethereum/execution-specs#1558

Open

30 tasks

LouisTsai-Csie mentioned this pull request Oct 3, 2025

feat(benchmark): create new benchmark_test test type #1896

Closed

feat(benchmark): add benchmark_test test type #1945

feat(benchmark): add benchmark_test test type #1945

Uh oh!

Conversation

LouisTsai-Csie commented Jul 24, 2025 • edited by fselmo Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

🗒️ Description

🔗 Related Issues or PRs

✅ Checklist

Uh oh!

LouisTsai-Csie commented Aug 14, 2025

Uh oh!

CPerezz commented Aug 29, 2025

Uh oh!

LouisTsai-Csie commented Aug 29, 2025

Uh oh!

CPerezz commented Aug 30, 2025

Uh oh!

LouisTsai-Csie commented Aug 30, 2025

Uh oh!

CPerezz commented Sep 1, 2025

Uh oh!

Uh oh!

marioevz left a comment • edited by LouisTsai-Csie Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

LouisTsai-Csie commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marioevz commented Sep 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

marioevz left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

CPerezz commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fselmo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

fselmo commented Sep 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

fselmo left a comment

Choose a reason for hiding this comment

Uh oh!

LouisTsai-Csie commented Sep 24, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

feat(benchmark): add `benchmark_test` test type #1945

feat(benchmark): add `benchmark_test` test type #1945

LouisTsai-Csie commented Jul 24, 2025 •

edited by fselmo

Loading

marioevz left a comment •

edited by LouisTsai-Csie

Loading

LouisTsai-Csie commented Sep 11, 2025 •

edited

Loading

marioevz commented Sep 11, 2025 •

edited

Loading

CPerezz commented Sep 16, 2025 •

edited

Loading

fselmo commented Sep 23, 2025 •

edited

Loading

LouisTsai-Csie commented Sep 24, 2025 •

edited

Loading