@gballet gballet commented Aug 14, 2025

🗒️ Description

Add a test required as part of the BloatNet effort. This is the

🔗 Related Issues or PRs

Not an issue, but a test plan is described here.

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx --with=tox-uv tox -e lint,typecheck,spellcheck,markdownlint
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry to the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

Signed-off-by: Guillaume Ballet <[email protected]>
@gballet gballet changed the title from "BloatNet: add first few single-opcode test for state access." to "feat(tests): add first few single-opcode test for state access in BloatNet" Aug 14, 2025
Signed-off-by: Guillaume Ballet <[email protected]>

remove leftover single whitespace :|
@gballet gballet force-pushed the bloatnet-test-SSTORE branch from b6cd62a to 374e08a Compare August 14, 2025 19:16

LouisTsai-Csie commented Aug 15, 2025

Hello @gballet! Thanks for adding this case.

This is the issue tracker for bloatnet test cases. Could you please help me (1) add the PR to the issue tracker's PR description (like this) and (2) link this PR to the issue? This would help us track progress better, thank you!

For benchmark tests, we now add new cases under tests/benchmark, and I think test_worst_stateful_opcodes.py is the best fit for your test.

I also added some review comments below; please feel free to let me know if you have any issues! If you want a reference for benchmark tests, you can take a look at this: it is a similar case to this benchmark, and its structure may be a useful model.

gballet commented Aug 21, 2025

Hey @LouisTsai-Csie thanks for the feedback.

> This is the issue tracker for bloatnet test cases. Could you please help me (1) add the PR to the issue tracker's PR description (like this) and (2) link this PR to the issue? This would help us track progress better, thank you!

I tried to do this, but it looks very involved. I made a best effort, but since I don't know what you're expecting, and I don't have all the time in the world, I'll leave it in your court to comment on that. #2064

> For benchmark tests, we now add new cases under tests/benchmark, and I think test_worst_stateful_opcodes.py is the best fit for your test.

If I do that, how do I run the test? It seems to be ignored after I moved it to that directory. I have pushed it to this PR for your consideration.

> I also added some review comments below; please feel free to let me know if you have any issues! If you want a reference for benchmark tests, you can take a look at this: it is a similar case to this benchmark, and its structure may be a useful model.

Thanks for the reference.

@LouisTsai-Csie

@gballet Apologies, I forgot to link the issue tracker for you. We've created an issue tracker based on your documentation.

I linked this PR to the SSTORE item, "Fill block with SSTORE(0 → 1) to maximize new storage slot creation"; please let me know if this does not fit the category.

Also, it would be great if you could help me review whether anything is missing or wrong in our issue tracker!

gballet commented Aug 21, 2025

I'll need to have a closer look, but it seems fine as a first pass. Do you know what the problem is with moving my file to benchmarks?

LouisTsai-Csie commented Aug 21, 2025

Our documentation is incomplete (I will fix it ASAP). To run the tests under the benchmark/ folder, you need to add the flag -m benchmark. By default, these tests are ignored to avoid overhead in the CI/release process.

This is the command in our documentation:

fill -v tests/benchmark/test_worst_blocks.py::test_block_full_of_ether_transfers --fork Osaka

But I would add some flags to run it:

uv run fill -v tests/benchmark/test_worst_blocks.py::test_block_full_of_ether_transfers --fork Osaka -m benchmark --clean
  • uv run: we use uv as the package manager
  • -m benchmark: without this flag, benchmark tests are ignored by default
  • --clean: you need this if you have already filled tests before

Please let me know if there is anything unclear to you!

@gballet gballet marked this pull request as ready for review August 27, 2025 08:53

@fselmo fselmo left a comment

Hey @gballet. I did a first pass at this strictly from a setup perspective. I didn't dig into the actual test case to make sure it's doing what we want it to do; I am going to take a deeper look at the logic next.

fselmo commented Aug 28, 2025

I just wanted to add a bit more context on the gas_benchmark_value. This allows us to run something like:

uv run fill --fork=Prague -m benchmark --gas-benchmark-values 1,10,30,45,100,150 --clean -k bloatnet

This lets us run the tests against each of the specified values. The values set the gas limit for the block, not the transaction gas cap, and are given in Mgas (1 = 1 Mgas). I am looking a bit deeper into the PR next but wanted to provide some better context.
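To make the units concrete, here is a minimal sketch of how the comma-separated values map to block gas limits. The helper name `parse_gas_benchmark_values` is hypothetical and is not how the `fill` plugin parses the flag; it only illustrates the "1 = 1 Mgas" convention described above.

```python
MGAS = 1_000_000  # 1 on the command line means 1 Mgas, per the comment above


def parse_gas_benchmark_values(raw: str) -> list[int]:
    """Hypothetical helper: turn "1,10,30" into block gas limits in gas units."""
    return [int(part) * MGAS for part in raw.split(",") if part.strip()]


if __name__ == "__main__":
    # Same values as in the command above; each one becomes a block gas limit.
    for limit in parse_gas_benchmark_values("1,10,30,45,100,150"):
        print(f"block gas limit: {limit:,} gas")
```

Small values such as 1 Mgas leave room for very little work per block, so a test has to cope with budgets that fit far fewer operations than its defaults assume.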

fselmo and others added 5 commits September 2, 2025 16:59
* refactor(tests): Proposed patch for bloatnet SSTORE tests

* refactor(tests): Update tests from comments on PR

PR: #1
Signed-off-by: fselmo <[email protected]>

* Use parametrization of the value that is written to

---------

Signed-off-by: fselmo <[email protected]>
Co-authored-by: Guillaume Ballet <[email protected]>
Co-authored-by: felipe <[email protected]>

@LouisTsai-Csie LouisTsai-Csie left a comment

I suggested some refactoring of the code; please help apply it, thanks!

Let's wait for an answer on execute mode for the bloatnet scenario.


@marioevz marioevz left a comment

LGTM, just one comment about something that was making the test fail when using small --gas-benchmark-values values.

gballet and others added 3 commits September 9, 2025 13:15
Co-authored-by: 蔡佳誠 Louis Tsai <[email protected]>
Co-authored-by: Mario Vega <[email protected]>

@fselmo fselmo left a comment

I left some things to think about.

For Osaka, when the tx_gas_cap is present, I don't think the SLOAD tests are doing what, as I read the code, the test intends. We build up enough transactions to max out the SLOADs per transaction, but when we reach the last tx, all the contract knows how to do is the same static number of SLOADs, and it can't do any fewer. I think these could benefit from a flow similar to the SSTORE tests, where we control the slots with a separate contract per transaction, so we can actually get to some higher numbers with these tests.

Let me know if I missed something and let me know your thoughts on this.
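The per-transaction sizing suggested above can be sketched as follows. This is a simplified model, not the test's actual code: `plan_sloads_per_tx` and its parameters are hypothetical names, the example figures (21,000 intrinsic gas, 2,100 gas per cold SLOAD, a cap of 2**24) are assumptions passed in as arguments, and per-iteration bytecode overhead is ignored.

```python
def plan_sloads_per_tx(
    block_gas: int, tx_gas_cap: int, intrinsic_gas: int, sload_cost: int
) -> list[int]:
    """Return the SLOAD count for each transaction, sized to the gas it gets."""
    if tx_gas_cap < intrinsic_gas + sload_cost:
        raise ValueError("tx gas cap too small to fit a single SLOAD")
    plan: list[int] = []
    remaining = block_gas
    # Keep adding transactions while at least one SLOAD still fits in the block.
    while remaining >= intrinsic_gas + sload_cost:
        tx_gas = min(tx_gas_cap, remaining)
        loads = (tx_gas - intrinsic_gas) // sload_cost
        plan.append(loads)
        remaining -= intrinsic_gas + loads * sload_cost
    return plan


if __name__ == "__main__":
    # Assumed example figures; the real test derives costs from the fork.
    plan = plan_sloads_per_tx(45_000_000, 2**24, 21_000, 2_100)
    print(plan, "total SLOADs:", sum(plan))
```

With one contract per transaction, the last entry in the plan is simply smaller than the others, so the final transaction still completes and its SLOADs count toward the benchmark.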


### 🧪 Test Cases

- ✨ [BloatNet](https://bloatnet.info)/Multidimensional Metering: Add benchmarks to be used as part of the BloatNet project and also for Multidimensional Metering.

These changes need to be rebased onto main, and this note needs to move up above 5.0.0, which has been released, into the newer "Unreleased" section.

@@ -0,0 +1,377 @@
"""
abstract: Tests that benchmarks EVMs to estimate the costs of stateful opcodes.
Tests that benchmarks EVMs to estimate the costs of stateful opcodes..

This line is here twice

Suggested change
Tests that benchmarks EVMs to estimate the costs of stateful opcodes..


# Calculate how many warm loads we can actually do
remaining_budget = actual_opcode_budget - warmup_gas
actual_warm_loads = min(num_warm_loads_per_tx, remaining_budget // warm_sload_cost)

I don't understand the concept of actual_warm_loads here. The bytecode of the contract doesn't know when to stop. We do the same number of SLOADs no matter what call we make to it because that's what it's written to do, no? This is a problem for Osaka, when the tx_gas_cap is actually present.

Maybe this test just needs to consider that all transactions except the last will go through the full bytecode SLOAD range. The issue is when you get to the last transaction and you are trying to reach some limit: we simply can't call the contract anymore, because we will run out of gas and revert, since the bytecode is hard-coded to a static range of SLOADs. This is the transaction where actual_warm_loads would work, but execution doesn't stop there; it keeps going and reverts, so we actually shouldn't count those loads toward the benchmark.

I think both of these tests could use the SSTORE tests' approach, where there is a contract per transaction with the exact number of SLOADs per transaction, so that we can actually max out the SLOADs instead of wasting the last transaction, which could potentially fit a lot more SLOADs.
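The effect described here can be illustrated with a small simulation. It is a sketch under stated assumptions, not the test's logic: `simulate_static_contract` is a hypothetical name, the contract is modelled as a fixed number of SLOADs with no loop overhead, and an out-of-gas transaction is modelled as consuming its whole gas limit while contributing no SLOADs.

```python
def simulate_static_contract(
    block_gas: int,
    tx_gas_cap: int,
    intrinsic_gas: int,
    sload_cost: int,
    loads_in_bytecode: int,
) -> tuple[int, int]:
    """Return (SLOADs that count, gas burned by out-of-gas transactions)."""
    needed = intrinsic_gas + loads_in_bytecode * sload_cost
    remaining, counted, wasted = block_gas, 0, 0
    while remaining > intrinsic_gas:
        tx_gas = min(tx_gas_cap, remaining)
        if tx_gas >= needed:
            # The hard-coded SLOAD range fits: every load counts.
            counted += loads_in_bytecode
            remaining -= needed
        else:
            # The bytecode cannot stop early: it runs out of gas and reverts.
            wasted += tx_gas
            remaining -= tx_gas
    return counted, wasted


if __name__ == "__main__":
    # Assumed example figures: 45 Mgas block, 2**24 cap, 7,979 cold SLOADs per call.
    counted, wasted = simulate_static_contract(45_000_000, 2**24, 21_000, 2_100, 7_979)
    print("SLOADs counted:", counted, "gas wasted in last tx:", wasted)
```

In this model the gas left over after the full-size transactions is spent entirely on a reverted call, which is the gap a contract sized per transaction would close.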


tx_gas_used = intrinsic_gas + (actual_slots * cold_sload_cost)
if actual_slots < max_slots_per_tx:
    total_block_gas_used += tx_gas_limit

Yeah, this is the same situation as in the warm test case. The very last transaction will hit this branch: it will not have enough gas to execute the number of SLOADs defined in the bytecode, so the last tx won't count toward the benchmark.

Keep in mind this is for Osaka. For Prague I think this is all running ok.
