feat(bloatnet): Add first multi-opcode benchmarks for Bloatnet #2090
Closed
35 commits
a1f2153
Add BloatNet tests
gballet 02d65b4
try building the contract
gballet e721cc6
fix: SSTORE 0 -> 1 match all values in the state
gballet d1cad25
add the tx for 0 -> 1 and 1 -> 2
gballet 16f6d30
fix: linter issues
gballet 374e08a
remove more whitespaces
gballet 333c876
fix formatting
gballet 79a95b8
move to benchmarks
gballet 8131e98
fix linter value
gballet 5f805fd
use the gas limit from the environment
gballet 090a400
parameterize the written value in SSTORE
gballet cd02a02
fix linter issues
gballet 1f3c381
update CHANGELOG.md
gballet f6def7e
fix format
gballet 7e20a50
simplify syntax
gballet c24ad35
fix: start with an empty contract storage
gballet fc27e53
more fixes, but the result is still incorrect
gballet 7d87262
fix: finally fix the tests
gballet 8556014
linter fix
gballet 326915e
add SLOAD tests
gballet 1f8e62a
test(benchmark): implement CREATE2 addressing for bloatnet tests
CPerezz 8babb13
refactor(benchmark): optimize gas calculations in bloatnet tests
CPerezz e70132b
refactor(benchmark): bloatnet tests with unique bytecode for I/O opt…
CPerezz 0e889d7
refactor(benchmark): replace custom CREATE2 address calculation with …
CPerezz e4583b6
CREATE2 factory approach working
CPerezz 06f9a63
Version with EIP-7997 model working
CPerezz 49c1343
refactor(benchmark): imrpove contract deployment script with interact…
CPerezz 2875cf4
delete: remove obsolete test_create2.py script
CPerezz b634ca3
refactor(benchmark): optimize gas calculations for BALANCE + EXTCODEC…
CPerezz 774c56c
refactor(benchmark): support non-fixed max_codesize
CPerezz 6e6863a
chore: Remove all 24kB "hardcoded" refs
CPerezz f2cd5f9
fix: pre-commit lint hooks
CPerezz cf2c7c6
push updated deploy_create2_factory refactored with EEST as dep
CPerezz a862f76
refactor(benchmark): enhance CREATE2 factory deployment and testing
CPerezz 55396fb
remove: old_deploy_factory script
CPerezz
@@ -0,0 +1,144 @@
# BloatNet Benchmark Tests setup guide

## Overview

This README aims to be a guide for any user who wants to run the bloatnet test/benchmark suite on any network.
BloatNet bench cases can be seen in: https://hackmd.io/9icZeLN7R0Sk5mIjKlZAHQ.
The idea of all these tests is to stress client implementations to find the limits of their processing, focusing specifically on state-related operations.

In this document you will find a guide that will help you deploy all the setup contracts required by the benchmarks in `tests/benchmark/bloatnet`.

## Gas Cost Constants

### BALANCE + EXTCODESIZE Pattern
**Gas per contract: ~2,772**
- `SHA3` (CREATE2 address generation): 30 gas (static) + 18 gas (dynamic for 85 bytes)
- `BALANCE` (cold access): 2,600 gas
- `POP`: 2 gas
- `EXTCODESIZE` (warm): 100 gas
- `POP`: 2 gas
- Memory operations and loop overhead: ~20 gas
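
The itemized costs above can be cross-checked with a short sketch. The dictionary keys are illustrative labels, not identifiers from the test suite; the 6 gas per 32-byte word is the standard dynamic cost of `SHA3`, and the 20 gas of overhead is this README's own estimate.

```python
import math

KECCAK_STATIC_GAS = 30       # SHA3 base cost
KECCAK_WORD_GAS = 6          # SHA3 cost per started 32-byte word
CREATE2_PREIMAGE_BYTES = 85  # 1 (0xff) + 20 (factory) + 32 (salt) + 32 (init code hash)


def keccak_dynamic_gas(num_bytes: int) -> int:
    """Dynamic SHA3 cost: 6 gas for every started 32-byte word."""
    return KECCAK_WORD_GAS * math.ceil(num_bytes / 32)


# Illustrative labels; the values are copied from the list above.
balance_extcodesize_costs = {
    "sha3_static": KECCAK_STATIC_GAS,
    "sha3_dynamic": keccak_dynamic_gas(CREATE2_PREIMAGE_BYTES),
    "balance_cold": 2600,
    "pop_after_balance": 2,
    "extcodesize_warm": 100,
    "pop_after_extcodesize": 2,
    "memory_and_loop_overhead": 20,  # approximate, per this README
}

gas_per_contract = sum(balance_extcodesize_costs.values())
print(gas_per_contract)  # 2772
```

Because the overhead term is an estimate, the total should be read as approximate rather than as an exact per-iteration cost.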

### BALANCE + EXTCODECOPY(single-byte) Pattern
**Gas per contract: ~2,775**
- `SHA3` (CREATE2 address generation): 30 gas (static) + 18 gas (dynamic for 85 bytes)
- `BALANCE` (cold access): 2,600 gas
- `POP`: 2 gas
- `EXTCODECOPY` (warm, 1 byte): 100 gas (base) + 3 gas (copy 1 byte)
- Memory operations: 4 gas
- Loop overhead: ~20 gas

Note: Reading just 1 byte (specifically the last byte at offset 24575) forces the client
to load the entire 24KB contract from disk while minimizing gas cost. This allows
targeting nearly as many contracts as the EXTCODESIZE pattern while forcing maximum I/O.
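
A minimal sketch of why the single-byte read is cheap, assuming the deployed contracts are exactly at the EIP-170 limit of 24,576 bytes. The function names are hypothetical, and memory expansion cost is deliberately left out.

```python
import math

MAX_CODE_SIZE = 24_576  # EIP-170 limit in bytes; assumed size of each deployed contract
WARM_ACCESS_GAS = 100   # warm account access
COPY_WORD_GAS = 3       # copy cost per started 32-byte word


def last_byte_offset(code_size: int = MAX_CODE_SIZE) -> int:
    """Offset of the final code byte, used as the EXTCODECOPY source offset."""
    return code_size - 1


def extcodecopy_warm_gas(num_bytes: int) -> int:
    """Warm access cost plus per-word copy cost (memory expansion excluded)."""
    return WARM_ACCESS_GAS + COPY_WORD_GAS * math.ceil(num_bytes / 32)


print(last_byte_offset())                   # 24575
print(extcodecopy_warm_gas(1))              # 103
print(extcodecopy_warm_gas(MAX_CODE_SIZE))  # 2404
```

Copying one byte costs 103 gas, while copying the whole contract would cost 2,404 gas before memory expansion, so the single-byte variant fits far more contracts into the same gas budget.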

## Required Contracts Calculation Example:

### For BALANCE + EXTCODESIZE:
| Gas Limit | Contracts Needed | Calculation         |
| --------- | ---------------- | ------------------- |
| 1M        | 352              | 1,000,000 ÷ 2,772   |
| 5M        | 1,769            | 5,000,000 ÷ 2,772   |
| 50M       | 17,690           | 50,000,000 ÷ 2,772  |
| 150M      | 53,071           | 150,000,000 ÷ 2,772 |

### For BALANCE + EXTCODECOPY:
| Gas Limit | Contracts Needed | Calculation         |
| --------- | ---------------- | ------------------- |
| 1M        | 352              | 1,000,000 ÷ 2,775   |
| 5M        | 1,768            | 5,000,000 ÷ 2,775   |
| 50M       | 17,684           | 50,000,000 ÷ 2,775  |
| 150M      | 53,053           | 150,000,000 ÷ 2,775 |
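
The divisions in the Calculation column can be sketched as plain floor division. This is only an upper bound under the per-contract figures quoted in this README, not the formula the deployment script uses: it yields somewhat higher counts than the tables (360 rather than 352 at 1M), so the table values presumably also budget for overhead that is not itemized here. `max_contracts` is a hypothetical helper name.

```python
def max_contracts(gas_limit: int, gas_per_contract: int) -> int:
    """Upper bound on contracts touched: floor division, no transaction overhead."""
    return gas_limit // gas_per_contract


for millions in (1, 5, 50, 150):
    gas_limit = millions * 1_000_000
    extcodesize_count = max_contracts(gas_limit, 2772)
    extcodecopy_count = max_contracts(gas_limit, 2775)
    print(f"{millions}M: {extcodesize_count} (EXTCODESIZE), {extcodecopy_count} (EXTCODECOPY)")
```

When sizing a deployment, the table values remain the author's reference figures; deploying more contracts than either number is harmless.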

The CREATE2 address generation adds ~48 gas per contract but eliminates memory limitations in the test framework.
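
The 85 bytes hashed per contract follow the CREATE2 address rule from EIP-1014. The sketch below builds that preimage and takes the hash function as a parameter, since Python's standard library has no Keccak-256. Using an integer counter as the salt is an assumption for illustration, and the function names are hypothetical.

```python
from typing import Callable


def create2_preimage(factory: bytes, salt: int, init_code_hash: bytes) -> bytes:
    """Build the 85-byte preimage: 0xff ++ factory ++ salt ++ keccak256(init_code)."""
    if len(factory) != 20:
        raise ValueError("factory address must be 20 bytes")
    if len(init_code_hash) != 32:
        raise ValueError("init code hash must be 32 bytes")
    return b"\xff" + factory + salt.to_bytes(32, "big") + init_code_hash


def create2_address(preimage: bytes, keccak256: Callable[[bytes], bytes]) -> bytes:
    """The address is the last 20 bytes of the Keccak-256 digest of the preimage.

    `keccak256` must be a real Keccak-256 implementation supplied by the caller;
    hashlib.sha3_256 is NOT equivalent (it uses different padding).
    """
    return keccak256(preimage)[12:]


preimage = create2_preimage(bytes(20), 0, bytes(32))
print(len(preimage))  # 85
```

Because every address is derived from the factory address, the salt, and one shared init code hash, the test can recompute addresses on the fly instead of holding them all in memory.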

## Quick Start: 150M Gas Attack

### 1. Deploy CREATE2 Factory with Initcode Template

```bash
# Deploy the factory and initcode template (one-time setup)
python3 tests/benchmark/bloatnet/deploy_create2_factory_refactored.py

# Output will show:
# Factory deployed at: 0x... <-- Save this address
# Init code hash: 0x... <-- Save this hash
```

### 2. Deploy Contracts

Deploy contracts using the factory. Each contract will be unique due to ADDRESS-based randomness in the initcode.

#### Calculate Contracts Needed

Before running the deployment, calculate the number of contracts needed:
- For 150M gas BALANCE+EXTCODESIZE: 53,071 contracts
- For 150M gas BALANCE+EXTCODECOPY: 53,053 contracts

_Deploy enough contracts to cover the max gas you plan to use in your tests/benchmarks._

#### Running the Deployment

```bash
# Deploy contracts for 150M gas attack
python3 tests/benchmark/bloatnet/deploy_create2_factory_refactored.py \
  --deploy-contracts 53100

# For smaller tests (e.g., 1M gas)
python3 tests/benchmark/bloatnet/deploy_create2_factory_refactored.py \
  --deploy-contracts 370
```

#### Deployment Output

After successful deployment, the script will display:

```
✅ Successfully deployed 53100 contracts
NUM_DEPLOYED_CONTRACTS = 53100
```

### 3. Update Test Configuration

Edit `tests/benchmark/bloatnet/test_bloatnet.py` and update it with the values from the deployment:

```python
FACTORY_ADDRESS = Address("0x...")  # From step 1 output
INIT_CODE_HASH = bytes.fromhex("...")  # From step 1 output
NUM_DEPLOYED_CONTRACTS = 53100  # From step 2 output
```
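
Before pasting the values in, a quick sanity check can catch a truncated address or hash. This is a standalone sketch on plain strings and does not use the framework's `Address` type; both helper names are hypothetical.

```python
def parse_hex(value: str, expected_len: int, label: str) -> bytes:
    """Decode a hex string (with or without 0x prefix) and check its byte length."""
    raw = bytes.fromhex(value[2:] if value.startswith("0x") else value)
    if len(raw) != expected_len:
        raise ValueError(f"{label} must be {expected_len} bytes, got {len(raw)}")
    return raw


def check_deployment_values(factory_address: str, init_code_hash: str, num_deployed: int) -> None:
    """Raise ValueError if any value copied from the deployment output looks malformed."""
    parse_hex(factory_address, 20, "factory address")
    parse_hex(init_code_hash, 32, "init code hash")
    if num_deployed <= 0:
        raise ValueError("number of deployed contracts must be positive")


# Placeholder values for illustration only, not real deployment output.
check_deployment_values("0x" + "11" * 20, "0x" + "22" * 32, 53100)
```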

### 4. Run Benchmark Tests

#### Generate Test Fixtures
```bash
# Run with specific gas values (in millions)
uv run fill --fork=Prague --gas-benchmark-values=150 \
  tests/benchmark/bloatnet/test_bloatnet.py --clean

# Multiple gas values
uv run fill --fork=Prague --gas-benchmark-values=1,5,50,150 \
  tests/benchmark/bloatnet/test_bloatnet.py
```

#### Execute Against Live Client
```bash
# Start a test node (e.g., Geth)
geth --dev --http --http.api eth,web3,net,debug

# Run tests
uv run execute remote --rpc-endpoint http://127.0.0.1:8545 \
  --rpc-chain-id 1337 --rpc-seed-key 0x0000000000000000000000000000000000000000000000000000000000000001 \
  tests/benchmark/bloatnet/test_bloatnet.py \
  --fork=Prague --gas-benchmark-values=150 -v
```

#### With EVM Traces for Analysis
```bash
uv run fill --fork=Prague --gas-benchmark-values=150 \
  --evm-dump-dir=traces/ --traces \
  tests/benchmark/bloatnet/test_bloatnet.py

# Analyze opcodes executed
jq -r '.opName' traces/**/*.jsonl | sort | uniq -c
```
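
The same opcode tally can be sketched in Python, which avoids depending on the shell expanding `**` recursively (in bash that requires `shopt -s globstar`). The sketch assumes each trace line is a JSON object with an `opName` field, as the `jq` filter implies; `count_opcodes` is a hypothetical helper name.

```python
import json
from collections import Counter
from pathlib import Path


def count_opcodes(trace_dir: str) -> Counter:
    """Count opName occurrences across all .jsonl trace files under trace_dir."""
    counts: Counter = Counter()
    for path in Path(trace_dir).rglob("*.jsonl"):
        with path.open() as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue
                entry = json.loads(line)
                # Lines without an opName (e.g. a trailing summary object) are skipped.
                op_name = entry.get("opName") if isinstance(entry, dict) else None
                if op_name:
                    counts[op_name] += 1
    return counts
```

Calling `count_opcodes("traces").most_common()` returns the opcodes sorted by frequency. Unlike the `jq` pipeline, lines without an `opName` are dropped instead of being counted as `null`.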
This test is basically two scenarios in one. It tests both BALANCE (which marks the account warm) and EXTCODESIZE.
Note that accounts in the state's Merkle Patricia Trie are stored as:
[nonce, balance, storageRoot, codeHash]
Thus reading the balance from the MPT "just" requires reading the account.
EXTCODESIZE, however, means we have to query `codeHash`, and to get the size we have to look up all the code from the DB (this assumes that the client has not optimized this in some way, for instance via an extra database like a `codeHash => codeSize` lookup, which would skip reading all the code first to determine the size).
So, I believe we need scenarios for BALANCE/EXTCODESIZE.
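
To make the two lookup paths concrete, here is a toy model of the point above. It is not any client's implementation: `ToyState`, its fields, and the optional `code_size_index` are hypothetical, and real clients may organize their databases quite differently.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Account:
    """Account fields as listed above: [nonce, balance, storageRoot, codeHash]."""
    nonce: int
    balance: int
    storage_root: bytes
    code_hash: bytes


class ToyState:
    def __init__(
        self,
        accounts: Dict[bytes, Account],
        code_db: Dict[bytes, bytes],
        code_size_index: Optional[Dict[bytes, int]] = None,
    ) -> None:
        self.accounts = accounts
        self.code_db = code_db
        self.code_size_index = code_size_index  # hypothetical codeHash => codeSize table
        self.code_bytes_read = 0  # tracks how much code was loaded

    def balance(self, address: bytes) -> int:
        """BALANCE only needs the account record."""
        return self.accounts[address].balance

    def ext_code_size(self, address: bytes) -> int:
        """EXTCODESIZE needs the code length, via the index if one exists."""
        code_hash = self.accounts[address].code_hash
        if self.code_size_index is not None:
            return self.code_size_index[code_hash]
        code = self.code_db[code_hash]  # unoptimized path: load the full code
        self.code_bytes_read += len(code)
        return len(code)
```

In this model, a state built without `code_size_index` reads the full code on every `ext_code_size` call, while one built with the index never touches `code_db`; that difference is what a combined BALANCE + EXTCODESIZE benchmark would expose.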
For EXTCODESIZE, I think this benchmark test is what you want:
execution-spec-tests/tests/benchmark/test_worst_bytecode.py
Line 41 in 291fe00
For BALANCE (cold) this test:
execution-spec-tests/tests/benchmark/test_worst_stateful_opcodes.py
Line 39 in 291fe00
I'm not sure I follow you here.
If these standalone scenarios already exist, as you correctly pointed out, and my PR adds the combination of them into a single test, what else is needed beyond what this PR adds?
Are you claiming we need to implement something? What I want is to test the combination of the two together and observe whether any client has optimizations that can be applied. This is all part of the following scenarios I want to implement for bloatnet: https://hackmd.io/9icZeLN7R0Sk5mIjKlZAHQ#Opcode-State-Access-Combination-Tests
Sorry! You are right, I was thinking of this from a different perspective (opcodes in isolation). The combined test is indeed not written.