fix(test-benchmark): nonce mismatch issue in SLOAD & SSTORE benchmark #2617

Draft
LouisTsai-Csie wants to merge 1 commit into ethereum:forks/amsterdam from LouisTsai-Csie:fix-auth-nonce

@LouisTsai-Csie
Collaborator

@LouisTsai-Csie LouisTsai-Csie commented Apr 3, 2026

🗒️ Description

Here is a summary of the payload verification for test_sload_bloated and test_sstore_bloated.

Payload Information

Related Link

Network Information

  • Network: perf-devnet-3
  • Snapshot block height:
  • Account info (address, nonce)
    • 1GB: (0x3F8074692982594c1936bd27433A8B6e5d77e0f0, 6)
    • 10GB: (0x87A6314da5Ac8832F6e7A176C8FB133B19f5be04, 2)
    • 20GB: (0x772604ee92EBc9AfA5B6CE561F6f6A4C4Cdd214a, 2)

To reproduce these addresses, take the private keys from the _STORAGE_BLOATED_EOA_KEYS variable in tests/benchmark/stateful/helpers.py and convert each private key into its account address.

I verified correctness by checking (1) the nonce being used and (2) the opcode count values. The package.json for the verification script (block.js) is as follows:

{
  "name": "block-parser",
  "version": "1.0.0",
  "type": "module",
  "description": "Script to parse and process blockchain blocks",
  "main": "block.js",
  "scripts": {
    "start": "node block.js"
  },
  "dependencies": {
    "@ethereumjs/tx": "^5.4.0",
    "@ethereumjs/util": "^9.1.0",
    "@ethereumjs/rlp": "^4.0.0"
  }
}

Place package.json and block.js in the same folder and run npm install. Once that completes, update the file path in the script and run npm start or node block.js.

test_sstore_bloated

Configuration

  • fork: Osaka
  • existing_slots: False
  • write_new_value: False
  • token_name: 1GB
  • gas limit: 90M

As an example, I use the file test_single_opcode.py__test_sstore_bloated[fork_Osaka-benchmark_test-existing_slots_False-write_new_value_False-token_name_1GB-benchmark_90M].txt (under the setup folder) from generated-stateful-tests-stateful-perf-devnet-3-23949562466.

In block.js, update the file path so that it points to the file mentioned above.

The result shows that the nonce in the authorization list starts at 6 for init_tx and 7 for runtime_tx, which matches the on-chain nonce value.

"authorizedList": [
	{
	  "chainId": "0x",
	  "address": "0x55e5b385b218a8a94d5766e423fb25e6ad9c9ffa",
	  "nonce": "0x06",
	  "yParity": "0x",
	  "r": "0xc1d8ef10fb2fb305ff83be4d5474c6661f1d0e256279ab952b9c6a814bf90bd3",
	  "s": "0x1bb7ca0248d101ba8f1ecf7d793ece69cd5d5a8fd6e0e24d5fe218b9d72256b1"
	}
],
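
The nonce check described above can also be sketched in Python. This is only an illustration, not the block.js script: it assumes the decoded authorization entries have the shape shown above, that a bare "0x" (empty bytes) denotes zero, and that authorization nonces increase by one per transaction starting from the on-chain nonce. The helper names parse_quantity and check_auth_nonces are hypothetical, and the "0x07" entry is an assumed encoding of the runtime_tx nonce 7.

```python
def parse_quantity(value: str) -> int:
    """Parse a hex quantity; a bare "0x" (empty bytes) is treated as zero."""
    digits = value[2:] if value.startswith("0x") else value
    return int(digits, 16) if digits else 0


def check_auth_nonces(auth_lists: list[list[dict]], onchain_nonce: int) -> bool:
    """Check that authorization nonces are consecutive from the on-chain nonce.

    `auth_lists` holds one authorization list per transaction, in send order
    (for example init_tx first, then runtime_tx).
    """
    expected = onchain_nonce
    for auth_list in auth_lists:
        for auth in auth_list:
            if parse_quantity(auth["nonce"]) != expected:
                return False
            expected += 1
    return True


# Values for the 1GB account on perf-devnet-3 (on-chain nonce 6).
init_tx_auths = [{"chainId": "0x", "nonce": "0x06", "yParity": "0x"}]
runtime_tx_auths = [{"chainId": "0x", "nonce": "0x07", "yParity": "0x"}]

print(check_auth_nonces([init_tx_auths, runtime_tx_auths], onchain_nonce=6))
```

A stale starting nonce (for example onchain_nonce=2 for this account) makes the check return False, which is the mismatch this PR addresses.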

From the opcode count file for this payload generation, the (SSTORE, SLOAD) opcode counts are (4049, 6), while the counts for the local fill-mode run are (4074, 23). I consider this difference reasonable, since the local fill-mode run also includes the system contract interaction.

I have manually reviewed the other parameter combinations for the test. The nonce values in the payloads match those on the live network.

test_sload_bloated

Configuration

  • fork: Osaka
  • existing_slots: False
  • write_new_value: False
  • token_name: 1GB
  • gas limit: 90M

See generated-stateful-tests-stateful-perf-devnet-3-23949903761 for the file test_single_opcode.py__test_sload_bloated[fork_Osaka-benchmark_test-existing_slots_False-token_name_1GB-benchmark_90M].txt (under the setup folder). I reviewed the nonce values using the same approach as for test_sstore_bloated and confirmed that they match those on the live network.

For the opcode count, the (SSTORE, SLOAD) pair of the payload is (41855, 6), while for local fill mode it is (41873, 30). I consider this within an acceptable range, since the latter includes the opcode counts for the system contract interaction.
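
For reference, the gap between the payload and local fill-mode counts can be computed as below. This is a small sketch using only the numbers reported in this description, kept in the (SSTORE, SLOAD) order in which they are reported; the OpcodeCounts type and count_delta helper are hypothetical names, and no acceptance threshold is implied.

```python
from typing import NamedTuple


class OpcodeCounts(NamedTuple):
    """Opcode counts in the (SSTORE, SLOAD) order used in this description."""

    sstore: int
    sload: int


def count_delta(payload: OpcodeCounts, local_fill: OpcodeCounts) -> OpcodeCounts:
    """Return how many more opcodes the local fill-mode run executed."""
    return OpcodeCounts(
        local_fill.sstore - payload.sstore,
        local_fill.sload - payload.sload,
    )


# (payload counts, local fill-mode counts) as reported above.
runs = {
    "test_sstore_bloated": (OpcodeCounts(4049, 6), OpcodeCounts(4074, 23)),
    "test_sload_bloated": (OpcodeCounts(41855, 6), OpcodeCounts(41873, 30)),
}

for name, (payload, local_fill) in runs.items():
    print(name, count_delta(payload, local_fill))
```

In both tests the local fill-mode run executes a few dozen extra opcodes at most, which is small relative to the main opcode count.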

To fill the test locally, use a command like the following:

uv run fill \
  -v \
  --clean \
  --evm-bin <evm-bin-path> \
  --gas-benchmark-values 90 \
  --fork Osaka \
  tests/benchmark/stateful/bloatnet/test_single_opcode.py::test_sstore_bloated \
  --address-stubs tests/benchmark/stateful/stubs/stubs_bloatnet.json \
  --rpc-endpoint <eth-rpc-endpoint>

Postmortem

Why do we need this new live_eth_rpc fixture? And why does the previous test run not fetch the nonce value from the network?

EELS has two test modes: execute-remote, which provides an eth_rpc fixture to interact with a live network, and fill, which has no such fixture.

The previous implementation declared eth_rpc: EthRPC | None = None as a test parameter, expecting execute-remote to inject the live EthRPC instance and fill mode to fall back to None.

...
def test_sload_bloated(
    benchmark_test: BenchmarkTestFiller,
    ...
    existing_slots: bool,
    eth_rpc: EthRPC | None = None, # fixture passed here
) -> None:
...

However, pytest treats a parameter with a default value as already satisfied and never overrides the default with the actual fixture. As a result, even in execute-remote mode, eth_rpc was always None and the on-chain nonce was never fetched.

The workaround introduces a new live_eth_rpc bridge fixture that internally calls request.getfixturevalue("eth_rpc") to look up the real fixture dynamically when it exists (execute-remote) and returns None when it does not (fill mode). Tests now declare live_eth_rpc without a default, forcing pytest to always resolve it.

That said, this is not an ideal long-term approach for the remaining test cases. We need a mechanism that takes a private key as input, interacts with the corresponding on-chain account, and automatically manages nonce increments, since handling this manually is fragile.
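
A bridge fixture of this kind could look roughly like the sketch below. It is not the code from this PR: the resolve_optional_fixture helper is a hypothetical name, and the sketch assumes that a missing eth_rpc fixture surfaces as pytest.FixtureLookupError, which is what request.getfixturevalue is documented to raise when a fixture cannot be found.

```python
import pytest


def resolve_optional_fixture(request, name: str):
    """Return the named fixture's value, or None if no such fixture exists."""
    try:
        # Looks the fixture up at run time instead of via the test signature.
        return request.getfixturevalue(name)
    except pytest.FixtureLookupError:
        # Assumed to be the fill-mode case: no eth_rpc fixture is registered.
        return None


@pytest.fixture
def live_eth_rpc(request):
    """Bridge fixture: the real eth_rpc in execute-remote, None in fill."""
    return resolve_optional_fixture(request, "eth_rpc")
```

A test then declares live_eth_rpc as a plain parameter with no default value, so pytest resolves it through the fixture on every run rather than silently keeping a None default.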

🔗 Related Issues or PRs

N/A.

✅ Checklist

  • All: Ran fast static checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    just static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start type(scope):.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.
  • Tests: For PRs implementing a missed test case, update the post-mortem document to add an entry the list.
  • Ported Tests: All converted JSON/YML tests from ethereum/tests or tests/static have been assigned @ported_from marker.

codecov bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 86.24%. Comparing base (4bf8bbe) to head (78a2094).

Additional details and impacted files
@@               Coverage Diff                @@
##           forks/amsterdam    #2617   +/-   ##
================================================
  Coverage            86.24%   86.24%           
================================================
  Files                  599      599           
  Lines                36984    36984           
  Branches              3795     3795           
================================================
  Hits                 31895    31895           
  Misses                4525     4525           
  Partials               564      564           
Flag Coverage Δ
unittests 86.24% <ø> (ø)

@jochem-brouwer
Member

I don't understand. Why does this seem to fix the issue?

@LouisTsai-Csie LouisTsai-Csie marked this pull request as ready for review April 6, 2026 07:27
@LouisTsai-Csie
Collaborator Author

@jochem-brouwer I've updated the PR description; please take a look at the Postmortem section. The main reason is that EthRPC was never injected into the test file, which breaks the framework.

@jochem-brouwer
Member

Cool. This makes sense! This is indeed an intermediate option to fix this now; we should look for a more robust option ASAP. Can we track this in an issue? It specifically concerns cases where fill and execute modes diverge. We could also mark these types of tests to be filled with execute by default, throwing a warning when we try to fill them with "fill".

@LouisTsai-Csie
Collaborator Author

LouisTsai-Csie commented Apr 6, 2026

@jochem-brouwer I have a PR for the refactor; please take a look if you are interested! With this approach, we could remove the EthRPC instance and rely only on the stub integration.

@jochem-brouwer
Member

jochem-brouwer commented Apr 9, 2026

If this run was generated from the changes in this PR (https://github.com/NethermindEth/gas-benchmarks/actions/runs/23949903761), then this benchmark now fills correctly. Or is this also fixed by #2624?
