
Test failure: restart-scenarios-if-test-none #1151

Open
heifner opened this issue Feb 7, 2025 · 3 comments

heifner commented Feb 7, 2025

https://github.com/AntelopeIO/spring/actions/runs/13206036299/job/36869762867

The test failed during initialization of the cluster, while it was setting up the initial transfers:

2025-02-07T18:59:17.188381    cmd: ['/__w/spring/spring/build/bin/cleos', '--url', 'http://localhost:8788', '--wallet-url', 'http://localhost:9899', '--no-auto-keosd', 'push', 'action', '-j', 'eosio.token', 'transfer', '{"from":"eosio","to":"defproduceru","quantity":"1000000.0000 SYS","memo":"init transfer"}', '--permission', 'eosio@active']
2025-02-07T18:59:17.431208       cmd returned transaction id: 3ad5e20af753a2e9627b0220ea13aaf3b2fa80d7df5423eed1df77f8fe85a0a6, status: executed, (possible) block num: 117 

The transaction executed speculatively and made it into a block on the producer. However, the producer forked out that block once it was no longer producing. Presumably the trx was then sitting in the unapplied trx queue waiting to be put into the next produced block, but the test timed out before that could happen.
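For context, the harness essentially pushes the transfer and then polls until the transaction shows up in a block; a fork that removes the block puts the transaction back into limbo until the next produced block, so the poll just keeps waiting until the overall timeout expires. A minimal sketch of such a wait loop (hypothetical helper names, not the actual TestHarness API):

```python
import time

def wait_for_trx_in_block(node, trx_id, timeout_sec=60, poll_sec=0.5):
    """Poll until trx_id is reported in a block; return the block number or None on timeout.

    Sketch only: `node.get_transaction_status(trx_id)` is a stand-in for whatever
    lookup the harness uses and is assumed to return a dict with 'state' and
    'block_number' keys (states roughly: LOCALLY_APPLIED, IN_BLOCK, IRREVERSIBLE,
    FORKED_OUT, UNKNOWN).
    """
    deadline = time.monotonic() + timeout_sec
    while time.monotonic() < deadline:
        status = node.get_transaction_status(trx_id)  # assumed helper
        # A forked-out block can move the state from IN_BLOCK back to
        # LOCALLY_APPLIED/FORKED_OUT, which is what happened in this failure:
        # the trx waited in the unapplied queue for the next produced block.
        if status and status.get("state") in ("IN_BLOCK", "IRREVERSIBLE"):
            return status.get("block_number")
        time.sleep(poll_sec)
    return None  # the test-level timeout seen in this issue
```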


heifner commented Feb 12, 2025

Seems like the same scenario: https://github.com/AntelopeIO/spring/actions/runs/13289996228/job/37108918288
Test: read-only-trx-basic-if-test

bhazzard commented

Trivial to change to use transaction retry.

Also worth investigating why a fork happened in this test.
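For reference, a sketch of what that change could look like in the harness's cmd-list form, assuming the cleos transaction retry options (`--retry-num-blocks` / `--retry-irreversible`) and a node configured with transaction retry enabled (non-zero `transaction-retry-max-storage-size-gb`):

```python
# Same transfer push as in the failing log above, but asking cleos to keep
# resubmitting the transaction until it is seen in a block within 10 blocks.
# Sketch only; assumes the target nodeos has transaction retry enabled.
cmd = [
    '/__w/spring/spring/build/bin/cleos',
    '--url', 'http://localhost:8788',
    '--wallet-url', 'http://localhost:9899',
    '--no-auto-keosd',
    'push', 'action', '-j',
    '--retry-num-blocks', '10',
    'eosio.token', 'transfer',
    '{"from":"eosio","to":"defproduceru","quantity":"1000000.0000 SYS","memo":"init transfer"}',
    '--permission', 'eosio@active',
]
```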

bhazzard added the bug and 👍 lgtm labels and removed the triage label on Feb 13, 2025
heifner self-assigned this on Feb 14, 2025
heifner added the OCI (Work exclusive to OCI team) label on Feb 14, 2025

heifner commented Feb 14, 2025

Why this test had a large fork

The producer received block 115 on time:

debug 2025-02-07T18:59:10.852 net-1     controller.cpp:4004           verify_basic_proper_ ] received block: #115 2025-02-07T18:59:11.000 defproducerp 00000073e992cf19ed990bdc2acaa6f232156db355b74adc88fbfa5faf515cd4, qc claim: {"block_num":114,"is_strong_qc":true}, qc present, previous: 00000072598f534b44cbc679c0e90bf5b217d040e6df161874bcc4c6a1f55537

The main thread was busy speculatively executing a trx that didn't fit.

debug 2025-02-07T18:59:10.866 nodeos    producer_plugin.cpp:2588      handle_push_result   ] [TRX_TRACE] Speculative execution COULD NOT FIT, elapsed 413162us, tx: 1550266ed14a61c298bb4272e9964b2567b33aa312f9735559629983ca62c4ef RETRYING

The main thread then immediately restarts a speculative block without yielding.

debug 2025-02-07T18:59:10.866 nodeos    producer_plugin.cpp:968       restart_speculative_ ] Restarting exhausted speculative block `115`

The block start was not interrupted because the received block's call into producer_plugin happened after the speculative block had started, and presumably before the received block made it into the forkdb.

debug 2025-02-07T18:59:11.161 net-1     net_plugin.cpp:3981           operator()           ] validated block header, forkdb add appended_to_head, broadcasting immediately, connection - 6, blk num = 115, id = 00000073e992cf19ed990bdc2acaa6f232156db355b74adc88fbfa5faf515cd4
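Laying the timestamps from those log lines side by side makes the ordering clear: the speculative block was restarted 14 ms after block 115 arrived on the net thread but roughly 300 ms before that block reached the forkdb, so there was nothing in the forkdb yet to interrupt the restart. A small sketch just to visualize that ordering (timestamps copied from the log above):

```python
from datetime import datetime

# Timestamps copied from the log lines above (all 2025-02-07, local node time).
events = [
    ("block 115 received on net-1",             "18:59:10.852"),
    ("trx COULD NOT FIT, requeued (main)",      "18:59:10.866"),
    ("speculative block 115 restarted (main)",  "18:59:10.866"),
    ("block 115 header validated / forkdb add", "18:59:11.161"),
]

parsed = [(name, datetime.strptime(ts, "%H:%M:%S.%f")) for name, ts in events]
for name, ts in sorted(parsed, key=lambda e: e[1]):
    print(ts.time(), name)

# The restart happened before the forkdb add, so at restart time the producer's
# best head was still block 114 and nothing interrupted start_block.
restart    = dict(parsed)["speculative block 115 restarted (main)"]
forkdb_add = dict(parsed)["block 115 header validated / forkdb add"]
assert restart < forkdb_add
```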

Meanwhile, processing of the trx that didn't fit before was still ongoing, and block 116 came in:

debug 2025-02-07T18:59:11.189 net-0     controller.cpp:4004           verify_basic_proper_ ] received block: #116 2025-02-07T18:59:11.500 defproducerp 0000007422cdedcfc5f0bdf1c1510b13d152c64869ba4819b06dd538eaa55021, qc claim: {"block_num":115,"is_strong_qc":true}, qc present, previous: 00000073e992cf19ed990bdc2acaa6f232156db355b74adc88fbfa5faf515cd4

The speculative execution of the trx finally finishes:

debug 2025-02-07T18:59:11.460 nodeos    producer_plugin.cpp:2496      log_trx_results      ] [TRX_TRACE] Speculative execution is ACCEPTING tx: {"expiration":"2025-02-07T18:59:39","ref_block_num":110,"ref_block_prefix":3284311498,"max_net_usage_words":0,"max_cpu_usage_ms":0,"delay_sec":0,"context_free_actions":[{"account":"eosio.null","name":"nonce","authorization":[],"data":{"size":8,"hex":"32dd1ff4912d0600"},"hex_data":{"size":8,"hex":"32dd1ff4912d0600"}}],"actions":[{"account":"eosio","name":"setcode","authorization":[{"actor":"eosio","permission":"active"}],"code_hash":"d6572d6104f3559d4515c746f8d54effa11417ce45b9cfc64231e6af75c6dacc","data":{"account":"eosio","vmtype":0,"vmversion":0,"code":{"size":201591,"trimmed_hex":"0061736d0100000001e7023460000060027f7f0060037f7f7f0060017e0060037f7f7f017f60027f7f017f60027f7f017e6000017e60047e7e7e7e017f60067e"}},"hex_data":{"size":201604,"trimmed_hex":"0000000000ea30550000f7a60c0061736d0100000001e7023460000060027f7f0060037f7f7f0060017e0060037f7f7f017f60027f7f017f60027f7f017e6000"}},{"account":"eosio","name":"setabi","authorization":[{"actor":"eosio","permission":"active"}],"data":{"account":"eosio","abi":{"size":7158,"trimmed_hex":"0e656f73696f3a3a6162692f312e32011d425f706169725f74696d655f706f696e745f7365635f696e7436345f4519706169725f74696d655f706f696e745f73"}},"hex_data":{"size":7168,"trimmed_hex":"0000000000ea3055f6370e656f73696f3a3a6162692f312e32011d425f706169725f74696d655f706f696e745f7365635f696e7436345f4519706169725f7469"}}]}
debug 2025-02-07T18:59:11.547 nodeos    producer_plugin.cpp:2498      log_trx_results      ] [TRX_TRACE] Speculative execution is ACCEPTING tx: {"id":"1550266ed14a61c298bb4272e9964b2567b33aa312f9735559629983ca62c4ef","block_num":115,"block_time":"2025-02-07T18:59:11.000","producer_block_id":null,"receipt":{"status":"executed","cpu_usage_us":463415,"net_usage_words":7092},"elapsed":463415,"net_usage":56736,"scheduled":false,"action_traces":[{"action_ordinal":1,"creator_action_ordinal":0,"closest_unnotified_ancestor_action_ordinal":0,"receipt":{"receiver":"eosio.null","act_digest":"766ea17a960cdfa730bae6d9912d434d22d4c5a26c3b76e9f13b26a3108f8bd5","global_sequence":180,"recv_sequence":4,"auth_sequence":[],"code_sequence":0,"abi_sequence":0},"receiver":"eosio.null","act":{"account":"eosio.null","name":"nonce","authorization":[],"data":{"size":8,"hex":"32dd1ff4912d0600"},"hex_data":{"size":8,"hex":"32dd1ff4912d0600"}},"context_free":true,"elapsed":14,"console":"","trx_id":"1550266ed14a61c298bb4272e9964b2567b33aa312f9735559629983ca62c4ef","block_num":115,"block_time":"2025-02-07T18:59:11.000","producer_block_id":null,"account_ram_deltas":[],"except":null,"error_code":null,"return_value_hex_data":""},{"action_ordinal":2,"creator_action_ordinal":0,"closest_unnotified_ancestor_action_ordinal":0,"receipt":{"receiver":"eosio","act_digest":"a1ab8c9473b77e17c3f4ac8a72760ddee357f9414a3f5db9bd9af8f9e3721367","global_sequence":181,"recv_sequence":175,"auth_sequence":[["eosio",174]],"code_sequence":3,"abi_sequence":2},"receiver":"eosio","act":{"account":"eosio","name":"setcode","authorization":[{"actor":"eosio","permission":"active"}],"code_hash":"d6572d6104f3559d4515c746f8d54effa11417ce45b9cfc64231e6af75c6dacc","data":{"account":"eosio","vmtype":0,"vmversion":0,"code":{"size":201591,"trimmed_hex":"0061736d0100000001e7023460000060027f7f0060037f7f7f0060017e0060037f7f7f017f60027f7f017f60027f7f017e6000017e60047e7e7e7e017f60067e"}},"hex_data":{"size":201604,"trimmed_hex":"0000000000ea30550000f7a60c0061736d0100000001e7023460000060027f7f0060037f7f7f0060017e0060037f7f7f017f60027f7f017f60027f7f017e6000"}},"context_free":false,"elapsed":462201,"console":"","trx_id":"1550266ed14a61c298bb4272e9964b2567b33aa312f9735559629983ca62c4ef","block_num":115,"block_time":"2025-02-07T18:59:11.000","producer_block_id":null,"account_ram_deltas":[{"account":"eosio","delta":1562800}],"except":null,"error_code":null,"return_value_hex_data":""},{"action_ordinal":3,"creator_action_ordinal":0,"closest_unnotified_ancestor_action_ordinal":0,"receipt":{"receiver":"eosio","act_digest":"4879f36cd87a31abd335b71894e138a146f4663c09737bd2ca18b5e97a16f1b5","global_sequence":182,"recv_sequence":176,"auth_sequence":[["eosio",175]],"code_sequence":3,"abi_sequence":3},"receiver":"eosio","act":{"account":"eosio","name":"setabi","authorization":[{"actor":"eosio","permission":"active"}],"data":{"account":"eosio","abi":{"size":7158,"trimmed_hex":"0e656f73696f3a3a6162692f312e32011d425f706169725f74696d655f706f696e745f7365635f696e7436345f4519706169725f74696d655f706f696e745f73"}},"hex_data":{"size":7168,"trimmed_hex":"0000000000ea3055f6370e656f73696f3a3a6162692f312e32011d425f706169725f74696d655f706f696e745f7365635f696e7436345f4519706169725f7469"}},"context_free":false,"elapsed":355,"console":"","trx_id":"1550266ed14a61c298bb4272e9964b2567b33aa312f9735559629983ca62c4ef","block_num":115,"block_time":"2025-02-07T18:59:11.000","producer_block_id":null,"account_ram_deltas":[{"account":"eosio","delta":4621}],"except":null,"error_code":null,"return_value_hex_data":
""}],"account_ram_delta":null,"except":null,"error_code":null}
debug 2025-02-07T18:59:11.547 nodeos    net_plugin.cpp:3892           operator()           ] chain accepted transaction, bcast 1550266ed14a61c298bb4272e9964b2567b33aa312f9735559629983ca62c4ef
debug 2025-02-07T18:59:11.547 nodeos    producer_plugin.cpp:2806      process_incoming_trx ] Processed 1 pending transactions, 0 left

Now it is time for this node to produce.

debug 2025-02-07T18:59:11.547 nodeos    producer_plugin.cpp:2869      schedule_production_ ] Speculative Block Created; Scheduling Speculative/Production Change
debug 2025-02-07T18:59:11.547 nodeos    producer_plugin.cpp:2923      schedule_delayed_pro ] Scheduling Speculative/Production Change at 2025-02-07T18:59:11.380

When we produce, we do not apply any outstanding blocks; we just start producing. This is so production is not delayed by processing late blocks. Since 114 is the last applied block, the producer produces its own block 115.

debug 2025-02-07T18:59:11.547 nodeos    producer_plugin.cpp:2231      start_block          ] Starting block #115 2025-02-07T18:59:12.000 producer defproducerq, deadline 2025-02-07T18:59:11.962
debug 2025-02-07T18:59:11.547 nodeos    controller.cpp:4625           clear_expired_input_ ] removed 0 expired transactions of the 14 input dedup list, pending block time 2025-02-07T18:59:12.000
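A highly simplified model of that policy (not the actual producer_plugin code): when the production slot arrives, build on the last applied block and deliberately skip draining queued received blocks first, which with the head still at 114 yields a competing 115.

```python
# Minimal sketch of the policy described above, under the assumption that block
# production should never wait on late/queued incoming blocks. This is a toy
# model, not the real producer_plugin/start_block logic.

def choose_block_to_produce(last_applied_block_num, queued_received_block_nums):
    # Queued blocks (the real 115 and 116 in this trace) are intentionally not
    # applied first, so production is not delayed by processing late blocks.
    return last_applied_block_num + 1

queued = [115, 116]                          # received and validated, but unapplied
print(choose_block_to_produce(114, queued))  # -> 115: a competing block, i.e. a fork
```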

This will of course cause a fork. Also, interestingly enough, liveness then fails on our own block.

debug 2025-02-07T18:59:12.059 chain-0   finalizer.cpp:40              decide_vote          ] liveness check failed, block 115 00000073f7ed960f6f5019ae5e92d86242385756c8e85a04e0a55435c64df067: 2025-02-07T18:59:10.500 <= 2025-02-07T18:59:10.500, fsi.lock 114 00000072598f534b44cbc679c0e90bf5b217d040e6df161874bcc4c6a1f55537, latest_qc_claim: {"block_num":114,"is_strong_qc":true}
debug 2025-02-07T18:59:12.059 chain-0   finalizer.cpp:105             decide_vote          ] block=115 00000073f7ed960f6f5019ae5e92d86242385756c8e85a04e0a55435c64df067, liveness=false, safety=true, can vote=true, voting=weak_vote, locked=114 00000072598f534b44cbc679c0e90bf5b217d040e6df161874bcc4c6a1f55537
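A minimal sketch of the two checks visible in those decide_vote lines, using the values from the log (simplified; the real finalizer.cpp logic has additional conditions, particularly around the strong vs weak choice, so the last line only mirrors what this trace shows):

```python
from datetime import datetime

def ts(s):
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f")

# Values from the decide_vote log lines above.
latest_qc_block_timestamp = ts("2025-02-07T18:59:10.500")   # QC claim is on block 114
lock_timestamp            = ts("2025-02-07T18:59:10.500")   # fsi.lock is block 114
extends_locked_block      = True    # our own 115 builds directly on the lock (114)

liveness = latest_qc_block_timestamp > lock_timestamp        # False: 10.500 <= 10.500
safety   = extends_locked_block                              # True
can_vote = liveness or safety                                # True
voting   = "strong_vote" if liveness else "weak_vote"        # weak_vote, as in this trace

print(f"liveness={liveness}, safety={safety}, can vote={can_vote}, voting={voting}")
```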

This continues as the producer tries to produce a full 60 blocks off of a block that neither it nor anyone else can lock on. It is 60 blocks instead of just 12 because the node has five configured producers (defproducerq, defproducerr, defproducers, defproducert, defproduceru), i.e. 5 producers × 12 blocks per round. It actually stops producing before that because it stops getting votes:

error 2025-02-07T18:59:18.920 nodeos    producer_plugin.cpp:2088      determine_pending_bl ] Not producing block because no recent votes received from others, block 2025-02-07T18:59:19.500, other votes 2025-02-07T18:59:11.510, conflict block 2025-02-07T18:59:12.907
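Reading the three timestamps in that log line, the guard appears to be roughly: do not keep producing if no votes from other finalizers have arrived since a conflicting block was seen. A speculative sketch of that comparison (my reading of the log fields, not the actual producer_plugin condition):

```python
from datetime import datetime

def ts(s):
    return datetime.strptime(s, "%Y-%m-%dT%H:%M:%S.%f")

# Fields from the "Not producing block" log line above.
pending_block_time   = ts("2025-02-07T18:59:19.500")
last_other_vote_time = ts("2025-02-07T18:59:11.510")  # last vote received from others
conflict_block_time  = ts("2025-02-07T18:59:12.907")  # a conflicting block seen at this time

# Speculative reading: only keep producing if others have voted more recently than
# the conflicting block; otherwise pause and let fork resolution catch up.
keep_producing = last_other_vote_time > conflict_block_time
print("keep producing" if keep_producing else "pause production")   # -> pause production
```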

Once it stops producing, it can then process the queued blocks and do a fork switch.

info  2025-02-07T18:59:18.920 nodeos    controller.cpp:4403           operator()           ] switching forks from 00000081a573d6c067b95ab98fde8cdfdba1f81c6cd9e23f1d4e68b89ef34ee0 (block number 129) [ latest_qc_block_timestamp: 2025-02-07T18:59:10.500, timestamp: 2025-02-07T18:59:19.000, id: 00000081a573d6c067b95ab98fde8cdfdba1f81c6cd9e23f1d4e68b89ef34ee0 ] to 0000007422cdedcfc5f0bdf1c1510b13d152c64869ba4819b06dd538eaa55021 (block number 116) [ latest_qc_block_timestamp: 2025-02-07T18:59:11.000, timestamp: 2025-02-07T18:59:11.500, id: 0000007422cdedcfc5f0bdf1c1510b13d152c64869ba4819b06dd538eaa55021 ]
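That switch also illustrates the fork-choice criterion: the branch whose head claims the most recent QC wins, which is why the node abandons its own longer branch (head 129, latest QC timestamp 18:59:10.500) for the much shorter real chain (head 116, latest QC timestamp 18:59:11.000). A simplified sketch of that comparison with the values from the log (an assumed simplification of the real fork_database ordering, which has further tie-breakers):

```python
# Head blocks of the two competing branches, values from the "switching forks" log.
own_branch  = {"block_num": 129,
               "latest_qc_block_timestamp": "2025-02-07T18:59:10.500",
               "timestamp":                 "2025-02-07T18:59:19.000"}
real_branch = {"block_num": 116,
               "latest_qc_block_timestamp": "2025-02-07T18:59:11.000",
               "timestamp":                 "2025-02-07T18:59:11.500"}

def fork_preference(head):
    # Simplified: prefer the most recent QC claim, then the most recent block time.
    # (ISO timestamps in an identical format compare correctly as strings.)
    return (head["latest_qc_block_timestamp"], head["timestamp"])

best = max(own_branch, real_branch, key=fork_preference)
print("best fork head:", best["block_num"])   # -> 116, despite the lower block number
```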

heifner added a commit that referenced this issue on Feb 14, 2025: …n integration tests to match eos mainnet value