Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix extremely slow tests #403

Open
anorth opened this issue May 25, 2022 · 18 comments
Open

Fix extremely slow tests #403

anorth opened this issue May 25, 2022 · 18 comments
Labels
good first issue Good for newcomers help wanted Extra attention is needed P2 testing

Comments

@anorth
Copy link
Member

anorth commented May 25, 2022

Some of the tests ported from specs-actors run very slowly, taking minutes. This is too slow for a dev cycle except as a final pre-PR or CI run. I haven't investigated why they're slow. I would expect that almost all unit tests are very fast, though there may be some scenario tests that are slower.

  • Speed up tests where possible (better faking for epochs passing?)
  • Put necessarily slow tests behind an opt-in flag (and ideally a Makefile target that's easy to remember)

Timing information on my laptop are here (download the file to view in browser).

@vmx
Copy link
Contributor

vmx commented May 25, 2022

One workaround is to run tests in release mode cargo test --release.

@LesnyRumcajs
Copy link
Contributor

LesnyRumcajs commented May 26, 2022

@vmx It's a good idea but I'd be careful with adding only --release flag. It disables a bunch of useful assertions and checks (e.g. overflow). We could go ahead with customizing the test profile though, for example putting

[profile.test]
opt-level = 2

By the way, could you share how did you obtain such nicely formatted results?

@vmx
Copy link
Contributor

vmx commented May 27, 2022

@vmx It's a good idea but I'd be careful with adding only --release flag. It disables a bunch of useful assertions and checks (e.g. overflow).

Yes, one has to be careful. Though I guess CI still runs in non-release mode, so those errors would be caught there.

@arajasek
Copy link
Contributor

Yeah, this needs to be actioned on -- I think one immediate action might be to see if the VM can be passed by reference in the itest harness.

@LesnyRumcajs
Copy link
Contributor

I've just checked, for the moment (at least on my machine) adding optimization flags will not significantly cut the testing stage time because of the increased compilation overhead.

Clean build & test with opt-level = 1 takes ~240s on my machine while regular debug build & test takes 280s. opt-level = 2 takes 286s.

@Stebalien
Copy link
Member

This is starting to get really annoying. CI can take 30m. One solution would be to run different test types in different CI jobs so we can get some parallelism.

Yeah, this needs to be actioned on -- I think one immediate action might be to see if the VM can be passed by reference in the itest harness.

I worked with @arajasek on this, but it didn't help. We'll probably need to profile a bit to figure out what's actually slowing us down here.

@anorth
Copy link
Member Author

anorth commented Aug 11, 2022

From @Kubuxu (#493)

Here is a list of slowest tests:

"extend_sector_with_deals",408.601976852
"sector_assignment::assign_sectors_to_deadlines",107.120695873
"deal_starts_on_day_boundary",85.341671754
"overdue_precommit",54.788396499
"aggregate_one_precommit_expires",54.227713798
"deal_is_correctly_processed_if_first_cron_after_expiry",39.801908073999996
"all_payments_are_made_for_a_deal_client_withdraws_collateral_and_client_account_is_removed",39.76637831
"regular_payments_till_deal_expires_and_then_locked_funds_are_unlocked",31.483729491
"expired_deal_should_unlock_the_remaining_client_and_provider_locked_balance_after_payment_and_deal_should_be_deleted",31.43561346
"payment_for_a_deal_if_deal_is_already_expired_before_a_cron_tick",30.252185392
"locked_fund_tracking_states",29.044090355
"regular_payments_till_deal_expires_and_then_we_attempt_to_slash_it_but_it_will_not_be_slashed",27.649656669
"sector_number_allocation::compaction_with_mask",27.548936622
"deal_is_slashed_at_the_end_epoch_should_not_be_slashed_and_should_be_considered_expired",27.483643367
"cron_enrolls_on_precommit_prove_commits_and_continues_enrolling",15.796063935
"deals_are_scheduled_for_expiry_later_than_the_end_epoch",15.711722069
"deal_is_processed_after_its_end_epoch_should_expire_correctly",15.59722642
"terminate_valid_deals_along_with_expired_and_cleaned_up_deal",15.436974767
"prove_replica_update_multi_dline",14.614798868
"deal_starts_partway_through_day",14.267301481
"replica_update_full_path_success",9.688842753
"unhealthy_sector_failure",7.010497463
"logic::tests::test_baseline_reward_growth",5.832560179
"upgrade_and_miss_post",5.727294328
"bad_post_upgrade_dispute",5.108202526
"immutable_deadline_failure",4.865702177
"terminate_sectors",4.352078482
"upgrade_bad_post_dispute",3.818503731
"deal_included_in_multiple_sectors_failure",1.839123565
"cron_enrolls_on_precommit_expires_on_pcd_expiration_re_enrolls_on_new_precommit_after_falling_out_of_date",1.821081602
"enroll_pcd_expire_re_enroll_x_3",1.809942304
"terminated_sector_failure",1.66520403
"terminate_after_upgrade",1.625922842
"wrong_deadline_index_failure",1.579381272
"no_dispute_after_upgrade",1.574299084
"replica_update_simple_path_success",1.573051465
"extend_after_upgrade",1.565043908
"aggregate_size_limits",1.564398283
"can_dispute_test_after_proving_period_changes",1.554922924
"wrong_partition_index_failure",1.553842959
"bad_batch_size_failure",1.5160462689999998
"miner_actor_test_partitions::test_max_sectors",1.497599348
"submit_post_succeeds",1.23869197
"skipping_a_fault_from_the_wrong_partition_is_an_error",1.194715234
"skip_sector",1.146965282
"duplicate_proof_rejected_with_many_partitions",1.117239538
"flipped_signs",1.008091481
"missed_first_post_deadline",0.917265732
"cron_enrolls_on_precommit_expires_on_pcd_expiration_re_enrolls_on_new_precommit_immediately",0.905557911
"batch_onboarding",0.847320299
"deal_is_slashed",0.809300361
"regular_payments_till_deal_is_slashed_and_then_slashing_is_processed",0.660549269
"fail_when_deal_update_epoch_is_in_the_future",0.463788053
"psd_duplicate_deal_in_state",0.43923798
"deal_payment_and_slashing_correctly_processed_in_same_crontick",0.428351955
"psd_all_deals_are_good",0.425141336
"bad_post_fails_when_verified",0.407585023
"psd_duplicate_deal_in_batch",0.406242259
"psd_not_enough_provider_lockup_for_batch",0.403334628
"psd_mistmatched_provider",0.39482613
"psd_bad_piece_size",0.384623666
"psd_client_address_cannot_be_resolved",0.382464756
"psd_no_client_lockup",0.381657817
"psd_all_deals_are_bad",0.376180503
"aggregate_bad_sector_number",0.360074666
"values_in_range",0.351792002
"psd_verified_deal_fails_getting_datacap",0.347774531
"aggregate_bad_sender",0.320253617
"fail_when_deal_is_activated_but_proposal_is_not_found",0.312669436
"slash_multiple_deals_in_the_same_epoch",0.299785649
"test_delete_self",0.291632451
"skipped_recoveries_are_penalized_and_do_not_recover_power",0.29039561
"psd_random_assortment_of_failures",0.289096722
"slash_a_deal_and_make_payment_for_another_deal_in_the_same_epoch",0.283034494
"detects_and_penalizes_faults",0.27317561
"skipped_faults_adjust_power",0.270810576
"successful_recoveries_recover_power",0.259472953
"psd_not_enought_client_lockup_for_batch",0.254026276
"internal_tests::test_assign_proving_period_boundary",0.246418096
"psd_start_time_in_past",0.234921258
"compacting_a_partition_with_both_live_and_dead_sectors_removes_dead_sectors_retains_live_sectors",0.195819283
"submit_porep_for_bulk_verify_tests::aborts_when_too_many_poreps",0.176729618
"cron_processing_happens_at_processing_epoch_not_start_epoch",0.167896989
"timed_out_and_verified_deals_are_slashed_deleted_and_sent_to_the_registry_actor",0.15582935
"test_cron_run_trigger_faults",0.151840195
"declare_fault_pays_fee_at_window_post",0.14969636
"timed_out_deal_is_slashed_and_deleted",0.148827262
"miner_actor_precommit_batch::max_sectors",0.135992206
"miner_tests::withdraw_all_funds",0.1198387
"test_cron_tick",0.117247472
"fail_to_compact_partitions_with_faults",0.113431358
"miner_tests::withdraw_0",0.112585182
"swap_self_2_of_3",0.109880236
"remove_datacap_simple_successful_path",0.109312704
"miner_tests::withdraw_as_much_as_possible",0.104859643
"both_positive_velocity",0.097876937
"miner_tests::withdraw_from_non_owner_address_fails",0.089522658
"cron_processing_of_deal_after_missed_activation_should_fail_and_slash",0.084379711
"swap_self_1_of_2",0.08121591
"test_proposal_hash",0.076959189
"market_tests::withdraw_as_much_as_possible",0.076596128
"logic::tests::test_baseline_reward",0.069829036
"market_tests::withdraw_0",0.067188339
"publish_multiple_deals_for_different_clients_and_ensure_balances_are_correct",0.065155427
"updates_many_sectors",0.060939021
"active_deals_multiple_times_with_different_providers",0.057608943
"recovery_must_pay_back_fee_debt",0.057047111
"prove_commit_just_after_period_start_permits_post",0.056381252
"skipping_all_sectors_in_a_partition_rejected",0.056113036
"test_sent",0.055694886
"supports_extensions_off_deadline_boundary",0.055249106
"market_tests::withdraw_all_funds",0.054921439
"cannot_terminate_a_sector_when_the_challenge_window_is_open",0.053886462
"fails_to_compact_partitions_with_unproven_sectors",0.051425514
"create_miner_test",0.047144058
"logic::tests::test_simple_reward",0.046215533
"miner_actor_precommit_batch::thirty_two_sectors",0.037139229
"basic_post_and_dispute",0.03700262
"compute_data_commitment::fail_whole_call_when_one_commitment_fails_syscall",0.036692618
"miner_actor_test_commitment::fails_with_too_many_deals",0.036385177
"compute_data_commitment::successfully_compute_cid",0.03636072
"terminate_multiple_deals_from_multiple_providers",0.036242004
"duplicate_proof_rejected",0.033520295
"post_with_unproven_faults_recoveries_untracted_recoveries",0.032581953
"compute_data_commitment::success_with_multiple_sector_commitments",0.031248236
"retract_recoveries",0.030125561
"prove_commit_aborts_if_pledge_requirement_not_met",0.027909083
"removes_sector_with_correct_accounting",0.027601379
"post_all_the_things",0.027241568
"valid_precommits_then_aggregate_provecommit",0.026959774
"sector_expires_and_repays_fee_debt",0.026423234
"invalid_submissions",0.025706291
"test_vesting",0.025028847
"sector_with_non_positive_lifetime_is_skipped_in_confirmation",0.024431773
"pops_early_terminations",0.023968015
"prove_sectors_from_batch_pre_commit",0.02380283
"post_with_skipped_unproven",0.023802528
"removing_no_partitions_does_nothing",0.023793659
"updates_expiration_with_valid_params",0.023618485
"recovery_happy_path",0.023595964
"sector_expires",0.023523625
"removes_partitions",0.023315335
"rejects_extension_past_max_for_seal_proof",0.023313743
"cannot_remove_missing_partition",0.023166014
"faulty_sectors_expire",0.023065424
"recovery_fails_during_active_consensus_fault",0.02292687
"rejects_extension_too_far_in_future",0.022611978
"terminate_proven_and_faulty",0.022534712
"rejects_negative_extensions",0.022267024
"drop_invalid_prove_commit_while_processing_valid_one",0.021947018
"can_pop_early_terminations_in_multiple_steps",0.021898679
"can_dispute_up_till_window_end_but_not_after",0.021844348
"compact_sector_numbers_test::one_of_the_control_addresses_can_also_compact_sectors",0.021815652
"compact_sector_numbers_test::fail_if_caller_is_not_among_caller_worker_or_control_addresses",0.021550243
"invalid_proof_rejected",0.021485488
"terminate_valid_deals_along_with_just_expired_deal",0.021335656
"terminating_new_deals_and_an_already_terminated_deal_only_terminates_the_new_deals",0.020584189
"publish_a_deal_after_activating_a_previous_deal_which_has_a_start_epoch_far_in_the_future",0.019467945
"compact_sector_numbers_test::compact_sector_numbers_then_pre_commit",0.019385706
"cannot_dispute_posts_when_the_challenge_window_is_open",0.01916853
"compute_data_commitment::fail_whole_call_when_one_deal_proposal_of_one_sector_is_absent",0.018845698
"fail_to_activate_all_deals_if_one_deal_fails",0.01877438
"cannot_declare_faults_recovered_in_missing_partitions",0.017621862
"fail_when_deal_has_already_been_activated",0.017393475
"terminate_unproven_and_faulty",0.017157675
"compute_data_commitment::fail_when_syscall_returns_an_error",0.017106803
"marks_faulty",0.016783825
"simple_deal",0.016434714
"funds_vest",0.015771024
"balance_after_withdrawal_must_always_be_greater_than_or_equal_to_locked_amount",0.015595524
"penalty_is_partially_burnt_and_stored_as_fee_debt",0.015582614
"terminates_sectors",0.0150607
"compact_sector_numbers_test::owner_can_also_compact_sectors",0.015031347
"fail_when_caller_is_not_the_provider_of_the_deal",0.014589711
"successfully_check_sector_is_proven",0.014573504
"do_not_terminate_deal_if_end_epoch_is_equal_to_or_less_than_current_epoch",0.014516684
"worker_balance_after_withdrawal_must_account_for_slashed_funds",0.014432014
"compaction_should_be_allowed_following_the_dispute_window",0.014101064
"deadline_after_next_deadline_should_still_be_open_for_compaction",0.013983766
"fails_if_new_worker_address_does_not_have_a_code",0.013933126
"post_missing_partition",0.013858885
"miner_actor_test_commitment::invalid_pre_commit_rejected",0.013811814
"fails_if_worker_public_key_is_not_bls_but_secp",0.013708832
"change_cannot_be_overridden",0.013633052
"fails_if_unable_to_resolve_control_address",0.013421356
"crontick_for_a_deal_at_its_start_epoch_results_in_zero_payment_and_no_slashing",0.013206507
"miner_actor_test_commitment::precommit_checks_seal_proof_version",0.013156523
"cannot_declare_faults_in_missing_partitions",0.013133594
"post_partition_twice",0.013059723
"deadlines_challenged_last_proving_period_should_still_be_in_dispute_window",0.013010106
"miner_actor_precommit_batch::too_many_sectors",0.013007114
"only_nominee_can_confirm",0.012961733
"only_owner_can_change_proposal",0.012822609
"miner_actor_test_commitment::insufficient_funds_for_pre_commit",0.012813008
"adds_sectors_and_proves",0.012764153
"compaction_should_be_forbidden_during_the_dispute_window",0.012744799
"test_can_clear_peer_id",0.012712616
"publishing_timed_out_deal_again_should_work_after_cron_tick_as_it_should_no_longer_be_pending",0.012660165
"can_set_clear_multiaddrs",0.012583418
"can_set_multiaddrs",0.012470317```

@anorth anorth mentioned this issue Aug 11, 2022
@anorth anorth moved this to Todo in Network nv17 Aug 11, 2022
@anorth anorth moved this from Todo / In Scope to Opportunistic in Network nv17 Aug 12, 2022
@anorth anorth changed the title Some tests are very slow Fix extremely slow tests Aug 12, 2022
@lyswifter
Copy link
Contributor

lyswifter commented Aug 18, 2022

@anorth What I figure out with some logs for the slowest test function gives two aspect reasons to show:

  1. each EpochTick(the CronMethod) took too long time
    image

  2. there exists too many epochs to tigger the cron method, this caused each of the outer advance_by_deadline function took more than 1.5s.
    image

As its hardly to speed up the cron method, maybe is there any way to reduce the loop times with no effect to the testing purpose.

@jennijuju jennijuju moved this from Opportunistic to In Progress in Network nv17 Aug 18, 2022
@Stebalien
Copy link
Member

maybe is there any way to reduce the loop times with no effect to the testing purpose.

Unfortunately, that's really tricky. We might be able to just skip a bunch of cron ticks, but the system doesn't expect that and it probably won't work.

@Kubuxu
Copy link
Contributor

Kubuxu commented Aug 19, 2022

Unfortunately, that's really tricky. We might be able to just skip a bunch of cron ticks, but the system doesn't expect that and it probably won't work.

Don't we implement cron catchup? If so it should be safe enough to skip in increments of 15-20 epochs or maybe even more.

@anorth
Copy link
Member Author

anorth commented Aug 21, 2022

Don't we implement cron catchup?

Some of the code may rely on the every-epoch behaviour. I know I advocated for simplifying to do so once that behaviour was reliable.

@jennijuju
Copy link
Member

@Kubuxu / @anorth
should we give _If so it should be safe enough to skip in increments of 15-20 epochs _ a try?

@Stebalien
Copy link
Member

I don't think skipping epochs is a viable approach, but we may be able to at least optimize this behavior.

@Stebalien
Copy link
Member

We could also look into:

  1. Making the specific tests take fewer epochs. They can likely be optimized.
  2. Running things in parallel.

@Stebalien Stebalien added P0 and removed P2 labels Aug 29, 2022
@Kubuxu
Copy link
Contributor

Kubuxu commented Aug 29, 2022

I was able to save 40% with #585 but I still think there is something wrong with the test.

Edit: I got another 32% (total 56% improvement) with an additional commit.

@Kubuxu
Copy link
Contributor

Kubuxu commented Aug 29, 2022

I'm attaching a flamegraph of the extend_sector_with_deals after #585

flamegraph.svg.zip

@Kubuxu Kubuxu removed their assignment Aug 30, 2022
@Kubuxu
Copy link
Contributor

Kubuxu commented Aug 30, 2022

@jennijuju I wasn't planning to work on it much more right now. Someone still needs to analyse why the test is so slow, there has to be another underlying reason.

Also, we already skip across epochs, we call cron at the end of the deadline and then at the start of the deadline.

@jennijuju jennijuju moved this from In Progress to Opportunistic in Network nv17 Sep 8, 2022
@jennijuju jennijuju added P2 and removed P0 labels Nov 25, 2022
@jennijuju jennijuju removed this from Network nv17 Nov 25, 2022
@jennijuju jennijuju moved this to 📋 Backlog in Network v18 Nov 25, 2022
@anorth anorth added testing and removed area/test labels Mar 30, 2023
@rjan90 rjan90 added this to FilOz Sep 17, 2024
@github-project-automation github-project-automation bot moved this to 📌 Triage in FilOz Sep 17, 2024
@rjan90 rjan90 moved this from 📌 Triage to 🐱 Todo in FilOz Sep 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue Good for newcomers help wanted Extra attention is needed P2 testing
Projects
Status: 🐱 Todo
Status: 📋 Nice To Haves
Development

No branches or pull requests

8 participants