CCIP-3461: Optimize mainnet soak test #1458
Hey @matYang @kalverra @AnieeG @emate @andrevmatos @mateusz-sekara, could you all please take a look at this PR and share your feedback?
@@ -133,8 +135,8 @@ jobs:
      matrix:
        config: [mainnet.toml]
    needs: [ build-chainlink, build-test-image ]
Do we need to keep any of the load test options around if this is fully converting to just smoke? Is this used for other purposes?
Also, if it's only a smoke test now, it might make sense to run this directly in a GitHub Action instead of on the remote runner. That would save a lot of time by dropping the step that builds the test image, and there would be no need for a K8s env either.
As discussed, I left that as is so the capability stays available and we can run it whenever we need to.
]

BiDirectionalLane = true
- PhaseTimeout = '20m'
+ PhaseTimeout = '40m'
Why?
The phase timeout varies widely between lanes. I think we need a better solution that defines it per lane. The 20m timeout works for the fastest lanes but is too short for most of the others. Even with this 40m update, there may still be lanes that take longer than that!
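One possible shape for a per-lane timeout is a lookup with a default fallback. The sketch below is only an illustration of that idea: `phase_timeout`, `LANE_PHASE_TIMEOUTS`, the chain names, and the override durations are all hypothetical placeholders, not existing config keys or measured values.

```python
from datetime import timedelta

# Assumption: 40m (the value set in this PR) serves as the fallback.
DEFAULT_PHASE_TIMEOUT = timedelta(minutes=40)

# Hypothetical per-lane overrides keyed by (source, destination).
# Chain names and durations are placeholders, not measured values.
LANE_PHASE_TIMEOUTS = {
    ("chain-a", "chain-b"): timedelta(minutes=20),
    ("chain-b", "chain-c"): timedelta(minutes=90),
}


def phase_timeout(source: str, dest: str) -> timedelta:
    """Return the lane-specific timeout, or the default if none is set."""
    return LANE_PHASE_TIMEOUTS.get((source, dest), DEFAULT_PHASE_TIMEOUT)


print(phase_timeout("chain-b", "chain-c"))  # slow lane override
print(phase_timeout("chain-x", "chain-y"))  # falls back to the default
```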
Need to report those lanes.
Why not update the cron schedule as well?
I think we can keep this as is and can reduce it if needed.
Quality Gate passed
Motivation
Optimize the resource and fund consumption of the mainnet soak test. Currently, the mainnet soak test runs every six hours in Kubernetes, executing one CCIP transaction per hour for a duration of five hours across 28 bidirectional lanes. This setup is designed to provide consistent observability data on the CCIP mainnet, allowing us to differentiate between service outages and quiet periods. However, these tests are inefficient, consuming Kubernetes resources and significant mainnet funds.
https://smartcontract-it.atlassian.net/browse/CCIP-3461
Ideas:
A discussion thread was started with the o11y team, and it was decided that creating a tx every hour is not required; instead, create one tx if there has been no activity in the last 24h.
Convert to a smoke test, as we plan to fire only one request instead of running a soak test.
Converting to smoke will alleviate the K8s resource consumption, as the test will run on a GitHub runner.
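The 24-hour inactivity rule from the first idea can be sketched roughly as follows. This is a minimal illustration, not the test's actual implementation: `should_send_tx`, `last_activity`, and `QUIET_PERIOD` are hypothetical names, and the last-activity timestamp per lane is assumed to come from the observability data.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the quiet period agreed with the o11y team is 24 hours.
QUIET_PERIOD = timedelta(hours=24)


def should_send_tx(last_activity: Optional[datetime], now: datetime) -> bool:
    """Return True when a lane has seen no activity within QUIET_PERIOD.

    `last_activity` is the timestamp of the most recent transaction seen
    on the lane, or None if nothing has been observed at all.
    """
    if last_activity is None:
        return True
    return now - last_activity >= QUIET_PERIOD


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
print(should_send_tx(now - timedelta(hours=30), now))  # True: quiet lane
print(should_send_tx(now - timedelta(hours=3), now))   # False: busy lane
```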
Solution
Key outcomes:
Present transaction count: 1 tx * (28 * 2) lanes * 24 hrs = 1344
After this change: 1 tx * (49 * 2) lanes = 98, a reduction of roughly 92.7%, while covering 42 additional lanes.
As per the last cost analysis, we spend around 84k per quarter.
I expect a fund reduction close to 90%, which would give savings close to 300k annually.
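The arithmetic behind these figures can be checked with a few lines. The inputs are the numbers quoted above; the variable names are illustrative, and the 90% fund reduction is the stated expectation rather than a measured result.

```python
# Figures taken from the description above.
LANES_BEFORE = 28           # bidirectional lanes covered today
LANES_AFTER = 49            # bidirectional lanes covered after the change
HOURS = 24                  # one tx per hour in the present count
QUARTERLY_SPEND = 84_000    # approximate spend per quarter
EXPECTED_REDUCTION = 0.90   # expected fund reduction (an estimate)

tx_before = 1 * (LANES_BEFORE * 2) * HOURS
tx_after = 1 * (LANES_AFTER * 2)
tx_reduction = 1 - tx_after / tx_before
extra_lanes = (LANES_AFTER - LANES_BEFORE) * 2
annual_saving = QUARTERLY_SPEND * 4 * EXPECTED_REDUCTION

print(tx_before)                 # 1344
print(tx_after)                  # 98
print(f"{tx_reduction:.1%}")     # 92.7%
print(extra_lanes)               # 42
print(f"{annual_saving:,.0f}")   # 302,400
```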