Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CON-11819 spread sync of load balancers #802

Merged
merged 6 commits into from
Feb 14, 2025

Conversation

llDrLove
Copy link
Contributor

@llDrLove llDrLove commented Feb 14, 2025

Spread the sync period of the loadbalancer unevenly to prevent a spike of loadbalancer API calls at the same time. We saw an issue when multiple clusters see their maintenance window happening at the same time and the CCM component is restarted at the same time. The sync tag period will match closely for all those CCM pods and all of those clusters will make a LIST api calls relatively close to each other resulting in spikes each 15minutes. This PR attempts to mitigate this issue by adding some randomness to the sync period (5minutes) as an initial delay. Thus, the new sync period will remain 15 minutes but the subsequent sync tag will be off between 0 to 300 seconds.

@llDrLove llDrLove self-assigned this Feb 14, 2025
@llDrLove llDrLove changed the title Con 11819 spread sync of load balancers CON-11819 spread sync of load balancers Feb 14, 2025
@@ -85,6 +86,14 @@ func (s *tickerSyncer) Sync(name string, period time.Duration, stopCh <-chan str
klog.Errorf("%s failed: %s", name, err)
}

initialDelayTicker := time.NewTicker(initialDelay)
defer initialDelayTicker.Stop()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) could use time.After() https://pkg.go.dev/time#After instead of ticker

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using time.After() instead. Thanks

@@ -126,7 +135,7 @@ func (r *ResourcesController) Run(stopCh <-chan struct{}) {
klog.Info("No cluster ID configured -- skipping cluster dependent syncers.")
return
}
go r.syncer.Sync("tags syncer", controllerSyncTagsPeriod, stopCh, r.syncTags)
go r.syncer.Sync("tags syncer", controllerSyncTagsPeriod, time.Second*time.Duration(rand.Int31n(300)), stopCh, r.syncTags)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that it's initial delay, maybe let's increase the spread to ~10 min?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bumped to 600 (10minutes)

@llDrLove llDrLove merged commit ff62883 into master Feb 14, 2025
3 checks passed
@llDrLove llDrLove deleted the CON-11819-spread-sync-of-load-balancers branch February 14, 2025 18:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants