
clustering traffic admission control #2970

Open · thampiotr wants to merge 31 commits into main from thampiotr/clustering-traffic-admission-control
Conversation

@thampiotr (Contributor) commented Mar 12, 2025

PR Description

Added the --cluster.wait-for-size and --cluster.wait-timeout flags, which allow specifying the minimum cluster size required before components that use clustering begin processing traffic, so that adequate cluster capacity is available.
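
For illustration, a hypothetical invocation (a sketch only: the config path and flag values are placeholders, and --cluster.enabled=true is assumed as the usual prerequisite for clustered deployments):

```shell
alloy run /etc/alloy/config.alloy \
  --cluster.enabled=true \
  --cluster.wait-for-size=3 \
  --cluster.wait-timeout=5m
```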

Extended the existing tests, including the e2e tests added previously.

Which issue(s) this PR fixes

Fixes #201

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated
  • Documentation added
  • Tests updated
  • Config converters updated

@thampiotr thampiotr force-pushed the thampiotr/clustering-traffic-admission-control branch from 76ddbff to 2bb4e5b Compare March 26, 2025 15:13

@thampiotr (author) commented:

I want to rename this file to cluster.go and the existing cluster.go to service.go, as that makes more sense, but it messes up the diff a lot, so I will leave this for later.

@thampiotr thampiotr marked this pull request as ready for review March 27, 2025 15:15
@thampiotr thampiotr requested review from clayton-cornell and a team as code owners March 27, 2025 15:15
@thampiotr (author) left a comment:

It's ready for feedback; I have just a few small things to address in the meantime.

@dehaansa (Contributor) commented:

In cases with very large k8s clusters, for example, where discovery is immense and costly, I think we would want to wait to even do discovery until after the cluster has converged.

Do you agree? Should there be an additional option to wait for the cluster to converge before doing any work?

@thampiotr (author) commented Mar 31, 2025

In cases with very large k8s clusters, for example, where discovery is immense and costly, I think we would want to wait to even do discovery until after the cluster has converged.

Do you agree? Should there be an additional option to wait for the cluster to converge before doing any work?

Discovery is the same for each instance, regardless of the number of instances in the cluster, so stopping them from doing work will not improve things. Every instance needs to be able to handle the entire cluster's discovery in the current architecture. This PR follows the design that I shared here, where we explicitly said that:

we scope this behaviour only to components that support clustering. Other components will run as usual.

I think scaling discovery is a separate problem to address (some plans for what to do are here), and once we have it sharded in some way, we can definitely include the minimum cluster size requirement in the future.

@thampiotr (author) commented:

I will still need to add the following, as per the design:

  • Debug info should ideally explain the situation to the user in the UI.
  • The clustering overview dashboard should show clearly when the minimum size is not met. May need to add metrics for it.

I would like to do this in a follow-up PR.

@thampiotr thampiotr force-pushed the thampiotr/clustering-traffic-admission-control branch from fa2747c to 50df532 Compare March 31, 2025 12:54
@clayton-cornell clayton-cornell added the type/docs Docs Squad label across all Grafana Labs repos label Mar 31, 2025
@thampiotr thampiotr requested a review from a team as a code owner April 1, 2025 14:13
@@ -269,62 +299,16 @@ func (s *Service) Run(ctx context.Context, host service.Host) error {
ctx, cancel := context.WithCancel(ctx)
defer cancel()

limiter := rate.NewLimiter(rate.Every(stateUpdateMinInterval), 1)
s.node.Observe(ckit.FuncObserver(func(peers []peer.Peer) (reregister bool) {
tracer := s.tracer.Tracer("")
@thampiotr (author) commented:

This logic was moved below to a goroutine that handles dispatching cluster change notifications. This is because when the cluster minimum-size timer expires, we also need to dispatch cluster change notifications even though the peers in ckit didn't change.
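
A rough sketch of that pattern (hypothetical names, not the actual PR code): a single dispatcher goroutine owns the notification fan-out, so both a ckit peer update and the expiry of the minimum-cluster-size wait go through the same path.

```go
// Hypothetical sketch only; names, channel types, and package layout are illustrative.
package cluster

import (
	"context"

	"github.com/grafana/ckit/peer"
)

func runNotificationDispatcher(
	ctx context.Context,
	peerUpdates <-chan []peer.Peer, // fed from s.node.Observe(...)
	minSizeWaitExpired <-chan struct{}, // fires when the wait timeout elapses
	notifyAll func(), // dispatches NotifyClusterChange to all clustered components
) {
	for {
		select {
		case <-ctx.Done():
			return
		case <-peerUpdates:
			// Peers changed; clustered components must re-evaluate ownership.
			notifyAll()
		case <-minSizeWaitExpired:
			// Peers did not change, but the minimum-size wait just ended,
			// so components still need a cluster-change notification.
			notifyAll()
		}
	}
}
```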

Comment on lines 208 to 209
// Set the gauge to the configured minimum cluster size
minClusterSizeGauge.Set(float64(opts.MinimumClusterSize))
@thampiotr (author) commented:

Adding this static metric so we can clearly show on dashboards when we're below the minimum. I will make the dashboard changes in a follow-up PR.
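
For reference, one way such a static gauge could be wired up with prometheus/client_golang (a sketch; the metric name, and the assumption that opts carries a Registerer alongside MinimumClusterSize, are illustrative):

```go
// Illustrative fragment, not the PR's actual code.
minClusterSizeGauge := prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "cluster_minimum_size",
	Help: "Minimum cluster size required before clustered components admit traffic.",
})
opts.Registerer.MustRegister(minClusterSizeGauge)

// Set once at startup; dashboards can compare this against the live peer
// count to highlight when the cluster is below the required size.
minClusterSizeGauge.Set(float64(opts.MinimumClusterSize))
```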

@thampiotr thampiotr force-pushed the thampiotr/clustering-traffic-admission-control branch from 643510b to dcc03cd Compare April 1, 2025 16:36
Comment on lines +76 to +77
// slow components can currently lead to timeouts and communication errors
// TODO: consider decoupling cluster operations from runtime/components performance
@thampiotr (author) commented:

I have actually done this by putting the dispatching of cluster updates on a separate goroutine. TODO: remove this comment.

_, subSpan := tracer.Start(spanCtx, "NotifyClusterChange", trace.WithSpanKind(trace.SpanKindInternal))
subSpan.SetAttributes(attribute.String("component_id", comp.ID.String()))

clusterComponent.NotifyClusterChange()
Reviewer (Contributor) commented:

Should this fire in a goroutine? To ensure all components get notified close to simultaneously?
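
A sketch of the concurrent variant being suggested (illustrative; clusterComponents stands for whatever slice the surrounding loop iterates over):

```go
// Illustrative fragment, not the PR's actual code.
var wg sync.WaitGroup
for _, cc := range clusterComponents {
	wg.Add(1)
	go func(c interface{ NotifyClusterChange() }) {
		defer wg.Done()
		// Each component is notified in its own goroutine, so one slow
		// component no longer delays notifications to the others.
		c.NotifyClusterChange()
	}(cc)
}
// Wait so the surrounding span still covers the whole notification round.
wg.Wait()
```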

"minimum_cluster_size", c.opts.MinimumClusterSize,
"peers_count", len(c.sharder.Peers()),
)
c.clusterChangeCallback()
Reviewer (Contributor) commented:

Feels off that the callback is called before releasing the lock, but maybe I'm thinking about it wrong.
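
One common way to address that, sketched under the assumption that the struct guards its state with a mutex field named mut:

```go
// Illustrative fragment, not the PR's actual code.
c.mut.Lock()
cb := c.clusterChangeCallback // copy the callback while holding the lock
c.mut.Unlock()

if cb != nil {
	cb() // invoke outside the critical section to avoid re-entrancy or deadlocks
}
```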

span.SetAttributes(attribute.Int("minimum_cluster_size", s.opts.MinimumClusterSize))

// Notify all components about the clustering change.
components := component.GetAllComponents(host, component.InfoOptions{})
Reviewer (Contributor) commented:

I'm trying to think through this change:

  1. We get a new callback on node.Observe() from ckit. This triggers a notification.
  2. The notification calls all cluster-aware components' NotifyClusterChange(). This triggers a Ready() call in most/all cluster-aware components.
  3. The first cluster-aware component (dependent on the limiter) triggers a relevant stateChange if there is one.
  4. The state change triggers a notification, which calls all cluster-aware components' NotifyClusterChange().
  5. All? Cluster-aware components hit the limiter.

Does this sound right? Something here feels off, if I'm understanding it correctly.
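
For context on step 5, a minimal sketch of how a rate.Limiter like the one in the diff above can gate dispatch (illustrative only; the actual code may use Wait rather than Allow):

```go
// Illustrative fragment, not the PR's actual code.
limiter := rate.NewLimiter(rate.Every(stateUpdateMinInterval), 1)

if limiter.Allow() {
	// Dispatch the cluster-change notification now.
} else {
	// Coalesce: skip this trigger; a later observation within the
	// interval will pick up the latest peer state anyway.
}
```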

Labels: type/docs (Docs Squad label across all Grafana Labs repos)
Linked issue that merging may close: Proposal: Only attempt to send metrics after joining a sufficiently large cluster
3 participants