Add metrics for managed resources count #4031
base: main
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: oliviassss
Commits updated from 83dec2e to a487343.
Review comment on main.go (outdated):
```go
select {
case <-ticker.C:
	// Update managed resource metrics
	err := lbcMetricsCollector.UpdateManagedK8sResourceMetrics(context.Background())
```
Collecting all these resources during the same tick might lead to sparse metrics. I would suggest a ticker per resource to improve performance and metric reliability.
Makes sense. I'll use 3 tickers: 1 for k8s resources, 1 for ALBs and 1 for NLBs, just in case an API call has latency, though that should be rare.
@zac-nixon Hi, I thought it over but decided to keep them in the same ticker, because I'd like all the metrics to be updated in one loop. I did increase the ticker interval to 2 minutes to reduce unnecessary calls, since we don't expect the metrics to be super timely. I also added a TODO to update the metrics per reconciliation.
Commits updated from a487343 to 9700d0e.
Commits updated from 9700d0e to 4e1b9f3.
```go
		},
	},
}
resources, err := c.rgt.GetResourcesAsList(ctx, req)
```
We should not call AWS APIs to get these counter metrics. This can cause a significant performance impact when there is a large number of LBs.
It should be technically possible to get the number of LBs managed by the controller without calling AWS APIs.
Thanks! I used the Resource Groups Tagging (RGT) API, so it's just 1 API call every 2 minutes. But I agree it's better to update the metrics on each CRUD event. Let me double-check and get back to you.
Hunk `@@ -202,6 +210,28 @@ func main() {`:
```go
		deferredTGBQueue.Run()
	}()

	// TODO: we can better improve this to update the metrics per reconcile
	go func() {
		ticker := time.NewTicker(2 * time.Minute)
```
This is not flexible, and we won't be able to get real-time metrics.
I think we should trigger this when service/ingress/ingressGroup events happen,
for example, by triggering a function in metricsCollector from within ingressGroupController.
For example, to track the number of ALBs
(note: just a naive thought; there could be edge cases, like a manually removed finalizer, that should be handled properly):
inject an lbMetricsCollector into ingressGroupController. Within ingressGroupController:
- when it reconciles an ingressGroup, it calls lbMetricsCollector.trackIngressGroup(group's name, plus any other necessary params such as active members)
- after it successfully deletes an ingressGroup, it calls lbMetricsCollector.untrackIngressGroup(group's name)
Then lbMetricsCollector will have a correct, real-time view of the number of currently managed ingressGroups.
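The reviewer's event-driven proposal can be sketched as a small set keyed by group name. This is a hypothetical illustration: `ingressGroupTracker` and its `onChange` callback are assumptions (a real collector would set a Prometheus gauge there), and only the method names `trackIngressGroup`/`untrackIngressGroup` come from the comment.

```go
package main

import (
	"fmt"
	"sync"
)

// ingressGroupTracker is a hypothetical stand-in for the proposed
// lbMetricsCollector: it remembers which ingress groups are managed,
// so the count is current without polling on a timer.
type ingressGroupTracker struct {
	mu       sync.Mutex
	groups   map[string]struct{}
	onChange func(count int) // in a controller, this would update a gauge
}

func newIngressGroupTracker(onChange func(int)) *ingressGroupTracker {
	return &ingressGroupTracker{groups: map[string]struct{}{}, onChange: onChange}
}

// trackIngressGroup is called after a group reconciles. It is idempotent,
// which matters because reconciles are retried and repeated.
func (t *ingressGroupTracker) trackIngressGroup(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.groups[name] = struct{}{}
	t.onChange(len(t.groups))
}

// untrackIngressGroup is called only after the group's resources were deleted.
func (t *ingressGroupTracker) untrackIngressGroup(name string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	delete(t.groups, name)
	t.onChange(len(t.groups))
}

func (t *ingressGroupTracker) count() int {
	t.mu.Lock()
	defer t.mu.Unlock()
	return len(t.groups)
}

func main() {
	tracker := newIngressGroupTracker(func(n int) { fmt.Println("managed ingress groups:", n) })
	tracker.trackIngressGroup("shared")
	tracker.trackIngressGroup("shared") // retried reconcile: not double counted
	tracker.trackIngressGroup("team-a/web")
	tracker.untrackIngressGroup("team-a/web")
	fmt.Println("final count:", tracker.count()) // prints "final count: 1"
}
```

Because the set lives in memory, it is empty after a controller restart and refills only as existing groups are reconciled again. The manually-removed-finalizer case mentioned above is also not handled: `untrackIngressGroup` would never be called for that group.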
Issue
Description
This PR adds metrics for the number of managed resources: Ingresses, Services of type LoadBalancer (NLB), and TargetGroupBindings. It also gets the count of AWS resources such as ALBs and NLBs via the Resource Groups Tagging API, and adds the tag:GetResources permission to the IAM policy in the template.
Test
Created 4 Ingresses (2 of them in the same IngressGroup) and 2 Services of type LoadBalancer (NLB), and verified that the Prometheus metrics are as expected.
In the metrics I have:
Checklist
… (such as the README.md, or the docs directory)
BONUS POINTS checklist: complete for good vibes and maybe prizes?! 🤯