Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(loki/src/k8s_events): correctly update health state for the component during operation #6385

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

hainenber
Copy link
Contributor

PR Description

Which issue(s) this PR fixes

Fixes grafana/alloy#274

Notes to the Reviewer

PR Checklist

  • CHANGELOG.md updated

@hainenber hainenber force-pushed the correct-health-state-for-loki_source_kubernetes_events branch 2 times, most recently from 85b7a36 to 7a73fc9 Compare February 17, 2024 08:17
@@ -152,6 +156,7 @@ func (c *Component) Run(ctx context.Context) error {
c.tasksMut.RUnlock()

if err := c.runner.ApplyTasks(ctx, tasks); err != nil {
c.setHealth(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this solves the issue in #6181 entirely.

ApplyTasks will return an error if the runner is closed, or if the context is already cancelled/exceeded via ctx.Err; we don't have a way to propagate errors directly from the underlying tasks as the worker interface is Run(ctx context.Context).

To make this happen we'd need to have a goroutine-safe mechanism of bubbling up the errors and sieving through them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so my approach is to change the worker interface as there aren't many implementations yet.

The ApplyTask function will now collect returned errors and store within the caller's struct.
 
I add a new method for Worker interface to get the bubbled-up errors, GetWorkerErrors to help setting the component's health properly.

PTAL if you have any concern re: goroutine-safe aspect. Thanks!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed my approach and now choose one with smaller blast radius. That is, the eventController retain any error when executing Run method and there's a recurring actor in the component to set health based on the returned results.

Not exactly ideal since the component's health is not reflected right away but I think it's a worthwhile tradeoff

@hainenber hainenber force-pushed the correct-health-state-for-loki_source_kubernetes_events branch from c184e12 to 61693df Compare February 29, 2024 16:43
Copy link
Contributor

github-actions bot commented Apr 6, 2024

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@github-actions github-actions bot added the needs-attention An issue or PR has been sitting around and needs attention. label Apr 6, 2024
@rfratto rfratto added variant/flow Relatd to Grafana Agent Flow. bug Something isn't working labels Apr 9, 2024
@github-actions github-actions bot removed the needs-attention An issue or PR has been sitting around and needs attention. label Apr 11, 2024
Copy link
Contributor

This PR has not had any activity in the past 30 days, so the needs-attention label has been added to it.
If you do not have enough time to follow up on this PR or you think it's no longer relevant, consider closing it.
The needs-attention label signals to maintainers that something has fallen through the cracks. No action is needed by you; your PR will be kept open and you do not have to respond to this comment. The label will be removed the next time this job runs if there is new activity.
Thank you for your contributions!

@github-actions github-actions bot added the needs-attention An issue or PR has been sitting around and needs attention. label May 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working needs-attention An issue or PR has been sitting around and needs attention. variant/flow Relatd to Grafana Agent Flow.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Metric agent_component_controller_running_components when component fails
3 participants