Failing Loki remote backend prevents working backends from regularly receiving data when forwarding logs to multiple Loki clients at once #6963
Labels
- bug: Something isn't working.
- needs-attention: An issue or PR has been sitting around and needs attention.
- variant/static: Related to Grafana Agent Static.
What's wrong?
I've noticed that when logs are forwarded to multiple Loki clients and one of them starts failing for some reason (e.g. no process is listening on the specified port), it starves the other, working Loki endpoints of data until the failing client exhausts all of its `max_retries` (default = 10). Once the loop resets, the same issue repeats. In the end, the working clients only receive data every 6 minutes or so, depending on what `max_period` is set to (default = 5m). This also leads to "gaps" in the Grafana dashboard when looking at the data for those clients.

Steps to reproduce
Take a look at this nominal config:
./agent-local-config.yaml
Start the agent as follows:
Now, let's assume that the localhost:13100 instance is missing for some reason. In that case, I expected the other endpoint (logs.my-loki-instance) to keep receiving data at the configured scrape interval (60s), but as explained above, that doesn't happen.
System information
Linux 6.5.0-15-generic
Software version
Grafana Agent 0.35.0 and current master
Configuration