Aggregated metrics: Use sum_over_time query for aggregated metric queries #789
private getMetricExpression(service: string, serviceName: CustomConstantVariable) {
  if (serviceName.state.value === AGGREGATED_SERVICE_NAME) {
    return `sum by (${LEVEL_VARIABLE_VALUE}) (sum_over_time({${AGGREGATED_SERVICE_NAME}=\`${service}\`} | logfmt | drop __error__ | unwrap count [$__auto]))`;
  }
  return `sum by (${LEVEL_VARIABLE_VALUE}) (count_over_time({${SERVICE_NAME_EXPR}=\`${service}\`} | drop __error__ [$__auto]))`;
@svennergr @matyax does anyone know why we're dropping errors if we're not using a parser? 😕
…fore we clear the body
LGTM, thanks for the quick fix!
  if (serviceName.state.value === AGGREGATED_SERVICE_NAME) {
    return `sum by (${LEVEL_VARIABLE_VALUE}) (sum_over_time({${AGGREGATED_SERVICE_NAME}=\`${service}\`} | logfmt | drop __error__ | unwrap count [$__auto]))`;
  }
  return `sum by (${LEVEL_VARIABLE_VALUE}) (count_over_time({${SERVICE_NAME}=\`${service}\`} | drop __error__ [$__auto]))`;
I don't think we need the `drop __error__` here. If we did need it, it would mean that getting values from structured metadata could produce errors, which I don't think is possible, but I will double check.
so yeah, for a `count_over_time` using a structured metadata grouping I can't find a way it would produce an error
I think we can remove `| drop __error__` from both expressions.
yeah, good point, especially since we're doing the grouping by level, they'll effectively get dropped anyway
for the `unwrap` we may want a `| __error__=""` instead, which will drop all lines that could not be unwrapped. However, that shouldn't happen since we're creating this stream ourselves, so it might be a belt and suspenders situation
Hmm, if something changes in the way we create the stream and we do need to drop errors, that would mean we have a bug in how we create the stream, which we'd need to fix on the Loki side. If we drop errors, we still have the bug in Loki but would probably not catch it as quickly, so my argument would be to not drop or exclude errors in either query.
I agree
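The outcome of this thread (neither dropping nor filtering errors in either query) could look roughly like the sketch below. It is a simplified, standalone version rather than the merged code: the three constant values are assumptions made for illustration, and the `CustomConstantVariable` argument is replaced by a plain string so the snippet runs on its own.

```typescript
// Assumed values for illustration only; the real constants are defined
// elsewhere in the repository and may differ.
const LEVEL_VARIABLE_VALUE = 'detected_level';
const SERVICE_NAME = 'service_name';
const AGGREGATED_SERVICE_NAME = '__aggregated_metric__';

// Simplified stand-in for the method under review: `serviceLabel` replaces
// `serviceName.state.value` so no scene variable type is needed.
function getMetricExpression(service: string, serviceLabel: string): string {
  if (serviceLabel === AGGREGATED_SERVICE_NAME) {
    // Aggregated stream: parse with logfmt and sum the unwrapped `count`
    // field. No `drop __error__` and no `__error__=""` filter, so a broken
    // stream surfaces as a query error instead of being hidden.
    return `sum by (${LEVEL_VARIABLE_VALUE}) (sum_over_time({${AGGREGATED_SERVICE_NAME}=\`${service}\`} | logfmt | unwrap count [$__auto]))`;
  }
  // Regular stream: no parser is involved, so there is nothing to drop.
  return `sum by (${LEVEL_VARIABLE_VALUE}) (count_over_time({${SERVICE_NAME}=\`${service}\`} [$__auto]))`;
}

console.log(getMetricExpression('checkout', AGGREGATED_SERVICE_NAME));
console.log(getMetricExpression('checkout', SERVICE_NAME));
```

Leaving the error handling out is the deliberate choice argued for above: if the aggregated stream ever produces lines that cannot be unwrapped, the failure stays visible.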
LGTM
The log counts currently displayed are incorrect. The updated `sum_over_time` query, which unwraps `count`, should return the correct volume counts.
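To see why counting lines gives the wrong volume for an aggregated stream, here is a small simulation. It assumes each aggregated line carries a `count` field summarizing several original log lines; the `AggregatedEntry` shape and the sample numbers are hypothetical, and the two helper functions only mimic what `count_over_time` and `sum_over_time ... | unwrap count` do within a single window.

```typescript
// Hypothetical shape of one parsed line from the aggregated stream.
interface AggregatedEntry {
  level: string;
  count: number; // number of original log lines this entry represents
}

// Made-up entries falling inside one query window.
const windowEntries: AggregatedEntry[] = [
  { level: 'info', count: 120 },
  { level: 'info', count: 80 },
  { level: 'error', count: 5 },
];

// Mimics count_over_time grouped by level: counts entries, ignoring `count`.
function countOverTime(entries: AggregatedEntry[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const e of entries) {
    out[e.level] = (out[e.level] ?? 0) + 1;
  }
  return out;
}

// Mimics sum_over_time over the unwrapped `count`, grouped by level.
function sumOverTime(entries: AggregatedEntry[]): Record<string, number> {
  const out: Record<string, number> = {};
  for (const e of entries) {
    out[e.level] = (out[e.level] ?? 0) + e.count;
  }
  return out;
}

console.log(countOverTime(windowEntries)); // { info: 2, error: 1 }
console.log(sumOverTime(windowEntries)); // { info: 200, error: 5 }
```

With these sample numbers, counting entries reports 2 `info` lines where the stream actually represents 200, which is the kind of undercount the description refers to.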