Refactor metrics processing #34
Open

nadiamoe wants to merge 10 commits into main from metric-processing-v2
Note

Reviewers: This PR changes the `output.go` file substantially, making the diff useless. I recommend looking at the new code only, and focusing review efforts on whether the tests are sufficient to prove backwards compatibility.

This PR overhauls how metrics are processed in the SM output extension, deliberately taking a few tradeoffs against performance to make code that is inherently chatty and verbose as easy to read as possible.
How the new code works

The new code works like this:

When samples arrive through `func (o *Output) AddMetricSamples(containers []metrics.SampleContainer)`, they are added to a map in the `metricStore`. This map indexes metrics by timeseries, with the value being the numeric value of that timeseries. If a sample for a given timeseries arrives more than once, it is aggregated immediately, depending on the metric type: counters are added to their previous value, gauges replace the previous value, and rates and trends are averaged. At this point metrics are aggregated, but their labels and names are untouched.
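A minimal sketch of this early aggregation could look like the following. The `metricStore` name comes from this PR, but the field names, the `timeseriesKey` type, and the running-average bookkeeping are assumptions made for illustration, not the actual implementation:

```go
package sm

import "go.k6.io/k6/metrics"

// timeseriesKey is an assumed stand-in for whatever uniquely identifies a
// metric name plus its label set in the real code.
type timeseriesKey string

// metricStore sketch: values holds one number per timeseries; counts is only
// needed to keep a running average for Rates and Trends.
type metricStore struct {
	values map[timeseriesKey]float64
	counts map[timeseriesKey]int64
}

// add aggregates an incoming sample as soon as it arrives, following the
// rules described above: counters accumulate, gauges keep the last value,
// and rates and trends are averaged.
func (s *metricStore) add(key timeseriesKey, t metrics.MetricType, value float64) {
	prev, seen := s.values[key]
	if !seen {
		s.values[key] = value
		s.counts[key] = 1
		return
	}

	switch t {
	case metrics.Counter:
		s.values[key] = prev + value
	case metrics.Gauge:
		s.values[key] = value
	case metrics.Rate, metrics.Trend:
		n := s.counts[key]
		s.values[key] = (prev*float64(n) + value) / float64(n+1)
		s.counts[key] = n + 1
	}
}
```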
When `func (o *Output) Stop() error` is called, transformation takes place in three steps:

1. First, new timeseries are Derived from existing ones. This includes creating new timeseries by changing the metric name (e.g. adding suffixes), or creating new timeseries from information contained in another. This process can use one timeseries to create several derived ones, but every derived timeseries must come from exactly one timeseries output by k6. The derivation process walks through the `metricStore` sequentially and exactly once, so it is not possible to combine multiple k6 metrics, or to derive from already-derived timeseries.
2. Next, some timeseries are Removed. This is done by checking whether the metric name is on a "remove list". This step helps remove unused metrics, but also removes timeseries that were cloned with a new name in the Derive step, effectively allowing us to rename a timeseries.
3. Finally, labels are stripped from timeseries. Some labels are removed from all timeseries, while others are removed from all except some. This process does not aggregate the resulting timeseries in any way, and it is technically possible to obtain duplicated timeseries as a result. This should however be very unlikely in practice, as we only remove labels that are known not to contribute to cardinality, just noise.
Each transformation step walks through the map once, with the Derivation process being the most expensive as new entries are added to it.
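Put together, the Stop-time pipeline could be sketched roughly as below; the `timeseries` struct and the helper names (`deriveFrom`, `removeList`, `stripLabels`) are illustrative assumptions, not the real code:

```go
package sm

// timeseries is an assumed representation of one entry in the store after
// early aggregation: a metric name, a label set, and one numeric value.
type timeseries struct {
	name   string
	labels map[string]string
	value  float64
}

// removeList holds metric names dropped in the Remove step; vus, vus_max and
// iterations are removed according to the transformations listed below.
var removeList = map[string]bool{"vus": true, "vus_max": true, "iterations": true}

// deriveFrom stands in for the per-metric derivation rules (see the
// Transformations section); it returns the timeseries derived from ts.
func deriveFrom(ts timeseries) []timeseries { return nil }

// stripLabels stands in for the label-removal rules; as one example from the
// list below, the group label is removed from every metric.
func stripLabels(ts *timeseries) { delete(ts.labels, "group") }

// transform applies the three passes described above, each walking the
// store exactly once: Derive, then Remove, then RemoveLabels.
func transform(store []timeseries) []timeseries {
	// 1. Derive: one original timeseries may produce several derived ones,
	// but every derived timeseries comes from exactly one original.
	derived := make([]timeseries, 0, len(store))
	for _, ts := range store {
		derived = append(derived, ts)
		derived = append(derived, deriveFrom(ts)...)
	}

	// 2. Remove: drop timeseries whose metric name is on the remove list.
	kept := derived[:0]
	for _, ts := range derived {
		if !removeList[ts.name] {
			kept = append(kept, ts)
		}
	}

	// 3. RemoveLabels: strip labels that are known to add noise, keeping
	// them only on the metrics that are allowed to carry them.
	for i := range kept {
		stripLabels(&kept[i])
	}

	return kept
}
```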
After the transformation is done, metrics are serialized into the Prometheus text format and written to the output file.
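For example, the output file could contain lines in the Prometheus text exposition format such as the following; the metric names are taken from the transformations listed below, while the labels and values are purely illustrative:

```
# Illustrative output only: the label set depends on the k6 script under test.
# TYPE http_requests_total counter
http_requests_total{scenario="default",method="GET",url="https://example.com/"} 3
# TYPE http_ssl gauge
http_ssl{scenario="default",method="GET",url="https://example.com/"} 1
```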
Transformations

The following transformations are currently implemented, in the Derive-Remove-RemoveLabels process described above:

- `http_reqs` to `http_requests_total`.
- `http_req_failed` to `http_requests_failed_total`.
- `http_info` gauge with info-like labels from `http_reqs`, with a value of `1`. The `tls_version` and `proto` labels are kept in `http_info` and removed from all other metrics.
- `http_ssl` gauge, with a value of `1` if `http_reqs` had a `tls_version` label, `0` otherwise.
- `http_got_expected_response` gauge, which is `1` if `http_reqs` had the `expected_response` label set to `true`, and `0` otherwise. The `expected_response` label is removed from all other metrics.
- `http_error_code` gauge, whose value equals the value of the `error_code` label originally present. The `error_code` label is removed from all other metrics. The `error` label, containing a textual representation of the error, is removed from all metrics.
- `http_status_code` gauge, whose value equals the value of the `status` label originally present.
- `http_version` gauge, whose value equals the HTTP protocol version parsed from the original `proto` label.
- `data_(sent|received)` metrics are renamed to `data_(sent|received)_bytes`.
- `http_req_duration` is renamed and scaled to `http_total_duration_seconds`.
- `iteration_duration` is renamed and scaled to `iteration_duration_seconds`.
- `http_req_(blocked|connecting|receiving|sending|handshaking|waiting)` are converted to `http_duration_seconds` with a `phase` label, with the value of this label being a slightly reworded version of the original metric name (preserving pre-refactor behavior).
- `vus`, `vus_max`, and `iterations` metrics are removed.
- The `group` label is removed from all metrics.
- The `checks` metric is renamed to `check_success_rate`.
- The `checks` metric is emitted as logs: `time="2025-01-14T18:21:59+01:00" level=info msg="check result" check="is 200" count=3 metric=checks_total scenario=default value=0.3`
- `checks` is derived into a `checks_total` counter, with a `result="(fail|success)"` label created from it.

These were, to my understanding, all the transformations done using the previous approach.
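As a concrete illustration of one Derive rule, the `http_ssl` and `http_got_expected_response` gauges listed above could be produced from a single `http_reqs` timeseries along these lines. This sketch reuses the illustrative `timeseries` struct from the pipeline sketch earlier; the function and helper names are assumptions, not the actual code:

```go
// deriveFromHTTPReqs sketches how the http_ssl and http_got_expected_response
// gauges could be derived from one http_reqs timeseries, based on its labels.
func deriveFromHTTPReqs(ts timeseries) []timeseries {
	if ts.name != "http_reqs" {
		return nil
	}

	// http_ssl is 1 if the original timeseries carried a tls_version label.
	ssl := 0.0
	if _, ok := ts.labels["tls_version"]; ok {
		ssl = 1
	}

	// http_got_expected_response is 1 if expected_response was "true".
	expected := 0.0
	if ts.labels["expected_response"] == "true" {
		expected = 1
	}

	return []timeseries{
		{name: "http_ssl", labels: cloneLabels(ts.labels), value: ssl},
		{name: "http_got_expected_response", labels: cloneLabels(ts.labels), value: expected},
	}
}

// cloneLabels copies the label set so derived timeseries do not share maps.
func cloneLabels(in map[string]string) map[string]string {
	out := make(map[string]string, len(in))
	for k, v := range in {
		out[k] = v
	}
	return out
}
```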
Opinions

This new approach is deliberately opinionated in a couple of regards: samples are aggregated as soon as they arrive, and the only aggregation performed for `Trend`s and `Rate`s is the average. This second part is easy to change if the need arises, but so far that does not seem to be necessary, as most of the time we will want to report the average only.

Open ends
- The `name` label was previously not emitted, and now it is. This seems to be the right thing to do according to the k6 maintainers.
- `checks_total` and the newly added `check_success_rate` now carry a `check` label containing the check name. Before this refactor, this label was stripped on the grounds of being high-cardinality, but given that we allow URLs as labels, I don't see this being a particularly strong argument. As we cannot perform per-label aggregations, we would need to drop this metric entirely if we wanted to get rid of the `check` label.