Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: wal fsync latency histogram different between prometheus and db console #115825

Closed
kvoli opened this issue Dec 7, 2023 · 3 comments · Fixed by #116895
Closed

storage: wal fsync latency histogram different between prometheus and db console #115825

kvoli opened this issue Dec 7, 2023 · 3 comments · Fixed by #116895
Assignees
Labels
A-observability-inf branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-23.2-scale-testing issues found during 23.2 scale testing O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster

Comments

@kvoli
Copy link
Collaborator

kvoli commented Dec 7, 2023

Describe the problem

The storage.wal.fsync.latency metric is different when viewed in grafana (querying prometheus) vs DB Console. The DB Console chart appears to be incorrect as the values are overly smoothed.

To Reproduce

export cluster=austen-fsync-latency-hist-repro
roachprod create $cluster -n 3 --local-ssd=false
roachprod stage $cluster release v23.2.0-beta.1
roachprod start $cluster
roachprod run $cluster -- './cockroach workload run kv --min-block-bytes=4048 --max-block-bytes=4048'
# - open up db-console page and navigate to custom metrics
# - check the custom metric `storage.wal.fsync.latency-p99`
# - check grafana:
histogram_quantile(0.99, sum by (instance,le) (rate(storage_wal_fsync_latency_bucket[15s])))

Expected behavior

DB Console and prometheus are comparable.

Additional data / screenshots

image
image

Environment:

  • CockroachDB version v23.2.0-beta.1
  • Server OS: Ubuntu
  • Client app KV workload

Additional context

Fsync metric on DBConsole is not usable. This is unfortunate as it is a relied upon signal for disk saturation.

Jira issue: CRDB-34231

@kvoli kvoli added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 O-23.2-scale-testing issues found during 23.2 scale testing labels Dec 7, 2023
@blathers-crl blathers-crl bot added the T-storage Storage Team label Dec 7, 2023
@aadityasondhi aadityasondhi self-assigned this Dec 11, 2023
@abarganier abarganier added T-observability-inf and removed T-storage Storage Team labels Dec 13, 2023
@abarganier abarganier self-assigned this Dec 13, 2023
@abarganier
Copy link
Contributor

I was poking around the Histogram code today to do some optimizing for allocations, and I noticed some weirdness that led me here.

I believe #104088 broke the ManualWindowHistogram, which is what's causing this.

When grabbing a window to calculate a quantile against, we are using the cumulative histogram. Shouldn't we just be using the current one?

if err := mwh.mu.cum.Write(cur); err != nil {

As of the linked PR, the cur window that's part of the ManualWindowHistogram is written to, but never used.

@aadityasondhi
Copy link
Collaborator

Yep, I noticed those things too. Putting up a fix soon.

aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Dec 14, 2023
These histograms were incorrectly not setting `mwh.prev`. The
`ToPrometheusMetric` was also merging the `cum` with the `prev`.
Instead, it should have been using `cur`. Since both `Update()` and
`Rotate()` set `cur` to be the correct current window already.

Informs cockroachdb#115825.

Release note: None
@aadityasondhi
Copy link
Collaborator

This bug was introduced in 4a2d06a and then was exemplified by some of the changes in bae5045.

There are two main problems (one from each):

  1. The original commit was not windowing the newly introduced histogram using the same logic as the default one does. We have a prev and cur windowed histogram that cover the duration of the histogram. At every tick, we rotate them and we use a merged view to calculate derivative statistics. This is the same as using rate over 1m in Grafana.
  2. The next change incorrectly was merging the cumulative and prev histogram. This is incorrect for obvious reasons and surfaced the problems that we see above.

In addition, both changes were missing tests for the windowing logic for manual histograms.

I made a fix that is within the confines of the current interface. I still strongly believe we should strongly consider taking on #116584 to make this interface easier to use so such bugs are difficult to introduce.

Below is a test I ran using the same workload @kvoli mentioned above.

DB Console:
image

Grafana:
image

I will put a PR with these fixes soon.

aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Dec 20, 2023
These histograms were missing the windowing logic we use in histograms
to store them in our internal time series DB. The logic we follow there
is that we keep 3 histograms. The first one is a cumulative one that we
export to prometheus through the `status/vars` page. The second and
third are `prev` and `cur` that together cover the entire duration of
the histogram, default being 60s. When the ticker rotates these, we swap
out the prev with the current, and start a brand new current histogram.
When calculating derivitive statistics, we use the combined view of the
windows which is used to display these metrics in DB console. This is
similar to the way Grafana handles these using its rate interval.

Fixes cockroachdb#115825.

Release note: None
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Dec 20, 2023
These histograms were missing the windowing logic we use in histograms
to store them in our internal time series DB. The logic we follow there
is that we keep 3 histograms. The first one is a cumulative one that we
export to prometheus through the `status/vars` page. The second and
third are `prev` and `cur` that together cover the entire duration of
the histogram, default being 60s. When the ticker rotates these, we swap
out the prev with the current, and start a brand new current histogram.
When calculating derivative statistics, we use the combined view of the
windows which is used to display these metrics in DB console. This is
similar to the way Grafana handles these using its rate interval.

Fixes cockroachdb#115825.

Release note: None
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Jan 2, 2024
These histograms were missing the windowing logic we use in histograms
to store them in our internal time series DB. The logic we follow there
is that we keep 3 histograms. The first one is a cumulative one that we
export to prometheus through the `status/vars` page. The second and
third are `prev` and `cur` that together cover the entire duration of
the histogram, default being 60s. When the ticker rotates these, we swap
out the prev with the current, and start a brand new current histogram.
When calculating derivative statistics, we use the combined view of the
windows which is used to display these metrics in DB console. This is
similar to the way Grafana handles these using its rate interval.

Fixes cockroachdb#115825.

Release note: None
craig bot pushed a commit that referenced this issue Jan 3, 2024
116895: metric: fix windowing for manual histograms r=abarganier,itsbilal a=aadityasondhi

These histograms were missing the windowing logic we use in histograms to store them in our internal time series DB. The logic we follow there is that we keep 3 histograms. The first one is a cumulative one that we export to prometheus through the `status/vars` page. The second and third are `prev` and `cur` that together cover the entire duration of the histogram, default being 60s. When the ticker rotates these, we swap out the prev with the current, and start a brand new current histogram. When calculating derivative statistics, we use the combined view of the windows which is used to display these metrics in DB console. This is similar to the way Grafana handles these using its rate interval.

Fixes #115825.

Release note: None

117097: span: Add strict mode r=miretskiy a=miretskiy

Two commits to help debug #116661 

1.
Log "slow span" messages both from the coordinator and from the aggregators.

2.
span: Add strict mode

Add strict mode for span frontier implementations that
requires the forwarded span to be a sub-span of the spans
tracked by the frontier.

The functionality is opt-in, and is enabled via
`COCKROACH_SPAN_FRONTIER_STRICT_MODE_ENABLED` env var.

Enroll cdc roachtests to use strict mode.

Informs #116661

Release Note: None

Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@craig craig bot closed this as completed in 4070f26 Jan 3, 2024
aadityasondhi added a commit to aadityasondhi/cockroach that referenced this issue Jan 4, 2024
These histograms were missing the windowing logic we use in histograms
to store them in our internal time series DB. The logic we follow there
is that we keep 3 histograms. The first one is a cumulative one that we
export to prometheus through the `status/vars` page. The second and
third are `prev` and `cur` that together cover the entire duration of
the histogram, default being 60s. When the ticker rotates these, we swap
out the prev with the current, and start a brand new current histogram.
When calculating derivative statistics, we use the combined view of the
windows which is used to display these metrics in DB console. This is
similar to the way Grafana handles these using its rate interval.

Fixes cockroachdb#115825.

Release note: None
aadityasondhi added a commit that referenced this issue Jan 8, 2024
These histograms were missing the windowing logic we use in histograms
to store them in our internal time series DB. The logic we follow there
is that we keep 3 histograms. The first one is a cumulative one that we
export to prometheus through the `status/vars` page. The second and
third are `prev` and `cur` that together cover the entire duration of
the histogram, default being 60s. When the ticker rotates these, we swap
out the prev with the current, and start a brand new current histogram.
When calculating derivative statistics, we use the combined view of the
windows which is used to display these metrics in DB console. This is
similar to the way Grafana handles these using its rate interval.

Fixes #115825.

Release note: None
@jbowens jbowens moved this to Done in [Deprecated] Storage Jun 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-observability-inf branch-release-23.2 Used to mark GA and release blockers, technical advisories, and bugs for 23.2 C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-23.2-scale-testing issues found during 23.2 scale testing O-testcluster Issues found or occurred on a test cluster, i.e. a long-running internal cluster
Projects
No open projects
Archived in project
Development

Successfully merging a pull request may close this issue.

3 participants