[BugFix] Emit valid Prometheus # TYPE lines for BE/CN JVM metrics #75240

Merged
alvin-phoenix-ai merged 1 commit into StarRocks:main from srihithg:bugfix-jvm-metrics-prometheus-type on Jun 25, 2026

@srihithg srihithg commented Jun 23, 2026

Why I'm doing:

enable_jvm_metrics = true makes the BE/CN /metrics endpoint emit invalid
Prometheus text. Each JVM gauge is registered with its label baked into the
metric-name string, e.g.:

register_metric("jvm_heap_size_bytes{type=\"committed\"}", &gauge);

so the exposition prints that string verbatim after # TYPE :

# TYPE starrocks_be_jvm_heap_size_bytes{type="committed"} gauge   <-- illegal
starrocks_be_jvm_heap_size_bytes{type="committed"} 905969664

A # TYPE metric name must not contain a label set. Prometheus aborts the
entire scrape on the first text-format error, so flipping this flag on
takes the node's whole metric set offline (up == 0), not just the JVM
series. promtool check metrics rejects it with "invalid metric name in comment".
Reproduced on 4.0.10/4.0.11 and main; the feature shipped in 4.0.0 (#62210).
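
To make the failure concrete, here is a minimal sketch of the check a strict parser applies to a # TYPE comment. It assumes the classic Prometheus metric-name grammar `[a-zA-Z_:][a-zA-Z0-9_:]*` and single-space separators; `is_valid_type_line` is a hypothetical helper written for illustration, not code from StarRocks or Prometheus.

```cpp
#include <regex>
#include <string>

// Returns true when `line` has the shape "# TYPE <name> <type>" and <name>
// matches the classic metric-name grammar [a-zA-Z_:][a-zA-Z0-9_:]*.
// Simplification: exactly one space between tokens; real parsers are more
// tolerant of whitespace but just as strict about the name.
bool is_valid_type_line(const std::string& line) {
    static const std::regex kTypeLine(
            R"re(# TYPE ([a-zA-Z_:][a-zA-Z0-9_:]*) (counter|gauge|histogram|summary|untyped))re");
    // '{', '}', '=' and '"' are not legal name characters, so a name with a
    // label set baked into it can never match.
    return std::regex_match(line, kTypeLine);
}
```

Under these assumptions the line `# TYPE starrocks_be_jvm_heap_size_bytes{type="committed"} gauge` is rejected, while `# TYPE starrocks_be_jvm_heap_size_bytes gauge` is accepted.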

What I'm doing:

Register each JVM gauge under a bare metric name plus a real type label via
MetricLabels, mirroring the existing stream_load_metrics pattern. The
metrics framework then emits one valid # TYPE <name> line followed by the
labelled samples:

# TYPE starrocks_be_jvm_heap_size_bytes gauge
starrocks_be_jvm_heap_size_bytes{type="committed"} 905969664
starrocks_be_jvm_heap_size_bytes{type="max"} 21474836480
starrocks_be_jvm_heap_size_bytes{type="used"} 256609448

The sample lines (name + labels) are byte-identical to before, so the
time-series identity is unchanged and existing dashboards/alerts keep working;
only the previously-malformed # TYPE comment is corrected. No metric is
renamed, so no docs change is required.
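
The effect of keeping the name bare and passing labels separately can be sketched with a toy registry. Everything below (`ToyRegistry`, `Labels`, `register_gauge`, `to_prometheus`) is hypothetical and much simpler than the real StarRocks metrics framework; it only illustrates why one family name yields a single # TYPE line followed by its labelled samples.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical label container: ordered (key, value) pairs.
using Labels = std::vector<std::pair<std::string, std::string>>;

// Toy stand-in for a metrics registry; NOT the StarRocks API.
class ToyRegistry {
private:
    struct Sample {
        Labels labels;
        const int64_t* value;
    };

public:
    explicit ToyRegistry(std::string prefix) : _prefix(std::move(prefix)) {}

    // The metric name stays bare; labels travel separately, so all samples
    // of one family are grouped under the same map key.
    void register_gauge(const std::string& name, const Labels& labels, const int64_t* value) {
        _families[name].push_back(Sample{labels, value});
    }

    // Emits one "# TYPE" line per family, then one sample line per label set.
    // Simplification: label values are written without escaping.
    std::string to_prometheus() const {
        std::ostringstream out;
        for (const auto& family : _families) {
            const std::string full_name = _prefix + "_" + family.first;
            out << "# TYPE " << full_name << " gauge\n";
            for (const Sample& sample : family.second) {
                out << full_name;
                if (!sample.labels.empty()) {
                    out << '{';
                    for (std::size_t i = 0; i < sample.labels.size(); ++i) {
                        if (i > 0) out << ',';
                        out << sample.labels[i].first << "=\"" << sample.labels[i].second << '"';
                    }
                    out << '}';
                }
                out << ' ' << *sample.value << '\n';
            }
        }
        return out.str();
    }

private:
    std::map<std::string, std::vector<Sample>> _families;
    std::string _prefix;
};
```

Registering `jvm_heap_size_bytes` three times with `type` set to `committed`, `max`, and `used` on a registry prefixed `starrocks_be` produces the same four-line shape shown above: one # TYPE line, then three samples whose name-plus-label text matches what the old braced names printed.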

jvm_metrics_test now looks metrics up by name + label and adds a regression
guard asserting that the old braced names no longer resolve (if they did, it
would mean the label was still baked into the name).
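
The shape of that regression guard can be sketched as follows. `ToyLookup` and its methods are hypothetical and stand in for the real registry lookup used by jvm_metrics_test, whose exact API is not shown in this PR description.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical lookup table keyed by (bare name, value of the "type" label).
// It only illustrates the two assertions; it is not the StarRocks registry.
class ToyLookup {
public:
    void add(const std::string& name, const std::string& type, const int64_t* gauge) {
        _gauges[std::make_pair(name, type)] = gauge;
    }

    // Name + label lookup, the style the updated test uses.
    const int64_t* get(const std::string& name, const std::string& type) const {
        auto it = _gauges.find(std::make_pair(name, type));
        return it == _gauges.end() ? nullptr : it->second;
    }

    // Name-only lookup with no label: the style the old braced strings relied on.
    const int64_t* get(const std::string& name) const { return get(name, ""); }

private:
    std::map<std::pair<std::string, std::string>, const int64_t*> _gauges;
};
```

With a gauge added under `("jvm_heap_size_bytes", "used")`, the name + label lookup returns that gauge, while looking up the old string `jvm_heap_size_bytes{type="used"}` returns `nullptr`, which is what the guard asserts.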

Fixes #75159

What type of PR is this:

  • BugFix
  • Feature
  • Enhancement
  • Refactor
  • UT
  • Doc
  • Tool

Does this PR entail a change in behavior?

  • Yes, this PR will result in a change in behavior.
  • No, this PR will not result in a change in behavior.

If yes, please specify the type of change:

  • Interface/UI changes: syntax, type conversion, expression evaluation, display information
  • Parameter changes: default values, similar parameters but with different default values
  • Policy changes: use new policy to replace old one, functionality automatically enabled
  • Feature removed
  • Miscellaneous: upgrade & downgrade compatibility, etc.

Checklist:

  • I have added test cases for my bug fix or my new feature
  • This pr needs user documentation (for new or modified features or behaviors)
    • I have added documentation for my new feature or new function
    • This pr needs auto generate documentation
  • This is a backport pr

Bugfix cherry-pick branch check:

  • I have checked the version labels which the pr will be auto-backported to the target branch
    • 4.1
    • 4.0
    • 3.5

JVM gauges were registered with the label baked into the metric-name
string, e.g. register_metric("jvm_heap_size_bytes{type=\"used\"}", ...).
The Prometheus exposition then printed that string verbatim after
"# TYPE ", producing illegal comment lines such as:

    # TYPE starrocks_be_jvm_heap_size_bytes{type="committed"} gauge

A "# TYPE" metric name must not contain a label set, so a strict parser
rejects it. Prometheus aborts the whole scrape on the first text-format
error, so enabling enable_jvm_metrics took the node's entire metric set
offline (up=0), not just the JVM series.

Register each gauge under a bare name plus a real "type" label via
MetricLabels, mirroring stream_load_metrics. This yields one valid
"# TYPE <name>" line followed by the labelled samples. The sample lines
(name + labels) are byte-identical to before, so existing dashboards and
alerts are unaffected.

Update jvm_metrics_test to look metrics up by name + label, and add a
regression guard asserting the braced names no longer resolve.

Fixes StarRocks#75159

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Srihith Garlapati <srihith.garlapati@gmail.com>
@CelerData-Reviewer

@codex review

@github-actions github-actions Bot requested review from kevincai and stdpain June 23, 2026 23:11
@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Chef's kiss.

Reviewed commit: b22cbb7a3a

@github-actions

[Java-Extensions Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[FE Incremental Coverage Report]

pass : 0 / 0 (0%)

@github-actions

[BE Incremental Coverage Report]

pass : 0 / 0 (0%)

@alvin-phoenix-ai alvin-phoenix-ai merged commit ba3b1f1 into StarRocks:main Jun 25, 2026
87 of 92 checks passed
@github-actions

@Mergifyio backport branch-4.0

@github-actions github-actions Bot removed the 4.0 label Jun 25, 2026
@github-actions

@Mergifyio backport branch-4.1

@github-actions github-actions Bot removed the 4.1 label Jun 25, 2026
@mergify

mergify Bot commented Jun 25, 2026

backport branch-4.0

✅ Backports have been created

@mergify

mergify Bot commented Jun 25, 2026

backport branch-4.1

✅ Backports have been created

wanpengfei-git pushed a commit that referenced this pull request Jun 25, 2026
…ckport #75240) (#75368)

Signed-off-by: Srihith Garlapati <srihith.garlapati@gmail.com>
Co-authored-by: srihithg <78094568+srihithg@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
wanpengfei-git pushed a commit that referenced this pull request Jun 25, 2026
…ckport #75240) (#75369)

Signed-off-by: Srihith Garlapati <srihith.garlapati@gmail.com>
Co-authored-by: srihithg <78094568+srihithg@users.noreply.github.com>
Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

[Bug] enable_jvm_metrics=true produces invalid Prometheus # TYPE lines, breaking the entire BE/CN /metrics scrape

4 participants