
Make Orchestrator metrics singleton #1301

Open

eero-t wants to merge 1 commit into main from metric-singleton

Conversation

eero-t (Contributor) commented Feb 18, 2025

Description

Make the Orchestrator metrics a singleton, so that even if an application instantiates multiple Orchestrators, the "megaservice_*" metrics collect data from all of them.

This is intended as the proper fix for the #1280 workaround. Changing the metric prefix was fine while only CI tests created multiple Orchestrator instances, but now that applications (e.g. DocSum) do so too, that is no longer the case.

(Another option would be to add arguments for passing Orchestrator instance names as metric prefixes, to name and differentiate the metrics of each Orchestrator instance. However, that would have required changes in 4 OPEA projects instead of just this one, and dashboards & benchmarks would then need to hard-code those per-application prefixes.)
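
For illustration, here is a minimal sketch of the singleton idea; the class name, metric names, and prometheus_client usage are assumptions made for this example, not the actual diff:

```python
# Minimal sketch of the singleton idea -- class and metric names are
# assumed for illustration, not taken from the actual PR diff.
from prometheus_client import Histogram

class OrchestratorMetrics:
    _instance = None

    def __new__(cls):
        # Hand every Orchestrator the same metrics object, so the shared
        # "megaservice_*" metrics aggregate data from all instances and
        # instance creation order no longer affects metric names.
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.request_latency = Histogram(
                "megaservice_request_latency",
                "End-to-end request latency (seconds)",
            )
        return cls._instance
```

A side benefit of this pattern is that a second Orchestrator never tries to register duplicate Prometheus collectors, since `OrchestratorMetrics() is OrchestratorMetrics()` always holds in the sketch above.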

Issues

  • Adding Orchestrator instances to applications changes their metric names depending on the order in which the instances are created, and it distorts the application-wide metric values. This breaks dashboards and benchmarks that rely on those metric names.

Type of change

  • Bug fix (non-breaking change which fixes an issue)

Dependencies

n/a.

Tests

@eero-t eero-t marked this pull request as draft February 18, 2025 14:24
codecov bot commented Feb 18, 2025

Codecov Report

Attention: Patch coverage is 84.61538% with 2 lines in your changes missing coverage. Please review.

Files with missing lines          Patch %  Lines
comps/cores/mega/orchestrator.py  84.61%   2 Missing ⚠️

Files with missing lines          Coverage Δ
comps/cores/mega/orchestrator.py  90.98% <84.61%> (-0.23%) ⬇️

@eero-t eero-t marked this pull request as ready for review February 21, 2025 20:15
eero-t (Contributor Author) commented Feb 21, 2025

Rebased on main, tested, and removed draft status.

Metrics work. There's something funky with the DocSum LLM-uservice HuggingFace API endpoint handling under heavier load (it seems to connect to the HF site for every query and eventually fails), but I don't see how that could be related to these changes.

xiguiw (Collaborator) commented Feb 25, 2025

@eero-t
Is it possible to add a CI test case for this?

eero-t (Contributor Author) commented Feb 25, 2025

@eero-t Is it possible to add a CI test case for this?

@xiguiw I'll try to modify the (streaming) ServiceOrchestrator test to also check metrics.

eero-t force-pushed the metric-singleton branch 3 times, most recently from 3904431 to 8575e74 on February 26, 2025 16:51
eero-t (Contributor Author) commented Feb 26, 2025

CI fails on something completely unrelated to the changes in this PR:

chatqna-gaudi-nginx-server Error manifest for opea/nginx:comps not found: manifest unknown: manifest unknown
Error response from daemon: manifest for opea/nginx:comps not found: manifest unknown: manifest unknown

eero-t force-pushed the metric-singleton branch 9 times, most recently from 3dcd91a to f1f9744 on February 26, 2025 19:07
eero-t (Contributor Author) commented Feb 26, 2025

@eero-t Is it possible to add a CI test case for this?

@xiguiw OK, I've expanded the (streaming) ServiceOrchestrator test to also check its streaming metrics.

That showed odd metric values which did not occur with real apps, only with the test code. It turned out to be a ServiceOrchestrator bug in propagating the first-token status, which is now fixed.
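
For a rough idea of what such a check could look like (the metrics endpoint URL and the metric names below are assumptions, not the actual test code):

```python
# Rough sketch of a streaming-metrics check; the metrics URL and the
# metric names are assumptions, not the actual ServiceOrchestrator test.
import re
import requests

def read_megaservice_metrics(url="http://localhost:8888/metrics"):
    # Scrape the Prometheus exposition output and collect all
    # "megaservice_*" samples as {name: value} pairs.
    text = requests.get(url, timeout=5).text
    return dict(re.findall(r"^(megaservice_\S+) (\S+)$", text, re.MULTILINE))

def check_first_token_metric():
    samples = read_megaservice_metrics()
    # After one streamed request, the (assumed) first-token histogram
    # should have recorded exactly one observation.
    count = float(samples.get("megaservice_first_token_latency_count", 0))
    assert count == 1.0, samples
```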


CI "example-test" continues to fail, but it's completely unrelated to this PR, and should not block merging:

[ retrieval ] HTTP status is not 200. Received status was 000
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/llama_index/core/utils.py", line 63, in __init__
    nltk.data.find("corpora/stopwords")
  File "/usr/local/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource stopwords not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('stopwords')

eero-t (Contributor Author) commented Feb 27, 2025

https://github.com/opea-project/GenAIComps/pull/1340/files and https://github.com/opea-project/GenAIComps/pull/1343/files should fix the CI issue:

Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource

But I think it may be cleaner if I submit the metrics test, and the fixes for the issues it revealed, as a separate PR.

eero-t (Contributor Author) commented Feb 28, 2025

I split the test code addition and fixes out into #1348; it should be merged first.

Make Orchestrator metrics singleton

So that even if applications instantiate multiple Orchestrators,
"megaservice_*" metrics collect data from all of them.

Another option would be to add arguments for passing Orchestrator
instance names as metric prefixes, to name and differentiate metrics
for each Orchestrator instance.

However, that would have needed changes in 3 OPEA projects instead of
just this one, and dashboards would then need to hard-code those
per-application prefixes.

Signed-off-by: Eero Tamminen <[email protected]>
eero-t force-pushed the metric-singleton branch 2 times, most recently from 04f2407 to 4bfd4ae on March 3, 2025 18:52