From a78c768d401e9e444752c89e092e7ba7fc9fc082 Mon Sep 17 00:00:00 2001
From: courageJ
Date: Thu, 20 Feb 2025 19:06:27 +0000
Subject: [PATCH] Move pkg/ext-proc/metrics/README.md -> site-src/guides/metrics.md (#373)

* Move pkgepp/metrics/README.md -> site-src/guides/metrics.md
* add docs link for metrics.md
* update formatting
---
 mkdocs.yml                                  |  1 +
 .../README.md => site-src/guides/metrics.md | 30 ++++++++-----------
 2 files changed, 14 insertions(+), 17 deletions(-)
 rename pkg/epp/metrics/README.md => site-src/guides/metrics.md (51%)

diff --git a/mkdocs.yml b/mkdocs.yml
index a024c16d..8cd3f3fb 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -57,6 +57,7 @@ nav:
   - User Guides:
     - Getting started: guides/index.md
     - Adapter Rollout: guides/adapter-rollout.md
+    - Metrics: guides/metrics.md
     - Implementer's Guide: guides/implementers.md
   - Reference:
     - API Reference: reference/spec.md
diff --git a/pkg/epp/metrics/README.md b/site-src/guides/metrics.md
similarity index 51%
rename from pkg/epp/metrics/README.md
rename to site-src/guides/metrics.md
index 1f68a0bd..f793734d 100644
--- a/pkg/epp/metrics/README.md
+++ b/site-src/guides/metrics.md
@@ -1,10 +1,6 @@
-# Documentation
+# Metrics
 
-This documentation is the current state of exposed metrics.
-
-## Table of Contents
-* [Exposed Metrics](#exposed-metrics)
-* [Scrape Metrics](#scrape-metrics)
+This guide describes the current state of exposed metrics and how to scrape them.
 
 ## Requirements
 
@@ -38,17 +34,17 @@ spec:
 
 ## Exposed metrics
 
-| Metric name | Metric Type | Description | Labels | Status |
-| ------------|--------------| ----------- | ------ | ------ |
-| inference_model_request_total | Counter | The counter of requests broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_error_total | Counter | The counter of requests errors broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_duration_seconds | Distribution | Distribution of response latency. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_request_sizes | Distribution | Distribution of request size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_response_sizes | Distribution | Distribution of response size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_input_tokens | Distribution | Distribution of input token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_model_output_tokens | Distribution | Distribution of output token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
-| inference_pool_average_kv_cache_utilization | Gauge | The average kv cache utilization for an inference server pool. | `name`=<inference-pool-name> | ALPHA |
-| inference_pool_average_queue_size | Gauge | The average number of requests pending in the model server queue. | `name`=<inference-pool-name> | ALPHA |
+| **Metric name** | **Metric Type** | **Description** | **Labels** | **Status** |
+|:---------------------------------------------|:-----------------|:------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:------------|
+| inference_model_request_total | Counter | The counter of requests broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_request_error_total | Counter | The counter of requests errors broken out for each model. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_request_duration_seconds | Distribution | Distribution of response latency. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_request_sizes | Distribution | Distribution of request size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_response_sizes | Distribution | Distribution of response size in bytes. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_input_tokens | Distribution | Distribution of input token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_model_output_tokens | Distribution | Distribution of output token count. | `model_name`=<model-name> <br> `target_model_name`=<target-model-name> | ALPHA |
+| inference_pool_average_kv_cache_utilization | Gauge | The average kv cache utilization for an inference server pool. | `name`=<inference-pool-name> | ALPHA |
+| inference_pool_average_queue_size | Gauge | The average number of requests pending in the model server queue. | `name`=<inference-pool-name> | ALPHA |
 
 ## Scrape Metrics
 