diff --git a/docs/dev/telemetry.md b/docs/dev/telemetry.md deleted file mode 100644 index b0a52d6b0..000000000 --- a/docs/dev/telemetry.md +++ /dev/null @@ -1,705 +0,0 @@ -## OpenTelemetry Instrumentation in Mellea - -Mellea provides built-in OpenTelemetry instrumentation with comprehensive observability features that can be enabled independently. The instrumentation follows the [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for standardized observability across LLM applications. - -**Note**: OpenTelemetry is an optional dependency. If not installed, telemetry features are automatically disabled with no impact on functionality. - -### Observability Features - -1. **Application Trace** (`mellea.application`) - Tracks user-facing operations -2. **Backend Trace** (`mellea.backend`) - Tracks LLM backend interactions with Gen-AI semantic conventions -3. **Token Usage Metrics** - Tracks token consumption across all backends with Gen-AI semantic conventions - -### Installation - -To use telemetry features, install Mellea with OpenTelemetry support: - -```bash -pip install mellea[telemetry] -# or -uv pip install mellea[telemetry] -``` - -Without the `[telemetry]` extra, Mellea works normally but telemetry features are disabled. 
- -### Configuration - -Telemetry is configured via environment variables: - -#### General Telemetry Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `OTEL_SERVICE_NAME` | Service name for all telemetry signals | `mellea` | -| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for all telemetry signals | None | - -#### Tracing Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `MELLEA_TRACE_APPLICATION` | Enable application-level tracing | `false` | -| `MELLEA_TRACE_BACKEND` | Enable backend-level tracing | `false` | -| `MELLEA_TRACE_CONSOLE` | Print traces to console (debugging) | `false` | - -#### Metrics Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `MELLEA_METRICS_ENABLED` | Enable metrics collection | `false` | -| `MELLEA_METRICS_CONSOLE` | Print metrics to console (debugging) | `false` | -| `MELLEA_METRICS_OTLP` | Enable OTLP metrics exporter | `false` | -| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | OTLP metrics-specific endpoint (overrides general endpoint) | None | -| `MELLEA_METRICS_PROMETHEUS` | Enable Prometheus metric reader (registers with prometheus_client registry) | `false` | -| `OTEL_METRIC_EXPORT_INTERVAL` | Export interval in milliseconds | `60000` | - -#### Logging Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `MELLEA_LOGS_OTLP` | Enable OTLP logs exporter | `false` | -| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | OTLP logs-specific endpoint (overrides general endpoint) | None | - -### Application Trace Scope - -The application tracer (`mellea.application`) instruments: - -- **Session lifecycle**: `start_session()`, session context manager entry/exit -- **@generative functions**: Execution of functions decorated with `@generative` -- **mfuncs.aact()**: Action execution with requirements and sampling strategies -- **Sampling strategies**: Rejection sampling, budget forcing, 
etc. -- **Requirement validation**: Validation of requirements and constraints - -**Span attributes include:** -- `backend`: Backend class name -- `model_id`: Model identifier -- `context_type`: Context class name -- `action_type`: Component type being executed -- `has_requirements`: Whether requirements are specified -- `has_strategy`: Whether a sampling strategy is used -- `strategy_type`: Sampling strategy class name -- `num_generate_logs`: Number of generation attempts -- `sampling_success`: Whether sampling succeeded -- `response`: Model response (truncated to 500 chars) -- `response_length`: Full length of model response - -### Backend Trace Scope - -The backend tracer (`mellea.backend`) instruments LLM interactions following [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/): - -- **Backend.generate_from_context()**: Context-based generation (chat operations) -- **Backend.generate_from_raw()**: Raw generation without context (text completions) -- **Backend-specific implementations**: Ollama, OpenAI, HuggingFace, Watsonx, LiteLLM - -**Gen-AI Semantic Convention Attributes:** -- `gen_ai.system`: LLM system name (e.g., `openai`, `ollama`, `huggingface`) -- `gen_ai.request.model`: Model identifier used for the request -- `gen_ai.response.model`: Actual model used in the response (may differ from request) -- `gen_ai.operation.name`: Operation type (`chat` or `text_completion`) -- `gen_ai.usage.input_tokens`: Number of input tokens consumed -- `gen_ai.usage.output_tokens`: Number of output tokens generated -- `gen_ai.usage.total_tokens`: Total tokens consumed -- `gen_ai.response.id`: Response ID from the LLM provider -- `gen_ai.response.finish_reasons`: List of finish reasons (e.g., `["stop"]`, `["length"]`) - -**Mellea-Specific Attributes:** -- `mellea.backend`: Backend class name (e.g., `OpenAIBackend`) -- `mellea.action_type`: Component type being executed -- `mellea.context_size`: Number of items in context -- 
`mellea.has_format`: Whether structured output format is specified -- `mellea.format_type`: Response format class name -- `mellea.tool_calls_enabled`: Whether tool calling is enabled -- `mellea.num_actions`: Number of actions in batch (for `generate_from_raw`) - -### Token Usage Metrics - -Mellea automatically tracks token consumption across backends using OpenTelemetry metrics counters. Token metrics follow the [Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for standardized observability. - -> **Note**: Token usage metrics are only tracked for `generate_from_context` requests. `generate_from_raw` calls do not record token metrics. - -#### Metrics - -| Metric Name | Type | Unit | Description | -|-------------|------|------|-------------| -| `mellea.llm.tokens.input` | Counter | `tokens` | Total input/prompt tokens processed | -| `mellea.llm.tokens.output` | Counter | `tokens` | Total output/completion tokens generated | - -#### Attributes - -All token metrics include these attributes following Gen-AI semantic conventions: - -| Attribute | Description | Example Values | -|-----------|-------------|----------------| -| `gen_ai.provider.name` | Backend provider name | `openai`, `ollama`, `watsonx`, `litellm`, `huggingface` | -| `gen_ai.request.model` | Model identifier | `gpt-4`, `llama3.2:7b`, `granite-3.1-8b-instruct` | - -#### Backend Support - -| Backend | Streaming | Non-Streaming | Source | -|---------|-----------|---------------|--------| -| OpenAI | ✅ | ✅ | `usage.prompt_tokens` and `usage.completion_tokens` | -| Ollama | ✅ | ✅ | `prompt_eval_count` and `eval_count` | -| WatsonX | ❌ | ✅ | `input_token_count` and `generated_token_count` (streaming API limitation) | -| LiteLLM | ✅ | ✅ | `usage.prompt_tokens` and `usage.completion_tokens` | -| HuggingFace | ✅ | ✅ | Calculated from input_ids and output sequences | - -#### Configuration - -Token metrics are **disabled by default** for zero overhead. 
Enable with: - -```bash -export MELLEA_METRICS_ENABLED=true -``` - -Metrics are automatically recorded after each LLM call completes. No code changes required. - -#### When Metrics Are Recorded - -Token metrics are recorded **after the full response is received**, not incrementally during streaming: - -- **Non-streaming**: Metrics recorded immediately after `await mot.avalue()` completes -- **Streaming**: Metrics recorded after the stream is fully consumed (all chunks received) - -This ensures accurate token counts are captured from the backend's usage metadata, which is only available after the complete response. - -```python -mot, _ = await backend.generate_from_context(msg, ctx) - -# Metrics NOT recorded yet (stream still in progress) -await mot.astream() - -# Metrics recorded here (after stream completion) -await mot.avalue() -``` - -#### Metrics Export Configuration - -Mellea supports multiple metrics exporters that can be used independently or simultaneously: - -1. **Console Exporter** - For local debugging -2. **OTLP Exporter** - For production observability platforms -3. **Prometheus Exporter** - For Prometheus-based monitoring - -**Important**: If `MELLEA_METRICS_ENABLED=true` but no exporter is configured, you'll see a warning. Metrics will be collected but not exported. - -##### Console Exporter (Debugging) - -Print metrics to console for local debugging without setting up an observability backend: - -```bash -export MELLEA_METRICS_ENABLED=true -export MELLEA_METRICS_CONSOLE=true -python docs/examples/telemetry/metrics_example.py -``` - -Metrics are printed as JSON at the configured export interval (default: 60 seconds). 
- -##### OTLP Exporter (Production) - -Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.): - -```bash -# Enable metrics and OTLP exporter -export MELLEA_METRICS_ENABLED=true -export MELLEA_METRICS_OTLP=true - -# Configure OTLP endpoint -export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 - -# Optional: Use metrics-specific endpoint (overrides general endpoint) -export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318 - -# Optional: Set service name -export OTEL_SERVICE_NAME=my-mellea-app - -# Optional: Adjust export interval (milliseconds, default: 60000) -export OTEL_METRIC_EXPORT_INTERVAL=30000 - -python docs/examples/telemetry/metrics_example.py -``` - -**OTLP Collector Setup Example:** - -```bash -# Create otel-collector-config.yaml -cat > otel-collector-config.yaml < otel-collector-config.yaml <. diff --git a/docs/docs/docs.json b/docs/docs/docs.json index d7f60ad0b..8fb65bf95 100644 --- a/docs/docs/docs.json +++ b/docs/docs/docs.json @@ -68,7 +68,8 @@ "how-to/use-images-and-vision", "how-to/build-a-rag-pipeline", "how-to/refactor-prompts-with-cli", - "how-to/unit-test-generative-code" + "how-to/unit-test-generative-code", + "how-to/handling-exceptions" ] }, { @@ -100,10 +101,16 @@ { "group": "Evaluation and Observability", "pages": [ - "evaluation-and-observability/handling-exceptions", - "evaluation-and-observability/metrics-and-telemetry", - "evaluation-and-observability/opentelemetry-tracing", - "evaluation-and-observability/evaluate-with-llm-as-a-judge" + "evaluation-and-observability/evaluate-with-llm-as-a-judge", + { + "group": "Telemetry", + "pages": [ + "evaluation-and-observability/telemetry", + "evaluation-and-observability/tracing", + "evaluation-and-observability/metrics", + "evaluation-and-observability/logging" + ] + } ] }, { diff --git a/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md 
b/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md index 84d5a57fb..f8064ca58 100644 --- a/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md +++ b/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md @@ -202,4 +202,4 @@ requirements. `sample_generations` lists every attempt made. **See also:** [The Requirements System](../concepts/requirements-system) | [Write Custom Verifiers](../how-to/write-custom-verifiers) | -[Handling Exceptions and Failures](../evaluation-and-observability/handling-exceptions) +[Handling Exceptions and Failures](../how-to/handling-exceptions) diff --git a/docs/docs/evaluation-and-observability/logging.md b/docs/docs/evaluation-and-observability/logging.md new file mode 100644 index 000000000..908c43928 --- /dev/null +++ b/docs/docs/evaluation-and-observability/logging.md @@ -0,0 +1,186 @@ +--- +title: "Logging" +description: "Configure Mellea's console logging and export logs to OTLP collectors." +# diataxis: reference +--- + +**Prerequisites:** [Telemetry](../evaluation-and-observability/telemetry) +introduces the environment variables and telemetry architecture. This page +covers logging configuration in detail. + +Mellea provides two logging layers: a built-in console logger for local +development and an optional OTLP exporter for centralized log aggregation. +Both work simultaneously when enabled. + +## Console logging + +Mellea uses `FancyLogger`, a color-coded singleton logger built on Python's +`logging` module. All internal Mellea modules obtain their logger via +`FancyLogger.get_logger()`. + +### Configuration + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `DEBUG` | Set to any value to enable `DEBUG`-level output | unset (`INFO` level) | +| `FLOG` | Set to any value to forward logs to a local REST endpoint at `http://localhost:8000/api/receive` | unset | + +By default, `FancyLogger` logs at `INFO` level with color-coded output to +stdout. 
Set the `DEBUG` environment variable to lower the level to `DEBUG`: + +```bash +export DEBUG=1 +python your_script.py +``` + +### Log format + +Console output uses ANSI color codes by log level: + +- **Cyan** — DEBUG +- **Grey** — INFO +- **Yellow** — WARNING +- **Red** — ERROR +- **Bold red** — CRITICAL + +Each message is formatted as: + +```text +=== HH:MM:SS-LEVEL ====== +message +``` + +## OTLP log export + +When the `[telemetry]` extra is installed, Mellea can export logs to an OTLP +collector alongside the existing console output. This is useful for centralizing +logs from distributed services. + +> **Note:** OTLP logging is disabled by default. When disabled, there is zero +> overhead — no OTLP handler is created. + +### Enable OTLP logging + +```bash +export MELLEA_LOGS_OTLP=true +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 + +# Optional: logs-specific endpoint (overrides general endpoint) +export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:4318 + +# Optional: set service name +export OTEL_SERVICE_NAME=my-mellea-app +``` + +### How it works + +When `MELLEA_LOGS_OTLP=true`, `FancyLogger` adds an OpenTelemetry +`LoggingHandler` alongside its existing handlers: + +- **Console handler** — continues to work normally (color-coded output) +- **REST handler** — continues to work normally (when `FLOG` is set) +- **OTLP handler** — exports logs to the configured OTLP collector + +Logs are exported using OpenTelemetry's Logs API with batched processing +for efficiency. + +### Programmatic access + +Use `get_otlp_log_handler()` to add OTLP log export to your own loggers: + +```python +import logging +from mellea.telemetry import get_otlp_log_handler + +logger = logging.getLogger("my_app") +handler = get_otlp_log_handler() +if handler: + logger.addHandler(handler) + logger.info("This log will be exported via OTLP") +``` + +The function returns `None` when OTLP logging is disabled or not configured, +so the `if handler` check is always safe. 
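The console format described above can be reproduced with a plain `logging.Formatter`. This is a sketch of the idea only — the class and logger names below are invented for illustration and are not Mellea's actual `FancyLogger` implementation:

```python
import logging
import os

# ANSI codes matching the documented color scheme (sketch only).
COLORS = {
    logging.DEBUG: "\x1b[36m",       # cyan
    logging.INFO: "\x1b[90m",        # grey
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[1;31m",  # bold red
}
RESET = "\x1b[0m"

class FancyishFormatter(logging.Formatter):
    """Render records as '=== HH:MM:SS-LEVEL ======' followed by the message."""

    def format(self, record: logging.LogRecord) -> str:
        color = COLORS.get(record.levelno, "")
        header = f"=== {self.formatTime(record, '%H:%M:%S')}-{record.levelname} ======"
        return f"{color}{header}\n{record.getMessage()}{RESET}"

logger = logging.getLogger("fancy_demo")
handler = logging.StreamHandler()
handler.setFormatter(FancyishFormatter())
logger.addHandler(handler)
# Mirror the DEBUG environment-variable behavior described above.
logger.setLevel(logging.DEBUG if os.environ.get("DEBUG") else logging.INFO)
logger.warning("disk almost full")
```

Because the formatter is an ordinary `logging.Formatter`, it coexists with any additional handlers (such as an OTLP handler) attached to the same logger.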
+ +### OTLP collector setup example + +```bash +cat > otel-collector-config.yaml < **Note:** OpenTelemetry is an optional dependency. Mellea works normally without it. -> Install with `pip install "mellea[telemetry]"` or `uv pip install "mellea[telemetry]"`. - -## Configuration - -All telemetry is configured via environment variables: - -| Variable | Description | Default | -| -------- | ----------- | ------- | -| `MELLEA_TRACE_APPLICATION` | Enable application-level tracing | `false` | -| `MELLEA_TRACE_BACKEND` | Enable backend-level tracing | `false` | -| `MELLEA_TRACE_CONSOLE` | Print traces to console (debugging) | `false` | -| `MELLEA_METRICS_ENABLED` | Enable metrics collection | `false` | -| `MELLEA_METRICS_CONSOLE` | Print metrics to console (debugging) | `false` | -| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for trace and metric export | none | -| `OTEL_SERVICE_NAME` | Service name in exported telemetry | `mellea` | - -## Trace scopes - -Mellea has two independent trace scopes: - -- **`mellea.application`** — user-facing operations: session lifecycle, `@generative` - function calls, `instruct()` and `act()` calls, sampling strategies, and requirement - validation. -- **`mellea.backend`** — LLM backend interactions, following the - [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). - Records model calls, token usage, finish reasons, and API latency. - -Enable both for full observability, or pick one depending on what you need to debug. - -## Using `start_session()` as a context manager - -Wrapping a session in `with start_session()` ties the trace lifecycle to the session -scope. 
All spans generated within the block are nested under the session span: - -```python -from mellea import generative, start_session -from mellea.stdlib.requirements import req - -@generative -def classify_sentiment(text: str) -> str: - """Classify the sentiment of the given text as positive, negative, or neutral.""" - -with start_session() as m: - email = m.instruct( - "Write a professional email to {{name}} about {{topic}}", - requirements=[req("Must be formal"), req("Must be under 100 words")], - user_variables={"name": "Alice", "topic": "project update"}, - ) - sentiment = classify_sentiment(m, text="I love this product!") -``` - -Run this with application tracing enabled: - -```bash -export MELLEA_TRACE_APPLICATION=true -python your_script.py -``` - -## Debugging with console output - -Print spans directly to stdout without configuring an OTLP backend: - -```bash -export MELLEA_TRACE_APPLICATION=true -export MELLEA_TRACE_CONSOLE=true -python your_script.py -``` - -This is the fastest way to verify that instrumentation is working. - -## Exporting to an OTLP backend - -Any OTLP-compatible backend works. To export to a local Jaeger instance: - -```bash -# Start Jaeger -docker run -d --name jaeger \ - -p 4317:4317 \ - -p 16686:16686 \ - jaegertracing/all-in-one:latest - -# Configure Mellea -export MELLEA_TRACE_APPLICATION=true -export MELLEA_TRACE_BACKEND=true -export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 -export OTEL_SERVICE_NAME=my-mellea-app - -python your_script.py -# View traces at http://localhost:16686 -``` - -Other compatible backends include Grafana Tempo, Honeycomb, Datadog, New Relic, -AWS X-Ray (via OTLP), and Google Cloud Trace. 
- -## Checking trace status programmatically - -```python -from mellea.telemetry import ( - is_application_tracing_enabled, - is_backend_tracing_enabled, - is_metrics_enabled, -) - -print(f"Application tracing: {is_application_tracing_enabled()}") -print(f"Backend tracing: {is_backend_tracing_enabled()}") -print(f"Metrics: {is_metrics_enabled()}") -``` - -## Metrics - -The metrics API exposes counters, histograms, and up-down counters backed by -the OpenTelemetry Metrics API. Enable metrics collection: - -```bash -export MELLEA_METRICS_ENABLED=true -export MELLEA_METRICS_CONSOLE=true # optional: print to stdout -``` - -Use `create_counter` and `create_histogram` to instrument your own code: - -```python -from mellea.telemetry import create_counter, create_histogram - -requests = create_counter("mellea.requests", unit="1", description="Total requests") -latency = create_histogram("mellea.latency", unit="ms", description="Request latency") - -requests.add(1, {"backend": "ollama", "model": "granite4:micro"}) -latency.record(120, {"backend": "ollama"}) -``` - -If `MELLEA_METRICS_ENABLED` is `false` or the `[telemetry]` extra is not installed, -all instrument calls are no-ops with no overhead. - -> **Note:** Metrics are exported to `OTEL_EXPORTER_OTLP_ENDPOINT` when set. -> If metrics are enabled but no endpoint is configured and `MELLEA_METRICS_CONSOLE` -> is also `false`, Mellea will log a warning at startup. 
- -## Span hierarchy - -When both trace scopes are enabled, spans nest as follows: - -```text -session_context (mellea.application) -├── aact (mellea.application) -│ ├── chat (mellea.backend) [gen_ai.system=ollama, gen_ai.request.model=granite4:micro] -│ │ [gen_ai.usage.input_tokens=150, gen_ai.usage.output_tokens=50] -│ └── requirement_validation (mellea.application) -└── aact (mellea.application) - └── chat (mellea.backend) [gen_ai.system=openai, gen_ai.request.model=gpt-4] - [gen_ai.usage.input_tokens=200, gen_ai.usage.output_tokens=75] -``` - -Backend spans carry Gen-AI semantic convention attributes for cross-provider comparisons: - -| Attribute | Description | -| --------- | ----------- | -| `gen_ai.system` | LLM provider name (`openai`, `ollama`, `huggingface`) | -| `gen_ai.request.model` | Model requested | -| `gen_ai.response.model` | Model actually used (may differ) | -| `gen_ai.usage.input_tokens` | Input tokens consumed | -| `gen_ai.usage.output_tokens` | Output tokens generated | -| `gen_ai.response.finish_reasons` | Finish reason list (e.g., `["stop"]`) | - -Application spans add Mellea-specific attributes: - -| Attribute | Description | -| --------- | ----------- | -| `mellea.backend` | Backend class name | -| `mellea.action_type` | Component type being executed | -| `sampling_success` | Whether sampling succeeded | -| `num_generate_logs` | Number of generation attempts | -| `response` | Model response (truncated to 500 chars) | - -> **Full example:** [`docs/examples/telemetry/telemetry_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/telemetry/telemetry_example.py) diff --git a/docs/docs/evaluation-and-observability/metrics.md b/docs/docs/evaluation-and-observability/metrics.md new file mode 100644 index 000000000..ac7b3b827 --- /dev/null +++ b/docs/docs/evaluation-and-observability/metrics.md @@ -0,0 +1,386 @@ +--- +title: "Metrics" +description: "Collect token usage metrics and instrument your own code with 
OpenTelemetry counters, histograms, and up-down counters." +# diataxis: how-to +--- + +**Prerequisites:** [Telemetry](../evaluation-and-observability/telemetry) +introduces the environment variables and telemetry architecture. This page +covers metrics collection in detail. + +Mellea automatically tracks token consumption across all backends using +OpenTelemetry metrics counters. Token metrics follow the +[Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) +for standardized observability. The metrics API also lets you create your own +counters, histograms, and up-down counters for application-level instrumentation. + +> **Note:** Metrics are an optional feature. All instrument calls are no-ops +> when metrics are disabled or the `[telemetry]` extra is not installed. + +## Enable metrics + +```bash +export MELLEA_METRICS_ENABLED=true +``` + +You also need at least one exporter configured — see +[Metrics export configuration](#metrics-export-configuration) below. + +## Token usage metrics + +Mellea records token consumption automatically after each LLM call completes. +No code changes are required. The `TokenMetricsPlugin` auto-registers when +`MELLEA_METRICS_ENABLED=true` and records metrics via the plugin hook system. 
+ +### Built-in metrics + +| Metric Name | Type | Unit | Description | +| ----------- | ---- | ---- | ----------- | +| `mellea.llm.tokens.input` | Counter | `tokens` | Total input/prompt tokens processed | +| `mellea.llm.tokens.output` | Counter | `tokens` | Total output/completion tokens generated | + +### Metric attributes + +All token metrics include these attributes following Gen-AI semantic conventions: + +| Attribute | Description | Example Values | +| --------- | ----------- | -------------- | +| `gen_ai.provider.name` | Backend provider name | `openai`, `ollama`, `watsonx`, `litellm`, `huggingface` | +| `gen_ai.request.model` | Model identifier | `gpt-4`, `llama3.2:7b`, `granite-3.1-8b-instruct` | + +### Backend support + +| Backend | Streaming | Non-Streaming | Source | +| ------- | --------- | ------------- | ------ | +| OpenAI | Yes | Yes | `usage.prompt_tokens` and `usage.completion_tokens` | +| Ollama | Yes | Yes | `prompt_eval_count` and `eval_count` | +| WatsonX | No | Yes | `input_token_count` and `generated_token_count` (streaming API limitation) | +| LiteLLM | Yes | Yes | `usage.prompt_tokens` and `usage.completion_tokens` | +| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences | + +> **Note:** Token usage metrics are only tracked for `generate_from_context` +> requests. `generate_from_raw` calls do not record token metrics. + +### When metrics are recorded + +Token metrics are recorded **after the full response is received**, not +incrementally during streaming: + +- **Non-streaming**: Metrics recorded immediately after `await mot.avalue()` completes. +- **Streaming**: Metrics recorded after the stream is fully consumed (all chunks received). + +This ensures accurate token counts from the backend's usage metadata, which +is only available after the complete response. 
+ +```python +mot, _ = await backend.generate_from_context(msg, ctx) + +# Metrics NOT recorded yet (stream still in progress) +await mot.astream() + +# Metrics recorded here (after stream completion) +await mot.avalue() +``` + +## Metrics export configuration + +Mellea supports multiple metrics exporters that can be used independently or +simultaneously. + +> **Warning:** If `MELLEA_METRICS_ENABLED=true` but no exporter is configured, +> Mellea logs a warning. Metrics are collected but not exported. + +### Console exporter (debugging) + +Print metrics to console for local debugging without setting up an +observability backend: + +```bash +export MELLEA_METRICS_ENABLED=true +export MELLEA_METRICS_CONSOLE=true +python your_script.py +``` + +Metrics are printed as JSON at the configured export interval (default: 60 +seconds). + +### OTLP exporter (production) + +Export metrics to an OTLP collector for production observability platforms +(Jaeger, Grafana, Datadog, etc.): + +```bash +export MELLEA_METRICS_ENABLED=true +export MELLEA_METRICS_OTLP=true +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 + +# Optional: metrics-specific endpoint (overrides general endpoint) +export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318 + +# Optional: set service name +export OTEL_SERVICE_NAME=my-mellea-app + +# Optional: adjust export interval (milliseconds, default: 60000) +export OTEL_METRIC_EXPORT_INTERVAL=30000 +``` + +**OTLP collector setup example:** + +```bash +cat > otel-collector-config.yaml < **Full example:** [`docs/examples/telemetry/metrics_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/telemetry/metrics_example.py) + +--- + +**See also:** + +- [Telemetry](../evaluation-and-observability/telemetry) — overview of all + telemetry features and configuration. +- [Tracing](../evaluation-and-observability/tracing) — distributed traces + with Gen-AI semantic conventions. 
+- [Logging](../evaluation-and-observability/logging) — console logging and OTLP + log export. diff --git a/docs/docs/evaluation-and-observability/telemetry.md b/docs/docs/evaluation-and-observability/telemetry.md new file mode 100644 index 000000000..0d654478f --- /dev/null +++ b/docs/docs/evaluation-and-observability/telemetry.md @@ -0,0 +1,160 @@ +--- +title: "Telemetry" +sidebarTitle: "Overview" +description: "Add OpenTelemetry tracing, metrics, and logging to Mellea programs." +# diataxis: how-to +--- + +**Prerequisites:** [Quick Start](../getting-started/quickstart) complete, +`pip install "mellea[telemetry]"`, Ollama running locally. + +Mellea provides built-in [OpenTelemetry](https://opentelemetry.io/) instrumentation +across three independent pillars — tracing, metrics, and logging. Each can be enabled +separately. All telemetry is opt-in: if the `[telemetry]` extra is not installed, +every telemetry call is a silent no-op. + +> **Note:** OpenTelemetry is an optional dependency. Mellea works normally without it. +> Install with `pip install "mellea[telemetry]"` or `uv pip install "mellea[telemetry]"`. 
+ +## Configuration + +All telemetry is configured via environment variables: + +### General + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `OTEL_SERVICE_NAME` | Service name for all telemetry signals | `mellea` | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for all telemetry signals | none | + +### Tracing variables + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `MELLEA_TRACE_APPLICATION` | Enable application-level tracing | `false` | +| `MELLEA_TRACE_BACKEND` | Enable backend-level tracing | `false` | +| `MELLEA_TRACE_CONSOLE` | Print traces to console (debugging) | `false` | + +### Metrics variables + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `MELLEA_METRICS_ENABLED` | Enable metrics collection | `false` | +| `MELLEA_METRICS_CONSOLE` | Print metrics to console (debugging) | `false` | +| `MELLEA_METRICS_OTLP` | Enable OTLP metrics exporter | `false` | +| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics-specific OTLP endpoint (overrides general) | none | +| `MELLEA_METRICS_PROMETHEUS` | Enable Prometheus metric reader | `false` | +| `OTEL_METRIC_EXPORT_INTERVAL` | Export interval in milliseconds | `60000` | + +### Logging variables + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `MELLEA_LOGS_OTLP` | Enable OTLP logs exporter | `false` | +| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs-specific OTLP endpoint (overrides general) | none | + +## Quick start + +Enable tracing and metrics with console output to verify everything works: + +```bash +export MELLEA_TRACE_APPLICATION=true +export MELLEA_TRACE_BACKEND=true +export MELLEA_TRACE_CONSOLE=true +export MELLEA_METRICS_ENABLED=true +export MELLEA_METRICS_CONSOLE=true +python your_script.py +``` + +Traces and metrics print to stdout. 
For production use, replace the console +exporters with an OTLP endpoint: + +```bash +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 +export OTEL_SERVICE_NAME=my-mellea-app +``` + +## Checking telemetry status programmatically + +```python +from mellea.telemetry import ( + is_application_tracing_enabled, + is_backend_tracing_enabled, + is_metrics_enabled, +) + +print(f"Application tracing: {is_application_tracing_enabled()}") +print(f"Backend tracing: {is_backend_tracing_enabled()}") +print(f"Metrics: {is_metrics_enabled()}") +``` + +## Tracing + +Mellea has two independent trace scopes: + +- **`mellea.application`** — user-facing operations: session lifecycle, + `@generative` calls, `instruct()` and `act()`, sampling strategies, and + requirement validation. +- **`mellea.backend`** — LLM backend interactions following the + [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). + Records model calls, token usage, finish reasons, and API latency. + +Enable both for full observability, or pick one depending on what you need to +debug. When both scopes are active, backend spans nest inside application spans: + +```text +session_context (mellea.application) +├── aact (mellea.application) +│ ├── chat (mellea.backend) [gen_ai.system=ollama] +│ └── requirement_validation (mellea.application) +└── aact (mellea.application) + └── chat (mellea.backend) [gen_ai.system=openai] +``` + +See [Tracing](../evaluation-and-observability/tracing) for span attributes, +exporter configuration (Jaeger, Grafana Tempo, etc.), and debugging guidance. + +## Metrics + +Mellea automatically tracks token consumption across all backends using +OpenTelemetry counters (`mellea.llm.tokens.input` and +`mellea.llm.tokens.output`). No code changes are required — the +`TokenMetricsPlugin` records metrics via the plugin hook system after each +LLM call completes. 
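Custom instruments can also be created from application code. The `create_counter`/`create_histogram` calls below come from `mellea.telemetry` as documented; the `except ImportError` stubs are an illustrative addition so this sketch runs even without Mellea installed:

```python
try:
    from mellea.telemetry import create_counter, create_histogram
except ImportError:
    # Illustrative fallback stubs (not Mellea code) so the sketch is runnable.
    class _Instrument:
        def __init__(self):
            self.points = []
        def add(self, value, attributes=None):
            self.points.append((value, attributes))
        record = add  # histograms use .record() with the same shape

    def create_counter(name, unit="1", description=""):
        return _Instrument()

    def create_histogram(name, unit="1", description=""):
        return _Instrument()

requests = create_counter("mellea.requests", unit="1", description="Total requests")
latency = create_histogram("mellea.latency", unit="ms", description="Request latency")

requests.add(1, {"backend": "ollama", "model": "granite4:micro"})
latency.record(120, {"backend": "ollama"})
```

If `MELLEA_METRICS_ENABLED` is `false` or the `[telemetry]` extra is not installed, these instrument calls are no-ops with no overhead.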
+ +The metrics API also exposes `create_counter`, `create_histogram`, and +`create_up_down_counter` for instrumenting your own application code. + +Mellea supports three exporters that can run simultaneously: + +- **Console** — print to stdout for debugging +- **OTLP** — export to production observability platforms +- **Prometheus** — register with `prometheus_client` for scraping + +See [Metrics](../evaluation-and-observability/metrics) for token usage details, +backend support matrix, exporter setup, custom instruments, and troubleshooting. + +## Logging + +Mellea uses a color-coded console logger (`FancyLogger`) by default. When the +`[telemetry]` extra is installed and `MELLEA_LOGS_OTLP=true` is set, Mellea +also exports logs to an OTLP collector alongside existing console output. + +See [Logging](../evaluation-and-observability/logging) for console logging +configuration, OTLP log export setup, and programmatic access via +`get_otlp_log_handler()`. + +> **Full example:** [`docs/examples/telemetry/telemetry_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/telemetry/telemetry_example.py) + +--- + +**See also:** + +- [Tracing](../evaluation-and-observability/tracing) — distributed traces + with Gen-AI semantic conventions. +- [Metrics](../evaluation-and-observability/metrics) — token usage metrics, + exporters, and custom instruments. +- [Logging](../evaluation-and-observability/logging) — console logging and OTLP + log export. +- [Evaluate with LLM-as-a-Judge](../evaluation-and-observability/evaluate-with-llm-as-a-judge) — + automated quality evaluation correlated with trace data. 
diff --git a/docs/docs/evaluation-and-observability/opentelemetry-tracing.md b/docs/docs/evaluation-and-observability/tracing.md
similarity index 87%
rename from docs/docs/evaluation-and-observability/opentelemetry-tracing.md
rename to docs/docs/evaluation-and-observability/tracing.md
index d4dde4a67..0595f7c1d 100644
--- a/docs/docs/evaluation-and-observability/opentelemetry-tracing.md
+++ b/docs/docs/evaluation-and-observability/tracing.md
@@ -1,10 +1,10 @@
 ---
-title: "OpenTelemetry Tracing"
+title: "Tracing"
 description: "Export distributed traces from Mellea using OpenTelemetry semantic conventions."
 # diataxis: how-to
 ---
 
-**Prerequisites:** [Metrics and Telemetry](../evaluation-and-observability/metrics-and-telemetry)
+**Prerequisites:** [Telemetry](../evaluation-and-observability/telemetry)
 introduces the environment variables and trace scopes. This page focuses on
 exporting traces to external backends and interpreting the span data they
 contain.
@@ -131,9 +131,22 @@ Backend spans cover individual LLM API calls.
 They follow the
 | `gen_ai.usage.input_tokens` | Input tokens consumed |
 | `gen_ai.usage.output_tokens` | Output tokens generated |
 | `gen_ai.usage.total_tokens` | Total tokens (input + output) |
+| `gen_ai.response.model` | Actual model used in the response (may differ from request) |
 | `gen_ai.response.finish_reasons` | List of finish reasons (e.g., `["stop"]`) |
 | `gen_ai.response.id` | Response identifier from the backend |
 
+Mellea also adds context-specific attributes to backend spans:
+
+| Attribute | Description |
+| --------- | ----------- |
+| `mellea.backend` | Backend class name (e.g., `OpenAIBackend`) |
+| `mellea.action_type` | Component type being executed |
+| `mellea.context_size` | Number of items in context |
+| `mellea.has_format` | Whether structured output format is specified |
+| `mellea.format_type` | Response format class name |
+| `mellea.tool_calls_enabled` | Whether tool calling is enabled |
+| `mellea.num_actions` | Number of actions in batch (for `generate_from_raw`) |
+
 ### Span hierarchy
 
 When both scopes are active, backend spans nest inside application spans:
@@ -225,11 +238,13 @@ import mellea  # noqa: E402
 
 ---
 
-## Next steps
+**See also:**
 
-- [Metrics and Telemetry](../evaluation-and-observability/metrics-and-telemetry) —
-  enable metrics collection alongside tracing, and learn how to instrument your
-  own code with counters and histograms.
+- [Telemetry](../evaluation-and-observability/telemetry) — overview of all
+  telemetry features and configuration.
+- [Metrics](../evaluation-and-observability/metrics) — token usage metrics,
+  exporters, and custom instruments.
+- [Logging](../evaluation-and-observability/logging) — console logging and OTLP
+  log export.
 - [Evaluate with LLM-as-a-Judge](../evaluation-and-observability/evaluate-with-llm-as-a-judge) —
-  add automated quality evaluation to your pipeline and correlate evaluation
-  results with trace data.
+  automated quality evaluation correlated with trace data.
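The parent/child nesting described in the span-hierarchy hunk above comes from ordinary context-manager scoping: a span opened while another is active becomes its child. The toy tracer below sketches only that mechanism in plain Python. It is not Mellea's tracer; the span and scope names are borrowed from the documentation's example tree.

```python
import contextlib

class ToyTracer:
    """Records (name, scope, depth) for each span as it finishes."""

    def __init__(self):
        self.finished = []   # completion order: innermost spans first
        self._stack = []     # currently open spans

    @contextlib.contextmanager
    def span(self, name: str, scope: str):
        depth = len(self._stack)  # nesting depth at the moment of entry
        self._stack.append(name)
        try:
            yield
        finally:
            self._stack.pop()
            self.finished.append((name, scope, depth))

tracer = ToyTracer()
with tracer.span("session_context", "mellea.application"):
    with tracer.span("aact", "mellea.application"):
        with tracer.span("chat", "mellea.backend"):
            pass  # the backend span nests inside the application spans

# Render the recorded spans as an indented tree, outermost first.
for name, scope, depth in sorted(tracer.finished, key=lambda s: s[2]):
    print("  " * depth + f"{name} ({scope})")
```

Running it prints a three-level tree: `chat` indented beneath `aact`, which sits beneath `session_context`, matching the shape a real trace viewer would show.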
diff --git a/docs/docs/examples/traced-generation-loop.md b/docs/docs/examples/traced-generation-loop.md
index 469b16b3f..3b12b766f 100644
--- a/docs/docs/examples/traced-generation-loop.md
+++ b/docs/docs/examples/traced-generation-loop.md
@@ -364,7 +364,7 @@ applicable:
 
 - Set `OTEL_SERVICE_NAME=my-app` to customise the service name in your trace
   backend.
-- See [OpenTelemetry Tracing](../evaluation-and-observability/opentelemetry-tracing)
+- See [Tracing](../evaluation-and-observability/tracing)
   for attribute schemas and advanced configuration.
 - Add `MELLEA_TRACE_CONSOLE=true` alongside an OTLP endpoint to confirm spans
   are generated even when the remote collector is unavailable.
diff --git a/docs/docs/guide/CONTRIBUTING.md b/docs/docs/guide/CONTRIBUTING.md
index 7780ce734..fd0a58434 100644
--- a/docs/docs/guide/CONTRIBUTING.md
+++ b/docs/docs/guide/CONTRIBUTING.md
@@ -361,7 +361,7 @@ See [CONTRIBUTING.md](../../CONTRIBUTING.md) for the full validation workflow.
 
 ```bash
 cd docs/docs
-mint dev
+mintlify dev
 # Site available at http://localhost:3000
 ```
 
@@ -405,7 +405,7 @@ markdownlint docs/docs/guide/your-page.md
 - [ ] `**See also:**` footer present with relevant cross-links (Mintlify generates prev/next automatically).
 - [ ] `docs.json` updated if new page added; old MDX page removed from nav if replaced.
 - [ ] `index.mdx` landing page cards reviewed — add a card if the new page is a major entry point (key pattern, integration, or prominent how-to); keep total cards per section to ≤ 8.
-- [ ] Previewed locally with `mint dev`.
+- [ ] Previewed locally with `mintlify dev`.
 - [ ] Non-deterministic LLM output noted.
 - [ ] Backend-specific code blocks flagged with `> **Backend note:**`.
 - [ ] No visible TODO placeholders — missing content tracked as GitHub issues.
diff --git a/docs/docs/guide/glossary.md b/docs/docs/guide/glossary.md
index df1e279e0..d5af369ea 100644
--- a/docs/docs/guide/glossary.md
+++ b/docs/docs/guide/glossary.md
@@ -537,7 +537,7 @@ except PreconditionException as e:
     print(e.validation)  # list of ValidationResult
 ```
 
-See: [Handling Exceptions and Failures](../evaluation-and-observability/handling-exceptions)
+See: [Handling Exceptions and Failures](../how-to/handling-exceptions)
 
 ---
 
diff --git a/docs/docs/evaluation-and-observability/handling-exceptions.md b/docs/docs/how-to/handling-exceptions.md
similarity index 99%
rename from docs/docs/evaluation-and-observability/handling-exceptions.md
rename to docs/docs/how-to/handling-exceptions.md
index aae90f94d..9de551621 100644
--- a/docs/docs/evaluation-and-observability/handling-exceptions.md
+++ b/docs/docs/how-to/handling-exceptions.md
@@ -296,7 +296,7 @@ if not result.success:
 ```
 
 For structured telemetry across all calls, see
-[Metrics and Telemetry](./metrics-and-telemetry).
+[Telemetry](../evaluation-and-observability/telemetry).
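The glossary hunk above shows `e.validation` carrying a list of `ValidationResult` objects. Since the real classes live inside Mellea, the sketch below mirrors only their shape with hypothetical stand-ins, to show how callers typically inspect which requirements failed.

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    """Stand-in for Mellea's validation result: one requirement, one verdict."""
    requirement: str
    passed: bool
    reason: str = ""

class PreconditionException(Exception):
    """Stand-in: raised when input requirements fail before generation runs."""

    def __init__(self, validation):
        super().__init__("precondition failed")
        self.validation = validation  # list of ValidationResult

def run_with_preconditions(text: str) -> str:
    # Check requirements up front; raise with structured results on failure.
    ok = bool(text.strip())
    results = [ValidationResult("non-empty input", ok,
                                "" if ok else "input was blank")]
    if not all(r.passed for r in results):
        raise PreconditionException(results)
    return text.upper()

try:
    run_with_preconditions("   ")
except PreconditionException as e:
    failed = [r.requirement for r in e.validation if not r.passed]
    print(failed)  # ['non-empty input']
```

Carrying the full result list on the exception (rather than just a message) lets the caller report or retry per failed requirement.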
 ---
 
diff --git a/docs/docs/how-to/unit-test-generative-code.md b/docs/docs/how-to/unit-test-generative-code.md
index 25eba7997..0c62cd271 100644
--- a/docs/docs/how-to/unit-test-generative-code.md
+++ b/docs/docs/how-to/unit-test-generative-code.md
@@ -383,5 +383,5 @@ pytest -m qualitative
 
 - [The Requirements System](../concepts/requirements-system) — understand how
   `Requirement`, `simple_validate`, and `check` interact with the IVR loop
-- [Handling Exceptions](../evaluation-and-observability/handling-exceptions) —
+- [Handling Exceptions](../how-to/handling-exceptions) —
   catch and diagnose errors that occur during generation
diff --git a/docs/docs/troubleshooting/common-errors.md b/docs/docs/troubleshooting/common-errors.md
index 7c2553c51..d691cbbe0 100644
--- a/docs/docs/troubleshooting/common-errors.md
+++ b/docs/docs/troubleshooting/common-errors.md
@@ -237,7 +237,7 @@ ollama pull granite-guardian-3.2-5b
 
 - **GitHub Issues:** [github.com/generative-computing/mellea/issues](https://github.com/generative-computing/mellea/issues)
 - **Examples:** [`docs/examples/`](https://github.com/generative-computing/mellea/tree/main/docs/examples)
 - Enable telemetry to inspect what is happening at each step — see
-  [Metrics and Telemetry](../evaluation-and-observability/metrics-and-telemetry).
+  [Telemetry](../evaluation-and-observability/telemetry).
 
 ---
 
diff --git a/docs/docs/troubleshooting/faq.md b/docs/docs/troubleshooting/faq.md
index 2ac1a3d30..079979ac8 100644
--- a/docs/docs/troubleshooting/faq.md
+++ b/docs/docs/troubleshooting/faq.md
@@ -269,7 +269,7 @@ with start_session() as m:
 ```
 
 For the full telemetry setup, see
-[OpenTelemetry Tracing](../evaluation-and-observability/opentelemetry-tracing).
+[Tracing](../evaluation-and-observability/tracing).
 
 ## Does Mellea support async?
diff --git a/docs/examples/telemetry/README.md b/docs/examples/telemetry/README.md
index 051733634..2ef414925 100644
--- a/docs/examples/telemetry/README.md
+++ b/docs/examples/telemetry/README.md
@@ -166,4 +166,4 @@ ollama serve  # Start Ollama server
 
 ## Learn More
 
-See [docs/dev/telemetry.md](../../dev/telemetry.md) for complete documentation.
\ No newline at end of file
+See the [Telemetry documentation](../../docs/evaluation-and-observability/telemetry.md) for complete details.
\ No newline at end of file
diff --git a/docs/examples/telemetry/metrics_example.py b/docs/examples/telemetry/metrics_example.py
index c8630c554..8a58f7a35 100644
--- a/docs/examples/telemetry/metrics_example.py
+++ b/docs/examples/telemetry/metrics_example.py
@@ -36,7 +36,7 @@ python metrics_example.py
 
 # For OTLP Collector and Prometheus setup instructions, see:
-# docs/dev/telemetry.md
+# docs/docs/evaluation-and-observability/metrics.md
 """
 
 import os