diff --git a/docs/dev/telemetry.md b/docs/dev/telemetry.md deleted file mode 100644 index b0a52d6b0..000000000 --- a/docs/dev/telemetry.md +++ /dev/null @@ -1,705 +0,0 @@ -## OpenTelemetry Instrumentation in Mellea - -Mellea provides built-in OpenTelemetry instrumentation with comprehensive observability features that can be enabled independently. The instrumentation follows the [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for standardized observability across LLM applications. - -**Note**: OpenTelemetry is an optional dependency. If not installed, telemetry features are automatically disabled with no impact on functionality. - -### Observability Features - -1. **Application Trace** (`mellea.application`) - Tracks user-facing operations -2. **Backend Trace** (`mellea.backend`) - Tracks LLM backend interactions with Gen-AI semantic conventions -3. **Token Usage Metrics** - Tracks token consumption across all backends with Gen-AI semantic conventions - -### Installation - -To use telemetry features, install Mellea with OpenTelemetry support: - -```bash -pip install mellea[telemetry] -# or -uv pip install mellea[telemetry] -``` - -Without the `[telemetry]` extra, Mellea works normally but telemetry features are disabled. 
- -### Configuration - -Telemetry is configured via environment variables: - -#### General Telemetry Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `OTEL_SERVICE_NAME` | Service name for all telemetry signals | `mellea` | -| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for all telemetry signals | None | - -#### Tracing Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `MELLEA_TRACE_APPLICATION` | Enable application-level tracing | `false` | -| `MELLEA_TRACE_BACKEND` | Enable backend-level tracing | `false` | -| `MELLEA_TRACE_CONSOLE` | Print traces to console (debugging) | `false` | - -#### Metrics Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `MELLEA_METRICS_ENABLED` | Enable metrics collection | `false` | -| `MELLEA_METRICS_CONSOLE` | Print metrics to console (debugging) | `false` | -| `MELLEA_METRICS_OTLP` | Enable OTLP metrics exporter | `false` | -| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | OTLP metrics-specific endpoint (overrides general endpoint) | None | -| `MELLEA_METRICS_PROMETHEUS` | Enable Prometheus metric reader (registers with prometheus_client registry) | `false` | -| `OTEL_METRIC_EXPORT_INTERVAL` | Export interval in milliseconds | `60000` | - -#### Logging Configuration - -| Variable | Description | Default | -|----------|-------------|---------| -| `MELLEA_LOGS_OTLP` | Enable OTLP logs exporter | `false` | -| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | OTLP logs-specific endpoint (overrides general endpoint) | None | - -### Application Trace Scope - -The application tracer (`mellea.application`) instruments: - -- **Session lifecycle**: `start_session()`, session context manager entry/exit -- **@generative functions**: Execution of functions decorated with `@generative` -- **mfuncs.aact()**: Action execution with requirements and sampling strategies -- **Sampling strategies**: Rejection sampling, budget forcing, 
etc. -- **Requirement validation**: Validation of requirements and constraints - -**Span attributes include:** -- `backend`: Backend class name -- `model_id`: Model identifier -- `context_type`: Context class name -- `action_type`: Component type being executed -- `has_requirements`: Whether requirements are specified -- `has_strategy`: Whether a sampling strategy is used -- `strategy_type`: Sampling strategy class name -- `num_generate_logs`: Number of generation attempts -- `sampling_success`: Whether sampling succeeded -- `response`: Model response (truncated to 500 chars) -- `response_length`: Full length of model response - -### Backend Trace Scope - -The backend tracer (`mellea.backend`) instruments LLM interactions following [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/): - -- **Backend.generate_from_context()**: Context-based generation (chat operations) -- **Backend.generate_from_raw()**: Raw generation without context (text completions) -- **Backend-specific implementations**: Ollama, OpenAI, HuggingFace, Watsonx, LiteLLM - -**Gen-AI Semantic Convention Attributes:** -- `gen_ai.system`: LLM system name (e.g., `openai`, `ollama`, `huggingface`) -- `gen_ai.request.model`: Model identifier used for the request -- `gen_ai.response.model`: Actual model used in the response (may differ from request) -- `gen_ai.operation.name`: Operation type (`chat` or `text_completion`) -- `gen_ai.usage.input_tokens`: Number of input tokens consumed -- `gen_ai.usage.output_tokens`: Number of output tokens generated -- `gen_ai.usage.total_tokens`: Total tokens consumed -- `gen_ai.response.id`: Response ID from the LLM provider -- `gen_ai.response.finish_reasons`: List of finish reasons (e.g., `["stop"]`, `["length"]`) - -**Mellea-Specific Attributes:** -- `mellea.backend`: Backend class name (e.g., `OpenAIBackend`) -- `mellea.action_type`: Component type being executed -- `mellea.context_size`: Number of items in context -- 
`mellea.has_format`: Whether structured output format is specified -- `mellea.format_type`: Response format class name -- `mellea.tool_calls_enabled`: Whether tool calling is enabled -- `mellea.num_actions`: Number of actions in batch (for `generate_from_raw`) - -### Token Usage Metrics - -Mellea automatically tracks token consumption across backends using OpenTelemetry metrics counters. Token metrics follow the [Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) for standardized observability. - -> **Note**: Token usage metrics are only tracked for `generate_from_context` requests. `generate_from_raw` calls do not record token metrics. - -#### Metrics - -| Metric Name | Type | Unit | Description | -|-------------|------|------|-------------| -| `mellea.llm.tokens.input` | Counter | `tokens` | Total input/prompt tokens processed | -| `mellea.llm.tokens.output` | Counter | `tokens` | Total output/completion tokens generated | - -#### Attributes - -All token metrics include these attributes following Gen-AI semantic conventions: - -| Attribute | Description | Example Values | -|-----------|-------------|----------------| -| `gen_ai.provider.name` | Backend provider name | `openai`, `ollama`, `watsonx`, `litellm`, `huggingface` | -| `gen_ai.request.model` | Model identifier | `gpt-4`, `llama3.2:7b`, `granite-3.1-8b-instruct` | - -#### Backend Support - -| Backend | Streaming | Non-Streaming | Source | -|---------|-----------|---------------|--------| -| OpenAI | ✅ | ✅ | `usage.prompt_tokens` and `usage.completion_tokens` | -| Ollama | ✅ | ✅ | `prompt_eval_count` and `eval_count` | -| WatsonX | ❌ | ✅ | `input_token_count` and `generated_token_count` (streaming API limitation) | -| LiteLLM | ✅ | ✅ | `usage.prompt_tokens` and `usage.completion_tokens` | -| HuggingFace | ✅ | ✅ | Calculated from input_ids and output sequences | - -#### Configuration - -Token metrics are **disabled by default** for zero overhead. 
Enable with: - -```bash -export MELLEA_METRICS_ENABLED=true -``` - -Metrics are automatically recorded after each LLM call completes. No code changes required. - -#### When Metrics Are Recorded - -Token metrics are recorded **after the full response is received**, not incrementally during streaming: - -- **Non-streaming**: Metrics recorded immediately after `await mot.avalue()` completes -- **Streaming**: Metrics recorded after the stream is fully consumed (all chunks received) - -This ensures accurate token counts are captured from the backend's usage metadata, which is only available after the complete response. - -```python -mot, _ = await backend.generate_from_context(msg, ctx) - -# Metrics NOT recorded yet (stream still in progress) -await mot.astream() - -# Metrics recorded here (after stream completion) -await mot.avalue() -``` - -#### Metrics Export Configuration - -Mellea supports multiple metrics exporters that can be used independently or simultaneously: - -1. **Console Exporter** - For local debugging -2. **OTLP Exporter** - For production observability platforms -3. **Prometheus Exporter** - For Prometheus-based monitoring - -**Important**: If `MELLEA_METRICS_ENABLED=true` but no exporter is configured, you'll see a warning. Metrics will be collected but not exported. - -##### Console Exporter (Debugging) - -Print metrics to console for local debugging without setting up an observability backend: - -```bash -export MELLEA_METRICS_ENABLED=true -export MELLEA_METRICS_CONSOLE=true -python docs/examples/telemetry/metrics_example.py -``` - -Metrics are printed as JSON at the configured export interval (default: 60 seconds). 
- -##### OTLP Exporter (Production) - -Export metrics to an OTLP collector for production observability platforms (Jaeger, Grafana, Datadog, etc.): - -```bash -# Enable metrics and OTLP exporter -export MELLEA_METRICS_ENABLED=true -export MELLEA_METRICS_OTLP=true - -# Configure OTLP endpoint -export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 - -# Optional: Use metrics-specific endpoint (overrides general endpoint) -export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318 - -# Optional: Set service name -export OTEL_SERVICE_NAME=my-mellea-app - -# Optional: Adjust export interval (milliseconds, default: 60000) -export OTEL_METRIC_EXPORT_INTERVAL=30000 - -python docs/examples/telemetry/metrics_example.py -``` - -**OTLP Collector Setup Example:** - -```bash -# Create otel-collector-config.yaml -cat > otel-collector-config.yaml < otel-collector-config.yaml <. diff --git a/docs/docs/docs.json b/docs/docs/docs.json index d7f60ad0b..8fb65bf95 100644 --- a/docs/docs/docs.json +++ b/docs/docs/docs.json @@ -68,7 +68,8 @@ "how-to/use-images-and-vision", "how-to/build-a-rag-pipeline", "how-to/refactor-prompts-with-cli", - "how-to/unit-test-generative-code" + "how-to/unit-test-generative-code", + "how-to/handling-exceptions" ] }, { @@ -100,10 +101,16 @@ { "group": "Evaluation and Observability", "pages": [ - "evaluation-and-observability/handling-exceptions", - "evaluation-and-observability/metrics-and-telemetry", - "evaluation-and-observability/opentelemetry-tracing", - "evaluation-and-observability/evaluate-with-llm-as-a-judge" + "evaluation-and-observability/evaluate-with-llm-as-a-judge", + { + "group": "Telemetry", + "pages": [ + "evaluation-and-observability/telemetry", + "evaluation-and-observability/tracing", + "evaluation-and-observability/metrics", + "evaluation-and-observability/logging" + ] + } ] }, { diff --git a/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md 
b/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md index 84d5a57fb..f8064ca58 100644 --- a/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md +++ b/docs/docs/evaluation-and-observability/evaluate-with-llm-as-a-judge.md @@ -202,4 +202,4 @@ requirements. `sample_generations` lists every attempt made. **See also:** [The Requirements System](../concepts/requirements-system) | [Write Custom Verifiers](../how-to/write-custom-verifiers) | -[Handling Exceptions and Failures](../evaluation-and-observability/handling-exceptions) +[Handling Exceptions and Failures](../how-to/handling-exceptions) diff --git a/docs/docs/evaluation-and-observability/logging.md b/docs/docs/evaluation-and-observability/logging.md new file mode 100644 index 000000000..908c43928 --- /dev/null +++ b/docs/docs/evaluation-and-observability/logging.md @@ -0,0 +1,186 @@ +--- +title: "Logging" +description: "Configure Mellea's console logging and export logs to OTLP collectors." +# diataxis: reference +--- + +**Prerequisites:** [Telemetry](../evaluation-and-observability/telemetry) +introduces the environment variables and telemetry architecture. This page +covers logging configuration in detail. + +Mellea provides two logging layers: a built-in console logger for local +development and an optional OTLP exporter for centralized log aggregation. +Both work simultaneously when enabled. + +## Console logging + +Mellea uses `FancyLogger`, a color-coded singleton logger built on Python's +`logging` module. All internal Mellea modules obtain their logger via +`FancyLogger.get_logger()`. + +### Configuration + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `DEBUG` | Set to any value to enable `DEBUG`-level output | unset (`INFO` level) | +| `FLOG` | Set to any value to forward logs to a local REST endpoint at `http://localhost:8000/api/receive` | unset | + +By default, `FancyLogger` logs at `INFO` level with color-coded output to +stdout. 
Set the `DEBUG` environment variable to lower the level to `DEBUG`: + +```bash +export DEBUG=1 +python your_script.py +``` + +### Log format + +Console output uses ANSI color codes by log level: + +- **Cyan** — DEBUG +- **Grey** — INFO +- **Yellow** — WARNING +- **Red** — ERROR +- **Bold red** — CRITICAL + +Each message is formatted as: + +```text +=== HH:MM:SS-LEVEL ====== +message +``` + +## OTLP log export + +When the `[telemetry]` extra is installed, Mellea can export logs to an OTLP +collector alongside the existing console output. This is useful for centralizing +logs from distributed services. + +> **Note:** OTLP logging is disabled by default. When disabled, there is zero +> overhead — no OTLP handler is created. + +### Enable OTLP logging + +```bash +export MELLEA_LOGS_OTLP=true +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 + +# Optional: logs-specific endpoint (overrides general endpoint) +export OTEL_EXPORTER_OTLP_LOGS_ENDPOINT=http://localhost:4318 + +# Optional: set service name +export OTEL_SERVICE_NAME=my-mellea-app +``` + +### How it works + +When `MELLEA_LOGS_OTLP=true`, `FancyLogger` adds an OpenTelemetry +`LoggingHandler` alongside its existing handlers: + +- **Console handler** — continues to work normally (color-coded output) +- **REST handler** — continues to work normally (when `FLOG` is set) +- **OTLP handler** — exports logs to the configured OTLP collector + +Logs are exported using OpenTelemetry's Logs API with batched processing +for efficiency. + +### Programmatic access + +Use `get_otlp_log_handler()` to add OTLP log export to your own loggers: + +```python +import logging +from mellea.telemetry import get_otlp_log_handler + +logger = logging.getLogger("my_app") +handler = get_otlp_log_handler() +if handler: + logger.addHandler(handler) + logger.info("This log will be exported via OTLP") +``` + +The function returns `None` when OTLP logging is disabled or not configured, +so the `if handler` check is always safe. 
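The console format described above can be reproduced with a plain `logging.Formatter`. This is a sketch of the idea only — the class and logger names below are invented for illustration and are not Mellea's actual `FancyLogger` implementation:

```python
import logging
import os

# ANSI codes matching the documented color scheme (sketch only).
COLORS = {
    logging.DEBUG: "\x1b[36m",       # cyan
    logging.INFO: "\x1b[90m",        # grey
    logging.WARNING: "\x1b[33m",     # yellow
    logging.ERROR: "\x1b[31m",       # red
    logging.CRITICAL: "\x1b[1;31m",  # bold red
}
RESET = "\x1b[0m"

class FancyishFormatter(logging.Formatter):
    """Render records as '=== HH:MM:SS-LEVEL ======' followed by the message."""

    def format(self, record: logging.LogRecord) -> str:
        color = COLORS.get(record.levelno, "")
        header = f"=== {self.formatTime(record, '%H:%M:%S')}-{record.levelname} ======"
        return f"{color}{header}\n{record.getMessage()}{RESET}"

logger = logging.getLogger("fancy_demo")
handler = logging.StreamHandler()
handler.setFormatter(FancyishFormatter())
logger.addHandler(handler)
# Mirror the DEBUG environment-variable behavior described above.
logger.setLevel(logging.DEBUG if os.environ.get("DEBUG") else logging.INFO)
logger.warning("disk almost full")
```

Because the formatter is an ordinary `logging.Formatter`, it coexists with any additional handlers (such as an OTLP handler) attached to the same logger.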
+ +### OTLP collector setup example + +```bash +cat > otel-collector-config.yaml < **Note:** OpenTelemetry is an optional dependency. Mellea works normally without it. -> Install with `pip install "mellea[telemetry]"` or `uv pip install "mellea[telemetry]"`. - -## Configuration - -All telemetry is configured via environment variables: - -| Variable | Description | Default | -| -------- | ----------- | ------- | -| `MELLEA_TRACE_APPLICATION` | Enable application-level tracing | `false` | -| `MELLEA_TRACE_BACKEND` | Enable backend-level tracing | `false` | -| `MELLEA_TRACE_CONSOLE` | Print traces to console (debugging) | `false` | -| `MELLEA_METRICS_ENABLED` | Enable metrics collection | `false` | -| `MELLEA_METRICS_CONSOLE` | Print metrics to console (debugging) | `false` | -| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for trace and metric export | none | -| `OTEL_SERVICE_NAME` | Service name in exported telemetry | `mellea` | - -## Trace scopes - -Mellea has two independent trace scopes: - -- **`mellea.application`** — user-facing operations: session lifecycle, `@generative` - function calls, `instruct()` and `act()` calls, sampling strategies, and requirement - validation. -- **`mellea.backend`** — LLM backend interactions, following the - [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). - Records model calls, token usage, finish reasons, and API latency. - -Enable both for full observability, or pick one depending on what you need to debug. - -## Using `start_session()` as a context manager - -Wrapping a session in `with start_session()` ties the trace lifecycle to the session -scope. 
All spans generated within the block are nested under the session span: - -```python -from mellea import generative, start_session -from mellea.stdlib.requirements import req - -@generative -def classify_sentiment(text: str) -> str: - """Classify the sentiment of the given text as positive, negative, or neutral.""" - -with start_session() as m: - email = m.instruct( - "Write a professional email to {{name}} about {{topic}}", - requirements=[req("Must be formal"), req("Must be under 100 words")], - user_variables={"name": "Alice", "topic": "project update"}, - ) - sentiment = classify_sentiment(m, text="I love this product!") -``` - -Run this with application tracing enabled: - -```bash -export MELLEA_TRACE_APPLICATION=true -python your_script.py -``` - -## Debugging with console output - -Print spans directly to stdout without configuring an OTLP backend: - -```bash -export MELLEA_TRACE_APPLICATION=true -export MELLEA_TRACE_CONSOLE=true -python your_script.py -``` - -This is the fastest way to verify that instrumentation is working. - -## Exporting to an OTLP backend - -Any OTLP-compatible backend works. To export to a local Jaeger instance: - -```bash -# Start Jaeger -docker run -d --name jaeger \ - -p 4317:4317 \ - -p 16686:16686 \ - jaegertracing/all-in-one:latest - -# Configure Mellea -export MELLEA_TRACE_APPLICATION=true -export MELLEA_TRACE_BACKEND=true -export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 -export OTEL_SERVICE_NAME=my-mellea-app - -python your_script.py -# View traces at http://localhost:16686 -``` - -Other compatible backends include Grafana Tempo, Honeycomb, Datadog, New Relic, -AWS X-Ray (via OTLP), and Google Cloud Trace. 
- -## Checking trace status programmatically - -```python -from mellea.telemetry import ( - is_application_tracing_enabled, - is_backend_tracing_enabled, - is_metrics_enabled, -) - -print(f"Application tracing: {is_application_tracing_enabled()}") -print(f"Backend tracing: {is_backend_tracing_enabled()}") -print(f"Metrics: {is_metrics_enabled()}") -``` - -## Metrics - -The metrics API exposes counters, histograms, and up-down counters backed by -the OpenTelemetry Metrics API. Enable metrics collection: - -```bash -export MELLEA_METRICS_ENABLED=true -export MELLEA_METRICS_CONSOLE=true # optional: print to stdout -``` - -Use `create_counter` and `create_histogram` to instrument your own code: - -```python -from mellea.telemetry import create_counter, create_histogram - -requests = create_counter("mellea.requests", unit="1", description="Total requests") -latency = create_histogram("mellea.latency", unit="ms", description="Request latency") - -requests.add(1, {"backend": "ollama", "model": "granite4:micro"}) -latency.record(120, {"backend": "ollama"}) -``` - -If `MELLEA_METRICS_ENABLED` is `false` or the `[telemetry]` extra is not installed, -all instrument calls are no-ops with no overhead. - -> **Note:** Metrics are exported to `OTEL_EXPORTER_OTLP_ENDPOINT` when set. -> If metrics are enabled but no endpoint is configured and `MELLEA_METRICS_CONSOLE` -> is also `false`, Mellea will log a warning at startup. 
- -## Span hierarchy - -When both trace scopes are enabled, spans nest as follows: - -```text -session_context (mellea.application) -├── aact (mellea.application) -│ ├── chat (mellea.backend) [gen_ai.system=ollama, gen_ai.request.model=granite4:micro] -│ │ [gen_ai.usage.input_tokens=150, gen_ai.usage.output_tokens=50] -│ └── requirement_validation (mellea.application) -└── aact (mellea.application) - └── chat (mellea.backend) [gen_ai.system=openai, gen_ai.request.model=gpt-4] - [gen_ai.usage.input_tokens=200, gen_ai.usage.output_tokens=75] -``` - -Backend spans carry Gen-AI semantic convention attributes for cross-provider comparisons: - -| Attribute | Description | -| --------- | ----------- | -| `gen_ai.system` | LLM provider name (`openai`, `ollama`, `huggingface`) | -| `gen_ai.request.model` | Model requested | -| `gen_ai.response.model` | Model actually used (may differ) | -| `gen_ai.usage.input_tokens` | Input tokens consumed | -| `gen_ai.usage.output_tokens` | Output tokens generated | -| `gen_ai.response.finish_reasons` | Finish reason list (e.g., `["stop"]`) | - -Application spans add Mellea-specific attributes: - -| Attribute | Description | -| --------- | ----------- | -| `mellea.backend` | Backend class name | -| `mellea.action_type` | Component type being executed | -| `sampling_success` | Whether sampling succeeded | -| `num_generate_logs` | Number of generation attempts | -| `response` | Model response (truncated to 500 chars) | - -> **Full example:** [`docs/examples/telemetry/telemetry_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/telemetry/telemetry_example.py) diff --git a/docs/docs/evaluation-and-observability/metrics.md b/docs/docs/evaluation-and-observability/metrics.md new file mode 100644 index 000000000..ac7b3b827 --- /dev/null +++ b/docs/docs/evaluation-and-observability/metrics.md @@ -0,0 +1,386 @@ +--- +title: "Metrics" +description: "Collect token usage metrics and instrument your own code with 
OpenTelemetry counters, histograms, and up-down counters." +# diataxis: how-to +--- + +**Prerequisites:** [Telemetry](../evaluation-and-observability/telemetry) +introduces the environment variables and telemetry architecture. This page +covers metrics collection in detail. + +Mellea automatically tracks token consumption across all backends using +OpenTelemetry metrics counters. Token metrics follow the +[Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/) +for standardized observability. The metrics API also lets you create your own +counters, histograms, and up-down counters for application-level instrumentation. + +> **Note:** Metrics are an optional feature. All instrument calls are no-ops +> when metrics are disabled or the `[telemetry]` extra is not installed. + +## Enable metrics + +```bash +export MELLEA_METRICS_ENABLED=true +``` + +You also need at least one exporter configured — see +[Metrics export configuration](#metrics-export-configuration) below. + +## Token usage metrics + +Mellea records token consumption automatically after each LLM call completes. +No code changes are required. The `TokenMetricsPlugin` auto-registers when +`MELLEA_METRICS_ENABLED=true` and records metrics via the plugin hook system. 
+ +### Built-in metrics + +| Metric Name | Type | Unit | Description | +| ----------- | ---- | ---- | ----------- | +| `mellea.llm.tokens.input` | Counter | `tokens` | Total input/prompt tokens processed | +| `mellea.llm.tokens.output` | Counter | `tokens` | Total output/completion tokens generated | + +### Metric attributes + +All token metrics include these attributes following Gen-AI semantic conventions: + +| Attribute | Description | Example Values | +| --------- | ----------- | -------------- | +| `gen_ai.provider.name` | Backend provider name | `openai`, `ollama`, `watsonx`, `litellm`, `huggingface` | +| `gen_ai.request.model` | Model identifier | `gpt-4`, `llama3.2:7b`, `granite-3.1-8b-instruct` | + +### Backend support + +| Backend | Streaming | Non-Streaming | Source | +| ------- | --------- | ------------- | ------ | +| OpenAI | Yes | Yes | `usage.prompt_tokens` and `usage.completion_tokens` | +| Ollama | Yes | Yes | `prompt_eval_count` and `eval_count` | +| WatsonX | No | Yes | `input_token_count` and `generated_token_count` (streaming API limitation) | +| LiteLLM | Yes | Yes | `usage.prompt_tokens` and `usage.completion_tokens` | +| HuggingFace | Yes | Yes | Calculated from input_ids and output sequences | + +> **Note:** Token usage metrics are only tracked for `generate_from_context` +> requests. `generate_from_raw` calls do not record token metrics. + +### When metrics are recorded + +Token metrics are recorded **after the full response is received**, not +incrementally during streaming: + +- **Non-streaming**: Metrics recorded immediately after `await mot.avalue()` completes. +- **Streaming**: Metrics recorded after the stream is fully consumed (all chunks received). + +This ensures accurate token counts from the backend's usage metadata, which +is only available after the complete response. 
+ +```python +mot, _ = await backend.generate_from_context(msg, ctx) + +# Metrics NOT recorded yet (stream still in progress) +await mot.astream() + +# Metrics recorded here (after stream completion) +await mot.avalue() +``` + +## Metrics export configuration + +Mellea supports multiple metrics exporters that can be used independently or +simultaneously. + +> **Warning:** If `MELLEA_METRICS_ENABLED=true` but no exporter is configured, +> Mellea logs a warning. Metrics are collected but not exported. + +### Console exporter (debugging) + +Print metrics to console for local debugging without setting up an +observability backend: + +```bash +export MELLEA_METRICS_ENABLED=true +export MELLEA_METRICS_CONSOLE=true +python your_script.py +``` + +Metrics are printed as JSON at the configured export interval (default: 60 +seconds). + +### OTLP exporter (production) + +Export metrics to an OTLP collector for production observability platforms +(Jaeger, Grafana, Datadog, etc.): + +```bash +export MELLEA_METRICS_ENABLED=true +export MELLEA_METRICS_OTLP=true +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 + +# Optional: metrics-specific endpoint (overrides general endpoint) +export OTEL_EXPORTER_OTLP_METRICS_ENDPOINT=http://localhost:4318 + +# Optional: set service name +export OTEL_SERVICE_NAME=my-mellea-app + +# Optional: adjust export interval (milliseconds, default: 60000) +export OTEL_METRIC_EXPORT_INTERVAL=30000 +``` + +**OTLP collector setup example:** + +```bash +cat > otel-collector-config.yaml < **Full example:** [`docs/examples/telemetry/metrics_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/telemetry/metrics_example.py) + +--- + +**See also:** + +- [Telemetry](../evaluation-and-observability/telemetry) — overview of all + telemetry features and configuration. +- [Tracing](../evaluation-and-observability/tracing) — distributed traces + with Gen-AI semantic conventions. 
+- [Logging](../evaluation-and-observability/logging) — console logging and OTLP + log export. diff --git a/docs/docs/evaluation-and-observability/telemetry.md b/docs/docs/evaluation-and-observability/telemetry.md new file mode 100644 index 000000000..0d654478f --- /dev/null +++ b/docs/docs/evaluation-and-observability/telemetry.md @@ -0,0 +1,160 @@ +--- +title: "Telemetry" +sidebarTitle: "Overview" +description: "Add OpenTelemetry tracing, metrics, and logging to Mellea programs." +# diataxis: how-to +--- + +**Prerequisites:** [Quick Start](../getting-started/quickstart) complete, +`pip install "mellea[telemetry]"`, Ollama running locally. + +Mellea provides built-in [OpenTelemetry](https://opentelemetry.io/) instrumentation +across three independent pillars — tracing, metrics, and logging. Each can be enabled +separately. All telemetry is opt-in: if the `[telemetry]` extra is not installed, +every telemetry call is a silent no-op. + +> **Note:** OpenTelemetry is an optional dependency. Mellea works normally without it. +> Install with `pip install "mellea[telemetry]"` or `uv pip install "mellea[telemetry]"`. 
+ +## Configuration + +All telemetry is configured via environment variables: + +### General + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `OTEL_SERVICE_NAME` | Service name for all telemetry signals | `mellea` | +| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP endpoint for all telemetry signals | none | + +### Tracing variables + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `MELLEA_TRACE_APPLICATION` | Enable application-level tracing | `false` | +| `MELLEA_TRACE_BACKEND` | Enable backend-level tracing | `false` | +| `MELLEA_TRACE_CONSOLE` | Print traces to console (debugging) | `false` | + +### Metrics variables + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `MELLEA_METRICS_ENABLED` | Enable metrics collection | `false` | +| `MELLEA_METRICS_CONSOLE` | Print metrics to console (debugging) | `false` | +| `MELLEA_METRICS_OTLP` | Enable OTLP metrics exporter | `false` | +| `OTEL_EXPORTER_OTLP_METRICS_ENDPOINT` | Metrics-specific OTLP endpoint (overrides general) | none | +| `MELLEA_METRICS_PROMETHEUS` | Enable Prometheus metric reader | `false` | +| `OTEL_METRIC_EXPORT_INTERVAL` | Export interval in milliseconds | `60000` | + +### Logging variables + +| Variable | Description | Default | +| -------- | ----------- | ------- | +| `MELLEA_LOGS_OTLP` | Enable OTLP logs exporter | `false` | +| `OTEL_EXPORTER_OTLP_LOGS_ENDPOINT` | Logs-specific OTLP endpoint (overrides general) | none | + +## Quick start + +Enable tracing and metrics with console output to verify everything works: + +```bash +export MELLEA_TRACE_APPLICATION=true +export MELLEA_TRACE_BACKEND=true +export MELLEA_TRACE_CONSOLE=true +export MELLEA_METRICS_ENABLED=true +export MELLEA_METRICS_CONSOLE=true +python your_script.py +``` + +Traces and metrics print to stdout. 
For production use, replace the console +exporters with an OTLP endpoint: + +```bash +export OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317 +export OTEL_SERVICE_NAME=my-mellea-app +``` + +## Checking telemetry status programmatically + +```python +from mellea.telemetry import ( + is_application_tracing_enabled, + is_backend_tracing_enabled, + is_metrics_enabled, +) + +print(f"Application tracing: {is_application_tracing_enabled()}") +print(f"Backend tracing: {is_backend_tracing_enabled()}") +print(f"Metrics: {is_metrics_enabled()}") +``` + +## Tracing + +Mellea has two independent trace scopes: + +- **`mellea.application`** — user-facing operations: session lifecycle, + `@generative` calls, `instruct()` and `act()`, sampling strategies, and + requirement validation. +- **`mellea.backend`** — LLM backend interactions following the + [OpenTelemetry Gen-AI Semantic Conventions](https://opentelemetry.io/docs/specs/semconv/gen-ai/). + Records model calls, token usage, finish reasons, and API latency. + +Enable both for full observability, or pick one depending on what you need to +debug. When both scopes are active, backend spans nest inside application spans: + +```text +session_context (mellea.application) +├── aact (mellea.application) +│ ├── chat (mellea.backend) [gen_ai.system=ollama] +│ └── requirement_validation (mellea.application) +└── aact (mellea.application) + └── chat (mellea.backend) [gen_ai.system=openai] +``` + +See [Tracing](../evaluation-and-observability/tracing) for span attributes, +exporter configuration (Jaeger, Grafana Tempo, etc.), and debugging guidance. + +## Metrics + +Mellea automatically tracks token consumption across all backends using +OpenTelemetry counters (`mellea.llm.tokens.input` and +`mellea.llm.tokens.output`). No code changes are required — the +`TokenMetricsPlugin` records metrics via the plugin hook system after each +LLM call completes. 
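Custom instruments can also be created from application code. The `create_counter`/`create_histogram` calls below come from `mellea.telemetry` as documented; the `except ImportError` stubs are an illustrative addition so this sketch runs even without Mellea installed:

```python
try:
    from mellea.telemetry import create_counter, create_histogram
except ImportError:
    # Illustrative fallback stubs (not Mellea code) so the sketch is runnable.
    class _Instrument:
        def __init__(self):
            self.points = []
        def add(self, value, attributes=None):
            self.points.append((value, attributes))
        record = add  # histograms use .record() with the same shape

    def create_counter(name, unit="1", description=""):
        return _Instrument()

    def create_histogram(name, unit="1", description=""):
        return _Instrument()

requests = create_counter("mellea.requests", unit="1", description="Total requests")
latency = create_histogram("mellea.latency", unit="ms", description="Request latency")

requests.add(1, {"backend": "ollama", "model": "granite4:micro"})
latency.record(120, {"backend": "ollama"})
```

If `MELLEA_METRICS_ENABLED` is `false` or the `[telemetry]` extra is not installed, these instrument calls are no-ops with no overhead.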
+ +The metrics API also exposes `create_counter`, `create_histogram`, and +`create_up_down_counter` for instrumenting your own application code. + +Mellea supports three exporters that can run simultaneously: + +- **Console** — print to stdout for debugging +- **OTLP** — export to production observability platforms +- **Prometheus** — register with `prometheus_client` for scraping + +See [Metrics](../evaluation-and-observability/metrics) for token usage details, +backend support matrix, exporter setup, custom instruments, and troubleshooting. + +## Logging + +Mellea uses a color-coded console logger (`FancyLogger`) by default. When the +`[telemetry]` extra is installed and `MELLEA_LOGS_OTLP=true` is set, Mellea +also exports logs to an OTLP collector alongside existing console output. + +See [Logging](../evaluation-and-observability/logging) for console logging +configuration, OTLP log export setup, and programmatic access via +`get_otlp_log_handler()`. + +> **Full example:** [`docs/examples/telemetry/telemetry_example.py`](https://github.com/generative-computing/mellea/blob/main/docs/examples/telemetry/telemetry_example.py) + +--- + +**See also:** + +- [Tracing](../evaluation-and-observability/tracing) — distributed traces + with Gen-AI semantic conventions. +- [Metrics](../evaluation-and-observability/metrics) — token usage metrics, + exporters, and custom instruments. +- [Logging](../evaluation-and-observability/logging) — console logging and OTLP + log export. +- [Evaluate with LLM-as-a-Judge](../evaluation-and-observability/evaluate-with-llm-as-a-judge) — + automated quality evaluation correlated with trace data. 
diff --git a/docs/docs/evaluation-and-observability/opentelemetry-tracing.md b/docs/docs/evaluation-and-observability/tracing.md
similarity index 87%
rename from docs/docs/evaluation-and-observability/opentelemetry-tracing.md
rename to docs/docs/evaluation-and-observability/tracing.md
index d4dde4a67..0595f7c1d 100644
--- a/docs/docs/evaluation-and-observability/opentelemetry-tracing.md
+++ b/docs/docs/evaluation-and-observability/tracing.md
@@ -1,10 +1,10 @@
 ---
-title: "OpenTelemetry Tracing"
+title: "Tracing"
 description: "Export distributed traces from Mellea using OpenTelemetry semantic conventions."
 # diataxis: how-to
 ---
 
-**Prerequisites:** [Metrics and Telemetry](../evaluation-and-observability/metrics-and-telemetry)
+**Prerequisites:** [Telemetry](../evaluation-and-observability/telemetry)
 introduces the environment variables and trace scopes. This page focuses on
 exporting traces to external backends and interpreting the span data they
 contain.
@@ -131,9 +131,22 @@ Backend spans cover individual LLM API calls.
 They follow the
 | `gen_ai.usage.input_tokens` | Input tokens consumed |
 | `gen_ai.usage.output_tokens` | Output tokens generated |
 | `gen_ai.usage.total_tokens` | Total tokens (input + output) |
+| `gen_ai.response.model` | Actual model used in the response (may differ from request) |
 | `gen_ai.response.finish_reasons` | List of finish reasons (e.g., `["stop"]`) |
 | `gen_ai.response.id` | Response identifier from the backend |
 
+Mellea also adds context-specific attributes to backend spans:
+
+| Attribute | Description |
+| --------- | ----------- |
+| `mellea.backend` | Backend class name (e.g., `OpenAIBackend`) |
+| `mellea.action_type` | Component type being executed |
+| `mellea.context_size` | Number of items in context |
+| `mellea.has_format` | Whether structured output format is specified |
+| `mellea.format_type` | Response format class name |
+| `mellea.tool_calls_enabled` | Whether tool calling is enabled |
+| `mellea.num_actions` | Number of actions in batch (for `generate_from_raw`) |
+
 ### Span hierarchy
 
 When both scopes are active, backend spans nest inside application spans:
@@ -225,11 +238,13 @@ import mellea  # noqa: E402
 
 ---
 
-## Next steps
+**See also:**
 
-- [Metrics and Telemetry](../evaluation-and-observability/metrics-and-telemetry) —
-  enable metrics collection alongside tracing, and learn how to instrument your
-  own code with counters and histograms.
+- [Telemetry](../evaluation-and-observability/telemetry) — overview of all
+  telemetry features and configuration.
+- [Metrics](../evaluation-and-observability/metrics) — token usage metrics,
+  exporters, and custom instruments.
+- [Logging](../evaluation-and-observability/logging) — console logging and OTLP
+  log export.
 - [Evaluate with LLM-as-a-Judge](../evaluation-and-observability/evaluate-with-llm-as-a-judge) —
-  add automated quality evaluation to your pipeline and correlate evaluation
-  results with trace data.
+  automated quality evaluation correlated with trace data.
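The parent/child nesting described in the span-hierarchy hunk above comes from ordinary context-manager scoping: a span opened while another is active becomes its child. The toy tracer below sketches only that mechanism in plain Python. It is not Mellea's tracer; the span and scope names are borrowed from the documentation's example tree.

```python
import contextlib

class ToyTracer:
    """Records (name, scope, depth) for each span as it finishes."""

    def __init__(self):
        self.finished = []   # completion order: innermost spans first
        self._stack = []     # currently open spans

    @contextlib.contextmanager
    def span(self, name: str, scope: str):
        depth = len(self._stack)  # nesting depth at the moment of entry
        self._stack.append(name)
        try:
            yield
        finally:
            self._stack.pop()
            self.finished.append((name, scope, depth))

tracer = ToyTracer()
with tracer.span("session_context", "mellea.application"):
    with tracer.span("aact", "mellea.application"):
        with tracer.span("chat", "mellea.backend"):
            pass  # the backend span nests inside the application spans

# Render the recorded spans as an indented tree, outermost first.
for name, scope, depth in sorted(tracer.finished, key=lambda s: s[2]):
    print("  " * depth + f"{name} ({scope})")
```

Running it prints a three-level tree: `chat` indented beneath `aact`, which sits beneath `session_context`, matching the shape a real trace viewer would show.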
diff --git a/docs/docs/examples/traced-generation-loop.md b/docs/docs/examples/traced-generation-loop.md
index 469b16b3f..3b12b766f 100644
--- a/docs/docs/examples/traced-generation-loop.md
+++ b/docs/docs/examples/traced-generation-loop.md
@@ -364,7 +364,7 @@ applicable:
 
 - Set `OTEL_SERVICE_NAME=my-app` to customise the service name in your trace
   backend.
-- See [OpenTelemetry Tracing](../evaluation-and-observability/opentelemetry-tracing)
+- See [Tracing](../evaluation-and-observability/tracing)
   for attribute schemas and advanced configuration.
 - Add `MELLEA_TRACE_CONSOLE=true` alongside an OTLP endpoint to confirm spans
   are generated even when the remote collector is unavailable.
diff --git a/docs/docs/guide/CONTRIBUTING.md b/docs/docs/guide/CONTRIBUTING.md
index 7780ce734..fd0a58434 100644
--- a/docs/docs/guide/CONTRIBUTING.md
+++ b/docs/docs/guide/CONTRIBUTING.md
@@ -361,7 +361,7 @@ See [CONTRIBUTING.md](../../CONTRIBUTING.md) for the full validation workflow.
 
 ```bash
 cd docs/docs
-mint dev
+mintlify dev
 # Site available at http://localhost:3000
 ```
 
@@ -405,7 +405,7 @@ markdownlint docs/docs/guide/your-page.md
 - [ ] `**See also:**` footer present with relevant cross-links (Mintlify generates prev/next automatically).
 - [ ] `docs.json` updated if new page added; old MDX page removed from nav if replaced.
 - [ ] `index.mdx` landing page cards reviewed — add a card if the new page is a major entry point (key pattern, integration, or prominent how-to); keep total cards per section to ≤ 8.
-- [ ] Previewed locally with `mint dev`.
+- [ ] Previewed locally with `mintlify dev`.
 - [ ] Non-deterministic LLM output noted.
 - [ ] Backend-specific code blocks flagged with `> **Backend note:**`.
 - [ ] No visible TODO placeholders — missing content tracked as GitHub issues.
diff --git a/docs/docs/guide/glossary.md b/docs/docs/guide/glossary.md
index df1e279e0..d5af369ea 100644
--- a/docs/docs/guide/glossary.md
+++ b/docs/docs/guide/glossary.md
@@ -537,7 +537,7 @@ except PreconditionException as e:
     print(e.validation)  # list of ValidationResult
 ```
 
-See: [Handling Exceptions and Failures](../evaluation-and-observability/handling-exceptions)
+See: [Handling Exceptions and Failures](../how-to/handling-exceptions)
 
 ---
 
diff --git a/docs/docs/evaluation-and-observability/handling-exceptions.md b/docs/docs/how-to/handling-exceptions.md
similarity index 99%
rename from docs/docs/evaluation-and-observability/handling-exceptions.md
rename to docs/docs/how-to/handling-exceptions.md
index aae90f94d..9de551621 100644
--- a/docs/docs/evaluation-and-observability/handling-exceptions.md
+++ b/docs/docs/how-to/handling-exceptions.md
@@ -296,7 +296,7 @@ if not result.success:
 ```
 
 For structured telemetry across all calls, see
-[Metrics and Telemetry](./metrics-and-telemetry).
+[Telemetry](../evaluation-and-observability/telemetry).
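The glossary hunk above shows `e.validation` carrying a list of `ValidationResult` objects. Since the real classes live inside Mellea, the sketch below mirrors only their shape with hypothetical stand-ins, to show how callers typically inspect which requirements failed.

```python
from dataclasses import dataclass

@dataclass
class ValidationResult:
    """Stand-in for Mellea's validation result: one requirement, one verdict."""
    requirement: str
    passed: bool
    reason: str = ""

class PreconditionException(Exception):
    """Stand-in: raised when input requirements fail before generation runs."""

    def __init__(self, validation):
        super().__init__("precondition failed")
        self.validation = validation  # list of ValidationResult

def run_with_preconditions(text: str) -> str:
    # Check requirements up front; raise with structured results on failure.
    ok = bool(text.strip())
    results = [ValidationResult("non-empty input", ok,
                                "" if ok else "input was blank")]
    if not all(r.passed for r in results):
        raise PreconditionException(results)
    return text.upper()

try:
    run_with_preconditions("   ")
except PreconditionException as e:
    failed = [r.requirement for r in e.validation if not r.passed]
    print(failed)  # ['non-empty input']
```

Carrying the full result list on the exception (rather than just a message) lets the caller report or retry per failed requirement.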
 ---
 
diff --git a/docs/docs/how-to/unit-test-generative-code.md b/docs/docs/how-to/unit-test-generative-code.md
index 25eba7997..0c62cd271 100644
--- a/docs/docs/how-to/unit-test-generative-code.md
+++ b/docs/docs/how-to/unit-test-generative-code.md
@@ -383,5 +383,5 @@ pytest -m qualitative
 
 - [The Requirements System](../concepts/requirements-system) — understand how
   `Requirement`, `simple_validate`, and `check` interact with the IVR loop
-- [Handling Exceptions](../evaluation-and-observability/handling-exceptions) —
+- [Handling Exceptions](../how-to/handling-exceptions) —
   catch and diagnose errors that occur during generation
diff --git a/docs/docs/troubleshooting/common-errors.md b/docs/docs/troubleshooting/common-errors.md
index 7c2553c51..d691cbbe0 100644
--- a/docs/docs/troubleshooting/common-errors.md
+++ b/docs/docs/troubleshooting/common-errors.md
@@ -237,7 +237,7 @@ ollama pull granite-guardian-3.2-5b
 
 - **GitHub Issues:** [github.com/generative-computing/mellea/issues](https://github.com/generative-computing/mellea/issues)
 - **Examples:** [`docs/examples/`](https://github.com/generative-computing/mellea/tree/main/docs/examples)
 - Enable telemetry to inspect what is happening at each step — see
-  [Metrics and Telemetry](../evaluation-and-observability/metrics-and-telemetry).
+  [Telemetry](../evaluation-and-observability/telemetry).
 
 ---
 
diff --git a/docs/docs/troubleshooting/faq.md b/docs/docs/troubleshooting/faq.md
index 2ac1a3d30..079979ac8 100644
--- a/docs/docs/troubleshooting/faq.md
+++ b/docs/docs/troubleshooting/faq.md
@@ -269,7 +269,7 @@ with start_session() as m:
 ```
 
 For the full telemetry setup, see
-[OpenTelemetry Tracing](../evaluation-and-observability/opentelemetry-tracing).
+[Tracing](../evaluation-and-observability/tracing).
 
 ## Does Mellea support async?
diff --git a/docs/examples/telemetry/README.md b/docs/examples/telemetry/README.md
index 051733634..2ef414925 100644
--- a/docs/examples/telemetry/README.md
+++ b/docs/examples/telemetry/README.md
@@ -166,4 +166,4 @@ ollama serve  # Start Ollama server
 
 ## Learn More
 
-See [docs/dev/telemetry.md](../../dev/telemetry.md) for complete documentation.
\ No newline at end of file
+See the [Telemetry documentation](../../docs/evaluation-and-observability/telemetry.md) for complete details.
\ No newline at end of file
diff --git a/docs/examples/telemetry/metrics_example.py b/docs/examples/telemetry/metrics_example.py
index c8630c554..8a58f7a35 100644
--- a/docs/examples/telemetry/metrics_example.py
+++ b/docs/examples/telemetry/metrics_example.py
@@ -36,7 +36,7 @@ python metrics_example.py
 
 # For OTLP Collector and Prometheus setup instructions, see:
-# docs/dev/telemetry.md
+# docs/docs/evaluation-and-observability/metrics.md
 """
 
 import os