Summary
We propose adding a Claude Code plugin to the observability-stack repository that teaches Claude Code how to query traces, logs, and metrics from the running stack using PPL, PromQL, and curl commands. The plugin is a set of markdown skill files with no runtime code and no build step. Claude Code loads them as context to gain OpenSearch-native observability capabilities.
No existing public Claude Code skill covers OpenSearch observability or PPL. This fills that gap.
Motivation
Developers using AI coding assistants with the observability stack currently have to:
- Manually look up PPL syntax for every trace or log query
- Remember the correct curl flags, auth credentials, and API endpoints for OpenSearch and Prometheus
- Know which index patterns store traces vs. logs vs. service maps
- Construct cross-signal correlation queries (trace-to-log joins) from scratch
- Debug stack health issues without structured guidance
- Build RED metrics dashboards and SLO/SLI monitoring from scratch
- Figure out how to connect to AWS managed services (Amazon OpenSearch Service, Amazon Managed Prometheus) with SigV4 auth
A Claude Code plugin eliminates this friction. When a developer asks "show me the slowest agent invocations in the last hour", "what's the error budget burn rate for the payment service?", or "why is the payment service erroring?", Claude Code can immediately construct and execute the right PPL or PromQL query against the right endpoint with the right auth.
Glossary
| Term | Definition |
|---|---|
| Plugin | A collection of CLAUDE.md-compatible markdown skill files placed in a project directory that Claude Code loads as context to gain domain-specific capabilities. |
| Skill File | A single markdown file with frontmatter (name, description, allowed-tools) and instructional content that teaches Claude Code a specific capability. |
| PPL | Piped Processing Language, the query language used by OpenSearch for log and trace analytics. Queries are piped commands starting with source=<index>. |
| PromQL | Prometheus Query Language used for querying time-series metrics from Prometheus. |
| OpenSearch | The search and analytics engine that stores traces and logs in this stack, accessible at port 9200 with HTTPS and basic authentication. |
| Prometheus | The time-series database that stores metrics in this stack, accessible at port 9090. |
| OTel Collector | The OpenTelemetry Collector that receives telemetry via OTLP protocol on ports 4317 (gRPC) and 4318 (HTTP) and routes data to Data Prepper and Prometheus. |
| Data Prepper | The pipeline processor that transforms and enriches logs and traces before writing them to OpenSearch. |
| Trace Index | The OpenSearch index pattern otel-v1-apm-span-* storing trace span data. |
| Log Index | The OpenSearch index pattern otel-v1-apm-log-* storing log data. |
| Service Map Index | The OpenSearch index otel-v2-apm-service-map storing service dependency topology. |
| Gen AI Attributes | OpenTelemetry semantic convention attributes for generative AI operations, prefixed with gen_ai.* (e.g., gen_ai.operation.name, gen_ai.agent.name, gen_ai.usage.input_tokens). |
| Stack | The complete observability infrastructure: OTel Collector, Data Prepper, OpenSearch, Prometheus, and OpenSearch Dashboards. |
| Cross-Signal Correlation | The practice of linking telemetry signals (traces, logs, metrics) using shared identifiers such as traceId and spanId to enable end-to-end investigation. |
| Exemplar | A Prometheus data structure that links an individual metric sample to a specific trace by carrying trace_id and span_id alongside the measurement value. Enables metric-to-trace correlation. |
| Test Fixture | A YAML file defining a single integration test case with command, expected status code, expected response fields, and tags. |
| PPL Grammar Source | The official OpenSearch PPL grammar documentation located in the opensearch-project/sql repository under docs/user/ppl/. |
| RED Metrics | Rate, Errors, Duration: the three golden signals for service-level APM monitoring. Rate measures throughput, Errors measures failure ratio, Duration measures latency distribution. |
| SLI | Service Level Indicator: a quantitative measurement of a service's behavior, such as the ratio of successful requests to total requests. |
| SLO | Service Level Objective: a target value or range for an SLI, such as "99.9% availability over 30 days." |
| Error Budget | The allowed amount of unreliability derived from an SLO. For a 99.9% SLO, the error budget is 0.1%. |
| Burn Rate | The speed at which the error budget is being consumed. A burn rate of 1x means the budget will be exhausted exactly at the end of the SLO window. |
| Recording Rule | A Prometheus configuration that pre-computes and stores the result of a PromQL expression as a new time series, enabling efficient querying of SLI metrics at multiple time windows. |
| AWS SigV4 | AWS Signature Version 4, the authentication protocol used to sign HTTP requests to AWS services including Amazon OpenSearch Service and Amazon Managed Prometheus. |
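To make the Error Budget and Burn Rate definitions above concrete, here is a minimal arithmetic sketch (the function names are illustrative, not part of the plugin):

```python
def error_budget(slo_target: float) -> float:
    """Allowed unreliability: 1 - SLO target (e.g. ~0.001 for a 99.9% SLO)."""
    return 1.0 - slo_target

def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the budget is consumed: observed error ratio / budget.
    1.0 means the budget runs out exactly at the end of the SLO window."""
    return error_ratio / error_budget(slo_target)

# A 99.9% SLO leaves a 0.1% budget; a 1.44% error ratio burns it at ~14.4x.
print(burn_rate(0.0144, 0.999))
```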
Architecture
System Context
```mermaid
graph TB
    subgraph "Claude Code Plugin"
        CM[CLAUDE.md<br/>Entry Point]
        subgraph "skills/"
            TS[traces.md]
            LS[logs.md]
            MS[metrics.md]
            SH[stack-health.md]
            PR[ppl-reference.md]
            CR[correlation.md]
            AR[apm-red.md]
            SL[slo-sli.md]
        end
        subgraph "tests/"
            CF[conftest.py]
            TF[test_fixtures.py]
            TR[test_runner.py]
            FX[fixtures/*.yaml]
        end
    end
    subgraph "Observability Stack"
        OS[OpenSearch :9200<br/>HTTPS + Basic Auth]
        PM[Prometheus :9090<br/>HTTP]
        OC[OTel Collector :4317/:4318]
        DP[Data Prepper :21890]
    end
    CM -->|references| TS
    CM -->|references| LS
    CM -->|references| MS
    CM -->|references| SH
    CM -->|references| PR
    CM -->|references| CR
    CM -->|references| AR
    CM -->|references| SL
    TS -->|PPL queries via curl| OS
    LS -->|PPL queries via curl| OS
    CR -->|PPL queries via curl| OS
    CR -->|PromQL + exemplars via curl| PM
    AR -->|PromQL RED queries via curl| PM
    AR -->|PPL RED queries via curl| OS
    SL -->|PromQL SLO queries via curl| PM
    SH -->|health checks via curl| OS
    SH -->|health checks via curl| PM
    SH -->|health checks via curl| OC
    MS -->|PromQL queries via curl| PM
    PR -->|PPL reference for| OS
    TR -->|validates commands from| FX
    CF -->|checks health of| OS
    CF -->|checks health of| PM
```
Data Flow
```mermaid
flowchart LR
    A[User asks Claude Code<br/>an observability question] --> B[Claude Code reads CLAUDE.md]
    B --> C{Route by intent}
    C -->|trace investigation| D[Load traces.md]
    C -->|log search| E[Load logs.md]
    C -->|metrics query| F[Load metrics.md]
    C -->|stack issues| G[Load stack-health.md]
    C -->|PPL syntax help| H[Load ppl-reference.md]
    C -->|cross-signal correlation| X[Load correlation.md]
    C -->|RED metrics / APM| Y[Load apm-red.md]
    C -->|SLO/SLI / error budget| Z[Load slo-sli.md]
    D --> I[Execute curl command<br/>against OpenSearch PPL API]
    E --> I
    F --> J[Execute curl command<br/>against Prometheus API]
    G --> K[Execute curl/docker commands<br/>against stack endpoints]
    H --> L[Reference for constructing<br/>novel PPL queries]
    X --> I
    X --> J
    Y --> I
    Y --> J
    Z --> J
```
What's Included
Eight Skill Files
The plugin ships as a CLAUDE.md entry point plus eight skill files in a skills/ directory:
| Skill | What it does | Query language | Target |
|---|---|---|---|
| `traces.md` | Query trace spans: agent invocations, tool executions, slow spans, errors, token usage, trace tree reconstruction, cross-signal correlation | PPL | OpenSearch :9200 |
| `logs.md` | Query logs: severity filtering, trace correlation, error patterns, log volume, body search | PPL | OpenSearch :9200 |
| `metrics.md` | Query metrics: HTTP rates, latency percentiles, error rates, GenAI token usage, operation duration | PromQL | Prometheus :9090 |
| `stack-health.md` | Health checks for all stack components, troubleshooting guide, port reference | curl + docker | All services |
| `ppl-reference.md` | Comprehensive PPL language reference: 50+ commands, 14 function categories, 3 API endpoints | n/a | Reference |
| `correlation.md` | Cross-signal correlation: trace-log joins via PPL, metric-to-trace via Prometheus exemplars, resource-level correlation, investigation workflows | PPL + PromQL | OpenSearch + Prometheus |
| `apm-red.md` | APM RED metrics: per-service request rate, error ratio, latency percentiles (p50/p95/p99), GenAI RED, OTel HTTP semantic conventions | PromQL + PPL | Prometheus + OpenSearch |
| `slo-sli.md` | SLO/SLI monitoring: SLI definitions, Prometheus recording rules, error budgets, multi-window burn rate alerts, compliance reporting | PromQL | Prometheus :9090 |
Plugin Directory Structure
```
claude-code-observability-plugin/
├── CLAUDE.md                 # Entry point, routing table for skills
├── skills/
│   ├── traces.md             # Trace querying with PPL
│   ├── logs.md               # Log querying with PPL
│   ├── metrics.md            # Metrics querying with PromQL
│   ├── stack-health.md       # Health checks and troubleshooting
│   ├── ppl-reference.md      # Comprehensive PPL language reference
│   ├── correlation.md        # Cross-signal correlation workflows
│   ├── apm-red.md            # APM RED metrics (Rate, Errors, Duration)
│   └── slo-sli.md            # SLO/SLI definitions, error budgets, burn rates
└── tests/
    ├── README.md             # Test documentation
    ├── conftest.py           # Session fixtures, stack health gate
    ├── test_runner.py        # YAML-driven test execution
    ├── models.py             # Pydantic test fixture model
    ├── requirements.txt      # pytest, pyyaml, pydantic, requests
    └── fixtures/
        ├── traces.yaml       # Trace skill test cases
        ├── logs.yaml         # Log skill test cases
        ├── metrics.yaml      # Metrics skill test cases
        ├── stack-health.yaml # Stack health test cases
        ├── ppl.yaml          # PPL reference test cases
        ├── correlation.yaml  # Correlation skill test cases
        ├── apm-red.yaml      # APM RED skill test cases
        └── slo-sli.yaml      # SLO/SLI skill test cases
```
Skill File Format
Each skill file follows the Claude Code CLAUDE.md convention:
```yaml
---
name: <skill-name>
description: <one-line summary>
allowed-tools:
  - Bash
  - curl
---
```

Every query template is a complete, copy-paste-ready curl command with:
- Correct protocol (HTTPS for OpenSearch, HTTP for Prometheus)
- Authentication (`-u admin:'My_password_123!@#'` for OpenSearch, none for Prometheus)
- Certificate skip (`-k` for development)
- Proper JSON body with the PPL/PromQL query
- Backtick escaping for dotted field names in PPL
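The backtick-escaping rule can be sketched as a tiny helper. This is a hypothetical illustration (the skill files embed the escaping directly in their query templates), assuming only that dotted field names must be backtick-quoted in PPL:

```python
def ppl_field(name: str) -> str:
    """Wrap dotted field names in backticks so PPL parses them as a single
    identifier, e.g. `attributes.gen_ai.operation.name`."""
    return f"`{name}`" if "." in name else name

# Illustrative query assembly for an agent-invocation search.
query = (
    "source=otel-v1-apm-span-* | WHERE "
    f"{ppl_field('attributes.gen_ai.operation.name')} = 'invoke_agent' | head 10"
)
```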
Requirements
Requirement 1: Plugin Directory Structure
As a developer, I want the plugin organized as a directory of skill files with a top-level CLAUDE.md entry point, so that Claude Code automatically loads the observability capabilities when I work in the project.
- The plugin contains a top-level CLAUDE.md that references all skill files
- Skill files live in a single `skills/` directory
- Eight skill files: traces, logs, metrics, stack-health, ppl-reference, correlation, apm-red, and slo-sli
- Each skill file includes frontmatter with `name`, `description`, and `allowed-tools`
Requirement 2: Traces Skill
As a developer, I want to query trace data from OpenSearch using PPL, so that I can investigate agent invocations, tool executions, slow spans, error spans, and token usage.
- PPL query templates for agent invocation spans (`attributes.gen_ai.operation.name = invoke_agent`)
- PPL query templates for tool execution spans (`attributes.gen_ai.operation.name = execute_tool`)
- Slow span detection where `durationInNanos` exceeds a configurable threshold
- Error span identification where `status.code = 2`
- Token usage aggregation by model and by agent name
- Service operation listing with GenAI operation type breakdown
- Service map queries for dependency exploration
- All GenAI attributes documented with descriptions and example values
- Every PPL query includes the complete curl command with endpoint, auth, and escaping
Requirement 3: Logs Skill
As a developer, I want to query log data from OpenSearch using PPL, so that I can search logs by severity, correlate logs with traces, identify error patterns, and analyze log volume.
- Severity-based filtering (ERROR, WARN, INFO)
- Trace-to-log correlation via `traceId`
- Error pattern identification with `stats count() by` aggregations
- Log volume trending over time with `span(time, <interval>)`
- Full-text body search with string matching or relevance functions
- Log Index field reference: `severityText`, `severityNumber`, `traceId`, `spanId`, `serviceName`, `body`, `@timestamp`
Requirement 4: Metrics Skill
As a developer, I want to query metrics from Prometheus using PromQL, so that I can monitor HTTP request rates, latency percentiles, error rates, and active connections.
- HTTP request rate per second grouped by service
- HTTP latency at p95 and p99 by service
- HTTP error rate (5xx) as a ratio
- Active HTTP connections by service
- Database operation latency at p95
- Every PromQL query includes the complete curl command targeting `localhost:9090/api/v1/query`
- Note on PPL as an alternative for OpenSearch-ingested metrics
Requirement 5: Stack Health Skill
As a developer, I want to check the health of all observability stack components and troubleshoot common issues, so that I can verify the stack is operational and diagnose data flow problems.
- Health check curl commands for OpenSearch, Prometheus, OTel Collector
- Index listing and document count verification
- Docker compose commands for container status and logs
- Troubleshooting section for common failures: OpenSearch unreachable, no data in indices, Data Prepper pipeline errors, OTel Collector export failures
- Port reference: OpenSearch (9200), OTel Collector gRPC (4317), OTel Collector HTTP (4318), Data Prepper (21890), Prometheus (9090), OpenSearch Dashboards (5601)
- PPL `describe` for index mapping inspection
- PPL `_explain` endpoint for query plan debugging
Requirement 6: PPL Reference Skill
As a developer, I want a comprehensive PPL language reference available to Claude Code, so that Claude Code can understand PPL syntax and construct correct queries for any observability question.
Commands (50+):
- Core query: `search`, `source`, `where`, `fields`, `stats`, `sort`, `head`, `eval`, `dedup`, `rename`, `top`, `rare`, `table`
- Time-series: `timechart`, `chart`, `bin`, `trendline`, `streamstats`, `eventstats`
- Parse/extract: `parse`, `grok`, `rex`, `regex`, `patterns`, `spath`
- Join/lookup: `join`, `lookup`, `graphlookup`, `subquery`, `append`, `appendcol`, `appendpipe`
- Transform: `fillnull`, `flatten`, `expand`, `transpose`, `convert`, `replace`, `reverse`
- Multi-value: `mvexpand`, `mvcombine`, `nomv`
- Aggregation/totals: `addcoltotals`, `addtotals`
- ML: `ad` (anomaly detection), `kmeans`, `ml`
- System: `describe`, `explain`, `showdatasources`, `multisearch`
- Display: `fieldformat`
Functions (14 categories):
- Aggregation: COUNT, SUM, AVG, MAX, MIN, VAR_SAMP, VAR_POP, STDDEV_SAMP, STDDEV_POP, DISTINCT_COUNT, PERCENTILE, EARLIEST, LATEST, LIST, VALUES, FIRST, LAST
- Collection: ARRAY, SPLIT, MVJOIN, MVCOUNT, MVINDEX, MVFIRST, MVLAST, MVAPPEND, MVDEDUP, MVSORT, MVZIP, MVRANGE, MVFILTER
- Condition: ISNULL, ISNOTNULL, IF, IFNULL, NULLIF, CASE, COALESCE, LIKE, IN, BETWEEN
- Conversion: CAST, TOSTRING, TONUMBER, TOINT, TOLONG, TOFLOAT, TODOUBLE, TOBOOLEAN
- Cryptographic: MD5, SHA1, SHA2
- Datetime: NOW, CURDATE, CURTIME, DATE_FORMAT, DATE_ADD, DATE_SUB, DATEDIFF, DAY, MONTH, YEAR, HOUR, MINUTE, SECOND, DAYOFWEEK, DAYOFYEAR, WEEK, UNIX_TIMESTAMP, FROM_UNIXTIME, and more
- Expressions: arithmetic (+, -, *, /), comparison (=, !=, <, >, <=, >=), logical (AND, OR, NOT, XOR)
- IP: CIDRMATCH, GEOIP
- JSON: JSON_EXTRACT, JSON_KEYS, JSON_VALID, JSON_ARRAY, JSON_OBJECT, JSON_ARRAY_LENGTH, JSON_EXTRACT_PATH_TEXT, TO_JSON_STRING
- Math: ABS, CEIL, FLOOR, ROUND, SQRT, POW, MOD, LOG, LOG2, LOG10, LN, EXP, and more
- Relevance: MATCH, MATCH_PHRASE, MULTI_MATCH, QUERY_STRING, SIMPLE_QUERY_STRING, HIGHLIGHT, SCORE, WILDCARD_QUERY
- Statistical: CORR, COVAR_POP, COVAR_SAMP
- String: CONCAT, LENGTH, LOWER, UPPER, TRIM, SUBSTRING, REPLACE, REGEXP, REGEXP_EXTRACT, REGEXP_REPLACE, and more
- System: TYPEOF
API Endpoints:
- Query execution: `POST /_plugins/_ppl` with JSON body `{"query": "<ppl_query>"}`
- Query explain: `POST /_plugins/_ppl/_explain`
- Grammar metadata: `GET /_plugins/_ppl/_grammar`
Source: Grammar reference sourced from the opensearch-project/sql repository's docs/user/ppl/ directory.
Requirement 7: Skill File Format Compliance
- Each skill file is valid markdown with YAML frontmatter delimited by `---`
- Frontmatter contains `name`, `description`, and `allowed-tools` fields
- Top-level CLAUDE.md references each skill file path with a one-line summary
- Credentials sourced from the `.env` file (admin / `My_password_123!@#`), noted as configurable
Requirement 8: Authentication and Connection Details
| Service | Protocol | Port | Auth |
|---|---|---|---|
| OpenSearch (local) | HTTPS | 9200 | Basic auth (admin / My_password_123!@#), -k flag for cert skip |
| OpenSearch (AWS managed) | HTTPS | 443 | AWS SigV4 (--aws-sigv4 "aws:amz:REGION:es") |
| Prometheus (local) | HTTP | 9090 | None |
| Prometheus (AWS managed) | HTTPS | 443 | AWS SigV4 (--aws-sigv4 "aws:amz:REGION:aps") |
| OTel Collector | HTTP | 4317 (gRPC), 4318 (HTTP) | None |
| Data Prepper | HTTP | 21890 | None |
| OpenSearch Dashboards | HTTP | 5601 | Same as OpenSearch |
All credentials are sourced from the repository .env file. The test harness reads .env with fallback to these defaults.
Skill files provide curl command variants for both local and AWS managed endpoints. The CLAUDE.md entry point includes a configuration section where users set $OPENSEARCH_ENDPOINT and $PROMETHEUS_ENDPOINT environment variables to switch between local and managed services. PPL and PromQL query syntax is identical across both profiles; only the endpoint URL and authentication method differ.
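As an illustration of the profile switch, the following hypothetical helper assembles a curl argument list from `$OPENSEARCH_ENDPOINT`, choosing basic auth for the local profile and SigV4 for a managed endpoint. The helper, its `OPENSEARCH_AUTH` fallback variable, and the `amazonaws.com` heuristic are all assumptions for this sketch, not part of the plugin:

```python
import os

def opensearch_curl_args(query: str) -> list:
    """Build a curl invocation for the PPL endpoint, selecting basic auth
    (local profile) or AWS SigV4 (managed profile) from the endpoint URL."""
    endpoint = os.environ.get("OPENSEARCH_ENDPOINT", "https://localhost:9200")
    args = ["curl", "-s", "-X", "POST", f"{endpoint}/_plugins/_ppl",
            "-H", "Content-Type: application/json",
            "-d", '{"query": "%s"}' % query]
    if "amazonaws.com" in endpoint:
        # AWS managed profile: sign the request with SigV4 (region is illustrative).
        region = os.environ.get("AWS_REGION", "us-east-1")
        args += ["--aws-sigv4", f"aws:amz:{region}:es"]
    else:
        # Local profile: basic auth plus self-signed certificate skip.
        args += ["-k", "-u",
                 os.environ.get("OPENSEARCH_AUTH", "admin:My_password_123!@#")]
    return args
```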
Requirement 9: PPL Grammar Source Documentation
- Grammar reference sourced from the `opensearch-project/sql` repository's `docs/user/ppl/` directory
- Repository URL included: https://github.com/opensearch-project/sql
- Commands organized into logical categories
- Functions organized into categories matching the source repository
Requirement 10: Cross-Signal Correlation and GenAI Debugging
As a developer, I want the plugin skills to support cross-signal correlation between traces, logs, and metrics, and provide GenAI-specific debugging capabilities, so that I can perform end-to-end observability investigations across all telemetry signals.
Cross-signal correlation:
- Trace-to-log joins by matching `traceId` across Trace Index and Log Index
- Log-to-span correlation by `spanId`
- Full trace tree reconstruction by `traceId` with `parentSpanId` hierarchy
- Latency gap analysis between parent and child spans
- Root span identification where `parentSpanId` is empty or null
GenAI operation types (beyond invoke_agent and execute_tool):
`chat`, `embeddings`, `retrieval`, `create_agent`, `text_completion`, `generate_content`
Exception and error querying:
- Span events with `exception.type`, `exception.message`, `exception.stacktrace`
- Spans with `error.type` for error categorization
- Exception-to-log correlation via shared `traceId` and `spanId`
Extended GenAI attributes:
- `gen_ai.agent.id`, `gen_ai.agent.description`, `gen_ai.agent.version`
- `gen_ai.conversation.id` for multi-turn conversation tracking
- `gen_ai.tool.call.id`, `gen_ai.tool.type`, `gen_ai.tool.call.arguments`, `gen_ai.tool.call.result`
GenAI-specific metrics:
- `gen_ai_client_token_usage` histogram grouped by operation and model
- `gen_ai_client_operation_duration` histogram grouped by operation and model
Requirement 11: Integration Test Harness
As a developer, I want an integration test suite that validates all skill file commands against a running observability stack, so that I can verify the plugin's queries and health checks produce correct results.
Test infrastructure:
- pytest test suite in a `tests/` directory within the plugin
- YAML fixture files defining test cases with `command`, `expected_status_code`, `expected_fields`, and `tags`
- Pydantic model for strict schema validation (`extra="forbid"`)
- Session-scoped fixture that checks stack health before tests run
- All tests skipped with a clear message if the stack is not running
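The strict-schema behavior (`extra="forbid"`) can be approximated in stdlib terms. This is an illustrative sketch only; the actual harness would use the Pydantic model in models.py:

```python
from dataclasses import dataclass, fields
from typing import Optional

@dataclass
class TestFixture:
    """One YAML test case, mirroring the fixture schema in this RFC."""
    name: str
    description: str
    command: str
    expected_status_code: int
    expected_fields: list
    tags: list
    before_test: Optional[str] = None
    after_test: Optional[str] = None

def load_fixture(raw: dict) -> TestFixture:
    """Reject unknown keys at load time, mimicking Pydantic's extra="forbid"."""
    allowed = {f.name for f in fields(TestFixture)}
    extra = set(raw) - allowed
    if extra:
        raise ValueError(f"unknown fixture keys: {sorted(extra)}")
    return TestFixture(**raw)
```

A typo such as `expcted_fields` in a fixture then fails immediately with a clear error instead of silently passing.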
Test categories:
- `traces`: PPL queries against Trace Index, validate `schema` and `datarows` in response
- `logs`: PPL queries against Log Index, validate response structure
- `metrics`: PromQL queries against Prometheus, validate `status: "success"` and `data` field
- `stack-health`: Health check commands, validate HTTP 200 status codes
- `ppl`: PPL system commands (`describe`, `_explain`), validate response structure
- `correlation`: Cross-signal correlation queries, validate join results and exemplar responses
- `apm_red`: RED metric queries against Prometheus and OpenSearch, validate rate/error/duration responses
- `slo_sli`: SLO/SLI queries against Prometheus, validate recording rule outputs and burn rate calculations
Test execution:
- Commands executed via `subprocess.run` with configurable timeout (default 30s)
- JSON response parsing with recursive field lookup for `expected_fields`
- pytest markers for tag-based filtering (`pytest -m traces`)
- `before_test` and `after_test` hooks in YAML for setup/teardown scripts
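The recursive `expected_fields` lookup might look roughly like this (an illustrative sketch of the test_runner.py behavior described above, not the actual implementation):

```python
import json
import subprocess

def has_field(obj, name: str) -> bool:
    """Recursively search a parsed JSON structure for a key called `name`."""
    if isinstance(obj, dict):
        return name in obj or any(has_field(v, name) for v in obj.values())
    if isinstance(obj, list):
        return any(has_field(item, name) for item in obj)
    return False

def run_fixture(command: str, expected_fields: list, timeout: int = 30) -> list:
    """Execute a fixture command and return the expected fields that are
    missing from its JSON output (empty list means the test passes)."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    body = json.loads(result.stdout)
    return [f for f in expected_fields if not has_field(body, f)]
```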
Configuration:
- Connection details read from `.env` with fallback defaults
- Dependencies: `pytest`, `pyyaml`, `pydantic`, `requests`, `hypothesis`
- README documenting how to run tests, prerequisites, and how to add new test cases
Requirement 12: Correlation Skill
As a developer, I want a dedicated correlation skill that teaches Claude Code how to join traces, logs, and metrics across all three telemetry signals using OTel semantic convention correlation fields, so that I can perform end-to-end investigations starting from any signal.
OTel correlation fields (sourced from opentelemetry.io):
The OTel specification defines three correlation mechanisms across signals:
| Mechanism | Fields | Signals Connected | How It Works |
|---|---|---|---|
| Trace context | `traceId`, `spanId`, `traceFlags` | Traces + Logs | Both span records and log records carry the same traceId/spanId, enabling direct joins |
| Exemplars | `trace_id`, `span_id`, `filtered_attributes` | Metrics + Traces | Prometheus exemplars attach trace context to individual metric samples |
| Resource attributes | `service.name`, `service.namespace`, `service.version`, `service.instance.id` | All three signals | Every span, metric data point, and log record from the same service carries identical resource attributes |
GenAI resource attributes promoted to Prometheus labels in this stack:
- `gen_ai.agent.id`, `gen_ai.agent.name`, `gen_ai.provider.name`, `gen_ai.request.model`, `gen_ai.response.model`
- These are configured in `docker-compose/prometheus/prometheus.yml` under `otlp.promote_resource_attributes`
- This enables PromQL queries filtered by agent or model that can then be correlated to traces via exemplars
Trace-to-log correlation (PPL):
- Find all logs for a trace: `source=otel-v1-apm-log-* | WHERE traceId = '<id>'`
- Find logs for a specific span: `source=otel-v1-apm-log-* | WHERE spanId = '<id>'`
- Join spans with logs: PPL `join` across Trace Index and Log Index on `traceId`
- Full timeline reconstruction: all spans + all logs for a `traceId`, sorted by timestamp
Log-to-trace correlation (PPL):
- From an error log, extract `traceId` and query the Trace Index for the full trace tree
- From a log entry, extract `spanId` and find the exact span that produced it
Metric-to-trace correlation (PromQL + exemplars):
- Query the Prometheus exemplars API: `GET /api/v1/query_exemplars?query=<metric>&start=<start>&end=<end>`
- Extract `trace_id` from the exemplar, then query the Trace Index via PPL
- Filter metrics by GenAI labels (`gen_ai_agent_name`, `gen_ai_request_model`), then correlate to traces
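A hedged sketch of the metric-to-trace step: collecting `trace_id` values from an exemplars response, ready to feed into a PPL query against the Trace Index. The sample payload is fabricated for illustration and assumes the general shape of the Prometheus `query_exemplars` response:

```python
def trace_ids_from_exemplars(response: dict) -> list:
    """Collect trace_id labels from a /api/v1/query_exemplars response body."""
    ids = []
    for series in response.get("data", []):
        for exemplar in series.get("exemplars", []):
            trace_id = exemplar.get("labels", {}).get("trace_id")
            if trace_id:
                ids.append(trace_id)
    return ids

# Fabricated sample payload for illustration only.
sample = {
    "status": "success",
    "data": [{
        "seriesLabels": {"__name__": "http_server_duration_seconds_bucket"},
        "exemplars": [{"labels": {"trace_id": "abc123", "span_id": "def456"},
                       "value": "1.25", "timestamp": 1700000000.0}],
    }],
}
print(trace_ids_from_exemplars(sample))  # ['abc123']
```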
Resource-level correlation:
- `serviceName` in traces/logs maps to the `service_name` label in Prometheus metrics
- Query all signals for a specific service to get the complete picture
Investigation workflows:
- Metric spike investigation: PromQL anomaly detection, exemplars, trace tree, correlated logs
- Error log investigation: find error logs, extract traceId, reconstruct trace, identify root cause span
- Slow agent investigation: find slow invoke_agent spans, get child spans, correlated logs, token usage metrics
Requirement 13: APM/RED Metrics Skill
As a developer, I want a dedicated APM skill that teaches Claude Code how to construct RED (Rate, Errors, Duration) metrics queries for any service, so that I can quickly assess service health using the standard APM methodology.
- Rate queries: per-service request rate via PromQL (`rate(http_server_duration_seconds_count[5m])`), per-endpoint rate, and a PPL alternative from trace spans
- Error queries: error rate as a ratio (5xx / total) via PromQL, error count from trace spans via PPL (`status.code = 2`)
- Duration queries: latency percentiles (p50, p95, p99) via PromQL `histogram_quantile` and PPL `percentile()` from trace spans
- Combined RED dashboard query set for all services in a single investigation workflow
- GenAI-specific RED metrics using the `gen_ai_client_operation_duration` histogram
- OTel HTTP semantic convention metrics reference: `http.server.request.duration` (histogram), `http.server.active_requests` (gauge), and their Prometheus-exported equivalents
- OTel Collector `spanmetrics` connector documentation for auto-generating RED metrics from traces
- Every query template includes the complete curl command with the appropriate endpoint and authentication
Requirement 14: SLO/SLI Skill
As a developer, I want a dedicated SLO/SLI skill that teaches Claude Code how to define SLIs, calculate error budgets, and construct burn rate queries using Prometheus recording rules, so that I can implement and monitor service level objectives for my services.
- SLI definition templates: availability SLI (successful/total ratio), latency SLI (within-threshold/total ratio), GenAI-specific SLI
- Prometheus recording rule YAML templates for pre-computing SLIs at multiple time windows (5m, 30m, 1h, 6h, 1d, 3d, 30d)
- Recording rule naming conventions: `sli:http_availability:ratio_rate<window>`, `sli:http_latency:ratio_rate<window>`
- Error budget calculation: remaining budget given an SLO target, consumption rate, common SLO targets (99.9%, 99.5%, 99.0%) with allowed downtime
- Burn rate queries: single-window and multi-window (Google SRE book pattern: 14.4x fast burn 1h/6h, 1x slow burn 3d/30d)
- Prometheus alerting rule YAML templates for burn rate alerts
- SLO compliance reporting: current SLI value, SLO target, error budget remaining, burn rate per service
- Step-by-step SLO setup workflow: define SLIs, add recording rules, set targets, add burn rate alerts, query compliance
- Every query template includes the complete curl command with the appropriate Prometheus endpoint and authentication
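The multi-window fast-burn check from the bullets above can be sketched as follows. This is illustrative only (in practice the logic lives in Prometheus alerting rules, not client code); the threshold follows the Google SRE workbook pattern cited in the requirement:

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Observed error ratio divided by the error budget (1 - SLO target)."""
    return error_ratio / (1.0 - slo_target)

def fast_burn_alert(err_1h: float, err_6h: float,
                    slo_target: float = 0.999) -> bool:
    """Fire only when both the 1h and 6h windows burn faster than 14.4x,
    so short spikes don't page but sustained burns do."""
    return (burn_rate(err_1h, slo_target) > 14.4 and
            burn_rate(err_6h, slo_target) > 14.4)
```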
Data Models
OpenSearch Trace Index Schema (otel-v1-apm-span-*)
| Field | Type | Description |
|---|---|---|
| `traceId` | keyword | Unique trace identifier |
| `spanId` | keyword | Unique span identifier |
| `parentSpanId` | keyword | Parent span ID (empty for root) |
| `serviceName` | keyword | Service that produced the span |
| `name` | text | Span operation name |
| `kind` | keyword | Span kind (SERVER, CLIENT, INTERNAL, etc.) |
| `startTime` | date | Span start timestamp |
| `endTime` | date | Span end timestamp |
| `durationInNanos` | long | Span duration in nanoseconds |
| `status.code` | integer | Status code (0=Unset, 1=Ok, 2=Error) |
| `attributes.gen_ai.operation.name` | keyword | GenAI operation type |
| `attributes.gen_ai.agent.name` | keyword | Agent name |
| `attributes.gen_ai.agent.id` | keyword | Agent identifier |
| `attributes.gen_ai.request.model` | keyword | Requested model |
| `attributes.gen_ai.usage.input_tokens` | long | Input token count |
| `attributes.gen_ai.usage.output_tokens` | long | Output token count |
| `attributes.gen_ai.tool.name` | keyword | Tool name |
| `attributes.gen_ai.tool.call.id` | keyword | Tool call identifier |
| `attributes.gen_ai.tool.call.arguments` | text | Tool call arguments |
| `attributes.gen_ai.tool.call.result` | text | Tool call result |
| `attributes.gen_ai.conversation.id` | keyword | Conversation identifier |
| `events.attributes.exception.type` | keyword | Exception type |
| `events.attributes.exception.message` | text | Exception message |
| `events.attributes.exception.stacktrace` | text | Exception stacktrace |
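Given `parentSpanId`, the full trace-tree reconstruction described in Requirements 10 and 12 reduces to an index-and-attach pass over the returned rows. This client-side sketch is illustrative (the skill files express the same idea as PPL queries plus assembly of the results):

```python
def build_trace_tree(spans: list) -> dict:
    """Index spans by spanId and attach children via parentSpanId.
    Returns the root span (empty parentSpanId) with nested 'children' lists."""
    by_id = {s["spanId"]: {**s, "children": []} for s in spans}
    root = None
    for span in by_id.values():
        parent = by_id.get(span.get("parentSpanId") or "")
        if parent:
            parent["children"].append(span)
        else:
            root = span
    return root

# Minimal fabricated example: one agent invocation with one tool call.
spans = [
    {"spanId": "a", "parentSpanId": "", "name": "invoke_agent"},
    {"spanId": "b", "parentSpanId": "a", "name": "execute_tool"},
]
tree = build_trace_tree(spans)
print(tree["name"], [c["name"] for c in tree["children"]])  # invoke_agent ['execute_tool']
```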
OpenSearch Log Index Schema (otel-v1-apm-log-*)
| Field | Type | Description |
|---|---|---|
| `traceId` | keyword | Correlated trace identifier |
| `spanId` | keyword | Correlated span identifier |
| `severityText` | keyword | Log level (ERROR, WARN, INFO, DEBUG) |
| `severityNumber` | integer | Numeric severity |
| `serviceName` | keyword | Service that produced the log |
| `body` | text | Log message body |
| `@timestamp` | date | Log timestamp |
OpenSearch Service Map Index (otel-v2-apm-service-map)
| Field | Type | Description |
|---|---|---|
| `serviceName` | keyword | Source service |
| `destination.domain` | keyword | Destination service |
| `destination.resource` | keyword | Destination resource |
| `traceGroupName` | keyword | Trace group |
Prometheus Metrics
| Metric | Type | Labels |
|---|---|---|
| `http_server_duration_seconds` | histogram | service_name, http_response_status_code |
| `http_server_active_requests` | gauge | service_name |
| `db_client_operation_duration_seconds` | histogram | service_name |
| `gen_ai_client_token_usage` | histogram | gen_ai.operation.name, gen_ai.request.model |
| `gen_ai_client_operation_duration` | histogram | gen_ai.operation.name, gen_ai.request.model |
Connection Profiles
| Profile | OpenSearch Endpoint | OpenSearch Auth | Prometheus Endpoint | Prometheus Auth |
|---|---|---|---|---|
| Local | `https://localhost:9200` | Basic auth (`-u admin:'My_password_123!@#' -k`) | `http://localhost:9090` | None |
| AWS Managed | `https://DOMAIN-ID.REGION.es.amazonaws.com` | AWS SigV4 (`--aws-sigv4 "aws:amz:REGION:es"`) | `https://aps-workspaces.REGION.amazonaws.com/workspaces/WORKSPACE_ID` | AWS SigV4 (`--aws-sigv4 "aws:amz:REGION:aps"`) |
Test Fixture YAML Schema
```yaml
- name: "agent_invocations"
  description: "Query all agent invocation spans"
  command: |
    curl -sk -u admin:'My_password_123!@#' \
      -X POST https://localhost:9200/_plugins/_ppl \
      -H 'Content-Type: application/json' \
      -d '{"query": "source=otel-v1-apm-span-* | WHERE `attributes.gen_ai.operation.name` = '\''invoke_agent'\'' | head 10"}'
  expected_status_code: 200
  expected_fields: ["schema", "datarows"]
  tags: ["traces"]
  before_test: null
  after_test: null
```

Design Decisions
Why a flat skills/ directory?
Eight files don't need subdirectories. Flat is simpler to reference from CLAUDE.md and easier for contributors to navigate.
Why complete curl commands instead of just query bodies?
Claude Code can execute curl directly via its Bash tool. Including the full command (endpoint, auth, headers, body) means zero assembly required. The skill file is the executable documentation.
Why a dedicated PPL reference file?
The PPL grammar is large (50+ commands, 14 function categories). Inlining it into traces.md or logs.md would bloat those files. As a separate skill, Claude Code loads it on demand when it needs to construct a novel query.
Why YAML test fixtures instead of inline pytest?
Declarative YAML fixtures are easier for contributors to add (no Python knowledge needed to add a test case). The Pydantic schema catches malformed fixtures at load time. This pattern is proven at scale in HolmesGPT's test suite.
Why read credentials from .env?
The observability stack already centralizes configuration in .env. The plugin and test harness reuse the same source of truth rather than duplicating credentials.
Error Handling
Skill File Errors
| Scenario | Handling |
|---|---|
| OpenSearch unreachable | Stack health skill provides diagnostic steps: check docker compose ps, verify port 9200, check health endpoint |
| Prometheus unreachable | Stack health skill suggests checking container status and port 9090 |
| PPL query syntax error | PPL reference skill provides syntax guidance; _explain endpoint helps debug query plans |
| Authentication failure | Skill files document correct credentials from .env; stack health skill suggests verifying credentials |
| No data in indices | Stack health skill provides index listing commands and document count verification |
| Data Prepper pipeline errors | Stack health skill suggests checking Data Prepper logs via docker compose logs data-prepper |
| OTel Collector export failures | Stack health skill suggests checking collector metrics at port 8888 and logs |
Test Harness Errors
| Scenario | Handling |
|---|---|
| Stack not running | Session-scoped fixture detects this and skips all tests with clear message |
| Curl command timeout | Configurable timeout (default 30s); test fails with timeout error |
| Invalid YAML fixture | Pydantic model with extra="forbid" raises validation error at load time |
| Unexpected JSON response | Test reports which expected_fields were missing from the response |
| Hook failure | Test reports before_test/after_test hook failure separately from the main command result |
| Missing .env file | Config loader falls back to hardcoded defaults |
Running the Tests
Prerequisites: the observability stack must be running (docker compose up -d).
```shell
cd claude-code-observability-plugin/tests

# Install dependencies
pip install -r requirements.txt

# Run all tests
pytest

# Run by category
pytest -m traces
pytest -m logs
pytest -m metrics
pytest -m stack_health
pytest -m ppl

# Verbose output
pytest -v --tb=short
```

If the stack is not running, all tests are skipped with a clear message.
Open Questions
- **Plugin location:** Should the plugin live at the repo root (`claude-code-observability-plugin/`) or under a new `plugins/` directory?
- **Versioning:** Should the plugin version track the observability stack version, or have its own independent version?
- **Additional AI assistants:** The skill file format is Claude Code-specific (CLAUDE.md convention). Should we also provide equivalent configurations for other AI coding assistants (e.g., Cursor rules, Kiro steering)?
- **Metrics in OpenSearch:** The metrics skill currently targets Prometheus. Should we also include PPL queries for metrics stored in OpenSearch (when metrics are ingested via Data Prepper)?
- **Example telemetry data:** Should the test harness include a script that sends sample telemetry data to the stack, so tests can validate queries return actual results rather than just valid empty responses?
How to Contribute
Adding a new query template to a skill file:
- Add the curl command to the appropriate `skills/*.md` file
- Add a corresponding test fixture in `tests/fixtures/*.yaml`
- Run `pytest` to verify the command works against a running stack
Adding a new test case:
- Create a YAML entry in the appropriate `tests/fixtures/*.yaml` file
- Follow the schema: `name`, `description`, `command`, `expected_status_code`, `expected_fields`, `tags`
- Run `pytest -m <tag>` to verify
Feedback Requested
We'd like feedback on:
- The skill file organization and routing approach
- Which query templates are most valuable for your workflow
- The open questions above
- Any missing capabilities or query patterns you'd want included
- The integration test approach and fixture format
Please comment on this RFC or open an issue with your thoughts.