Add Claude Code observability plugin with skills, tests, and docs #120
Conversation
- 8 skill files (traces, logs, metrics, stack-health, ppl-reference, correlation, apm-red, slo-sli)
- Plugin manifest (.claude-plugin/plugin.json)
- Marketplace manifest (.claude-plugin/marketplace.json) at repo root
- CLAUDE.md routing table and configuration
- Property-based test suite (381 tests)
- Docs page with installation instructions for Claude Code and Claude Desktop
- Sidebar entry in astro.config.mjs

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
… docs

- Fix log index pattern from otel-v1-apm-log-* to logs-otel-v1-*
- Fix log serviceName field to resource.attributes.service.name
- Fix Python 3.9 compatibility (type union syntax)
- Fix PPL explain expected field (calcite not root) for OpenSearch 3.6
- Fix conftest to use localhost instead of Docker-internal hostname
- Add 16 new integration tests (53 total) against real stack data
- Add advanced PPL patterns: coalesce() for remote service identification, service map topology queries, batch traceId IN correlation
- Add advanced PromQL patterns: clamp_min() safe division, topk/bottomk, service availability calculations
- Add docs: INSTALL.md, USAGE.md with 50+ sample questions
- Add Starlight docs: usage guide and showcase with 9 real-world scenarios
- Update CLAUDE.md with index patterns reference table
- All 453 tests passing (381 property + 72 integration)

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
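The clamp_min() safe-division pattern listed above guards PromQL error-rate ratios against zero denominators. A minimal Python analog of the idea (the PromQL shape `errors / clamp_min(total, 1)` is an illustration of the pattern, not a query copied from the skill files):

```python
def safe_ratio(errors: float, total: float, floor: float = 1.0) -> float:
    """Analog of PromQL `errors / clamp_min(total, 1)`: clamp the
    denominator to a floor so an idle service yields 0, not NaN/Inf."""
    return errors / max(total, floor)

assert safe_ratio(0.0, 0.0) == 0.0    # no traffic: 0% error rate, no div-by-zero
assert safe_ratio(5.0, 100.0) == 0.05
```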
Codecov Report ✅ All modified and coverable lines are covered by tests.

```
@@           Coverage Diff           @@
##             main     #120   +/-  ##
=======================================
  Coverage   18.51%   18.51%
=======================================
  Files           3        3
  Lines          54       54
  Branches       18       19     +1
=======================================
  Hits           10       10
  Misses         44       44
```
3223fb7 to 1d41abf
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep Claude Code sidebar section, accept upstream removal of SDKs & API. Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu> Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Nice Ani. A few suggestions I've got.
```bash
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID>'\'' AND spanId = '\''<SPAN_ID>'\'' AND severityText = '\''ERROR'\'' | fields body, severityText, `@timestamp`"}'
```

## PPL Commands for Log Analysis
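The quoted PPL statement can also be assembled programmatically before being sent via curl; a minimal sketch (placeholder IDs, index and field names taken from the hunk above):

```python
def logs_for_span(trace_id: str, span_id: str, severity: str = "ERROR") -> str:
    """Build the PPL statement quoted above for a given trace/span pair."""
    return (
        "source=logs-otel-v1-* "
        f"| where traceId = '{trace_id}' AND spanId = '{span_id}' "
        f"AND severityText = '{severity}' "
        "| fields body, severityText, `@timestamp`"
    )

q = logs_for_span("<TRACE_ID>", "<SPAN_ID>")
```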
Let's also add links to the PPL live docs: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/index.md, so Claude can fetch these live docs in case some queries fail due to OpenSearch version differences or new syntax updates.
Good idea — adding PPL live docs link (https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/index.md) to logs, traces, ppl-reference, correlation, and apm-red skills so Claude can fetch latest syntax when queries fail.
```json
"name": "observability-stack",
"owner": {
    "name": "OpenSearch Project",
    "email": "anirudha@nyu.edu"
```
Should we change this to some OS email alias?
Keeping personal email for now until we finalize a proper OS alias. Will update once decided.
```
@@ -0,0 +1,278 @@
---
name: stack-health
```
Missing: OpenSearch Dashboards (OSD) API Integration
The plugin currently queries OpenSearch (:9200) and Prometheus (:9090) directly but never talks to OpenSearch Dashboards (:5601). This means it misses the configuration and correlation layer that OSD provides — especially workspace-scoped configs that define how signals relate to each other per environment/team.
1. No OSD API integration — only raw OpenSearch/Prometheus curls
The plugin hardcodes index patterns and field names instead of discovering them from OSD. It misses:
- Datasets / index patterns / schema mappings / correlations / APM config — OSD's saved objects API (`GET /api/saved_objects/_find?type=index-pattern`) lets you discover what index patterns exist and their field mappings.
2. APM config correlation object is absent
The OSD observability plugin has data source / correlation configuration objects that define:
- Which trace index connects to which log index
- Service-to-index mappings
- Correlation rules (e.g., link `traceId` from the span index to the log index)

The plugin hardcodes index patterns (otel-v1-apm-span-*, logs-otel-v1-*) instead of reading them from OSD's config. Use the APM correlations saved object to create these interlinkings for easy root-cause analysis.
3. Topology via APM service map index
The skills should add the capability to query otel-v2-apm-service-map-* directly via PPL to see service connections, dependencies, and operations.
4. Workspace-scoped configuration is completely absent
This is a significant gap. These configs are per-workspace in OSD. The plugin:
- Doesn't reference workspaces at all
- Doesn't show how to query workspace-scoped saved objects
- Doesn't account for the `workspace_id` parameter in OSD API calls
- Doesn't explain how different workspaces might have different index pattern configurations, correlation rules, or APM configs
For workspace-aware queries, the plugin would need something like:

```bash
# List workspaces
curl -s http://localhost:5601/api/workspaces/_list

# Get workspace-scoped index patterns
curl -s "http://localhost:5601/w/<workspace_id>/api/saved_objects/_find?type=index-pattern"

# Get workspace-scoped observability configs
curl -s http://localhost:5601/w/<workspace_id>/api/observability/...
```

5. No dynamic index/field discovery
The plugin hardcodes all index patterns and field names. It should teach Claude how to discover them dynamically:
```bash
# Fetch log index mappings
curl -sk -u admin:'...' "https://localhost:9200/logs-otel-v1-*/_mapping"

# Fetch trace index mappings
curl -sk -u admin:'...' "https://localhost:9200/otel-v1-apm-span-*/_mapping"
```

The PPL describe command is already in stack-health/SKILL.md but should also be in correlation/SKILL.md, where it's most needed for field discovery during cross-signal investigations.
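Turning a `_mapping` response into a flat field list is a small recursive walk; a sketch (the sample mapping below is illustrative, not pulled from the real indices):

```python
def flatten_mapping(props: dict, prefix: str = "") -> list:
    """Recursively collect dotted field paths from an OpenSearch
    _mapping 'properties' tree."""
    fields = []
    for name, spec in props.items():
        path = f"{prefix}{name}"
        if "properties" in spec:  # object field: recurse into children
            fields.extend(flatten_mapping(spec["properties"], path + "."))
        else:
            fields.append(path)
    return fields

# Illustrative fragment of a log-index mapping
sample = {
    "resource": {"properties": {"attributes": {"properties": {
        "service.name": {"type": "keyword"}}}}},
    "traceId": {"type": "keyword"},
}
fields = flatten_mapping(sample)
# fields -> ['resource.attributes.service.name', 'traceId']
```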
Summary of Recommended Additions
| Gap | Where to Add | Priority |
|---|---|---|
| OSD Dashboards API integration (:5601) | New skill or extend stack-health | High |
| APM correlation config objects from OSD | correlation/SKILL.md | High |
| Workspace-scoped queries (`/w/<workspace_id>/api/...`) | All skills or new osd-config skill | High |
| Service topology via PPL | traces/SKILL.md or correlation/SKILL.md | Medium |
| Dynamic index/field discovery from OSD saved objects | correlation/SKILL.md and stack-health/SKILL.md | Medium |
| Which log/trace indexes map to which services (from OSD config) | correlation/SKILL.md | Medium |
Bottom line: The plugin is solid for direct OpenSearch + Prometheus querying, but it treats them as standalone backends. The missing layer is OSD as the configuration and correlation authority, especially with workspace-scoped configs that define how signals relate to each other per environment/team. Consider adding an osd-config skill or extending existing skills to query OSD APIs with workspace awareness.
Great feedback. For this PR we'll keep the scope to direct OpenSearch + Prometheus querying. The OSD API integration (workspace-scoped configs, APM correlation objects, dynamic index discovery from saved objects) is a solid next step — will track as a follow-up. Adding describe to correlation skill and improving service map PPL queries in this PR though.
I believe this should be part of P0; without this, actual RCAs are difficult: you wouldn't know which indexes are for logs and which are for traces, and you could only stick with the defaults in this knowledge base.
Agreed — added both in this PR:

- New `osd-config` skill (skills/osd-config/SKILL.md) — OSD API integration for workspace discovery, index pattern resolution, APM correlation configs, saved objects, and workspace-scoped queries at :5601
- Dynamic index/field discovery via the OpenSearch API — added `_mapping` and PPL `describe` commands to the stack-health and correlation skills so Claude can discover indices and fields dynamically without OSD

This way Claude can resolve indices/fields dynamically whether OSD is available or not.
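Resolving index patterns from the OSD saved objects API instead of hardcoding them might look like this (the response shape is the standard `_find` envelope; the ids and titles below are illustrative):

```python
def index_patterns(find_response: dict) -> dict:
    """Map saved-object id -> index pattern title from a
    GET /api/saved_objects/_find?type=index-pattern response."""
    return {
        obj["id"]: obj["attributes"]["title"]
        for obj in find_response.get("saved_objects", [])
        if obj.get("type") == "index-pattern"
    }

# Illustrative _find response
resp = {"saved_objects": [
    {"id": "logs-ip", "type": "index-pattern",
     "attributes": {"title": "logs-otel-v1-*"}},
    {"id": "traces-ip", "type": "index-pattern",
     "attributes": {"title": "otel-v1-apm-span-*"}},
]}
patterns = index_patterns(resp)
```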
```
@@ -0,0 +1,77 @@
- name: red_rate_promql
```
These do not cover the actual RED metrics generated by the Data Prepper APM service map processor. We need to check error, fault, request and latency_seconds_bucket in Prometheus. These are gauge metrics, so even the rate queries won't work here.
Let's add some test cases, e.g. "get p99 latency for the frontend service for the past 1 hour". Use some queries from the APM code, probably: https://github.com/opensearch-project/dashboards-observability/blob/main/public/components/apm/query_services/query_requests/promql_queries.ts
Good catch — the current tests use http_server_duration_seconds (OTel SDK histogram metrics) and miss the Data Prepper APM-generated gauge metrics (request, error, fault, latency_seconds_bucket). Will add test cases using the PromQL patterns from the dashboards-observability APM code for these gauge metrics.
I still don't see the RED metrics test fixtures updated from this discussion. The tests still use the wrong metrics.
Fixed in this push. Added 4 new Data Prepper APM gauge metric test cases:

- `red_dp_request_gauge` — `sum(request{namespace="span_derived"}) by (service)`
- `red_dp_error_gauge` — `sum(error{namespace="span_derived"}) by (service)`
- `red_dp_fault_gauge` — `sum(fault{namespace="span_derived"}) by (service)`
- `red_dp_latency_histogram` — `histogram_quantile(0.95, latency_seconds_seconds_bucket{namespace="span_derived"})`

Also added a full "Data Prepper APM Metrics" section to apm-red/SKILL.md documenting these gauges, their labels (`service`, `operation`, `remoteService`, `remoteOperation`), and the key difference from OTel SDK metrics — these are gauges, not counters, so no `rate()` wrapper. All 4 tests pass against the live stack.
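The quantile query works on gauge bucket series because `histogram_quantile` only needs cumulative `le` bucket counts, not rates. A rough Python sketch of the interpolation Prometheus performs (the bucket values below are invented for illustration):

```python
def histogram_quantile(q, buckets):
    """Approximate PromQL histogram_quantile(): linear interpolation
    over cumulative (le, count) bucket pairs."""
    buckets = sorted(buckets, key=lambda b: b[0])  # +Inf sorts last
    total = buckets[-1][1]                         # +Inf bucket holds the total
    rank = q * total
    prev_le, prev_count = 0.0, 0.0
    for le, count in buckets:
        if count >= rank:
            if le == float("inf"):
                return prev_le  # quantile falls in the +Inf bucket
            # interpolate linearly within [prev_le, le]
            return prev_le + (le - prev_le) * (rank - prev_count) / (count - prev_count)
        prev_le, prev_count = le, count
    return buckets[-1][0]

# Illustrative cumulative buckets: 100 requests total
buckets = [(0.1, 50), (0.5, 90), (1.0, 99), (float("inf"), 100)]
p95 = histogram_quantile(0.95, buckets)  # ~0.778s
```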
```
@@ -0,0 +1 @@
```
Let's add new test fixtures for service topology discovery. We can use the APM PPL queries for reference: https://github.com/opensearch-project/dashboards-observability/blob/main/public/components/apm/query_services/query_requests/ppl_queries.ts
Added fixtures/topology.yaml with 6 test cases: service listing via dedup nodeConnectionHash, operation discovery via operationConnectionHash, dependency counting, service attributes, index discovery via _cat/indices, and field mapping via _mapping API. All 6 passing.
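The service and operation discovery fixtures boil down to dedup queries over the connection-hash fields; a sketch of the query shapes (the exact field lists are assumed, with `sourceNode`/`targetNode` taken from the service map index structure described in this PR):

```python
def topology_queries(index: str = "otel-v2-apm-service-map-*") -> dict:
    """PPL statements for service-map topology discovery, following the
    dashboards-observability APM query patterns referenced above."""
    return {
        # one row per service node
        "services": f"source={index} | dedup nodeConnectionHash | fields serviceName",
        # one row per service/operation pair
        "operations": f"source={index} | dedup operationConnectionHash | fields serviceName",
        # raw edges of the dependency graph
        "dependencies": f"source={index} | fields sourceNode, targetNode",
    }

q = topology_queries()
```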
Thanks Vamsi! Addressing in this push:
Keeping as-is (intentional):
Follow-up PR:
… Action

- Use env vars ($OPENSEARCH_ENDPOINT, $OPENSEARCH_USER, $OPENSEARCH_PASSWORD, $PROMETHEUS_ENDPOINT) in all skill curl commands instead of hardcoded values
- Add Connection Defaults and Base Command sections to reduce boilerplate
- Add PPL live docs reference links to all skill files per ps48's suggestion
- Add GH Action workflow to generate and attach skill ZIPs to releases
- Update install command to observability@observability per vamsimanohar
- Update docs ZIP reference to point to GitHub releases
- Fix broken /docs/sdks/ link (removed upstream)
- Update property tests to accept env var patterns

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New osd-config skill for OSD API integration: workspace discovery, index pattern resolution, APM correlation configs, saved objects, and workspace-scoped queries at :5601
- Add dynamic index/field discovery to stack-health and correlation skills via _mapping API and PPL describe commands
- Add 6 topology test fixtures for service map discovery using APM PPL patterns (nodeConnectionHash, operationConnectionHash)
- Update CLAUDE.md routing table with osd-config skill
- All 394 tests passing

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
## Index Pattern Discovery

### List All Index Patterns
Let's add dataset Discovery here as well: https://docs.opensearch.org/latest/observing-your-data/exploring-observability-data/datasets/
These help to understand which indexes are defined as logs or traces by the users. It's more like an evolution of the index pattern.
Added a "Dataset Discovery" section with a link to the docs. The Dataset Discovery API isn't exposed as a REST endpoint in this OSD version — it's a UI feature in the query_enhancements plugin. For programmatic discovery, the skill now shows how to query index patterns with fields=title&fields=dataSourceType to identify signal types, and explains the schema mappings (otelLogs, trace time fields) that the init script applies.
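Once index patterns are discovered, classifying them by signal type can be a simple heuristic; a sketch (the prefix rules mirror the index patterns discussed in this PR and are assumptions, not the skill's actual logic):

```python
def signal_type(title: str) -> str:
    """Heuristic: classify an index-pattern title as logs/traces/
    service-map based on this stack's naming conventions."""
    if title.startswith("logs-otel-v1-"):
        return "logs"
    if title.startswith("otel-v1-apm-span-"):
        return "traces"
    if title.startswith("otel-v2-apm-service-map-"):
        return "service-map"
    return "other"

assert signal_type("logs-otel-v1-*") == "logs"
assert signal_type("otel-v1-apm-span-*") == "traces"
```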
```bash
    "$OSD_ENDPOINT/api/saved_objects/_find?type=observability-visualization&per_page=100" \
    -H 'osd-xsrf: true'
```

### Workspace-Scoped APM Config

```bash
curl -s -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
    "$OSD_ENDPOINT/w/<WORKSPACE_ID>/api/saved_objects/_find?type=observability-visualization&per_page=100" \
    -H 'osd-xsrf: true'
```
This is incorrect; use the `correlations` type.
Fixed — replaced all `observability-visualization` references with `correlations`. Also added documentation for the two correlation types created by the init script: `trace-to-logs-*` (links trace index to log index) and `APM-Config-*` (ties traces + service map + Prometheus together for the APM UI).
```yaml
- curl
---
```

## Connection Defaults
We need to re-validate everything here using the code in https://github.com/opensearch-project/observability-stack/blob/main/docker-compose/opensearch-dashboards/init/init-opensearch-dashboards.py
Re-validated against the init script. Changes:

- Fixed `observability-visualization` → `correlations` type
- Fixed `_associate` payload format (`workspaceId` + `savedObjects`, not `targetWorkspace` + `objects`)
- Added missing APIs: `data-source`, `data-connection`, `explore` saved object types, `POST /api/directquery/dataconnections` for Prometheus creation, `GET /api/opensearch-dashboards/settings`, workspace association
- Updated saved object types list to match what the init script actually creates
- Added 6 new test fixtures that all pass against the live stack
opensearch@observability, not
```
@@ -0,0 +1,326 @@
# Feature Guide & Sample Questions
```
apm-red, correlation, etc. need not be treated as separate skills; rather, there should be a single high-level skill and the other files should be placed under references. Example: https://github.com/opensearch-project/opensearch-launchpad/tree/main/skills/opensearch-launchpad
Thanks for the suggestion and the opensearch-launchpad reference @arjunkumargiri. Looked at this carefully against the official best practices and the launchpad skill structure.
The best practices actually support both patterns — and specifically call out our use case as a reason to use the multi-file approach. From the official docs:
Pattern 2: Domain-specific organization — "For Skills with multiple domains, organize content by domain to avoid loading irrelevant context. When a user asks about sales metrics, Claude only needs to read sales-related schemas, not finance or marketing data. This keeps token usage low and context focused."
The BigQuery example in the docs uses exactly this pattern — one skill with reference/finance.md, reference/sales.md, reference/product.md, reference/marketing.md — which is structurally equivalent to what we have, just organized as separate skills instead of reference files under one skill.
Where the launchpad pattern works well: opensearch-launchpad has a single linear workflow (provision → collect sample → gather preferences → plan → execute → evaluate → deploy). Users move through phases sequentially. One SKILL.md with reference files for each phase makes sense because the entry point is always the same.
Where observability differs: Users don't start at a common entry point. A dev debugging a slow endpoint goes straight to traces. An SRE checking error budgets goes to SLO/SLI. A platform engineer troubleshooting ingestion goes to stack-health. These are parallel, independent entry points — not phases of one workflow.
Practical concern with consolidation: Our 9 SKILL.md files total 4,624 lines. The best practices recommend keeping SKILL.md under 500 lines. A single consolidated SKILL.md would need to be a routing table that loads reference files on demand — which is functionally what CLAUDE.md + 9 skills already does, just with an extra LLM hop (load single SKILL.md → decide which reference → read it → execute) vs the current path (CLAUDE.md metadata routes directly to the right skill).
That said — we haven't benchmarked this. The official docs emphasize "start with evaluation" and "iterate based on observation, not assumptions." Happy to do a perf eval comparing both approaches (single-skill-with-references vs multi-skill) as a follow-up PR, measuring trigger accuracy and token efficiency across representative queries.
- The only skill I'm OK with being larger is PPL, for full context on the scope of the grammar.
To discover datasets programmatically, query the index patterns and check their `dataSourceType`, or use the saved objects API to find index patterns with observability schema mappings:

```bash
curl -s -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
```
The Dataset API call is incorrect; let's fix this to:

`/api/saved_objects/_find?fields=title&fields=type&fields=displayName&fields=signalType&fields=description&per_page=10000&type=index-pattern`
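The corrected call, with its repeated `fields` parameters, can be assembled with standard URL encoding (the endpoint path is from the comment above; base URL and function name are illustrative):

```python
from urllib.parse import urlencode

def datasets_find_url(base: str) -> str:
    """Assemble the saved-objects _find call suggested above,
    with repeated `fields` query parameters."""
    params = [("fields", f) for f in
              ("title", "type", "displayName", "signalType", "description")]
    params += [("per_page", "10000"), ("type", "index-pattern")]
    return f"{base}/api/saved_objects/_find?{urlencode(params)}"

url = datasets_find_url("http://localhost:5601")
```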
881fab6 to d87eb1d
…real cluster

Validated all skill queries against a running observability stack and fixed hallucinated field names, incorrect PPL syntax, wrong Prometheus metric names, and missing caveats. All 75 integration tests pass.

Skills fixes:
- traces: add startTime to 14 fields clauses, fix attributes.error.type → error_type, add join/stacktrace caveats
- ppl-reference: fix 16 commands (timechart, replace, rex, ad, kmeans, ml, corr, graphlookup, lookup, spath), add resource limit caveats for streamstats/eventstats/grok/dedup/parse/fillnull/appendcol/nomv
- correlation: fix gen_ai_client_operation_duration → _seconds suffix, fix gen_ai_agent_name label references
- apm-red: fix PPL is not null → isnotnull(), add metric discovery section, add status code label caveat
- slo-sli: add status code label caveat, add latency threshold note for ms vs s metrics
- metrics: add metric discovery section, fix stray markdown, add status code label caveat
- stack-health: fix otelcol counter metrics to use _total suffix
- osd-config: fix saved objects _find API to require type parameter

Test infrastructure:
- Add expected_min_results field to TestFixture model
- Add min result count assertions for list, PPL datarows, and Prometheus data.result responses
- Fix test_runner to handle list-type JSON responses before dict checks
- Add osd_config and osd_dashboards markers
- Add GenAI metric and milliseconds metric test fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
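The expected_min_results assertion described in the commit can be sketched as follows (field names come from the commit message; the real model in the repo may differ, e.g. it may be a Pydantic model rather than a dataclass):

```python
from dataclasses import dataclass

@dataclass
class TestFixture:
    name: str
    query: str
    expected_min_results: int = 0  # minimum rows/series the query must return

def assert_min_results(fixture: TestFixture, response) -> None:
    """Count results for list, PPL datarows, and Prometheus envelopes."""
    if isinstance(response, list):          # list-type JSON responses first
        count = len(response)
    elif "datarows" in response:            # PPL _ppl response
        count = len(response["datarows"])
    else:                                   # Prometheus query response
        count = len(response.get("data", {}).get("result", []))
    assert count >= fixture.expected_min_results, fixture.name

fx = TestFixture("red_dp_request_gauge", "sum(request) by (service)", 1)
assert_min_results(fx, {"data": {"result": [{"metric": {}}]}})
```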
12 test fixtures covering OpenSearch-direct APIs (index discovery, trace/log/service-map mappings, PPL describe) and OSD Dashboards APIs (workspace list, saved objects, index patterns, queries, dashboards, visualizations). All 75 tests pass. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
…t discovery, install name

- Fix install name to opensearch@observability (vamsimanohar)
- Add Data Prepper APM gauge metrics (request, error, fault, latency_seconds) to apm-red SKILL.md with 10 query templates + 4 new test fixtures (ps48)
- Fix observability-visualization → correlations type in osd-config (ps48)
- Add Dataset Discovery section with docs link (ps48)
- Re-validate osd-config against init script: add workspace association, dashboards settings, data-source/data-connection/explore/correlations saved object queries, directquery dataconnections API (ps48)
- Add 6 new osd-config test fixtures, all passing against live stack

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
Signed-off-by: ps48 <pshenoy36@gmail.com>
d87eb1d to 524898c
Thanks for the commit @ps48
#119
Summary
Adds a Claude Code plugin that teaches Claude how to query and investigate traces, logs, and metrics from the observability stack using PPL and PromQL. The plugin follows the Agent Skills specification and works across Claude Code CLI, the VS Code extension, and Claude Desktop.
Plugin (claude-code-observability-plugin/)

8 skill files with ready-to-execute curl commands:

- coalesce()
- match, match_phrase, like
- traceId IN lookups, exemplar queries, resource-level correlation
- clamp_min() division, topk()/bottomk(), availability, spanmetrics connector

Key technical details:
- Log index: logs-otel-v1-* (not otel-v1-apm-log-*)
- Log service field: resource.attributes.service.name (backtick-quoted in PPL)
- Trace index: otel-v1-apm-span-* with top-level serviceName
- Service map index: otel-v2-apm-service-map-* with sourceNode/targetNode structure
- -k flag

Tests (claude-code-observability-plugin/tests/)

- recording rule validity
- fields
Documentation
Plugin docs (claude-code-observability-plugin/docs/):

- INSTALL.md — Prerequisites, setup, configuration, AWS support, troubleshooting
- USAGE.md — 50+ sample questions across all 8 skills with "What Claude Does" explanations

Starlight docs (docs/starlight-docs/src/content/docs/claude-code/):

- index.md — Installation (CLI, VS Code, Desktop), configuration, index patterns, troubleshooting
- usage.md — Usage guide with sample questions per skill
- showcase.md — 9 real-world investigation scenarios: AI agent cost analysis, e-commerce checkout incident investigation, multi-agent orchestration debugging, service dependency discovery, error budget monitoring, log pattern discovery, cross-service latency investigation, tool execution analysis, full RED dashboards
Test plan

- pytest -v — 381 property + 72 integration tests
- npm run build in docs/starlight-docs/