Add Claude Code observability plugin with skills, tests, and docs #120

Merged
anirudha merged 12 commits into opensearch-project:main from anirudha:claude-code-observability-plugin
Mar 22, 2026
Conversation

@anirudha
Collaborator

@anirudha anirudha commented Mar 21, 2026

#119

Summary

Adds a Claude Code plugin that teaches Claude how to query and investigate traces, logs, and metrics from the observability stack using PPL and PromQL. The plugin follows the
Agent Skills specification and works across Claude Code CLI, VS Code extension, and Claude Desktop.

Plugin (claude-code-observability-plugin/)

8 skill files with ready-to-execute curl commands:

| Skill | Description |
| --- | --- |
| traces | Agent invocations, tool executions, slow spans, errors, token usage, service maps, remote service identification via coalesce() |
| logs | Severity filtering, trace correlation, error patterns, log volume, full-text search (match, match_phrase, like) |
| metrics | HTTP request rates, latency percentiles (p50/p95/p99), error rates, active connections, GenAI token metrics |
| correlation | Cross-signal trace↔log↔metric correlation, batch traceId IN lookups, exemplar queries, resource-level correlation |
| apm-red | RED methodology (Rate/Errors/Duration) with safe clamp_min() division, topk()/bottomk(), availability, spanmetrics connector |
| slo-sli | SLI definitions, Prometheus recording rules, error budget calculations, multi-window burn rate alerting |
| stack-health | Component health checks, data ingestion verification, troubleshooting guide |
| ppl-reference | 50+ PPL commands with syntax, observability examples, and function reference |

Key technical details:

  • Log index pattern: logs-otel-v1-* (not otel-v1-apm-log-*)
  • Log service name field: resource.attributes.service.name (backtick-quoted in PPL)
  • Trace span index: otel-v1-apm-span-* with top-level serviceName
  • Service map index: otel-v2-apm-service-map-* with sourceNode/targetNode structure
  • All OpenSearch queries use HTTPS + basic auth + -k flag
  • All Prometheus queries use HTTP (no auth)
  • AWS SigV4 variants included for managed services
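
As a rough illustration of the details above, the sketch below builds a PPL request body for the log index and shows how it could be sent to the OpenSearch PPL endpoint (`/_plugins/_ppl`). The helper names `ppl_payload` and `run_ppl`, the `frontend` service name, and the `OPENSEARCH_PASSWORD` variable are hypothetical placeholders for this example, not names taken from the plugin.

```bash
#!/usr/bin/env bash
# Sketch only: helper names and the example service name are assumptions.

# Wrap a PPL query in the JSON body expected by POST /_plugins/_ppl.
# Backslashes and double quotes are escaped so the result stays valid JSON.
ppl_payload() {
  local q=$1
  q=${q//\\/\\\\}
  q=${q//\"/\\\"}
  printf '{"query": "%s"}' "$q"
}

# Send a PPL query over HTTPS with basic auth. -k skips TLS verification,
# which is only appropriate for a local stack with self-signed certificates.
run_ppl() {
  curl -sk -u "admin:${OPENSEARCH_PASSWORD:?set OPENSEARCH_PASSWORD}" \
    -X POST "https://localhost:9200/_plugins/_ppl" \
    -H 'Content-Type: application/json' \
    -d "$(ppl_payload "$1")"
}

# The dotted log field is backtick-quoted in PPL; single quotes keep the
# shell from treating the backticks as command substitution.
LOG_QUERY='source=logs-otel-v1-* | where `resource.attributes.service.name` = '\''frontend'\'' | head 5'

# Print the request body without contacting the cluster.
ppl_payload "$LOG_QUERY"; echo
```

With a stack running, `run_ppl "$LOG_QUERY"` would send the same body; the trace index has a top-level `serviceName`, so an equivalent trace query would not need the backticks.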

Tests (claude-code-observability-plugin/tests/)

  • 453 tests total (381 property-based + 72 integration), all passing
  • Property tests (no stack needed): validate frontmatter, curl auth/protocol, PPL/PromQL syntax, field lookup correctness, config parsing, RED query completeness, SLO
    recording rule validity
  • Integration tests (requires running stack): YAML-driven fixtures that execute real curl commands against OpenSearch and Prometheus, validating expected JSON response
    fields
  • Python 3.9+ compatible
  • Auto-skips gracefully when stack is not running

Documentation

Plugin docs (claude-code-observability-plugin/docs/):

  • INSTALL.md — Prerequisites, setup, configuration, AWS support, troubleshooting
  • USAGE.md — 50+ sample questions across all 8 skills with "What Claude Does" explanations

Starlight docs (docs/starlight-docs/src/content/docs/claude-code/):

  • index.md — Installation (CLI, VS Code, Desktop), configuration, index patterns, troubleshooting
  • usage.md — Usage guide with sample questions per skill
  • showcase.md — 9 real-world investigation scenarios: AI agent cost analysis, e-commerce checkout incident investigation, multi-agent orchestration debugging, service
    dependency discovery, error budget monitoring, log pattern discovery, cross-service latency investigation, tool execution analysis, full RED dashboards

Test plan

  • All 453 tests pass (pytest -v — 381 property + 72 integration)
  • Docs site builds with no broken links (npm run build in docs/starlight-docs/)
  • All 8 skills tested against live stack with real data from 23+ services
  • PPL queries validated against OpenSearch 3.6.0
  • PromQL queries validated against Prometheus v3.8.1

- 8 skill files (traces, logs, metrics, stack-health, ppl-reference, correlation, apm-red, slo-sli)
- Plugin manifest (.claude-plugin/plugin.json)
- Marketplace manifest (.claude-plugin/marketplace.json) at repo root
- CLAUDE.md routing table and configuration
- Property-based test suite (381 tests)
- Docs page with installation instructions for Claude Code and Claude Desktop
- Sidebar entry in astro.config.mjs

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
… docs

- Fix log index pattern from otel-v1-apm-log-* to logs-otel-v1-*
- Fix log serviceName field to resource.attributes.service.name
- Fix Python 3.9 compatibility (type union syntax)
- Fix PPL explain expected field (calcite not root) for OpenSearch 3.6
- Fix conftest to use localhost instead of Docker-internal hostname
- Add 16 new integration tests (53 total) against real stack data
- Add advanced PPL patterns: coalesce() for remote service identification,
  service map topology queries, batch traceId IN correlation
- Add advanced PromQL patterns: clamp_min() safe division, topk/bottomk,
  service availability calculations
- Add docs: INSTALL.md, USAGE.md with 50+ sample questions
- Add Starlight docs: usage guide and showcase with 9 real-world scenarios
- Update CLAUDE.md with index patterns reference table
- All 453 tests passing (381 property + 72 integration)

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
@codecov

codecov bot commented Mar 21, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 18.51%. Comparing base (f46780d) to head (524898c).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #120   +/-   ##
=======================================
  Coverage   18.51%   18.51%           
=======================================
  Files           3        3           
  Lines          54       54           
  Branches       18       19    +1     
=======================================
  Hits           10       10           
  Misses         44       44           

@anirudha anirudha force-pushed the claude-code-observability-plugin branch from 3223fb7 to 1d41abf Compare March 21, 2026 04:16
@anirudha anirudha marked this pull request as ready for review March 21, 2026 04:20
anirudha and others added 3 commits March 21, 2026 00:51
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Keep Claude Code sidebar section, accept upstream removal of SDKs & API.

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@vamsimanohar
Member

Nice, Ani.
https://platform.claude.com/docs/en/agents-and-tools/agent-skills/best-practices
Can you add this guide to Claude and ask it to modify the skills accordingly?

A few suggestions I got:
  1. Split large skills, especially ppl-reference (1,247 lines), into SKILL.md + reference files
  2. Reduce boilerplate: define curl patterns once, show only the varying query per section
  3. Use env vars in curl commands instead of hardcoded localhost + credentials
  4. Collapse repetitive sections (e.g., 8 GenAI operation types → one template + table)
  5. Remove allowed-tools from frontmatter (non-standard field)

```bash
-d '{"query": "source=logs-otel-v1-* | where traceId = '\''<TRACE_ID>'\'' AND spanId = '\''<SPAN_ID>'\'' AND severityText = '\''ERROR'\'' | fields body, severityText, `@timestamp`"}'
```

## PPL Commands for Log Analysis
Member

Let's also add links to the PPL live docs: https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/index.md, so Claude can fetch these live docs in case some queries fail due to OpenSearch version differences or new syntax updates.

Collaborator Author

Good idea — adding PPL live docs link (https://github.com/opensearch-project/sql/blob/main/docs/user/ppl/index.md) to logs, traces, ppl-reference, correlation, and apm-red skills so Claude can fetch latest syntax when queries fail.

"name": "observability-stack",
"owner": {
"name": "OpenSearch Project",
"email": "anirudha@nyu.edu"
Member

Should we change this to some OS email alias?

Collaborator Author

Keeping personal email for now until we finalize a proper OS alias. Will update once decided.

@@ -0,0 +1,278 @@
---
name: stack-health
Member

Missing: OpenSearch Dashboards (OSD) API Integration

The plugin currently queries OpenSearch (:9200) and Prometheus (:9090) directly but never talks to OpenSearch Dashboards (:5601). This means it misses the configuration and correlation layer that OSD provides — especially workspace-scoped configs that define how signals relate to each other per environment/team.

1. No OSD API integration — only raw OpenSearch/Prometheus curls

The plugin hardcodes index patterns and field names instead of discovering them from OSD. It misses:

  • Datasets / Index pattern / schema mappings / Correlations / APM Config — OSD's saved objects API (GET /api/saved_objects/_find?type=index-pattern) lets you discover what index patterns exist and their field mappings.

2. APM config correlation object is absent

The OSD observability plugin has data source / correlation configuration objects that define:

  • Which trace index connects to which log index
  • Service-to-index mappings
  • Correlation rules (e.g., link traceId from span index to log index)

The plugin hardcodes index patterns (otel-v1-apm-span-*, logs-otel-v1-*) instead of reading them from OSD's config. Use the APM correlations saved object to create these interlinkings for easy root-cause analysis.

3. Topology via APM service map index

The skills should add the capability to query otel-v2-apm-service-map-* directly via PPL to see service connections, dependencies, and operations.

4. Workspace-scoped configuration is completely absent

This is a significant gap. These configs are per-workspace in OSD. The plugin:

  • Doesn't reference workspaces at all
  • Doesn't show how to query workspace-scoped saved objects
  • Doesn't account for the workspace_id parameter in OSD API calls
  • Doesn't explain how different workspaces might have different index pattern configurations, correlation rules, or APM configs

For workspace-aware queries, the plugin would need something like:

# List workspaces
curl -s "http://localhost:5601/api/workspaces/_list"

# Get workspace-scoped index patterns (URL quoted so the shell does not interpret <, >, or ?)
curl -s "http://localhost:5601/w/<workspace_id>/api/saved_objects/_find?type=index-pattern"

# Get workspace-scoped observability configs
curl -s "http://localhost:5601/w/<workspace_id>/api/observability/..."

5. No dynamic index/field discovery

The plugin hardcodes all index patterns and field names. It should teach Claude how to discover them dynamically:

# Fetch log index mappings
curl -sk -u admin:'...' https://localhost:9200/logs-otel-v1-*/_mapping

# Fetch trace index mappings
curl -sk -u admin:'...' https://localhost:9200/otel-v1-apm-span-*/_mapping

The PPL describe command is already in stack-health/SKILL.md but should also be in correlation/SKILL.md where it's most needed for field discovery during cross-signal investigations.

Summary of Recommended Additions

| Gap | Where to Add | Priority |
| --- | --- | --- |
| OSD Dashboards API integration (:5601) | New skill or extend stack-health | High |
| APM correlation config objects from OSD | correlation/SKILL.md | High |
| Workspace-scoped queries (/w/<workspace_id>/api/...) | All skills or new osd-config skill | High |
| Service topology via PPL | traces/SKILL.md or correlation/SKILL.md | Medium |
| Dynamic index/field discovery from OSD saved objects | correlation/SKILL.md and stack-health/SKILL.md | Medium |
| Which log/trace indexes map to which services (from OSD config) | correlation/SKILL.md | Medium |

Bottom line: The plugin is solid for direct OpenSearch + Prometheus querying, but it treats them as standalone backends. The missing layer is OSD as the configuration and correlation authority, especially with workspace-scoped configs that define how signals relate to each other per environment/team. Consider adding an osd-config skill or extending existing skills to query OSD APIs with workspace awareness.

Collaborator Author

Great feedback. For this PR we'll keep the scope to direct OpenSearch + Prometheus querying. The OSD API integration (workspace-scoped configs, APM correlation objects, dynamic index discovery from saved objects) is a solid next step — will track as a follow-up. Adding describe to correlation skill and improving service map PPL queries in this PR though.

Member

I believe this should be part of P0. Without it, actual RCAs are difficult: you wouldn't know which indexes are for logs and which are for traces, so you could only stick with the defaults in this knowledge base.

Collaborator Author

Agreed — added both in this PR:

  1. New osd-config skill (skills/osd-config/SKILL.md) — OSD API integration for workspace discovery, index pattern resolution, APM correlation configs, saved objects, and workspace-scoped queries at :5601

  2. Dynamic index/field discovery via OpenSearch API — added _mapping and PPL describe commands to stack-health and correlation skills so Claude can discover indices and fields dynamically without OSD

This way Claude can resolve indices/fields dynamically whether OSD is available or not.

@@ -0,0 +1,77 @@
- name: red_rate_promql
Member

@ps48 ps48 Mar 21, 2026

These do not cover the actual RED metrics generated by the Data Prepper APM service map processor. We need to check error, fault, request, and latency_seconds_bucket in Prometheus. These are gauge metrics, so even the rate queries won't work here.

Let's add some test cases such as "get p99 latency for the frontend service for the past 1 hour". We could probably use some queries from the APM code: https://github.com/opensearch-project/dashboards-observability/blob/main/public/components/apm/query_services/query_requests/promql_queries.ts

Collaborator Author

Good catch — the current tests use http_server_duration_seconds (OTel SDK histogram metrics) and miss the Data Prepper APM-generated gauge metrics (request, error, fault, latency_seconds_bucket). Will add test cases using the PromQL patterns from the dashboards-observability APM code for these gauge metrics.

Member

I still don't see the RED metrics test fixtures updated from this discussion. The tests still use the wrong metrics.

Collaborator Author

Fixed in this push. Added 4 new Data Prepper APM gauge metric test cases:

  • red_dp_request_gauge: sum(request{namespace="span_derived"}) by (service)
  • red_dp_error_gauge: sum(error{namespace="span_derived"}) by (service)
  • red_dp_fault_gauge: sum(fault{namespace="span_derived"}) by (service)
  • red_dp_latency_histogram: histogram_quantile(0.95, latency_seconds_seconds_bucket{namespace="span_derived"})

Also added a full "Data Prepper APM Metrics" section to apm-red/SKILL.md documenting these gauges, their labels (service, operation, remoteService, remoteOperation), and the key difference from OTel SDK metrics — these are gauges, not counters, so no rate() wrapper. All 4 tests pass against the live stack.
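
To make the gauge-versus-counter point concrete, here is a small sketch that assembles PromQL strings for the Data Prepper metrics named above. It reuses the `namespace="span_derived"` selector and `service` label from this comment and the `clamp_min()` safe-division pattern from the PR summary. The function names and the error-ratio query itself are assumptions for illustration, not queries copied from apm-red/SKILL.md.

```bash
# Sketch only: function names are hypothetical.

# Sum a Data Prepper APM gauge per service. No rate() wrapper is used because
# these series are gauges rather than monotonically increasing counters.
dp_gauge_by_service() {
  printf 'sum(%s{namespace="span_derived"}) by (service)' "$1"
}

# Error ratio per service; clamp_min(..., 1) keeps the denominator above zero.
dp_error_ratio() {
  printf '%s / clamp_min(%s, 1)' \
    "$(dp_gauge_by_service error)" "$(dp_gauge_by_service request)"
}

# Print the query without contacting Prometheus.
dp_error_ratio; echo

# Against a running Prometheus (HTTP, no auth):
#   curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" --data-urlencode "query=$(dp_error_ratio)"
```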

@@ -0,0 +1 @@

Member

Let's add new test fixtures for service topology discovery. We can use the APM PPL queries for reference: https://github.com/opensearch-project/dashboards-observability/blob/main/public/components/apm/query_services/query_requests/ppl_queries.ts

Collaborator Author

Added fixtures/topology.yaml with 6 test cases: service listing via dedup nodeConnectionHash, operation discovery via operationConnectionHash, dependency counting, service attributes, index discovery via _cat/indices, and field mapping via _mapping API. All 6 passing.
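
For readers without the fixture file at hand, a service-listing query of the kind described might look roughly like the sketch below. The `dedup nodeConnectionHash` step and the index pattern come from this thread; the `fields sourceNode, targetNode` projection is an assumption based on the sourceNode/targetNode structure mentioned in the PR summary, and the actual fixture may select different fields.

```bash
# Sketch only: the field list is an assumption, not copied from topology.yaml.
TOPOLOGY_QUERY='source=otel-v2-apm-service-map-* | dedup nodeConnectionHash | fields sourceNode, targetNode'

# The query contains no double quotes or backslashes, so it can be embedded
# in the JSON body without extra escaping.
TOPOLOGY_BODY=$(printf '{"query": "%s"}' "$TOPOLOGY_QUERY")
echo "$TOPOLOGY_BODY"

# Against a running stack:
#   curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
#     -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
#     -H 'Content-Type: application/json' -d "$TOPOLOGY_BODY"
```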

@anirudha
Collaborator Author

Thanks Vamsi! Addressing in this push:

  • 5b Reduce curl boilerplate — defining base pattern once, showing only the query per section
  • 5c Use env vars — replacing hardcoded localhost+creds with $OPENSEARCH_ENDPOINT, $OPENSEARCH_USER, $OPENSEARCH_PASSWORD, $PROMETHEUS_ENDPOINT
  • ✅ Install command updated to observability@observability
  • ✅ PPL live docs link added per ps48's suggestion

Keeping as-is (intentional):

  • 5a ppl-reference stays as single file — splitting skills hurts full-context LLM usage
  • 5d GenAI operation repetition kept — LLMs perform better with explicit examples
  • 5e allowed-tools kept in frontmatter — prevents Claude from switching to DSL/code without warranting it

Follow-up PR:

  • OSD API integration (workspace-scoped configs, APM correlation objects, dynamic index discovery) per ps48's feedback
  • Data Prepper APM gauge metrics (request, error, fault, latency_seconds_bucket) per ps48
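
The "define the base pattern once" and env-var items above could be sketched as follows. The four variable names are the ones listed in this comment; the fallback values and the wrapper names `os_ppl` and `prom_query` are assumptions for a local stack, not the exact contents of the Connection Defaults section.

```bash
# Sketch only: fallback values and function names are assumptions.
: "${OPENSEARCH_ENDPOINT:=https://localhost:9200}"
: "${OPENSEARCH_USER:=admin}"
: "${PROMETHEUS_ENDPOINT:=http://localhost:9090}"

# Base commands, defined once. OPENSEARCH_PASSWORD deliberately has no default.
os_ppl() {
  curl -sk -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
    -X POST "$OPENSEARCH_ENDPOINT/_plugins/_ppl" \
    -H 'Content-Type: application/json' -d "$1"
}

prom_query() {
  curl -s "$PROMETHEUS_ENDPOINT/api/v1/query" --data-urlencode "query=$1"
}

# Each section then only needs to show the query that varies, for example:
#   os_ppl '{"query": "source=otel-v1-apm-span-* | head 5"}'
#   prom_query 'up'
echo "OpenSearch: $OPENSEARCH_ENDPOINT  Prometheus: $PROMETHEUS_ENDPOINT"
```

Pointing the same skills at a managed cluster is then a matter of exporting different values before invoking them.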

anirudha and others added 2 commits March 21, 2026 12:05
… Action

- Use env vars ($OPENSEARCH_ENDPOINT, $OPENSEARCH_USER, $OPENSEARCH_PASSWORD,
  $PROMETHEUS_ENDPOINT) in all skill curl commands instead of hardcoded values
- Add Connection Defaults and Base Command sections to reduce boilerplate
- Add PPL live docs reference links to all skill files per ps48's suggestion
- Add GH Action workflow to generate and attach skill ZIPs to releases
- Update install command to observability@observability per vamsimanohar
- Update docs ZIP reference to point to GitHub releases
- Fix broken /docs/sdks/ link (removed upstream)
- Update property tests to accept env var patterns

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New osd-config skill for OSD API integration: workspace discovery,
  index pattern resolution, APM correlation configs, saved objects,
  and workspace-scoped queries at :5601
- Add dynamic index/field discovery to stack-health and correlation
  skills via _mapping API and PPL describe commands
- Add 6 topology test fixtures for service map discovery using
  APM PPL patterns (nodeConnectionHash, operationConnectionHash)
- Update CLAUDE.md routing table with osd-config skill
- All 394 tests passing

Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@anirudha
Collaborator Author

  • New skill: osd-config — OSD workspace, index pattern, saved object, and APM config APIs
  • Dynamic discovery: _mapping API + PPL describe in stack-health and correlation skills
  • 6 topology tests: service listing, operations, dependencies, attributes, index discovery, field mappings
  • 394 tests passing

@anirudha anirudha self-assigned this Mar 21, 2026

## Index Pattern Discovery

### List All Index Patterns
Member

Let's add dataset Discover here as well: https://docs.opensearch.org/latest/observing-your-data/exploring-observability-data/datasets/

These help to understand which indexes users have defined as logs or traces. They are more like an evolution of the index pattern.

Collaborator Author

Added a "Dataset Discovery" section with a link to the docs. The Dataset Discovery API isn't exposed as a REST endpoint in this OSD version — it's a UI feature in the query_enhancements plugin. For programmatic discovery, the skill now shows how to query index patterns with fields=title&fields=dataSourceType to identify signal types, and explains the schema mappings (otelLogs, trace time fields) that the init script applies.

Comment on lines +71 to +80
```bash
"$OSD_ENDPOINT/api/saved_objects/_find?type=observability-visualization&per_page=100" \
-H 'osd-xsrf: true'
```

### Workspace-Scoped APM Config

```bash
curl -s -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
"$OSD_ENDPOINT/w/<WORKSPACE_ID>/api/saved_objects/_find?type=observability-visualization&per_page=100" \
-H 'osd-xsrf: true'
```
Member

This is incorrect, use the correlations type.

Collaborator Author

Fixed — replaced all observability-visualization references with correlations. Also added documentation for the two correlation types created by the init script: trace-to-logs-* (links trace index to log index) and APM-Config-* (ties traces + service map + Prometheus together for the APM UI).

- curl
---

## Connection Defaults
Member

Collaborator Author

Re-validated against the init script. Changes:

  • Fixed observability-visualization → correlations type
  • Fixed _associate payload format (workspaceId + savedObjects, not targetWorkspace + objects)
  • Added missing APIs: data-source, data-connection, explore saved object types, POST /api/directquery/dataconnections for Prometheus creation, GET /api/opensearch-dashboards/settings, workspace association
  • Updated saved object types list to match what the init script actually creates
  • Added 6 new test fixtures that all pass against the live stack
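
As a hedged illustration of the corrected `_associate` payload, the sketch below only constructs the request body. The top-level keys `workspaceId` and `savedObjects` are the ones named above; the `type`/`id` shape of each entry, the helper name, and the placeholder IDs are assumptions, and the endpoint path is not stated in this thread, so it is left out.

```bash
# Sketch only: entry shape and helper name are assumptions.
associate_payload() {
  # $1 = workspace ID, $2 = saved object type, $3 = saved object ID
  printf '{"workspaceId": "%s", "savedObjects": [{"type": "%s", "id": "%s"}]}' "$1" "$2" "$3"
}

# Print an example body with placeholder IDs.
associate_payload '<WORKSPACE_ID>' 'index-pattern' '<OBJECT_ID>'; echo
```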

@anirudha anirudha requested a review from ps48 March 21, 2026 20:21
@vamsimanohar
Member

  • ✅ Install command updated to observability@observability

It should be opensearch@observability, not observability@observability.

@@ -0,0 +1,326 @@
# Feature Guide & Sample Questions
apm-red, correlation, etc. need not be treated as separate skills; rather, there should be a single high-level skill, with the other files placed under references. Example: https://github.com/opensearch-project/opensearch-launchpad/tree/main/skills/opensearch-launchpad

Collaborator Author

Thanks for the suggestion and the opensearch-launchpad reference @arjunkumargiri. Looked at this carefully against the official best practices and the launchpad skill structure.

The best practices actually support both patterns — and specifically call out our use case as a reason to use the multi-file approach. From the official docs:

Pattern 2: Domain-specific organization — "For Skills with multiple domains, organize content by domain to avoid loading irrelevant context. When a user asks about sales metrics, Claude only needs to read sales-related schemas, not finance or marketing data. This keeps token usage low and context focused."

The BigQuery example in the docs uses exactly this pattern — one skill with reference/finance.md, reference/sales.md, reference/product.md, reference/marketing.md — which is structurally equivalent to what we have, just organized as separate skills instead of reference files under one skill.

Where the launchpad pattern works well: opensearch-launchpad has a single linear workflow (provision → collect sample → gather preferences → plan → execute → evaluate → deploy). Users move through phases sequentially. One SKILL.md with reference files for each phase makes sense because the entry point is always the same.

Where observability differs: Users don't start at a common entry point. A dev debugging a slow endpoint goes straight to traces. An SRE checking error budgets goes to SLO/SLI. A platform engineer troubleshooting ingestion goes to stack-health. These are parallel, independent entry points — not phases of one workflow.

Practical concern with consolidation: Our 9 SKILL.md files total 4,624 lines. The best practices recommend keeping SKILL.md under 500 lines. A single consolidated SKILL.md would need to be a routing table that loads reference files on demand — which is functionally what CLAUDE.md + 9 skills already does, just with an extra LLM hop (load single SKILL.md → decide which reference → read it → execute) vs the current path (CLAUDE.md metadata routes directly to the right skill).

That said — we haven't benchmarked this. The official docs emphasize "start with evaluation" and "iterate based on observation, not assumptions." Happy to do a perf eval comparing both approaches (single-skill-with-references vs multi-skill) as a follow-up PR, measuring trigger accuracy and token efficiency across representative queries.

  • The only skill I am OK with being larger is PPL, so that Claude has full context on the scope of the grammar.

To discover datasets programmatically, query the index patterns and check their `dataSourceType` or use the saved objects API to find index patterns with observability schema mappings:

```bash
curl -s -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" \
```
Member

The Dataset API call is incorrect; let's fix this to:

/api/saved_objects/_find?fields=title&fields=type&fields=displayName&fields=signalType&fields=description&per_page=10000&type=index-pattern
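
Put together with the auth and header conventions used elsewhere in the skill, the corrected call might be assembled like this. The query string is the one given above; the `dataset_find_url` helper and the `http://localhost:5601` fallback are assumptions for a local stack.

```bash
# Sketch only: helper name and fallback endpoint are assumptions.
dataset_find_url() {
  local base=${1:-${OSD_ENDPOINT:-http://localhost:5601}}
  printf '%s/api/saved_objects/_find?fields=title&fields=type&fields=displayName&fields=signalType&fields=description&per_page=10000&type=index-pattern' "$base"
}

# Print the URL without contacting OSD.
dataset_find_url; echo

# Against a running OSD instance (the URL is quoted so the shell ignores '&' and '?'):
#   curl -s -u "$OPENSEARCH_USER:$OPENSEARCH_PASSWORD" "$(dataset_find_url)" -H 'osd-xsrf: true'
```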

@ps48 ps48 force-pushed the claude-code-observability-plugin branch from 881fab6 to d87eb1d Compare March 22, 2026 20:02
anirudha and others added 4 commits March 22, 2026 13:08
…real cluster

Validated all skill queries against a running observability stack and fixed
hallucinated field names, incorrect PPL syntax, wrong Prometheus metric names,
and missing caveats. All 75 integration tests pass.

Skills fixes:
- traces: add startTime to 14 fields clauses, fix attributes.error.type → error_type, add join/stacktrace caveats
- ppl-reference: fix 16 commands (timechart, replace, rex, ad, kmeans, ml, corr, graphlookup, lookup, spath), add resource limit caveats for streamstats/eventstats/grok/dedup/parse/fillnull/appendcol/nomv
- correlation: fix gen_ai_client_operation_duration → _seconds suffix, fix gen_ai_agent_name label references
- apm-red: fix PPL is not null → isnotnull(), add metric discovery section, add status code label caveat
- slo-sli: add status code label caveat, add latency threshold note for ms vs s metrics
- metrics: add metric discovery section, fix stray markdown, add status code label caveat
- stack-health: fix otelcol counter metrics to use _total suffix
- osd-config: fix saved objects _find API to require type parameter

Test infrastructure:
- Add expected_min_results field to TestFixture model
- Add min result count assertions for list, PPL datarows, and Prometheus data.result responses
- Fix test_runner to handle list-type JSON responses before dict checks
- Add osd_config and osd_dashboards markers
- Add GenAI metric and milliseconds metric test fixtures

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
12 test fixtures covering OpenSearch-direct APIs (index discovery,
trace/log/service-map mappings, PPL describe) and OSD Dashboards APIs
(workspace list, saved objects, index patterns, queries, dashboards,
visualizations). All 75 tests pass.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
…t discovery, install name

- Fix install name to opensearch@observability (vamsimanohar)
- Add Data Prepper APM gauge metrics (request, error, fault, latency_seconds)
  to apm-red SKILL.md with 10 query templates + 4 new test fixtures (ps48)
- Fix observability-visualization → correlations type in osd-config (ps48)
- Add Dataset Discovery section with docs link (ps48)
- Re-validate osd-config against init script: add workspace association,
  dashboards settings, data-source/data-connection/explore/correlations
  saved object queries, directquery dataconnections API (ps48)
- Add 6 new osd-config test fixtures, all passing against live stack

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Anirudha Jadhav <anirudha@nyu.edu>
Signed-off-by: ps48 <pshenoy36@gmail.com>
@ps48 ps48 force-pushed the claude-code-observability-plugin branch from d87eb1d to 524898c Compare March 22, 2026 20:08
@anirudha
Collaborator Author

Thanks for the commit @ps48

@anirudha anirudha merged commit a986ab8 into opensearch-project:main Mar 22, 2026
9 of 10 checks passed
5 participants