15 changes: 15 additions & 0 deletions README.md
@@ -15,6 +15,7 @@ Observability Stack is an open-source stack designed for modern distributed syst
- **OpenSearch**: Stores and indexes logs and traces for search and analysis
- **Prometheus**: Stores time-series metrics data
- **OpenSearch Dashboards**: Provides web-based visualization and exploration
- **PPL (Piped Processing Language)**: Native query language for logs and traces — pipe-based, human-readable, 50+ commands

## See it in action

@@ -441,8 +442,22 @@ The current configuration includes a custom OpenSearch Dockerfile (`docker-compo

Track progress: [OpenSearch 3.5.0 Release](https://github.com/opensearch-project/OpenSearch/releases)

## Query Language: PPL

The Observability Stack uses **Piped Processing Language (PPL)** as its native query language for logs and traces. PPL is a pipe-based language designed for the way operators actually investigate data:

```
source = logs-otel-v1*
| where severityNumber >= 17
| stats count() as errors by `resource.attributes.service.name`
| sort - errors
```

PPL provides 50+ commands and 200+ functions covering search, aggregation, pattern discovery, machine learning, joins, and more. See the [PPL documentation](https://observability.opensearch.org/docs/ppl/) for the full reference with live playground examples.
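
For programmatic access, the same query can be sent to OpenSearch's PPL REST endpoint (`_plugins/_ppl`, provided by the SQL/PPL plugin). A minimal Python sketch; the base URL here is a placeholder for your own cluster:

```python
import json
import urllib.request

def build_ppl_request(base_url: str, query: str) -> urllib.request.Request:
    """Build a POST request for OpenSearch's PPL endpoint (_plugins/_ppl)."""
    payload = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/_plugins/_ppl",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

query = """
source = logs-otel-v1*
| where severityNumber >= 17
| stats count() as errors by `resource.attributes.service.name`
| sort - errors
"""

req = build_ppl_request("https://localhost:9200", query.strip())
# With a running cluster, urllib.request.urlopen(req) executes the
# query and returns the result rows as JSON.
```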

## Documentation

- [PPL Language Reference](https://observability.opensearch.org/docs/ppl/) - Query language documentation with live examples
- [AGENTS.md](AGENTS.md) - AI-optimized repository documentation
- [CONTRIBUTING.md](CONTRIBUTING.md) - Development workflow and contribution guidelines
- [examples/](examples/) - Language-specific instrumentation examples
125 changes: 105 additions & 20 deletions docs/starlight-docs/astro.config.mjs
@@ -63,24 +63,6 @@ export default defineConfig({
},
],
},
{
label: 'Agent Observability',
collapsed: true,
items: [
{ label: 'Overview', link: '/ai-observability/' },
{ label: 'Getting Started', link: '/ai-observability/getting-started/' },
{ label: 'Framework Integrations', link: '/send-data/ai-agents/integrations/' },
{ label: 'Agent Tracing', link: '/ai-observability/agent-tracing/' },
{ label: 'Agent Graph & Path', link: '/ai-observability/agent-tracing/graph/' },
{ label: 'Evaluation & Scoring', link: '/ai-observability/evaluation/' },
{ label: 'Evaluation Integrations', link: '/ai-observability/evaluation-integrations/' },
],
},
{
label: 'Agent Health',
collapsed: true,
autogenerate: { directory: 'agent-health' },
},
{
label: 'Send Data',
collapsed: true,
@@ -104,11 +86,109 @@
},
],
},
{
label: 'PPL - Query Language',
collapsed: true,
items: [
{ label: 'Overview', link: '/ppl/' },
{ label: 'Command Reference', link: '/ppl/commands/' },
{
label: 'Search & Filter',
collapsed: true,
items: [
{ label: 'search', link: '/ppl/commands/search/' },
{ label: 'where', link: '/ppl/commands/where/' },
],
},
{
label: 'Fields & Transformation',
collapsed: true,
items: [
{ label: 'fields', link: '/ppl/commands/fields/' },
{ label: 'eval', link: '/ppl/commands/eval/' },
{ label: 'rename', link: '/ppl/commands/rename/' },
{ label: 'fillnull', link: '/ppl/commands/fillnull/' },
{ label: 'expand', link: '/ppl/commands/expand/' },
{ label: 'flatten', link: '/ppl/commands/flatten/' },
],
},
{
label: 'Aggregation & Statistics',
collapsed: true,
items: [
{ label: 'stats', link: '/ppl/commands/stats/' },
{ label: 'eventstats', link: '/ppl/commands/eventstats/' },
{ label: 'streamstats', link: '/ppl/commands/streamstats/' },
{ label: 'timechart', link: '/ppl/commands/timechart/' },
{ label: 'trendline', link: '/ppl/commands/trendline/' },
],
},
{
label: 'Sorting & Limiting',
collapsed: true,
items: [
{ label: 'sort', link: '/ppl/commands/sort/' },
{ label: 'head', link: '/ppl/commands/head/' },
{ label: 'dedup', link: '/ppl/commands/dedup/' },
{ label: 'top', link: '/ppl/commands/top/' },
{ label: 'rare', link: '/ppl/commands/rare/' },
],
},
{
label: 'Text Extraction',
collapsed: true,
items: [
{ label: 'parse', link: '/ppl/commands/parse/' },
{ label: 'grok', link: '/ppl/commands/grok/' },
{ label: 'rex', link: '/ppl/commands/rex/' },
{ label: 'patterns', link: '/ppl/commands/patterns/' },
{ label: 'spath', link: '/ppl/commands/spath/' },
],
},
{
label: 'Data Combination',
collapsed: true,
items: [
{ label: 'join', link: '/ppl/commands/join/' },
{ label: 'lookup', link: '/ppl/commands/lookup/' },
],
},
{
label: 'Machine Learning',
collapsed: true,
items: [
{ label: 'ml', link: '/ppl/commands/ml/' },
],
},
{
label: 'Metadata',
collapsed: true,
items: [
{ label: 'describe', link: '/ppl/commands/describe/' },
],
},
{ label: 'Function Reference', link: '/ppl/functions/' },
{ label: 'Observability Examples', link: '/ppl/examples/' },
],
},
{
label: 'Discover',
collapsed: true,
autogenerate: { directory: 'investigate' },
},
{
label: 'Agent Observability',
collapsed: true,
items: [
{ label: 'Overview', link: '/ai-observability/' },
{ label: 'Getting Started', link: '/ai-observability/getting-started/' },
{ label: 'Framework Integrations', link: '/send-data/ai-agents/integrations/' },
{ label: 'Agent Tracing', link: '/ai-observability/agent-tracing/' },
{ label: 'Agent Graph & Path', link: '/ai-observability/agent-tracing/graph/' },
{ label: 'Evaluation & Scoring', link: '/ai-observability/evaluation/' },
{ label: 'Evaluation Integrations', link: '/ai-observability/evaluation-integrations/' },
],
},
{
label: 'Application Monitoring',
collapsed: true,
@@ -120,7 +200,7 @@
autogenerate: { directory: 'dashboards' },
},
{
label: 'Alerting & Detection',
label: 'Alerting',
collapsed: true,
items: [
{ label: 'Alerting', link: '/alerting/' },
@@ -129,7 +209,12 @@
],
},
{
label: 'Reference',
label: 'Agent Health',
collapsed: true,
autogenerate: { directory: 'agent-health' },
},
{
label: 'SDKs, MCP & Clients',
collapsed: true,
items: [
{ label: 'Python SDK', link: '/send-data/ai-agents/python/' },
14 changes: 7 additions & 7 deletions docs/starlight-docs/src/content/docs/agent-health/cli.md
@@ -28,7 +28,7 @@ agent-health [serve] [options]
|--------|-------------|---------|
| `-p, --port <n>` | Server port | `4001` |
| `-e, --env-file <path>` | Load env file | `.env` |
| `--no-browser` | Skip auto-open browser | |
| `--no-browser` | Skip auto-open browser | - |

```bash
agent-health --port 8080 --env-file prod.env
@@ -94,14 +94,14 @@ agent-health benchmark [options]

| Option | Description | Default |
|--------|-------------|---------|
| `-n, --name <name>` | Benchmark name or ID | |
| `-f, --file <path>` | JSON file of test cases to import and benchmark | |
| `-n, --name <name>` | Benchmark name or ID | - |
| `-f, --file <path>` | JSON file of test cases to import and benchmark | - |
| `-a, --agent <key>` | Agent key (repeatable) | First enabled agent |
| `-m, --model <id>` | Model override | Agent default |
| `-o, --output <fmt>` | Output: `table`, `json` | `table` |
| `--export <path>` | Export results to file | |
| `--export <path>` | Export results to file | - |
| `--format <type>` | Report format for `--export`: `json`, `html`, `pdf` | `json` |
| `-v, --verbose` | Show per-test-case results and errors | |
| `-v, --verbose` | Show per-test-case results and errors | - |
| `--stop-server` | Stop the server after benchmark completes | Keep running |

**Modes:**
@@ -149,11 +149,11 @@ agent-health report -b <benchmark> [options]

| Option | Description | Default |
|--------|-------------|---------|
| `-b, --benchmark <id>` | Benchmark name or ID **(required)** | |
| `-b, --benchmark <id>` | Benchmark name or ID **(required)** | - |
| `-r, --runs <ids>` | Comma-separated run IDs | All runs |
| `-f, --format <type>` | Report format: `json`, `html`, `pdf` | `html` |
| `-o, --output <file>` | Output file path | Auto-generated |
| `--stdout` | Write to stdout (JSON format only) | |
| `--stdout` | Write to stdout (JSON format only) | - |

```bash
agent-health report -b "Baseline" # HTML report (all runs)
@@ -29,9 +29,9 @@ Settings are loaded in this order (later overrides earlier):
|
2. Environment variables (.env file)
|
3. JSON config file (agent-health.config.json) auto-created
3. JSON config file (agent-health.config.json) - auto-created
|
4. TypeScript config file (agent-health.config.ts) optional, for custom agents/connectors
4. TypeScript config file (agent-health.config.ts) - optional, for custom agents/connectors
```
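
The layering above amounts to a last-writer-wins merge. A minimal Python sketch; the keys and values are illustrative, not the tool's actual settings:

```python
def resolve_config(*layers: dict) -> dict:
    """Merge config layers in order; later layers override earlier ones."""
    resolved = {}
    for layer in layers:
        resolved.update(layer)
    return resolved

defaults = {"port": 4001, "storage": "file"}   # built-in defaults
env_vars = {"port": 8080}                      # from .env
json_config = {"storage": "opensearch"}        # agent-health.config.json
ts_config = {}                                 # agent-health.config.ts (optional)

config = resolve_config(defaults, env_vars, json_config, ts_config)
# → {"port": 8080, "storage": "opensearch"}
```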

## JSON config file
@@ -138,19 +138,19 @@ Required for the Bedrock LLM judge and Claude Code agent.
|----------|-------------|---------|
| `AWS_PROFILE` | AWS profile to use | `default` |
| `AWS_REGION` | AWS region | `us-west-2` |
| `AWS_ACCESS_KEY_ID` | Explicit access key (alternative to profile) | |
| `AWS_SECRET_ACCESS_KEY` | Explicit secret key | |
| `AWS_SESSION_TOKEN` | Session token (for temporary credentials) | |
| `AWS_ACCESS_KEY_ID` | Explicit access key (alternative to profile) | - |
| `AWS_SECRET_ACCESS_KEY` | Explicit secret key | - |
| `AWS_SESSION_TOKEN` | Session token (for temporary credentials) | - |

### OpenSearch Storage (optional)

Override the default file-based storage with an OpenSearch cluster.

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENSEARCH_STORAGE_ENDPOINT` | Storage cluster URL | |
| `OPENSEARCH_STORAGE_USERNAME` | Username | |
| `OPENSEARCH_STORAGE_PASSWORD` | Password | |
| `OPENSEARCH_STORAGE_ENDPOINT` | Storage cluster URL | - |
| `OPENSEARCH_STORAGE_USERNAME` | Username | - |
| `OPENSEARCH_STORAGE_PASSWORD` | Password | - |
| `OPENSEARCH_STORAGE_TLS_SKIP_VERIFY` | Skip TLS verification | `false` |

### OpenSearch Observability (optional)
@@ -159,9 +159,9 @@ For viewing agent traces and logs.

| Variable | Description | Default |
|----------|-------------|---------|
| `OPENSEARCH_LOGS_ENDPOINT` | Logs cluster URL | |
| `OPENSEARCH_LOGS_USERNAME` | Username | |
| `OPENSEARCH_LOGS_PASSWORD` | Password | |
| `OPENSEARCH_LOGS_ENDPOINT` | Logs cluster URL | - |
| `OPENSEARCH_LOGS_USERNAME` | Username | - |
| `OPENSEARCH_LOGS_PASSWORD` | Password | - |
| `OPENSEARCH_LOGS_TRACES_INDEX` | Traces index pattern | `otel-v1-apm-span-*` |
| `OPENSEARCH_LOGS_INDEX` | Logs index pattern | `ml-commons-logs-*` |
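
Together, the storage and observability variables translate into an `.env` along these lines (endpoints and credentials are placeholders; the index patterns show the defaults):

```bash
# Storage cluster (optional - overrides file-based storage)
OPENSEARCH_STORAGE_ENDPOINT=https://storage.example.com:9200
OPENSEARCH_STORAGE_USERNAME=admin
OPENSEARCH_STORAGE_PASSWORD=changeme
OPENSEARCH_STORAGE_TLS_SKIP_VERIFY=false

# Observability cluster (optional - agent traces and logs)
OPENSEARCH_LOGS_ENDPOINT=https://logs.example.com:9200
OPENSEARCH_LOGS_USERNAME=admin
OPENSEARCH_LOGS_PASSWORD=changeme
OPENSEARCH_LOGS_TRACES_INDEX=otel-v1-apm-span-*
OPENSEARCH_LOGS_INDEX=ml-commons-logs-*
```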

@@ -191,5 +191,5 @@ $ agent-health doctor

## Next steps

- [Connectors](/docs/agent-health/configuration/connectors/) create custom connectors for your agent type
- [CLI Reference](/docs/agent-health/cli/) all commands and options
- [Connectors](/docs/agent-health/configuration/connectors/) - create custom connectors for your agent type
- [CLI Reference](/docs/agent-health/cli/) - all commands and options
@@ -31,16 +31,16 @@ To compare agents, run the same experiment multiple times with different agent/m
## Running experiments from the CLI

```bash
# Quick mode auto-creates a benchmark from all stored test cases
# Quick mode - auto-creates a benchmark from all stored test cases
npx @opensearch-project/agent-health benchmark

# Named mode runs a specific existing benchmark
# Named mode - runs a specific existing benchmark
npx @opensearch-project/agent-health benchmark -n "Baseline" -a my-agent

# File mode imports test cases from JSON and runs them
# File mode - imports test cases from JSON and runs them
npx @opensearch-project/agent-health benchmark -f ./test-cases.json -a my-agent

# With export save results to file
# With export - save results to file
npx @opensearch-project/agent-health benchmark -f ./test-cases.json -n "My Run" -a my-agent --export results.json
```

@@ -32,7 +32,7 @@ A "Golden Path" is the expected trajectory an agent should follow to successfull
- What reasoning steps are expected
- What the final response should contain

The LLM judge doesn't require an exact match it evaluates whether the agent's actual trajectory achieves the expected outcomes through reasonable steps, even if the specific path differs.
The LLM judge doesn't require an exact match - it evaluates whether the agent's actual trajectory achieves the expected outcomes through reasonable steps, even if the specific path differs.

## LLM Judge output

@@ -64,5 +64,5 @@ AWS_SECRET_ACCESS_KEY=your_secret

## Next steps

- [Test Cases](/docs/agent-health/evaluations/test-cases/) create and manage evaluation scenarios
- [Experiments](/docs/agent-health/evaluations/experiments/) run batch evaluations and compare results
- [Test Cases](/docs/agent-health/evaluations/test-cases/) - create and manage evaluation scenarios
- [Experiments](/docs/agent-health/evaluations/experiments/) - run batch evaluations and compare results
@@ -76,8 +76,8 @@ npx @opensearch-project/agent-health benchmark -f test-cases.json -a another-age

## Tips for good test cases

- **Make prompts specific and unambiguous** avoid vague instructions
- **Include all necessary context data** the agent shouldn't need to guess
- **Define clear, measurable expected outcomes** the judge needs concrete criteria
- **Start with simple cases, add complexity gradually** build confidence before testing edge cases
- **Use labels for organization** filter and group test cases by category, difficulty, or domain
- **Make prompts specific and unambiguous** - avoid vague instructions
- **Include all necessary context data** - the agent shouldn't need to guess
- **Define clear, measurable expected outcomes** - the judge needs concrete criteria
- **Start with simple cases, add complexity gradually** - build confidence before testing edge cases
- **Use labels for organization** - filter and group test cases by category, difficulty, or domain
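
Putting the tips together, a minimal `test-cases.json` might look like the following. The field names are illustrative assumptions, not the tool's authoritative schema; check a test case exported from your own instance for the exact shape:

```json
[
  {
    "name": "High error rate investigation",
    "labels": ["triage", "easy"],
    "prompt": "The checkout service is returning 500s. Find the root cause.",
    "context": "Logs are stored in the logs-otel-v1* index.",
    "expectedOutcome": "Identifies the failing downstream dependency and cites supporting log evidence."
  }
]
```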
@@ -10,7 +10,7 @@ This guide walks you through using Agent Health to evaluate AI agents. The appli
## Prerequisites

**Required:**
- **Node.js 18+** [download here](https://nodejs.org/)
- **Node.js 18+** - [download here](https://nodejs.org/)
- **npm** (comes with Node.js)

**Optional (for production use):**
@@ -51,7 +51,7 @@ Agent Health includes a built-in Travel Planner multi-agent demo, along with a D

- Simulates a multi-agent Travel Planner system with realistic trajectories
- Agent types: Travel Coordinator, Weather Agent, Events Agent, Booking Agent, Budget Agent
- No external endpoint required select "Demo Agent" in the agent dropdown
- No external endpoint required - select "Demo Agent" in the agent dropdown

### Demo Judge

@@ -125,7 +125,7 @@ Each step shows timestamp, duration, tool arguments (for actions), full tool out

## Next steps

- [Connect your own agent](/docs/agent-health/configuration/) configure Agent Health for your agent
- [Create custom test cases](/docs/agent-health/evaluations/test-cases/) build test cases for your domain
- [Run experiments](/docs/agent-health/evaluations/experiments/) batch evaluate across agents and models
- [View traces](/docs/agent-health/traces/) visualize OpenTelemetry traces from your agent
- [Connect your own agent](/docs/agent-health/configuration/) - configure Agent Health for your agent
- [Create custom test cases](/docs/agent-health/evaluations/test-cases/) - build test cases for your domain
- [Run experiments](/docs/agent-health/evaluations/experiments/) - batch evaluate across agents and models
- [View traces](/docs/agent-health/traces/) - visualize OpenTelemetry traces from your agent
12 changes: 6 additions & 6 deletions docs/starlight-docs/src/content/docs/agent-health/index.md
@@ -5,7 +5,7 @@ sidebar:
hidden: true
---

Agent Health is an evaluation and observability framework for AI agents. It helps you measure agent performance through "Golden Path" trajectory comparison where an LLM judge evaluates agent actions against expected outcomes. Check out the [GitHub repository](https://github.com/opensearch-project/agent-health) for source code and contributions.
Agent Health is an evaluation and observability framework for AI agents. It helps you measure agent performance through "Golden Path" trajectory comparison - where an LLM judge evaluates agent actions against expected outcomes. Check out the [GitHub repository](https://github.com/opensearch-project/agent-health) for source code and contributions.

## Quick start

@@ -50,8 +50,8 @@ For creating custom connectors, see [Connectors](/docs/agent-health/configuratio

## Next steps

- [Getting Started](/docs/agent-health/getting-started/) step-by-step walkthrough from install to first evaluation
- [Evaluations](/docs/agent-health/evaluations/) how evaluations, test cases, and experiments work
- [Trace Visualization](/docs/agent-health/traces/) real-time trace monitoring and comparison
- [Configuration](/docs/agent-health/configuration/) connect your own agent and configure the environment
- [CLI Reference](/docs/agent-health/cli/) all CLI commands and options
- [Getting Started](/docs/agent-health/getting-started/) - step-by-step walkthrough from install to first evaluation
- [Evaluations](/docs/agent-health/evaluations/) - how evaluations, test cases, and experiments work
- [Trace Visualization](/docs/agent-health/traces/) - real-time trace monitoring and comparison
- [Configuration](/docs/agent-health/configuration/) - connect your own agent and configure the environment
- [CLI Reference](/docs/agent-health/cli/) - all CLI commands and options