Merged
23 changes: 18 additions & 5 deletions README.md
@@ -83,13 +83,24 @@ agentcore invoke

### Resource Management

| Command  | Description                                       |
| -------- | ------------------------------------------------- |
| `add`    | Add agents, memory, identity, evaluators, targets |
| `remove` | Remove resources from project                     |

> **Note**: Run `agentcore deploy` after `add` or `remove` to update resources in AWS.

### Evaluations

| Command | Description |
| -------------------- | --------------------------------------------- |
| `add evaluator` | Add a custom LLM-as-a-Judge evaluator |
| `add online-eval` | Add continuous evaluation for live traffic |
| `run evals` | Run on-demand evaluation against agent traces |
| `evals history` | View past eval run results |
| `pause online-eval` | Pause a deployed online eval config |
| `resume online-eval` | Resume a paused online eval config |

## Project Structure

```
@@ -116,7 +127,7 @@ my-project/

Projects use JSON schema files in the `agentcore/` directory:

- `agentcore.json` - Agent specifications, memory, identity, evaluators, online evals
- `deployed-state.json` - Runtime state in agentcore/.cli/ (auto-managed)
- `aws-targets.json` - Deployment targets (account, region)

@@ -125,11 +136,13 @@ Projects use JSON schema files in the `agentcore/` directory:
- **Runtime** - Managed execution environment for deployed agents
- **Memory** - Semantic, summarization, and user preference strategies
- **Identity** - Secure API key management via Secrets Manager
- **Evaluations** - LLM-as-a-Judge for on-demand and continuous agent quality monitoring

## Documentation

- [CLI Commands Reference](docs/commands.md) - Full command reference for scripting and CI/CD
- [Configuration](docs/configuration.md) - Schema reference for config files
- [Evaluations](docs/evals.md) - Evaluators, on-demand evals, and online monitoring
- [Frameworks](docs/frameworks.md) - Supported frameworks and model providers
- [Gateway](docs/gateway.md) - Gateway setup, targets, and authentication
- [Memory](docs/memory.md) - Memory strategies and sharing
146 changes: 146 additions & 0 deletions docs/commands.md
@@ -308,6 +308,51 @@ agentcore add identity \
| `--scopes <scopes>` | OAuth scopes, comma-separated |
| `--json` | JSON output |

### add evaluator

Add a custom LLM-as-a-Judge evaluator. See [Evaluations](evals.md) for full details.

```bash
agentcore add evaluator \
--name ResponseQuality \
--level SESSION \
--model us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--instructions "Evaluate the response quality. Context: {context}" \
--rating-scale 1-5-quality
```

| Flag | Description |
| ------------------------- | -------------------------------------------------------------------------- |
| `--name <name>` | Evaluator name |
| `--level <level>` | `SESSION`, `TRACE`, or `TOOL_CALL` |
| `--model <model>` | Bedrock model ID for the LLM judge |
| `--instructions <text>` | Evaluation prompt with placeholders (e.g. `{context}`) |
| `--rating-scale <preset>` | `1-5-quality`, `1-3-simple`, `pass-fail`, `good-neutral-bad`, or custom |
| `--config <path>` | Config JSON file (overrides `--model`, `--instructions`, `--rating-scale`) |
| `--json` | JSON output |
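
Placeholders such as `{context}` in `--instructions` are filled in at evaluation time. As a rough illustration of the templating idea (a sketch only — `render_instructions` and its fields are our own names, not AgentCore's API):

```python
# Sketch: how an instructions template with placeholders might be rendered.
# The function and field names here are illustrative, not AgentCore's API.
def render_instructions(template: str, **fields: str) -> str:
    """Fill {placeholder} slots in an evaluator prompt template."""
    return template.format(**fields)

prompt = render_instructions(
    "Evaluate the response quality. Context: {context}",
    context="User asked for a refund; the agent offered store credit.",
)
print(prompt)
```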

### add online-eval

Add an online eval config for continuous agent monitoring.

```bash
agentcore add online-eval \
--name QualityMonitor \
--agent MyAgent \
--evaluator ResponseQuality Builtin.Faithfulness \
--sampling-rate 10
```

| Flag | Description |
| ---------------------------- | --------------------------------------------- |
| `--name <name>` | Config name |
| `-a, --agent <name>` | Agent to monitor |
| `-e, --evaluator <names...>` | Evaluator name(s), `Builtin.*` IDs, or ARNs |
| `--evaluator-arn <arns...>` | Evaluator ARN(s) |
| `--sampling-rate <rate>` | Percentage of requests to evaluate (0.01–100) |
| `--enable-on-create` | Enable immediately after deploy |
| `--json` | JSON output |
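
A sampling rate of 10 means roughly 10% of requests get evaluated. A back-of-the-envelope sketch of that behavior (our interpretation — the service's actual sampling algorithm is not documented here):

```python
import random

def sample_requests(request_ids, sampling_rate_percent, seed=None):
    """Return the subset of requests selected at the given percentage rate."""
    rng = random.Random(seed)
    return [r for r in request_ids if rng.random() * 100 < sampling_rate_percent]

# Expected volume: a 10% rate over 1,000 requests evaluates ~100 of them.
expected = 1000 * 10 / 100
print(expected)  # 100.0
```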

### remove

Remove resources from project.
@@ -316,6 +361,8 @@ Remove resources from project.
agentcore remove agent --name MyAgent --force
agentcore remove memory --name SharedMemory
agentcore remove identity --name OpenAI
agentcore remove evaluator --name ResponseQuality
agentcore remove online-eval --name QualityMonitor
agentcore remove gateway --name MyGateway
agentcore remove gateway-target --name WeatherTools

@@ -378,6 +425,105 @@ agentcore invoke --json # JSON output

---

## Evaluations

See [Evaluations](evals.md) for the full guide on evaluators, scoring, and online monitoring.

### run evals

Run on-demand evaluation against historical agent traces.

```bash
# Project mode
agentcore run evals --agent MyAgent --evaluator ResponseQuality --days 7

# Standalone mode (no project required)
agentcore run evals \
--agent-arn arn:aws:...:runtime/abc123 \
--evaluator-arn arn:aws:...:evaluator/eval123 \
--region us-east-1
```

| Flag | Description |
| ---------------------------- | ----------------------------------------- |
| `-a, --agent <name>` | Agent name from project |
| `--agent-arn <arn>` | Agent runtime ARN (standalone mode) |
| `-e, --evaluator <names...>` | Evaluator name(s) or `Builtin.*` IDs |
| `--evaluator-arn <arns...>` | Evaluator ARN(s) (use with `--agent-arn`) |
| `--region <region>` | AWS region (required with `--agent-arn`) |
| `-s, --session-id <id>` | Evaluate a specific session |
| `-t, --trace-id <id>` | Evaluate a specific trace |
| `--days <days>` | Lookback window in days (default: 7) |
| `--output <path>` | Custom output file path |
| `--json` | JSON output |

### evals history

View past on-demand eval run results.

```bash
agentcore evals history
agentcore evals history --agent MyAgent --limit 5 --json
```

| Flag | Description |
| --------------------- | -------------------- |
| `-a, --agent <name>` | Filter by agent name |
| `-n, --limit <count>` | Max runs to display |
| `--json` | JSON output |

### pause online-eval

Pause a deployed online eval config.

```bash
agentcore pause online-eval QualityMonitor
agentcore pause online-eval --arn arn:aws:...:online-eval-config/abc123
```

| Flag | Description |
| ------------------- | -------------------------------------------------- |
| `[name]` | Config name from project (not needed with `--arn`) |
| `--arn <arn>` | Online eval config ARN (standalone mode) |
| `--region <region>` | AWS region override |
| `--json` | JSON output |

### resume online-eval

Resume a paused online eval config.

```bash
agentcore resume online-eval QualityMonitor
agentcore resume online-eval --arn arn:aws:...:online-eval-config/abc123
```

| Flag | Description |
| ------------------- | -------------------------------------------------- |
| `[name]` | Config name from project (not needed with `--arn`) |
| `--arn <arn>` | Online eval config ARN (standalone mode) |
| `--region <region>` | AWS region override |
| `--json` | JSON output |

### logs evals

Stream or search online eval logs.

```bash
agentcore logs evals --agent MyAgent --since 1h
agentcore logs evals --follow --json
```

| Flag | Description |
| --------------------- | --------------------------------------------- |
| `-a, --agent <name>` | Filter by agent |
| `--since <time>` | Start time (e.g. `1h`, `30m`, `2d`, ISO 8601) |
| `--until <time>` | End time |
| `-n, --lines <count>` | Maximum log lines |
| `-f, --follow` | Stream in real-time |
| `--json` | JSON Lines output |
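
Relative times like `1h` or `30m` follow a common `<number><unit>` shorthand. A sketch of how such values map to durations (the CLI's actual parser may accept more forms, e.g. ISO 8601):

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_relative(value: str) -> timedelta:
    """Parse shorthand like '30m', '1h', or '2d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if not match:
        raise ValueError(f"unsupported relative time: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})

print(parse_relative("1h"))  # 1:00:00
```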

---

## Utilities

### package
138 changes: 118 additions & 20 deletions docs/configuration.md
@@ -4,13 +4,13 @@ AgentCore projects use JSON configuration files in the `agentcore/` directory.

## Files Overview

| File                  | Purpose                                                          |
| --------------------- | ---------------------------------------------------------------- |
| `agentcore.json`      | Project, agents, memories, credentials, evaluators, online evals |
| `mcp.json`            | Gateways, gateway targets, and MCP tools                         |
| `aws-targets.json`    | Deployment targets                                               |
| `deployed-state.json` | Runtime state (auto-managed, do not edit)                        |
| `.env.local`          | API keys for local development (gitignored)                      |

---

@@ -44,26 +44,42 @@ Main project configuration using a **flat resource model**. Agents, memories, an
    {
      "type": "ApiKeyCredentialProvider",
      "name": "OpenAI"
    }
  ],
  "evaluators": [
    {
      "type": "CustomEvaluator",
      "name": "ResponseQuality",
      "level": "SESSION",
      "config": {
        "llmAsAJudge": {
          "model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
          "instructions": "Evaluate the response quality. Context: {context}",
          "ratingScale": {
            "numerical": [
              { "value": 1, "label": "Poor", "definition": "Fails to meet expectations" },
              { "value": 5, "label": "Excellent", "definition": "Far exceeds expectations" }
            ]
          }
        }
      }
    }
  ],
  "onlineEvalConfigs": []
}
```

### Project Fields

| Field               | Required | Description                                                 |
| ------------------- | -------- | ----------------------------------------------------------- |
| `name`              | Yes      | Project name (1-23 chars, alphanumeric, starts with letter) |
| `version`           | Yes      | Schema version (integer, currently `1`)                     |
| `agents`            | Yes      | Array of agent specifications                               |
| `memories`          | Yes      | Array of memory resources                                   |
| `credentials`       | Yes      | Array of credential providers (API key or OAuth)            |
| `evaluators`        | Yes      | Array of custom evaluator definitions                       |
| `onlineEvalConfigs` | Yes      | Array of online eval configurations                         |

> Gateway configuration is stored separately in `mcp.json`. See [mcp.json](#mcpjson) below.
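
The naming rules in this file (project names: 1-23 chars, alphanumeric, starting with a letter; evaluator and online eval config names: 1-48 chars, alphanumeric plus `_`) can be expressed as regexes. A sketch of that reading — the patterns are our interpretation of the prose, not an official schema:

```python
import re

# Our reading of the documented naming rules, expressed as regexes.
PROJECT_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]{0,22}$")  # 1-23 chars, starts with a letter
RESOURCE_NAME = re.compile(r"^[A-Za-z0-9_]{1,48}$")        # 1-48 chars, alphanumeric + _

assert PROJECT_NAME.match("myProject1")
assert not PROJECT_NAME.match("1badname")   # must start with a letter
assert RESOURCE_NAME.match("ResponseQuality")
assert not RESOURCE_NAME.match("x" * 49)    # too long
```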

Expand Down Expand Up @@ -191,6 +207,88 @@ AgentCore Identity service for deployed environments.

---

## Evaluator Resource

See [Evaluations](evals.md) for the full guide.

```json
{
"type": "CustomEvaluator",
"name": "ResponseQuality",
"level": "SESSION",
"description": "Evaluate response quality",
"config": {
"llmAsAJudge": {
"model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"instructions": "Evaluate the response quality. Context: {context}",
"ratingScale": {
"numerical": [
{ "value": 1, "label": "Poor", "definition": "Fails to meet expectations" },
{ "value": 5, "label": "Excellent", "definition": "Far exceeds expectations" }
]
}
}
}
}
```

| Field | Required | Description |
| ------------- | -------- | ----------------------------------------------- |
| `type` | Yes | Always `"CustomEvaluator"` |
| `name` | Yes | Evaluator name (1-48 chars, alphanumeric + `_`) |
| `level` | Yes | `"SESSION"`, `"TRACE"`, or `"TOOL_CALL"` |
| `description` | No | Evaluator description |
| `config` | Yes | LLM-as-a-Judge configuration (see below) |

### LLM-as-a-Judge Config

| Field | Required | Description |
| -------------- | -------- | ------------------------------------------------------ |
| `model` | Yes | Bedrock model ID or cross-region inference profile |
| `instructions` | Yes | Evaluation prompt with placeholders (e.g. `{context}`) |
| `ratingScale` | Yes | Either `numerical` or `categorical` array (not both) |

### Rating Scale

**Numerical** — scored values:

```json
{ "numerical": [{ "value": 1, "label": "Poor", "definition": "..." }, ...] }
```

**Categorical** — named labels:

```json
{ "categorical": [{ "label": "Pass", "definition": "..." }, ...] }
```
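
Filled out, a categorical scale mirroring the `good-neutral-bad` preset might look like this (the labels come from the preset name; the definitions below are illustrative, not the preset's actual wording):

```json
{
  "categorical": [
    { "label": "Good", "definition": "Response is accurate, relevant, and complete" },
    { "label": "Neutral", "definition": "Response is partially helpful or incomplete" },
    { "label": "Bad", "definition": "Response is inaccurate, irrelevant, or unhelpful" }
  ]
}
```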

---

## Online Eval Config Resource

```json
{
"type": "OnlineEvaluationConfig",
"name": "QualityMonitor",
"agent": "MyAgent",
"evaluators": ["ResponseQuality", "Builtin.Faithfulness"],
"samplingRate": 10,
"enableOnCreate": true
}
```

| Field | Required | Description |
| ---------------- | -------- | ------------------------------------------------------------ |
| `type` | Yes | Always `"OnlineEvaluationConfig"` |
| `name` | Yes | Config name (1-48 chars, alphanumeric + `_`) |
| `agent` | Yes | Agent name to monitor (must match a project agent) |
| `evaluators` | Yes | Array of evaluator names, `Builtin.*` IDs, or evaluator ARNs |
| `samplingRate` | Yes | Percentage of requests to evaluate (0.01–100) |
| `description` | No | Config description (max 200 chars) |
| `enableOnCreate` | No | Enable evaluation on deploy (default: true) |

---

## mcp.json

Gateway and MCP tool configuration. Gateways, their targets, and standalone MCP runtime tools are defined here.