Merged
23 changes: 18 additions & 5 deletions README.md
@@ -83,13 +83,24 @@ agentcore invoke

### Resource Management

| Command  | Description                                       |
| -------- | ------------------------------------------------- |
| `add`    | Add agents, memory, identity, evaluators, targets |
| `remove` | Remove resources from project                     |

> **Note**: Run `agentcore deploy` after `add` or `remove` to update resources in AWS.

### Evaluations

| Command | Description |
| -------------------- | --------------------------------------------- |
| `add evaluator` | Add a custom LLM-as-a-Judge evaluator |
| `add online-eval` | Add continuous evaluation for live traffic |
| `run evals` | Run on-demand evaluation against agent traces |
| `evals history` | View past eval run results |
| `pause online-eval` | Pause a deployed online eval config |
| `resume online-eval` | Resume a paused online eval config |

## Project Structure

```
@@ -116,7 +127,7 @@ my-project/

Projects use JSON schema files in the `agentcore/` directory:

- `agentcore.json` - Agent specifications, memory, identity, evaluators, online evals
- `deployed-state.json` - Runtime state in agentcore/.cli/ (auto-managed)
- `aws-targets.json` - Deployment targets (account, region)

@@ -125,11 +136,13 @@ Projects use JSON schema files in the `agentcore/` directory:
- **Runtime** - Managed execution environment for deployed agents
- **Memory** - Semantic, summarization, and user preference strategies
- **Identity** - Secure API key management via Secrets Manager
- **Evaluations** - LLM-as-a-Judge for on-demand and continuous agent quality monitoring

## Documentation

- [CLI Commands Reference](docs/commands.md) - Full command reference for scripting and CI/CD
- [Configuration](docs/configuration.md) - Schema reference for config files
- [Evaluations](docs/evals.md) - Evaluators, on-demand evals, and online monitoring
- [Frameworks](docs/frameworks.md) - Supported frameworks and model providers
- [Gateway](docs/gateway.md) - Gateway setup, targets, and authentication
- [Memory](docs/memory.md) - Memory strategies and sharing
146 changes: 146 additions & 0 deletions docs/commands.md
@@ -308,6 +308,51 @@ agentcore add identity \
| `--scopes <scopes>` | OAuth scopes, comma-separated |
| `--json` | JSON output |

### add evaluator

Add a custom LLM-as-a-Judge evaluator. See [Evaluations](evals.md) for full details.

```bash
agentcore add evaluator \
--name ResponseQuality \
--level SESSION \
--model us.anthropic.claude-sonnet-4-5-20250929-v1:0 \
--instructions "Evaluate the response quality. Context: {context}" \
--rating-scale 1-5-quality
```

| Flag | Description |
| ------------------------- | -------------------------------------------------------------------------- |
| `--name <name>` | Evaluator name |
| `--level <level>` | `SESSION`, `TRACE`, or `TOOL_CALL` |
| `--model <model>` | Bedrock model ID for the LLM judge |
| `--instructions <text>` | Evaluation prompt with placeholders (e.g. `{context}`) |
| `--rating-scale <preset>` | `1-5-quality`, `1-3-simple`, `pass-fail`, `good-neutral-bad`, or custom |
| `--config <path>` | Config JSON file (overrides `--model`, `--instructions`, `--rating-scale`) |
| `--json` | JSON output |
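
Placeholders such as `{context}` in `--instructions` are filled in at evaluation time. As a rough illustration of the templating idea (a sketch only — `render_instructions` and its fields are our own names, not AgentCore's API):

```python
# Sketch: how an instructions template with placeholders might be rendered.
# The function and field names here are illustrative, not AgentCore's API.
def render_instructions(template: str, **fields: str) -> str:
    """Fill {placeholder} slots in an evaluator prompt template."""
    return template.format(**fields)

prompt = render_instructions(
    "Evaluate the response quality. Context: {context}",
    context="User asked for a refund; the agent offered store credit.",
)
print(prompt)
```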

### add online-eval

Add an online eval config for continuous agent monitoring.

```bash
agentcore add online-eval \
--name QualityMonitor \
--agent MyAgent \
--evaluator ResponseQuality Builtin.Faithfulness \
--sampling-rate 10
```

| Flag | Description |
| ---------------------------- | --------------------------------------------- |
| `--name <name>` | Config name |
| `-a, --agent <name>` | Agent to monitor |
| `-e, --evaluator <names...>` | Evaluator name(s), `Builtin.*` IDs, or ARNs |
| `--evaluator-arn <arns...>` | Evaluator ARN(s) |
| `--sampling-rate <rate>` | Percentage of requests to evaluate (0.01–100) |
| `--enable-on-create` | Enable immediately after deploy |
| `--json` | JSON output |
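
A sampling rate of 10 means roughly 10% of requests get evaluated. A back-of-the-envelope sketch of that behavior (our interpretation — the service's actual sampling algorithm is not documented here):

```python
import random

def sample_requests(request_ids, sampling_rate_percent, seed=None):
    """Return the subset of requests selected at the given percentage rate."""
    rng = random.Random(seed)
    return [r for r in request_ids if rng.random() * 100 < sampling_rate_percent]

# Expected volume: a 10% rate over 1,000 requests evaluates ~100 of them.
expected = 1000 * 10 / 100
print(expected)  # 100.0
```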

### remove

Remove resources from project.
@@ -316,6 +361,8 @@ Remove resources from project.
agentcore remove agent --name MyAgent --force
agentcore remove memory --name SharedMemory
agentcore remove identity --name OpenAI
agentcore remove evaluator --name ResponseQuality
agentcore remove online-eval --name QualityMonitor
agentcore remove gateway --name MyGateway
agentcore remove gateway-target --name WeatherTools

@@ -378,6 +425,105 @@ agentcore invoke --json # JSON output

---

## Evaluations

See [Evaluations](evals.md) for the full guide on evaluators, scoring, and online monitoring.

### run evals

Run on-demand evaluation against historical agent traces.

```bash
# Project mode
agentcore run evals --agent MyAgent --evaluator ResponseQuality --days 7

# Standalone mode (no project required)
agentcore run evals \
--agent-arn arn:aws:...:runtime/abc123 \
--evaluator-arn arn:aws:...:evaluator/eval123 \
--region us-east-1
```

| Flag | Description |
| ---------------------------- | ----------------------------------------- |
| `-a, --agent <name>` | Agent name from project |
| `--agent-arn <arn>` | Agent runtime ARN (standalone mode) |
| `-e, --evaluator <names...>` | Evaluator name(s) or `Builtin.*` IDs |
| `--evaluator-arn <arns...>` | Evaluator ARN(s) (use with `--agent-arn`) |
| `--region <region>` | AWS region (required with `--agent-arn`) |
| `-s, --session-id <id>` | Evaluate a specific session |
| `-t, --trace-id <id>` | Evaluate a specific trace |
| `--days <days>` | Lookback window in days (default: 7) |
| `--output <path>` | Custom output file path |
| `--json` | JSON output |

### evals history

View past on-demand eval run results.

```bash
agentcore evals history
agentcore evals history --agent MyAgent --limit 5 --json
```

| Flag | Description |
| --------------------- | -------------------- |
| `-a, --agent <name>` | Filter by agent name |
| `-n, --limit <count>` | Max runs to display |
| `--json` | JSON output |

### pause online-eval

Pause a deployed online eval config.

```bash
agentcore pause online-eval QualityMonitor
agentcore pause online-eval --arn arn:aws:...:online-eval-config/abc123
```

| Flag | Description |
| ------------------- | -------------------------------------------------- |
| `[name]` | Config name from project (not needed with `--arn`) |
| `--arn <arn>` | Online eval config ARN (standalone mode) |
| `--region <region>` | AWS region override |
| `--json` | JSON output |

### resume online-eval

Resume a paused online eval config.

```bash
agentcore resume online-eval QualityMonitor
agentcore resume online-eval --arn arn:aws:...:online-eval-config/abc123
```

| Flag | Description |
| ------------------- | -------------------------------------------------- |
| `[name]` | Config name from project (not needed with `--arn`) |
| `--arn <arn>` | Online eval config ARN (standalone mode) |
| `--region <region>` | AWS region override |
| `--json` | JSON output |

### logs evals

Stream or search online eval logs.

```bash
agentcore logs evals --agent MyAgent --since 1h
agentcore logs evals --follow --json
```

| Flag | Description |
| --------------------- | --------------------------------------------- |
| `-a, --agent <name>` | Filter by agent |
| `--since <time>` | Start time (e.g. `1h`, `30m`, `2d`, ISO 8601) |
| `--until <time>` | End time |
| `-n, --lines <count>` | Maximum log lines |
| `-f, --follow` | Stream in real-time |
| `--json` | JSON Lines output |
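
Relative times like `1h` or `30m` follow a common `<number><unit>` shorthand. A sketch of how such values map to durations (the CLI's actual parser may accept more forms, e.g. ISO 8601):

```python
import re
from datetime import timedelta

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}

def parse_relative(value: str) -> timedelta:
    """Parse shorthand like '30m', '1h', or '2d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", value)
    if not match:
        raise ValueError(f"unsupported relative time: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})

print(parse_relative("1h"))  # 1:00:00
```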

---

## Utilities

### package
138 changes: 118 additions & 20 deletions docs/configuration.md
@@ -4,13 +4,13 @@ AgentCore projects use JSON configuration files in the `agentcore/` directory.

## Files Overview

| File                  | Purpose                                                          |
| --------------------- | ---------------------------------------------------------------- |
| `agentcore.json`      | Project, agents, memories, credentials, evaluators, online evals |
| `mcp.json`            | Gateways, gateway targets, and MCP tools                         |
| `aws-targets.json`    | Deployment targets                                               |
| `deployed-state.json` | Runtime state (auto-managed, do not edit)                        |
| `.env.local`          | API keys for local development (gitignored)                      |

---

@@ -44,26 +44,42 @@ Main project configuration using a **flat resource model**. Agents, memories, an
    {
      "type": "ApiKeyCredentialProvider",
      "name": "OpenAI"
    }
  ],
  "evaluators": [
    {
      "type": "CustomEvaluator",
      "name": "ResponseQuality",
      "level": "SESSION",
      "config": {
        "llmAsAJudge": {
          "model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
          "instructions": "Evaluate the response quality. Context: {context}",
          "ratingScale": {
            "numerical": [
              { "value": 1, "label": "Poor", "definition": "Fails to meet expectations" },
              { "value": 5, "label": "Excellent", "definition": "Far exceeds expectations" }
            ]
          }
        }
      }
    }
  ],
  "onlineEvalConfigs": []
}
```

### Project Fields

| Field               | Required | Description                                                 |
| ------------------- | -------- | ----------------------------------------------------------- |
| `name`              | Yes      | Project name (1-23 chars, alphanumeric, starts with letter) |
| `version`           | Yes      | Schema version (integer, currently `1`)                     |
| `agents`            | Yes      | Array of agent specifications                               |
| `memories`          | Yes      | Array of memory resources                                   |
| `credentials`       | Yes      | Array of credential providers (API key or OAuth)            |
| `evaluators`        | Yes      | Array of custom evaluator definitions                       |
| `onlineEvalConfigs` | Yes      | Array of online eval configurations                         |

> Gateway configuration is stored separately in `mcp.json`. See [mcp.json](#mcpjson) below.
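
The naming rules in this file (project names: 1-23 chars, alphanumeric, starting with a letter; evaluator and online eval config names: 1-48 chars, alphanumeric plus `_`) can be expressed as regexes. A sketch of that reading — the patterns are our interpretation of the prose, not an official schema:

```python
import re

# Our reading of the documented naming rules, expressed as regexes.
PROJECT_NAME = re.compile(r"^[A-Za-z][A-Za-z0-9]{0,22}$")  # 1-23 chars, starts with a letter
RESOURCE_NAME = re.compile(r"^[A-Za-z0-9_]{1,48}$")        # 1-48 chars, alphanumeric + _

assert PROJECT_NAME.match("myProject1")
assert not PROJECT_NAME.match("1badname")   # must start with a letter
assert RESOURCE_NAME.match("ResponseQuality")
assert not RESOURCE_NAME.match("x" * 49)    # too long
```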

Expand Down Expand Up @@ -191,6 +207,88 @@ AgentCore Identity service for deployed environments.

---

## Evaluator Resource

See [Evaluations](evals.md) for the full guide.

```json
{
"type": "CustomEvaluator",
"name": "ResponseQuality",
"level": "SESSION",
"description": "Evaluate response quality",
"config": {
"llmAsAJudge": {
"model": "us.anthropic.claude-sonnet-4-5-20250929-v1:0",
"instructions": "Evaluate the response quality. Context: {context}",
"ratingScale": {
"numerical": [
{ "value": 1, "label": "Poor", "definition": "Fails to meet expectations" },
{ "value": 5, "label": "Excellent", "definition": "Far exceeds expectations" }
]
}
}
}
}
```

| Field | Required | Description |
| ------------- | -------- | ----------------------------------------------- |
| `type` | Yes | Always `"CustomEvaluator"` |
| `name` | Yes | Evaluator name (1-48 chars, alphanumeric + `_`) |
| `level` | Yes | `"SESSION"`, `"TRACE"`, or `"TOOL_CALL"` |
| `description` | No | Evaluator description |
| `config` | Yes | LLM-as-a-Judge configuration (see below) |

### LLM-as-a-Judge Config

| Field | Required | Description |
| -------------- | -------- | ------------------------------------------------------ |
| `model` | Yes | Bedrock model ID or cross-region inference profile |
| `instructions` | Yes | Evaluation prompt with placeholders (e.g. `{context}`) |
| `ratingScale` | Yes | Either `numerical` or `categorical` array (not both) |

### Rating Scale

**Numerical** — scored values:

```json
{ "numerical": [{ "value": 1, "label": "Poor", "definition": "..." }, ...] }
```

**Categorical** — named labels:

```json
{ "categorical": [{ "label": "Pass", "definition": "..." }, ...] }
```
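
Filled out, a categorical scale mirroring the `good-neutral-bad` preset might look like this (the labels come from the preset name; the definitions below are illustrative, not the preset's actual wording):

```json
{
  "categorical": [
    { "label": "Good", "definition": "Response is accurate, relevant, and complete" },
    { "label": "Neutral", "definition": "Response is partially helpful or incomplete" },
    { "label": "Bad", "definition": "Response is inaccurate, irrelevant, or unhelpful" }
  ]
}
```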

---

## Online Eval Config Resource

```json
{
"type": "OnlineEvaluationConfig",
"name": "QualityMonitor",
"agent": "MyAgent",
"evaluators": ["ResponseQuality", "Builtin.Faithfulness"],
"samplingRate": 10,
"enableOnCreate": true
}
```

| Field | Required | Description |
| ---------------- | -------- | ------------------------------------------------------------ |
| `type` | Yes | Always `"OnlineEvaluationConfig"` |
| `name` | Yes | Config name (1-48 chars, alphanumeric + `_`) |
| `agent` | Yes | Agent name to monitor (must match a project agent) |
| `evaluators` | Yes | Array of evaluator names, `Builtin.*` IDs, or evaluator ARNs |
| `samplingRate` | Yes | Percentage of requests to evaluate (0.01–100) |
| `description` | No | Config description (max 200 chars) |
| `enableOnCreate` | No | Enable evaluation on deploy (default: true) |

---

## mcp.json

Gateway and MCP tool configuration. Gateways, their targets, and standalone MCP runtime tools are defined here.