
[BUG] Default eval model does not exist but the eval suite still runs in full. #222

@bkutasi

Description

`grok-code` is no longer reachable on OpenCode Zen, but the eval runner still uses it as its default model.

Steps to Reproduce

git clone https://github.com/darrenhinde/OpenAgentsControl
cd evals/frameworks
npm run eval:sdk -- --agent=openagent --pattern="**/golden/*.yaml"

Expected Behavior

The eval suite should fail fast and abort when the configured default model cannot be resolved.

Actual Behavior

The eval suite keeps running all tests. Each prompt hits a `ProviderModelNotFoundError` on the server, yet is still reported as `Completed via sdk: success`.

Environment

  • Ubuntu 24
  • OpenCode CLI ver: 1.2.5
  • Installation profile: developer
  • Bash version: 5.2.21

Possible Solution: Change the default model to `big-pickle`, and add error handling so the runner aborts when the configured model cannot be resolved.
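A minimal sketch of what that error handling could look like (the `Provider` shape mirrors the stack trace below; `assertModelExists` and the stubbed provider are hypothetical names, not the project's actual API):

```typescript
// Hypothetical fail-fast check, run once at startup before any tests execute.
// `Provider` / `assertModelExists` are illustrative, not the real eval-framework API.

interface Provider {
  models: Record<string, unknown>;
}

class ModelNotFoundError extends Error {
  constructor(public modelID: string, public available: string[]) {
    super(`Model "${modelID}" not found; available: ${available.join(", ")}`);
    this.name = "ModelNotFoundError";
  }
}

function assertModelExists(provider: Provider, modelID: string): void {
  if (!(modelID in provider.models)) {
    throw new ModelNotFoundError(modelID, Object.keys(provider.models));
  }
}

// Stubbed usage: validate the default model before running the suite,
// and abort instead of letting all 10 tests run against a missing model.
const provider: Provider = { models: { "big-pickle": {} } };
assertModelExists(provider, "big-pickle"); // passes
let aborted = false;
try {
  assertModelExists(provider, "grok-code");
} catch (e) {
  aborted = e instanceof ModelNotFoundError; // runner should exit(1) here
}
console.log(aborted);
```

With a check like this, the missing-model error surfaces once at startup rather than being buried in per-prompt server stderr.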

npm run eval:sdk -- --agent=openagent --pattern="**/golden/*.yaml" --debug

> @opencode-agents/eval-framework@0.1.1 eval:sdk
> tsx src/sdk/run-sdk-tests.ts --agent=openagent --pattern=**/golden/*.yaml --debug

🚀 OpenCode SDK Test Runner

Testing agent: openagent

Found 10 test file(s):

  1. shared/tests/golden/08-error-handling.yaml
  2. shared/tests/golden/07-tool-selection.yaml
  3. shared/tests/golden/06-delegation-decision.yaml
  4. shared/tests/golden/05-multi-turn-context.yaml
  5. shared/tests/golden/04-write-with-approval.yaml
  6. shared/tests/golden/03-read-before-write.yaml
  7. shared/tests/golden/02-delegation-test.yaml
  8. shared/tests/golden/02-context-loading.yaml
  9. shared/tests/golden/02-context-loading-explicit.yaml
  10. shared/tests/golden/00-smoke-test.yaml

Loading test cases...
✅ Loaded 10 test case(s)

[TestRunner] DEBUG_VERBOSE enabled - full conversation logging active
[TestRunner] Git root: /media/nvme/projects/OpenAgentsControl
[TestRunner] Server will use eval-runner.md (dynamically configured)
Using default model: opencode/grok-code-fast (free tier)

Starting test runner...
[TestRunner] Configured eval-runner.md with core/openagent.md
Starting opencode server...
[Server STDOUT]: Warning: OPENCODE_SERVER_PASSWORD is not set; server is unsecured.
[Server STDOUT]: opencode server listening on http://127.0.0.1:4096
Server started at http://127.0.0.1:4096
[TestRunner] Multi-agent logging enabled (verbose mode)
[TestRunner] Multi-agent logging enabled (verbose mode)
[TestRunner] Evaluators initialized with SDK client
✅ Test runner started

Running tests...


============================================================
Running test: golden-08-error-handling - Golden 08: Error Handling - Agent Handles Missing Files Gracefully
============================================================
┌──────────────────────────────────────────────────────────┐
│ 🤖 Agent: OpenAgent                                      │
│ 🧠 Model: opencode/grok-code                             │
└──────────────────────────────────────────────────────────┘
Approval strategy: Auto-approve all permission requests
[Handlers] Before setup: 0
[Handlers] After setup: 9
⚠️   Event stream connection not confirmed within timeout

══════════════════════════════════════════════════════════════════════
📋 EVAL-RUNNER.MD CONTENT (System Prompt Being Tested)
══════════════════════════════════════════════════════════════════════

🔧 FRONTMATTER:
──────────────────────────────────────────────────────────────────────
---
name: OpenAgent
description: "Universal agent for answering queries, executing tasks, and coordinating workflows across any domain"
mode: primary
temperature: 0.2
permission:
  bash:
    # 1. Base Rule: Allow standard workflow to prevent nagging
    "*": "allow"

    # 2. Moderate Risk: Ask for confirmation (prevents accidents)
    "rm *": "ask"
    "mv *": "ask"
    "cp -f *": "ask"       # Overwrite copy
    "> *": "ask"           # Overwrite redirection
    ">> *": "ask"          # Append redirection
    "chmod *": "ask"       # Changing file permissions
    "chown *": "ask"       # Changing file ownership
    "git push *": "ask"    # Pushing to remote
    "ssh *": "ask"         # Remote connections

    # 3. High Danger: Deny outright
    "sudo *": "deny"
    "su *": "deny"
    "rm -rf *": "deny"     # Recursive force delete is too dangerous for agents
    "dd *": "deny"         # Disk writing
    "mkfs *": "deny"       # Formatting
    "fdisk *": "deny"
    "parted *": "deny"
    "shutdown *": "deny"
    "reboot *": "deny"
    "halt *": "deny"
    ":(){:|:&};:": "deny"  # Fork bomb
    "cat /etc/shadow": "deny"

  edit:
    "**/*.env*": "deny"
    "**/*.key": "deny"
    "**/*.pem": "deny"     # AWS/SSH keys
    "**/*.secret": "deny"
    "**/id_rsa*": "deny"   # Private SSH keys
    "**/id_dsa*": "deny"
    "**/.bash_history": "deny"
    "**/.zsh_history": "deny"
---

📝 SYSTEM PROMPT (first 50 lines):
──────────────────────────────────────────────────────────────────────
Always use ContextScout for discovery of new tasks or context files.
ContextScout is exempt from the approval gate rule. ContextScout is your secret weapon for quality, use it where possible.
<context>
  <system_context>Universal AI agent for code, docs, tests, and workflow coordination called OpenAgent</system_context>
  <domain_context>Any codebase, any language, any project structure</domain_context>
  <task_context>Execute tasks directly or delegate to specialized subagents</task_context>
  <execution_context>Context-aware execution with project standards enforcement</execution_context>
</context>

<critical_context_requirement>
PURPOSE: Context files contain project-specific standards that ensure consistency,
quality, and alignment with established patterns. Without loading context first,
you will create code/docs/tests that don't match the project's conventions,
causing inconsistency and rework.

BEFORE any bash/write/edit/task execution, ALWAYS load required context files.
(Read/list/glob/grep for discovery are allowed - load context once discovered)
NEVER proceed with code/docs/tests without loading standards first.
AUTO-STOP if you find yourself executing without context loaded.

WHY THIS MATTERS:
- Code without standards/code-quality.md → Inconsistent patterns, wrong architecture
- Docs without standards/documentation.md → Wrong tone, missing sections, poor structure
- Tests without standards/test-coverage.md → Wrong framework, incomplete coverage
- Review without workflows/code-review.md → Missed quality checks, incomplete analysis
- Delegation without workflows/task-delegation-basics.md → Wrong context passed to subagents

Required context files:
- Code tasks → .opencode/context/core/standards/code-quality.md
- Docs tasks → .opencode/context/core/standards/documentation.md
- Tests tasks → .opencode/context/core/standards/test-coverage.md
- Review tasks → .opencode/context/core/workflows/code-review.md
- Delegation → .opencode/context/core/workflows/task-delegation-basics.md

CONSEQUENCE OF SKIPPING: Work that doesn't match project standards = wasted effort + rework
</critical_context_requirement>

<critical_rules priority="absolute" enforcement="strict">
  <rule id="approval_gate" scope="all_execution">
    Request approval before ANY execution (bash, write, edit, task). Read/list ops don't require approval.
  </rule>

  <rule id="stop_on_failure" scope="validation">
    STOP on test fail/errors - NEVER auto-fix
  </rule>
  <rule id="report_first" scope="error_handling">
    On fail: REPORT→PROPOSE FIX→REQUEST APPROVAL→FIX (never auto-fix)
  </rule>
  <rule id="confirm_cleanup" scope="session_management">
    Confirm before deleting session files/cleanup ops

... (609 more lines)
══════════════════════════════════════════════════════════════════════

Creating session...

┌────────────────────────────────────────────────────────────┐
│ 🎯 PARENT: OpenAgent (ses_399c15b0...)                     │
└────────────────────────────────────────────────────────────┘
📋 Session created
Session created: ses_399c15b0effe5xvbrmBumPfziD
Sending 1 prompts (multi-turn)...
Using smart timeout: 60000ms per prompt, max 120000ms absolute

Prompt 1/1:
  Text: Read the file evals/test_tmp/this-file-does-not-exist-12345.txt and show me its contents.

  🤖 Agent: Read the file evals/test_tmp/this-file-does-not-exist-12345.txt and show me its ...
[Server STDERR]: 1102 |     const info = provider.models[modelID]
1103 |     if (!info) {
1104 |       const availableModels = Object.keys(provider.models)
1105 |       const matches = fuzzysort.go(modelID, availableModels, { limit: 3, threshold: -10000 })
1106 |       const suggestions = matches.map((m) => m.target)
1107 |       throw new ModelNotFoundError({ providerID, modelID, suggestions })
                   ^
ProviderModelNotFoundError: ProviderModelNotFoundError
 data: {
  providerID: "opencode",
  modelID: "grok-code",
  suggestions: [],
},

      at getModel (src/provider/provider.ts:1107:13)
✅ Completed via sdk: success
  Completed

All prompts completed
...
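One plausible cause of the `✅ Completed via sdk: success` line despite the server-side `ProviderModelNotFoundError` (an assumption, not confirmed from the code): the SDK call's rejection is caught, logged, and the prompt is still marked successful. A minimal illustration of that anti-pattern:

```typescript
// Illustration only (assumed, not the project's actual code): if the SDK
// rejection is caught and merely logged, the runner reports success anyway.

async function sendPrompt(): Promise<void> {
  // Stands in for the SDK call that fails server-side.
  throw new Error("ProviderModelNotFoundError");
}

async function runPrompt(): Promise<string> {
  try {
    await sendPrompt();
    return "success";
  } catch (e) {
    console.error("[Server STDERR]:", (e as Error).message);
    return "success"; // bug: failure is swallowed, status stays "success"
  }
}

runPrompt().then((status) => console.log(`Completed via sdk: ${status}`));
```

If this is what happens, rethrowing (or returning a failure status) from the catch block would let the runner stop instead of marching through all ten tests.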

Metadata

Assignees: no one assigned
Labels: bug (Something isn't working)
Projects: none
Milestone: none
Relationships: none yet
Development: no branches or pull requests