Open
Labels: bug
Description
The grok-code model is no longer reachable on OpenCode Zen, but the eval suite still uses it as its default model.
Steps to Reproduce
git clone https://github.com/darrenhinde/OpenAgentsControl
cd OpenAgentsControl/evals/frameworks
npm run eval:sdk -- --agent=openagent --pattern="**/golden/*.yaml"
Expected Behavior
The eval suite stops with an error when the configured model cannot be found.
Actual Behavior
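The eval suite keeps running and reports the prompt as completed ("✅ Completed via sdk: success") even though the server throws ProviderModelNotFoundError (see the debug log below).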
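A minimal sketch of the expected behavior, written in TypeScript because the runner is TypeScript. `TestResult`, `runSuite`, `FATAL_ERRORS` and the `serverErrors` field are hypothetical names, not the framework's actual types; the sketch only assumes the runner can collect the server errors seen during each test. A test that saw a server error is marked as failed, and a fatal error such as ProviderModelNotFoundError stops the remaining tests.

```typescript
// Hypothetical sketch: TestResult and runSuite are illustrative names,
// not the eval framework's actual API.

interface TestResult {
  id: string;
  passed: boolean;
  /** Error text the server reported while this test ran (e.g. from stderr). */
  serverErrors: string[];
}

/** Errors after which every remaining test is pointless (assumed list). */
const FATAL_ERRORS = ["ProviderModelNotFoundError"];

function isFatal(result: TestResult): boolean {
  return result.serverErrors.some((err) => FATAL_ERRORS.some((name) => err.includes(name)));
}

/** Run tests in order and stop at the first fatal server error. */
async function runSuite(
  ids: string[],
  runTest: (id: string) => Promise<TestResult>,
): Promise<{ results: TestResult[]; aborted: boolean }> {
  const results: TestResult[] = [];
  for (const id of ids) {
    const result = await runTest(id);
    // A test that hit a server error must not be reported as a success.
    if (result.serverErrors.length > 0) {
      result.passed = false;
    }
    results.push(result);
    if (isFatal(result)) {
      console.error(`Aborting suite: fatal server error in ${id}`);
      return { results, aborted: true };
    }
  }
  return { results, aborted: false };
}
```

Stopping only on fatal errors (rather than on every failure) keeps ordinary test failures reportable while avoiding further runs against a model that does not exist.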
Environment
- Ubuntu 24
- OpenCode CLI ver: 1.2.5
- Installation profile: developer
- Bash version: 5.2.21
Possible Solution: Change the default model to big-pickle, and add error handling so that a missing or unavailable model stops the run instead of being reported as a success.
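One way to handle the model part: validate the configured model before starting any tests and fail with a clear message. This is a minimal sketch; `parseModelRef`, `assertModelAvailable` and the catalog shape (provider ID mapped to a list of model IDs) are hypothetical, and how the runner would obtain that list from the server is left open. The catalog contents in the usage are illustrative only, not a real listing of what OpenCode Zen offers.

```typescript
// Hypothetical sketch: names and the catalog shape are assumptions,
// not the eval framework's or OpenCode's actual API.

/** Default model proposed in this issue. */
const DEFAULT_MODEL = "opencode/big-pickle";

interface ModelRef {
  providerID: string;
  modelID: string;
}

/** Split "provider/model" at the first slash (assumed reference format). */
function parseModelRef(ref: string): ModelRef {
  const slash = ref.indexOf("/");
  if (slash <= 0 || slash === ref.length - 1) {
    throw new Error(`Invalid model reference "${ref}"; expected "provider/model"`);
  }
  return { providerID: ref.slice(0, slash), modelID: ref.slice(slash + 1) };
}

/** Throw before any test runs if the provider does not list the model. */
function assertModelAvailable(ref: string, catalog: Record<string, string[]>): ModelRef {
  const parsed = parseModelRef(ref);
  const models = catalog[parsed.providerID];
  if (models === undefined) {
    throw new Error(`Unknown provider "${parsed.providerID}"`);
  }
  if (!models.includes(parsed.modelID)) {
    const available = models.length > 0 ? models.join(", ") : "(none)";
    throw new Error(
      `Model "${parsed.modelID}" is not available on provider "${parsed.providerID}". Available: ${available}`,
    );
  }
  return parsed;
}

// Usage with illustrative catalog data (not a real listing).
const catalog: Record<string, string[]> = { opencode: ["big-pickle"] };
const model = assertModelAvailable(DEFAULT_MODEL, catalog);
console.log(`Using model: ${model.providerID}/${model.modelID}`);

try {
  assertModelAvailable("opencode/grok-code", catalog);
} catch (err) {
  console.error((err as Error).message);
}
```

Failing before the server starts turns a dead default model into one clear error instead of tests that are reported as completed.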
npm run eval:sdk -- --agent=openagent --pattern="**/golden/*.yaml" --debug
> @opencode-agents/eval-framework@0.1.1 eval:sdk
> tsx src/sdk/run-sdk-tests.ts --agent=openagent --pattern=**/golden/*.yaml --debug
🚀 OpenCode SDK Test Runner
Testing agent: openagent
Found 10 test file(s):
1. shared/tests/golden/08-error-handling.yaml
2. shared/tests/golden/07-tool-selection.yaml
3. shared/tests/golden/06-delegation-decision.yaml
4. shared/tests/golden/05-multi-turn-context.yaml
5. shared/tests/golden/04-write-with-approval.yaml
6. shared/tests/golden/03-read-before-write.yaml
7. shared/tests/golden/02-delegation-test.yaml
8. shared/tests/golden/02-context-loading.yaml
9. shared/tests/golden/02-context-loading-explicit.yaml
10. shared/tests/golden/00-smoke-test.yaml
Loading test cases...
✅ Loaded 10 test case(s)
[TestRunner] DEBUG_VERBOSE enabled - full conversation logging active
[TestRunner] Git root: /media/nvme/projects/OpenAgentsControl
[TestRunner] Server will use eval-runner.md (dynamically configured)
Using default model: opencode/grok-code-fast (free tier)
Starting test runner...
[TestRunner] Configured eval-runner.md with core/openagent.md
Starting opencode server...
[Server STDOUT]: Warning: OPENCODE_SERVER_PASSWORD is not set; server is unsecured.
[Server STDOUT]: opencode server listening on http://127.0.0.1:4096
Server started at http://127.0.0.1:4096
[TestRunner] Multi-agent logging enabled (verbose mode)
[TestRunner] Multi-agent logging enabled (verbose mode)
[TestRunner] Evaluators initialized with SDK client
✅ Test runner started
Running tests...
============================================================
Running test: golden-08-error-handling - Golden 08: Error Handling - Agent Handles Missing Files Gracefully
============================================================
┌──────────────────────────────────────────────────────────┐
│ 🤖 Agent: OpenAgent │
│ 🧠 Model: opencode/grok-code │
└──────────────────────────────────────────────────────────┘
Approval strategy: Auto-approve all permission requests
[Handlers] Before setup: 0
[Handlers] After setup: 9
⚠️ Event stream connection not confirmed within timeout
══════════════════════════════════════════════════════════════════════
📋 EVAL-RUNNER.MD CONTENT (System Prompt Being Tested)
══════════════════════════════════════════════════════════════════════
🔧 FRONTMATTER:
──────────────────────────────────────────────────────────────────────
---
name: OpenAgent
description: "Universal agent for answering queries, executing tasks, and coordinating workflows across any domain"
mode: primary
temperature: 0.2
permission:
  bash:
    # 1. Base Rule: Allow standard workflow to prevent nagging
    "*": "allow"
    # 2. Moderate Risk: Ask for confirmation (prevents accidents)
    "rm *": "ask"
    "mv *": "ask"
    "cp -f *": "ask" # Overwrite copy
    "> *": "ask" # Overwrite redirection
    ">> *": "ask" # Append redirection
    "chmod *": "ask" # Changing file permissions
    "chown *": "ask" # Changing file ownership
    "git push *": "ask" # Pushing to remote
    "ssh *": "ask" # Remote connections
    # 3. High Danger: Deny outright
    "sudo *": "deny"
    "su *": "deny"
    "rm -rf *": "deny" # Recursive force delete is too dangerous for agents
    "dd *": "deny" # Disk writing
    "mkfs *": "deny" # Formatting
    "fdisk *": "deny"
    "parted *": "deny"
    "shutdown *": "deny"
    "reboot *": "deny"
    "halt *": "deny"
    ":(){:|:&};:": "deny" # Fork bomb
    "cat /etc/shadow": "deny"
  edit:
    "**/*.env*": "deny"
    "**/*.key": "deny"
    "**/*.pem": "deny" # AWS/SSH keys
    "**/*.secret": "deny"
    "**/id_rsa*": "deny" # Private SSH keys
    "**/id_dsa*": "deny"
    "**/.bash_history": "deny"
    "**/.zsh_history": "deny"
---
📝 SYSTEM PROMPT (first 50 lines):
──────────────────────────────────────────────────────────────────────
Always use ContextScout for discovery of new tasks or context files.
ContextScout is exempt from the approval gate rule. ContextScout is your secret weapon for quality, use it where possible.
<context>
<system_context>Universal AI agent for code, docs, tests, and workflow coordination called OpenAgent</system_context>
<domain_context>Any codebase, any language, any project structure</domain_context>
<task_context>Execute tasks directly or delegate to specialized subagents</task_context>
<execution_context>Context-aware execution with project standards enforcement</execution_context>
</context>
<critical_context_requirement>
PURPOSE: Context files contain project-specific standards that ensure consistency,
quality, and alignment with established patterns. Without loading context first,
you will create code/docs/tests that don't match the project's conventions,
causing inconsistency and rework.
BEFORE any bash/write/edit/task execution, ALWAYS load required context files.
(Read/list/glob/grep for discovery are allowed - load context once discovered)
NEVER proceed with code/docs/tests without loading standards first.
AUTO-STOP if you find yourself executing without context loaded.
WHY THIS MATTERS:
- Code without standards/code-quality.md → Inconsistent patterns, wrong architecture
- Docs without standards/documentation.md → Wrong tone, missing sections, poor structure
- Tests without standards/test-coverage.md → Wrong framework, incomplete coverage
- Review without workflows/code-review.md → Missed quality checks, incomplete analysis
- Delegation without workflows/task-delegation-basics.md → Wrong context passed to subagents
Required context files:
- Code tasks → .opencode/context/core/standards/code-quality.md
- Docs tasks → .opencode/context/core/standards/documentation.md
- Tests tasks → .opencode/context/core/standards/test-coverage.md
- Review tasks → .opencode/context/core/workflows/code-review.md
- Delegation → .opencode/context/core/workflows/task-delegation-basics.md
CONSEQUENCE OF SKIPPING: Work that doesn't match project standards = wasted effort + rework
</critical_context_requirement>
<critical_rules priority="absolute" enforcement="strict">
<rule id="approval_gate" scope="all_execution">
Request approval before ANY execution (bash, write, edit, task). Read/list ops don't require approval.
</rule>
<rule id="stop_on_failure" scope="validation">
STOP on test fail/errors - NEVER auto-fix
</rule>
<rule id="report_first" scope="error_handling">
On fail: REPORT→PROPOSE FIX→REQUEST APPROVAL→FIX (never auto-fix)
</rule>
<rule id="confirm_cleanup" scope="session_management">
Confirm before deleting session files/cleanup ops
... (609 more lines)
══════════════════════════════════════════════════════════════════════
Creating session...
┌────────────────────────────────────────────────────────────┐
│ 🎯 PARENT: OpenAgent (ses_399c15b0...) │
└────────────────────────────────────────────────────────────┘
📋 Session created
Session created: ses_399c15b0effe5xvbrmBumPfziD
Sending 1 prompts (multi-turn)...
Using smart timeout: 60000ms per prompt, max 120000ms absolute
Prompt 1/1:
Text: Read the file evals/test_tmp/this-file-does-not-exist-12345.txt and show me its contents.
🤖 Agent: Read the file evals/test_tmp/this-file-does-not-exist-12345.txt and show me its ...
[Server STDERR]: 1102 | const info = provider.models[modelID]
1103 | if (!info) {
1104 | const availableModels = Object.keys(provider.models)
1105 | const matches = fuzzysort.go(modelID, availableModels, { limit: 3, threshold: -10000 })
1106 | const suggestions = matches.map((m) => m.target)
1107 | throw new ModelNotFoundError({ providerID, modelID, suggestions })
^
ProviderModelNotFoundError: ProviderModelNotFoundError
data: {
providerID: "opencode",
modelID: "grok-code",
suggestions: [],
},
at getModel (src/provider/provider.ts:1107:13)
✅ Completed via sdk: success
Completed
All prompts completed
...
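The log shows the server throwing ProviderModelNotFoundError for providerID "opencode" and modelID "grok-code" (with no suggestions), yet the runner still prints "✅ Completed via sdk: success". A rough sketch of how the runner could notice this from the server's stderr, which it already echoes as [Server STDERR] lines. `findModelNotFound` is a hypothetical helper, and matching on the printed error text is an assumption; a structured error from the SDK, if one is exposed, would be more robust.

```typescript
// Hypothetical helper: detects the ProviderModelNotFoundError dump shown in the
// log and extracts the provider/model IDs. Relies on the printed text format.

interface ModelNotFound {
  providerID: string;
  modelID: string;
}

function findModelNotFound(stderr: string): ModelNotFound | null {
  if (!stderr.includes("ProviderModelNotFoundError")) {
    return null;
  }
  // The error's data block prints fields as providerID: "..." and modelID: "...".
  const provider = /providerID:\s*"([^"]*)"/.exec(stderr);
  const model = /modelID:\s*"([^"]*)"/.exec(stderr);
  return {
    providerID: provider ? provider[1] : "unknown",
    modelID: model ? model[1] : "unknown",
  };
}

// Usage with the relevant part of the stderr output above.
const stderrSample = [
  "ProviderModelNotFoundError: ProviderModelNotFoundError",
  " data: {",
  '  providerID: "opencode",',
  '  modelID: "grok-code",',
  "  suggestions: [],",
  " },",
].join("\n");

const hit = findModelNotFound(stderrSample);
if (hit) {
  console.error(`Model ${hit.providerID}/${hit.modelID} not found; failing the run.`);
}
```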