This file provides instructions for AI agents working on this codebase. It also serves as a comprehensive guide for building autonomous coding agents.
```bash
# Install dependencies
pip install -e .

# Run with Term SDK
python agent.py

# Local testing with Term Challenge
git clone https://github.com/PlatformNetwork/term-challenge.git
pip install -e term-challenge/sdk/python/baseagent/
```
```
├── agent.py                 # Entry point for Term SDK
├── src/
│   ├── core/
│   │   ├── loop.py          # Main agent loop (caching, verification)
│   │   └── compaction.py    # Context management
│   ├── tools/               # Tool implementations
│   ├── prompts/
│   │   └── system.py        # Codex-inspired system prompt
│   └── api/
│       └── client.py        # LLM API client
├── rules/                   # Agent development guidelines (READ THESE)
└── astuces/                 # Practical techniques used here
```
Your agent must solve tasks through reasoning, not pattern matching.
```python
# NEVER do this
if "file" in ctx.instruction:
    create_file()
elif "compile" in ctx.instruction:
    compile_code()
```

```python
# Always let the LLM decide
response = llm.chat([
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": f"Task: {ctx.instruction}"}
])
execute(response.tool_calls)
```

| Characteristic | Description |
|---|---|
| Single code path | Same logic for ALL tasks |
| LLM-driven decisions | LLM chooses actions, not if-statements |
| No task keywords | Zero references to specific task content |
| Iterative execution | Observe → Think → Act loop |
Ask yourself: "Would this code behave differently if I changed the task instruction?"
If YES and it's not because of LLM reasoning → it's hardcoding → FORBIDDEN.
```python
def run_agent_loop(ctx: AgentContext) -> str:
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": ctx.instruction}
    ]
    verified = False
    while True:
        # 1. Apply caching for cost efficiency
        messages = apply_caching(messages)
        # 2. Manage context to prevent overflow
        messages = manage_context(messages, max_tokens=180000)
        # 3. Call LLM
        response = ctx.llm.chat(messages, tools=TOOLS)
        # 4. Check for completion
        if not response.has_tool_calls():
            # Inject verification before completing
            if not verified:
                messages.append(verification_prompt(ctx.instruction))
                verified = True
                continue
            return response.text
        # 5. Execute tools
        for call in response.tool_calls:
            result = execute_tool(call)
            messages.append(tool_result(call.id, result))
```

Always gather context before acting:
```python
context = shell("pwd && ls -la")
readme = shell("cat README.md 2>/dev/null")
```

Never try to do everything in one shot:
```python
while not done:
    response = llm.chat(messages)
    result = execute(response)
    messages.append(result)
```

Always verify before completing:
```python
if response.says_complete:
    if not already_verified:
        inject_verification_prompt()
        continue
    return complete()
```

Cache the system prompt plus the last two messages for massive cache hits.
```python
def apply_caching(messages):
    # Cache system messages (stable)
    for msg in messages:
        if msg["role"] == "system":
            add_cache_control(msg)
    # Cache last 2 non-system messages (extends prefix)
    non_system = [m for m in messages if m["role"] != "system"]
    for msg in non_system[-2:]:
        add_cache_control(msg)
    return messages
```

Why it works: Anthropic caches prefixes. Caching the last messages extends the cached prefix to include the entire conversation history.
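The `add_cache_control` helper is not shown here; a minimal sketch, assuming the Anthropic-style `cache_control` content-block format (the exact schema depends on your API client):

```python
def add_cache_control(msg):
    # Anthropic expects cache_control on a content block, so normalize
    # plain-string content into a block list first.
    if isinstance(msg["content"], str):
        msg["content"] = [{"type": "text", "text": msg["content"]}]
    # Mark the last block as a cache breakpoint.
    msg["content"][-1]["cache_control"] = {"type": "ephemeral"}
    return msg
```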
Before completing, force the agent to verify its work. The original snippet built the prompt as a module-level f-string referencing `ctx`, which fails at import time; a function matches how the loop calls it:

```python
def verification_prompt(instruction):
    return {"role": "user", "content": f"""
STOP - Before completing, verify your work:

Original instruction: {instruction}

Checklist:
1. Re-read the instruction above
2. List ALL requirements (explicit and implicit)
3. Run commands to verify each requirement
4. Only complete after ALL verifications pass

You are in headless mode - do NOT ask questions.
"""}
```

Prevent token overflow with pruning and compaction:
```python
def manage_context(messages, max_tokens):
    current = estimate_tokens(messages)
    # Stage 1: Prune old tool outputs
    if current > max_tokens * 0.70:
        messages = prune_tool_outputs(messages, keep_last=5)
        current = estimate_tokens(messages)  # re-measure after pruning
    # Stage 2: AI compaction
    if current > max_tokens * 0.85:
        messages = compact_with_llm(messages)
    return messages
```

For large tool outputs, keep the start AND the end:
```python
def truncate(text, max_bytes=50000):
    if len(text) <= max_bytes:
        return text
    keep = max_bytes // 2 - 50
    return f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}"
```

Why: the start has headers, the end has results/errors, the middle is often repetitive.
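The `estimate_tokens` and `prune_tool_outputs` helpers used by `manage_context` are left undefined; one possible sketch, assuming a rough characters-per-token heuristic and a `"tool"` role for tool results (adapt both to your API schema):

```python
def estimate_tokens(messages):
    # Rough heuristic: ~4 characters per token for English text.
    return sum(len(str(m.get("content", ""))) for m in messages) // 4

def prune_tool_outputs(messages, keep_last=5):
    # Replace all but the last `keep_last` tool results with a stub.
    # Assumes keep_last >= 1 and tool results use role == "tool".
    tool_idxs = [i for i, m in enumerate(messages) if m.get("role") == "tool"]
    for i in tool_idxs[:-keep_last]:
        messages[i] = {**messages[i], "content": "[output pruned]"}
    return messages
```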
The agent must NEVER ask questions in headless mode:

```python
# In system prompt:
"""
You are fully autonomous:
- Do NOT ask questions - make reasonable decisions
- Do NOT wait for confirmation - just execute
- If something fails, try alternative approaches
- Only complete after verifying your work
"""
```

Based on Codex CLI, include these sections:
You are a coding agent running in [AgentName], an autonomous terminal-based assistant.
Repos may contain AGENTS.md files with instructions. Obey them.
Before tool calls, send a brief preamble (8-12 words):
"Exploring the repo structure, then checking the API routes."
- NEVER revert changes you didn't make
- NEVER use git reset --hard or git checkout --
- Do not commit unless explicitly asked
Keep going until the task is COMPLETELY resolved.
- Make decisions autonomously
- Fix problems at root cause
- Validate your work before completing
- Be concise (10 lines max for simple tasks)
- Use backticks for code/paths
- Reference files as: src/file.py:42
| Tool | Purpose |
|---|---|
| `shell_command` | Execute shell commands |
| `read_file` | Read files with line numbers |
| `write_file` | Create/overwrite files |
| `apply_patch` | Modify files surgically |
| `grep_files` | Search file contents (ripgrep) |
| `list_dir` | List directory contents |
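The `TOOLS` value passed to `ctx.llm.chat` is never shown; one plausible declaration, assuming an OpenAI-style function-calling schema (tool names come from the table above, parameter shapes are illustrative):

```python
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "shell_command",
            "description": "Execute shell commands",
            "parameters": {
                "type": "object",
                "properties": {
                    "command": {"type": "string", "description": "Command to run"}
                },
                "required": ["command"],
            },
        },
    },
    # ...remaining tools (read_file, write_file, etc.) follow the same shape
]
```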
```python
MAX_OUTPUT_BYTES = 50000  # 50KB per tool
MAX_OUTPUT_LINES = 500
```

Always truncate before adding to context.
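Enforcing both limits together is not shown; a minimal sketch (the helper name `clip_output` and the truncation markers are illustrative):

```python
MAX_OUTPUT_BYTES = 50_000
MAX_OUTPUT_LINES = 500

def clip_output(text: str) -> str:
    # Enforce the line cap first (keep head + tail), then the byte cap.
    lines = text.splitlines()
    if len(lines) > MAX_OUTPUT_LINES:
        half = MAX_OUTPUT_LINES // 2
        lines = lines[:half] + ["[...lines truncated...]"] + lines[-half:]
        text = "\n".join(lines)
    if len(text) > MAX_OUTPUT_BYTES:
        keep = MAX_OUTPUT_BYTES // 2 - 50
        text = f"{text[:keep]}\n\n[...truncated...]\n\n{text[-keep:]}"
    return text
```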
| Pattern | Why Forbidden |
|---|---|
| `if "keyword" in instruction` | Task-specific routing |
| `handlers[task_type]()` | Pre-defined handlers |
| `SOLUTIONS[task_hash]` | Cached solutions |
| `re.match(task_pattern)` | Regex task matching |
| Reading test files | Cheating |
- Not exploring first - Always gather context
- One-shot execution - Use iterative loop
- No verification - Always verify before completing
- Unbounded context - Truncate and prune
- Asking questions - Make decisions autonomously
```python
from unittest.mock import MagicMock

from src.core.loop import run_agent_loop

ctx = MagicMock()
ctx.instruction = "Create hello.txt with 'Hello World'"
ctx.cwd = "/tmp/test"
ctx.llm = YourLLM()
result = run_agent_loop(ctx)
```

```bash
git clone https://github.com/PlatformNetwork/term-challenge.git
cd term-challenge
pip install -e sdk/python/

# Run benchmark
python -m term_bench run --agent /path/to/baseagent --task tasks/test.yaml
```

Before submitting your agent:
- No keyword matching on instructions
- No task-specific handlers
- No pre-computed solutions
- Prompt caching enabled (system + last 2 messages)
- Self-verification before completion
- Context management (prune + compact)
- Tool output truncation
- Autonomous mode (no questions)
- Git hygiene rules in system prompt
- Explore-first pattern implemented
See the `rules/` folder for comprehensive guidelines:
- What is a generalist agent
- Architecture patterns
- Allowed vs forbidden behaviors
- Anti-patterns to avoid
- Best practices
See the `astuces/` folder for implementation details:
- Prompt caching technique
- Self-verification system
- Context management
- System prompt design
- Tool output handling
- Cost optimization
Building a high-performance autonomous agent requires:
- No hardcoding - LLM decides everything
- Prompt caching - 90% cost reduction
- Self-verification - Validate before completing
- Context management - Prevent overflow
- Autonomous execution - No questions, just execute
The goal is an agent that thinks, not one that pattern matches.