A high-performance autonomous coding agent built for generalist problem-solving
BaseAgent is an autonomous coding agent designed for the Term Challenge. Unlike traditional scripted automation, BaseAgent uses Large Language Models (LLMs) to reason about tasks and make decisions dynamically.
The agent receives natural language instructions and autonomously:
- Explores the codebase
- Plans and executes solutions
- Validates its own work
- Handles errors and edge cases
BaseAgent follows the Golden Rule: all decisions are made by the LLM, not by conditional logic.
# ❌ FORBIDDEN - Hardcoded task routing
if "file" in instruction:
create_file()
elif "compile" in instruction:
compile_code()
# ✅ REQUIRED - LLM-driven decisions
response = llm.chat(messages, tools=tools)
execute(response.tool_calls)Every task, regardless of complexity or domain, flows through the same agent loop:
graph LR
A[Receive Instruction] --> B[Build Context]
B --> C[LLM Decides]
C --> D[Execute Tools]
D --> E{Complete?}
E -->|No| C
E -->|Yes| F[Verify & Return]
BaseAgent never tries to solve everything in one shot. Instead, it:
- Observes the current state
- Thinks about the next step
- Acts by calling tools
- Repeats until the task is complete
Before declaring a task complete, the agent automatically:
- Re-reads the original instruction
- Lists all requirements (explicit and implicit)
- Verifies each requirement with actual commands
- Only completes if all verifications pass
graph TB
subgraph Interface["User Interface"]
CLI["python agent.py --instruction '...'"]
end
subgraph Engine["Core Engine"]
direction TB
Loop["Agent Loop<br/>(src/core/loop.py)"]
Context["Context Manager<br/>(src/core/compaction.py)"]
Prompt["System Prompt<br/>(src/prompts/system.py)"]
end
subgraph LLM["LLM Layer"]
Client["LiteLLM Client<br/>(src/llm/client.py)"]
API["Provider API<br/>(Chutes/OpenRouter)"]
end
subgraph Tools["Tool System"]
Registry["Tool Registry"]
Exec["Execution Engine"]
end
CLI --> Loop
Loop --> Context
Loop --> Prompt
Loop --> Client
Client --> API
Loop --> Registry
Registry --> Exec
style Loop fill:#4CAF50,color:#fff
style Client fill:#2196F3,color:#fff
BaseAgent runs in fully autonomous mode:
- No user confirmations required
- Makes reasonable decisions when faced with ambiguity
- Handles errors by trying alternative approaches
- Never asks questions - just executes
Achieves 90%+ cache hit rate using Anthropic's prompt caching:
- System prompt cached for stability
- Last 2 messages cached to extend prefix
- Reduces API costs by 90%
Intelligent memory management for long tasks:
- Token-based overflow detection
- Tool output pruning (protects recent outputs)
- AI-powered compaction when needed
- Middle-out truncation for large outputs
Eight specialized tools for coding tasks:
| Tool | Purpose |
|---|---|
shell_command |
Execute shell commands |
read_file |
Read files with line numbers |
write_file |
Create or overwrite files |
apply_patch |
Surgical file modifications |
grep_files |
Fast file content search |
list_dir |
Directory exploration |
view_image |
Image analysis |
update_plan |
Progress tracking |
sequenceDiagram
participant User
participant CLI as agent.py
participant Loop as Agent Loop
participant LLM as LLM (Chutes/OpenRouter)
participant Tools as Tool Registry
User->>CLI: python agent.py --instruction "..."
CLI->>Loop: Initialize session
loop Until task complete
Loop->>Loop: Manage context (prune/compact)
Loop->>Loop: Apply prompt caching
Loop->>LLM: Send messages + tools
LLM-->>Loop: Response (text + tool_calls)
alt Has tool calls
Loop->>Tools: Execute tool calls
Tools-->>Loop: Tool results
else No tool calls
Loop->>Loop: Self-verification check
end
end
Loop-->>CLI: Task complete
CLI-->>User: JSONL output
| Characteristic | Description |
|---|---|
| Single code path | Same logic handles ALL tasks |
| LLM-driven decisions | LLM chooses actions, not if-statements |
| No task keywords | Zero references to specific task content |
| Iterative execution | Observe → Think → Act loop |
Ask yourself: "Would this code behave differently if I changed the task instruction?"
If YES and it's not because of LLM reasoning → it's hardcoding → FORBIDDEN
BaseAgent is built on these principles:
- Explore First - Always gather context before acting
- Iterate - Never try to do everything in one shot
- Verify - Double-confirm before completing
- Fail Gracefully - Handle errors and retry
- Stay Focused - Complete the task, nothing more
- Installation Guide - Set up BaseAgent
- Quick Start - Run your first task
- Architecture - Deep dive into the system design