A research framework for autonomous agents with incremental planning, persistent memory, and tool use.
Cognitive Workbench is experimental research software for studying LLM-based cognitive architectures. It prioritizes inspectable agent behavior and fast iteration over stability.
The core idea: an incremental planner that interleaves reasoning with tool execution. Rather than generating a complete plan and then executing it, the planner generates one step at a time, runs it, observes the result, and decides what to do next. This tight feedback loop — combined with persistent memory, reflective quality control, and autonomous goal scheduling — produces agents that can pursue complex goals over extended periods.
```
User: "goal: Find recent papers on multi-agent coordination"
           │
┌──────────▼──────────┐
│   Executive Node    │   OODA loop: Observe → Orient → Decide → Act
│   (goal queue,      │
│    scheduling)      │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│ Incremental Planner │   Stage 0: Retrieve context (FAISS)
│                     │   Stage 1: Analyze + select tools
│  ┌───────────────┐  │   Stage 2: Generate code → Execute → Evaluate
│  │ Reason → Act  │──│──────►  repeat until done
│  │   ← Observe   │  │
│  └───────────────┘  │   Reflect: learn from execution trace
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Infospace Executor │   Primitives + Tools
│                     │   Notes + Collections + Relations
│  search-web, say,   │   FAISS semantic search
│  create-note, ...   │   Persistent memory
└─────────────────────┘
```
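In sketch form, the Reason → Act → Observe loop at the planner's core looks roughly like this. The `generate_step`, `execute`, and `evaluate` callables are illustrative stand-ins, not the actual planner API:

```python
# Minimal sketch of an incremental plan-act-observe loop.
# generate_step / execute / evaluate are hypothetical interfaces,
# not the actual Cognitive Workbench API.

def run_goal(goal, generate_step, execute, evaluate, max_steps=10):
    trace = []                                   # execution history fed back each turn
    for _ in range(max_steps):
        step = generate_step(goal, trace)        # Reason: plan ONE step, not a full plan
        result = execute(step)                   # Act: run the generated tool call
        trace.append((step, result))             # Observe: record the real outcome
        if evaluate(goal, trace) == "done":      # Decide: finished, continue, or recover
            break
    return trace

# Toy stand-ins: two canned steps, done after both have executed.
steps = iter(["search-web('multi-agent coordination')", "create-note(summary)"])
trace = run_goal(
    "Find recent papers on multi-agent coordination",
    generate_step=lambda g, t: next(steps),
    execute=lambda s: f"ok: {s}",
    evaluate=lambda g, t: "done" if len(t) >= 2 else "continue",
)
```

The point of the structure is that each `generate_step` call sees the full trace so far, so the next step adapts to real results rather than to a plan made up front.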
- Incremental Planning — the planner interleaves LLM reasoning with tool execution, adapting its approach based on real results
- Goal Scheduling — submit goals with a `goal:` prefix; schedule them for manual, automatic, recurring, or daily-at-time execution
- Concern Model — user concerns and agent-derived concerns with activation-based triage into actionable tasks
- Envisioning & Quality Control — lightweight LLM framing for coherent dialog; post-execution reflection for failure recovery and learning
- Infospace Memory — Notes, Collections, and Relations as structured working memory with FAISS semantic search + entity-augmented retrieval
- NER & Entity Graph — automatic entity extraction from user input, goals, and notes; cognitive graph integration with entity nodes and mentions edges (explorer guide)
- Theory of Mind — persistent per-peer models of trust, competence, goals, and emotional state, updated from conversation evidence
- World Model — Bayesian cross-goal knowledge with recency-weighted evidence decay and staleness detection
- Extensible Tools — 24 built-in tools (web search, email, Bluesky, academic papers, shell scripts) plus world-specific integrations
- Sensors — autonomous data collectors (browser visit tracking, RSS feeds) that feed real-world context to the agent
- Web UI — real-time activation field visualization, chat, goal management, resource browser, and task/concern manager
- World Integrations — optional worlds (Minecraft, file system, desktop automation, ScienceWorld) with specialized tools
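As an illustration of the World Model idea, here is a toy recency-weighted evidence calculation. The half-life, threshold, and function names are hypothetical, not the system's actual parameters:

```python
# Toy sketch of recency-weighted Bayesian evidence for one world-model fact.
# Each observation's weight decays exponentially with age; a fact whose total
# evidence weight falls below a threshold is flagged stale.
# HALF_LIFE_DAYS and STALE_THRESHOLD are illustrative constants.

HALF_LIFE_DAYS = 7.0
STALE_THRESHOLD = 0.5

def weight(age_days):
    # Evidence loses half its weight every HALF_LIFE_DAYS.
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def belief(observations):
    """observations: list of (supports_fact: bool, age_days: float)."""
    pro = sum(weight(a) for s, a in observations if s)
    con = sum(weight(a) for s, a in observations if not s)
    total = pro + con
    if total < STALE_THRESHOLD:
        return None           # stale: too little recent evidence to hold a belief
    return pro / total        # recency-weighted confidence that the fact holds

# Fresh supporting evidence outweighs a single old contradiction...
p = belief([(True, 1.0), (True, 2.0), (False, 30.0)])
# ...and evidence that is only months old decays into staleness.
stale = belief([(True, 60.0)])
```

The decay means the agent's cross-goal knowledge drifts toward recent observations instead of freezing on the first thing it learned.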
```bash
git clone https://github.com/bdambrosio/Cognitive_workbench.git
cd Cognitive_workbench
python3 -m venv zenoh_venv
source zenoh_venv/bin/activate
pip install -r requirements.txt
```

The browse tool requires the agent-browser CLI (a Rust binary, not a Python package):

```bash
cargo install agent-browser   # if you have Rust/cargo
# or download a prebuilt binary from https://github.com/vercel-labs/agent-browser/releases
```

Skip this if you don't need browser automation — all other tools work without it.
Option A — Local GPU (SGLang):
- Edit `scenarios/jill-infospace.yaml` and set `sgl_model_path` to your preferred model. SGLang can be finicky, but its `@function` support makes the reasoning loop much faster.
- Or edit `scenarios/jill-infospace-vllm.yaml` and set `vllm_model_path` to your preferred model.
Option B — Cloud API (no GPU needed):
```bash
export OPENROUTER_API_KEY="sk-or-v1-..."   # from openrouter.ai
```

Alt model for semantic processing: some tools (`refine`, `extract-struct`, `filter-semantic`, `assess`) perform complex semantic processing of text, e.g. extracting a field from JSON. If your base LLM isn't up to the task, you can configure a heavier-weight model for them to use:

```yaml
alt_llm_config:
  openrouter_model_path: "qwen/qwen3-235b-a22b-2507"
```

Then launch:

```bash
source zenoh_venv/bin/activate
cd src
python3 launcher.py ../scenarios/jill-infospace.yaml --ui --resource-browser --task-manager
# Or for OpenRouter:
python3 launcher.py ../scenarios/jill-infospace-openrouter.yaml --ui --resource-browser --task-manager
```

Open http://localhost:3000 and submit a goal via the + Goal button:
```
Find and summarize recent papers on transformer architectures
```
See Getting Started for full setup details, environment variables, and troubleshooting.
The system provides four web-facing components. See the UI Guide for full details.
The Activation Field is the default view: an interactive D3 force-directed graph centered on the agent. Nodes represent the agent, its goals, concerns, notes, and variable bindings — sized and colored by activation level. Click any node to inspect it in the side panel.
The bottom dock bar provides controls for chat, goal entry, execution control (stop, continuous, LLM toggle), and links to the other UI components.
An OODA pulse overlay shows the agent's cognitive cycle in real time — expanding colored rings indicate Observe (blue), Orient (yellow), Decide (orange), and Act (green) phases.
The Classic UI is a text-oriented alternative with a scrollable action log, a character sidebar with tabs (Plan, Bindings, Goals, Plans, State, Schedule, Tasks), and direct text input for goals and chat.

The Resource Browser lets you browse, view, edit, and delete Notes and Collections — the agent's working memory — in a two-panel layout with a resource list and content viewer.

The Task & Concern Manager monitors the concern-to-task pipeline. The left panel shows user and derived concerns with activation levels and management controls (close, resolve, abandon, delete). The right panel shows task WIPs with approve/edit/abandon controls, scheduled goals, situation notes, and triage status.
A Chrome extension that captures page visits and feeds them to the agent via the browser-visits sensor. Install by loading the browser_extension/ directory as an unpacked extension.
- When you type a message, the unified chat handler decides whether to respond conversationally, escalate to a goal (when tool use is needed), or dispatch a system command — all in a single LLM call
- The Executive Node queues goals and invokes the Incremental Planner
- The Planner retrieves relevant context (FAISS semantic search + entity-augmented retrieval), selects tools, then enters a generate-execute-evaluate loop:
  - LLM writes a code block calling tools (`search-web`, `stock-price`, `create-note`, etc.)
  - Executor runs it and returns structured results
  - LLM evaluates: done? next step? error recovery?
- Reflection analyzes the full execution trace — updates world model (recency-weighted Bayesian facts), tool insights, and cross-goal learnings
- Named entities are extracted from user input, goals, and persistent notes — building a cognitive graph of entities and mentions that improves retrieval over time
- Theory of Mind models are updated when conversations are archived (`/done`, `/next`, `/bye`), tracking trust, competence, goals, and emotional state per peer
- Scheduled goals can repeat daily at a set time, or auto-proceed through multi-step workflows
- Sensors (browser visits, RSS feeds) run on timers and feed real-world context back into the agent's concern model
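To make the Stage 0 retrieval step concrete, here is a minimal cosine-similarity lookup over note embeddings. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and at scale the same nearest-neighbor search would be served by a FAISS index rather than this brute-force matrix product:

```python
# Toy sketch of semantic retrieval over stored notes.
# embed() is a hypothetical bag-of-words embedding over a tiny fixed
# vocabulary; the real system uses a learned embedding model + FAISS,
# with entity-augmented reranking on top.
import numpy as np

VOCAB = ["multi", "agent", "coordination", "paper", "grocery", "transformer"]

def embed(text):
    # One dimension per vocabulary word; normalized so dot product = cosine.
    v = np.array([float(w in text.lower()) for w in VOCAB])
    n = np.linalg.norm(v)
    return v / n if n else v

notes = [
    "paper on multi-agent coordination",
    "grocery list for the week",
    "notes on transformer architectures",
]
note_vecs = np.stack([embed(n) for n in notes])

def retrieve(query, k=2):
    scores = note_vecs @ embed(query)      # cosine similarity to every note
    top = np.argsort(-scores)[:k]          # indices of the k best matches
    return [notes[i] for i in top]

hits = retrieve("recent papers on multi-agent coordination")
```

The retrieved notes become the context the planner reasons over in Stage 1, which is why better retrieval directly improves plan quality.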
| Scenario | World | Backend |
|---|---|---|
| `jill-infospace.yaml` | Core infospace | SGLang (local GPU) |
| `jill-infospace-openrouter.yaml` | Core infospace | OpenRouter (cloud) |
| `jill-infospace-anthropic.yaml` | Core infospace | Anthropic Claude |
| `jill-infospace-openai.yaml` | Core infospace | OpenAI |
| `jill-infospace-vllm.yaml` | Core infospace | vLLM (local GPU) |
| `jill-fs.yaml` | File system | SGLang |
| `jill-fs-openrouter.yaml` | File system | OpenRouter (cloud) |
| `jill-minecraft.yaml` | Minecraft 3D world | SGLang |
| `jill-osworld.yaml` | Desktop automation | SGLang |
| `jill-scienceworld.yaml` | Science simulation | SGLang |
| `jack-and-jill.yaml` | Multi-agent | SGLang |
See Configuration for details on each.
```
Cognitive_workbench/
├── README.md                        # This file
├── BACKGROUND.md                    # Research philosophy
├── requirements.txt                 # Python dependencies
├── docs/                            # Detailed documentation
├── scenarios/                       # Scenario YAML files + runtime data
├── browser_extension/               # Chrome extension for page visit tracking
└── src/
    ├── launcher.py                  # Entry point
    ├── executive_node.py            # OODA loop coordinator
    ├── incremental_planner.py       # Core planner (the heart of the system)
    ├── infospace_executor.py        # Primitives + tool execution
    ├── infospace_resource_manager.py # Notes/Collections/Relations + FAISS
    ├── entity_index.py              # NER extraction, entity index, graph integration
    ├── cognitive_graph.py           # OODA event graph + entity/ToM nodes
    ├── conversation_store.py        # Dialog lifecycle, archival, session backfill
    ├── discourse.py                 # Theory of Mind templates + discourse analysis
    ├── world_model.py               # Bayesian recency-weighted knowledge
    ├── fastapi_action_display.py    # Web UI (Activation Field + Classic)
    ├── resource_browser.py          # Resource Browser UI
    ├── task_manager.py              # Task & Concern Manager UI
    ├── goal_scheduler.py            # Autonomous goal scheduling
    ├── concern_triage.py            # Concern → task pipeline
    ├── derived_concern_model.py     # Agent-derived concerns
    ├── sensor_runner.py             # Sensor scheduling and execution
    ├── sensors/                     # Sensor implementations
    │   ├── browser-visits/          # Browser page visit sensor
    │   └── rss-watcher/             # RSS feed monitor
    ├── tools/                       # Core tools (search-web, run-script, etc.)
    ├── world-tools/                 # World-specific tools (minecraft, fs, etc.)
    ├── static/ui/                   # Activation Field frontend (HTML/JS/CSS)
    ├── scripts/                     # Shell scripts for run-script tool
    └── utils/                       # Shared utilities
```
| Document | Description |
|---|---|
| Getting Started | Installation, credentials, LLM backend setup, first run |
| Architecture | Core cognitive architecture — incremental planner, OODA loop, infospace memory |
| UI Guide | Activation Field, Classic UI, Resource Browser, Task Manager, sensors |
| Goals & Scheduling | Goal submission (`goal:` prefix), scheduled goals, daily-at-time, autonomous execution |
| Envisioning & QC | Conversational envisioning, reflection, failure recovery, missing affordance monitoring |
| Tools & Primitives | Infospace primitives, tool catalog, run-script, plan tools |
| Configuration | Scenario YAML reference, available scenarios, directory structure |
| Tool Development | Creating new tools (Skill.md + tool.py) |
| Background | Research motivation and philosophy |
| Contributor Guidelines | Code style, testing, commit conventions |
See src/AGENTS.md for repository guidelines, code style, and commit conventions.
MIT License — see LICENSE.