A research framework for autonomous agents with incremental planning, persistent memory, and tool use.
Cognitive Workbench is experimental research software for studying LLM-based cognitive architectures. It prioritizes inspectable agent behavior and fast iteration over stability.
The core idea: an incremental planner that interleaves reasoning with tool execution. Rather than generating a complete plan and then executing it, the planner generates one step at a time, runs it, observes the result, and decides what to do next. This tight feedback loop — combined with persistent memory, reflective quality control, and autonomous goal scheduling — produces agents that can pursue complex goals over extended periods.
```
User: "goal: Find recent papers on multi-agent coordination"
           │
┌──────────▼──────────┐
│   Executive Node    │   OODA loop: Observe → Orient → Decide → Act
│   (goal queue,      │
│    scheduling)      │
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│ Incremental Planner │   Stage 0: Retrieve context (FAISS)
│                     │   Stage 1: Analyze + select tools
│  ┌───────────────┐  │   Stage 2: Generate code → Execute → Evaluate
│  │ Reason → Act  │──│──────►  repeat until done
│  │   ← Observe   │  │
│  └───────────────┘  │   Reflect: learn from execution trace
└──────────┬──────────┘
           │
┌──────────▼──────────┐
│  Infospace Executor │   Primitives + Tools
│                     │   Notes + Collections + Relations
│  search-web, say,   │   FAISS semantic search
│  create-note, ...   │   Persistent memory
└─────────────────────┘
```
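In sketch form, the Reason → Act → Observe loop at the planner's core looks roughly like this. The `generate_step`, `execute`, and `evaluate` callables are illustrative stand-ins, not the actual planner API:

```python
# Minimal sketch of an incremental plan-act-observe loop.
# generate_step / execute / evaluate are hypothetical interfaces,
# not the actual Cognitive Workbench API.

def run_goal(goal, generate_step, execute, evaluate, max_steps=10):
    trace = []                                   # execution history fed back each turn
    for _ in range(max_steps):
        step = generate_step(goal, trace)        # Reason: plan ONE step, not a full plan
        result = execute(step)                   # Act: run the generated tool call
        trace.append((step, result))             # Observe: record the real outcome
        if evaluate(goal, trace) == "done":      # Decide: finished, continue, or recover
            break
    return trace

# Toy stand-ins: two canned steps, done after both have executed.
steps = iter(["search-web('multi-agent coordination')", "create-note(summary)"])
trace = run_goal(
    "Find recent papers on multi-agent coordination",
    generate_step=lambda g, t: next(steps),
    execute=lambda s: f"ok: {s}",
    evaluate=lambda g, t: "done" if len(t) >= 2 else "continue",
)
```

The point of the structure is that each `generate_step` call sees the full trace so far, so the next step adapts to real results rather than to a plan made up front.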
- Incremental Planning — the planner interleaves LLM reasoning with tool execution, adapting its approach based on real results
- Goal Scheduling — submit goals with a `goal:` prefix; schedule them for manual, automatic, recurring, or daily-at-time execution
- Concern Model — user concerns and agent-derived concerns with activation-based triage into actionable tasks
- Envisioning & Quality Control — lightweight LLM framing for coherent dialog; post-execution reflection for failure recovery and learning
- Infospace Memory — Notes, Collections, and Relations as structured working memory with FAISS semantic search + entity-augmented retrieval
- NER & Entity Graph — automatic entity extraction from user input, goals, and notes; cognitive graph integration with entity nodes and mentions edges (explorer guide)
- Theory of Mind — persistent per-peer models of trust, competence, goals, and emotional state, updated from conversation evidence
- World Model — Bayesian cross-goal knowledge with recency-weighted evidence decay and staleness detection
- Extensible Tools — 24 built-in tools (web search, email, Bluesky, academic papers, shell scripts) plus world-specific integrations
- Sensors — autonomous data collectors (browser visit tracking, RSS feeds) that feed real-world context to the agent
- Web UI — real-time activation field visualization, chat, goal management, resource browser, and task/concern manager
- World Integrations — optional worlds (Minecraft, file system, desktop automation, ScienceWorld) with specialized tools
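As an illustration of the World Model idea, here is a toy recency-weighted evidence calculation. The half-life, threshold, and function names are hypothetical, not the system's actual parameters:

```python
# Toy sketch of recency-weighted Bayesian evidence for one world-model fact.
# Each observation's weight decays exponentially with age; a fact whose total
# evidence weight falls below a threshold is flagged stale.
# HALF_LIFE_DAYS and STALE_THRESHOLD are illustrative constants.

HALF_LIFE_DAYS = 7.0
STALE_THRESHOLD = 0.5

def weight(age_days):
    # Evidence loses half its weight every HALF_LIFE_DAYS.
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def belief(observations):
    """observations: list of (supports_fact: bool, age_days: float)."""
    pro = sum(weight(a) for s, a in observations if s)
    con = sum(weight(a) for s, a in observations if not s)
    total = pro + con
    if total < STALE_THRESHOLD:
        return None           # stale: too little recent evidence to hold a belief
    return pro / total        # recency-weighted confidence that the fact holds

# Fresh supporting evidence outweighs a single old contradiction...
p = belief([(True, 1.0), (True, 2.0), (False, 30.0)])
# ...and evidence that is only months old decays into staleness.
stale = belief([(True, 60.0)])
```

The decay means the agent's cross-goal knowledge drifts toward recent observations instead of freezing on the first thing it learned.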
```bash
git clone https://github.com/bdambrosio/Cognitive_workbench.git
cd Cognitive_workbench
python3 -m venv zenoh_venv
source zenoh_venv/bin/activate
pip install -r requirements.txt
```

The browse tool requires the agent-browser CLI (a Rust binary, not a Python package):

```bash
cargo install agent-browser   # if you have Rust/cargo
# or download a prebuilt binary from https://github.com/vercel-labs/agent-browser/releases
```

Skip this if you don't need browser automation — all other tools work without it.
Option A — Local GPU (SGLang):
- Edit `scenarios/jill-infospace.yaml` and set `sgl_model_path` to your preferred model. SGLang can be finicky, but its `@function` support makes the reasoning loop much faster.
- Or edit `scenarios/jill-infospace-vllm.yaml` and set `vllm_model_path` to your preferred model.
Option B — Cloud API (no GPU needed):
```bash
export OPENROUTER_API_KEY="sk-or-v1-..."   # from openrouter.ai
```

Alt model for semantic processing: some tools (`refine`, `extract-struct`, `filter-semantic`, `assess`) perform complex semantic processing of text, e.g. extracting a field from JSON. If your base LLM isn't up to the task, you can configure a heavier-weight model for them to use:

```yaml
alt_llm_config:
  openrouter_model_path: "qwen/qwen3-235b-a22b-2507"
```

Then launch:

```bash
source zenoh_venv/bin/activate
cd src
python3 launcher.py ../scenarios/jill-infospace.yaml --ui --resource-browser --task-manager
# Or for OpenRouter:
python3 launcher.py ../scenarios/jill-infospace-openrouter.yaml --ui --resource-browser --task-manager
```

Open http://localhost:3000 and submit a goal via the + Goal button:
```
Find and summarize recent papers on transformer architectures
```
See Getting Started for full setup details, environment variables, and troubleshooting.
The system provides four web-facing components. See the UI Guide for full details.
The Activation Field is the default view: an interactive D3 force-directed graph centered on the agent. Nodes represent the agent, its goals, concerns, notes, and variable bindings — sized and colored by activation level. Click any node to inspect it in the side panel.
The bottom dock bar provides controls for chat, goal entry, execution control (stop, continuous, LLM toggle), and links to the other UI components.
An OODA pulse overlay shows the agent's cognitive cycle in real time — expanding colored rings indicate Observe (blue), Orient (yellow), Decide (orange), and Act (green) phases.
The Classic UI is a text-oriented alternative with a scrollable action log, a character sidebar with tabs (Plan, Bindings, Goals, Plans, State, Schedule, Tasks), and direct text input for goals and chat.

The Resource Browser lets you browse, view, edit, and delete Notes and Collections — the agent's working memory — in a two-panel layout with a resource list and content viewer.

The Task & Concern Manager monitors the concern-to-task pipeline. The left panel shows user and derived concerns with activation levels and management controls (close, resolve, abandon, delete). The right panel shows task WIPs with approve/edit/abandon controls, scheduled goals, situation notes, and triage status.
A Chrome extension that captures page visits and feeds them to the agent via the browser-visits sensor. Install by loading the browser_extension/ directory as an unpacked extension.
- When you type a message, the unified chat handler decides whether to respond conversationally, escalate to a goal (when tool use is needed), or dispatch a system command — all in a single LLM call
- The Executive Node queues goals and invokes the Incremental Planner
- The Planner retrieves relevant context (FAISS semantic search + entity-augmented retrieval), selects tools, then enters a generate-execute-evaluate loop:
  - LLM writes a code block calling tools (`search-web`, `stock-price`, `create-note`, etc.)
  - Executor runs it and returns structured results
  - LLM evaluates: done? next step? error recovery?
- Reflection analyzes the full execution trace — updates world model (recency-weighted Bayesian facts), tool insights, and cross-goal learnings
- Named entities are extracted from user input, goals, and persistent notes — building a cognitive graph of entities and mentions that improves retrieval over time
- Theory of Mind models are updated when conversations are archived (`/done`, `/next`, `/bye`), tracking trust, competence, goals, and emotional state per peer
- Scheduled goals can repeat daily at a set time, or auto-proceed through multi-step workflows
- Sensors (browser visits, RSS feeds) run on timers and feed real-world context back into the agent's concern model
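To make the Stage 0 retrieval step concrete, here is a minimal cosine-similarity lookup over note embeddings. The bag-of-words `embed` function is a toy stand-in for a real embedding model, and at scale the same nearest-neighbor search would be served by a FAISS index rather than this brute-force matrix product:

```python
# Toy sketch of semantic retrieval over stored notes.
# embed() is a hypothetical bag-of-words embedding over a tiny fixed
# vocabulary; the real system uses a learned embedding model + FAISS,
# with entity-augmented reranking on top.
import numpy as np

VOCAB = ["multi", "agent", "coordination", "paper", "grocery", "transformer"]

def embed(text):
    # One dimension per vocabulary word; normalized so dot product = cosine.
    v = np.array([float(w in text.lower()) for w in VOCAB])
    n = np.linalg.norm(v)
    return v / n if n else v

notes = [
    "paper on multi-agent coordination",
    "grocery list for the week",
    "notes on transformer architectures",
]
note_vecs = np.stack([embed(n) for n in notes])

def retrieve(query, k=2):
    scores = note_vecs @ embed(query)      # cosine similarity to every note
    top = np.argsort(-scores)[:k]          # indices of the k best matches
    return [notes[i] for i in top]

hits = retrieve("recent papers on multi-agent coordination")
```

The retrieved notes become the context the planner reasons over in Stage 1, which is why better retrieval directly improves plan quality.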
| Scenario | World | Backend |
|---|---|---|
| `jill-infospace.yaml` | Core infospace | SGLang (local GPU) |
| `jill-infospace-openrouter.yaml` | Core infospace | OpenRouter (cloud) |
| `jill-infospace-anthropic.yaml` | Core infospace | Anthropic Claude |
| `jill-infospace-openai.yaml` | Core infospace | OpenAI |
| `jill-infospace-vllm.yaml` | Core infospace | vLLM (local GPU) |
| `jill-fs.yaml` | File system | SGLang |
| `jill-fs-openrouter.yaml` | File system | OpenRouter (cloud) |
| `jill-minecraft.yaml` | Minecraft 3D world | SGLang |
| `jill-osworld.yaml` | Desktop automation | SGLang |
| `jill-scienceworld.yaml` | Science simulation | SGLang |
| `jack-and-jill.yaml` | Multi-agent | SGLang |
See Configuration for details on each.
```
Cognitive_workbench/
├── README.md                        # This file
├── BACKGROUND.md                    # Research philosophy
├── requirements.txt                 # Python dependencies
├── docs/                            # Detailed documentation
├── scenarios/                       # Scenario YAML files + runtime data
├── browser_extension/               # Chrome extension for page visit tracking
└── src/
    ├── launcher.py                  # Entry point
    ├── executive_node.py            # OODA loop coordinator
    ├── incremental_planner.py       # Core planner (the heart of the system)
    ├── infospace_executor.py        # Primitives + tool execution
    ├── infospace_resource_manager.py # Notes/Collections/Relations + FAISS
    ├── entity_index.py              # NER extraction, entity index, graph integration
    ├── cognitive_graph.py           # OODA event graph + entity/ToM nodes
    ├── conversation_store.py        # Dialog lifecycle, archival, session backfill
    ├── discourse.py                 # Theory of Mind templates + discourse analysis
    ├── world_model.py               # Bayesian recency-weighted knowledge
    ├── fastapi_action_display.py    # Web UI (Activation Field + Classic)
    ├── resource_browser.py          # Resource Browser UI
    ├── task_manager.py              # Task & Concern Manager UI
    ├── goal_scheduler.py            # Autonomous goal scheduling
    ├── concern_triage.py            # Concern → task pipeline
    ├── derived_concern_model.py     # Agent-derived concerns
    ├── sensor_runner.py             # Sensor scheduling and execution
    ├── sensors/                     # Sensor implementations
    │   ├── browser-visits/          # Browser page visit sensor
    │   └── rss-watcher/             # RSS feed monitor
    ├── tools/                       # Core tools (search-web, run-script, etc.)
    ├── world-tools/                 # World-specific tools (minecraft, fs, etc.)
    ├── static/ui/                   # Activation Field frontend (HTML/JS/CSS)
    ├── scripts/                     # Shell scripts for run-script tool
    └── utils/                       # Shared utilities
```
| Document | Description |
|---|---|
| Getting Started | Installation, credentials, LLM backend setup, first run |
| Architecture | Core cognitive architecture — incremental planner, OODA loop, infospace memory |
| UI Guide | Activation Field, Classic UI, Resource Browser, Task Manager, sensors |
| Goals & Scheduling | Goal submission (`goal:` prefix), scheduled goals, daily-at-time, autonomous execution |
| Envisioning & QC | Conversational envisioning, reflection, failure recovery, missing affordance monitoring |
| Tools & Primitives | Infospace primitives, tool catalog, run-script, plan tools |
| Configuration | Scenario YAML reference, available scenarios, directory structure |
| Tool Development | Creating new tools (Skill.md + tool.py) |
| Background | Research motivation and philosophy |
| Contributor Guidelines | Code style, testing, commit conventions |
See src/AGENTS.md for repository guidelines, code style, and commit conventions.
MIT License — see LICENSE.