diff --git a/.github/agents/agent-workflow.agent.md b/.github/agents/agent-workflow.agent.md new file mode 100644 index 0000000000..7d2065cc6a --- /dev/null +++ b/.github/agents/agent-workflow.agent.md @@ -0,0 +1,455 @@ +--- +description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-core, building-agents-construction, building-agents-patterns, testing-agent, and setup-credentials skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance. +name: Agent Workflow +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Agent Development Workflow + +Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents. + +## Overview + +This workflow orchestrates specialized skills to take you from initial concept to production-ready agent: + +1. **Understand Concepts** → building-agents-core (optional) +2. **Build Structure** → building-agents-construction +3. **Optimize Design** → building-agents-patterns (optional) +4. **Setup Credentials** → setup-credentials (if agent uses tools requiring API keys) +5. **Test & Validate** → testing-agent + +## When to Use This Workflow + +Use this skill when: +- Starting a new agent from scratch +- Unclear which skill to use first +- Need end-to-end guidance for agent development +- Want consistent, repeatable agent builds + +**Skip this workflow** if: +- You only need to test an existing agent → use testing-agent directly +- You know exactly which phase you're in → use specific skill directly + +## Quick Decision Tree + +``` +"Need to understand agent concepts" → building-agents-core +"Build a new agent" → building-agents-construction +"Optimize my agent design" → building-agents-patterns +"Set up API keys for my agent" → setup-credentials +"Test my agent" → testing-agent +"Not sure what I need" → Read phases below, then decide +"Agent has structure but needs implementation" → See agent directory STATUS.md +``` + +## Phase 0: Understand Concepts (Optional) + +**Duration**: 5-10 minutes +**Skill**: building-agents-core +**Input**: Questions about agent architecture + +### When to Use + +- First time building an agent +- Need to understand node types, edges, goals +- Want to validate tool availability +- Learning about pause/resume architecture + +### What This Phase Provides + +- Architecture overview (Python packages, not JSON) +- Core concepts (Goal, Node, Edge, Pause/Resume) +- Tool discovery and validation procedures +- Workflow overview + +**Skip this phase** if you already understand agent fundamentals. + +## Phase 1: Build Agent Structure + +**Duration**: 15-30 minutes +**Skill**: building-agents-construction +**Input**: User requirements ("Build an agent that...") + +### What This Phase Does + +Creates the complete agent architecture: +- Package structure (`exports/agent_name/`) +- Goal with success criteria and constraints +- Workflow graph (nodes and edges) +- Node specifications +- CLI interface +- Documentation + +### Process + +1. **Create package** - Directory structure with skeleton files +2. **Define goal** - Success criteria and constraints written to agent.py +3. **Design nodes** - Each node approved and written incrementally +4. **Connect edges** - Workflow graph with conditional routing +5. 
**Finalize** - Agent class, exports, and documentation + +### Outputs + +- ✅ `exports/agent_name/` package created +- ✅ Goal defined in agent.py +- ✅ 3-5 success criteria defined +- ✅ 1-5 constraints defined +- ✅ 5-10 nodes specified in nodes/__init__.py +- ✅ 8-15 edges connecting workflow +- ✅ Validated structure (passes `python -m agent_name validate`) +- ✅ README.md with usage instructions +- ✅ CLI commands (info, validate, run, shell) + +### Success Criteria + +You're ready for Phase 2 when: +- Agent structure validates without errors +- All nodes and edges are defined +- CLI commands work (info, validate) +- You see: "Agent complete: exports/agent_name/" + +### Common Outputs + +The building-agents-construction skill produces: +``` +exports/agent_name/ +├── __init__.py (package exports) +├── __main__.py (CLI interface) +├── agent.py (goal, graph, agent class) +├── nodes/__init__.py (node specifications) +├── config.py (configuration) +├── implementations.py (may be created for Python functions) +└── README.md (documentation) +``` + +### Next Steps + +**If structure complete and validated:** +→ Check `exports/agent_name/STATUS.md` or `IMPLEMENTATION_GUIDE.md` +→ These files explain implementation options +→ You may need to add Python functions or MCP tools (not covered by current skills) + +**If want to optimize design:** +→ Proceed to Phase 1.5 (building-agents-patterns) + +**If ready to test:** +→ Proceed to Phase 2 + +## Phase 1.5: Optimize Design (Optional) + +**Duration**: 10-15 minutes +**Skill**: building-agents-patterns +**Input**: Completed agent structure + +### When to Use + +- Want to add pause/resume functionality +- Need error handling patterns +- Want to optimize performance +- Need examples of complex routing +- Want best practices guidance + +### What This Phase Provides + +- Practical examples and patterns +- Pause/resume architecture +- Error handling strategies +- Anti-patterns to avoid +- Performance optimization techniques + +**Skip this phase** if your agent design is straightforward. + +## Phase 2: Test & Validate + +**Duration**: 20-40 minutes +**Skill**: testing-agent +**Input**: Working agent from Phase 1 + +### What This Phase Does + +Creates comprehensive test suite: +- Constraint tests (verify hard requirements) +- Success criteria tests (measure goal achievement) +- Edge case tests (handle failures gracefully) +- Integration tests (end-to-end workflows) + +### Process + +1. **Analyze agent** - Read goal, constraints, success criteria +2. **Generate tests** - Create pytest files in `exports/agent_name/tests/` +3. **User approval** - Review and approve each test +4. **Run evaluation** - Execute tests and collect results +5. **Debug failures** - Identify and fix issues +6. **Iterate** - Repeat until all tests pass + +### Outputs + +- ✅ Test files in `exports/agent_name/tests/` +- ✅ Test report with pass/fail metrics +- ✅ Coverage of all success criteria +- ✅ Coverage of all constraints +- ✅ Edge case handling verified + +### Success Criteria + +You're done when: +- All tests pass +- All success criteria validated +- All constraints verified +- Agent handles edge cases +- Test coverage is comprehensive + +### Next Steps + +**Agent ready for:** +- Production deployment +- Integration into larger systems +- Documentation and handoff +- Continuous monitoring + +## Phase Transitions + +### From Phase 1 to Phase 2 + +**Trigger signals:** +- "Agent complete: exports/..." 
+- Structure validation passes +- README indicates implementation complete + +**Before proceeding:** +- Verify agent can be imported: `from exports.agent_name import default_agent` +- Check if implementation is needed (see STATUS.md or IMPLEMENTATION_GUIDE.md) +- Confirm agent executes without import errors + +### Skipping Phases + +**When to skip Phase 1:** +- Agent structure already exists +- Only need to add tests +- Modifying existing agent + +**When to skip Phase 2:** +- Prototyping or exploring +- Agent not production-bound +- Manual testing sufficient + +## Common Patterns + +### Pattern 1: Complete New Build (Simple) + +``` +User: "Build an agent that monitors files" +→ Use building-agents-construction +→ Agent structure created +→ Use testing-agent +→ Tests created and passing +→ Done: Production-ready agent +``` + +### Pattern 1b: Complete New Build (With Learning) + +``` +User: "Build an agent (first time)" +→ Use building-agents-core (understand concepts) +→ Use building-agents-construction (build structure) +→ Use building-agents-patterns (optimize design) +→ Use testing-agent (validate) +→ Done: Production-ready agent +``` + +### Pattern 2: Test Existing Agent + +``` +User: "Test my agent at exports/my_agent" +→ Skip Phase 1 +→ Use testing-agent directly +→ Tests created +→ Done: Validated agent +``` + +### Pattern 3: Iterative Development + +``` +User: "Build an agent" +→ Use building-agents-construction (Phase 1) +→ Implementation needed (see STATUS.md) +→ [User implements functions] +→ Use testing-agent (Phase 2) +→ Tests reveal bugs +→ [Fix bugs manually] +→ Re-run tests +→ Done: Working agent +``` + +### Pattern 4: Complex Agent with Patterns + +``` +User: "Build an agent with multi-turn conversations" +→ Use building-agents-core (learn pause/resume) +→ Use building-agents-construction (build structure) +→ Use building-agents-patterns (implement pause/resume pattern) +→ Use testing-agent (validate conversation flows) +→ Done: Complex conversational agent +``` + +## Skill Dependencies + +``` +agent-workflow (meta-skill) + │ + ├── building-agents-core (foundational) + │ ├── Architecture concepts + │ ├── Node/Edge/Goal definitions + │ ├── Tool discovery procedures + │ └── Workflow overview + │ + ├── building-agents-construction (procedural) + │ ├── Creates package structure + │ ├── Defines goal + │ ├── Adds nodes incrementally + │ ├── Connects edges + │ ├── Finalizes agent class + │ └── Requires: building-agents-core + │ + ├── building-agents-patterns (reference) + │ ├── Best practices + │ ├── Pause/resume patterns + │ ├── Error handling + │ ├── Anti-patterns + │ └── Performance optimization + │ + └── testing-agent + ├── Reads agent goal + ├── Generates tests + ├── Runs evaluation + └── Reports results +``` + +## Troubleshooting + +### "Agent structure won't validate" + +- Check node IDs match between nodes/__init__.py and agent.py +- Verify all edges reference valid node IDs +- Ensure entry_node exists in nodes list +- Run: `PYTHONPATH=core:exports python -m agent_name validate` + +### "Agent has structure but won't run" + +- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory +- Implementation may be needed (Python functions or MCP tools) +- This is expected - building-agents-construction creates structure, not implementation +- See implementation guide for completion options + +### "Tests are failing" + +- Review test output for specific failures +- Check agent goal and success criteria +- Verify constraints are met +- Use testing-agent to debug and iterate 
+- Fix agent code and re-run tests + +### "Not sure which phase I'm in" + +Run these checks: + +```bash +# Check if agent structure exists +ls exports/my_agent/agent.py + +# Check if it validates +PYTHONPATH=core:exports python -m my_agent validate + +# Check if tests exist +ls exports/my_agent/tests/ + +# If structure exists and validates → Phase 2 (testing) +# If structure doesn't exist → Phase 1 (building) +# If tests exist but failing → Debug phase +``` + +## Best Practices + +### For Phase 1 (Building) + +1. **Start with clear requirements** - Know what the agent should do +2. **Define success criteria early** - Measurable goals drive design +3. **Keep nodes focused** - One responsibility per node +4. **Use descriptive names** - Node IDs should explain purpose +5. **Validate incrementally** - Check structure after each major addition + +### For Phase 2 (Testing) + +1. **Test constraints first** - Hard requirements must pass +2. **Mock external dependencies** - Use mock mode for LLMs/APIs +3. **Cover edge cases** - Test failures, not just success paths +4. **Iterate quickly** - Fix one test at a time +5. **Document test patterns** - Future tests follow same structure + +### General Workflow + +1. **Use version control** - Git commit after each phase +2. **Document decisions** - Update README with changes +3. **Keep iterations small** - Build → Test → Fix → Repeat +4. **Preserve working states** - Tag successful iterations +5. **Learn from failures** - Failed tests reveal design issues + +## Exit Criteria + +You're done with the workflow when: + +✅ Agent structure validates +✅ All tests pass +✅ Success criteria met +✅ Constraints verified +✅ Documentation complete +✅ Agent ready for deployment + +## Additional Resources + +- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md` +- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md` +- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md` +- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md` +- **Agent framework docs**: See `core/README.md` +- **Example agents**: See `exports/` directory + +## Summary + +This workflow provides a proven path from concept to production-ready agent: + +1. **Learn** with building-agents-core → Understand fundamentals (optional) +2. **Build** with building-agents-construction → Get validated structure +3. **Optimize** with building-agents-patterns → Apply best practices (optional) +4. **Test** with testing-agent → Get verified functionality + +The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**. 
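To make the exit criteria above concrete, the final checks can be scripted. The sketch below is illustrative rather than part of any skill: it assumes an agent named `my_agent` under `exports/`, the `PYTHONPATH=core:exports` convention used throughout this workflow, and that the generated tests are plain pytest files that can also be run outside the testing-agent skill.

```python
# Hedged sketch: run the exit-criteria checks for exports/my_agent from the repo root.
import os
import subprocess
import sys

env = {**os.environ, "PYTHONPATH": "core:exports"}
checks = [
    ("structure validates", [sys.executable, "-m", "my_agent", "validate"]),
    ("agent is importable", [sys.executable, "-c", "from exports.my_agent import default_agent"]),
    ("tests pass", [sys.executable, "-m", "pytest", "exports/my_agent/tests/", "-q"]),
]

for label, cmd in checks:
    if subprocess.run(cmd, env=env).returncode != 0:
        sys.exit(f"Exit-criteria check failed: {label}")

print("All exit-criteria checks passed")
```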
+ +## Skill Selection Guide + +**Choose building-agents-core when:** +- First time building agents +- Need to understand architecture +- Validating tool availability +- Learning about node types and edges + +**Choose building-agents-construction when:** +- Actually building an agent +- Have clear requirements +- Ready to write code +- Want step-by-step guidance + +**Choose building-agents-patterns when:** +- Agent structure complete +- Need advanced patterns +- Implementing pause/resume +- Optimizing performance +- Want best practices + +**Choose testing-agent when:** +- Agent structure complete +- Ready to validate functionality +- Need comprehensive test coverage +- Debugging agent behavior +- Debugging agent behavior diff --git a/.github/agents/building-agents-construction.agent.md b/.github/agents/building-agents-construction.agent.md new file mode 100644 index 0000000000..a11ab9e928 --- /dev/null +++ b/.github/agents/building-agents-construction.agent.md @@ -0,0 +1,356 @@ +--- +description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent. +name: Building Agents - Construction +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Agent Construction - Step-by-Step Guide + +**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.** + +When this skill is loaded, IMMEDIATELY begin executing Step 1. Do not explain what you will do - just do it. + +--- + +## STEP 1: Initialize Build Environment + +**EXECUTE THESE TOOL CALLS NOW:** + +1. Register the hive-tools MCP server: + +``` +mcp__agent-builder__add_mcp_server( + name="hive-tools", + transport="stdio", + command="python", + args='["mcp_server.py", "--stdio"]', + cwd="tools", + description="Hive tools MCP server" +) +``` + +2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case): + +``` +mcp__agent-builder__create_session(name="AGENT_NAME") +``` + +3. Discover available tools: + +``` +mcp__agent-builder__list_mcp_tools() +``` + +4. Create the package directory: + +``` +mkdir -p exports/AGENT_NAME/nodes +``` + +**AFTER completing these calls**, tell the user: + +> ✅ Build environment initialized +> +> - Session created +> - Available tools: [list the tools from step 3] +> +> Proceeding to define the agent goal... + +**THEN immediately proceed to STEP 2.** + +--- + +## STEP 2: Define and Approve Goal + +**PROPOSE a goal to the user.** Based on what they asked for, propose: + +- Goal ID (kebab-case) +- Goal name +- Goal description +- 3-5 success criteria (each with: id, description, metric, target, weight) +- 2-4 constraints (each with: id, description, constraint_type, category) + +**FORMAT your proposal as a clear summary, then ask for approval:** + +> **Proposed Goal: [Name]** +> +> [Description] +> +> **Success Criteria:** +> +> 1. [criterion 1] +> 2. [criterion 2] +> ... +> +> **Constraints:** +> +> 1. [constraint 1] +> 2. [constraint 2] +> ... 
+ +**THEN call AskUserQuestion:** + +``` +AskUserQuestion(questions=[{ + "question": "Do you approve this goal definition?", + "header": "Goal", + "options": [ + {"label": "Approve", "description": "Goal looks good, proceed"}, + {"label": "Modify", "description": "I want to change something"} + ], + "multiSelect": false +}]) +``` + +**WAIT for user response.** + +- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3 +- If **Modify**: Ask what they want to change, update proposal, ask again + +--- + +## STEP 3: Design Node Workflow + +**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist. + +**DESIGN the workflow** as a series of nodes. For each node, determine: + +- node_id (kebab-case) +- name +- description +- node_type: `"llm_generate"` (no tools) or `"llm_tool_use"` (uses tools) +- input_keys (what data this node receives) +- output_keys (what data this node produces) +- tools (ONLY tools that exist - empty list for llm_generate) +- system_prompt + +**PRESENT the workflow to the user:** + +> **Proposed Workflow: [N] nodes** +> +> 1. **[node-id]** - [description] +> +> - Type: [llm_generate/llm_tool_use] +> - Input: [keys] +> - Output: [keys] +> - Tools: [tools or "none"] +> +> 2. **[node-id]** - [description] +> ... +> +> **Flow:** node1 → node2 → node3 → ... + +**THEN call AskUserQuestion:** + +``` +AskUserQuestion(questions=[{ + "question": "Do you approve this workflow design?", + "header": "Workflow", + "options": [ + {"label": "Approve", "description": "Workflow looks good, proceed to build nodes"}, + {"label": "Modify", "description": "I want to change the workflow"} + ], + "multiSelect": false +}]) +``` + +**WAIT for user response.** + +- If **Approve**: Proceed to STEP 4 +- If **Modify**: Ask what they want to change, update design, ask again + +--- + +## STEP 4: Build Nodes One by One + +**FOR EACH node in the approved workflow:** + +1. **Call** `mcp__agent-builder__add_node(...)` with the node details + + - input_keys and output_keys must be JSON strings: `'["key1", "key2"]'` + - tools must be a JSON string: `'["tool1"]'` or `'[]'` + +2. **Call** `mcp__agent-builder__test_node(...)` to validate: + +``` +mcp__agent-builder__test_node( + node_id="the-node-id", + test_input='{"key": "test value"}', + mock_llm_response='{"output_key": "test output"}' +) +``` + +3. **Check result:** + + - If valid: Tell user "✅ Node [id] validated" and continue to next node + - If invalid: Show errors, fix the node, re-validate + +4. **Show progress** after each node: + +``` +mcp__agent-builder__get_session_status() +``` + +> ✅ Node [X] of [Y] complete: [node-id] + +**AFTER all nodes are added and validated**, proceed to STEP 5. + +--- + +## STEP 5: Connect Edges + +**DETERMINE the edges** based on the workflow flow. 
For each connection: + +- edge_id (kebab-case) +- source (node that outputs) +- target (node that receives) +- condition: `"on_success"`, `"always"`, `"on_failure"`, or `"conditional"` +- condition_expr (Python expression, only if conditional) +- priority (integer, lower = higher priority) + +**FOR EACH edge, call:** + +``` +mcp__agent-builder__add_edge( + edge_id="source-to-target", + source="source-node-id", + target="target-node-id", + condition="on_success", + condition_expr="", + priority=1 +) +``` + +**AFTER all edges are added, validate the graph:** + +``` +mcp__agent-builder__validate_graph() +``` + +- If valid: Tell user "✅ Graph structure validated" and proceed to STEP 6 +- If invalid: Show errors, fix edges, re-validate + +--- + +## STEP 6: Generate Agent Package + +**EXPORT the graph data:** + +``` +mcp__agent-builder__export_graph() +``` + +This returns JSON with all the goal, nodes, edges, and MCP server configurations. + +**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`: + +1. `config.py` - Runtime configuration with model settings +2. `nodes/__init__.py` - All NodeSpec definitions +3. `agent.py` - Goal, edges, graph config, and agent class +4. `__init__.py` - Package exports +5. `__main__.py` - CLI interface +6. `mcp_servers.json` - MCP server configurations +7. `README.md` - Usage documentation + +**IMPORTANT entry_points format:** + +- MUST be: `{"start": "first-node-id"}` +- NOT: `{"first-node-id": ["input_keys"]}` (WRONG) +- NOT: `{"first-node-id"}` (WRONG - this is a set) + +**Use the example agent** at `.claude/skills/building-agents-construction/examples/online_research_agent/` as a template for file structure and patterns. + +**AFTER writing all files, tell the user:** + +> ✅ Agent package created: `exports/AGENT_NAME/` +> +> **Files generated:** +> +> - `__init__.py` - Package exports +> - `agent.py` - Goal, nodes, edges, agent class +> - `config.py` - Runtime configuration +> - `__main__.py` - CLI interface +> - `nodes/__init__.py` - Node definitions +> - `mcp_servers.json` - MCP server config +> - `README.md` - Usage documentation +> +> **Test your agent:** +> +> ```bash +> cd /home/timothy/oss/hive +> PYTHONPATH=core:exports python -m AGENT_NAME validate +> PYTHONPATH=core:exports python -m AGENT_NAME info +> ``` + +--- + +## STEP 7: Verify and Test + +**RUN validation:** + +```bash +cd /home/timothy/oss/hive && PYTHONPATH=core:exports python -m AGENT_NAME validate +``` + +- If valid: Agent is complete! 
+- If errors: Fix the issues and re-run + +**SHOW final session summary:** + +``` +mcp__agent-builder__get_session_status() +``` + +**TELL the user the agent is ready** and suggest next steps: + +- Run with mock mode to test without API calls +- Use testing-agent skill for comprehensive testing +- Use setup-credentials if the agent needs API keys + +--- + +## REFERENCE: Node Types + +| Type | tools param | Use when | +| -------------- | ---------------------- | ---------------------------------------------- | +| `llm_generate` | `'[]'` | Pure reasoning, JSON output, no external calls | +| `llm_tool_use` | `'["tool1", "tool2"]'` | Needs to call MCP tools | + +--- + +## REFERENCE: Edge Conditions + +| Condition | When edge is followed | +| ------------- | ------------------------------------- | +| `on_success` | Source node completed successfully | +| `on_failure` | Source node failed | +| `always` | Always, regardless of success/failure | +| `conditional` | When condition_expr evaluates to True | + +--- + +## REFERENCE: System Prompt Best Practice + +For nodes with JSON output, include this in the system_prompt: + +``` +CRITICAL: Return ONLY raw JSON. NO markdown, NO code blocks. +Just the JSON object starting with { and ending with }. + +Return this exact structure: +{ + "key1": "...", + "key2": "..." +} +``` + +--- + +## COMMON MISTAKES TO AVOID + +1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first +2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list +3. **Skipping validation** - Always validate nodes and graph before proceeding +4. **Not waiting for approval** - Always ask user before major steps +5. **Displaying this file** - Execute the steps, don't show documentation diff --git a/.github/agents/building-agents-core.agent.md b/.github/agents/building-agents-core.agent.md new file mode 100644 index 0000000000..561ea137bb --- /dev/null +++ b/.github/agents/building-agents-core.agent.md @@ -0,0 +1,299 @@ +--- +description: Core concepts for goal-driven agents - architecture, node types, tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals. +name: Building Agents - Core Concepts +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Building Agents - Core Concepts + +Foundational knowledge for building goal-driven agents as Python packages. 
+ +## Architecture: Python Services (Not JSON Configs) + +Agents are built as Python packages: + +``` +exports/my_agent/ +├── __init__.py # Package exports +├── __main__.py # CLI (run, info, validate, shell) +├── agent.py # Graph construction (goal, edges, agent class) +├── nodes/__init__.py # Node definitions (NodeSpec) +├── config.py # Runtime config +└── README.md # Documentation +``` + +**Key Principle: Agent is visible and editable during build** + +- ✅ Files created immediately as components are approved +- ✅ User can watch files grow in their editor +- ✅ No session state - just direct file writes +- ✅ No "export" step - agent is ready when build completes + +## Core Concepts + +### Goal + +Success criteria and constraints (written to agent.py) + +```python +goal = Goal( + id="research-goal", + name="Technical Research Agent", + description="Research technical topics thoroughly", + success_criteria=[ + SuccessCriterion( + id="completeness", + description="Cover all aspects of topic", + metric="coverage_score", + target=">=0.9", + weight=0.4, + ), + # 3-5 success criteria total + ], + constraints=[ + Constraint( + id="accuracy", + description="All information must be verified", + constraint_type="hard", + category="quality", + ), + # 1-5 constraints total + ], +) +``` + +### Node + +Unit of work (written to nodes/__init__.py) + +**Node Types:** + +- `llm_generate` - Text generation, parsing +- `llm_tool_use` - Actions requiring tools +- `router` - Conditional branching +- `function` - Deterministic operations + +```python +search_node = NodeSpec( + id="search-web", + name="Search Web", + description="Search for information online", + node_type="llm_tool_use", + input_keys=["query"], + output_keys=["search_results"], + system_prompt="Search the web for: {query}", + tools=["web_search"], + max_retries=3, +) +``` + +### Edge + +Connection between nodes (written to agent.py) + +**Edge Conditions:** + +- `on_success` - Proceed if node succeeds +- `on_failure` - Handle errors +- `always` - Always proceed +- `conditional` - Based on expression + +```python +EdgeSpec( + id="search-to-analyze", + source="search-web", + target="analyze-results", + condition=EdgeCondition.ON_SUCCESS, + priority=1, +) +``` + +### Pause/Resume + +Multi-turn conversations + +- **Pause nodes** - Stop execution, wait for user input +- **Resume entry points** - Continue from pause with user's response + +```python +# Example pause/resume configuration +pause_nodes = ["request-clarification"] +entry_points = { + "start": "analyze-request", + "request-clarification_resume": "process-clarification" +} +``` + +## Tool Discovery & Validation + +**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist. + +Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically. 
+ +### Step 1: Register MCP Server (if not already done) + +```python +mcp__agent-builder__add_mcp_server( + name="tools", + transport="stdio", + command="python", + args='["mcp_server.py", "--stdio"]', + cwd="../tools" +) +``` + +### Step 2: Discover Available Tools + +```python +# List all tools from all registered servers +mcp__agent-builder__list_mcp_tools() + +# Or list tools from a specific server +mcp__agent-builder__list_mcp_tools(server_name="tools") +``` + +This returns available tools with their descriptions and parameters: + +```json +{ + "success": true, + "tools_by_server": { + "tools": [ + { + "name": "web_search", + "description": "Search the web...", + "parameters": ["query"] + }, + { + "name": "web_scrape", + "description": "Scrape a URL...", + "parameters": ["url"] + } + ] + }, + "total_tools": 14 +} +``` + +### Step 3: Validate Before Adding Nodes + +Before writing a node with `tools=[...]`: + +1. Call `list_mcp_tools()` to get available tools +2. Check each tool in your node exists in the response +3. If a tool doesn't exist: + - **DO NOT proceed** with the node + - Inform the user: "The tool 'X' is not available. Available tools are: ..." + - Ask if they want to use an alternative or proceed without the tool + +### Tool Validation Anti-Patterns + +❌ **Never assume a tool exists** - always call `list_mcp_tools()` first +❌ **Never write a node with unverified tools** - validate before writing +❌ **Never silently drop tools** - if a tool doesn't exist, inform the user +❌ **Never guess tool names** - use exact names from discovery response + +### Example Validation Flow + +```python +# 1. User requests: "Add a node that searches the web" +# 2. Discover available tools +tools_response = mcp__agent-builder__list_mcp_tools() + +# 3. Check if web_search exists +available = [t["name"] for tools in tools_response["tools_by_server"].values() for t in tools] +if "web_search" not in available: + # Inform user and ask how to proceed + print("❌ 'web_search' not available. Available tools:", available) +else: + # Proceed with node creation + # ... +``` + +## Workflow Overview: Incremental File Construction + +``` +1. CREATE PACKAGE → mkdir + write skeletons +2. DEFINE GOAL → Write to agent.py + config.py +3. FOR EACH NODE: + - Propose design + - User approves + - Write to nodes/__init__.py IMMEDIATELY ← FILE WRITTEN + - (Optional) Validate with test_node ← MCP VALIDATION + - User can open file and see it +4. CONNECT EDGES → Update agent.py ← FILE WRITTEN + - (Optional) Validate with validate_graph ← MCP VALIDATION +5. FINALIZE → Write agent class to agent.py ← FILE WRITTEN +6. DONE - Agent ready at exports/my_agent/ +``` + +**Files written immediately. MCP tools optional for validation/testing.** + +### The Key Difference + +**OLD (Bad):** + +``` +MCP add_node → Session State → MCP add_node → Session State → ... + ↓ + MCP export_graph + ↓ + Files appear +``` + +**NEW (Good):** + +``` +Write node to file → (Optional: MCP test_node) → Write node to file → ... + ↓ ↓ + File visible File visible + immediately immediately +``` + +**Bottom line:** Use Write/Edit for construction, MCP for validation if needed. + +## When to Use This Skill + +Use building-agents-core when: +- Starting a new agent project and need to understand fundamentals +- Need to understand agent architecture before building +- Want to validate tool availability before proceeding +- Learning about node types, edges, and graph execution + +**Next Steps:** +- Ready to build? 
→ Use building-agents-construction skill +- Need patterns and examples? → Use building-agents-patterns skill + +## MCP Tools for Validation + +After writing files, optionally use MCP tools for validation: + +**test_node** - Validate node configuration with mock inputs +```python +mcp__agent-builder__test_node( + node_id="search-web", + test_input='{"query": "test query"}', + mock_llm_response='{"results": "mock output"}' +) +``` + +**validate_graph** - Check graph structure +```python +mcp__agent-builder__validate_graph() +# Returns: unreachable nodes, missing connections, etc. +``` + +**create_session** - Track session state for bookkeeping +```python +mcp__agent-builder__create_session(session_name="my-build") +``` + +**Key Point:** Files are written FIRST. MCP tools are for validation only. + +## Related Skills + +- **building-agents-construction** - Step-by-step building process +- **building-agents-patterns** - Best practices and examples +- **agent-workflow** - Complete workflow orchestrator +- **testing-agent** - Test and validate completed agents \ No newline at end of file diff --git a/.github/agents/building-agents-patterns.agent.md b/.github/agents/building-agents-patterns.agent.md new file mode 100644 index 0000000000..37393eb675 --- /dev/null +++ b/.github/agents/building-agents-patterns.agent.md @@ -0,0 +1,494 @@ +--- +description: Best practices, patterns, and examples for building goal-driven agents. Includes pause/resume architecture, hybrid workflows, anti-patterns, and handoff to testing. Use when optimizing agent design. +name: Building Agents - Patterns & Best Practices +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Building Agents - Patterns & Best Practices + +Design patterns, examples, and best practices for building robust goal-driven agents. + +**Prerequisites:** Complete agent structure using building-agents-construction. + +## Practical Example: Hybrid Workflow + +How to build a node using both direct file writes and optional MCP validation: + +```python +# 1. WRITE TO FILE FIRST (Primary - makes it visible) +node_code = ''' +search_node = NodeSpec( + id="search-web", + node_type="llm_tool_use", + input_keys=["query"], + output_keys=["search_results"], + system_prompt="Search the web for: {query}", + tools=["web_search"], +) +''' + +Edit( + file_path="exports/research_agent/nodes/__init__.py", + old_string="# Nodes will be added here", + new_string=node_code +) + +print("✅ Added search_node to nodes/__init__.py") +print("📁 Open exports/research_agent/nodes/__init__.py to see it!") + +# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping) +validation = mcp__agent-builder__test_node( + node_id="search-web", + test_input='{"query": "python tutorials"}', + mock_llm_response='{"search_results": [...mock results...]}' +) + +print(f"✓ Validation: {validation['success']}") +``` + +**User experience:** +- Immediately sees node in their editor (from step 1) +- Gets validation feedback (from step 2) +- Can edit the file directly if needed + +This combines visibility (files) with validation (MCP tools). 
+ +## Pause/Resume Architecture + +For agents needing multi-turn conversations with user interaction: + +### Basic Pause/Resume Flow + +```python +# Define pause nodes - execution stops at these nodes +pause_nodes = ["request-clarification", "await-approval"] + +# Define entry points - where to resume from each pause +entry_points = { + "start": "analyze-request", # Initial entry + "request-clarification_resume": "process-clarification", + "await-approval_resume": "execute-action", +} +``` + +### Example: Multi-Turn Research Agent + +```python +# Nodes +nodes = [ + NodeSpec(id="analyze-request", ...), + NodeSpec(id="request-clarification", ...), # PAUSE NODE + NodeSpec(id="process-clarification", ...), + NodeSpec(id="generate-results", ...), + NodeSpec(id="await-approval", ...), # PAUSE NODE + NodeSpec(id="execute-action", ...), +] + +# Edges with resume flows +edges = [ + EdgeSpec( + id="analyze-to-clarify", + source="analyze-request", + target="request-clarification", + condition=EdgeCondition.CONDITIONAL, + condition_expr="needs_clarification == true", + ), + # When resumed, goes to process-clarification + EdgeSpec( + id="clarify-to-process", + source="request-clarification", + target="process-clarification", + condition=EdgeCondition.ALWAYS, + ), + EdgeSpec( + id="results-to-approval", + source="generate-results", + target="await-approval", + condition=EdgeCondition.ALWAYS, + ), + # When resumed, goes to execute-action + EdgeSpec( + id="approval-to-execute", + source="await-approval", + target="execute-action", + condition=EdgeCondition.ALWAYS, + ), +] + +# Configuration +pause_nodes = ["request-clarification", "await-approval"] +entry_points = { + "start": "analyze-request", + "request-clarification_resume": "process-clarification", + "await-approval_resume": "execute-action", +} +``` + +### Running Pause/Resume Agents + +```python +# Initial run - will pause at first pause node +result1 = await agent.run( + context={"query": "research topic"}, + session_state=None +) + +# Check if paused +if result1.paused_at: + print(f"Paused at: {result1.paused_at}") + + # Resume with user input + result2 = await agent.run( + context={"user_response": "clarification details"}, + session_state=result1.session_state + ) +``` + +## Anti-Patterns + +### What NOT to Do + +❌ **Don't rely on `export_graph`** - Write files immediately, not at end + +```python +# BAD: Building in session state, exporting at end +mcp__agent-builder__add_node(...) +mcp__agent-builder__add_node(...) +mcp__agent-builder__export_graph() # Files appear only now + +# GOOD: Writing files immediately +Write(file_path="...", content=node_code) # File visible now +Write(file_path="...", content=node_code) # File visible now +``` + +❌ **Don't hide code in session** - Write to files as components approved + +```python +# BAD: Accumulating changes invisibly +session.add_component(component1) +session.add_component(component2) +# User can't see anything yet + +# GOOD: Incremental visibility +Edit(file_path="...", ...) # User sees change 1 +Edit(file_path="...", ...) 
# User sees change 2 +``` + +❌ **Don't wait to write files** - Agent visible from first step + +```python +# BAD: Building everything before writing +design_all_nodes() +design_all_edges() +write_everything_at_once() + +# GOOD: Write as you go +write_package_structure() # Visible +write_goal() # Visible +write_node_1() # Visible +write_node_2() # Visible +``` + +❌ **Don't batch everything** - Write incrementally + +```python +# BAD: Batching all nodes +nodes = [design_node_1(), design_node_2(), ...] +write_all_nodes(nodes) + +# GOOD: One at a time with user feedback +write_node_1() # User approves +write_node_2() # User approves +write_node_3() # User approves +``` + +### MCP Tools - Correct Usage + +**MCP tools OK for:** +✅ `test_node` - Validate node configuration with mock inputs +✅ `validate_graph` - Check graph structure +✅ `create_session` - Track session state for bookkeeping +✅ Other validation tools + +**Just don't:** Use MCP as the primary construction method or rely on export_graph + +## Best Practices + +### 1. Show Progress After Each Write + +```python +print("✅ Added analyze_request_node to nodes/__init__.py") +print("📊 Progress: 1/6 nodes added") +print("📁 Open exports/my_agent/nodes/__init__.py to see it!") +``` + +### 2. Let User Open Files During Build + +```python +print("✅ Goal written to agent.py") +print("") +print("💡 Tip: Open exports/my_agent/agent.py in your editor to see the goal!") +``` + +### 3. Write Incrementally - One Component at a Time + +```python +# Good flow +write_package_structure() +show_user("Package created") + +write_goal() +show_user("Goal written") + +for node in nodes: + get_approval(node) + write_node(node) + show_user(f"Node {node.id} written") +``` + +### 4. Test As You Build + +```python +# After adding several nodes +print("💡 You can test current state with:") +print(" PYTHONPATH=core:exports python -m my_agent validate") +print(" PYTHONPATH=core:exports python -m my_agent info") +``` + +### 5. 
Keep User Informed + +```python +# Clear status updates +print("🔨 Creating package structure...") +print("✅ Package created: exports/my_agent/") +print("") +print("📝 Next: Define agent goal") +``` + +## Continuous Monitoring Agents + +For agents that run continuously without terminal nodes: + +```python +# No terminal nodes - loops forever +terminal_nodes = [] + +# Workflow loops back to start +edges = [ + EdgeSpec(id="monitor-to-check", source="monitor", target="check-condition"), + EdgeSpec(id="check-to-wait", source="check-condition", target="wait"), + EdgeSpec(id="wait-to-monitor", source="wait", target="monitor"), # Loop +] + +# Entry node only +entry_node = "monitor" +entry_points = {"start": "monitor"} +pause_nodes = [] +``` + +**Example: File Monitor** + +```python +nodes = [ + NodeSpec(id="list-files", ...), + NodeSpec(id="check-new-files", node_type="router", ...), + NodeSpec(id="process-files", ...), + NodeSpec(id="wait-interval", node_type="function", ...), +] + +edges = [ + EdgeSpec(id="list-to-check", source="list-files", target="check-new-files"), + EdgeSpec( + id="check-to-process", + source="check-new-files", + target="process-files", + condition=EdgeCondition.CONDITIONAL, + condition_expr="new_files_count > 0", + ), + EdgeSpec( + id="check-to-wait", + source="check-new-files", + target="wait-interval", + condition=EdgeCondition.CONDITIONAL, + condition_expr="new_files_count == 0", + ), + EdgeSpec(id="process-to-wait", source="process-files", target="wait-interval"), + EdgeSpec(id="wait-to-list", source="wait-interval", target="list-files"), # Loop back +] + +terminal_nodes = [] # No terminal - runs forever +``` + +## Complex Routing Patterns + +### Multi-Condition Router + +```python +router_node = NodeSpec( + id="decision-router", + node_type="router", + input_keys=["analysis_result"], + output_keys=["decision"], + system_prompt=""" + Based on the analysis result, decide the next action: + - If confidence > 0.9: route to "execute" + - If 0.5 <= confidence <= 0.9: route to "review" + - If confidence < 0.5: route to "clarify" + + Return: {"decision": "execute|review|clarify"} + """, +) + +# Edges for each route +edges = [ + EdgeSpec( + id="router-to-execute", + source="decision-router", + target="execute-action", + condition=EdgeCondition.CONDITIONAL, + condition_expr="decision == 'execute'", + priority=1, + ), + EdgeSpec( + id="router-to-review", + source="decision-router", + target="human-review", + condition=EdgeCondition.CONDITIONAL, + condition_expr="decision == 'review'", + priority=2, + ), + EdgeSpec( + id="router-to-clarify", + source="decision-router", + target="request-clarification", + condition=EdgeCondition.CONDITIONAL, + condition_expr="decision == 'clarify'", + priority=3, + ), +] +``` + +## Error Handling Patterns + +### Graceful Failure with Fallback + +```python +# Primary node with error handling +nodes = [ + NodeSpec(id="api-call", max_retries=3, ...), + NodeSpec(id="fallback-cache", ...), + NodeSpec(id="report-error", ...), +] + +edges = [ + # Success path + EdgeSpec( + id="api-success", + source="api-call", + target="process-results", + condition=EdgeCondition.ON_SUCCESS, + ), + # Fallback on failure + EdgeSpec( + id="api-to-fallback", + source="api-call", + target="fallback-cache", + condition=EdgeCondition.ON_FAILURE, + priority=1, + ), + # Report if fallback also fails + EdgeSpec( + id="fallback-to-error", + source="fallback-cache", + target="report-error", + condition=EdgeCondition.ON_FAILURE, + priority=1, + ), +] +``` + +## Performance 
Optimization + +### Parallel Node Execution + +```python +# Use multiple edges from same source for parallel execution +edges = [ + EdgeSpec( + id="start-to-search1", + source="start", + target="search-source-1", + condition=EdgeCondition.ALWAYS, + ), + EdgeSpec( + id="start-to-search2", + source="start", + target="search-source-2", + condition=EdgeCondition.ALWAYS, + ), + EdgeSpec( + id="start-to-search3", + source="start", + target="search-source-3", + condition=EdgeCondition.ALWAYS, + ), + # Converge results + EdgeSpec( + id="search1-to-merge", + source="search-source-1", + target="merge-results", + ), + EdgeSpec( + id="search2-to-merge", + source="search-source-2", + target="merge-results", + ), + EdgeSpec( + id="search3-to-merge", + source="search-source-3", + target="merge-results", + ), +] +``` + +## Handoff to Testing + +When agent is complete, transition to testing phase: + +```python +print(""" +✅ Agent complete: exports/my_agent/ + +Next steps: +1. Switch to testing-agent skill +2. Generate and approve tests +3. Run evaluation +4. Debug any failures + +Command: "Test the agent at exports/my_agent/" +""") +``` + +### Pre-Testing Checklist + +Before handing off to testing-agent: + +- [ ] Agent structure validates: `python -m agent_name validate` +- [ ] All nodes defined in nodes/__init__.py +- [ ] All edges connect valid nodes +- [ ] Entry node specified +- [ ] Agent can be imported: `from exports.agent_name import default_agent` +- [ ] README.md with usage instructions +- [ ] CLI commands work (info, validate) + +## Related Skills + +- **building-agents-core** - Fundamental concepts +- **building-agents-construction** - Step-by-step building +- **testing-agent** - Test and validate agents +- **agent-workflow** - Complete workflow orchestrator + +--- + +**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.** diff --git a/.github/agents/setup-credentials.agent.md b/.github/agents/setup-credentials.agent.md new file mode 100644 index 0000000000..8b0f16100d --- /dev/null +++ b/.github/agents/setup-credentials.agent.md @@ -0,0 +1,560 @@ +--- +description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the encrypted credential store at ~/.hive/credentials. +name: Setup Credentials +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Setup Credentials + +Interactive credential setup for agents with multiple authentication options. Detects what's missing, offers auth method choices, validates with health checks, and stores credentials securely. + +## When to Use + +- Before running or testing an agent for the first time +- When `AgentRunner.run()` fails with "missing required credentials" +- When a user asks to configure credentials for an agent +- After building a new agent that uses tools requiring API keys + +## Workflow + +### Step 1: Identify the Agent + +Determine which agent needs credentials. The user will either: + +- Name the agent directly (e.g., "set up credentials for hubspot-agent") +- Have an agent directory open (check `exports/` for agent dirs) +- Be working on an agent in the current session + +Locate the agent's directory under `exports/{agent_name}/`. 
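If it is not obvious which agent the user means, a quick scan of `exports/` can surface the candidates. This is a minimal sketch, assuming each agent package follows the standard layout with an `agent.py` at its root:

```python
from pathlib import Path

# List candidate agent packages under exports/ (standard layout: each contains an agent.py)
exports = Path("exports")
candidates = []
if exports.is_dir():
    candidates = sorted(
        p.name for p in exports.iterdir() if p.is_dir() and (p / "agent.py").exists()
    )

print("Agent packages found:", ", ".join(candidates) or "none")
```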
+ +### Step 2: Detect Required Credentials + +Read the agent's configuration to determine which tools and node types it uses: + +```python +from core.framework.runner import AgentRunner + +runner = AgentRunner.load("exports/{agent_name}") +validation = runner.validate() + +# validation.missing_credentials contains env var names +# validation.warnings contains detailed messages with help URLs +``` + +Alternatively, check the credential store directly: + +```python +from core.framework.credentials import CredentialStore + +# Use encrypted storage (default: ~/.hive/credentials) +store = CredentialStore.with_encrypted_storage() + +# Check what's available +available = store.list_credentials() +print(f"Available credentials: {available}") + +# Check if specific credential exists +if store.is_available("hubspot"): + print("HubSpot credential found") +else: + print("HubSpot credential missing") +``` + +To see all known credential specs (for help URLs and setup instructions): + +```python +from aden_tools.credentials import CREDENTIAL_SPECS + +for name, spec in CREDENTIAL_SPECS.items(): + print(f"{name}: env_var={spec.env_var}, aden={spec.aden_supported}") +``` + +### Step 3: Present Auth Options + +For each missing credential, check available authentication methods: + +```python +from aden_tools.credentials import CREDENTIAL_SPECS + +spec = CREDENTIAL_SPECS.get("hubspot") +if spec: + auth_options = [] + if spec.aden_supported: + auth_options.append("aden") + if spec.direct_api_key_supported: + auth_options.append("direct") + auth_options.append("custom") + + # Get setup info + setup_info = { + "env_var": spec.env_var, + "description": spec.description, + "help_url": spec.help_url, + "api_key_instructions": spec.api_key_instructions, + } +``` + +Present options: + +``` +Choose how to configure HUBSPOT_ACCESS_TOKEN: + + 1) Aden Authorization Server (Recommended) + Secure OAuth2 flow via integration.adenhq.com + - Quick setup with automatic token refresh + - No need to manage API keys manually + + 2) Direct API Key + Enter your own API key manually + - Requires creating a HubSpot Private App + - Full control over scopes and permissions + + 3) Custom Credential Store (Advanced) + Programmatic configuration for CI/CD + - For automated deployments + - Requires manual API calls +``` + +### Step 4: Execute Auth Flow + +#### Option 1: Aden Authorization Server + +This is the recommended flow for supported integrations (HubSpot, etc.). + +**How Aden OAuth Works:** + +The ADEN_API_KEY represents a user who has already completed OAuth authorization on Aden's platform. When users sign up and connect integrations on Aden, those OAuth tokens are stored server-side. Having an ADEN_API_KEY means: + +1. User has an Aden account +2. User has already authorized integrations (HubSpot, etc.) via OAuth on Aden +3. We just need to sync those credentials down to the local credential store + +**4.1a. Check for ADEN_API_KEY** + +```python +import os +aden_key = os.environ.get("ADEN_API_KEY") +``` + +If not set, guide user to Aden: + +```python +from aden_tools.credentials import open_browser, get_aden_setup_url + +url = get_aden_setup_url() +success, msg = open_browser(url) + +print("Sign in to Aden and connect your integrations.") +print("Copy your API key and return here.") +``` + +Ask user to provide the ADEN_API_KEY they received. + +**4.1b. 
Save ADEN_API_KEY to Shell Config** + +With user approval, persist ADEN_API_KEY to their shell config: + +```python +from aden_tools.credentials import ( + detect_shell, + add_env_var_to_shell_config, + get_shell_source_command, +) + +shell_type = detect_shell() # 'bash', 'zsh', or 'unknown' + +# Ask user for approval first +success, config_path = add_env_var_to_shell_config( + "ADEN_API_KEY", + user_provided_key, + comment="Aden authorization server API key" +) + +if success: + source_cmd = get_shell_source_command() + print(f"Saved to {config_path}") + print(f"Run: {source_cmd}") +``` + +Also save to `~/.hive/configuration.json` for the framework: + +```python +import json +from pathlib import Path + +config_path = Path.home() / ".hive" / "configuration.json" +config = json.loads(config_path.read_text()) if config_path.exists() else {} + +config["aden"] = { + "api_key_configured": True, + "api_url": "https://api.adenhq.com" +} + +config_path.parent.mkdir(parents=True, exist_ok=True) +config_path.write_text(json.dumps(config, indent=2)) +``` + +**4.1c. Sync Credentials from Aden Server** + +Since the user has already authorized integrations on Aden, use the one-liner factory method: + +```python +from core.framework.credentials import CredentialStore + +# Single call handles everything +# This single call handles everything: +# - Creates encrypted local storage at ~/.hive/credentials +# - Configures Aden client from ADEN_API_KEY env var +# - Syncs all credentials from Aden server automatically +store = CredentialStore.with_aden_sync( + base_url="https://api.adenhq.com", + auto_sync=True, +) + +# Check what was synced +synced = store.list_credentials() +print(f"Synced credentials: {synced}") + +# If the required credential wasn't synced, the user hasn't authorized it on Aden yet +if "hubspot" not in synced: + print("HubSpot not found in your Aden account.") + print("Please visit https://integration.adenhq.com to connect HubSpot, then try again.") +``` + +For more control over the sync process: + +```python +from core.framework.credentials import CredentialStore +from core.framework.credentials.aden import ( + AdenCredentialClient, + AdenClientConfig, + AdenSyncProvider, +) + +# Create client (API key loaded from ADEN_API_KEY env var) +client = AdenCredentialClient(AdenClientConfig( + base_url="https://api.adenhq.com", +)) + +# Create provider and store +provider = AdenSyncProvider(client=client) +store = CredentialStore.with_encrypted_storage() + +# Manual sync +synced_count = provider.sync_all(store) +print(f"Synced {synced_count} credentials from Aden") +``` + +**4.1d. Run Health Check** + +```python +from aden_tools.credentials import check_credential_health + +cred = store.get_credential("hubspot") +token = cred.keys["access_token"].value.get_secret_value() + +result = check_credential_health("hubspot", token) +if result.valid: + print("HubSpot credentials validated!") +else: + print(f"Validation failed: {result.message}") + # Offer to retry the OAuth flow +``` + +#### Option 2: Direct API Key + +For users who prefer manual API key management. + +**4.2a. Show Setup Instructions** + +```python +from aden_tools.credentials import CREDENTIAL_SPECS + +spec = CREDENTIAL_SPECS.get("hubspot") +if spec and spec.api_key_instructions: + print(spec.api_key_instructions) +# Output: +# To get a HubSpot Private App token: +# 1. Go to HubSpot Settings > Integrations > Private Apps +# 2. Click "Create a private app" +# 3. Name your app (e.g., "Hive Agent") +# ... 
+ +if spec and spec.help_url: + print(f"More info: {spec.help_url}") +``` + +**4.2b. Collect API Key from User** + +Use AskUserQuestion to securely collect the API key: + +``` +Please provide your HubSpot access token: +(This will be stored securely in ~/.hive/credentials) +``` + +**4.2c. Run Health Check Before Storing** + +```python +from aden_tools.credentials import check_credential_health + +result = check_credential_health("hubspot", user_provided_token) +if not result.valid: + print(f"Warning: {result.message}") + # Ask user if they want to: + # 1. Try a different token + # 2. Continue anyway (not recommended) +``` + +**4.2d. Store in Encrypted Credential Store** + +```python +from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey +from pydantic import SecretStr + +store = CredentialStore.with_encrypted_storage() + +cred = CredentialObject( + id="hubspot", + name="HubSpot Access Token", + keys={ + "access_token": CredentialKey( + name="access_token", + value=SecretStr(user_provided_token), + ) + }, +) +store.save_credential(cred) +``` + +**4.2e. Export to Current Session** + +```bash +export HUBSPOT_ACCESS_TOKEN="the-value" +``` + +#### Option 3: Custom Credential Store (Advanced) + +For programmatic/CI/CD setups. + +**4.3a. Show Documentation** + +``` +For advanced credential management, you can use the CredentialStore API directly: + + from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey + from pydantic import SecretStr + + store = CredentialStore.with_encrypted_storage() + + cred = CredentialObject( + id="hubspot", + name="HubSpot Access Token", + keys={"access_token": CredentialKey(name="access_token", value=SecretStr("..."))} + ) + store.save_credential(cred) + +For CI/CD environments: + - Set HIVE_CREDENTIAL_KEY for encryption + - Pre-populate ~/.hive/credentials programmatically + - Or use environment variables directly (HUBSPOT_ACCESS_TOKEN) + +Documentation: See core/framework/credentials/README.md +``` + +### Step 5: Record Configuration Method + +```python +import json +from pathlib import Path +from datetime import datetime + +config_path = Path.home() / ".hive" / "configuration.json" +config = json.loads(config_path.read_text()) if config_path.exists() else {} + +if "credential_methods" not in config: + config["credential_methods"] = {} + +config["credential_methods"]["hubspot"] = { + "method": "aden", + "configured_at": datetime.now().isoformat(), +} + +config_path.write_text(json.dumps(config, indent=2)) +``` + +### Step 6: Verify All Credentials + +```python +runner = AgentRunner.load("exports/{agent_name}") +validation = runner.validate() +assert not validation.missing_credentials +``` + +## Health Check Reference + +Health checks validate credentials by making lightweight API calls: + +| Credential | Endpoint | What It Checks | +| -------------- | --------------------------------------- | --------------------------------- | +| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes | +| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity | + +```python +from aden_tools.credentials import check_credential_health + +result = check_credential_health("hubspot", token_value) +# result.valid: bool +# result.message: str +# result.details: dict (status_code, rate_limited, etc.) 
+``` + +## Encryption Key (HIVE_CREDENTIAL_KEY) + +The encrypted credential store requires `HIVE_CREDENTIAL_KEY`: + +- If not set, `EncryptedFileStorage` auto-generates one +- User MUST persist this key (in `~/.bashrc` or secrets manager) +- Without this key, credentials cannot be decrypted +- This is the ONLY secret that should live in `~/.bashrc` or environment config + +If `HIVE_CREDENTIAL_KEY` is not set: + +1. Let the store generate one +2. Tell the user to save it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` +3. Recommend adding it to `~/.bashrc` or their shell profile + +## Security Rules + +- **NEVER** log, print, or echo credential values +- **NEVER** store credentials in plaintext files +- **NEVER** hardcode credentials in source code +- **ALWAYS** use `SecretStr` from Pydantic +- **ALWAYS** use encrypted credential store +- **ALWAYS** run health checks before storing +- **ALWAYS** verify with re-validation, not by reading back +- **ALWAYS** confirm before modifying shell config + +## Credential Sources Reference + +All credential specs are defined in `tools/src/aden_tools/credentials/`: + +| File | Category | Credentials | Aden Supported | +| ----------------- | ------------- | --------------------------------------------- | -------------- | +| `llm.py` | LLM Providers | `anthropic` | No | +| `search.py` | Search Tools | `brave_search`, `google_search`, `google_cse` | No | +| `integrations.py` | Integrations | `hubspot` | Yes | + +**Note:** Additional LLM providers (Cerebras, Groq, OpenAI) are handled by LiteLLM via environment +variables (`CEREBRAS_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`) but are not yet in CREDENTIAL_SPECS. +Add them to `llm.py` as needed. + +To check what's registered: + +```python +from aden_tools.credentials import CREDENTIAL_SPECS +for name, spec in CREDENTIAL_SPECS.items(): + print(f"{name}: aden={spec.aden_supported}, direct={spec.direct_api_key_supported}") +``` + +## Migration: CredentialManager → CredentialStore + +**CredentialManager is deprecated.** Use CredentialStore. 
+ +| Old (Deprecated) | New (Recommended) | +| ----------------------------------------- | -------------------------------------------------------------------- | +| `CredentialManager()` | `CredentialStore.with_encrypted_storage()` | +| `creds.get("hubspot")` | `store.get("hubspot")` or `store.get_key("hubspot", "access_token")` | +| `creds.validate_for_tools(tools)` | Use `store.is_available(cred_id)` per credential | +| `creds.get_auth_options("hubspot")` | Check `CREDENTIAL_SPECS["hubspot"].aden_supported` | +| `creds.get_setup_instructions("hubspot")` | Access `CREDENTIAL_SPECS["hubspot"]` directly | + +**Why migrate?** + +- **CredentialStore** supports encrypted storage, multi-key credentials, template resolution, and automatic token refresh +- **CredentialManager** only reads from environment variables and .env files (no encryption, no refresh) +- **CredentialStoreAdapter** exists for backward compatibility during migration + +```python +# Old way (deprecated) +from aden_tools.credentials import CredentialManager +creds = CredentialManager() +token = creds.get("hubspot") + +# New way (recommended) +from core.framework.credentials import CredentialStore +store = CredentialStore.with_encrypted_storage() +token = store.get("hubspot") + +# With Aden sync (recommended for OAuth integrations) +store = CredentialStore.with_aden_sync() +token = store.get_key("hubspot", "access_token") +``` + +## Example Session + +``` +User: /setup-credentials for my hubspot-agent + +Agent: Let me check what credentials your hubspot-agent needs. + +[Runs validation, finds ANTHROPIC_API_KEY and HUBSPOT_ACCESS_TOKEN missing] + +Agent: Your hubspot-agent requires 2 credentials: + + 1. ANTHROPIC_API_KEY - Anthropic API key for LLM calls + 2. HUBSPOT_ACCESS_TOKEN - HubSpot access token + +Let's set these up one at a time. + +--- Setting up ANTHROPIC_API_KEY --- + +This credential only supports direct API key entry. + +[Shows instructions for getting Anthropic API key] +[AskUserQuestion: "Please provide your Anthropic API key:"] +[User provides key] +[Stores in credential store, exports to env] + +--- Setting up HUBSPOT_ACCESS_TOKEN --- + +[AskUserQuestion with 3 options] + +Choose how to configure HUBSPOT_ACCESS_TOKEN: + 1) Aden Authorization Server (Recommended) + 2) Direct API Key + 3) Custom Credential Store (Advanced) + +User: 1 (Aden) + +Agent: Great! Let me check if you have an Aden API key configured... + +[Checks for ADEN_API_KEY, not found] +[Opens browser to integration.adenhq.com/setup] + +Agent: I've opened Aden's setup page. Please: + 1. Sign in or create an account + 2. Connect your HubSpot account (OAuth happens on Aden's site) + 3. Copy the API key shown after setup + +[AskUserQuestion: "Please provide your Aden API key:"] +[User provides key] + +Agent: Would you like me to save this to your shell config (~/.zshrc) for future sessions? + +[User approves] +[Saves to ~/.zshrc and ~/.hive/configuration.json] + +Agent: Let me sync your credentials from Aden... + +[Syncs credentials from Aden server - OAuth already done on Aden's side] +[Runs health check] + +Agent: HubSpot credentials validated successfully! + +All credentials are now configured: + - ANTHROPIC_API_KEY: Stored in encrypted credential store + - HUBSPOT_ACCESS_TOKEN: Synced from Aden (OAuth completed on Aden) + - Validation passed - your agent is ready to run! 
+``` diff --git a/.github/agents/testing-agent.agent.md b/.github/agents/testing-agent.agent.md new file mode 100644 index 0000000000..91ca95e45a --- /dev/null +++ b/.github/agents/testing-agent.agent.md @@ -0,0 +1,1132 @@ +--- +description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results. +name: Testing Agent +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Testing Workflow + +This skill provides tools for testing agents built with the building-agents skills. + +## Workflow Overview + +1. `mcp__agent-builder__list_tests` - Check what tests exist +2. `mcp__agent-builder__generate_constraint_tests` or `mcp__agent-builder__generate_success_tests` - Get test guidelines +3. **Write tests directly** using the Write tool with the guidelines provided +4. `mcp__agent-builder__run_tests` - Execute tests +5. `mcp__agent-builder__debug_test` - Debug failures + +## How Test Generation Works + +The `generate_*_tests` MCP tools return **guidelines and templates** - they do NOT generate test code via LLM. +You (the assistant) write the tests directly using file operations based on the guidelines. + +### Example Workflow + +```python +# Step 1: Get test guidelines +result = mcp__agent-builder__generate_constraint_tests( + goal_id="my-goal", + goal_json='{"id": "...", "constraints": [...]}', + agent_path="exports/my_agent" +) + +# Step 2: The result contains: +# - output_file: where to write tests +# - file_header: imports and fixtures to use +# - test_template: format for test functions +# - constraints_formatted: the constraints to test +# - test_guidelines: rules for writing tests + +# Step 3: Write tests directly using file operations +Write( + file_path=result["output_file"], + content=result["file_header"] + test_code_you_write +) + +# Step 4: Run tests via MCP tool +mcp__agent-builder__run_tests( + goal_id="my-goal", + agent_path="exports/my_agent" +) + +# Step 5: Debug failures via MCP tool +mcp__agent-builder__debug_test( + goal_id="my-goal", + test_name="test_constraint_foo", + agent_path="exports/my_agent" +) +``` + +--- + +# Testing Agents with MCP Tools + +Run goal-based evaluation tests for agents built with the building-agents skills. 
+ +**Key Principle: MCP tools provide guidelines, assistant writes tests directly** +- ✅ Get guidelines: `generate_constraint_tests`, `generate_success_tests` → returns templates and guidelines +- ✅ Write tests: Use file operations with the provided file_header and test_template +- ✅ Run tests: `run_tests` (runs pytest via subprocess) +- ✅ Debug failures: `debug_test` (re-runs single test with verbose output) +- ✅ List tests: `list_tests` (scans Python test files) +- ✅ Tests stored in `exports/{agent}/tests/test_*.py` + +## Architecture: Python Test Files + +``` +exports/my_agent/ +├── __init__.py +├── agent.py ← Agent to test +├── nodes/__init__.py +├── config.py +├── __main__.py +└── tests/ ← Test files written by assistant + ├── conftest.py # Shared fixtures (auto-created) + ├── test_constraints.py + ├── test_success_criteria.py + └── test_edge_cases.py +``` + +**Tests import the agent directly:** +```python +import pytest +from exports.my_agent import default_agent + + +@pytest.mark.asyncio +async def test_happy_path(mock_mode): + """Test: {description}""" + result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode) + assert result.success + assert len(result.output) > 0 +``` + +## Why This Approach + +- MCP tools provide consistent test guidelines with proper imports, fixtures, and API key enforcement +- Assistant writes tests directly, eliminating circular LLM dependencies in the MCP server +- `run_tests` parses pytest output into structured results for iteration +- `debug_test` provides formatted output with actionable debugging info +- File headers include conftest.py setup with proper fixtures + +## Quick Start + +1. **Check existing tests** - `list_tests(goal_id, agent_path)` +2. **Get test guidelines** - `generate_constraint_tests` or `generate_success_tests` +3. **Write tests** - Use file operations with the provided file_header and guidelines +4. **Run tests** - `run_tests(goal_id, agent_path)` +5. **Debug failures** - `debug_test(goal_id, test_name, agent_path)` +6. **Iterate** - Repeat steps 4-5 until all pass + +## ⚠️ Credential Requirements for Testing + +**CRITICAL: Testing requires ALL credentials the agent depends on.** This includes both the LLM API key AND any tool-specific credentials (HubSpot, Brave Search, etc.). + +### Prerequisites + +Before running agent tests, you MUST collect ALL required credentials from the user. + +**Step 1: LLM API Key (always required)** +```bash +export ANTHROPIC_API_KEY="your-key-here" +``` + +**Step 2: Tool-specific credentials (depends on agent's tools)** + +Inspect the agent's `mcp_servers.json` and tool configuration to determine which tools the agent uses, then check for all required credentials: + +```python +from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS + +creds = CredentialManager() + +# Determine which tools the agent uses (from agent.json or mcp_servers.json) +agent_tools = [...] # e.g., ["hubspot_search_contacts", "web_search", ...] 
+ +# Find all missing credentials for those tools +missing = creds.get_missing_for_tools(agent_tools) +``` + +Common tool credentials: +| Tool | Env Var | Help URL | +|------|---------|----------| +| HubSpot CRM | `HUBSPOT_ACCESS_TOKEN` | https://developers.hubspot.com/docs/api/private-apps | +| Brave Search | `BRAVE_SEARCH_API_KEY` | https://brave.com/search/api/ | +| Google Search | `GOOGLE_SEARCH_API_KEY` + `GOOGLE_SEARCH_CX` | https://developers.google.com/custom-search | + +**Why ALL credentials are required:** +- Tests need to execute the agent's LLM nodes to validate behavior +- Tools with missing credentials will return error dicts instead of real data +- Mock mode bypasses everything, providing no confidence in real-world performance +- The `AgentRunner.run()` method validates credentials at startup and will fail fast if any are missing + +### Mock Mode Limitations + +Mock mode (`--mock` flag or `mock_mode=True`) is **ONLY for structure validation**: + +✓ Validates graph structure (nodes, edges, connections) +✓ Tests that code doesn't crash on execution +✗ Does NOT test LLM message generation +✗ Does NOT test reasoning or decision-making quality +✗ Does NOT test constraint validation (length limits, format rules) +✗ Does NOT test real API integrations or tool use +✗ Does NOT test personalization or content quality + +**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use real credentials for ALL services. + +### Enforcing Credentials in Tests + +When generating tests, **ALWAYS include credential checks for ALL required services**: + +```python +import os +import pytest +from aden_tools.credentials import CredentialManager + +# At the top of every test file +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required for real testing. Set ANTHROPIC_API_KEY or use MOCK_MODE=1 for structure validation only." +) + + +@pytest.fixture(scope="session", autouse=True) +def check_credentials(): + """Ensure ALL required credentials are set for real testing.""" + creds = CredentialManager() + mock_mode = os.environ.get("MOCK_MODE") + + # Always check LLM key + if not creds.is_available("anthropic"): + if mock_mode: + print("\n⚠️ Running in MOCK MODE - structure validation only") + print(" This does NOT test LLM behavior or agent quality") + print(" Set ANTHROPIC_API_KEY for real testing\n") + else: + pytest.fail( + "\n❌ ANTHROPIC_API_KEY not set!\n\n" + "Real testing requires an API key. Choose one:\n" + "1. Set API key (RECOMMENDED):\n" + " export ANTHROPIC_API_KEY='your-key-here'\n" + "2. Run structure validation only:\n" + " MOCK_MODE=1 pytest exports/{agent}/tests/\n\n" + "Note: Mock mode does NOT validate agent behavior or quality." 
+ ) + + # Check tool-specific credentials (skip in mock mode) + if not mock_mode: + # List the tools this agent uses - update per agent + agent_tools = [] # e.g., ["hubspot_search_contacts", "hubspot_get_contact"] + missing = creds.get_missing_for_tools(agent_tools) + if missing: + lines = ["\n❌ Missing tool credentials!\n"] + for name in missing: + spec = creds.specs.get(name) + if spec: + lines.append(f" {spec.env_var} - {spec.description}") + if spec.help_url: + lines.append(f" Setup: {spec.help_url}") + lines.append("\nSet the required environment variables and re-run.") + pytest.fail("\n".join(lines)) +``` + +### User Communication + +When the user asks to test an agent, **ALWAYS check for ALL credentials first** — not just the LLM key: + +1. **Identify the agent's tools** from `agent.json` or `mcp_servers.json` +2. **Check ALL required credentials** using `CredentialManager` +3. **Ask the user to provide any missing credentials** before proceeding + +```python +from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS + +creds = CredentialManager() + +# 1. Check LLM key +missing_creds = [] +if not creds.is_available("anthropic"): + missing_creds.append(("ANTHROPIC_API_KEY", "Anthropic API key for LLM calls")) + +# 2. Check tool-specific credentials +agent_tools = [...] # Determined from agent config +missing_tools = creds.get_missing_for_tools(agent_tools) +for name in missing_tools: + spec = CREDENTIAL_SPECS.get(name) + if spec: + missing_creds.append((spec.env_var, spec.description)) + +# 3. Present ALL missing credentials to the user at once +if missing_creds: + print("⚠️ Missing credentials required by this agent:\n") + for env_var, description in missing_creds: + print(f" • {env_var} — {description}") + print() + print("Please set the missing environment variables:") + for env_var, _ in missing_creds: + print(f" export {env_var}='your-value-here'") + print() + print("Or run in mock mode (structure validation only):") + print(" MOCK_MODE=1 pytest exports/{agent}/tests/") + + # Ask user to provide credentials or choose mock mode +``` + +**IMPORTANT:** Do NOT skip credential collection. If an agent uses HubSpot tools, the user MUST provide `HUBSPOT_ACCESS_TOKEN`. If it uses web search, the user MUST provide the appropriate search API key. Collect ALL missing credentials in a single prompt rather than discovering them one at a time during test failures. + +## The Three-Stage Flow + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ GOAL STAGE │ +│ (building-agents skill) │ +│ │ +│ 1. User defines goal with success_criteria and constraints │ +│ 2. Goal written to agent.py immediately │ +│ 3. Generate CONSTRAINT TESTS → Write to tests/ → USER APPROVAL │ +│ Files created: exports/{agent}/tests/test_constraints.py │ +└─────────────────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────────────────┐ +│ AGENT STAGE │ +│ (building-agents skill) │ +│ │ +│ Build nodes + edges, written immediately to files │ +│ Constraint tests can run during development: │ +│ run_tests(goal_id, agent_path, test_types='["constraint"]') │ +└─────────────────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────────────────┐ +│ EVAL STAGE (this skill) │ +│ │ +│ 1. Generate SUCCESS_CRITERIA TESTS → Write to tests/ → USER APPROVAL │ +│ Files created: exports/{agent}/tests/test_success_criteria.py │ +│ 2. 
Run all tests: run_tests(goal_id, agent_path) │ +│ 3. On failure → debug_test(goal_id, test_name, agent_path) │ +│ 4. Iterate: Edit agent code → Re-run run_tests (instant feedback) │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +## Step-by-Step: Testing an Agent + +### Step 1: Check Existing Tests + +**ALWAYS check first** before generating new tests: + +```python +mcp__agent-builder__list_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent" +) +``` + +### Step 2: Get Constraint Test Guidelines + +After goal is defined, get test guidelines using the MCP tool: + +```python +# First, read the goal from agent.py to get the goal JSON +goal_code = Read(file_path="exports/your_agent/agent.py") + +# Get constraint test guidelines via MCP tool +result = mcp__agent-builder__generate_constraint_tests( + goal_id="your-goal-id", + goal_json='{"id": "goal-id", "name": "...", "constraints": [...]}', + agent_path="exports/your_agent" +) +``` + +**Response includes:** +- `output_file`: Where to write tests +- `file_header`: Imports, fixtures, and pytest setup +- `test_template`: Format for test functions +- `constraints_formatted`: The constraints to test +- `test_guidelines`: Rules and best practices + +**Write tests directly** using file operations: + +```python +Write( + file_path=result["output_file"], + content=result["file_header"] + "\n\n" + your_test_code +) +``` + +### Step 3: Get Success Criteria Test Guidelines (Eval Stage) + +After agent is fully built, get success criteria test guidelines: + +```python +# Get success criteria test guidelines via MCP tool +result = mcp__agent-builder__generate_success_tests( + goal_id="your-goal-id", + goal_json='{"id": "goal-id", "name": "...", "success_criteria": [...]}', + node_names="analyze_request,search_web,format_results", + tool_names="web_search,web_scrape", + agent_path="exports/your_agent" +) +``` + +**Write tests directly** using file operations: + +```python +# Write tests using file operations +Write( + file_path=result["output_file"], + content=result["file_header"] + "\n\n" + your_test_code +) +``` + +### Step 4: Test Fixtures (conftest.py) + +The `file_header` returned by the MCP tools includes proper imports and fixtures. 
+You should also create a conftest.py file in the tests directory with shared fixtures:
+
+```python
+# Create conftest.py with the conftest template
+Write(
+    file_path="exports/your_agent/tests/conftest.py",
+    content=conftest_content  # Use PYTEST_CONFTEST_TEMPLATE format
+)
+```
+
+### Step 5: Run Tests
+
+**Use the MCP tool to run tests** (not pytest directly):
+
+```python
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent"
+)
+```
+
+**Response includes structured results:**
+```json
+{
+  "goal_id": "your-goal-id",
+  "overall_passed": false,
+  "summary": {
+    "total": 12,
+    "passed": 10,
+    "failed": 2,
+    "skipped": 0,
+    "errors": 0,
+    "pass_rate": "83.3%"
+  },
+  "test_results": [
+    {"file": "test_constraints.py", "test_name": "test_constraint_api_rate_limits", "status": "passed"},
+    {"file": "test_success_criteria.py", "test_name": "test_success_find_relevant_results", "status": "failed"}
+  ],
+  "failures": [
+    {"test_name": "test_success_find_relevant_results", "details": "AssertionError: Expected 3-5 results..."}
+  ]
+}
+```
+
+**Options for `run_tests`:**
+```python
+# Run only constraint tests
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent",
+    test_types='["constraint"]'
+)
+
+# Run with parallel workers
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent",
+    parallel=4
+)
+
+# Stop on first failure
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent",
+    fail_fast=True
+)
+```
+
+### Step 6: Debug Failed Tests
+
+**Use the MCP tool to debug** (not Bash/pytest directly):
+
+```python
+mcp__agent-builder__debug_test(
+    goal_id="your-goal-id",
+    test_name="test_success_find_relevant_results",
+    agent_path="exports/your_agent"
+)
+```
+
+**Response includes:**
+- Full verbose output from the test
+- Stack trace with exact line numbers
+- Captured logs and prints
+- Suggestions for fixing the issue
+
+### Step 7: Categorize Errors
+
+When a test fails, categorize the error to guide iteration:
+
+```python
+def categorize_test_failure(test_output, agent_code):
+    """Categorize test failure to guide iteration."""
+
+    # Read test output and agent code
+    failure_info = {
+        "test_name": "...",
+        "error_message": "...",
+        "stack_trace": "...",
+    }
+
+    # Pattern-based categorization
+    if any(pattern in failure_info["error_message"].lower() for pattern in [
+        "typeerror", "attributeerror", "keyerror", "valueerror",
+        "null", "none", "undefined", "tool call failed"
+    ]):
+        category = "IMPLEMENTATION_ERROR"
+        guidance = {
+            "stage": "Agent",
+            "action": "Fix the bug in agent code",
+            "files_to_edit": ["agent.py", "nodes/__init__.py"],
+            "restart_required": False,
+            "description": "Code bug - fix and re-run tests"
+        }
+
+    elif any(pattern in failure_info["error_message"].lower() for pattern in [
+        "assertion", "expected", "got", "should be", "success criteria"
+    ]):
+        category = "LOGIC_ERROR"
+        guidance = {
+            "stage": "Goal",
+            "action": "Update goal definition",
+            "files_to_edit": ["agent.py (goal section)"],
+            "restart_required": True,
+            "description": "Goal definition is wrong - update and rebuild"
+        }
+
+    elif any(pattern in failure_info["error_message"].lower() for pattern in [
+        "timeout", "rate limit", "empty", "boundary", "edge case"
+    ]):
+        category = "EDGE_CASE"
+        guidance = {
+            "stage": "Eval",
+            "action": "Add edge case test and fix handling",
+            "files_to_edit": ["agent.py", "tests/test_edge_cases.py"],
+            "restart_required": False,
+            "description": "New scenario - add test and handle it"
+        }
+
+    else:
+        category = "UNKNOWN"
+        guidance = {
+            "stage": "Unknown",
+            "action": "Manual investigation required",
+            "restart_required": False
+        }
+
+    return {
+        "category": category,
+        "guidance": guidance,
+        "failure_info": failure_info
+    }
+```
+
+**Show categorization to user:**
+
+```python
+AskUserQuestion(
+    questions=[{
+        "question": f"Test failed with {category}. How would you like to proceed?",
+        "header": "Test Failure",
+        "options": [
+            {
+                "label": "Fix code directly (Recommended)" if category == "IMPLEMENTATION_ERROR" else "Update goal",
+                "description": guidance["description"]
+            },
+            {
+                "label": "Show detailed error info",
+                "description": "View full stack trace and logs"
+            },
+            {
+                "label": "Skip for now",
+                "description": "Continue with other tests"
+            }
+        ],
+        "multiSelect": False
+    }]
+)
+```
+
+### Step 8: Iterate Based on Error Category
+
+#### IMPLEMENTATION_ERROR → Fix Agent Code
+
+```python
+# 1. Show user the exact file and line that failed
+print(f"Error in: exports/{agent_name}/nodes/__init__.py:42")
+print("Issue: 'NoneType' object has no attribute 'get'")
+
+# 2. Read the problematic code
+code = Read(file_path=f"exports/{agent_name}/nodes/__init__.py")
+
+# 3. User can fix directly, or you suggest a fix:
+Edit(
+    file_path=f"exports/{agent_name}/nodes/__init__.py",
+    old_string="if results.get('videos'):",
+    new_string="if results and results.get('videos'):"
+)
+
+# 4. Re-run tests immediately (instant feedback!)
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path=f"exports/{agent_name}"
+)
+```
+
+#### LOGIC_ERROR → Update Goal
+
+```python
+# 1. Show user the goal definition
+goal_code = Read(file_path=f"exports/{agent_name}/agent.py")
+
+# 2. Discuss what needs to change in success_criteria or constraints
+
+# 3. Edit the goal
+Edit(
+    file_path=f"exports/{agent_name}/agent.py",
+    old_string='target="3-5 videos"',
+    new_string='target="1-5 videos"'  # More realistic
+)
+
+# 4. May need to regenerate agent nodes if goal changed significantly
+# This requires going back to building-agents skill
+```
+
+#### EDGE_CASE → Add Test and Fix
+
+```python
+# 1. Create new edge case test with API key enforcement
+edge_case_test = '''
+@pytest.mark.asyncio
+async def test_edge_case_empty_results(mock_mode):
+    """Test: Agent handles no results gracefully"""
+    result = await default_agent.run({"query": "xyzabc123nonsense"}, mock_mode=mock_mode)
+
+    # Should succeed with empty results, not crash
+    assert result.success or result.error is not None
+    if result.success:
+        assert result.output.get("message") == "No results found"
+'''
+
+# 2. Add to test file
+Edit(
+    file_path=f"exports/{agent_name}/tests/test_edge_cases.py",
+    old_string="# Add edge case tests here",
+    new_string=edge_case_test
+)
+
+# 3. Fix agent to handle edge case
+# Edit agent code to handle empty results
+
+# 4. Re-run tests
+```
+
+## Test File Templates (Reference Only)
+
+**⚠️ Do NOT copy-paste these templates directly.** Use `generate_constraint_tests` and `generate_success_tests` MCP tools to create properly structured tests with correct imports and fixtures.
+
+These templates show the structure of generated tests for reference only.
+
+### Constraint Test Template
+
+```python
+"""Constraint tests for {agent_name}.
+
+These tests validate that the agent respects its defined constraints.
+Requires ANTHROPIC_API_KEY for real testing.
+""" + +import os +import pytest +from exports.{agent_name} import default_agent +from aden_tools.credentials import CredentialManager + + +# Enforce API key for real testing +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." +) + + +@pytest.mark.asyncio +async def test_constraint_{constraint_id}(): + """Test: {constraint_description}""" + # Test implementation based on constraint type + mock_mode = bool(os.environ.get("MOCK_MODE")) + result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode) + + # Assert constraint is respected + assert True # Replace with actual check +``` + +### Success Criteria Test Template + +```python +"""Success criteria tests for {agent_name}. + +These tests validate that the agent achieves its defined success criteria. +Requires ANTHROPIC_API_KEY for real testing - mock mode cannot validate success criteria. +""" + +import os +import pytest +from exports.{agent_name} import default_agent +from aden_tools.credentials import CredentialManager + + +# Enforce API key for real testing +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." +) + + +@pytest.mark.asyncio +async def test_success_{criteria_id}(): + """Test: {criteria_description}""" + mock_mode = bool(os.environ.get("MOCK_MODE")) + result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode) + + assert result.success, f"Agent failed: {{result.error}}" + + # Verify success criterion met + # e.g., assert metric meets target + assert True # Replace with actual check +``` + +### Edge Case Test Template + +```python +"""Edge case tests for {agent_name}. + +These tests validate agent behavior in unusual or boundary conditions. +Requires ANTHROPIC_API_KEY for real testing. +""" + +import os +import pytest +from exports.{agent_name} import default_agent +from aden_tools.credentials import CredentialManager + + +# Enforce API key for real testing +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." +) + + +@pytest.mark.asyncio +async def test_edge_case_{scenario_name}(): + """Test: Agent handles {scenario_description}""" + mock_mode = bool(os.environ.get("MOCK_MODE")) + result = await default_agent.run({{"edge": "case_input"}}, mock_mode=mock_mode) + + # Verify graceful handling + assert result.success or result.error is not None +``` + +## Interactive Build + Test Loop + +During agent construction (Agent stage), you can run constraint tests incrementally: + +```python +# After adding first node +print("Added search_node. Running relevant constraint tests...") +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path=f"exports/{agent_name}", + test_types='["constraint"]' +) + +# After adding second node +print("Added filter_node. Running all constraint tests...") +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path=f"exports/{agent_name}", + test_types='["constraint"]' +) +``` + +This provides **immediate feedback** during development, catching issues early. + +## Common Test Patterns + +**Note:** All test patterns should include API key enforcement via conftest.py. 
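+
+The patterns below rely on a `mock_mode` fixture. A minimal conftest.py sketch is shown here for reference only — in practice, use the conftest template / `file_header` returned by the MCP tools, plus the credential checks shown earlier; the fixture body below is illustrative:
+
+```python
+"""Shared fixtures for agent tests (illustrative sketch)."""
+
+import os
+
+import pytest
+
+
+@pytest.fixture(scope="session")
+def mock_mode() -> bool:
+    """True when MOCK_MODE=1 is set (structure validation only, no real LLM calls)."""
+    return bool(os.environ.get("MOCK_MODE"))
+```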
+ +### ⚠️ CRITICAL: Framework Features You Must Know + +#### OutputCleaner - Automatic I/O Cleaning (NEW!) + +**The framework now automatically validates and cleans node outputs** using a fast LLM (Cerebras llama-3.3-70b) at edge traversal time. This prevents cascading failures from malformed output. + +**What OutputCleaner does**: +- ✅ Validates output matches next node's input schema +- ✅ Detects JSON parsing trap (entire response in one key) +- ✅ Cleans malformed output automatically (~200-500ms, ~$0.001 per cleaning) +- ✅ Boosts success rates by 1.8-2.2x + +**Impact on tests**: Tests should still use safe patterns because OutputCleaner may not catch all issues in test mode. + +#### Safe Test Patterns (REQUIRED) + +**❌ UNSAFE** (will cause test failures): +```python +# Direct key access - can crash! +approval_decision = result.output["approval_decision"] +``` + +**✅ SAFE** (correct patterns): +```python +# 1. Safe dict access with .get() +output = result.output or {} +approval_decision = output.get("approval_decision", "UNKNOWN") +assert "APPROVED" in approval_decision or approval_decision == "APPROVED" + +# 2. Type checking before operations +analysis = output.get("analysis", {}) +if isinstance(analysis, dict): + category = analysis.get("category", "unknown") + +# 3. Parse JSON from strings (the JSON parsing trap!) +import json +recommendation = output.get("recommendation", "{}") +if isinstance(recommendation, str): + try: + parsed = json.loads(recommendation) + if isinstance(parsed, dict): + approval = parsed.get("approval_decision", "UNKNOWN") + except json.JSONDecodeError: + approval = "UNKNOWN" +elif isinstance(recommendation, dict): + approval = recommendation.get("approval_decision", "UNKNOWN") + +# 4. Safe iteration with type check +compliance_issues = output.get("compliance_issues", []) +if isinstance(compliance_issues, list): + for issue in compliance_issues: + ... 
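+
+# 5. (Illustrative addition) Guard list access before indexing
+#    The "results" key is only an example; adapt to your agent's output schema.
+results_list = output.get("results", [])
+first = results_list[0] if isinstance(results_list, list) and results_list else None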
+``` + +#### Helper Functions for Safe Access + +**Add to conftest.py**: +```python +import json +import re + +def _parse_json_from_output(result, key): + """Parse JSON from agent output (framework may store full LLM response as string).""" + response_text = result.output.get(key, "") + # Remove markdown code blocks if present + json_text = re.sub(r'```json\s*|\s*```', '', response_text).strip() + + try: + return json.loads(json_text) + except (json.JSONDecodeError, AttributeError, TypeError): + return result.output.get(key) + +def safe_get_nested(result, key_path, default=None): + """Safely get nested value from result.output.""" + output = result.output or {} + current = output + + for key in key_path: + if isinstance(current, dict): + current = current.get(key) + elif isinstance(current, str): + try: + json_text = re.sub(r'```json\s*|\s*```', '', current).strip() + parsed = json.loads(json_text) + if isinstance(parsed, dict): + current = parsed.get(key) + else: + return default + except json.JSONDecodeError: + return default + else: + return default + + return current if current is not None else default + +# Make available in tests +pytest.parse_json_from_output = _parse_json_from_output +pytest.safe_get_nested = safe_get_nested +``` + +**Usage in tests**: +```python +# Use helper to parse JSON safely +parsed = pytest.parse_json_from_output(result, "recommendation") +if isinstance(parsed, dict): + approval = parsed.get("approval_decision", "UNKNOWN") + +# Safe nested access +risk_score = pytest.safe_get_nested(result, ["analysis", "risk_score"], default=0.0) +``` + +#### Test Count Guidance + +**Generate 8-15 tests total, NOT 30+** + +- ✅ 2-3 tests per success criterion +- ✅ 1 happy path test +- ✅ 1 boundary/edge case test +- ✅ 1 error handling test (optional) + +**Why fewer tests?**: +- Each test requires real LLM call (~3 seconds, costs money) +- 30 tests = 90 seconds, $0.30+ in costs +- 12 tests = 36 seconds, $0.12 in costs +- Focus on quality over quantity + +#### ExecutionResult Fields (Important!) + +**`result.success=True` means NO exception, NOT goal achieved** + +```python +# ❌ WRONG - assumes goal achieved +assert result.success + +# ✅ RIGHT - check success AND output +assert result.success, f"Agent failed: {result.error}" +output = result.output or {} +approval = output.get("approval_decision") +assert approval == "APPROVED", f"Expected APPROVED, got {approval}" +``` + +**All ExecutionResult fields**: +- `success: bool` - Execution completed without exception (NOT goal achieved!) 
+- `output: dict` - Complete memory snapshot (may contain raw strings) +- `error: str | None` - Error message if failed +- `steps_executed: int` - Number of nodes executed +- `total_tokens: int` - Cumulative token usage +- `total_latency_ms: int` - Total execution time +- `path: list[str]` - Node IDs traversed +- `paused_at: str | None` - Node ID if HITL pause occurred +- `session_state: dict` - State for resuming + +### Happy Path Test +```python +@pytest.mark.asyncio +async def test_happy_path(mock_mode): + """Test normal successful execution""" + result = await default_agent.run({"query": "test"}, mock_mode=mock_mode) + assert result.success + assert len(result.output) > 0 +``` + +### Boundary Condition Test +```python +@pytest.mark.asyncio +async def test_boundary_minimum(mock_mode): + """Test at minimum threshold""" + result = await default_agent.run({"query": "specific topic"}, mock_mode=mock_mode) + assert result.success + assert len(result.output.get("results", [])) >= 1 +``` + +### Error Handling Test +```python +@pytest.mark.asyncio +async def test_error_handling(mock_mode): + """Test graceful error handling""" + result = await default_agent.run({"query": ""}, mock_mode=mock_mode) + assert not result.success or result.output.get("error") is not None +``` + +### Performance Test +```python +@pytest.mark.asyncio +async def test_performance_latency(mock_mode): + """Test response time is acceptable""" + import time + start = time.time() + result = await default_agent.run({"query": "test"}, mock_mode=mock_mode) + duration = time.time() - start + assert duration < 5.0, f"Took {duration}s, expected <5s" +``` + +## Integration with building-agents + +### Handoff Points + +| Scenario | From | To | Action | +|----------|------|-----|--------| +| Agent built, ready to test | building-agents | testing-agent | Generate success tests | +| LOGIC_ERROR found | testing-agent | building-agents | Update goal, rebuild | +| IMPLEMENTATION_ERROR | testing-agent | Direct fix | Edit agent files, re-run tests | +| EDGE_CASE found | testing-agent | testing-agent | Add edge case test | +| All tests pass | testing-agent | Done | Agent validated ✅ | + +### Iteration Speed Comparison + +| Scenario | Old Approach | New Approach | +|----------|--------------|--------------| +| **Bug Fix** | Rebuild via MCP tools (14 min) | Edit Python file, pytest (2 min) | +| **Add Test** | Generate via MCP, export (5 min) | Write test file directly (1 min) | +| **Debug** | Read subprocess logs | pdb, breakpoints, prints | +| **Inspect** | Limited visibility | Full Python introspection | + +## Anti-Patterns + +### Testing Best Practices + +| Don't | Do Instead | +|-------|------------| +| ❌ Write tests without getting guidelines first | ✅ Use `generate_*_tests` to get proper file_header and guidelines | +| ❌ Run pytest via Bash | ✅ Use `run_tests` MCP tool for structured results | +| ❌ Debug tests with Bash pytest -vvs | ✅ Use `debug_test` MCP tool for formatted output | +| ❌ Check for tests with Glob | ✅ Use `list_tests` MCP tool | +| ❌ Skip the file_header from guidelines | ✅ Always include the file_header for proper imports and fixtures | + +### General Testing + +| Don't | Do Instead | +|-------|------------| +| ❌ Treat all failures the same | ✅ Use debug_test to categorize and iterate appropriately | +| ❌ Rebuild entire agent for small bugs | ✅ Edit code directly, re-run tests | +| ❌ Run tests without API key | ✅ Always set ANTHROPIC_API_KEY first | +| ❌ Write tests without understanding the constraints/criteria | ✅ Read 
the formatted constraints/criteria from guidelines | + +## Workflow Summary + +``` +1. Check existing tests: list_tests(goal_id, agent_path) + → Scans exports/{agent}/tests/test_*.py + ↓ +2. Get test guidelines: generate_constraint_tests, generate_success_tests + → Returns file_header, test_template, constraints/criteria, guidelines + ↓ +3. Write tests: Use file operations with the provided guidelines + → Write tests to exports/{agent}/tests/test_*.py + ↓ +4. Run tests: run_tests(goal_id, agent_path) + → Executes: pytest exports/{agent}/tests/ -v + ↓ +5. Debug failures: debug_test(goal_id, test_name, agent_path) + → Re-runs single test with verbose output + ↓ +6. Fix based on category: + - IMPLEMENTATION_ERROR → Edit agent code directly + - ASSERTION_FAILURE → Fix agent logic or update test + - IMPORT_ERROR → Check package structure + - API_ERROR → Check API keys and connectivity + ↓ +7. Re-run tests: run_tests(goal_id, agent_path) + ↓ +8. Repeat until all pass ✅ +``` + +## MCP Tools Reference + +```python +# Check existing tests (scans Python test files) +mcp__agent-builder__list_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent" +) + +# Get constraint test guidelines (returns templates and guidelines, NOT generated tests) +mcp__agent-builder__generate_constraint_tests( + goal_id="your-goal-id", + goal_json='{"id": "...", "constraints": [...]}', + agent_path="exports/your_agent" +) +# Returns: output_file, file_header, test_template, constraints_formatted, test_guidelines + +# Get success criteria test guidelines +mcp__agent-builder__generate_success_tests( + goal_id="your-goal-id", + goal_json='{"id": "...", "success_criteria": [...]}', + node_names="node1,node2", + tool_names="tool1,tool2", + agent_path="exports/your_agent" +) +# Returns: output_file, file_header, test_template, success_criteria_formatted, test_guidelines + +# Run tests via pytest subprocess +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent" +) + +# Debug a failed test (re-runs with verbose output) +mcp__agent-builder__debug_test( + goal_id="your-goal-id", + test_name="test_constraint_foo", + agent_path="exports/your_agent" +) +``` + +## run_tests Options + +```python +# Run only constraint tests +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + test_types='["constraint"]' +) + +# Run only success criteria tests +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + test_types='["success"]' +) + +# Run with pytest-xdist parallelism (requires pytest-xdist) +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + parallel=4 +) + +# Stop on first failure +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + fail_fast=True +) +``` + +## Direct pytest Commands + +You can also run tests directly with pytest (the MCP tools use pytest internally): + +```bash +# Run all tests +pytest exports/your_agent/tests/ -v + +# Run specific test file +pytest exports/your_agent/tests/test_constraints.py -v + +# Run specific test +pytest exports/your_agent/tests/test_constraints.py::test_constraint_foo -vvs + +# Run in mock mode (structure validation only) +MOCK_MODE=1 pytest exports/your_agent/tests/ -v +``` + +--- + +**MCP tools generate tests, write them to Python files, and run them via pytest.** +```` diff --git a/.gitignore b/.gitignore index adbb2814ac..22078b7d68 100644 --- a/.gitignore +++ b/.gitignore @@ -23,6 
+23,8 @@ docker-compose.override.yml .vscode/* !.vscode/extensions.json !.vscode/settings.json.example +!.vscode/mcp.json +!.vscode/settings.json *.swp *.swo *~ diff --git a/.vscode/mcp.json b/.vscode/mcp.json new file mode 100644 index 0000000000..1bd389fe37 --- /dev/null +++ b/.vscode/mcp.json @@ -0,0 +1,34 @@ +{ + "servers": { + "agent-builder": { + "type": "stdio", + "command": "uv", + "args": [ + "run", + "--directory", + "${workspaceFolder}/core", + "python", + "-m", + "framework.mcp.agent_builder_server" + ], + "env": { + "PYTHONPATH": "${workspaceFolder}/tools/src" + } + }, + "tools": { + "type": "stdio", + "command": "uv", + "args": [ + "run", + "--directory", + "${workspaceFolder}/tools", + "python", + "mcp_server.py", + "--stdio" + ], + "env": { + "PYTHONPATH": "${workspaceFolder}/tools/src:${workspaceFolder}/core" + } + } + } +} diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 0000000000..c303884ae6 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,10 @@ +{ + // Enable Agent Skills (experimental feature as of VS Code 1.108) + "chat.useAgentSkills": true, + + // MCP Access Level + "chat.mcp.access": "all", + + // Auto-start MCP servers (experimental) + "chat.mcp.autostart": "newAndOutdated", +} diff --git a/DEVELOPER.md b/DEVELOPER.md index be3bd6fc10..49643a9adf 100644 --- a/DEVELOPER.md +++ b/DEVELOPER.md @@ -26,6 +26,7 @@ Aden Agent Framework is a Python-based system for building goal-driven, self-imp | **tools** | `/tools` | MCP tools for agent capabilities | Python 3.11+ | | **exports** | `/exports` | Agent packages (user-created, gitignored) | Python 3.11+ | | **skills** | `.claude` | Claude Code skills for building/testing | Markdown | +| **agents** | `.github/agents` | VS Code custom agents | Markdown | ### Key Principles @@ -46,7 +47,10 @@ Ensure you have installed: - **Python 3.11+** - [Download](https://www.python.org/downloads/) (3.12 or 3.13 recommended) - **uv** - Python package manager ([Install](https://docs.astral.sh/uv/getting-started/installation/)) - **git** - Version control -- **Claude Code** - [Install](https://docs.anthropic.com/claude/docs/claude-code) (optional, for using building skills) +- **IDE** (pick one): + - **VS Code** with [GitHub Copilot](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) - [Setup guide](docs/vscode-copilot-setup.md) + - **Cursor** - [Download](https://cursor.sh/) + - **Claude Code** - [Install](https://docs.anthropic.com/claude/docs/claude-code) Verify installation: @@ -217,19 +221,39 @@ hive/ # Repository root ## Building Agents -### Using Claude Code Skills +### Using IDE Skills/Agents -The fastest way to build agents is using the Claude Code skills: +The fastest way to build agents is using your IDE's skills or agents. These are installed/configured automatically when you run `./quickstart.sh`. 
+**Claude Code:** ```bash -# Install skills (one-time) -./quickstart.sh - -# Build a new agent +# Skills are available in .claude/skills/ +# Use them directly in Claude Code: claude> /building-agents-construction - -# Test the agent claude> /testing-agent +claude> /agent-workflow +``` + +**Cursor:** +```bash +# Skills are available via MCP in .cursor/mcp.json +# Type / in Agent chat to access them: +/building-agents-construction +/testing-agent +/agent-workflow +``` + +**VS Code + GitHub Copilot:** +```bash +# Custom agents are available in .github/agents/ +# Open Copilot Chat (Cmd/Ctrl + Shift + I) +# Click the mode dropdown at the top +# Select a custom agent: +# - agent-workflow (for complete workflow) +# - building-agents-construction (step-by-step) +# - testing-agent (testing) +# Then type your request: +Build a customer support agent ``` ### Agent Development Workflow diff --git a/README.md b/README.md index e1e2cd02e4..fe0c321428 100644 --- a/README.md +++ b/README.md @@ -91,7 +91,7 @@ Aden is a platform for building, deploying, operating, and adapting AI agents: ## Prerequisites - Python 3.11+ for agent development -- Claude Code or Cursor for utilizing agent skills +- IDE with MCP support: [VS Code](https://code.visualstudio.com/) + [GitHub Copilot](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot), [Cursor](https://cursor.sh/), or [Claude Code](https://claude.ai/) > **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell. @@ -113,18 +113,38 @@ This sets up: ### Build Your First Agent +**Claude Code:** ```bash -# Build an agent using Claude Code +# Use skills directly claude> /building-agents-construction - -# Test your agent claude> /testing-agent +``` + +**Cursor:** +```bash +# Type / in Agent chat +/building-agents-construction +/testing-agent +``` -# Run your agent -PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}' +**VS Code + GitHub Copilot:** +```bash +# Open Copilot Chat (Cmd/Ctrl + Shift + I) +# Select a custom agent from the mode dropdown: +# - agent-workflow +# - building-agents-construction +# - testing-agent +# Then type your request: +Build a file monitor agent +``` + +**Run your agent:** +```bash +PYTHONPATH=core:exports python -m your_agent_name run --input '{...}' ``` **[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development +**[VS Code + GitHub Copilot Setup](docs/vscode-copilot-setup.md)** ### Cursor IDE Support diff --git a/docs/vscode-copilot-setup.md b/docs/vscode-copilot-setup.md new file mode 100644 index 0000000000..edc645d9d5 --- /dev/null +++ b/docs/vscode-copilot-setup.md @@ -0,0 +1,295 @@ +# VS Code + GitHub Copilot Setup Guide + +This guide helps you set up VS Code with GitHub Copilot to use Hive's MCP servers and custom agents for building and testing AI agents. + +## Prerequisites + +- **VS Code** version 1.102 or later (for MCP support) +- **GitHub Copilot** extension installed and activated +- **Python 3.11+** installed +- **uv** package manager ([installation guide](https://docs.astral.sh/uv/)) + +## Quick Start + +The Hive repository comes pre-configured with VS Code support! If you cloned the repository, MCP servers and custom agents are already set up. + +### 1. Verify Installation + +Open VS Code in the Hive repository: + +```bash +cd hive +code . +``` + +### 2. 
Check MCP Configuration + +The `.vscode/mcp.json` file should contain two MCP servers: + +- **agent-builder** - Tools for creating and testing agents +- **tools** - 19 tools for agent capabilities (web search, file operations, etc.) + +You can view the configuration: + +```bash +cat .vscode/mcp.json +``` + +### 3. Verify Custom Agents + +Custom agents are available in `.github/agents/`: + +```bash +ls .github/agents/ +``` + +You should see 6 `.agent.md` files: +- `agent-workflow.agent.md` - Complete agent development workflow +- `building-agents-core.agent.md` - Core concepts and fundamentals +- `building-agents-construction.agent.md` - Step-by-step agent building +- `building-agents-patterns.agent.md` - Best practices and patterns +- `testing-agent.agent.md` - Testing and validation +- `setup-credentials.agent.md` - Credential management + +### 4. Enable MCP in VS Code Settings + +The `.vscode/settings.json` is pre-configured with: + +```jsonc +{ + // Enable Agent Skills (experimental) + "chat.useAgentSkills": true, + + // Enable MCP servers + "chat.mcp.access": true, + + // Auto-start MCP servers + "chat.mcp.autostart": true +} +``` + +### 5. Test the Setup + +Open GitHub Copilot Chat (Cmd/Ctrl + Shift + I): + +1. Click the mode dropdown at the top of the chat panel +2. You'll see standard modes (Ask, Plan, Agent) plus your custom agents: + - `agent-workflow` + - `building-agents-construction` + - `building-agents-core` + - `building-agents-patterns` + - `testing-agent` + - `setup-credentials` + +3. Select a custom agent (e.g., `agent-workflow`) +4. Ask the agent to help you: + ``` + Build a simple file monitor agent + ``` + +The custom agent will guide you through the process with access to MCP tools. + +## Understanding the Setup + +### MCP Configuration (`.vscode/mcp.json`) + +MCP servers provide tools that GitHub Copilot can use. The Hive repository includes two servers: + +#### agent-builder Server + +Provides tools for building and testing agents: +- `create_session` - Start a new agent build session +- `add_node` - Add nodes to agent workflow +- `add_edge` - Connect nodes with edges +- `set_goal` - Define agent goals and success criteria +- `test_node` - Validate node configuration +- `validate_graph` - Check agent structure +- `generate_constraint_tests` - Create constraint tests +- `generate_success_tests` - Create success criteria tests +- `run_tests` - Execute agent tests +- `debug_test` - Debug test failures + +#### tools Server + +Provides 19 operational tools: +- **Web**: `web_search`, `web_scrape`, `fetch_webpage` +- **Files**: `read_file`, `write_file`, `list_directory`, `file_search` +- **Shell**: `run_command` +- **Git**: `git_status`, `git_diff`, `git_commit` +- **AI**: `llm_generate`, `llm_extract_json` +- And more... + +### Custom Agents (`.github/agents/*.agent.md`) + +Custom agents are specialized assistants that guide specific tasks. They have access to MCP tools and workspace context. + +#### Available Agents + +1. **agent-workflow** - Orchestrates the complete agent development process from concept to production +2. **building-agents-core** - Teaches agent architecture, node types, and core concepts +3. **building-agents-construction** - Guides step-by-step agent building with interactive approval +4. **building-agents-patterns** - Provides best practices, design patterns, and anti-patterns +5. **testing-agent** - Creates and runs comprehensive test suites for agents +6. 
**setup-credentials** - Manages API keys and credentials securely + +#### Using Custom Agents + +To use a custom agent: + +1. Open Copilot Chat (Cmd/Ctrl + Shift + I) +2. Click the **mode dropdown** at the top of the chat panel +3. Select the specific custom agent you want to use (they appear alongside Ask, Plan, and Agent modes): + - **agent-workflow** - "I want to build a sales prospecting agent" + - **building-agents-core** - "Explain node types and agent architecture" + - **building-agents-construction** - "Create a new agent step by step" + - **building-agents-patterns** - "Show me best practices for error handling" + - **testing-agent** - "Test the agent in exports/my_agent" + - **setup-credentials** - "Configure credentials for hubspot-agent" +4. Type your request in the chat + +The selected custom agent will guide you through the task with specialized knowledge and access to MCP tools. + +## Troubleshooting + +### MCP Servers Not Starting + +**Symptoms**: Copilot doesn't have access to MCP tools + +**Solutions**: + +1. Check VS Code version (must be 1.102+) +2. Verify `uv` is installed: `uv --version` +3. Check VS Code Output panel → "MCP" for error messages +4. Manually restart MCP servers: + - Open Command Palette (Cmd/Ctrl + Shift + P) + - Run: "GitHub Copilot: Restart MCP Servers" + +### Custom Agents Not Available + +**Symptoms**: Custom agents don't appear in the mode dropdown + +**Solutions**: + +1. Verify `chat.useAgentSkills` is `true` in `.vscode/settings.json` +2. Check `.github/agents/` directory exists with 6 `.agent.md` files +3. Reload VS Code window: "Developer: Reload Window" (Cmd/Ctrl + Shift + P) +4. Check VS Code version (1.108+ required for custom agents) +5. Ensure GitHub Copilot extension is up to date + +### Permission Errors + +**Symptoms**: "Permission denied" when MCP tries to run Python + +**Solutions**: + +1. Ensure Python 3.11+ is in PATH: `python --version` +2. Verify `uv` can run Python: `uv run python --version` +3. Check file permissions on `core/` and `tools/` directories + +### Tool Import Errors + +**Symptoms**: MCP server fails with "ModuleNotFoundError" + +**Solutions**: + +1. Install dependencies: + ```bash + cd core && uv sync + cd ../tools && uv sync + ``` + +2. Verify PYTHONPATH in `.vscode/mcp.json`: + ```json + "env": { + "PYTHONPATH": "${workspaceFolder}/tools/src:${workspaceFolder}/core" + } + ``` + +## Advanced Configuration + +### Adding Custom MCP Servers + +To add your own MCP servers, edit `.vscode/mcp.json`: + +```json +{ + "servers": { + "my-server": { + "type": "stdio", + "command": "uv", + "args": [ + "run", + "--directory", + "${workspaceFolder}/my-server", + "python", + "server.py" + ], + "env": { + "PYTHONPATH": "${workspaceFolder}/my-server" + } + } + } +} +``` + +### Creating Custom Agents + +Create a new `.agent.md` file in `.github/agents/`: + +```markdown +--- +description: Your agent description +name: My Custom Agent +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Your Agent Content + +Instructions and guidance for your custom agent... +``` + +### Environment Variables + +MCP servers can access environment variables. Common ones: + +- `ANTHROPIC_API_KEY` - For LLM calls in agents +- `HUBSPOT_ACCESS_TOKEN` - For HubSpot integration tools +- `BRAVE_SEARCH_API_KEY` - For web search tools + +Set these in your shell profile (`~/.bashrc`, `~/.zshrc`) or use the `setup-credentials` agent. 
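+
+For example, appended to your shell profile (values below are placeholders):
+
+```bash
+# Credentials used by Hive MCP servers and agents (placeholder values)
+export ANTHROPIC_API_KEY="your-anthropic-key"
+export HUBSPOT_ACCESS_TOKEN="your-hubspot-token"
+export BRAVE_SEARCH_API_KEY="your-brave-search-key"
+```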
+ +## Differences from Other IDEs + +| Feature | VS Code | Cursor | Claude Code | +|---------|---------|--------|-------------| +| **MCP Config** | `.vscode/mcp.json` | `.cursor/mcp.json` | Built-in | +| **Agents/Skills** | `.github/agents/*.agent.md` | Symlinks in `.cursor/skills/` | `.claude/skills/` | +| **Path Variables** | `${workspaceFolder}` | Relative paths | Relative paths | +| **Discovery** | Workspace settings | IDE-specific | Built-in | +| **Setup** | Pre-configured | Pre-configured | Pre-configured | + +All IDEs in this repository have equivalent functionality - choose based on your preference! + +## Next Steps + +- **Build your first agent**: Open Copilot Chat, select `agent-workflow` from the mode dropdown, and describe what you want to build +- **Read the docs**: Check `docs/getting-started.md` for tutorials +- **Explore examples**: See `exports/` for example agents +- **Join the community**: [Discord](https://discord.com/invite/MXE49hrKDk) + +## Resources + +- [VS Code MCP Documentation](https://code.visualstudio.com/docs/copilot/customization/mcp-servers) +- [VS Code Custom Agents Documentation](https://code.visualstudio.com/docs/copilot/customization/custom-agents) +- [Hive Documentation](https://docs.adenhq.com/) +- [GitHub Copilot Extension](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) + +## Support + +Having issues? + +1. Check the [troubleshooting section](#troubleshooting) above +2. Search [GitHub Issues](https://github.com/adenhq/hive/issues) +3. Ask on [Discord](https://discord.com/invite/MXE49hrKDk) +4. [Open a new issue](https://github.com/adenhq/hive/issues/new)