diff --git a/.github/agents/agent-workflow.agent.md b/.github/agents/agent-workflow.agent.md new file mode 100644 index 0000000000..7d2065cc6a --- /dev/null +++ b/.github/agents/agent-workflow.agent.md @@ -0,0 +1,455 @@ +--- +description: Complete workflow for building, implementing, and testing goal-driven agents. Orchestrates building-agents-core, building-agents-construction, building-agents-patterns, testing-agent, and setup-credentials skills. Use when starting a new agent project, unsure which skill to use, or need end-to-end guidance. +name: Agent Workflow +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Agent Development Workflow + +Complete Standard Operating Procedure (SOP) for building production-ready goal-driven agents. + +## Overview + +This workflow orchestrates specialized skills to take you from initial concept to production-ready agent: + +1. **Understand Concepts** → building-agents-core (optional) +2. **Build Structure** → building-agents-construction +3. **Optimize Design** → building-agents-patterns (optional) +4. **Setup Credentials** → setup-credentials (if agent uses tools requiring API keys) +5. **Test & Validate** → testing-agent + +## When to Use This Workflow + +Use this skill when: +- Starting a new agent from scratch +- Unclear which skill to use first +- Need end-to-end guidance for agent development +- Want consistent, repeatable agent builds + +**Skip this workflow** if: +- You only need to test an existing agent → use testing-agent directly +- You know exactly which phase you're in → use specific skill directly + +## Quick Decision Tree + +``` +"Need to understand agent concepts" → building-agents-core +"Build a new agent" → building-agents-construction +"Optimize my agent design" → building-agents-patterns +"Set up API keys for my agent" → setup-credentials +"Test my agent" → testing-agent +"Not sure what I need" → Read phases below, then decide +"Agent has structure but needs implementation" → See agent directory STATUS.md +``` + +## Phase 0: Understand Concepts (Optional) + +**Duration**: 5-10 minutes +**Skill**: building-agents-core +**Input**: Questions about agent architecture + +### When to Use + +- First time building an agent +- Need to understand node types, edges, goals +- Want to validate tool availability +- Learning about pause/resume architecture + +### What This Phase Provides + +- Architecture overview (Python packages, not JSON) +- Core concepts (Goal, Node, Edge, Pause/Resume) +- Tool discovery and validation procedures +- Workflow overview + +**Skip this phase** if you already understand agent fundamentals. + +## Phase 1: Build Agent Structure + +**Duration**: 15-30 minutes +**Skill**: building-agents-construction +**Input**: User requirements ("Build an agent that...") + +### What This Phase Does + +Creates the complete agent architecture: +- Package structure (`exports/agent_name/`) +- Goal with success criteria and constraints +- Workflow graph (nodes and edges) +- Node specifications +- CLI interface +- Documentation + +### Process + +1. **Create package** - Directory structure with skeleton files +2. **Define goal** - Success criteria and constraints written to agent.py +3. **Design nodes** - Each node approved and written incrementally +4. **Connect edges** - Workflow graph with conditional routing +5. 
**Finalize** - Agent class, exports, and documentation + +### Outputs + +- ✅ `exports/agent_name/` package created +- ✅ Goal defined in agent.py +- ✅ 3-5 success criteria defined +- ✅ 1-5 constraints defined +- ✅ 5-10 nodes specified in nodes/__init__.py +- ✅ 8-15 edges connecting workflow +- ✅ Validated structure (passes `python -m agent_name validate`) +- ✅ README.md with usage instructions +- ✅ CLI commands (info, validate, run, shell) + +### Success Criteria + +You're ready for Phase 2 when: +- Agent structure validates without errors +- All nodes and edges are defined +- CLI commands work (info, validate) +- You see: "Agent complete: exports/agent_name/" + +### Common Outputs + +The building-agents-construction skill produces: +``` +exports/agent_name/ +├── __init__.py (package exports) +├── __main__.py (CLI interface) +├── agent.py (goal, graph, agent class) +├── nodes/__init__.py (node specifications) +├── config.py (configuration) +├── implementations.py (may be created for Python functions) +└── README.md (documentation) +``` + +### Next Steps + +**If structure complete and validated:** +→ Check `exports/agent_name/STATUS.md` or `IMPLEMENTATION_GUIDE.md` +→ These files explain implementation options +→ You may need to add Python functions or MCP tools (not covered by current skills) + +**If want to optimize design:** +→ Proceed to Phase 1.5 (building-agents-patterns) + +**If ready to test:** +→ Proceed to Phase 2 + +## Phase 1.5: Optimize Design (Optional) + +**Duration**: 10-15 minutes +**Skill**: building-agents-patterns +**Input**: Completed agent structure + +### When to Use + +- Want to add pause/resume functionality +- Need error handling patterns +- Want to optimize performance +- Need examples of complex routing +- Want best practices guidance + +### What This Phase Provides + +- Practical examples and patterns +- Pause/resume architecture +- Error handling strategies +- Anti-patterns to avoid +- Performance optimization techniques + +**Skip this phase** if your agent design is straightforward. + +## Phase 2: Test & Validate + +**Duration**: 20-40 minutes +**Skill**: testing-agent +**Input**: Working agent from Phase 1 + +### What This Phase Does + +Creates comprehensive test suite: +- Constraint tests (verify hard requirements) +- Success criteria tests (measure goal achievement) +- Edge case tests (handle failures gracefully) +- Integration tests (end-to-end workflows) + +### Process + +1. **Analyze agent** - Read goal, constraints, success criteria +2. **Generate tests** - Create pytest files in `exports/agent_name/tests/` +3. **User approval** - Review and approve each test +4. **Run evaluation** - Execute tests and collect results +5. **Debug failures** - Identify and fix issues +6. **Iterate** - Repeat until all tests pass + +### Outputs + +- ✅ Test files in `exports/agent_name/tests/` +- ✅ Test report with pass/fail metrics +- ✅ Coverage of all success criteria +- ✅ Coverage of all constraints +- ✅ Edge case handling verified + +### Success Criteria + +You're done when: +- All tests pass +- All success criteria validated +- All constraints verified +- Agent handles edge cases +- Test coverage is comprehensive + +### Next Steps + +**Agent ready for:** +- Production deployment +- Integration into larger systems +- Documentation and handoff +- Continuous monitoring + +## Phase Transitions + +### From Phase 1 to Phase 2 + +**Trigger signals:** +- "Agent complete: exports/..." 
+- Structure validation passes +- README indicates implementation complete + +**Before proceeding:** +- Verify agent can be imported: `from exports.agent_name import default_agent` +- Check if implementation is needed (see STATUS.md or IMPLEMENTATION_GUIDE.md) +- Confirm agent executes without import errors + +### Skipping Phases + +**When to skip Phase 1:** +- Agent structure already exists +- Only need to add tests +- Modifying existing agent + +**When to skip Phase 2:** +- Prototyping or exploring +- Agent not production-bound +- Manual testing sufficient + +## Common Patterns + +### Pattern 1: Complete New Build (Simple) + +``` +User: "Build an agent that monitors files" +→ Use building-agents-construction +→ Agent structure created +→ Use testing-agent +→ Tests created and passing +→ Done: Production-ready agent +``` + +### Pattern 1b: Complete New Build (With Learning) + +``` +User: "Build an agent (first time)" +→ Use building-agents-core (understand concepts) +→ Use building-agents-construction (build structure) +→ Use building-agents-patterns (optimize design) +→ Use testing-agent (validate) +→ Done: Production-ready agent +``` + +### Pattern 2: Test Existing Agent + +``` +User: "Test my agent at exports/my_agent" +→ Skip Phase 1 +→ Use testing-agent directly +→ Tests created +→ Done: Validated agent +``` + +### Pattern 3: Iterative Development + +``` +User: "Build an agent" +→ Use building-agents-construction (Phase 1) +→ Implementation needed (see STATUS.md) +→ [User implements functions] +→ Use testing-agent (Phase 2) +→ Tests reveal bugs +→ [Fix bugs manually] +→ Re-run tests +→ Done: Working agent +``` + +### Pattern 4: Complex Agent with Patterns + +``` +User: "Build an agent with multi-turn conversations" +→ Use building-agents-core (learn pause/resume) +→ Use building-agents-construction (build structure) +→ Use building-agents-patterns (implement pause/resume pattern) +→ Use testing-agent (validate conversation flows) +→ Done: Complex conversational agent +``` + +## Skill Dependencies + +``` +agent-workflow (meta-skill) + │ + ├── building-agents-core (foundational) + │ ├── Architecture concepts + │ ├── Node/Edge/Goal definitions + │ ├── Tool discovery procedures + │ └── Workflow overview + │ + ├── building-agents-construction (procedural) + │ ├── Creates package structure + │ ├── Defines goal + │ ├── Adds nodes incrementally + │ ├── Connects edges + │ ├── Finalizes agent class + │ └── Requires: building-agents-core + │ + ├── building-agents-patterns (reference) + │ ├── Best practices + │ ├── Pause/resume patterns + │ ├── Error handling + │ ├── Anti-patterns + │ └── Performance optimization + │ + └── testing-agent + ├── Reads agent goal + ├── Generates tests + ├── Runs evaluation + └── Reports results +``` + +## Troubleshooting + +### "Agent structure won't validate" + +- Check node IDs match between nodes/__init__.py and agent.py +- Verify all edges reference valid node IDs +- Ensure entry_node exists in nodes list +- Run: `PYTHONPATH=core:exports python -m agent_name validate` + +### "Agent has structure but won't run" + +- Check for STATUS.md or IMPLEMENTATION_GUIDE.md in agent directory +- Implementation may be needed (Python functions or MCP tools) +- This is expected - building-agents-construction creates structure, not implementation +- See implementation guide for completion options + +### "Tests are failing" + +- Review test output for specific failures +- Check agent goal and success criteria +- Verify constraints are met +- Use testing-agent to debug and iterate 
+- Fix agent code and re-run tests + +### "Not sure which phase I'm in" + +Run these checks: + +```bash +# Check if agent structure exists +ls exports/my_agent/agent.py + +# Check if it validates +PYTHONPATH=core:exports python -m my_agent validate + +# Check if tests exist +ls exports/my_agent/tests/ + +# If structure exists and validates → Phase 2 (testing) +# If structure doesn't exist → Phase 1 (building) +# If tests exist but failing → Debug phase +``` + +## Best Practices + +### For Phase 1 (Building) + +1. **Start with clear requirements** - Know what the agent should do +2. **Define success criteria early** - Measurable goals drive design +3. **Keep nodes focused** - One responsibility per node +4. **Use descriptive names** - Node IDs should explain purpose +5. **Validate incrementally** - Check structure after each major addition + +### For Phase 2 (Testing) + +1. **Test constraints first** - Hard requirements must pass +2. **Mock external dependencies** - Use mock mode for LLMs/APIs +3. **Cover edge cases** - Test failures, not just success paths +4. **Iterate quickly** - Fix one test at a time +5. **Document test patterns** - Future tests follow same structure + +### General Workflow + +1. **Use version control** - Git commit after each phase +2. **Document decisions** - Update README with changes +3. **Keep iterations small** - Build → Test → Fix → Repeat +4. **Preserve working states** - Tag successful iterations +5. **Learn from failures** - Failed tests reveal design issues + +## Exit Criteria + +You're done with the workflow when: + +✅ Agent structure validates +✅ All tests pass +✅ Success criteria met +✅ Constraints verified +✅ Documentation complete +✅ Agent ready for deployment + +## Additional Resources + +- **building-agents-core**: See `.claude/skills/building-agents-core/SKILL.md` +- **building-agents-construction**: See `.claude/skills/building-agents-construction/SKILL.md` +- **building-agents-patterns**: See `.claude/skills/building-agents-patterns/SKILL.md` +- **testing-agent**: See `.claude/skills/testing-agent/SKILL.md` +- **Agent framework docs**: See `core/README.md` +- **Example agents**: See `exports/` directory + +## Summary + +This workflow provides a proven path from concept to production-ready agent: + +1. **Learn** with building-agents-core → Understand fundamentals (optional) +2. **Build** with building-agents-construction → Get validated structure +3. **Optimize** with building-agents-patterns → Apply best practices (optional) +4. **Test** with testing-agent → Get verified functionality + +The workflow is **flexible** - skip phases as needed, iterate freely, and adapt to your specific requirements. The goal is **production-ready agents** built with **consistent, repeatable processes**. 
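To make the exit criteria above concrete, the final checks can be scripted. The sketch below is illustrative rather than part of any skill: it assumes an agent named `my_agent` under `exports/`, the `PYTHONPATH=core:exports` convention used throughout this workflow, and that the generated tests are plain pytest files that can also be run outside the testing-agent skill.

```python
# Hedged sketch: run the exit-criteria checks for exports/my_agent from the repo root.
import os
import subprocess
import sys

env = {**os.environ, "PYTHONPATH": "core:exports"}
checks = [
    ("structure validates", [sys.executable, "-m", "my_agent", "validate"]),
    ("agent is importable", [sys.executable, "-c", "from exports.my_agent import default_agent"]),
    ("tests pass", [sys.executable, "-m", "pytest", "exports/my_agent/tests/", "-q"]),
]

for label, cmd in checks:
    if subprocess.run(cmd, env=env).returncode != 0:
        sys.exit(f"Exit-criteria check failed: {label}")

print("All exit-criteria checks passed")
```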
+ +## Skill Selection Guide + +**Choose building-agents-core when:** +- First time building agents +- Need to understand architecture +- Validating tool availability +- Learning about node types and edges + +**Choose building-agents-construction when:** +- Actually building an agent +- Have clear requirements +- Ready to write code +- Want step-by-step guidance + +**Choose building-agents-patterns when:** +- Agent structure complete +- Need advanced patterns +- Implementing pause/resume +- Optimizing performance +- Want best practices + +**Choose testing-agent when:** +- Agent structure complete +- Ready to validate functionality +- Need comprehensive test coverage +- Debugging agent behavior +- Debugging agent behavior diff --git a/.github/agents/building-agents-construction.agent.md b/.github/agents/building-agents-construction.agent.md new file mode 100644 index 0000000000..a11ab9e928 --- /dev/null +++ b/.github/agents/building-agents-construction.agent.md @@ -0,0 +1,356 @@ +--- +description: Step-by-step guide for building goal-driven agents. Creates package structure, defines goals, adds nodes, connects edges, and finalizes agent class. Use when actively building an agent. +name: Building Agents - Construction +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Agent Construction - Step-by-Step Guide + +**THIS IS AN EXECUTABLE WORKFLOW. DO NOT DISPLAY THIS FILE. EXECUTE THE STEPS BELOW.** + +When this skill is loaded, IMMEDIATELY begin executing Step 1. Do not explain what you will do - just do it. + +--- + +## STEP 1: Initialize Build Environment + +**EXECUTE THESE TOOL CALLS NOW:** + +1. Register the hive-tools MCP server: + +``` +mcp__agent-builder__add_mcp_server( + name="hive-tools", + transport="stdio", + command="python", + args='["mcp_server.py", "--stdio"]', + cwd="tools", + description="Hive tools MCP server" +) +``` + +2. Create a build session (replace AGENT_NAME with the user's requested agent name in snake_case): + +``` +mcp__agent-builder__create_session(name="AGENT_NAME") +``` + +3. Discover available tools: + +``` +mcp__agent-builder__list_mcp_tools() +``` + +4. Create the package directory: + +``` +mkdir -p exports/AGENT_NAME/nodes +``` + +**AFTER completing these calls**, tell the user: + +> ✅ Build environment initialized +> +> - Session created +> - Available tools: [list the tools from step 3] +> +> Proceeding to define the agent goal... + +**THEN immediately proceed to STEP 2.** + +--- + +## STEP 2: Define and Approve Goal + +**PROPOSE a goal to the user.** Based on what they asked for, propose: + +- Goal ID (kebab-case) +- Goal name +- Goal description +- 3-5 success criteria (each with: id, description, metric, target, weight) +- 2-4 constraints (each with: id, description, constraint_type, category) + +**FORMAT your proposal as a clear summary, then ask for approval:** + +> **Proposed Goal: [Name]** +> +> [Description] +> +> **Success Criteria:** +> +> 1. [criterion 1] +> 2. [criterion 2] +> ... +> +> **Constraints:** +> +> 1. [constraint 1] +> 2. [constraint 2] +> ... 
+ +**THEN call AskUserQuestion:** + +``` +AskUserQuestion(questions=[{ + "question": "Do you approve this goal definition?", + "header": "Goal", + "options": [ + {"label": "Approve", "description": "Goal looks good, proceed"}, + {"label": "Modify", "description": "I want to change something"} + ], + "multiSelect": false +}]) +``` + +**WAIT for user response.** + +- If **Approve**: Call `mcp__agent-builder__set_goal(...)` with the goal details, then proceed to STEP 3 +- If **Modify**: Ask what they want to change, update proposal, ask again + +--- + +## STEP 3: Design Node Workflow + +**BEFORE designing nodes**, review the available tools from Step 1. Nodes can ONLY use tools that exist. + +**DESIGN the workflow** as a series of nodes. For each node, determine: + +- node_id (kebab-case) +- name +- description +- node_type: `"llm_generate"` (no tools) or `"llm_tool_use"` (uses tools) +- input_keys (what data this node receives) +- output_keys (what data this node produces) +- tools (ONLY tools that exist - empty list for llm_generate) +- system_prompt + +**PRESENT the workflow to the user:** + +> **Proposed Workflow: [N] nodes** +> +> 1. **[node-id]** - [description] +> +> - Type: [llm_generate/llm_tool_use] +> - Input: [keys] +> - Output: [keys] +> - Tools: [tools or "none"] +> +> 2. **[node-id]** - [description] +> ... +> +> **Flow:** node1 → node2 → node3 → ... + +**THEN call AskUserQuestion:** + +``` +AskUserQuestion(questions=[{ + "question": "Do you approve this workflow design?", + "header": "Workflow", + "options": [ + {"label": "Approve", "description": "Workflow looks good, proceed to build nodes"}, + {"label": "Modify", "description": "I want to change the workflow"} + ], + "multiSelect": false +}]) +``` + +**WAIT for user response.** + +- If **Approve**: Proceed to STEP 4 +- If **Modify**: Ask what they want to change, update design, ask again + +--- + +## STEP 4: Build Nodes One by One + +**FOR EACH node in the approved workflow:** + +1. **Call** `mcp__agent-builder__add_node(...)` with the node details + + - input_keys and output_keys must be JSON strings: `'["key1", "key2"]'` + - tools must be a JSON string: `'["tool1"]'` or `'[]'` + +2. **Call** `mcp__agent-builder__test_node(...)` to validate: + +``` +mcp__agent-builder__test_node( + node_id="the-node-id", + test_input='{"key": "test value"}', + mock_llm_response='{"output_key": "test output"}' +) +``` + +3. **Check result:** + + - If valid: Tell user "✅ Node [id] validated" and continue to next node + - If invalid: Show errors, fix the node, re-validate + +4. **Show progress** after each node: + +``` +mcp__agent-builder__get_session_status() +``` + +> ✅ Node [X] of [Y] complete: [node-id] + +**AFTER all nodes are added and validated**, proceed to STEP 5. + +--- + +## STEP 5: Connect Edges + +**DETERMINE the edges** based on the workflow flow. 
For each connection: + +- edge_id (kebab-case) +- source (node that outputs) +- target (node that receives) +- condition: `"on_success"`, `"always"`, `"on_failure"`, or `"conditional"` +- condition_expr (Python expression, only if conditional) +- priority (integer, lower = higher priority) + +**FOR EACH edge, call:** + +``` +mcp__agent-builder__add_edge( + edge_id="source-to-target", + source="source-node-id", + target="target-node-id", + condition="on_success", + condition_expr="", + priority=1 +) +``` + +**AFTER all edges are added, validate the graph:** + +``` +mcp__agent-builder__validate_graph() +``` + +- If valid: Tell user "✅ Graph structure validated" and proceed to STEP 6 +- If invalid: Show errors, fix edges, re-validate + +--- + +## STEP 6: Generate Agent Package + +**EXPORT the graph data:** + +``` +mcp__agent-builder__export_graph() +``` + +This returns JSON with all the goal, nodes, edges, and MCP server configurations. + +**THEN write the Python package files** using the exported data. Create these files in `exports/AGENT_NAME/`: + +1. `config.py` - Runtime configuration with model settings +2. `nodes/__init__.py` - All NodeSpec definitions +3. `agent.py` - Goal, edges, graph config, and agent class +4. `__init__.py` - Package exports +5. `__main__.py` - CLI interface +6. `mcp_servers.json` - MCP server configurations +7. `README.md` - Usage documentation + +**IMPORTANT entry_points format:** + +- MUST be: `{"start": "first-node-id"}` +- NOT: `{"first-node-id": ["input_keys"]}` (WRONG) +- NOT: `{"first-node-id"}` (WRONG - this is a set) + +**Use the example agent** at `.claude/skills/building-agents-construction/examples/online_research_agent/` as a template for file structure and patterns. + +**AFTER writing all files, tell the user:** + +> ✅ Agent package created: `exports/AGENT_NAME/` +> +> **Files generated:** +> +> - `__init__.py` - Package exports +> - `agent.py` - Goal, nodes, edges, agent class +> - `config.py` - Runtime configuration +> - `__main__.py` - CLI interface +> - `nodes/__init__.py` - Node definitions +> - `mcp_servers.json` - MCP server config +> - `README.md` - Usage documentation +> +> **Test your agent:** +> +> ```bash +> cd /home/timothy/oss/hive +> PYTHONPATH=core:exports python -m AGENT_NAME validate +> PYTHONPATH=core:exports python -m AGENT_NAME info +> ``` + +--- + +## STEP 7: Verify and Test + +**RUN validation:** + +```bash +cd /home/timothy/oss/hive && PYTHONPATH=core:exports python -m AGENT_NAME validate +``` + +- If valid: Agent is complete! 
+- If errors: Fix the issues and re-run + +**SHOW final session summary:** + +``` +mcp__agent-builder__get_session_status() +``` + +**TELL the user the agent is ready** and suggest next steps: + +- Run with mock mode to test without API calls +- Use testing-agent skill for comprehensive testing +- Use setup-credentials if the agent needs API keys + +--- + +## REFERENCE: Node Types + +| Type | tools param | Use when | +| -------------- | ---------------------- | ---------------------------------------------- | +| `llm_generate` | `'[]'` | Pure reasoning, JSON output, no external calls | +| `llm_tool_use` | `'["tool1", "tool2"]'` | Needs to call MCP tools | + +--- + +## REFERENCE: Edge Conditions + +| Condition | When edge is followed | +| ------------- | ------------------------------------- | +| `on_success` | Source node completed successfully | +| `on_failure` | Source node failed | +| `always` | Always, regardless of success/failure | +| `conditional` | When condition_expr evaluates to True | + +--- + +## REFERENCE: System Prompt Best Practice + +For nodes with JSON output, include this in the system_prompt: + +``` +CRITICAL: Return ONLY raw JSON. NO markdown, NO code blocks. +Just the JSON object starting with { and ending with }. + +Return this exact structure: +{ + "key1": "...", + "key2": "..." +} +``` + +--- + +## COMMON MISTAKES TO AVOID + +1. **Using tools that don't exist** - Always check `mcp__agent-builder__list_mcp_tools()` first +2. **Wrong entry_points format** - Must be `{"start": "node-id"}`, NOT a set or list +3. **Skipping validation** - Always validate nodes and graph before proceeding +4. **Not waiting for approval** - Always ask user before major steps +5. **Displaying this file** - Execute the steps, don't show documentation diff --git a/.github/agents/building-agents-core.agent.md b/.github/agents/building-agents-core.agent.md new file mode 100644 index 0000000000..561ea137bb --- /dev/null +++ b/.github/agents/building-agents-core.agent.md @@ -0,0 +1,299 @@ +--- +description: Core concepts for goal-driven agents - architecture, node types, tool discovery, and workflow overview. Use when starting agent development or need to understand agent fundamentals. +name: Building Agents - Core Concepts +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Building Agents - Core Concepts + +Foundational knowledge for building goal-driven agents as Python packages. 
+ +## Architecture: Python Services (Not JSON Configs) + +Agents are built as Python packages: + +``` +exports/my_agent/ +├── __init__.py # Package exports +├── __main__.py # CLI (run, info, validate, shell) +├── agent.py # Graph construction (goal, edges, agent class) +├── nodes/__init__.py # Node definitions (NodeSpec) +├── config.py # Runtime config +└── README.md # Documentation +``` + +**Key Principle: Agent is visible and editable during build** + +- ✅ Files created immediately as components are approved +- ✅ User can watch files grow in their editor +- ✅ No session state - just direct file writes +- ✅ No "export" step - agent is ready when build completes + +## Core Concepts + +### Goal + +Success criteria and constraints (written to agent.py) + +```python +goal = Goal( + id="research-goal", + name="Technical Research Agent", + description="Research technical topics thoroughly", + success_criteria=[ + SuccessCriterion( + id="completeness", + description="Cover all aspects of topic", + metric="coverage_score", + target=">=0.9", + weight=0.4, + ), + # 3-5 success criteria total + ], + constraints=[ + Constraint( + id="accuracy", + description="All information must be verified", + constraint_type="hard", + category="quality", + ), + # 1-5 constraints total + ], +) +``` + +### Node + +Unit of work (written to nodes/__init__.py) + +**Node Types:** + +- `llm_generate` - Text generation, parsing +- `llm_tool_use` - Actions requiring tools +- `router` - Conditional branching +- `function` - Deterministic operations + +```python +search_node = NodeSpec( + id="search-web", + name="Search Web", + description="Search for information online", + node_type="llm_tool_use", + input_keys=["query"], + output_keys=["search_results"], + system_prompt="Search the web for: {query}", + tools=["web_search"], + max_retries=3, +) +``` + +### Edge + +Connection between nodes (written to agent.py) + +**Edge Conditions:** + +- `on_success` - Proceed if node succeeds +- `on_failure` - Handle errors +- `always` - Always proceed +- `conditional` - Based on expression + +```python +EdgeSpec( + id="search-to-analyze", + source="search-web", + target="analyze-results", + condition=EdgeCondition.ON_SUCCESS, + priority=1, +) +``` + +### Pause/Resume + +Multi-turn conversations + +- **Pause nodes** - Stop execution, wait for user input +- **Resume entry points** - Continue from pause with user's response + +```python +# Example pause/resume configuration +pause_nodes = ["request-clarification"] +entry_points = { + "start": "analyze-request", + "request-clarification_resume": "process-clarification" +} +``` + +## Tool Discovery & Validation + +**CRITICAL:** Before adding a node with tools, you MUST verify the tools exist. + +Tools are provided by MCP servers. Never assume a tool exists - always discover dynamically. 
+ +### Step 1: Register MCP Server (if not already done) + +```python +mcp__agent-builder__add_mcp_server( + name="tools", + transport="stdio", + command="python", + args='["mcp_server.py", "--stdio"]', + cwd="../tools" +) +``` + +### Step 2: Discover Available Tools + +```python +# List all tools from all registered servers +mcp__agent-builder__list_mcp_tools() + +# Or list tools from a specific server +mcp__agent-builder__list_mcp_tools(server_name="tools") +``` + +This returns available tools with their descriptions and parameters: + +```json +{ + "success": true, + "tools_by_server": { + "tools": [ + { + "name": "web_search", + "description": "Search the web...", + "parameters": ["query"] + }, + { + "name": "web_scrape", + "description": "Scrape a URL...", + "parameters": ["url"] + } + ] + }, + "total_tools": 14 +} +``` + +### Step 3: Validate Before Adding Nodes + +Before writing a node with `tools=[...]`: + +1. Call `list_mcp_tools()` to get available tools +2. Check each tool in your node exists in the response +3. If a tool doesn't exist: + - **DO NOT proceed** with the node + - Inform the user: "The tool 'X' is not available. Available tools are: ..." + - Ask if they want to use an alternative or proceed without the tool + +### Tool Validation Anti-Patterns + +❌ **Never assume a tool exists** - always call `list_mcp_tools()` first +❌ **Never write a node with unverified tools** - validate before writing +❌ **Never silently drop tools** - if a tool doesn't exist, inform the user +❌ **Never guess tool names** - use exact names from discovery response + +### Example Validation Flow + +```python +# 1. User requests: "Add a node that searches the web" +# 2. Discover available tools +tools_response = mcp__agent-builder__list_mcp_tools() + +# 3. Check if web_search exists +available = [t["name"] for tools in tools_response["tools_by_server"].values() for t in tools] +if "web_search" not in available: + # Inform user and ask how to proceed + print("❌ 'web_search' not available. Available tools:", available) +else: + # Proceed with node creation + # ... +``` + +## Workflow Overview: Incremental File Construction + +``` +1. CREATE PACKAGE → mkdir + write skeletons +2. DEFINE GOAL → Write to agent.py + config.py +3. FOR EACH NODE: + - Propose design + - User approves + - Write to nodes/__init__.py IMMEDIATELY ← FILE WRITTEN + - (Optional) Validate with test_node ← MCP VALIDATION + - User can open file and see it +4. CONNECT EDGES → Update agent.py ← FILE WRITTEN + - (Optional) Validate with validate_graph ← MCP VALIDATION +5. FINALIZE → Write agent class to agent.py ← FILE WRITTEN +6. DONE - Agent ready at exports/my_agent/ +``` + +**Files written immediately. MCP tools optional for validation/testing.** + +### The Key Difference + +**OLD (Bad):** + +``` +MCP add_node → Session State → MCP add_node → Session State → ... + ↓ + MCP export_graph + ↓ + Files appear +``` + +**NEW (Good):** + +``` +Write node to file → (Optional: MCP test_node) → Write node to file → ... + ↓ ↓ + File visible File visible + immediately immediately +``` + +**Bottom line:** Use Write/Edit for construction, MCP for validation if needed. + +## When to Use This Skill + +Use building-agents-core when: +- Starting a new agent project and need to understand fundamentals +- Need to understand agent architecture before building +- Want to validate tool availability before proceeding +- Learning about node types, edges, and graph execution + +**Next Steps:** +- Ready to build? 
→ Use building-agents-construction skill +- Need patterns and examples? → Use building-agents-patterns skill + +## MCP Tools for Validation + +After writing files, optionally use MCP tools for validation: + +**test_node** - Validate node configuration with mock inputs +```python +mcp__agent-builder__test_node( + node_id="search-web", + test_input='{"query": "test query"}', + mock_llm_response='{"results": "mock output"}' +) +``` + +**validate_graph** - Check graph structure +```python +mcp__agent-builder__validate_graph() +# Returns: unreachable nodes, missing connections, etc. +``` + +**create_session** - Track session state for bookkeeping +```python +mcp__agent-builder__create_session(session_name="my-build") +``` + +**Key Point:** Files are written FIRST. MCP tools are for validation only. + +## Related Skills + +- **building-agents-construction** - Step-by-step building process +- **building-agents-patterns** - Best practices and examples +- **agent-workflow** - Complete workflow orchestrator +- **testing-agent** - Test and validate completed agents \ No newline at end of file diff --git a/.github/agents/building-agents-patterns.agent.md b/.github/agents/building-agents-patterns.agent.md new file mode 100644 index 0000000000..37393eb675 --- /dev/null +++ b/.github/agents/building-agents-patterns.agent.md @@ -0,0 +1,494 @@ +--- +description: Best practices, patterns, and examples for building goal-driven agents. Includes pause/resume architecture, hybrid workflows, anti-patterns, and handoff to testing. Use when optimizing agent design. +name: Building Agents - Patterns & Best Practices +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Building Agents - Patterns & Best Practices + +Design patterns, examples, and best practices for building robust goal-driven agents. + +**Prerequisites:** Complete agent structure using building-agents-construction. + +## Practical Example: Hybrid Workflow + +How to build a node using both direct file writes and optional MCP validation: + +```python +# 1. WRITE TO FILE FIRST (Primary - makes it visible) +node_code = ''' +search_node = NodeSpec( + id="search-web", + node_type="llm_tool_use", + input_keys=["query"], + output_keys=["search_results"], + system_prompt="Search the web for: {query}", + tools=["web_search"], +) +''' + +Edit( + file_path="exports/research_agent/nodes/__init__.py", + old_string="# Nodes will be added here", + new_string=node_code +) + +print("✅ Added search_node to nodes/__init__.py") +print("📁 Open exports/research_agent/nodes/__init__.py to see it!") + +# 2. OPTIONALLY VALIDATE WITH MCP (Secondary - bookkeeping) +validation = mcp__agent-builder__test_node( + node_id="search-web", + test_input='{"query": "python tutorials"}', + mock_llm_response='{"search_results": [...mock results...]}' +) + +print(f"✓ Validation: {validation['success']}") +``` + +**User experience:** +- Immediately sees node in their editor (from step 1) +- Gets validation feedback (from step 2) +- Can edit the file directly if needed + +This combines visibility (files) with validation (MCP tools). 
+ +## Pause/Resume Architecture + +For agents needing multi-turn conversations with user interaction: + +### Basic Pause/Resume Flow + +```python +# Define pause nodes - execution stops at these nodes +pause_nodes = ["request-clarification", "await-approval"] + +# Define entry points - where to resume from each pause +entry_points = { + "start": "analyze-request", # Initial entry + "request-clarification_resume": "process-clarification", + "await-approval_resume": "execute-action", +} +``` + +### Example: Multi-Turn Research Agent + +```python +# Nodes +nodes = [ + NodeSpec(id="analyze-request", ...), + NodeSpec(id="request-clarification", ...), # PAUSE NODE + NodeSpec(id="process-clarification", ...), + NodeSpec(id="generate-results", ...), + NodeSpec(id="await-approval", ...), # PAUSE NODE + NodeSpec(id="execute-action", ...), +] + +# Edges with resume flows +edges = [ + EdgeSpec( + id="analyze-to-clarify", + source="analyze-request", + target="request-clarification", + condition=EdgeCondition.CONDITIONAL, + condition_expr="needs_clarification == true", + ), + # When resumed, goes to process-clarification + EdgeSpec( + id="clarify-to-process", + source="request-clarification", + target="process-clarification", + condition=EdgeCondition.ALWAYS, + ), + EdgeSpec( + id="results-to-approval", + source="generate-results", + target="await-approval", + condition=EdgeCondition.ALWAYS, + ), + # When resumed, goes to execute-action + EdgeSpec( + id="approval-to-execute", + source="await-approval", + target="execute-action", + condition=EdgeCondition.ALWAYS, + ), +] + +# Configuration +pause_nodes = ["request-clarification", "await-approval"] +entry_points = { + "start": "analyze-request", + "request-clarification_resume": "process-clarification", + "await-approval_resume": "execute-action", +} +``` + +### Running Pause/Resume Agents + +```python +# Initial run - will pause at first pause node +result1 = await agent.run( + context={"query": "research topic"}, + session_state=None +) + +# Check if paused +if result1.paused_at: + print(f"Paused at: {result1.paused_at}") + + # Resume with user input + result2 = await agent.run( + context={"user_response": "clarification details"}, + session_state=result1.session_state + ) +``` + +## Anti-Patterns + +### What NOT to Do + +❌ **Don't rely on `export_graph`** - Write files immediately, not at end + +```python +# BAD: Building in session state, exporting at end +mcp__agent-builder__add_node(...) +mcp__agent-builder__add_node(...) +mcp__agent-builder__export_graph() # Files appear only now + +# GOOD: Writing files immediately +Write(file_path="...", content=node_code) # File visible now +Write(file_path="...", content=node_code) # File visible now +``` + +❌ **Don't hide code in session** - Write to files as components approved + +```python +# BAD: Accumulating changes invisibly +session.add_component(component1) +session.add_component(component2) +# User can't see anything yet + +# GOOD: Incremental visibility +Edit(file_path="...", ...) # User sees change 1 +Edit(file_path="...", ...) 
# User sees change 2 +``` + +❌ **Don't wait to write files** - Agent visible from first step + +```python +# BAD: Building everything before writing +design_all_nodes() +design_all_edges() +write_everything_at_once() + +# GOOD: Write as you go +write_package_structure() # Visible +write_goal() # Visible +write_node_1() # Visible +write_node_2() # Visible +``` + +❌ **Don't batch everything** - Write incrementally + +```python +# BAD: Batching all nodes +nodes = [design_node_1(), design_node_2(), ...] +write_all_nodes(nodes) + +# GOOD: One at a time with user feedback +write_node_1() # User approves +write_node_2() # User approves +write_node_3() # User approves +``` + +### MCP Tools - Correct Usage + +**MCP tools OK for:** +✅ `test_node` - Validate node configuration with mock inputs +✅ `validate_graph` - Check graph structure +✅ `create_session` - Track session state for bookkeeping +✅ Other validation tools + +**Just don't:** Use MCP as the primary construction method or rely on export_graph + +## Best Practices + +### 1. Show Progress After Each Write + +```python +print("✅ Added analyze_request_node to nodes/__init__.py") +print("📊 Progress: 1/6 nodes added") +print("📁 Open exports/my_agent/nodes/__init__.py to see it!") +``` + +### 2. Let User Open Files During Build + +```python +print("✅ Goal written to agent.py") +print("") +print("💡 Tip: Open exports/my_agent/agent.py in your editor to see the goal!") +``` + +### 3. Write Incrementally - One Component at a Time + +```python +# Good flow +write_package_structure() +show_user("Package created") + +write_goal() +show_user("Goal written") + +for node in nodes: + get_approval(node) + write_node(node) + show_user(f"Node {node.id} written") +``` + +### 4. Test As You Build + +```python +# After adding several nodes +print("💡 You can test current state with:") +print(" PYTHONPATH=core:exports python -m my_agent validate") +print(" PYTHONPATH=core:exports python -m my_agent info") +``` + +### 5. 
Keep User Informed + +```python +# Clear status updates +print("🔨 Creating package structure...") +print("✅ Package created: exports/my_agent/") +print("") +print("📝 Next: Define agent goal") +``` + +## Continuous Monitoring Agents + +For agents that run continuously without terminal nodes: + +```python +# No terminal nodes - loops forever +terminal_nodes = [] + +# Workflow loops back to start +edges = [ + EdgeSpec(id="monitor-to-check", source="monitor", target="check-condition"), + EdgeSpec(id="check-to-wait", source="check-condition", target="wait"), + EdgeSpec(id="wait-to-monitor", source="wait", target="monitor"), # Loop +] + +# Entry node only +entry_node = "monitor" +entry_points = {"start": "monitor"} +pause_nodes = [] +``` + +**Example: File Monitor** + +```python +nodes = [ + NodeSpec(id="list-files", ...), + NodeSpec(id="check-new-files", node_type="router", ...), + NodeSpec(id="process-files", ...), + NodeSpec(id="wait-interval", node_type="function", ...), +] + +edges = [ + EdgeSpec(id="list-to-check", source="list-files", target="check-new-files"), + EdgeSpec( + id="check-to-process", + source="check-new-files", + target="process-files", + condition=EdgeCondition.CONDITIONAL, + condition_expr="new_files_count > 0", + ), + EdgeSpec( + id="check-to-wait", + source="check-new-files", + target="wait-interval", + condition=EdgeCondition.CONDITIONAL, + condition_expr="new_files_count == 0", + ), + EdgeSpec(id="process-to-wait", source="process-files", target="wait-interval"), + EdgeSpec(id="wait-to-list", source="wait-interval", target="list-files"), # Loop back +] + +terminal_nodes = [] # No terminal - runs forever +``` + +## Complex Routing Patterns + +### Multi-Condition Router + +```python +router_node = NodeSpec( + id="decision-router", + node_type="router", + input_keys=["analysis_result"], + output_keys=["decision"], + system_prompt=""" + Based on the analysis result, decide the next action: + - If confidence > 0.9: route to "execute" + - If 0.5 <= confidence <= 0.9: route to "review" + - If confidence < 0.5: route to "clarify" + + Return: {"decision": "execute|review|clarify"} + """, +) + +# Edges for each route +edges = [ + EdgeSpec( + id="router-to-execute", + source="decision-router", + target="execute-action", + condition=EdgeCondition.CONDITIONAL, + condition_expr="decision == 'execute'", + priority=1, + ), + EdgeSpec( + id="router-to-review", + source="decision-router", + target="human-review", + condition=EdgeCondition.CONDITIONAL, + condition_expr="decision == 'review'", + priority=2, + ), + EdgeSpec( + id="router-to-clarify", + source="decision-router", + target="request-clarification", + condition=EdgeCondition.CONDITIONAL, + condition_expr="decision == 'clarify'", + priority=3, + ), +] +``` + +## Error Handling Patterns + +### Graceful Failure with Fallback + +```python +# Primary node with error handling +nodes = [ + NodeSpec(id="api-call", max_retries=3, ...), + NodeSpec(id="fallback-cache", ...), + NodeSpec(id="report-error", ...), +] + +edges = [ + # Success path + EdgeSpec( + id="api-success", + source="api-call", + target="process-results", + condition=EdgeCondition.ON_SUCCESS, + ), + # Fallback on failure + EdgeSpec( + id="api-to-fallback", + source="api-call", + target="fallback-cache", + condition=EdgeCondition.ON_FAILURE, + priority=1, + ), + # Report if fallback also fails + EdgeSpec( + id="fallback-to-error", + source="fallback-cache", + target="report-error", + condition=EdgeCondition.ON_FAILURE, + priority=1, + ), +] +``` + +## Performance 
Optimization + +### Parallel Node Execution + +```python +# Use multiple edges from same source for parallel execution +edges = [ + EdgeSpec( + id="start-to-search1", + source="start", + target="search-source-1", + condition=EdgeCondition.ALWAYS, + ), + EdgeSpec( + id="start-to-search2", + source="start", + target="search-source-2", + condition=EdgeCondition.ALWAYS, + ), + EdgeSpec( + id="start-to-search3", + source="start", + target="search-source-3", + condition=EdgeCondition.ALWAYS, + ), + # Converge results + EdgeSpec( + id="search1-to-merge", + source="search-source-1", + target="merge-results", + ), + EdgeSpec( + id="search2-to-merge", + source="search-source-2", + target="merge-results", + ), + EdgeSpec( + id="search3-to-merge", + source="search-source-3", + target="merge-results", + ), +] +``` + +## Handoff to Testing + +When agent is complete, transition to testing phase: + +```python +print(""" +✅ Agent complete: exports/my_agent/ + +Next steps: +1. Switch to testing-agent skill +2. Generate and approve tests +3. Run evaluation +4. Debug any failures + +Command: "Test the agent at exports/my_agent/" +""") +``` + +### Pre-Testing Checklist + +Before handing off to testing-agent: + +- [ ] Agent structure validates: `python -m agent_name validate` +- [ ] All nodes defined in nodes/__init__.py +- [ ] All edges connect valid nodes +- [ ] Entry node specified +- [ ] Agent can be imported: `from exports.agent_name import default_agent` +- [ ] README.md with usage instructions +- [ ] CLI commands work (info, validate) + +## Related Skills + +- **building-agents-core** - Fundamental concepts +- **building-agents-construction** - Step-by-step building +- **testing-agent** - Test and validate agents +- **agent-workflow** - Complete workflow orchestrator + +--- + +**Remember: Agent is actively constructed, visible the whole time. No hidden state. No surprise exports. Just transparent, incremental file building.** diff --git a/.github/agents/setup-credentials.agent.md b/.github/agents/setup-credentials.agent.md new file mode 100644 index 0000000000..8b0f16100d --- /dev/null +++ b/.github/agents/setup-credentials.agent.md @@ -0,0 +1,560 @@ +--- +description: Set up and install credentials for an agent. Detects missing credentials from agent config, collects them from the user, and stores them securely in the encrypted credential store at ~/.hive/credentials. +name: Setup Credentials +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Setup Credentials + +Interactive credential setup for agents with multiple authentication options. Detects what's missing, offers auth method choices, validates with health checks, and stores credentials securely. + +## When to Use + +- Before running or testing an agent for the first time +- When `AgentRunner.run()` fails with "missing required credentials" +- When a user asks to configure credentials for an agent +- After building a new agent that uses tools requiring API keys + +## Workflow + +### Step 1: Identify the Agent + +Determine which agent needs credentials. The user will either: + +- Name the agent directly (e.g., "set up credentials for hubspot-agent") +- Have an agent directory open (check `exports/` for agent dirs) +- Be working on an agent in the current session + +Locate the agent's directory under `exports/{agent_name}/`. 
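If it is not obvious which agent the user means, a quick scan of `exports/` can surface the candidates. This is a minimal sketch, assuming each agent package follows the standard layout with an `agent.py` at its root:

```python
from pathlib import Path

# List candidate agent packages under exports/ (standard layout: each contains an agent.py)
exports = Path("exports")
candidates = []
if exports.is_dir():
    candidates = sorted(
        p.name for p in exports.iterdir() if p.is_dir() and (p / "agent.py").exists()
    )

print("Agent packages found:", ", ".join(candidates) or "none")
```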
+ +### Step 2: Detect Required Credentials + +Read the agent's configuration to determine which tools and node types it uses: + +```python +from core.framework.runner import AgentRunner + +runner = AgentRunner.load("exports/{agent_name}") +validation = runner.validate() + +# validation.missing_credentials contains env var names +# validation.warnings contains detailed messages with help URLs +``` + +Alternatively, check the credential store directly: + +```python +from core.framework.credentials import CredentialStore + +# Use encrypted storage (default: ~/.hive/credentials) +store = CredentialStore.with_encrypted_storage() + +# Check what's available +available = store.list_credentials() +print(f"Available credentials: {available}") + +# Check if specific credential exists +if store.is_available("hubspot"): + print("HubSpot credential found") +else: + print("HubSpot credential missing") +``` + +To see all known credential specs (for help URLs and setup instructions): + +```python +from aden_tools.credentials import CREDENTIAL_SPECS + +for name, spec in CREDENTIAL_SPECS.items(): + print(f"{name}: env_var={spec.env_var}, aden={spec.aden_supported}") +``` + +### Step 3: Present Auth Options + +For each missing credential, check available authentication methods: + +```python +from aden_tools.credentials import CREDENTIAL_SPECS + +spec = CREDENTIAL_SPECS.get("hubspot") +if spec: + auth_options = [] + if spec.aden_supported: + auth_options.append("aden") + if spec.direct_api_key_supported: + auth_options.append("direct") + auth_options.append("custom") + + # Get setup info + setup_info = { + "env_var": spec.env_var, + "description": spec.description, + "help_url": spec.help_url, + "api_key_instructions": spec.api_key_instructions, + } +``` + +Present options: + +``` +Choose how to configure HUBSPOT_ACCESS_TOKEN: + + 1) Aden Authorization Server (Recommended) + Secure OAuth2 flow via integration.adenhq.com + - Quick setup with automatic token refresh + - No need to manage API keys manually + + 2) Direct API Key + Enter your own API key manually + - Requires creating a HubSpot Private App + - Full control over scopes and permissions + + 3) Custom Credential Store (Advanced) + Programmatic configuration for CI/CD + - For automated deployments + - Requires manual API calls +``` + +### Step 4: Execute Auth Flow + +#### Option 1: Aden Authorization Server + +This is the recommended flow for supported integrations (HubSpot, etc.). + +**How Aden OAuth Works:** + +The ADEN_API_KEY represents a user who has already completed OAuth authorization on Aden's platform. When users sign up and connect integrations on Aden, those OAuth tokens are stored server-side. Having an ADEN_API_KEY means: + +1. User has an Aden account +2. User has already authorized integrations (HubSpot, etc.) via OAuth on Aden +3. We just need to sync those credentials down to the local credential store + +**4.1a. Check for ADEN_API_KEY** + +```python +import os +aden_key = os.environ.get("ADEN_API_KEY") +``` + +If not set, guide user to Aden: + +```python +from aden_tools.credentials import open_browser, get_aden_setup_url + +url = get_aden_setup_url() +success, msg = open_browser(url) + +print("Sign in to Aden and connect your integrations.") +print("Copy your API key and return here.") +``` + +Ask user to provide the ADEN_API_KEY they received. + +**4.1b. 
Save ADEN_API_KEY to Shell Config** + +With user approval, persist ADEN_API_KEY to their shell config: + +```python +from aden_tools.credentials import ( + detect_shell, + add_env_var_to_shell_config, + get_shell_source_command, +) + +shell_type = detect_shell() # 'bash', 'zsh', or 'unknown' + +# Ask user for approval first +success, config_path = add_env_var_to_shell_config( + "ADEN_API_KEY", + user_provided_key, + comment="Aden authorization server API key" +) + +if success: + source_cmd = get_shell_source_command() + print(f"Saved to {config_path}") + print(f"Run: {source_cmd}") +``` + +Also save to `~/.hive/configuration.json` for the framework: + +```python +import json +from pathlib import Path + +config_path = Path.home() / ".hive" / "configuration.json" +config = json.loads(config_path.read_text()) if config_path.exists() else {} + +config["aden"] = { + "api_key_configured": True, + "api_url": "https://api.adenhq.com" +} + +config_path.parent.mkdir(parents=True, exist_ok=True) +config_path.write_text(json.dumps(config, indent=2)) +``` + +**4.1c. Sync Credentials from Aden Server** + +Since the user has already authorized integrations on Aden, use the one-liner factory method: + +```python +from core.framework.credentials import CredentialStore + +# Single call handles everything +# This single call handles everything: +# - Creates encrypted local storage at ~/.hive/credentials +# - Configures Aden client from ADEN_API_KEY env var +# - Syncs all credentials from Aden server automatically +store = CredentialStore.with_aden_sync( + base_url="https://api.adenhq.com", + auto_sync=True, +) + +# Check what was synced +synced = store.list_credentials() +print(f"Synced credentials: {synced}") + +# If the required credential wasn't synced, the user hasn't authorized it on Aden yet +if "hubspot" not in synced: + print("HubSpot not found in your Aden account.") + print("Please visit https://integration.adenhq.com to connect HubSpot, then try again.") +``` + +For more control over the sync process: + +```python +from core.framework.credentials import CredentialStore +from core.framework.credentials.aden import ( + AdenCredentialClient, + AdenClientConfig, + AdenSyncProvider, +) + +# Create client (API key loaded from ADEN_API_KEY env var) +client = AdenCredentialClient(AdenClientConfig( + base_url="https://api.adenhq.com", +)) + +# Create provider and store +provider = AdenSyncProvider(client=client) +store = CredentialStore.with_encrypted_storage() + +# Manual sync +synced_count = provider.sync_all(store) +print(f"Synced {synced_count} credentials from Aden") +``` + +**4.1d. Run Health Check** + +```python +from aden_tools.credentials import check_credential_health + +cred = store.get_credential("hubspot") +token = cred.keys["access_token"].value.get_secret_value() + +result = check_credential_health("hubspot", token) +if result.valid: + print("HubSpot credentials validated!") +else: + print(f"Validation failed: {result.message}") + # Offer to retry the OAuth flow +``` + +#### Option 2: Direct API Key + +For users who prefer manual API key management. + +**4.2a. Show Setup Instructions** + +```python +from aden_tools.credentials import CREDENTIAL_SPECS + +spec = CREDENTIAL_SPECS.get("hubspot") +if spec and spec.api_key_instructions: + print(spec.api_key_instructions) +# Output: +# To get a HubSpot Private App token: +# 1. Go to HubSpot Settings > Integrations > Private Apps +# 2. Click "Create a private app" +# 3. Name your app (e.g., "Hive Agent") +# ... 
+ +if spec and spec.help_url: + print(f"More info: {spec.help_url}") +``` + +**4.2b. Collect API Key from User** + +Use AskUserQuestion to securely collect the API key: + +``` +Please provide your HubSpot access token: +(This will be stored securely in ~/.hive/credentials) +``` + +**4.2c. Run Health Check Before Storing** + +```python +from aden_tools.credentials import check_credential_health + +result = check_credential_health("hubspot", user_provided_token) +if not result.valid: + print(f"Warning: {result.message}") + # Ask user if they want to: + # 1. Try a different token + # 2. Continue anyway (not recommended) +``` + +**4.2d. Store in Encrypted Credential Store** + +```python +from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey +from pydantic import SecretStr + +store = CredentialStore.with_encrypted_storage() + +cred = CredentialObject( + id="hubspot", + name="HubSpot Access Token", + keys={ + "access_token": CredentialKey( + name="access_token", + value=SecretStr(user_provided_token), + ) + }, +) +store.save_credential(cred) +``` + +**4.2e. Export to Current Session** + +```bash +export HUBSPOT_ACCESS_TOKEN="the-value" +``` + +#### Option 3: Custom Credential Store (Advanced) + +For programmatic/CI/CD setups. + +**4.3a. Show Documentation** + +``` +For advanced credential management, you can use the CredentialStore API directly: + + from core.framework.credentials import CredentialStore, CredentialObject, CredentialKey + from pydantic import SecretStr + + store = CredentialStore.with_encrypted_storage() + + cred = CredentialObject( + id="hubspot", + name="HubSpot Access Token", + keys={"access_token": CredentialKey(name="access_token", value=SecretStr("..."))} + ) + store.save_credential(cred) + +For CI/CD environments: + - Set HIVE_CREDENTIAL_KEY for encryption + - Pre-populate ~/.hive/credentials programmatically + - Or use environment variables directly (HUBSPOT_ACCESS_TOKEN) + +Documentation: See core/framework/credentials/README.md +``` + +### Step 5: Record Configuration Method + +```python +import json +from pathlib import Path +from datetime import datetime + +config_path = Path.home() / ".hive" / "configuration.json" +config = json.loads(config_path.read_text()) if config_path.exists() else {} + +if "credential_methods" not in config: + config["credential_methods"] = {} + +config["credential_methods"]["hubspot"] = { + "method": "aden", + "configured_at": datetime.now().isoformat(), +} + +config_path.write_text(json.dumps(config, indent=2)) +``` + +### Step 6: Verify All Credentials + +```python +runner = AgentRunner.load("exports/{agent_name}") +validation = runner.validate() +assert not validation.missing_credentials +``` + +## Health Check Reference + +Health checks validate credentials by making lightweight API calls: + +| Credential | Endpoint | What It Checks | +| -------------- | --------------------------------------- | --------------------------------- | +| `hubspot` | `GET /crm/v3/objects/contacts?limit=1` | Bearer token validity, CRM scopes | +| `brave_search` | `GET /res/v1/web/search?q=test&count=1` | API key validity | + +```python +from aden_tools.credentials import check_credential_health + +result = check_credential_health("hubspot", token_value) +# result.valid: bool +# result.message: str +# result.details: dict (status_code, rate_limited, etc.) 
+``` + +## Encryption Key (HIVE_CREDENTIAL_KEY) + +The encrypted credential store requires `HIVE_CREDENTIAL_KEY`: + +- If not set, `EncryptedFileStorage` auto-generates one +- User MUST persist this key (in `~/.bashrc` or secrets manager) +- Without this key, credentials cannot be decrypted +- This is the ONLY secret that should live in `~/.bashrc` or environment config + +If `HIVE_CREDENTIAL_KEY` is not set: + +1. Let the store generate one +2. Tell the user to save it: `export HIVE_CREDENTIAL_KEY="{generated_key}"` +3. Recommend adding it to `~/.bashrc` or their shell profile + +## Security Rules + +- **NEVER** log, print, or echo credential values +- **NEVER** store credentials in plaintext files +- **NEVER** hardcode credentials in source code +- **ALWAYS** use `SecretStr` from Pydantic +- **ALWAYS** use encrypted credential store +- **ALWAYS** run health checks before storing +- **ALWAYS** verify with re-validation, not by reading back +- **ALWAYS** confirm before modifying shell config + +## Credential Sources Reference + +All credential specs are defined in `tools/src/aden_tools/credentials/`: + +| File | Category | Credentials | Aden Supported | +| ----------------- | ------------- | --------------------------------------------- | -------------- | +| `llm.py` | LLM Providers | `anthropic` | No | +| `search.py` | Search Tools | `brave_search`, `google_search`, `google_cse` | No | +| `integrations.py` | Integrations | `hubspot` | Yes | + +**Note:** Additional LLM providers (Cerebras, Groq, OpenAI) are handled by LiteLLM via environment +variables (`CEREBRAS_API_KEY`, `GROQ_API_KEY`, `OPENAI_API_KEY`) but are not yet in CREDENTIAL_SPECS. +Add them to `llm.py` as needed. + +To check what's registered: + +```python +from aden_tools.credentials import CREDENTIAL_SPECS +for name, spec in CREDENTIAL_SPECS.items(): + print(f"{name}: aden={spec.aden_supported}, direct={spec.direct_api_key_supported}") +``` + +## Migration: CredentialManager → CredentialStore + +**CredentialManager is deprecated.** Use CredentialStore. 
+ +| Old (Deprecated) | New (Recommended) | +| ----------------------------------------- | -------------------------------------------------------------------- | +| `CredentialManager()` | `CredentialStore.with_encrypted_storage()` | +| `creds.get("hubspot")` | `store.get("hubspot")` or `store.get_key("hubspot", "access_token")` | +| `creds.validate_for_tools(tools)` | Use `store.is_available(cred_id)` per credential | +| `creds.get_auth_options("hubspot")` | Check `CREDENTIAL_SPECS["hubspot"].aden_supported` | +| `creds.get_setup_instructions("hubspot")` | Access `CREDENTIAL_SPECS["hubspot"]` directly | + +**Why migrate?** + +- **CredentialStore** supports encrypted storage, multi-key credentials, template resolution, and automatic token refresh +- **CredentialManager** only reads from environment variables and .env files (no encryption, no refresh) +- **CredentialStoreAdapter** exists for backward compatibility during migration + +```python +# Old way (deprecated) +from aden_tools.credentials import CredentialManager +creds = CredentialManager() +token = creds.get("hubspot") + +# New way (recommended) +from core.framework.credentials import CredentialStore +store = CredentialStore.with_encrypted_storage() +token = store.get("hubspot") + +# With Aden sync (recommended for OAuth integrations) +store = CredentialStore.with_aden_sync() +token = store.get_key("hubspot", "access_token") +``` + +## Example Session + +``` +User: /setup-credentials for my hubspot-agent + +Agent: Let me check what credentials your hubspot-agent needs. + +[Runs validation, finds ANTHROPIC_API_KEY and HUBSPOT_ACCESS_TOKEN missing] + +Agent: Your hubspot-agent requires 2 credentials: + + 1. ANTHROPIC_API_KEY - Anthropic API key for LLM calls + 2. HUBSPOT_ACCESS_TOKEN - HubSpot access token + +Let's set these up one at a time. + +--- Setting up ANTHROPIC_API_KEY --- + +This credential only supports direct API key entry. + +[Shows instructions for getting Anthropic API key] +[AskUserQuestion: "Please provide your Anthropic API key:"] +[User provides key] +[Stores in credential store, exports to env] + +--- Setting up HUBSPOT_ACCESS_TOKEN --- + +[AskUserQuestion with 3 options] + +Choose how to configure HUBSPOT_ACCESS_TOKEN: + 1) Aden Authorization Server (Recommended) + 2) Direct API Key + 3) Custom Credential Store (Advanced) + +User: 1 (Aden) + +Agent: Great! Let me check if you have an Aden API key configured... + +[Checks for ADEN_API_KEY, not found] +[Opens browser to integration.adenhq.com/setup] + +Agent: I've opened Aden's setup page. Please: + 1. Sign in or create an account + 2. Connect your HubSpot account (OAuth happens on Aden's site) + 3. Copy the API key shown after setup + +[AskUserQuestion: "Please provide your Aden API key:"] +[User provides key] + +Agent: Would you like me to save this to your shell config (~/.zshrc) for future sessions? + +[User approves] +[Saves to ~/.zshrc and ~/.hive/configuration.json] + +Agent: Let me sync your credentials from Aden... + +[Syncs credentials from Aden server - OAuth already done on Aden's side] +[Runs health check] + +Agent: HubSpot credentials validated successfully! + +All credentials are now configured: + - ANTHROPIC_API_KEY: Stored in encrypted credential store + - HUBSPOT_ACCESS_TOKEN: Synced from Aden (OAuth completed on Aden) + - Validation passed - your agent is ready to run! 
+``` diff --git a/.github/agents/testing-agent.agent.md b/.github/agents/testing-agent.agent.md new file mode 100644 index 0000000000..91ca95e45a --- /dev/null +++ b/.github/agents/testing-agent.agent.md @@ -0,0 +1,1132 @@ +--- +description: Run goal-based evaluation tests for agents. Use when you need to verify an agent meets its goals, debug failing tests, or iterate on agent improvements based on test results. +name: Testing Agent +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Testing Workflow + +This skill provides tools for testing agents built with the building-agents skills. + +## Workflow Overview + +1. `mcp__agent-builder__list_tests` - Check what tests exist +2. `mcp__agent-builder__generate_constraint_tests` or `mcp__agent-builder__generate_success_tests` - Get test guidelines +3. **Write tests directly** using the Write tool with the guidelines provided +4. `mcp__agent-builder__run_tests` - Execute tests +5. `mcp__agent-builder__debug_test` - Debug failures + +## How Test Generation Works + +The `generate_*_tests` MCP tools return **guidelines and templates** - they do NOT generate test code via LLM. +You (the assistant) write the tests directly using file operations based on the guidelines. + +### Example Workflow + +```python +# Step 1: Get test guidelines +result = mcp__agent-builder__generate_constraint_tests( + goal_id="my-goal", + goal_json='{"id": "...", "constraints": [...]}', + agent_path="exports/my_agent" +) + +# Step 2: The result contains: +# - output_file: where to write tests +# - file_header: imports and fixtures to use +# - test_template: format for test functions +# - constraints_formatted: the constraints to test +# - test_guidelines: rules for writing tests + +# Step 3: Write tests directly using file operations +Write( + file_path=result["output_file"], + content=result["file_header"] + test_code_you_write +) + +# Step 4: Run tests via MCP tool +mcp__agent-builder__run_tests( + goal_id="my-goal", + agent_path="exports/my_agent" +) + +# Step 5: Debug failures via MCP tool +mcp__agent-builder__debug_test( + goal_id="my-goal", + test_name="test_constraint_foo", + agent_path="exports/my_agent" +) +``` + +--- + +# Testing Agents with MCP Tools + +Run goal-based evaluation tests for agents built with the building-agents skills. 
+ +**Key Principle: MCP tools provide guidelines, assistant writes tests directly** +- ✅ Get guidelines: `generate_constraint_tests`, `generate_success_tests` → returns templates and guidelines +- ✅ Write tests: Use file operations with the provided file_header and test_template +- ✅ Run tests: `run_tests` (runs pytest via subprocess) +- ✅ Debug failures: `debug_test` (re-runs single test with verbose output) +- ✅ List tests: `list_tests` (scans Python test files) +- ✅ Tests stored in `exports/{agent}/tests/test_*.py` + +## Architecture: Python Test Files + +``` +exports/my_agent/ +├── __init__.py +├── agent.py ← Agent to test +├── nodes/__init__.py +├── config.py +├── __main__.py +└── tests/ ← Test files written by assistant + ├── conftest.py # Shared fixtures (auto-created) + ├── test_constraints.py + ├── test_success_criteria.py + └── test_edge_cases.py +``` + +**Tests import the agent directly:** +```python +import pytest +from exports.my_agent import default_agent + + +@pytest.mark.asyncio +async def test_happy_path(mock_mode): + """Test: {description}""" + result = await default_agent.run({{"query": "test"}}, mock_mode=mock_mode) + assert result.success + assert len(result.output) > 0 +``` + +## Why This Approach + +- MCP tools provide consistent test guidelines with proper imports, fixtures, and API key enforcement +- Assistant writes tests directly, eliminating circular LLM dependencies in the MCP server +- `run_tests` parses pytest output into structured results for iteration +- `debug_test` provides formatted output with actionable debugging info +- File headers include conftest.py setup with proper fixtures + +## Quick Start + +1. **Check existing tests** - `list_tests(goal_id, agent_path)` +2. **Get test guidelines** - `generate_constraint_tests` or `generate_success_tests` +3. **Write tests** - Use file operations with the provided file_header and guidelines +4. **Run tests** - `run_tests(goal_id, agent_path)` +5. **Debug failures** - `debug_test(goal_id, test_name, agent_path)` +6. **Iterate** - Repeat steps 4-5 until all pass + +## ⚠️ Credential Requirements for Testing + +**CRITICAL: Testing requires ALL credentials the agent depends on.** This includes both the LLM API key AND any tool-specific credentials (HubSpot, Brave Search, etc.). + +### Prerequisites + +Before running agent tests, you MUST collect ALL required credentials from the user. + +**Step 1: LLM API Key (always required)** +```bash +export ANTHROPIC_API_KEY="your-key-here" +``` + +**Step 2: Tool-specific credentials (depends on agent's tools)** + +Inspect the agent's `mcp_servers.json` and tool configuration to determine which tools the agent uses, then check for all required credentials: + +```python +from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS + +creds = CredentialManager() + +# Determine which tools the agent uses (from agent.json or mcp_servers.json) +agent_tools = [...] # e.g., ["hubspot_search_contacts", "web_search", ...] 
+ +# Find all missing credentials for those tools +missing = creds.get_missing_for_tools(agent_tools) +``` + +Common tool credentials: +| Tool | Env Var | Help URL | +|------|---------|----------| +| HubSpot CRM | `HUBSPOT_ACCESS_TOKEN` | https://developers.hubspot.com/docs/api/private-apps | +| Brave Search | `BRAVE_SEARCH_API_KEY` | https://brave.com/search/api/ | +| Google Search | `GOOGLE_SEARCH_API_KEY` + `GOOGLE_SEARCH_CX` | https://developers.google.com/custom-search | + +**Why ALL credentials are required:** +- Tests need to execute the agent's LLM nodes to validate behavior +- Tools with missing credentials will return error dicts instead of real data +- Mock mode bypasses everything, providing no confidence in real-world performance +- The `AgentRunner.run()` method validates credentials at startup and will fail fast if any are missing + +### Mock Mode Limitations + +Mock mode (`--mock` flag or `mock_mode=True`) is **ONLY for structure validation**: + +✓ Validates graph structure (nodes, edges, connections) +✓ Tests that code doesn't crash on execution +✗ Does NOT test LLM message generation +✗ Does NOT test reasoning or decision-making quality +✗ Does NOT test constraint validation (length limits, format rules) +✗ Does NOT test real API integrations or tool use +✗ Does NOT test personalization or content quality + +**Bottom line:** If you're testing whether an agent achieves its goal, you MUST use real credentials for ALL services. + +### Enforcing Credentials in Tests + +When generating tests, **ALWAYS include credential checks for ALL required services**: + +```python +import os +import pytest +from aden_tools.credentials import CredentialManager + +# At the top of every test file +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required for real testing. Set ANTHROPIC_API_KEY or use MOCK_MODE=1 for structure validation only." +) + + +@pytest.fixture(scope="session", autouse=True) +def check_credentials(): + """Ensure ALL required credentials are set for real testing.""" + creds = CredentialManager() + mock_mode = os.environ.get("MOCK_MODE") + + # Always check LLM key + if not creds.is_available("anthropic"): + if mock_mode: + print("\n⚠️ Running in MOCK MODE - structure validation only") + print(" This does NOT test LLM behavior or agent quality") + print(" Set ANTHROPIC_API_KEY for real testing\n") + else: + pytest.fail( + "\n❌ ANTHROPIC_API_KEY not set!\n\n" + "Real testing requires an API key. Choose one:\n" + "1. Set API key (RECOMMENDED):\n" + " export ANTHROPIC_API_KEY='your-key-here'\n" + "2. Run structure validation only:\n" + " MOCK_MODE=1 pytest exports/{agent}/tests/\n\n" + "Note: Mock mode does NOT validate agent behavior or quality." 
+ ) + + # Check tool-specific credentials (skip in mock mode) + if not mock_mode: + # List the tools this agent uses - update per agent + agent_tools = [] # e.g., ["hubspot_search_contacts", "hubspot_get_contact"] + missing = creds.get_missing_for_tools(agent_tools) + if missing: + lines = ["\n❌ Missing tool credentials!\n"] + for name in missing: + spec = creds.specs.get(name) + if spec: + lines.append(f" {spec.env_var} - {spec.description}") + if spec.help_url: + lines.append(f" Setup: {spec.help_url}") + lines.append("\nSet the required environment variables and re-run.") + pytest.fail("\n".join(lines)) +``` + +### User Communication + +When the user asks to test an agent, **ALWAYS check for ALL credentials first** — not just the LLM key: + +1. **Identify the agent's tools** from `agent.json` or `mcp_servers.json` +2. **Check ALL required credentials** using `CredentialManager` +3. **Ask the user to provide any missing credentials** before proceeding + +```python +from aden_tools.credentials import CredentialManager, CREDENTIAL_SPECS + +creds = CredentialManager() + +# 1. Check LLM key +missing_creds = [] +if not creds.is_available("anthropic"): + missing_creds.append(("ANTHROPIC_API_KEY", "Anthropic API key for LLM calls")) + +# 2. Check tool-specific credentials +agent_tools = [...] # Determined from agent config +missing_tools = creds.get_missing_for_tools(agent_tools) +for name in missing_tools: + spec = CREDENTIAL_SPECS.get(name) + if spec: + missing_creds.append((spec.env_var, spec.description)) + +# 3. Present ALL missing credentials to the user at once +if missing_creds: + print("⚠️ Missing credentials required by this agent:\n") + for env_var, description in missing_creds: + print(f" • {env_var} — {description}") + print() + print("Please set the missing environment variables:") + for env_var, _ in missing_creds: + print(f" export {env_var}='your-value-here'") + print() + print("Or run in mock mode (structure validation only):") + print(" MOCK_MODE=1 pytest exports/{agent}/tests/") + + # Ask user to provide credentials or choose mock mode +``` + +**IMPORTANT:** Do NOT skip credential collection. If an agent uses HubSpot tools, the user MUST provide `HUBSPOT_ACCESS_TOKEN`. If it uses web search, the user MUST provide the appropriate search API key. Collect ALL missing credentials in a single prompt rather than discovering them one at a time during test failures. + +## The Three-Stage Flow + +``` +┌─────────────────────────────────────────────────────────────────────────┐ +│ GOAL STAGE │ +│ (building-agents skill) │ +│ │ +│ 1. User defines goal with success_criteria and constraints │ +│ 2. Goal written to agent.py immediately │ +│ 3. Generate CONSTRAINT TESTS → Write to tests/ → USER APPROVAL │ +│ Files created: exports/{agent}/tests/test_constraints.py │ +└─────────────────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────────────────┐ +│ AGENT STAGE │ +│ (building-agents skill) │ +│ │ +│ Build nodes + edges, written immediately to files │ +│ Constraint tests can run during development: │ +│ run_tests(goal_id, agent_path, test_types='["constraint"]') │ +└─────────────────────────────────────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────────────────────────────────────┐ +│ EVAL STAGE (this skill) │ +│ │ +│ 1. Generate SUCCESS_CRITERIA TESTS → Write to tests/ → USER APPROVAL │ +│ Files created: exports/{agent}/tests/test_success_criteria.py │ +│ 2. 
Run all tests: run_tests(goal_id, agent_path) │ +│ 3. On failure → debug_test(goal_id, test_name, agent_path) │ +│ 4. Iterate: Edit agent code → Re-run run_tests (instant feedback) │ +└─────────────────────────────────────────────────────────────────────────┘ +``` + +## Step-by-Step: Testing an Agent + +### Step 1: Check Existing Tests + +**ALWAYS check first** before generating new tests: + +```python +mcp__agent-builder__list_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent" +) +``` + +### Step 2: Get Constraint Test Guidelines + +After goal is defined, get test guidelines using the MCP tool: + +```python +# First, read the goal from agent.py to get the goal JSON +goal_code = Read(file_path="exports/your_agent/agent.py") + +# Get constraint test guidelines via MCP tool +result = mcp__agent-builder__generate_constraint_tests( + goal_id="your-goal-id", + goal_json='{"id": "goal-id", "name": "...", "constraints": [...]}', + agent_path="exports/your_agent" +) +``` + +**Response includes:** +- `output_file`: Where to write tests +- `file_header`: Imports, fixtures, and pytest setup +- `test_template`: Format for test functions +- `constraints_formatted`: The constraints to test +- `test_guidelines`: Rules and best practices + +**Write tests directly** using file operations: + +```python +Write( + file_path=result["output_file"], + content=result["file_header"] + "\n\n" + your_test_code +) +``` + +### Step 3: Get Success Criteria Test Guidelines (Eval Stage) + +After agent is fully built, get success criteria test guidelines: + +```python +# Get success criteria test guidelines via MCP tool +result = mcp__agent-builder__generate_success_tests( + goal_id="your-goal-id", + goal_json='{"id": "goal-id", "name": "...", "success_criteria": [...]}', + node_names="analyze_request,search_web,format_results", + tool_names="web_search,web_scrape", + agent_path="exports/your_agent" +) +``` + +**Write tests directly** using file operations: + +```python +# Write tests using file operations +Write( + file_path=result["output_file"], + content=result["file_header"] + "\n\n" + your_test_code +) +``` + +### Step 4: Test Fixtures (conftest.py) + +The `file_header` returned by the MCP tools includes proper imports and fixtures. 
+You should also create a conftest.py file in the tests directory with shared fixtures:
+
+```python
+# Create conftest.py with the conftest template
+Write(
+    file_path="exports/your_agent/tests/conftest.py",
+    content=conftest_content  # Use PYTEST_CONFTEST_TEMPLATE format
+)
+```
+
+### Step 5: Run Tests
+
+**Use the MCP tool to run tests** (not pytest directly):
+
+```python
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent"
+)
+```
+
+**Response includes structured results:**
+```json
+{
+  "goal_id": "your-goal-id",
+  "overall_passed": false,
+  "summary": {
+    "total": 12,
+    "passed": 10,
+    "failed": 2,
+    "skipped": 0,
+    "errors": 0,
+    "pass_rate": "83.3%"
+  },
+  "test_results": [
+    {"file": "test_constraints.py", "test_name": "test_constraint_api_rate_limits", "status": "passed"},
+    {"file": "test_success_criteria.py", "test_name": "test_success_find_relevant_results", "status": "failed"}
+  ],
+  "failures": [
+    {"test_name": "test_success_find_relevant_results", "details": "AssertionError: Expected 3-5 results..."}
+  ]
+}
+```
+
+**Options for `run_tests`:**
+```python
+# Run only constraint tests
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent",
+    test_types='["constraint"]'
+)
+
+# Run with parallel workers
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent",
+    parallel=4
+)
+
+# Stop on first failure
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path="exports/your_agent",
+    fail_fast=True
+)
+```
+
+### Step 6: Debug Failed Tests
+
+**Use the MCP tool to debug** (not Bash/pytest directly):
+
+```python
+mcp__agent-builder__debug_test(
+    goal_id="your-goal-id",
+    test_name="test_success_find_relevant_results",
+    agent_path="exports/your_agent"
+)
+```
+
+**Response includes:**
+- Full verbose output from the test
+- Stack trace with exact line numbers
+- Captured logs and prints
+- Suggestions for fixing the issue
+
+### Step 7: Categorize Errors
+
+When a test fails, categorize the error to guide iteration:
+
+```python
+def categorize_test_failure(test_output, agent_code):
+    """Categorize test failure to guide iteration."""
+
+    # Read test output and agent code
+    failure_info = {
+        "test_name": "...",
+        "error_message": "...",
+        "stack_trace": "...",
+    }
+
+    # Pattern-based categorization
+    if any(pattern in failure_info["error_message"].lower() for pattern in [
+        "typeerror", "attributeerror", "keyerror", "valueerror",
+        "null", "none", "undefined", "tool call failed"
+    ]):
+        category = "IMPLEMENTATION_ERROR"
+        guidance = {
+            "stage": "Agent",
+            "action": "Fix the bug in agent code",
+            "files_to_edit": ["agent.py", "nodes/__init__.py"],
+            "restart_required": False,
+            "description": "Code bug - fix and re-run tests"
+        }
+
+    elif any(pattern in failure_info["error_message"].lower() for pattern in [
+        "assertion", "expected", "got", "should be", "success criteria"
+    ]):
+        category = "LOGIC_ERROR"
+        guidance = {
+            "stage": "Goal",
+            "action": "Update goal definition",
+            "files_to_edit": ["agent.py (goal section)"],
+            "restart_required": True,
+            "description": "Goal definition is wrong - update and rebuild"
+        }
+
+    elif any(pattern in failure_info["error_message"].lower() for pattern in [
+        "timeout", "rate limit", "empty", "boundary", "edge case"
+    ]):
+        category = "EDGE_CASE"
+        guidance = {
+            "stage": "Eval",
+            "action": "Add edge case test and fix handling",
+            "files_to_edit": ["agent.py", "tests/test_edge_cases.py"],
+            "restart_required": False,
+            "description": "New scenario - add test and handle it"
+        }
+
+    else:
+        category = "UNKNOWN"
+        guidance = {
+            "stage": "Unknown",
+            "action": "Manual investigation required",
+            "restart_required": False
+        }
+
+    return {
+        "category": category,
+        "guidance": guidance,
+        "failure_info": failure_info
+    }
+```
+
+**Show categorization to user:**
+
+```python
+AskUserQuestion(
+    questions=[{
+        "question": f"Test failed with {category}. How would you like to proceed?",
+        "header": "Test Failure",
+        "options": [
+            {
+                "label": "Fix code directly (Recommended)" if category == "IMPLEMENTATION_ERROR" else "Update goal",
+                "description": guidance["description"]
+            },
+            {
+                "label": "Show detailed error info",
+                "description": "View full stack trace and logs"
+            },
+            {
+                "label": "Skip for now",
+                "description": "Continue with other tests"
+            }
+        ],
+        "multiSelect": False
+    }]
+)
+```
+
+### Step 8: Iterate Based on Error Category
+
+#### IMPLEMENTATION_ERROR → Fix Agent Code
+
+```python
+# 1. Show user the exact file and line that failed
+print(f"Error in: exports/{agent_name}/nodes/__init__.py:42")
+print("Issue: 'NoneType' object has no attribute 'get'")
+
+# 2. Read the problematic code
+code = Read(file_path=f"exports/{agent_name}/nodes/__init__.py")
+
+# 3. User can fix directly, or you suggest a fix:
+Edit(
+    file_path=f"exports/{agent_name}/nodes/__init__.py",
+    old_string="if results.get('videos'):",
+    new_string="if results and results.get('videos'):"
+)
+
+# 4. Re-run tests immediately (instant feedback!)
+mcp__agent-builder__run_tests(
+    goal_id="your-goal-id",
+    agent_path=f"exports/{agent_name}"
+)
+```
+
+#### LOGIC_ERROR → Update Goal
+
+```python
+# 1. Show user the goal definition
+goal_code = Read(file_path=f"exports/{agent_name}/agent.py")
+
+# 2. Discuss what needs to change in success_criteria or constraints
+
+# 3. Edit the goal
+Edit(
+    file_path=f"exports/{agent_name}/agent.py",
+    old_string='target="3-5 videos"',
+    new_string='target="1-5 videos"'  # More realistic
+)
+
+# 4. May need to regenerate agent nodes if goal changed significantly
+# This requires going back to building-agents skill
+```
+
+#### EDGE_CASE → Add Test and Fix
+
+```python
+# 1. Create new edge case test with API key enforcement
+edge_case_test = '''
+@pytest.mark.asyncio
+async def test_edge_case_empty_results(mock_mode):
+    """Test: Agent handles no results gracefully"""
+    result = await default_agent.run({"query": "xyzabc123nonsense"}, mock_mode=mock_mode)
+
+    # Should succeed with empty results, not crash
+    assert result.success or result.error is not None
+    if result.success:
+        assert result.output.get("message") == "No results found"
+'''
+
+# 2. Add to test file
+Edit(
+    file_path=f"exports/{agent_name}/tests/test_edge_cases.py",
+    old_string="# Add edge case tests here",
+    new_string=edge_case_test
+)
+
+# 3. Fix agent to handle edge case
+# Edit agent code to handle empty results
+
+# 4. Re-run tests
+```
+
+## Test File Templates (Reference Only)
+
+**⚠️ Do NOT copy-paste these templates directly.** Use `generate_constraint_tests` and `generate_success_tests` MCP tools to create properly structured tests with correct imports and fixtures.
+
+These templates show the structure of generated tests for reference only.
+
+### Constraint Test Template
+
+```python
+"""Constraint tests for {agent_name}.
+
+These tests validate that the agent respects its defined constraints.
+Requires ANTHROPIC_API_KEY for real testing.
+""" + +import os +import pytest +from exports.{agent_name} import default_agent +from aden_tools.credentials import CredentialManager + + +# Enforce API key for real testing +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." +) + + +@pytest.mark.asyncio +async def test_constraint_{constraint_id}(): + """Test: {constraint_description}""" + # Test implementation based on constraint type + mock_mode = bool(os.environ.get("MOCK_MODE")) + result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode) + + # Assert constraint is respected + assert True # Replace with actual check +``` + +### Success Criteria Test Template + +```python +"""Success criteria tests for {agent_name}. + +These tests validate that the agent achieves its defined success criteria. +Requires ANTHROPIC_API_KEY for real testing - mock mode cannot validate success criteria. +""" + +import os +import pytest +from exports.{agent_name} import default_agent +from aden_tools.credentials import CredentialManager + + +# Enforce API key for real testing +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." +) + + +@pytest.mark.asyncio +async def test_success_{criteria_id}(): + """Test: {criteria_description}""" + mock_mode = bool(os.environ.get("MOCK_MODE")) + result = await default_agent.run({{"test": "input"}}, mock_mode=mock_mode) + + assert result.success, f"Agent failed: {{result.error}}" + + # Verify success criterion met + # e.g., assert metric meets target + assert True # Replace with actual check +``` + +### Edge Case Test Template + +```python +"""Edge case tests for {agent_name}. + +These tests validate agent behavior in unusual or boundary conditions. +Requires ANTHROPIC_API_KEY for real testing. +""" + +import os +import pytest +from exports.{agent_name} import default_agent +from aden_tools.credentials import CredentialManager + + +# Enforce API key for real testing +pytestmark = pytest.mark.skipif( + not CredentialManager().is_available("anthropic") and not os.environ.get("MOCK_MODE"), + reason="API key required. Set ANTHROPIC_API_KEY or use MOCK_MODE=1." +) + + +@pytest.mark.asyncio +async def test_edge_case_{scenario_name}(): + """Test: Agent handles {scenario_description}""" + mock_mode = bool(os.environ.get("MOCK_MODE")) + result = await default_agent.run({{"edge": "case_input"}}, mock_mode=mock_mode) + + # Verify graceful handling + assert result.success or result.error is not None +``` + +## Interactive Build + Test Loop + +During agent construction (Agent stage), you can run constraint tests incrementally: + +```python +# After adding first node +print("Added search_node. Running relevant constraint tests...") +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path=f"exports/{agent_name}", + test_types='["constraint"]' +) + +# After adding second node +print("Added filter_node. Running all constraint tests...") +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path=f"exports/{agent_name}", + test_types='["constraint"]' +) +``` + +This provides **immediate feedback** during development, catching issues early. + +## Common Test Patterns + +**Note:** All test patterns should include API key enforcement via conftest.py. 
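+
+The patterns below rely on a `mock_mode` fixture. A minimal conftest.py sketch is shown here for reference only — in practice, use the conftest template / `file_header` returned by the MCP tools, plus the credential checks shown earlier; the fixture body below is illustrative:
+
+```python
+"""Shared fixtures for agent tests (illustrative sketch)."""
+
+import os
+
+import pytest
+
+
+@pytest.fixture(scope="session")
+def mock_mode() -> bool:
+    """True when MOCK_MODE=1 is set (structure validation only, no real LLM calls)."""
+    return bool(os.environ.get("MOCK_MODE"))
+```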
+ +### ⚠️ CRITICAL: Framework Features You Must Know + +#### OutputCleaner - Automatic I/O Cleaning (NEW!) + +**The framework now automatically validates and cleans node outputs** using a fast LLM (Cerebras llama-3.3-70b) at edge traversal time. This prevents cascading failures from malformed output. + +**What OutputCleaner does**: +- ✅ Validates output matches next node's input schema +- ✅ Detects JSON parsing trap (entire response in one key) +- ✅ Cleans malformed output automatically (~200-500ms, ~$0.001 per cleaning) +- ✅ Boosts success rates by 1.8-2.2x + +**Impact on tests**: Tests should still use safe patterns because OutputCleaner may not catch all issues in test mode. + +#### Safe Test Patterns (REQUIRED) + +**❌ UNSAFE** (will cause test failures): +```python +# Direct key access - can crash! +approval_decision = result.output["approval_decision"] +``` + +**✅ SAFE** (correct patterns): +```python +# 1. Safe dict access with .get() +output = result.output or {} +approval_decision = output.get("approval_decision", "UNKNOWN") +assert "APPROVED" in approval_decision or approval_decision == "APPROVED" + +# 2. Type checking before operations +analysis = output.get("analysis", {}) +if isinstance(analysis, dict): + category = analysis.get("category", "unknown") + +# 3. Parse JSON from strings (the JSON parsing trap!) +import json +recommendation = output.get("recommendation", "{}") +if isinstance(recommendation, str): + try: + parsed = json.loads(recommendation) + if isinstance(parsed, dict): + approval = parsed.get("approval_decision", "UNKNOWN") + except json.JSONDecodeError: + approval = "UNKNOWN" +elif isinstance(recommendation, dict): + approval = recommendation.get("approval_decision", "UNKNOWN") + +# 4. Safe iteration with type check +compliance_issues = output.get("compliance_issues", []) +if isinstance(compliance_issues, list): + for issue in compliance_issues: + ... 
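+
+# 5. (Illustrative addition) Guard list access before indexing
+#    The "results" key is only an example; adapt to your agent's output schema.
+results_list = output.get("results", [])
+first = results_list[0] if isinstance(results_list, list) and results_list else None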
+``` + +#### Helper Functions for Safe Access + +**Add to conftest.py**: +```python +import json +import re + +def _parse_json_from_output(result, key): + """Parse JSON from agent output (framework may store full LLM response as string).""" + response_text = result.output.get(key, "") + # Remove markdown code blocks if present + json_text = re.sub(r'```json\s*|\s*```', '', response_text).strip() + + try: + return json.loads(json_text) + except (json.JSONDecodeError, AttributeError, TypeError): + return result.output.get(key) + +def safe_get_nested(result, key_path, default=None): + """Safely get nested value from result.output.""" + output = result.output or {} + current = output + + for key in key_path: + if isinstance(current, dict): + current = current.get(key) + elif isinstance(current, str): + try: + json_text = re.sub(r'```json\s*|\s*```', '', current).strip() + parsed = json.loads(json_text) + if isinstance(parsed, dict): + current = parsed.get(key) + else: + return default + except json.JSONDecodeError: + return default + else: + return default + + return current if current is not None else default + +# Make available in tests +pytest.parse_json_from_output = _parse_json_from_output +pytest.safe_get_nested = safe_get_nested +``` + +**Usage in tests**: +```python +# Use helper to parse JSON safely +parsed = pytest.parse_json_from_output(result, "recommendation") +if isinstance(parsed, dict): + approval = parsed.get("approval_decision", "UNKNOWN") + +# Safe nested access +risk_score = pytest.safe_get_nested(result, ["analysis", "risk_score"], default=0.0) +``` + +#### Test Count Guidance + +**Generate 8-15 tests total, NOT 30+** + +- ✅ 2-3 tests per success criterion +- ✅ 1 happy path test +- ✅ 1 boundary/edge case test +- ✅ 1 error handling test (optional) + +**Why fewer tests?**: +- Each test requires real LLM call (~3 seconds, costs money) +- 30 tests = 90 seconds, $0.30+ in costs +- 12 tests = 36 seconds, $0.12 in costs +- Focus on quality over quantity + +#### ExecutionResult Fields (Important!) + +**`result.success=True` means NO exception, NOT goal achieved** + +```python +# ❌ WRONG - assumes goal achieved +assert result.success + +# ✅ RIGHT - check success AND output +assert result.success, f"Agent failed: {result.error}" +output = result.output or {} +approval = output.get("approval_decision") +assert approval == "APPROVED", f"Expected APPROVED, got {approval}" +``` + +**All ExecutionResult fields**: +- `success: bool` - Execution completed without exception (NOT goal achieved!) 
+- `output: dict` - Complete memory snapshot (may contain raw strings) +- `error: str | None` - Error message if failed +- `steps_executed: int` - Number of nodes executed +- `total_tokens: int` - Cumulative token usage +- `total_latency_ms: int` - Total execution time +- `path: list[str]` - Node IDs traversed +- `paused_at: str | None` - Node ID if HITL pause occurred +- `session_state: dict` - State for resuming + +### Happy Path Test +```python +@pytest.mark.asyncio +async def test_happy_path(mock_mode): + """Test normal successful execution""" + result = await default_agent.run({"query": "test"}, mock_mode=mock_mode) + assert result.success + assert len(result.output) > 0 +``` + +### Boundary Condition Test +```python +@pytest.mark.asyncio +async def test_boundary_minimum(mock_mode): + """Test at minimum threshold""" + result = await default_agent.run({"query": "specific topic"}, mock_mode=mock_mode) + assert result.success + assert len(result.output.get("results", [])) >= 1 +``` + +### Error Handling Test +```python +@pytest.mark.asyncio +async def test_error_handling(mock_mode): + """Test graceful error handling""" + result = await default_agent.run({"query": ""}, mock_mode=mock_mode) + assert not result.success or result.output.get("error") is not None +``` + +### Performance Test +```python +@pytest.mark.asyncio +async def test_performance_latency(mock_mode): + """Test response time is acceptable""" + import time + start = time.time() + result = await default_agent.run({"query": "test"}, mock_mode=mock_mode) + duration = time.time() - start + assert duration < 5.0, f"Took {duration}s, expected <5s" +``` + +## Integration with building-agents + +### Handoff Points + +| Scenario | From | To | Action | +|----------|------|-----|--------| +| Agent built, ready to test | building-agents | testing-agent | Generate success tests | +| LOGIC_ERROR found | testing-agent | building-agents | Update goal, rebuild | +| IMPLEMENTATION_ERROR | testing-agent | Direct fix | Edit agent files, re-run tests | +| EDGE_CASE found | testing-agent | testing-agent | Add edge case test | +| All tests pass | testing-agent | Done | Agent validated ✅ | + +### Iteration Speed Comparison + +| Scenario | Old Approach | New Approach | +|----------|--------------|--------------| +| **Bug Fix** | Rebuild via MCP tools (14 min) | Edit Python file, pytest (2 min) | +| **Add Test** | Generate via MCP, export (5 min) | Write test file directly (1 min) | +| **Debug** | Read subprocess logs | pdb, breakpoints, prints | +| **Inspect** | Limited visibility | Full Python introspection | + +## Anti-Patterns + +### Testing Best Practices + +| Don't | Do Instead | +|-------|------------| +| ❌ Write tests without getting guidelines first | ✅ Use `generate_*_tests` to get proper file_header and guidelines | +| ❌ Run pytest via Bash | ✅ Use `run_tests` MCP tool for structured results | +| ❌ Debug tests with Bash pytest -vvs | ✅ Use `debug_test` MCP tool for formatted output | +| ❌ Check for tests with Glob | ✅ Use `list_tests` MCP tool | +| ❌ Skip the file_header from guidelines | ✅ Always include the file_header for proper imports and fixtures | + +### General Testing + +| Don't | Do Instead | +|-------|------------| +| ❌ Treat all failures the same | ✅ Use debug_test to categorize and iterate appropriately | +| ❌ Rebuild entire agent for small bugs | ✅ Edit code directly, re-run tests | +| ❌ Run tests without API key | ✅ Always set ANTHROPIC_API_KEY first | +| ❌ Write tests without understanding the constraints/criteria | ✅ Read 
the formatted constraints/criteria from guidelines | + +## Workflow Summary + +``` +1. Check existing tests: list_tests(goal_id, agent_path) + → Scans exports/{agent}/tests/test_*.py + ↓ +2. Get test guidelines: generate_constraint_tests, generate_success_tests + → Returns file_header, test_template, constraints/criteria, guidelines + ↓ +3. Write tests: Use file operations with the provided guidelines + → Write tests to exports/{agent}/tests/test_*.py + ↓ +4. Run tests: run_tests(goal_id, agent_path) + → Executes: pytest exports/{agent}/tests/ -v + ↓ +5. Debug failures: debug_test(goal_id, test_name, agent_path) + → Re-runs single test with verbose output + ↓ +6. Fix based on category: + - IMPLEMENTATION_ERROR → Edit agent code directly + - ASSERTION_FAILURE → Fix agent logic or update test + - IMPORT_ERROR → Check package structure + - API_ERROR → Check API keys and connectivity + ↓ +7. Re-run tests: run_tests(goal_id, agent_path) + ↓ +8. Repeat until all pass ✅ +``` + +## MCP Tools Reference + +```python +# Check existing tests (scans Python test files) +mcp__agent-builder__list_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent" +) + +# Get constraint test guidelines (returns templates and guidelines, NOT generated tests) +mcp__agent-builder__generate_constraint_tests( + goal_id="your-goal-id", + goal_json='{"id": "...", "constraints": [...]}', + agent_path="exports/your_agent" +) +# Returns: output_file, file_header, test_template, constraints_formatted, test_guidelines + +# Get success criteria test guidelines +mcp__agent-builder__generate_success_tests( + goal_id="your-goal-id", + goal_json='{"id": "...", "success_criteria": [...]}', + node_names="node1,node2", + tool_names="tool1,tool2", + agent_path="exports/your_agent" +) +# Returns: output_file, file_header, test_template, success_criteria_formatted, test_guidelines + +# Run tests via pytest subprocess +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent" +) + +# Debug a failed test (re-runs with verbose output) +mcp__agent-builder__debug_test( + goal_id="your-goal-id", + test_name="test_constraint_foo", + agent_path="exports/your_agent" +) +``` + +## run_tests Options + +```python +# Run only constraint tests +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + test_types='["constraint"]' +) + +# Run only success criteria tests +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + test_types='["success"]' +) + +# Run with pytest-xdist parallelism (requires pytest-xdist) +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + parallel=4 +) + +# Stop on first failure +mcp__agent-builder__run_tests( + goal_id="your-goal-id", + agent_path="exports/your_agent", + fail_fast=True +) +``` + +## Direct pytest Commands + +You can also run tests directly with pytest (the MCP tools use pytest internally): + +```bash +# Run all tests +pytest exports/your_agent/tests/ -v + +# Run specific test file +pytest exports/your_agent/tests/test_constraints.py -v + +# Run specific test +pytest exports/your_agent/tests/test_constraints.py::test_constraint_foo -vvs + +# Run in mock mode (structure validation only) +MOCK_MODE=1 pytest exports/your_agent/tests/ -v +``` + +--- + +**MCP tools generate tests, write them to Python files, and run them via pytest.** +```` diff --git a/.gitignore b/.gitignore index adbb2814ac..22078b7d68 100644 --- a/.gitignore +++ b/.gitignore @@ -23,6 
+23,8 @@ docker-compose.override.yml .vscode/* !.vscode/extensions.json !.vscode/settings.json.example +!.vscode/mcp.json +!.vscode/settings.json *.swp *.swo *~ diff --git a/.vscode/mcp.json b/.vscode/mcp.json new file mode 100644 index 0000000000..1bd389fe37 --- /dev/null +++ b/.vscode/mcp.json @@ -0,0 +1,34 @@ +{ + "servers": { + "agent-builder": { + "type": "stdio", + "command": "uv", + "args": [ + "run", + "--directory", + "${workspaceFolder}/core", + "python", + "-m", + "framework.mcp.agent_builder_server" + ], + "env": { + "PYTHONPATH": "${workspaceFolder}/tools/src" + } + }, + "tools": { + "type": "stdio", + "command": "uv", + "args": [ + "run", + "--directory", + "${workspaceFolder}/tools", + "python", + "mcp_server.py", + "--stdio" + ], + "env": { + "PYTHONPATH": "${workspaceFolder}/tools/src:${workspaceFolder}/core" + } + } + } +} diff --git a/.vscode/settings.json b/.vscode/settings.json new file mode 100644 index 0000000000..c303884ae6 --- /dev/null +++ b/.vscode/settings.json @@ -0,0 +1,10 @@ +{ + // Enable Agent Skills (experimental feature as of VS Code 1.108) + "chat.useAgentSkills": true, + + // MCP Access Level + "chat.mcp.access": "all", + + // Auto-start MCP servers (experimental) + "chat.mcp.autostart": "newAndOutdated", +} diff --git a/DEVELOPER.md b/DEVELOPER.md index be3bd6fc10..49643a9adf 100644 --- a/DEVELOPER.md +++ b/DEVELOPER.md @@ -26,6 +26,7 @@ Aden Agent Framework is a Python-based system for building goal-driven, self-imp | **tools** | `/tools` | MCP tools for agent capabilities | Python 3.11+ | | **exports** | `/exports` | Agent packages (user-created, gitignored) | Python 3.11+ | | **skills** | `.claude` | Claude Code skills for building/testing | Markdown | +| **agents** | `.github/agents` | VS Code custom agents | Markdown | ### Key Principles @@ -46,7 +47,10 @@ Ensure you have installed: - **Python 3.11+** - [Download](https://www.python.org/downloads/) (3.12 or 3.13 recommended) - **uv** - Python package manager ([Install](https://docs.astral.sh/uv/getting-started/installation/)) - **git** - Version control -- **Claude Code** - [Install](https://docs.anthropic.com/claude/docs/claude-code) (optional, for using building skills) +- **IDE** (pick one): + - **VS Code** with [GitHub Copilot](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) - [Setup guide](docs/vscode-copilot-setup.md) + - **Cursor** - [Download](https://cursor.sh/) + - **Claude Code** - [Install](https://docs.anthropic.com/claude/docs/claude-code) Verify installation: @@ -217,19 +221,39 @@ hive/ # Repository root ## Building Agents -### Using Claude Code Skills +### Using IDE Skills/Agents -The fastest way to build agents is using the Claude Code skills: +The fastest way to build agents is using your IDE's skills or agents. These are installed/configured automatically when you run `./quickstart.sh`. 
+**Claude Code:** ```bash -# Install skills (one-time) -./quickstart.sh - -# Build a new agent +# Skills are available in .claude/skills/ +# Use them directly in Claude Code: claude> /building-agents-construction - -# Test the agent claude> /testing-agent +claude> /agent-workflow +``` + +**Cursor:** +```bash +# Skills are available via MCP in .cursor/mcp.json +# Type / in Agent chat to access them: +/building-agents-construction +/testing-agent +/agent-workflow +``` + +**VS Code + GitHub Copilot:** +```bash +# Custom agents are available in .github/agents/ +# Open Copilot Chat (Cmd/Ctrl + Shift + I) +# Click the mode dropdown at the top +# Select a custom agent: +# - agent-workflow (for complete workflow) +# - building-agents-construction (step-by-step) +# - testing-agent (testing) +# Then type your request: +Build a customer support agent ``` ### Agent Development Workflow diff --git a/README.md b/README.md index e1e2cd02e4..fe0c321428 100644 --- a/README.md +++ b/README.md @@ -91,7 +91,7 @@ Aden is a platform for building, deploying, operating, and adapting AI agents: ## Prerequisites - Python 3.11+ for agent development -- Claude Code or Cursor for utilizing agent skills +- IDE with MCP support: [VS Code](https://code.visualstudio.com/) + [GitHub Copilot](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot), [Cursor](https://cursor.sh/), or [Claude Code](https://claude.ai/) > **Note for Windows Users:** It is strongly recommended to use **WSL (Windows Subsystem for Linux)** or **Git Bash** to run this framework. Some core automation scripts may not execute correctly in standard Command Prompt or PowerShell. @@ -113,18 +113,38 @@ This sets up: ### Build Your First Agent +**Claude Code:** ```bash -# Build an agent using Claude Code +# Use skills directly claude> /building-agents-construction - -# Test your agent claude> /testing-agent +``` + +**Cursor:** +```bash +# Type / in Agent chat +/building-agents-construction +/testing-agent +``` -# Run your agent -PYTHONPATH=exports uv run python -m your_agent_name run --input '{...}' +**VS Code + GitHub Copilot:** +```bash +# Open Copilot Chat (Cmd/Ctrl + Shift + I) +# Select a custom agent from the mode dropdown: +# - agent-workflow +# - building-agents-construction +# - testing-agent +# Then type your request: +Build a file monitor agent +``` + +**Run your agent:** +```bash +PYTHONPATH=core:exports python -m your_agent_name run --input '{...}' ``` **[📖 Complete Setup Guide](ENVIRONMENT_SETUP.md)** - Detailed instructions for agent development +**[VS Code + GitHub Copilot Setup](docs/vscode-copilot-setup.md)** ### Cursor IDE Support diff --git a/docs/vscode-copilot-setup.md b/docs/vscode-copilot-setup.md new file mode 100644 index 0000000000..edc645d9d5 --- /dev/null +++ b/docs/vscode-copilot-setup.md @@ -0,0 +1,295 @@ +# VS Code + GitHub Copilot Setup Guide + +This guide helps you set up VS Code with GitHub Copilot to use Hive's MCP servers and custom agents for building and testing AI agents. + +## Prerequisites + +- **VS Code** version 1.102 or later (for MCP support) +- **GitHub Copilot** extension installed and activated +- **Python 3.11+** installed +- **uv** package manager ([installation guide](https://docs.astral.sh/uv/)) + +## Quick Start + +The Hive repository comes pre-configured with VS Code support! If you cloned the repository, MCP servers and custom agents are already set up. + +### 1. Verify Installation + +Open VS Code in the Hive repository: + +```bash +cd hive +code . +``` + +### 2. 
Check MCP Configuration + +The `.vscode/mcp.json` file should contain two MCP servers: + +- **agent-builder** - Tools for creating and testing agents +- **tools** - 19 tools for agent capabilities (web search, file operations, etc.) + +You can view the configuration: + +```bash +cat .vscode/mcp.json +``` + +### 3. Verify Custom Agents + +Custom agents are available in `.github/agents/`: + +```bash +ls .github/agents/ +``` + +You should see 6 `.agent.md` files: +- `agent-workflow.agent.md` - Complete agent development workflow +- `building-agents-core.agent.md` - Core concepts and fundamentals +- `building-agents-construction.agent.md` - Step-by-step agent building +- `building-agents-patterns.agent.md` - Best practices and patterns +- `testing-agent.agent.md` - Testing and validation +- `setup-credentials.agent.md` - Credential management + +### 4. Enable MCP in VS Code Settings + +The `.vscode/settings.json` is pre-configured with: + +```jsonc +{ + // Enable Agent Skills (experimental) + "chat.useAgentSkills": true, + + // Enable MCP servers + "chat.mcp.access": true, + + // Auto-start MCP servers + "chat.mcp.autostart": true +} +``` + +### 5. Test the Setup + +Open GitHub Copilot Chat (Cmd/Ctrl + Shift + I): + +1. Click the mode dropdown at the top of the chat panel +2. You'll see standard modes (Ask, Plan, Agent) plus your custom agents: + - `agent-workflow` + - `building-agents-construction` + - `building-agents-core` + - `building-agents-patterns` + - `testing-agent` + - `setup-credentials` + +3. Select a custom agent (e.g., `agent-workflow`) +4. Ask the agent to help you: + ``` + Build a simple file monitor agent + ``` + +The custom agent will guide you through the process with access to MCP tools. + +## Understanding the Setup + +### MCP Configuration (`.vscode/mcp.json`) + +MCP servers provide tools that GitHub Copilot can use. The Hive repository includes two servers: + +#### agent-builder Server + +Provides tools for building and testing agents: +- `create_session` - Start a new agent build session +- `add_node` - Add nodes to agent workflow +- `add_edge` - Connect nodes with edges +- `set_goal` - Define agent goals and success criteria +- `test_node` - Validate node configuration +- `validate_graph` - Check agent structure +- `generate_constraint_tests` - Create constraint tests +- `generate_success_tests` - Create success criteria tests +- `run_tests` - Execute agent tests +- `debug_test` - Debug test failures + +#### tools Server + +Provides 19 operational tools: +- **Web**: `web_search`, `web_scrape`, `fetch_webpage` +- **Files**: `read_file`, `write_file`, `list_directory`, `file_search` +- **Shell**: `run_command` +- **Git**: `git_status`, `git_diff`, `git_commit` +- **AI**: `llm_generate`, `llm_extract_json` +- And more... + +### Custom Agents (`.github/agents/*.agent.md`) + +Custom agents are specialized assistants that guide specific tasks. They have access to MCP tools and workspace context. + +#### Available Agents + +1. **agent-workflow** - Orchestrates the complete agent development process from concept to production +2. **building-agents-core** - Teaches agent architecture, node types, and core concepts +3. **building-agents-construction** - Guides step-by-step agent building with interactive approval +4. **building-agents-patterns** - Provides best practices, design patterns, and anti-patterns +5. **testing-agent** - Creates and runs comprehensive test suites for agents +6. 
**setup-credentials** - Manages API keys and credentials securely + +#### Using Custom Agents + +To use a custom agent: + +1. Open Copilot Chat (Cmd/Ctrl + Shift + I) +2. Click the **mode dropdown** at the top of the chat panel +3. Select the specific custom agent you want to use (they appear alongside Ask, Plan, and Agent modes): + - **agent-workflow** - "I want to build a sales prospecting agent" + - **building-agents-core** - "Explain node types and agent architecture" + - **building-agents-construction** - "Create a new agent step by step" + - **building-agents-patterns** - "Show me best practices for error handling" + - **testing-agent** - "Test the agent in exports/my_agent" + - **setup-credentials** - "Configure credentials for hubspot-agent" +4. Type your request in the chat + +The selected custom agent will guide you through the task with specialized knowledge and access to MCP tools. + +## Troubleshooting + +### MCP Servers Not Starting + +**Symptoms**: Copilot doesn't have access to MCP tools + +**Solutions**: + +1. Check VS Code version (must be 1.102+) +2. Verify `uv` is installed: `uv --version` +3. Check VS Code Output panel → "MCP" for error messages +4. Manually restart MCP servers: + - Open Command Palette (Cmd/Ctrl + Shift + P) + - Run: "GitHub Copilot: Restart MCP Servers" + +### Custom Agents Not Available + +**Symptoms**: Custom agents don't appear in the mode dropdown + +**Solutions**: + +1. Verify `chat.useAgentSkills` is `true` in `.vscode/settings.json` +2. Check `.github/agents/` directory exists with 6 `.agent.md` files +3. Reload VS Code window: "Developer: Reload Window" (Cmd/Ctrl + Shift + P) +4. Check VS Code version (1.108+ required for custom agents) +5. Ensure GitHub Copilot extension is up to date + +### Permission Errors + +**Symptoms**: "Permission denied" when MCP tries to run Python + +**Solutions**: + +1. Ensure Python 3.11+ is in PATH: `python --version` +2. Verify `uv` can run Python: `uv run python --version` +3. Check file permissions on `core/` and `tools/` directories + +### Tool Import Errors + +**Symptoms**: MCP server fails with "ModuleNotFoundError" + +**Solutions**: + +1. Install dependencies: + ```bash + cd core && uv sync + cd ../tools && uv sync + ``` + +2. Verify PYTHONPATH in `.vscode/mcp.json`: + ```json + "env": { + "PYTHONPATH": "${workspaceFolder}/tools/src:${workspaceFolder}/core" + } + ``` + +## Advanced Configuration + +### Adding Custom MCP Servers + +To add your own MCP servers, edit `.vscode/mcp.json`: + +```json +{ + "servers": { + "my-server": { + "type": "stdio", + "command": "uv", + "args": [ + "run", + "--directory", + "${workspaceFolder}/my-server", + "python", + "server.py" + ], + "env": { + "PYTHONPATH": "${workspaceFolder}/my-server" + } + } + } +} +``` + +### Creating Custom Agents + +Create a new `.agent.md` file in `.github/agents/`: + +```markdown +--- +description: Your agent description +name: My Custom Agent +tools: ['agent-builder/*', 'tools/*'] +target: vscode +--- + +# Your Agent Content + +Instructions and guidance for your custom agent... +``` + +### Environment Variables + +MCP servers can access environment variables. Common ones: + +- `ANTHROPIC_API_KEY` - For LLM calls in agents +- `HUBSPOT_ACCESS_TOKEN` - For HubSpot integration tools +- `BRAVE_SEARCH_API_KEY` - For web search tools + +Set these in your shell profile (`~/.bashrc`, `~/.zshrc`) or use the `setup-credentials` agent. 
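+
+For example, appended to your shell profile (values below are placeholders):
+
+```bash
+# Credentials used by Hive MCP servers and agents (placeholder values)
+export ANTHROPIC_API_KEY="your-anthropic-key"
+export HUBSPOT_ACCESS_TOKEN="your-hubspot-token"
+export BRAVE_SEARCH_API_KEY="your-brave-search-key"
+```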
+ +## Differences from Other IDEs + +| Feature | VS Code | Cursor | Claude Code | +|---------|---------|--------|-------------| +| **MCP Config** | `.vscode/mcp.json` | `.cursor/mcp.json` | Built-in | +| **Agents/Skills** | `.github/agents/*.agent.md` | Symlinks in `.cursor/skills/` | `.claude/skills/` | +| **Path Variables** | `${workspaceFolder}` | Relative paths | Relative paths | +| **Discovery** | Workspace settings | IDE-specific | Built-in | +| **Setup** | Pre-configured | Pre-configured | Pre-configured | + +All IDEs in this repository have equivalent functionality - choose based on your preference! + +## Next Steps + +- **Build your first agent**: Open Copilot Chat, select `agent-workflow` from the mode dropdown, and describe what you want to build +- **Read the docs**: Check `docs/getting-started.md` for tutorials +- **Explore examples**: See `exports/` for example agents +- **Join the community**: [Discord](https://discord.com/invite/MXE49hrKDk) + +## Resources + +- [VS Code MCP Documentation](https://code.visualstudio.com/docs/copilot/customization/mcp-servers) +- [VS Code Custom Agents Documentation](https://code.visualstudio.com/docs/copilot/customization/custom-agents) +- [Hive Documentation](https://docs.adenhq.com/) +- [GitHub Copilot Extension](https://marketplace.visualstudio.com/items?itemName=GitHub.copilot) + +## Support + +Having issues? + +1. Check the [troubleshooting section](#troubleshooting) above +2. Search [GitHub Issues](https://github.com/adenhq/hive/issues) +3. Ask on [Discord](https://discord.com/invite/MXE49hrKDk) +4. [Open a new issue](https://github.com/adenhq/hive/issues/new)