Skip to content

Latest commit

 

History

History
498 lines (381 loc) · 17.5 KB

File metadata and controls

498 lines (381 loc) · 17.5 KB

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Project Overview

Auto Claude is a multi-agent autonomous coding framework that builds software through coordinated AI agent sessions. It uses the Claude Agent SDK to run agents in isolated workspaces with security controls.

CRITICAL: All AI interactions use the Claude Agent SDK (claude-agent-sdk package), NOT the Anthropic API directly.

Project Structure

autonomous-coding/
├── apps/
│   ├── backend/           # Python backend/CLI - ALL agent logic lives here
│   │   ├── core/          # Client, auth, security
│   │   ├── agents/        # Agent implementations
│   │   ├── spec_agents/   # Spec creation agents
│   │   ├── integrations/  # Graphiti, Linear, GitHub
│   │   └── prompts/       # Agent system prompts
│   └── frontend/          # Electron desktop UI
├── guides/                # Documentation
├── tests/                 # Test suite
└── scripts/               # Build and utility scripts

When working with AI/LLM code:

  • Look in apps/backend/core/client.py for the Claude SDK client setup
  • Reference apps/backend/agents/ for working agent implementations
  • Check apps/backend/spec_agents/ for spec creation agent examples
  • NEVER use anthropic.Anthropic() directly - always use create_client() from core.client

Frontend (Electron Desktop App):

  • Built with Electron, React, TypeScript
  • AI agents can perform E2E testing using the Electron MCP server
  • When bug fixing or implementing features, use the Electron MCP server for automated testing
  • See "End-to-End Testing" section below for details

Commands

Setup

Requirements:

  • Python 3.12+ (required for backend)
  • Node.js (for frontend)
# Install all dependencies from root
npm run install:all

# Or install separately:
# Backend (from apps/backend/)
cd apps/backend && uv venv && uv pip install -r requirements.txt

# Frontend (from apps/frontend/)
cd apps/frontend && npm install

# Set up OAuth token
claude setup-token
# Add to apps/backend/.env: CLAUDE_CODE_OAUTH_TOKEN=your-token

Creating and Running Specs

cd apps/backend

# Create a spec interactively
python spec_runner.py --interactive

# Create spec from task description
python spec_runner.py --task "Add user authentication"

# Force complexity level (simple/standard/complex)
python spec_runner.py --task "Fix button" --complexity simple

# Run autonomous build
python run.py --spec 001

# List all specs
python run.py --list

Workspace Management

cd apps/backend

# Review changes in isolated worktree
python run.py --spec 001 --review

# Merge completed build into project
python run.py --spec 001 --merge

# Discard build
python run.py --spec 001 --discard

QA Validation

cd apps/backend

# Run QA manually
python run.py --spec 001 --qa

# Check QA status
python run.py --spec 001 --qa-status

Testing

# Install test dependencies (required first time)
cd apps/backend && uv pip install -r ../../tests/requirements-test.txt

# Run all tests (use virtual environment pytest)
apps/backend/.venv/bin/pytest tests/ -v

# Run single test file
apps/backend/.venv/bin/pytest tests/test_security.py -v

# Run specific test
apps/backend/.venv/bin/pytest tests/test_security.py::test_bash_command_validation -v

# Skip slow tests
apps/backend/.venv/bin/pytest tests/ -m "not slow"

# Or from root
npm run test:backend

Spec Validation

python apps/backend/validate_spec.py --spec-dir apps/backend/specs/001-feature --checkpoint all

Releases

# 1. Bump version on your branch (creates commit, no tag)
node scripts/bump-version.js patch   # 2.8.0 -> 2.8.1
node scripts/bump-version.js minor   # 2.8.0 -> 2.9.0
node scripts/bump-version.js major   # 2.8.0 -> 3.0.0

# 2. Push and create PR to main
git push origin your-branch
gh pr create --base main

# 3. Merge PR → GitHub Actions automatically:
#    - Creates tag
#    - Builds all platforms
#    - Creates release with changelog
#    - Updates README

See RELEASE.md for detailed release process documentation.

Architecture

Core Pipeline

Spec Creation (spec_runner.py) - Dynamic 3-8 phase pipeline based on task complexity:

  • SIMPLE (3 phases): Discovery → Quick Spec → Validate
  • STANDARD (6-7 phases): Discovery → Requirements → [Research] → Context → Spec → Plan → Validate
  • COMPLEX (8 phases): Full pipeline with Research and Self-Critique phases

Implementation (run.py → agent.py) - Multi-session build:

  1. Planner Agent creates subtask-based implementation plan
  2. Coder Agent implements subtasks (can spawn subagents for parallel work)
  3. QA Reviewer validates acceptance criteria (can perform E2E testing via Electron MCP for frontend changes)
  4. QA Fixer resolves issues in a loop (with E2E testing to verify fixes)

Key Components (apps/backend/)

Core Infrastructure:

  • core/client.py - Claude Agent SDK client factory with security hooks and tool permissions
  • core/security.py - Dynamic command allowlisting based on detected project stack
  • core/auth.py - OAuth token management for Claude SDK authentication
  • agents/ - Agent implementations (planner, coder, qa_reviewer, qa_fixer)
  • spec_agents/ - Spec creation agents (gatherer, researcher, writer, critic)

Memory & Context:

  • integrations/graphiti/ - Graphiti memory system (mandatory)
    • queries_pkg/graphiti.py - Main GraphitiMemory class
    • queries_pkg/client.py - LadybugDB client wrapper
    • queries_pkg/queries.py - Graph query operations
    • queries_pkg/search.py - Semantic search logic
    • queries_pkg/schema.py - Graph schema definitions
  • graphiti_config.py - Configuration and validation for Graphiti integration
  • graphiti_providers.py - Multi-provider factory (OpenAI, Anthropic, Azure, Ollama, Google AI)
  • agents/memory_manager.py - Session memory orchestration

Workspace & Security:

  • cli/worktree.py - Git worktree isolation for safe feature development
  • context/project_analyzer.py - Project stack detection for dynamic tooling
  • auto_claude_tools.py - Custom MCP tools integration

Integrations:

  • linear_updater.py - Optional Linear integration for progress tracking
  • runners/github/ - GitHub Issues & PRs automation
  • Electron MCP - E2E testing integration for QA agents (Chrome DevTools Protocol)
    • Enabled with ELECTRON_MCP_ENABLED=true in .env
    • Allows QA agents to interact with running Electron app
    • See "End-to-End Testing" section for details

Agent Prompts (apps/backend/prompts/)

Prompt Purpose
planner.md Creates implementation plan with subtasks
coder.md Implements individual subtasks
coder_recovery.md Recovers from stuck/failed subtasks
qa_reviewer.md Validates acceptance criteria
qa_fixer.md Fixes QA-reported issues
spec_gatherer.md Collects user requirements
spec_researcher.md Validates external integrations
spec_writer.md Creates spec.md document
spec_critic.md Self-critique using ultrathink
complexity_assessor.md AI-based complexity assessment

Spec Directory Structure

Each spec in .auto-claude/specs/XXX-name/ contains:

  • spec.md - Feature specification
  • requirements.json - Structured user requirements
  • context.json - Discovered codebase context
  • implementation_plan.json - Subtask-based plan with status tracking
  • qa_report.md - QA validation results
  • QA_FIX_REQUEST.md - Issues to fix (when rejected)

Branching & Worktree Strategy

Auto Claude uses git worktrees for isolated builds. All branches stay LOCAL until user explicitly pushes:

main (user's branch)
└── auto-claude/{spec-name}  ← spec branch (isolated worktree)

Key principles:

  • ONE branch per spec (auto-claude/{spec-name})
  • Parallel work uses subagents (agent decides when to spawn)
  • NO automatic pushes to GitHub - user controls when to push
  • User reviews in spec worktree (.worktrees/{spec-name}/)
  • Final merge: spec branch → main (after user approval)

Workflow:

  1. Build runs in isolated worktree on spec branch
  2. Agent implements subtasks (can spawn subagents for parallel work)
  3. User tests feature in .worktrees/{spec-name}/
  4. User runs --merge to add to their project
  5. User pushes to remote when ready

Contributing to Upstream

CRITICAL: When submitting PRs to AndyMik90/Auto-Claude, always target the develop branch, NOT main.

Correct workflow for contributions:

  1. Fetch upstream: git fetch upstream
  2. Create feature branch from upstream/develop: git checkout -b fix/my-fix upstream/develop
  3. Make changes and commit with sign-off: git commit -s -m "fix: description"
  4. Push to your fork: git push origin fix/my-fix
  5. Create PR targeting develop: gh pr create --repo AndyMik90/Auto-Claude --base develop

Verify before PR:

# Ensure only your commits are included
git log --oneline upstream/develop..HEAD

Security Model

Three-layer defense:

  1. OS Sandbox - Bash command isolation
  2. Filesystem Permissions - Operations restricted to project directory
  3. Command Allowlist - Dynamic allowlist from project analysis (security.py + project_analyzer.py)

Security profile cached in .auto-claude-security.json.

Claude Agent SDK Integration

CRITICAL: Auto Claude uses the Claude Agent SDK for ALL AI interactions. Never use the Anthropic API directly.

Client Location: apps/backend/core/client.py

The create_client() function creates a configured ClaudeSDKClient instance with:

  • Multi-layered security (sandbox, permissions, security hooks)
  • Agent-specific tool permissions (planner, coder, qa_reviewer, qa_fixer)
  • Dynamic MCP server integration based on project capabilities
  • Extended thinking token budget control

Example usage in agents:

from core.client import create_client

# Create SDK client (NOT raw Anthropic API client)
client = create_client(
    project_dir=project_dir,
    spec_dir=spec_dir,
    model="claude-sonnet-4-5-20250929",
    agent_type="coder",
    max_thinking_tokens=None  # or 5000/10000/16000
)

# Run agent session
response = client.create_agent_session(
    name="coder-agent-session",
    starting_message="Implement the authentication feature"
)

Why use the SDK:

  • Pre-configured security (sandbox, allowlists, hooks)
  • Automatic MCP server integration (Context7, Linear, Graphiti, Electron, Puppeteer)
  • Tool permissions based on agent role
  • Session management and recovery
  • Unified API across all agent types

Where to find working examples:

  • apps/backend/agents/planner.py - Planner agent
  • apps/backend/agents/coder.py - Coder agent
  • apps/backend/agents/qa_reviewer.py - QA reviewer
  • apps/backend/agents/qa_fixer.py - QA fixer
  • apps/backend/spec_agents/ - Spec creation agents

Memory System

Graphiti Memory (Mandatory) - integrations/graphiti/

Auto Claude uses Graphiti as its primary memory system with embedded LadybugDB (no Docker required):

  • Graph database with semantic search - Knowledge graph for cross-session context
  • Session insights - Patterns, gotchas, discoveries automatically extracted
  • Multi-provider support:
    • LLM: OpenAI, Anthropic, Azure OpenAI, Ollama, Google AI (Gemini)
    • Embedders: OpenAI, Voyage AI, Azure OpenAI, Ollama, Google AI
  • Modular architecture: (integrations/graphiti/queries_pkg/)
    • graphiti.py - Main GraphitiMemory class
    • client.py - LadybugDB client wrapper
    • queries.py - Graph query operations
    • search.py - Semantic search logic
    • schema.py - Graph schema definitions

Configuration:

  • Set provider credentials in apps/backend/.env (see .env.example)
  • Required env vars: GRAPHITI_ENABLED=true, ANTHROPIC_API_KEY or other provider keys
  • Memory data stored in .auto-claude/specs/XXX/graphiti/

Usage in agents:

from integrations.graphiti.memory import get_graphiti_memory

memory = get_graphiti_memory(spec_dir, project_dir)
context = memory.get_context_for_session("Implementing feature X")
memory.add_session_insight("Pattern: use React hooks for state")

Development Guidelines

Frontend Internationalization (i18n)

CRITICAL: Always use i18n translation keys for all user-facing text in the frontend.

The frontend uses react-i18next for internationalization. All labels, buttons, messages, and user-facing text MUST use translation keys.

Translation file locations:

  • apps/frontend/src/shared/i18n/locales/en/*.json - English translations
  • apps/frontend/src/shared/i18n/locales/fr/*.json - French translations

Translation namespaces:

  • common.json - Shared labels, buttons, common terms
  • navigation.json - Sidebar navigation items, sections
  • settings.json - Settings page content
  • dialogs.json - Dialog boxes and modals
  • tasks.json - Task/spec related content
  • onboarding.json - Onboarding wizard content
  • welcome.json - Welcome screen content

Usage pattern:

import { useTranslation } from 'react-i18next';

// In component
const { t } = useTranslation(['navigation', 'common']);

// Use translation keys, NOT hardcoded strings
<span>{t('navigation:items.githubPRs')}</span>  // ✅ CORRECT
<span>GitHub PRs</span>                          //  WRONG

When adding new UI text:

  1. Add the translation key to ALL language files (at minimum: en/*.json and fr/*.json)
  2. Use namespace:section.key format (e.g., navigation:items.githubPRs)
  3. Never use hardcoded strings in JSX/TSX files

End-to-End Testing (Electron App)

IMPORTANT: When bug fixing or implementing new features in the frontend, AI agents can perform automated E2E testing using the Electron MCP server.

The Electron MCP server allows QA agents to interact with the running Electron app via Chrome DevTools Protocol:

Setup:

  1. Start the Electron app with remote debugging enabled:

    npm run dev  # Already configured with --remote-debugging-port=9222
  2. Enable Electron MCP in apps/backend/.env:

    ELECTRON_MCP_ENABLED=true
    ELECTRON_DEBUG_PORT=9222  # Default port

Available Testing Capabilities:

QA agents (qa_reviewer and qa_fixer) automatically get access to Electron MCP tools:

  1. Window Management

    • mcp__electron__get_electron_window_info - Get info about running windows
    • mcp__electron__take_screenshot - Capture screenshots for visual verification
  2. UI Interaction

    • mcp__electron__send_command_to_electron with commands:
      • click_by_text - Click buttons/links by visible text
      • click_by_selector - Click elements by CSS selector
      • fill_input - Fill form fields by placeholder or selector
      • select_option - Select dropdown options
      • send_keyboard_shortcut - Send keyboard shortcuts (Enter, Ctrl+N, etc.)
      • navigate_to_hash - Navigate to hash routes (#settings, #create, etc.)
  3. Page Inspection

    • get_page_structure - Get organized overview of page elements
    • debug_elements - Get debugging info about buttons and forms
    • verify_form_state - Check form state and validation
    • eval - Execute custom JavaScript code
  4. Logging

    • mcp__electron__read_electron_logs - Read console logs for debugging

Example E2E Test Flow:

# 1. Agent takes screenshot to see current state
agent: "Take a screenshot to see the current UI"
# Uses: mcp__electron__take_screenshot

# 2. Agent inspects page structure
agent: "Get page structure to find available buttons"
# Uses: mcp__electron__send_command_to_electron (command: "get_page_structure")

# 3. Agent clicks a button to navigate
agent: "Click the 'Create New Spec' button"
# Uses: mcp__electron__send_command_to_electron (command: "click_by_text", args: {text: "Create New Spec"})

# 4. Agent fills out a form
agent: "Fill the task description field"
# Uses: mcp__electron__send_command_to_electron (command: "fill_input", args: {placeholder: "Describe your task", value: "Add login feature"})

# 5. Agent submits and verifies
agent: "Click Submit and verify success"
# Uses: click_by_text → take_screenshot → verify result

When to Use E2E Testing:

  • Bug Fixes: Reproduce the bug, apply fix, verify it's resolved
  • New Features: Implement feature, test the UI flow end-to-end
  • UI Changes: Verify visual changes and interactions work correctly
  • Form Validation: Test form submission, validation, error handling

Configuration in core/client.py:

The client automatically enables Electron MCP tools for QA agents when:

  • Project is detected as Electron (is_electron capability)
  • ELECTRON_MCP_ENABLED=true is set
  • Agent type is qa_reviewer or qa_fixer

Note: Screenshots are automatically compressed (1280x720, quality 60, JPEG) to stay under Claude SDK's 1MB JSON message buffer limit.

Running the Application

As a standalone CLI tool:

cd apps/backend
python run.py --spec 001

With the Electron frontend:

npm start        # Build and run desktop app
npm run dev      # Run in development mode (includes --remote-debugging-port=9222 for E2E testing)

For E2E Testing with QA Agents:

  1. Start the Electron app: npm run dev
  2. Enable Electron MCP in apps/backend/.env: ELECTRON_MCP_ENABLED=true
  3. Run QA: python run.py --spec 001 --qa
  4. QA agents will automatically interact with the running app for testing

Project data storage:

  • .auto-claude/specs/ - Per-project data (specs, plans, QA reports, memory) - gitignored