feat: Multi-Agent Engine System for Ralphy by zkwentz · Pull Request #7 · zkwentz/ralphy

zkwentz · 2026-01-19T02:49:32Z

Summary

This PR introduces a comprehensive multi-agent engine system that transforms Ralphy from a single-engine orchestrator into an intelligent multi-engine system. The implementation enables multiple AI coding engines to work simultaneously with intelligent task routing, consensus-based decision making, and performance-based learning.

Key Features

Consensus Mode: Run multiple engines on the same task, with an AI meta-agent reviewing the results and selecting or merging the best solution
Specialization Mode: Intelligent task routing to specialized engines based on pattern matching rules
Race Mode: Multiple engines compete in parallel; the first successful solution wins
Meta-Agent Resolution: AI-powered conflict resolution and solution comparison
Performance Tracking: Adaptive engine selection based on historical performance metrics
Cost Management: Built-in cost controls and estimation for multi-engine execution

Execution Modes

Consensus Mode - Critical tasks requiring high confidence
- 2+ engines work on same task
- Meta-agent reviews all solutions
- Best for: Complex refactoring, critical bug fixes, architecture changes
Specialization Mode - Efficient task distribution
- Routes tasks based on engine strengths
- Configurable pattern matching rules
- Best for: Large PRDs with mixed task types
Race Mode - Speed optimization
- Parallel execution with early winner detection
- First valid solution accepted
- Best for: Simple bug fixes, formatting, documentation

Architecture Changes

New Modules: .ralphy/engines.sh, .ralphy/modes.sh, .ralphy/meta-agent.sh, .ralphy/metrics.sh
Enhanced Config: Extended .ralphy/config.yaml with engine settings, routing rules, and cost controls
Metrics System: Performance tracking with adaptive learning capabilities
Validation Gates: Comprehensive solution validation (tests, linting, build verification)

CLI Interface

# Mode selection
./ralphy.sh --mode consensus
./ralphy.sh --mode specialization
./ralphy.sh --mode race

# Engine configuration
./ralphy.sh --consensus-engines "claude,cursor,opencode"
./ralphy.sh --meta-agent claude

# Performance tracking
./ralphy.sh --show-metrics
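
As a rough illustration of how the mode flags above might be consumed, the sketch below parses `--mode`, `--consensus-engines`, and `--meta-agent` into variables. The function name `parse_mode_flags`, the variable names, and the `single` default are assumptions made for this example and are not taken from ralphy.sh.

```shell
# Hypothetical sketch: parse the mode-related flags into shell variables.
parse_mode_flags() {
  MODE="single"            # assumed default when --mode is absent
  CONSENSUS_ENGINES=""
  META_AGENT=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --mode|--consensus-engines|--meta-agent)
        # Every flag handled here takes exactly one value.
        [ "$#" -ge 2 ] || { echo "missing value for $1" >&2; return 1; }
        case "$1" in
          --mode)              MODE="$2" ;;
          --consensus-engines) CONSENSUS_ENGINES="$2" ;;
          --meta-agent)        META_AGENT="$2" ;;
        esac
        shift 2 ;;
      *) shift ;;              # other arguments are ignored in this sketch
    esac
  done
  # Reject anything that is not one of the modes named in the PR.
  case "$MODE" in
    single|consensus|specialization|race) return 0 ;;
    *) echo "unknown mode: $MODE" >&2; return 1 ;;
  esac
}
```

Validating the mode after parsing keeps a typo such as `--mode consenus` from silently falling back to single-engine behavior.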

Implementation Highlights

27 files changed: 7,717 insertions, 67 deletions
Comprehensive test suites for each mode and component
Backwards compatible with existing Ralphy workflows
Bash-based implementation maintaining low barrier to entry
Modular design for easy maintenance and extension

Testing Coverage

Consensus mode with similar/different results
Specialization mode with pattern matching
Race mode with early winner and failure scenarios
Meta-agent decision parsing and validation
Metrics recording and adaptive selection
Cost limit enforcement
Validation gate handling

Documentation

Detailed MultiAgentPlan.md with architecture overview
Extended README with usage examples
Comprehensive test files demonstrating each feature

Test Plan

All test files passing (test_consensus.sh, test_race_mode.sh, test_specialization.sh, etc.)
Manual testing of all three execution modes
Meta-agent decision making validated
Performance metrics tracking verified
Cost controls functioning
Backwards compatibility confirmed

🤖 Generated with Claude Code

Created demo login page with enhanced button styling featuring:
- Gradient background with smooth color transitions
- Hover animations with scale and shadow effects
- Shimmer effect overlay for visual polish
- Icon animation on interaction
- Proper accessibility with focus states
- Responsive design for mobile devices

This implementation demonstrates modern UI/UX patterns for the Ralphy multi-agent system testing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Implemented a complete authentication module with extensive test coverage:

Features:
- User creation and management (create, activate, deactivate)
- Secure password hashing (SHA-256)
- Session token generation and validation
- Token expiration and cleanup
- Concurrent authentication support
- Race condition handling

Test Coverage:
- 56 comprehensive unit tests
- All tests passing successfully
- Covers authentication, authorization, session management
- Tests edge cases, race conditions, and security features

Files Added:
- .ralphy/auth.sh: Core authentication module (385 lines)
- .ralphy/auth.test.sh: Complete test suite (709 lines)
- .ralphy/AUTH_README.md: Documentation with usage examples
- .ralphy/progress.txt: Task progress tracking

This implementation is ready for race mode testing across Cursor, Codex, and Qwen engines as specified in the task requirements.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Extract engine-specific authentication logic from ralphy.sh into a dedicated authentication module (.ralphy/auth.sh) for improved maintainability and modularity.

Changes:
- Create .ralphy/auth.sh with 11 authentication functions
- Centralize permission configurations for all 6 AI engines
- Refactor run_ai_command() to use new auth module
- Update cleanup operations to use cleanup_engine_auth()
- Add comprehensive test suite (.ralphy/test_auth.sh)
- Maintain backward compatibility with fallback implementation

Benefits:
- Separation of concerns: auth logic isolated from orchestration
- Easier to add new engines (update one module vs scattered code)
- Testable authentication layer
- Consistent permission handling across all engines
- Clear documentation of each engine's auth requirements

The refactoring supports the planned multi-engine consensus mode per MultiAgentPlan.md by providing a clean foundation for engine management and authentication.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit implements Phase 2 of the multi-agent architecture for Ralphy, enabling consensus mode where multiple AI engines can work on the same task in parallel and their solutions are compared and merged.

Features:
- New .ralphy/modes.sh module with consensus orchestration
- New .ralphy/meta-agent.sh module for solution comparison
- CLI flags: --mode, --consensus-engines, --meta-agent
- Git worktree isolation for each engine
- Solution similarity detection (>80% threshold)
- Auto-acceptance when solutions are similar
- Test suite for validation

Usage:
./ralphy.sh "task" --consensus-engines "claude,cursor"
./ralphy.sh "task" --mode consensus --consensus-engines "claude,opencode,cursor"

Implementation follows MultiAgentPlan.md specification.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
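
The commit names a similarity threshold above 80% but does not say how similarity is measured. The sketch below is one possible reading, not the PR's implementation: it treats each solution as a diff file and computes the share of distinct lines the two files have in common (a Jaccard index over lines). The function names `diff_similarity_pct` and `solutions_are_similar` are hypothetical.

```shell
# Hypothetical helper: percentage of distinct lines shared by two diff files.
# The subshell body ( ... ) keeps the temporary variables out of the caller.
diff_similarity_pct() (
  a=$(mktemp) && b=$(mktemp) || exit 1
  LC_ALL=C sort -u "$1" > "$a"
  LC_ALL=C sort -u "$2" > "$b"
  common=$(LC_ALL=C comm -12 "$a" "$b" | wc -l)      # lines present in both
  total=$(LC_ALL=C sort -u "$a" "$b" | wc -l)        # lines present in either
  rm -f "$a" "$b"
  if [ "$((total))" -eq 0 ]; then
    echo 100            # two empty diffs count as identical
  else
    echo $(( common * 100 / total ))
  fi
)

# Succeeds when the two diffs are more than 80% similar (threshold from the commit).
solutions_are_similar() {
  [ "$(diff_similarity_pct "$1" "$2")" -gt 80 ]
}
```

A line-set comparison ignores ordering and repeated lines, so it is only a cheap first filter; anything below the threshold would still go to the meta-agent for review.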

This commit addresses multiple critical and high-severity command injection vulnerabilities identified in ralphy.sh that could allow arbitrary code execution through malicious task titles.

Security Fixes:
1. YQ Command Injection (CRITICAL)
   - Fixed mark_task_complete_yaml() to use env(TASK) instead of string interpolation
   - Fixed get_parallel_group_yaml() to use env(TASK) pattern
   - Prevents arbitrary YAML manipulation and code execution via task titles
2. GitHub CLI Argument Injection (HIGH)
   - Added sanitize_task_title() function to remove control characters
   - Applied sanitization in create_pull_request() function
   - Applied sanitization in parallel execution PR creation
   - Prevents command injection through gh pr create arguments

Implementation Details:
- Added sanitize_task_title() function (removes newlines, null bytes, control chars)
- Changed YQ calls from direct interpolation to environment variable passing
- All PR creation now sanitizes titles before passing to GitHub CLI
- Followed existing secure pattern from add_rule() function

Testing:
- Created comprehensive security test suite (test_security_fixes_simple.sh)
- All 6 security tests passing
- Validates proper use of env(TASK) pattern
- Confirms sanitization in all PR creation paths
- Verifies CWE-78 documentation

Impact: Prevents attackers from executing arbitrary commands, injecting YAML content, or breaking out of shell commands through specially crafted task titles.

References: CWE-78 (Improper Neutralization of Special Elements in OS Command)

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
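
The two fixes described in this commit can be sketched roughly as follows. This is a minimal illustration under assumptions, not the code from ralphy.sh: the function bodies are guesses at the described behavior, and the `.tasks[]`, `.title`, and `.completed` field names are invented for the example. The yq call uses the `env()` operator from mikefarah's yq v4, which the commit names; the same tool also provides `strenv()`, which always yields a string.

```shell
# Sketch of a title sanitizer: delete control characters, including newlines
# and tabs. Null bytes cannot survive in a shell variable, so they are already
# gone by the time this function sees the value.
sanitize_task_title() {
  printf '%s' "$1" | tr -d '[:cntrl:]'
}

# Passing the title through the environment keeps it out of the yq expression,
# so quotes or operators in the title are never parsed as yq syntax.
# Field names below are assumptions for this sketch.
mark_task_complete_yaml() {
  TASK="$1" yq -i '(.tasks[] | select(.title == env(TASK))).completed = true' "$2"
}
```

The key difference from string interpolation is that the yq program text is a fixed, single-quoted literal; only the data changes between calls.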

This commit implements a complete consensus mode feature that enables multiple AI engines to work on the same task in parallel, with a meta-agent selecting the best solution when results differ.

New Features:
- Multi-engine parallel execution using isolated git worktrees
- Meta-agent AI comparison and selection of best solution
- Real-time status monitoring across all engines
- Comprehensive solution storage for comparison
- CLI flags: --mode consensus, --consensus-engines, --meta-agent

Files Added:
- .ralphy/modes.sh: Core consensus mode orchestration logic
- .ralphy/meta-agent.sh: AI-powered solution comparison
- test_consensus.sh: Comprehensive test suite (10 tests, all passing)
- .ralphy/progress.txt: Implementation documentation

Files Modified:
- ralphy.sh: Added consensus mode support to brownfield tasks

Implementation Details:
- Each engine runs in isolated worktree (agent-N/)
- Solutions stored as diffs, commits, and statistics
- Meta-agent analyzes: correctness, quality, completeness, testing
- Automatic merge of chosen solution to current branch
- Supports all 6 engines: claude, opencode, cursor, codex, qwen, droid

Usage:
./ralphy.sh --mode consensus "implement feature"
./ralphy.sh --mode consensus --consensus-engines "claude,cursor,opencode" "fix bug"

All tests passing. Ready for production use.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Added race mode functionality that allows multiple AI engines to compete on the same task, with the first to complete successfully being declared the winner. This optimizes speed for straightforward tasks.

Features:
- CLI flags: --race, --race-engines, --no-validation, --race-timeout
- Real-time early winner detection with 0.3s polling
- Automatic cleanup of losing agents and their worktrees
- Validation pipeline (commits, tests, lint)
- Multi-engine support (claude, opencode, cursor, codex, qwen, droid)
- Timeout handling with configurable multiplier

Functions added:
- validate_race_solution(): Validates solutions before declaring winner
- run_race_agent(): Runs individual racing agent in isolated worktree
- run_race_mode(): Orchestrates the race and manages cleanup

Integration:
- Routes --race flag in main() for single-task mode
- Uses existing worktree infrastructure for isolation
- Maintains backward compatibility with parallel mode

Testing:
- Comprehensive test suite (test_race_mode.sh) verifies all features
- All tests pass successfully

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
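
The early-winner polling described in this commit could look something like the sketch below. It is a simplified stand-in, not the PR's run_race_mode(): each "engine" is just a command string, marker files replace worktrees and validation, and the function name `run_race` is hypothetical. Only the 0.3s polling interval comes from the commit.

```shell
# Hypothetical sketch: run each command in the background and print the
# 1-based index of the first one that exits successfully.
# Returns non-zero when every command fails.
run_race() (
  dir=$(mktemp -d) || exit 1
  pids=""
  n=0
  for cmd in "$@"; do
    n=$((n + 1))
    # Write "<n>.ok" on success, then always "<n>.done" so the poller can tell
    # a finished-but-failed engine from one that is still running.
    ( sh -c "$cmd" >/dev/null 2>&1 && : > "$dir/$n.ok"; : > "$dir/$n.done" ) &
    pids="$pids $!"
  done
  winner=""
  while [ -z "$winner" ]; do
    i=0
    finished=0
    while [ "$i" -lt "$n" ]; do
      i=$((i + 1))
      if [ -e "$dir/$i.done" ]; then
        finished=$((finished + 1))
        if [ -e "$dir/$i.ok" ]; then winner=$i; break; fi
      fi
    done
    [ -n "$winner" ] && break
    [ "$finished" -eq "$n" ] && break      # every engine failed
    sleep 0.3                              # fractional sleep: GNU and BSD, not strict POSIX
  done
  # Signals only the wrapper subshells; real cleanup would also stop their
  # child processes and remove the losing worktrees.
  kill $pids 2>/dev/null
  wait 2>/dev/null
  rm -rf "$dir"
  [ -n "$winner" ] && echo "$winner"
)
```

For example, `run_race "sleep 2" "true"` prints `2`, because the second command succeeds first. Checking the `.done` marker before the `.ok` marker avoids misreading an engine that finishes between the two checks.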

Implemented Phase 3 of the multi-agent architecture to enable intelligent task routing to specialized AI engines based on pattern matching rules.

Changes:
- Created .ralphy/modes.sh module for multi-engine execution modes
- Added match_specialization_rule() for pattern-based task routing
- Added get_engine_for_task() to select optimal engine per task
- Added validate_engine_available() to check engine availability
- Integrated specialization mode into ralphy.sh main execution flow
- Added --mode and --specialization CLI flags
- Enhanced config.yaml schema with engines.specialization_rules section
- Added default rules for UI, refactoring, testing, bugs, API, and database tasks
- Created comprehensive test suite (.ralphy/test_specialization.sh)
- All 10 tests passing successfully

Features:
- Routes tasks to specialized engines based on regex pattern matching
- Case-insensitive pattern matching
- Falls back to default engine when no rule matches or engine unavailable
- Fully backward compatible (defaults to single-engine mode)
- Foundation for future consensus and race modes

Usage:
./ralphy.sh --specialization --prd PRD.md
./ralphy.sh --mode specialization "add login button"

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit implements Phase 3 (Specialization Mode) of the multi-agent architecture, with specific focus on the fallback behavior when no specialization rules match a task description.

Features:
- New .ralphy/modes.sh module with specialization functions
- Pattern-based task routing using config rules
- Three-level fallback: config → env var → hardcoded default (claude)
- Comprehensive test suite with 10 test cases
- Metadata tracking for performance metrics
- yq-based YAML config parsing

Key Functions:
- run_specialization_mode() - Main orchestration
- match_specialization_rule() - Regex pattern matching against config
- get_default_engine() - Multi-level fallback logic
- run_single_engine_task() - Single engine execution

Fallback Behavior (No Matching Rules):
1. Task description doesn't match any pattern in config
2. match_specialization_rule() returns empty string
3. get_default_engine() tries:
   - .ralphy/config.yaml: engines.meta_agent.engine
   - Environment: $AI_ENGINE
   - Hardcoded: "claude"
4. Metadata logs: matched_pattern = "(no match - default)"

Test Coverage:
- Pattern matching (UI, test, refactor patterns)
- No-match scenario returns empty
- Default engine fallback from config
- Environment variable fallback
- Hardcoded default fallback
- Missing config handling
- Case-insensitive matching
- First-match-wins precedence

Config Schema:
engines:
  meta_agent:
    engine: "claude"
  specialization_rules:
    - pattern: "UI|frontend|styling"
      engines: ["cursor"]

Manual Testing Checklist Progress:
✓ Specialization with no matching rules (this commit)
- Specialization with matching rules (next)

Implementation follows MultiAgentPlan.md specification (lines 268-300).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
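
The routing and fallback behavior described above can be sketched without yq by keeping the rules in a shell variable. Everything beyond the function names and the first rule is an assumption for illustration: the `pattern=engine` line format, the second and third rules, and the `CONFIG_DEFAULT_ENGINE` variable, which stands in for the value read from `engines.meta_agent.engine`.

```shell
# Rules as "pattern=engine" lines. Only the first rule appears in the commit;
# the other two are invented for this sketch.
RULES='UI|frontend|styling=cursor
test|spec=codex
refactor=claude'

# Print the engine of the first rule whose pattern matches the task
# (case-insensitive, first match wins). Prints nothing when no rule matches.
match_specialization_rule() {
  printf '%s\n' "$RULES" | while IFS= read -r rule; do
    pattern=${rule%=*}
    engine=${rule##*=}
    if printf '%s\n' "$1" | grep -Eiq -- "$pattern"; then
      printf '%s\n' "$engine"
      break
    fi
  done
}

# Fallback chain from the commit: configured default, then $AI_ENGINE,
# then the hardcoded "claude".
get_engine_for_task() (
  engine=$(match_specialization_rule "$1")
  echo "${engine:-${CONFIG_DEFAULT_ENGINE:-${AI_ENGINE:-claude}}}"
)
```

One thing this makes visible: with case-insensitive matching, a short pattern such as `UI` also matches inside longer words like "build", so word boundaries may be worth adding to real rules.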

- Add .ralphy/meta-agent.sh with parse_meta_decision() function
  - Parses meta-agent output with DECISION, CHOSEN, REASONING fields
  - Supports both "select" and "merge" decision types
  - Extracts merged solution code blocks
  - Returns JSON-formatted output with proper escaping
  - Handles multiline text and edge cases
- Add helper functions:
  - prepare_meta_prompt(): Build comparison prompt for solutions
  - run_meta_agent(): Execute meta-agent with multi-engine support
  - merge_solutions(): Apply merged solutions (placeholder)
- Add comprehensive test suite (.ralphy/test-meta-agent.sh)
  - 32 tests covering all parsing scenarios
  - Tests for error conditions and edge cases
  - All tests passing ✓

Part of multi-agent engine implementation (Phase 5).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
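
To make the field parsing concrete, here is a minimal sketch that assumes the meta-agent replies with one `NAME: value` line per field. That line format, and the helper names `meta_field` and `validate_meta_decision`, are assumptions; the real parse_meta_decision() also emits JSON and extracts merged code blocks, which this sketch leaves out.

```shell
# Print the value of the first "NAME: value" line read from stdin.
meta_field() {
  sed -n "s/^$1:[[:space:]]*//p" | head -n 1
}

# Check a meta-agent reply: the decision must be "select" or "merge",
# and a "select" decision must name a chosen solution.
validate_meta_decision() (
  output=$1
  decision=$(printf '%s\n' "$output" | meta_field DECISION)
  chosen=$(printf '%s\n' "$output" | meta_field CHOSEN)
  case "$decision" in
    select) [ -n "$chosen" ] && echo "select:$chosen" ;;
    merge)  echo "merge" ;;
    *)      exit 1 ;;          # unknown or missing decision
  esac
)
```

Rejecting unknown decisions outright, rather than defaulting to one engine's solution, keeps a malformed reply from being merged unnoticed.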

Adds race mode implementation for Ralphy's multi-agent system with robust handling when all engines fail to complete a task.

Key Features:
- Parallel engine execution with isolated git worktrees
- Comprehensive failure reporting and diagnostics
- Four fallback strategies presented to users
- Metrics tracking for race history and outcomes
- Automatic cleanup of worktrees and branches
- Validation system for solution quality
- Bash 3 compatible for macOS support

Files Added:
- .ralphy/engines.sh: Engine abstraction layer
- .ralphy/modes.sh: Race mode implementation with failure handling
- .ralphy/test-race-mode.sh: Comprehensive test suite (all tests passing)
- .ralphy/RACE_MODE.md: Complete documentation
- .ralphy/progress.txt: Implementation progress tracking

When all engines fail, users receive:
1. Detailed failure report with exit codes and outputs
2. Metrics recorded in .ralphy/metrics.json
3. Fallback strategies:
   - Retry with different engines
   - Switch to consensus mode
   - Manual intervention guidance
   - Task breakdown suggestions

Part of the multi-agent system implementation outlined in MultiAgentPlan.md (Phase 4: Race Mode Implementation).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

This commit adds comprehensive performance tracking and intelligent engine selection to Ralphy based on historical success rates.

New Features:
- Automatic metrics recording for all task executions
- Pattern-based task categorization (10 patterns)
- Adaptive engine selection based on historical performance
- CLI flags: --show-metrics, --reset-metrics, --export-metrics, --no-adapt
- Comprehensive test suite (14 tests, all passing)

Implementation Details:
- New .ralphy/metrics.sh module with 11 core functions
- JSON-based metrics storage in .ralphy/metrics.json
- Tracks success rate, duration, cost, and tokens per engine
- Pattern-specific performance metrics
- Minimum 5 samples required for adaptive recommendations
- Zero breaking changes, backward compatible

Files Added:
- .ralphy/metrics.sh (536 lines) - Core metrics module
- .ralphy/test_metrics.sh (290 lines) - Test suite
- .ralphy/progress.txt - Implementation documentation

Files Modified:
- ralphy.sh - Integrated metrics recording and adaptive selection
  * Added metrics module sourcing
  * Added task timing tracking
  * Added success/failure metrics recording
  * Added adaptive engine selection logic
  * Added 4 new CLI flags
  * Updated help text and init function

Foundation for future multi-agent features including consensus mode, race mode, and specialization routing.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
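
The minimum-sample rule for adaptive recommendations can be illustrated with a small filter. This sketch reads whitespace-separated lines of `<engine> <successes> <attempts>` from stdin as a stand-in for .ralphy/metrics.json; that input format and the function name `recommend_engine` are assumptions, and only the threshold of 5 samples comes from the commit.

```shell
# Print the engine with the highest success rate among engines that have at
# least MIN attempts (default 5). Exits non-zero when no engine qualifies.
recommend_engine() {
  awk -v min="${1:-5}" '
    $3 >= min {
      rate = $2 / $3
      if (best == "" || rate > best_rate) { best = $1; best_rate = rate }
    }
    END { if (best != "") print best; else exit 1 }
  '
}
```

Given `claude 8 10`, `cursor 3 3`, and `codex 2 6`, this prints `claude`: cursor has a perfect record but too few samples to be trusted, which is the point of the threshold.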

Add comprehensive cost control features to prevent runaway costs during multi-agent execution as specified in MultiAgentPlan.md.

Features:
- Per-task cost limits with configurable thresholds
- Per-session cost limits to control total spending
- Warning alerts when approaching cost limits (default 75%)
- Automatic task/session termination when limits exceeded
- Support for actual costs (OpenCode) and estimated costs (token-based)

Configuration (in .ralphy/config.yaml):
- cost_controls.max_per_task: Maximum USD per task (0 = unlimited)
- cost_controls.max_per_session: Maximum USD per session (0 = unlimited)
- cost_controls.warn_threshold: Warning percentage (default 0.75)

Implementation details:
- Added cost tracking variables and functions to ralphy.sh
- Integrated cost checking after each AI API call
- Enhanced config initialization with cost_controls section
- Updated --config display to show cost limits

Testing:
- Syntax validation passed
- Cost calculation verified (1M input + 500k output = $10.50)
- Limit detection and warning thresholds tested successfully

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
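
The token-based estimate and the limit check can be worked through in a few lines. The per-million-token rates below are assumptions chosen so that the commit's verified figure (1M input + 500k output = $10.50) comes out; the commit does not state the rates, and real engine pricing may differ. The function names and the choice to treat a cost equal to the limit as exceeded are also assumptions.

```shell
# Assumed rates in USD per million tokens: 1 * 3 + 0.5 * 15 = 10.50.
INPUT_RATE=3
OUTPUT_RATE=15

# estimate_cost <input_tokens> <output_tokens>
estimate_cost() {
  awk -v tin="$1" -v tout="$2" -v rin="$INPUT_RATE" -v rout="$OUTPUT_RATE" \
    'BEGIN { printf "%.2f\n", (tin * rin + tout * rout) / 1000000 }'
}

# check_cost_limit <cost> <limit> [warn_threshold]
# Prints "ok", "warn", or "exceeded". A limit of 0 means unlimited.
check_cost_limit() {
  awk -v c="$1" -v lim="$2" -v w="${3:-0.75}" 'BEGIN {
    if (lim == 0)          print "ok"
    else if (c >= lim)     print "exceeded"
    else if (c >= lim * w) print "warn"
    else                   print "ok"
  }'
}
```

awk does the arithmetic because POSIX shell arithmetic is integer-only, and dollar amounts need fractions.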

Add comprehensive validation module for multi-engine mode support:

Features:
- Four validation gates: diff check, linting, tests, and build
- Configurable retry logic with timeout support
- Cross-platform compatibility (macOS/Linux)
- JSON reporting for metrics integration
- Loads commands from .ralphy/config.yaml

Implementation:
- .ralphy/validation.sh: Core validation module with 400+ lines
  - validate_solution(): Main function with all 4 gates
  - validate_solution_with_retry(): Retry wrapper with smart retry logic
  - Individual gate functions with timeout handling
  - Validation reporting and config loading
- tests/test_validation.sh: Comprehensive test suite
  - 22 tests covering all validation scenarios
  - Mock git repository testing for diff validation
  - JSON report validation
  - Cross-platform test handling

Testing:
- All 22 tests passing
- Graceful fallback when timeout command unavailable
- Tested on macOS environment

This module provides the foundation for handling validation failures in consensus, race, and specialization modes as specified in MultiAgentPlan.md lines 474-506.

Closes validation gate failures task (line 619 in MultiAgentPlan.md).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
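
A stripped-down picture of the gate runner, including the fallback for systems without a `timeout` command, might look like the following. It reuses the function names from the commit for readability, but the bodies are guesses: gates are passed as plain command strings rather than loaded from config, the `GATE_TIMEOUT` variable and `run_with_timeout` helper are hypothetical, and the JSON report skips escaping.

```shell
# Run a command under `timeout` when it exists, otherwise run it unbounded.
run_with_timeout() {
  secs=$1; shift
  if command -v timeout >/dev/null 2>&1; then
    timeout "$secs" "$@"
  else
    "$@"
  fi
}

# Run each gate (a command string) in order and stop at the first failure.
# Prints a one-line JSON report; gate text is not escaped in this sketch.
validate_solution() {
  for gate in "$@"; do
    if ! run_with_timeout "${GATE_TIMEOUT:-300}" sh -c "$gate" >/dev/null 2>&1; then
      printf '{"passed":false,"failed_gate":"%s"}\n' "$gate"
      return 1
    fi
  done
  printf '{"passed":true}\n'
}

# validate_solution_with_retry <tries> <gate>...
# Re-runs the full gate sequence until it passes or the tries run out.
validate_solution_with_retry() {
  tries=$1; shift
  while [ "$tries" -gt 0 ]; do
    if report=$(validate_solution "$@"); then echo "$report"; return 0; fi
    tries=$((tries - 1))
  done
  echo "$report"
  return 1
}
```

A call such as `validate_solution "git diff --quiet HEAD~1 && exit 1 || exit 0" "npm run lint" "npm test" "npm run build"` would mirror the four gates listed above; those specific commands are examples, not the project's configured ones.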


zkwentz and others added 30 commits January 18, 2026 20:17

Added plan and ralphy output of implementation

d1f85de

Merge branch 'feat/multi-agent' into ralphy/agent-1-refactor-authenti…

6b2808f

…cation-system-consensus-claude-op

Merge pull request #1 from zkwentz/ralphy/agent-1-refactor-authentica…

abaf852

…tion-system-consensus-claude-op Refactor authentication system into modular architecture

Merge branch 'feat/multi-agent' into ralphy/agent-2-update-login-butt…

57d1dd1

…on-styling-auto

Merge pull request #2 from zkwentz/ralphy/agent-2-update-login-button…

cb7e556

…-styling-auto Update login button styling with modern design

ignore progress

ad71f3c

ignore progress

6a975a8

remove progress from git

0064f5a

fixed conflicts

b62e1cf

Merge pull request #3 from zkwentz/ralphy/agent-3-add-unit-tests-for-…

7731bba

…auth-race-cursor-codex-qwen Add comprehensive unit tests for auth module

no more progress

7239606

Merge pull request #4 from zkwentz/ralphy/agent-4-fix-critical-securi…

ab76e17

…ty-bug-consensus-claude-cursor- Fix critical security vulnerabilities (CWE-78) in command injection

no progress

0a4d9d0

Merge pull request #5 from zkwentz/ralphy/agent-5-consensus-mode-with…

fbab959

…-2-engines-similar-results Ralphy/agent 5 consensus mode with 2 engines similar results

resolved merged conflicts

2ccb871

Merge feat/multi-agent into agent-7

eeb3372

zkwentz added 7 commits January 18, 2026 21:41

Merge feat/multi-agent into agent-8

a8eb7eb

Merge feat/multi-agent into agent-9

6658f5b

Merge feat/multi-agent into agent-10

f7d243f

Merge feat/multi-agent into agent-11

5357e6f

Merge feat/multi-agent into agent-12

5fd6355

Merge feat/multi-agent into agent-13

26492d1

Merge feat/multi-agent into agent-14

8dc2a59

zkwentz wants to merge 37 commits into main from feat/multi-agent
