Created demo login page with enhanced button styling featuring:
- Gradient background with smooth color transitions
- Hover animations with scale and shadow effects
- Shimmer effect overlay for visual polish
- Icon animation on interaction
- Proper accessibility with focus states
- Responsive design for mobile devices
This implementation demonstrates modern UI/UX patterns for the Ralphy multi-agent system testing.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented a complete authentication module with extensive test coverage:
Features:
- User creation and management (create, activate, deactivate)
- Secure password hashing (SHA-256)
- Session token generation and validation
- Token expiration and cleanup
- Concurrent authentication support
- Race condition handling
Test Coverage:
- 56 comprehensive unit tests
- All tests passing successfully
- Covers authentication, authorization, session management
- Tests edge cases, race conditions, and security features
Files Added:
- .ralphy/auth.sh: Core authentication module (385 lines)
- .ralphy/auth.test.sh: Complete test suite (709 lines)
- .ralphy/AUTH_README.md: Documentation with usage examples
- .ralphy/progress.txt: Task progress tracking
This implementation is ready for race mode testing across Cursor, Codex, and Qwen engines as specified in the task requirements.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Extract engine-specific authentication logic from ralphy.sh into a dedicated authentication module (.ralphy/auth.sh) for improved maintainability and modularity.
Changes:
- Create .ralphy/auth.sh with 11 authentication functions
- Centralize permission configurations for all 6 AI engines
- Refactor run_ai_command() to use new auth module
- Update cleanup operations to use cleanup_engine_auth()
- Add comprehensive test suite (.ralphy/test_auth.sh)
- Maintain backward compatibility with fallback implementation
Benefits:
- Separation of concerns: auth logic isolated from orchestration
- Easier to add new engines (update one module vs scattered code)
- Testable authentication layer
- Consistent permission handling across all engines
- Clear documentation of each engine's auth requirements
The refactoring supports the planned multi-engine consensus mode per MultiAgentPlan.md by providing a clean foundation for engine management and authentication.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements Phase 2 of the multi-agent architecture for Ralphy, enabling consensus mode where multiple AI engines can work on the same task in parallel and their solutions are compared and merged.
Features:
- New .ralphy/modes.sh module with consensus orchestration
- New .ralphy/meta-agent.sh module for solution comparison
- CLI flags: --mode, --consensus-engines, --meta-agent
- Git worktree isolation for each engine
- Solution similarity detection (>80% threshold)
- Auto-acceptance when solutions are similar
- Test suite for validation
Usage:
./ralphy.sh "task" --consensus-engines "claude,cursor"
./ralphy.sh "task" --mode consensus --consensus-engines "claude,opencode,cursor"
Implementation follows MultiAgentPlan.md specification.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit addresses multiple critical and high-severity command injection vulnerabilities identified in ralphy.sh that could allow arbitrary code execution through malicious task titles.
Security Fixes:
1. YQ Command Injection (CRITICAL)
   - Fixed mark_task_complete_yaml() to use env(TASK) instead of string interpolation
   - Fixed get_parallel_group_yaml() to use env(TASK) pattern
   - Prevents arbitrary YAML manipulation and code execution via task titles
2. GitHub CLI Argument Injection (HIGH)
   - Added sanitize_task_title() function to remove control characters
   - Applied sanitization in create_pull_request() function
   - Applied sanitization in parallel execution PR creation
   - Prevents command injection through gh pr create arguments
Implementation Details:
- Added sanitize_task_title() function (removes newlines, null bytes, control chars)
- Changed YQ calls from direct interpolation to environment variable passing
- All PR creation now sanitizes titles before passing to GitHub CLI
- Followed existing secure pattern from add_rule() function
Testing:
- Created comprehensive security test suite (test_security_fixes_simple.sh)
- All 6 security tests passing
- Validates proper use of env(TASK) pattern
- Confirms sanitization in all PR creation paths
- Verifies CWE-78 documentation
Impact:
Prevents attackers from executing arbitrary commands, injecting YAML content, or breaking out of shell commands through specially crafted task titles.
References:
CWE-78 (Improper Neutralization of Special Elements in OS Command)
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
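The title sanitization described in this commit can be sketched roughly as follows. This is a minimal illustration, not the function from ralphy.sh: the name `sanitize_task_title_sketch` is hypothetical, and it assumes that "removing control characters" means deleting them outright rather than replacing them with spaces.

```shell
# Hypothetical sketch; the real sanitize_task_title() in ralphy.sh may differ.
sanitize_task_title_sketch() {
  # Delete every control character (newlines, tabs, escape bytes, DEL).
  # Shell variables cannot hold NUL bytes, so those never reach this point.
  printf '%s' "$1" | tr -d '[:cntrl:]'
}

# A title with an embedded newline collapses to a single line.
title="$(printf 'Fix login\n--body injected')"
sanitize_task_title_sketch "$title"
```

Deleting rather than escaping keeps the result safe to pass as a single quoted argument, at the cost of joining words that were separated only by a newline or tab.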
This commit implements a complete consensus mode feature that enables multiple AI engines to work on the same task in parallel, with a meta-agent selecting the best solution when results differ.
New Features:
- Multi-engine parallel execution using isolated git worktrees
- Meta-agent AI comparison and selection of best solution
- Real-time status monitoring across all engines
- Comprehensive solution storage for comparison
- CLI flags: --mode consensus, --consensus-engines, --meta-agent
Files Added:
- .ralphy/modes.sh: Core consensus mode orchestration logic
- .ralphy/meta-agent.sh: AI-powered solution comparison
- test_consensus.sh: Comprehensive test suite (10 tests, all passing)
- .ralphy/progress.txt: Implementation documentation
Files Modified:
- ralphy.sh: Added consensus mode support to brownfield tasks
Implementation Details:
- Each engine runs in isolated worktree (agent-N/)
- Solutions stored as diffs, commits, and statistics
- Meta-agent analyzes: correctness, quality, completeness, testing
- Automatic merge of chosen solution to current branch
- Supports all 6 engines: claude, opencode, cursor, codex, qwen, droid
Usage:
./ralphy.sh --mode consensus "implement feature"
./ralphy.sh --mode consensus --consensus-engines "claude,cursor,opencode" "fix bug"
All tests passing. Ready for production use.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Added race mode functionality that allows multiple AI engines to compete on the same task, with the first to complete successfully being declared the winner. This optimizes speed for straightforward tasks.
Features:
- CLI flags: --race, --race-engines, --no-validation, --race-timeout
- Real-time early winner detection with 0.3s polling
- Automatic cleanup of losing agents and their worktrees
- Validation pipeline (commits, tests, lint)
- Multi-engine support (claude, opencode, cursor, codex, qwen, droid)
- Timeout handling with configurable multiplier
Functions added:
- validate_race_solution(): Validates solutions before declaring winner
- run_race_agent(): Runs individual racing agent in isolated worktree
- run_race_mode(): Orchestrates the race and manages cleanup
Integration:
- Routes --race flag in main() for single-task mode
- Uses existing worktree infrastructure for isolation
- Maintains backward compatibility with parallel mode
Testing:
- Comprehensive test suite (test_race_mode.sh) verifies all features
- All tests pass successfully
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Implemented Phase 3 of the multi-agent architecture to enable intelligent task routing to specialized AI engines based on pattern matching rules.
Changes:
- Created .ralphy/modes.sh module for multi-engine execution modes
- Added match_specialization_rule() for pattern-based task routing
- Added get_engine_for_task() to select optimal engine per task
- Added validate_engine_available() to check engine availability
- Integrated specialization mode into ralphy.sh main execution flow
- Added --mode and --specialization CLI flags
- Enhanced config.yaml schema with engines.specialization_rules section
- Added default rules for UI, refactoring, testing, bugs, API, and database tasks
- Created comprehensive test suite (.ralphy/test_specialization.sh)
- All 10 tests passing successfully
Features:
- Routes tasks to specialized engines based on regex pattern matching
- Case-insensitive pattern matching
- Falls back to default engine when no rule matches or engine unavailable
- Fully backward compatible (defaults to single-engine mode)
- Foundation for future consensus and race modes
Usage:
./ralphy.sh --specialization --prd PRD.md
./ralphy.sh --mode specialization "add login button"
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit implements Phase 3 (Specialization Mode) of the multi-agent
architecture, with specific focus on the fallback behavior when no
specialization rules match a task description.
Features:
- New .ralphy/modes.sh module with specialization functions
- Pattern-based task routing using config rules
- Three-level fallback: config → env var → hardcoded default (claude)
- Comprehensive test suite with 10 test cases
- Metadata tracking for performance metrics
- yq-based YAML config parsing
Key Functions:
- run_specialization_mode() - Main orchestration
- match_specialization_rule() - Regex pattern matching against config
- get_default_engine() - Multi-level fallback logic
- run_single_engine_task() - Single engine execution
Fallback Behavior (No Matching Rules):
1. Task description doesn't match any pattern in config
2. match_specialization_rule() returns empty string
3. get_default_engine() tries:
   - .ralphy/config.yaml: engines.meta_agent.engine
   - Environment: $AI_ENGINE
   - Hardcoded: "claude"
4. Metadata logs: matched_pattern = "(no match - default)"
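The fallback order above can be sketched as a small function. This is an illustration under stated assumptions, not the actual get_default_engine(): the name `get_default_engine_sketch` is hypothetical, and the config value is passed in as an argument (assumed to have been read from engines.meta_agent.engine already, and to be empty when the key is missing) so that the sketch does not depend on yq.

```shell
# Hypothetical sketch of the three-level fallback.
# $1: value already read from engines.meta_agent.engine (empty if unset).
get_default_engine_sketch() {
  config_engine="$1"
  if [ -n "$config_engine" ]; then
    printf '%s\n' "$config_engine"   # level 1: config file
  elif [ -n "${AI_ENGINE:-}" ]; then
    printf '%s\n' "$AI_ENGINE"       # level 2: environment variable
  else
    printf '%s\n' "claude"           # level 3: hardcoded default
  fi
}
```

Each level is consulted only when the one before it yields an empty value, so a missing config file and an unset $AI_ENGINE still produce a usable engine name.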
Test Coverage:
- Pattern matching (UI, test, refactor patterns)
- No-match scenario returns empty
- Default engine fallback from config
- Environment variable fallback
- Hardcoded default fallback
- Missing config handling
- Case-insensitive matching
- First-match-wins precedence
Config Schema:
engines:
  meta_agent:
    engine: "claude"
  specialization_rules:
    - pattern: "UI|frontend|styling"
      engines: ["cursor"]
Manual Testing Checklist Progress:
✓ Specialization with no matching rules (this commit)
- Specialization with matching rules (next)
Implementation follows MultiAgentPlan.md specification (lines 268-300).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
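The matching behavior this commit describes (regex patterns, case-insensitive, first match wins, empty result on no match) might look roughly like the sketch below. The function name `match_rule_sketch` and the `pattern=engine` argument format are assumptions made for illustration; the commit says the real rules are read from .ralphy/config.yaml with yq. The `test|spec=codex` rule is likewise invented for the example.

```shell
# Hypothetical sketch; rules arrive as "pattern=engine" arguments instead of
# being parsed from .ralphy/config.yaml.
match_rule_sketch() {
  task="$1"; shift
  for rule in "$@"; do
    pattern="${rule%%=*}"   # text before the first '='
    engine="${rule#*=}"     # text after the first '='
    # -i: case-insensitive, -E: extended regex, -q: exit status only
    if printf '%s\n' "$task" | grep -qiE -- "$pattern"; then
      printf '%s\n' "$engine"
      return 0              # first match wins
    fi
  done
  return 0                  # no match: print nothing so the caller falls back
}

match_rule_sketch "Polish the frontend nav" "UI|frontend|styling=cursor" "test|spec=codex"
# prints: cursor
```

One consequence of case-insensitive substring matching is that a short pattern such as `UI` also matches words like "build", so rule order and pattern specificity matter.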
- Add .ralphy/meta-agent.sh with parse_meta_decision() function
  - Parses meta-agent output with DECISION, CHOSEN, REASONING fields
  - Supports both "select" and "merge" decision types
  - Extracts merged solution code blocks
  - Returns JSON-formatted output with proper escaping
  - Handles multiline text and edge cases
- Add helper functions:
  - prepare_meta_prompt(): Build comparison prompt for solutions
  - run_meta_agent(): Execute meta-agent with multi-engine support
  - merge_solutions(): Apply merged solutions (placeholder)
- Add comprehensive test suite (.ralphy/test-meta-agent.sh)
  - 32 tests covering all parsing scenarios
  - Tests for error conditions and edge cases
  - All tests passing ✓
Part of multi-agent engine implementation (Phase 5).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Adds race mode implementation for Ralphy's multi-agent system with robust handling when all engines fail to complete a task.
Key Features:
- Parallel engine execution with isolated git worktrees
- Comprehensive failure reporting and diagnostics
- Four fallback strategies presented to users
- Metrics tracking for race history and outcomes
- Automatic cleanup of worktrees and branches
- Validation system for solution quality
- Bash 3 compatible for macOS support
Files Added:
- .ralphy/engines.sh: Engine abstraction layer
- .ralphy/modes.sh: Race mode implementation with failure handling
- .ralphy/test-race-mode.sh: Comprehensive test suite (all tests passing)
- .ralphy/RACE_MODE.md: Complete documentation
- .ralphy/progress.txt: Implementation progress tracking
When all engines fail, users receive:
1. Detailed failure report with exit codes and outputs
2. Metrics recorded in .ralphy/metrics.json
3. Fallback strategies:
   - Retry with different engines
   - Switch to consensus mode
   - Manual intervention guidance
   - Task breakdown suggestions
Part of the multi-agent system implementation outlined in MultiAgentPlan.md (Phase 4: Race Mode Implementation).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
This commit adds comprehensive performance tracking and intelligent engine selection to Ralphy based on historical success rates.
New Features:
- Automatic metrics recording for all task executions
- Pattern-based task categorization (10 patterns)
- Adaptive engine selection based on historical performance
- CLI flags: --show-metrics, --reset-metrics, --export-metrics, --no-adapt
- Comprehensive test suite (14 tests, all passing)
Implementation Details:
- New .ralphy/metrics.sh module with 11 core functions
- JSON-based metrics storage in .ralphy/metrics.json
- Tracks success rate, duration, cost, and tokens per engine
- Pattern-specific performance metrics
- Minimum 5 samples required for adaptive recommendations
- Zero breaking changes, backward compatible
Files Added:
- .ralphy/metrics.sh (536 lines) - Core metrics module
- .ralphy/test_metrics.sh (290 lines) - Test suite
- .ralphy/progress.txt - Implementation documentation
Files Modified:
- ralphy.sh - Integrated metrics recording and adaptive selection
  * Added metrics module sourcing
  * Added task timing tracking
  * Added success/failure metrics recording
  * Added adaptive engine selection logic
  * Added 4 new CLI flags
  * Updated help text and init function
Foundation for future multi-agent features including consensus mode, race mode, and specialization routing.
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
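Adaptive selection with a minimum sample count could be sketched as below. This is an assumption-laden illustration rather than the code in .ralphy/metrics.sh: the function name `pick_engine_sketch` and the plain-text `engine successes attempts` input format are hypothetical (the commit says the real metrics live in .ralphy/metrics.json), and "best" is taken to mean the highest success rate.

```shell
# Hypothetical sketch. Input on stdin: one "engine successes attempts" record
# per line. $1: minimum number of attempts before an engine is considered.
pick_engine_sketch() {
  min_samples="${1:-5}"
  awk -v min="$min_samples" '
    $3 >= min {
      rate = $2 / $3
      if (best == "" || rate > best_rate) { best = $1; best_rate = rate }
    }
    END { if (best != "") print best }
  '
}

# cursor has a perfect record but only 3 samples, so it is ignored.
printf '%s\n' "claude 8 10" "cursor 3 3" "codex 4 6" | pick_engine_sketch 5
# prints: claude
```

When no engine has enough samples the function prints nothing, which lets the caller keep its non-adaptive default.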
Add comprehensive cost control features to prevent runaway costs during multi-agent execution as specified in MultiAgentPlan.md.
Features:
- Per-task cost limits with configurable thresholds
- Per-session cost limits to control total spending
- Warning alerts when approaching cost limits (default 75%)
- Automatic task/session termination when limits exceeded
- Support for actual costs (OpenCode) and estimated costs (token-based)
Configuration (in .ralphy/config.yaml):
- cost_controls.max_per_task: Maximum USD per task (0 = unlimited)
- cost_controls.max_per_session: Maximum USD per session (0 = unlimited)
- cost_controls.warn_threshold: Warning percentage (default 0.75)
Implementation details:
- Added cost tracking variables and functions to ralphy.sh
- Integrated cost checking after each AI API call
- Enhanced config initialization with cost_controls section
- Updated --config display to show cost limits
Testing:
- Syntax validation passed
- Cost calculation verified (1M input + 500k output = $10.50)
- Limit detection and warning thresholds tested successfully
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
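The token-based estimate and the limit check can be sketched as follows. Everything named here is hypothetical. In particular, the per-million-token rates ($3 input, $15 output) are not stated in the commit; they are assumed because they reproduce its verified figure of $10.50 for 1M input plus 500k output tokens. The sketch also assumes a warning fires at or above limit × threshold and termination only when the cost strictly exceeds the limit.

```shell
# Hypothetical sketch. Rates are assumptions chosen to match the commit's
# check: (1,000,000 * 3 + 500,000 * 15) / 1,000,000 = 10.50 USD.
INPUT_USD_PER_MTOK=3
OUTPUT_USD_PER_MTOK=15

# $1: input tokens, $2: output tokens. Prints the estimated cost in USD.
estimate_cost_sketch() {
  awk -v in_tok="$1" -v out_tok="$2" \
      -v in_rate="$INPUT_USD_PER_MTOK" -v out_rate="$OUTPUT_USD_PER_MTOK" \
      'BEGIN { printf "%.2f\n", (in_tok * in_rate + out_tok * out_rate) / 1000000 }'
}

# $1: cost so far, $2: limit (0 = unlimited), $3: warn threshold (default 0.75).
# Prints "ok", "warn", or "exceeded".
cost_status_sketch() {
  awk -v cost="$1" -v limit="$2" -v warn="${3:-0.75}" 'BEGIN {
    if (limit + 0 == 0)                 print "ok"
    else if (cost + 0 > limit + 0)      print "exceeded"
    else if (cost + 0 >= limit * warn)  print "warn"
    else                                print "ok"
  }'
}

estimate_cost_sketch 1000000 500000   # prints: 10.50
```

The arithmetic is delegated to awk because POSIX shell arithmetic is integer-only.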
Add comprehensive validation module for multi-engine mode support:
Features:
- Four validation gates: diff check, linting, tests, and build
- Configurable retry logic with timeout support
- Cross-platform compatibility (macOS/Linux)
- JSON reporting for metrics integration
- Loads commands from .ralphy/config.yaml
Implementation:
- .ralphy/validation.sh: Core validation module with 400+ lines
  - validate_solution(): Main function with all 4 gates
  - validate_solution_with_retry(): Retry wrapper with smart retry logic
  - Individual gate functions with timeout handling
  - Validation reporting and config loading
- tests/test_validation.sh: Comprehensive test suite
  - 22 tests covering all validation scenarios
  - Mock git repository testing for diff validation
  - JSON report validation
  - Cross-platform test handling
Testing:
- All 22 tests passing
- Graceful fallback when timeout command unavailable
- Tested on macOS environment
This module provides the foundation for handling validation failures in consensus, race, and specialization modes as specified in MultiAgentPlan.md lines 474-506.
Closes validation gate failures task (line 619 in MultiAgentPlan.md).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
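The gate-then-retry structure can be sketched like this. It is a simplified illustration, not the contents of .ralphy/validation.sh: both function names are hypothetical, gate commands are passed as arguments instead of being loaded from .ralphy/config.yaml, and timeout handling and JSON reporting are left out.

```shell
# Hypothetical sketch. Each argument is a shell command string for one gate
# (for example a diff check, lint, test, or build command).
run_gates_sketch() {
  for gate in "$@"; do
    # Stop at the first gate that exits non-zero.
    if ! sh -c "$gate" >/dev/null 2>&1; then
      printf 'failed: %s\n' "$gate"
      return 1
    fi
  done
  printf 'all gates passed\n'
}

# $1: maximum attempts; remaining arguments: gate commands.
run_gates_with_retry_sketch() {
  max_attempts="$1"; shift
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if run_gates_sketch "$@" >/dev/null; then
      printf 'passed on attempt %s\n' "$attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  printf 'failed after %s attempts\n' "$max_attempts"
  return 1
}
```

Running the gates in a fixed order and stopping at the first failure keeps a cheap check ahead of a slow build, which is one plausible reading of the four-gate design.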
…cation-system-consensus-claude-op
…tion-system-consensus-claude-op Refactor authentication system into modular architecture
…-styling-auto Update login button styling with modern design
…auth-race-cursor-codex-qwen Add comprehensive unit tests for auth module
…ty-bug-consensus-claude-cursor- Fix critical security vulnerabilities (CWE-78) in command injection
…-2-engines-similar-results Ralphy/agent 5 consensus mode with 2 engines similar results
Summary
This PR introduces a comprehensive multi-agent engine system that transforms Ralphy from a single-engine orchestrator into an intelligent multi-engine system. The implementation enables multiple AI coding engines to work simultaneously with intelligent task routing, consensus-based decision making, and performance-based learning.
Key Features
Execution Modes
Consensus Mode - Critical tasks requiring high confidence
Specialization Mode - Efficient task distribution
Race Mode - Speed optimization
Architecture Changes
.ralphy/engines.sh, .ralphy/modes.sh, .ralphy/meta-agent.sh, .ralphy/metrics.sh
.ralphy/config.yaml with engine settings, routing rules, and cost controls
CLI Interface
Implementation Highlights
Testing Coverage
Documentation
MultiAgentPlan.md with architecture overview
Test Plan
test_consensus.sh, test_race_mode.sh, test_specialization.sh, etc.)
🤖 Generated with Claude Code