diff --git a/DOCUMENTATION.md b/DOCUMENTATION.md index bd4c6c1..d21c3fc 100644 --- a/DOCUMENTATION.md +++ b/DOCUMENTATION.md @@ -10,7 +10,10 @@ The project is a monorepo containing two primary components: * **Batch Manager**: Optimizes high-volume embedding requests. * **Detailed Logger**: Provides per-request file logging for debugging. * **OpenAI-Compatible Endpoints**: `/v1/chat/completions`, `/v1/embeddings`, etc. -2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues. +2. **The Resilience Library (`rotator_library`)**: This is the core engine that provides high availability. It is consumed by the proxy app to manage a pool of API keys, handle errors gracefully, and ensure requests are completed successfully even when individual keys or provider endpoints face issues. It also includes: + * **HiveMind Ensemble Manager**: Orchestrates parallel model execution (Swarm and Fusion modes) with intelligent arbitration. + * **Key Management**: Advanced concurrency control and intelligent key selection. + * **Error Handling**: Escalating cooldowns and automatic recovery. This architecture cleanly separates the API interface from the resilience logic, making the library a portable and powerful tool for any application needing robust API key management. @@ -315,6 +318,148 @@ The `CooldownManager` handles IP or account-level rate limiting that affects all --- +## 2.10. HiveMind Ensemble (`ensemble/`) + +The **HiveMind Ensemble** system enables parallel model execution with intelligent arbitration, supporting two distinct modes: + +### 2.10.1. Swarm Mode + +**Purpose**: Execute the same model multiple times in parallel to generate diverse responses, then synthesize them into a single high-quality output. 
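To make the temperature-jitter behavior above concrete, here is a minimal sketch of how per-drone jitter (±delta around the request temperature, clamped to the valid range) could be computed. This is an illustrative helper only; `jittered_temperatures` is a hypothetical name, not the library's actual API.

```python
import random

def jittered_temperatures(base_temp: float, delta: float, count: int) -> list[float]:
    """Sketch (not the library's code): vary temperature per drone by up to
    +/- delta, clamping each result to the valid [0.0, 2.0] range."""
    if delta <= 0:
        # No jitter configured: every drone uses the base temperature.
        return [base_temp] * count
    return [
        max(0.0, min(2.0, base_temp + random.uniform(-delta, delta)))
        for _ in range(count)
    ]
```

With `delta=0.2` and three drones, each drone receives a temperature within ±0.2 of the request's base temperature, which is what produces the response diversity the arbiter later synthesizes.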
+ +**Key Features**: +- **Temperature Jitter**: Randomly varies temperature across drones (±delta) to increase response diversity +- **Adversarial Mode**: Dedicates N drones as critical reviewers with adversarial prompts to stress-test solutions +- **Blind Switch**: Optionally hides model names from the arbiter to reduce synthesis bias +- **Self-Arbitration**: Can use the same model as arbiter to save costs + +**Configuration** (`ensemble_configs/swarms/*.json`): +- Folder-based preset system with model-specific overrides +- Default configuration applies to all swarms unless overridden +- Preset-based discovery: `{base_model}-{preset_id}[swarm]` format + +**Example Usage**: +```python +response = await client.acompletion( + model="gpt-4o-mini-default[swarm]", + messages=[{"role": "user", "content": "Explain AI"}] +) +# → 3 parallel calls to gpt-4o-mini with temperature jitter +# → Arbiter synthesizes responses into final answer +``` + +### 2.10.2. Fusion Mode + +**Purpose**: Combine responses from multiple specialized models with role-based routing and weighted synthesis. 
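A small sketch of how blind mode interacts with role labels when responses are presented to the arbiter: roles survive, model names are stripped. The helper name `response_label` is hypothetical; the label formats follow the examples used elsewhere in this document.

```python
def response_label(index: int, model: str, role: str = "", blind: bool = True) -> str:
    """Sketch (hypothetical helper): build the arbiter-facing label for one
    response. Blind mode keeps the role but hides the model name."""
    if blind:
        return f"Response {index} ({role} role)" if role else f"Response {index}"
    return f"Response {index} ({model} - {role})" if role else f"Response {index} ({model})"
```

So a blind fusion shows the arbiter "Response 1 (Architect role)", while a non-blind one shows "Response 1 (gpt-4o - Architect)".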
+ +**Key Features**: +- **Role Assignment**: Each specialist model receives a custom system prompt defining its expertise +- **Weight Descriptions**: Guide arbiter on which specialist to trust for specific domains +- **Role Templates**: Reusable role definitions stored in `ensemble_configs/roles/` +- **Blind Mode**: Hides model names while preserving role labels +- **Multi-Provider Support**: Can mix models from different providers in a single fusion + +**Configuration** (`ensemble_configs/fusions/*.json`): +- Each fusion defined in its own JSON file or as an array in a single file +- Specialists can reference role templates via `role_template` field +- Supports `weight_description` for arbiter context + +**Example Configuration**: +```json +{ + "id": "dev-team", + "specialists": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt": "Focus on scalability and system design.", + "weight_description": "Expert in architecture. Trust for design decisions." + }, + { + "model": "claude-3-opus", + "role": "Security", + "role_template": "security-expert" + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true + } +} +``` + +### 2.10.3. Arbitration Strategies + +Strategies define how the arbiter synthesizes responses. Stored as plain text files in `ensemble_configs/strategies/*.txt` with `{responses}` placeholder. + +**Built-in Strategies**: +- **synthesis**: Combine best elements from all responses +- **best_of_n**: Select and refine the strongest response +- **code_review**: Code-specific evaluation criteria + +**Custom Strategies**: Users can add their own `.txt` files with custom synthesis prompts. + +### 2.10.4. Recursive Mode + +**Purpose**: Enable autonomous arbiter decision-making for low-consensus scenarios. 
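Since the arbiter communicates its internal state through textual markers, the proxy side needs only light parsing. The following is a sketch under assumed names (`parse_consensus` and `final_synthesis` are hypothetical helpers, not the shipped implementation) of how the `[CONSENSUS: X/10]` and `[FINAL SYNTHESIS:]` markers might be handled.

```python
import re

def parse_consensus(text: str):
    """Sketch: extract X from the arbiter's [CONSENSUS: X/10] marker.
    Returns None when the marker is absent (non-recursive responses)."""
    m = re.search(r"\[CONSENSUS:\s*(\d+)\s*/\s*10\]", text)
    return int(m.group(1)) if m else None

def final_synthesis(text: str) -> str:
    """Sketch: keep only the user-facing output after [FINAL SYNTHESIS:].
    Falls back to the full text if the marker never appeared."""
    return text.split("[FINAL SYNTHESIS:]", 1)[-1].strip()
```

The consensus score can then be compared against the configured threshold and logged at WARN level when it falls below it.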
+ +**Mechanism**: +- Arbiter assesses consensus (1-10 scale) +- If consensus < threshold: arbiter performs internal critique reasoning +- If consensus >= threshold: proceeds directly to synthesis +- All internal reasoning wrapped in `[INTERNAL]` tags (filtered from user output) + +**Markers**: +- `[CONSENSUS: X/10]`: Logged at WARN level if below threshold +- `[CONFLICTS: ...]`: Identified disagreement points +- `[CRITIQUE: ...]`: Internal reasoning about conflicts +- `[FINAL SYNTHESIS:]`: Start of user-facing output + +### 2.10.5. Usage Tracking + +HiveMind responses include standard OpenAI-compatible usage fields **plus** supplementary `hivemind_details`: + +**Standard Fields** (aggregated totals from all models): +- `prompt_tokens`: Total prompt tokens (drones/specialists + arbiter) +- `completion_tokens`: Total completion tokens +- `total_tokens`: Grand total + +**Supplementary Breakdown** (`hivemind_details`): +```json +{ + "mode": "swarm" | "fusion", + "drone_count" | "specialist_count": 3, + "drone_tokens" | "specialist_tokens": 450, + "arbiter_tokens": 200, + "total_cost_usd": 0.00123, + "latency_ms": 1523.45 +} +``` + +**Important**: Consumers should use standard `usage` fields for billing/analytics. The `hivemind_details` provides debugging context. + +### 2.10.6. 
Architecture + +**Components**: +- **EnsembleManager** (`manager.py`): Orchestration engine + - Detects ensemble requests (`is_ensemble()`) + - Prepares drones/specialists (`_prepare_drones()`, `_prepare_fusion_models()`) + - Executes parallel calls (`_execute_parallel()`) + - Builds arbiter prompts (`_build_arbiter_prompt()`) + - Handles streaming (`_call_arbiter_streaming()`) + +- **ConfigLoader** (`config_loader.py`): Configuration management + - Loads swarm presets, fusions, strategies, and role templates + - Supports both single-item and array-based file formats + - Validates and merges configurations + +**Integration**: +- Initialized in `RotatingClient.__init__()` +- Intercepts requests in `acompletion()` before normal routing +- Inherits all retry/resilience logic from RotatingClient + +--- + ## 3. Provider Specific Implementations The library handles provider idiosyncrasies through specialized "Provider" classes in `src/rotator_library/providers/`. diff --git a/README.md b/README.md index 72736a4..1a2fc49 100644 --- a/README.md +++ b/README.md @@ -12,6 +12,11 @@ This project provides a powerful solution for developers building complex applic ## Features - **Universal API Endpoint**: Simplifies development by providing a single, OpenAI-compatible interface for diverse LLM providers. 
+- **HiveMind Ensemble**: Parallel model execution with intelligent arbitration in two modes: + - **Swarm Mode**: Run multiple copies of the same model with temperature jitter, adversarial critique, and consensus-based synthesis + - **Fusion Mode**: Combine responses from different specialized models with role-based routing and weighted synthesis + - **Recursive Refinement**: Autonomous arbiter decision-making for low-consensus scenarios with internal critique reasoning + - **Streaming Support**: Full streaming support with real-time arbiter synthesis - **High Availability**: The underlying library ensures your application remains operational by gracefully handling transient provider errors and API key-specific issues. - **Resilient Performance**: A global timeout on all requests prevents your application from hanging on unresponsive provider APIs. - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_`), it can also support multiple concurrent requests to the *same* model using the same key. @@ -340,11 +345,56 @@ curl -X POST http://127.0.0.1:8000/v1/chat/completions \ }' ``` +### HiveMind Ensemble - Parallel Model Execution + +HiveMind enables you to run multiple models in parallel with intelligent arbitration. Use the `[swarm]` suffix or pre-configured fusion IDs. 
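Because ensemble routing is driven purely by the model string, any OpenAI-compatible client can target a swarm by suffixing the model name. A tiny sketch (the `swarm_model_id` helper is hypothetical, not part of this project):

```python
def swarm_model_id(base_model: str, preset: str = "") -> str:
    """Sketch: build a HiveMind swarm model ID, either the preset form
    '{base_model}-{preset}[swarm]' or the short '{base_model}[swarm]' form."""
    return f"{base_model}-{preset}[swarm]" if preset else f"{base_model}[swarm]"
```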
+ +**Swarm Mode** (same model, multiple executions): +```bash +# Explicit preset format +curl -X POST http://127.0.0.1:8000/v1/chat/completions \ +-H "Content-Type: application/json" \ +-H "Authorization: Bearer a-very-secret-and-unique-key" \ +-d '{ + "model": "gpt-4o-mini-aggressive[swarm]", + "messages": [{"role": "user", "content": "Explain quantum computing"}] +}' + +# Short format (requires omit_id: true in preset) +curl -X POST http://127.0.0.1:8000/v1/chat/completions \ +-H "Content-Type: application/json" \ +-H "Authorization: Bearer a-very-secret-and-unique-key" \ +-d '{ + "model": "gpt-4o-mini[swarm]", + "messages": [{"role": "user", "content": "Explain quantum computing"}] +}' +``` + +**Fusion Mode** (multiple specialist models): +```bash +curl -X POST http://127.0.0.1:8000/v1/chat/completions \ +-H "Content-Type: application/json" \ +-H "Authorization: Bearer a-very-secret-and-unique-key" \ +-d '{ + "model": "dev-team[fusion]", + "messages": [{"role": "user", "content": "Review this API design"}] +}' +``` + +HiveMind automatically: +- Executes models in parallel +- Applies temperature jitter for diversity (Swarm mode) +- Routes to specialized models with role prompts (Fusion mode) +- Synthesizes responses using an arbiter model +- Aggregates usage and cost across all calls + +For detailed configuration and advanced features, see the [HiveMind User Guide](docs/HiveMind_User_Guide.md). + ### Available API Endpoints - `POST /v1/chat/completions`: The main endpoint for making chat requests. - `POST /v1/embeddings`: The endpoint for creating embeddings. -- `GET /v1/models`: Returns a list of all available models from your configured providers. +- `GET /v1/models`: Returns a list of all available models from your configured providers (includes HiveMind fusions and swarms). - `GET /v1/providers`: Returns a list of all configured providers. - `POST /v1/token-count`: Calculates the token count for a given message payload. 
diff --git a/docs/HiveMind Plan.md b/docs/HiveMind Plan.md new file mode 100644 index 0000000..880c8cf --- /dev/null +++ b/docs/HiveMind Plan.md @@ -0,0 +1,1290 @@ +# HiveMind Ensemble (Swarm/Fusion) - Implementation Plan (REVISED) + +## Goal Description + +Implement a sophisticated orchestration engine called "HiveMind Ensemble" that enables two distinct modes of parallel model execution: + +1. **Swarm Mode**: Multiple parallel calls to the **same model** (called "Drones") with optional configuration for temperature variation, adversarial critique, and recursive self-correction. +2. **Fusion Mode**: Multiple parallel calls to **different models** (called "Models" or "Specialists" when roles are assigned) with optional role-based routing and context-aware synthesis. + +Both modes use an "Arbiter" (judge model) to synthesize responses with configurable strategies and optional recursive refinement. + +--- + +## Terminology + +- **HiveMind Ensemble**: The overall feature/system (may be shortened to "HiveMind" after first mention) +- **Swarm**: Parallel execution of the same model + - **Drone**: Individual instance in a Swarm +- **Fusion**: Parallel execution of different models + - **Model**: Individual model in a Fusion (generic term) + - **Specialist**: A Model with an assigned role and weight +- **Arbiter**: The judge/synthesizer model that produces the final response + +--- + +## Architecture Overview + +### Request Flow + +``` +User Request (model: "gemini-1.5-flash[swarm]") + ↓ +EnsembleManager.is_ensemble()? → Yes + ↓ +EnsembleManager.handle_request() + ↓ +┌─────────────────────────────────────────┐ +│ 1. Configuration Resolution │ +│ - Load config for this ensemble │ +│ - Determine: Swarm or Fusion? │ +│ - Get Arbiter config │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 2. 
Drone/Model Preparation │ +│ For Swarm: │ +│ - Create N Drones (same model) │ +│ - Apply temp jitter (optional) │ +│ - Mark M as adversarial (optional) │ +│ For Fusion: │ +│ - Load constituent models │ +│ - Apply role prompts (optional) │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 3. Parallel Execution │ +│ - asyncio.gather() all calls │ +│ - Each call uses RotatingClient │ +│ - Apply retry logic per drone/model │ +│ - Collect responses + metadata │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 4. Response Processing │ +│ - Apply blind switch (optional) │ +│ - Format for Arbiter consumption │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 5. Arbitration │ +│ - Load strategy prompt │ +│ - Inject role/weight context │ +│ - For Recursive Mode: │ +│ • Give arbiter autonomy │ +│ • Arbiter decides Round 2 │ +│ - For Non-Recursive: │ +│ • Direct synthesis only │ +│ - Call Arbiter (with streaming) │ +└─────────────────────────────────────────┘ + ↓ +┌─────────────────────────────────────────┐ +│ 6. Final Output │ +│ - Stream Arbiter's response to user │ +│ - Aggregate usage from all calls │ +│ - Log execution summary │ +└─────────────────────────────────────────┘ +``` + +--- + +## Core Components + +### 1. 
EnsembleManager Class + +**File**: `src/rotator_library/ensemble_manager.py` + +**Responsibilities**: +- Load and validate `ensemble_config.json` +- Detect Swarm requests (`[swarm]` notation) vs Fusion requests (config-based) +- Orchestrate parallel execution with retry logic +- Manage arbitration with streaming support +- Handle recursive refinement (single arbiter call with autonomous decision) + +**Key Methods**: + +#### `__init__(self, config_path, rotating_client)` +- Load configuration file +- Store reference to RotatingClient +- Build lookup tables for fast ensemble detection +- Validate configuration schema +- Initialize usage aggregator + +#### `is_ensemble(self, model_id: str) -> bool` +- Check if model_id matches a Fusion config (exact match from config) +- Check if model_id contains `[swarm]` notation +- Handle conflict detection (if provider has real model with same name) +- Return: `True` if ensemble, `False` otherwise + +#### `resolve_conflicts(self, base_model: str) -> str` +- Default format: `base_model[swarm]` +- Check if this conflicts with provider's real models +- If conflict, try: `base_model[hive]`, `base_model[max]`, etc. +- Log warning about conflict resolution +- Return: Final ensemble ID to use + +#### `handle_request(self, request_params: dict) -> AsyncGenerator` +Main orchestration method. Returns a streaming generator for the Arbiter's response. + +**Steps**: +1. **Identify Type**: Swarm or Fusion +2. **Load Config**: Get specific config or use defaults +3. **Prepare Drones/Models**: + - Build list of execution targets + - Apply temperature jitter (Swarm) + - Apply role prompts (Fusion) + - Mark adversarial instances +4. **Execute Parallel Calls**: + - Use `asyncio.gather()` with exception handling + - Each call goes through RotatingClient (inherits retry logic) + - Require at least 1 successful response + - Log failures as errors +5. 
**Aggregate Usage**: + - Sum all `prompt_tokens`, `completion_tokens`, `total_tokens` + - Calculate combined cost (using existing cost calculation) +6. **Process Responses**: + - Extract content from each response + - Apply blind switch if enabled (keep roles, strip model names) + - Format for Arbiter +7. **Build Arbiter Prompt**: + - Load strategy prompt template + - Inject adversarial context (if applicable) + - Inject role/weight context (Fusion) + - For recursive mode: Add autonomous decision instructions +8. **Call Arbiter with Streaming**: + - Stream Arbiter's synthesis to user + - Parse internal markers (if recursive mode) + - Aggregate Arbiter's usage into total +9. **Return**: Stream final response with combined usage metadata + +#### `_prepare_drones(self, config: dict, base_model: str, request_params: dict) -> List[dict]` +For Swarm mode: +- Create N copies of request params +- **Temperature Jitter**: + ```python + base_temp = request_params.get('temperature', 0.7) + jitter_config = config.get('temperature_jitter', {}) + if jitter_config.get('enabled', False): + delta = jitter_config.get('delta', 0.0) + for i in range(count): + temp = base_temp + random.uniform(-delta, delta) + temp = max(0.0, min(2.0, temp)) # Clamp + drones[i]['temperature'] = temp + ``` +- **Adversarial Prompts**: + ```python + adv_config = config.get('adversarial_config', {}) + if adv_config.get('enabled', False): + count = adv_config['count'] + prompt = adv_config['prompt'] + for i in range(count): + drones[i]['messages'].insert(0, { + 'role': 'system', + 'content': prompt + }) + drones[i]['_is_adversarial'] = True # Metadata for logging + ``` +- **Model ID**: All drones use `base_model` (without `[swarm]` suffix) + +#### `_prepare_models(self, config: dict, request_params: dict) -> List[dict]` +For Fusion mode: +- For each model in fusion config: + - Clone request params + - Set model ID from config + - If role defined: + - Apply `system_prompt_append` (prepend to messages) + - 
Store role metadata for context + - If weight defined: + - Store weight for arbiter context +- Return list of prepared calls with metadata + +#### `_execute_parallel(self, prepared_calls: List[dict]) -> Tuple[List[dict], dict]` +- Execute all calls in parallel: + ```python + results = await asyncio.gather( + *[self.rotating_client.acompletion(**params) for params in prepared_calls], + return_exceptions=True + ) + ``` +- Filter out exceptions/None values +- Log each failure as ERROR (drones should not fail) +- Require at least 1 success, else raise exception +- Aggregate usage: + ```python + total_usage = { + 'prompt_tokens': sum(r.usage.prompt_tokens for r in results if r), + 'completion_tokens': sum(r.usage.completion_tokens for r in results if r), + 'total_tokens': sum(r.usage.total_tokens for r in results if r) + } + ``` +- Return: `(successful_responses, total_usage)` + +#### `_format_for_arbiter(self, responses: List[dict], config: dict, mode: str, metadata: List[dict]) -> str` +Build formatted text for arbiter input. + +**Blind Switch Logic**: +- If `blind=True`: + - Labels: "Response 1 (Architect role)", "Response 2 (Security role)" + - Do NOT include model names +- If `blind=False`: + - Labels: "Response 1 (GPT-4o - Architect)", "Response 2 (Claude-3-opus - Security)" + +**Adversarial Context** (if adversarial drones present): +``` +NOTE: Responses marked [ADVERSARIAL] were specifically prompted to critique and find flaws. +Their purpose is to stress-test the solution. Consider their critiques when synthesizing. +``` + +**Format**: +``` +Response 1 (GPT-4o - Architect): +[content] + +Response 2 (Claude-3-opus - Security): +[content] + +Response 3 [ADVERSARIAL]: +[content] +``` + +#### `_build_arbiter_prompt(self, formatted_responses: str, config: dict, mode: str) -> List[dict]` +Build complete messages array for arbiter. + +**System Prompt Components**: +1. **Base Strategy**: Load from `arbitration_strategies[strategy_name]` +2. 
**Role/Weight Context** (Fusion only): + ``` + You are synthesizing responses from specialists with the following expertise: + - GPT-4o (Architect): Expert in system design and scalability. Trust this model for architectural decisions. + - Claude-3-opus (Security): Expert in vulnerability assessment. Trust this model for security concerns. + ``` +3. **Adversarial Context** (if applicable): + ``` + Some responses are marked [ADVERSARIAL]. These drones were specifically instructed to critique + and find edge cases. Their purpose is quality assurance through skeptical analysis. + ``` +4. **Recursive Mode Instructions** (if enabled): + ``` + AUTONOMOUS DECISION PROTOCOL: + 1. Analyze the responses and assess consensus (agreement level 1-10) + 2. If consensus >= 7/10: Proceed directly to synthesis + 3. If consensus < 7/10: + a. Identify specific conflict points + b. Internally trigger a critique phase + c. For each response, reason about how it would address the conflicts + d. Then synthesize the final answer + + Log your internal reasoning with markers: + [CONSENSUS: X/10] + [CONFLICTS: bullet list] + [CRITIQUE REASONING: ...] + [FINAL SYNTHESIS:] + + IMPORTANT: Only return the FINAL SYNTHESIS to the user. All internal reasoning + should be wrapped in [INTERNAL] tags for logging purposes only. + ``` +5. **Output Format**: + ``` + Provide your synthesis as a complete, high-quality response to the user's original query. + Do not mention that you are combining responses unless directly relevant. + ``` + +**User Message**: Original user query + formatted responses + +Return: Complete messages array for arbiter call + +#### `_call_arbiter_streaming(self, messages: List[dict], arbiter_model: str, original_params: dict) -> AsyncGenerator` +Call arbiter and stream response. 
+ +- Clone original request params +- Set model to `arbiter_model` +- Set `messages` to constructed arbiter prompt +- Set `stream=True` +- Call via RotatingClient.acompletion (returns async generator) +- **Parse Stream**: + - Extract internal markers (consensus score, conflicts) for logging + - Strip `[INTERNAL]` sections from user-facing output + - Yield only synthesis content to user +- **Aggregate Usage**: Track arbiter's usage separately +- Return: Streaming generator + +--- + +### 2. Configuration Structure + +**Folder-Based Approach**: Instead of a single config file, HiveMind uses a directory structure: + +``` +ensemble_configs/ +├── swarms/ +│ ├── default.json # Default swarm settings +│ ├── gemini-flash.json # Custom swarm for gemini-flash +│ └── gpt4o.json # Custom swarm for gpt-4o +├── fusions/ +│ ├── dev-team.json # Dev team fusion +│ └── creative-writers.json # Creative writers fusion +└── strategies/ + ├── synthesis.txt # Synthesis strategy prompt + ├── best_of_n.txt # Best-of-N strategy + └── code_review.txt # Code review strategy +``` + +**Loading Logic**: +- Load all JSON files from each subfolder +- Merge swarm configs (specific model configs override defaults) +- Detect duplicate fusion IDs → apply conflict resolution +- Load strategy templates from `.txt` files + +**Benefits**: +- Easy to add new configs (drop file in folder) +- Version control friendly (one file per fusion/config) +- Community sharing (share individual fusion configs) + +--- + +### 3. 
Configuration Schemas + +#### Swarm Config + +**File**: `ensemble_configs/swarms/default.json` + +```json +{ + "suffix": "[swarm]", + "count": 3, + + "temperature_jitter": { + "enabled": true, + "delta": 0.2 + }, + + "arbiter": { + "model": "self", + "strategy": "synthesis", + "blind": true, + "note": "Arbiter should be a decent reasoning model (e.g., GPT-4o, Claude 3+, Gemini 1.5 Pro+)" + }, + + "adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a Senior Principal Engineer with 15+ years of experience..." + }, + + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7, + "note": "Requires a reasoning-capable arbiter model" + } +} +``` + +#### Model-Specific Swarm Config + +**File**: `ensemble_configs/swarms/gemini-flash.json` + +```json +{ + "model": "gemini-1.5-flash", + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true + } +} +``` + +#### Fusion Config + +**File**: `ensemble_configs/fusions/dev-team.json` + +```json +{ + "id": "dev-team", + "description": "A team of specialized models for software development", + "models": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt_append": "Focus on architectural patterns, scalability, and system design.", + "weight": "Expert in system design and scalability. Trust for architectural decisions and structural integrity." + }, + { + "model": "claude-3-opus", + "role": "Security Specialist", + "system_prompt_append": "Focus on security vulnerabilities, edge cases, and potential exploits.", + "weight": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors." + }, + { + "model": "gemini-1.5-pro", + "role": "Code Reviewer", + "system_prompt_append": "Focus on code quality, performance, and best practices.", + "weight": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." 
+ } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true, + "note": "Requires a reasoning-capable model for best results" + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} +``` + +#### Strategy Template + +**File**: `ensemble_configs/strategies/synthesis.txt` + +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. Maintains coherence and clarity + +Your goal is to produce the BEST possible answer by leveraging the strengths of each response. + +Responses: +{responses} +``` + + +--- + +## Detailed Feature Specifications + +### 1. Temperature Jitter (Swarm Only) + +**Purpose**: Introduce controlled randomness to increase response diversity. + +**Configuration**: +```json +"temperature_jitter": { + "enabled": true, + "delta": 0.2 +} +``` + +**Implementation**: +- Get base temperature from request (default 0.7) +- For each Drone: `temp = base_temp + random.uniform(-delta, delta)` +- Clamp to `[0.0, 2.0]` +- If request has `temperature=0`, disable jitter automatically + +--- + +### 2. Adversarial Mode (Swarm Only) + +**Purpose**: Inject critical analysis to stress-test solutions. + +**Configuration**: +```json +"adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a Senior Principal Engineer..." +} +``` + +**Implementation**: +- Select first N drones as adversarial +- Prepend adversarial system prompt +- Tag responses as `[ADVERSARIAL]` in arbiter input +- **Arbiter Context**: Explain adversarial purpose: + ``` + NOTE: This mode is designed for SYNTHESIS strategy. Adversarial responses + critique the solution to ensure all angles are considered. Integrate their + insights to strengthen the final answer. + ``` + +--- + +### 3. 
Role Assignment & Weights (Fusion Only)
+
+**Purpose**: Specialize models and guide arbiter on expertise.
+
+**Configuration** (per model):
+```json
+{
+  "model": "gpt-4o",
+  "role": "Architect",
+  "system_prompt_append": "Focus on scalability.",
+  "weight": "Expert in system design. Trust for architectural decisions."
+}
+```
+
+**Fields**:
+- `role`: Display name (for user reference and arbiter labels)
+- `system_prompt_append`: Instructions sent to the model
+- `weight`: Context for arbiter (what to trust this model for)
+
+**Arbiter Context Injection**:
+```
+Specialist Expertise:
+- Architect (GPT-4o): Expert in system design. Trust for architectural decisions.
+- Security (Claude): Expert in vulnerabilities. Trust for security concerns.
+```
+
+---
+
+### 4. Arbitration Strategies
+
+**Purpose**: Flexible synthesis logic via prompt engineering.
+
+**Built-in**:
+- `synthesis`: Combine all responses into best version
+- `best_of_n`: Select and refine the strongest response
+- `code_review`: Code-specific evaluation
+
+**User-Defined**: Users add custom `.txt` strategy prompts to `ensemble_configs/strategies/`.
+
+**Template Variables**:
+- `{responses}`: Formatted response text
+- `{role_context}`: Weight/expertise descriptions
+- `{adversarial_note}`: Context about adversarial drones
+
+---
+
+### 5. Blind Switch
+
+**Purpose**: Remove model identifiers to prevent bias, while keeping role context.
+
+**Default**: `blind: true` (enabled by default)
+
+**Per-Config**: Each swarm config and fusion config can override:
+
+```json
+"arbiter": {
+  "blind": true
+}
+```
+
+**Implementation**:
+- `blind=true`: "Response 1 (Architect role)", "Response 2 (Security role)"
+- `blind=false`: "Response 1 (GPT-4o - Architect)", "Response 2 (Claude - Security)"
+
+**Key Change**: Roles are ALWAYS preserved. Only model names are stripped.
+
+---
+
+### 6. Recursive/Reflective Mode
+
+**Purpose**: Multi-round refinement for low-consensus situations.
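The stream-side half of this mode (strip internal reasoning, surface only the final synthesis) can be pictured with a small async filter. This is a sketch under assumed names (`filter_synthesis` is hypothetical); note the marker may be split across stream chunks, so buffering is required.

```python
import asyncio

FINAL_MARKER = "[FINAL SYNTHESIS:]"

async def filter_synthesis(chunks):
    """Sketch (hypothetical helper): buffer streamed text until the
    [FINAL SYNTHESIS:] marker appears, then pass everything after it
    through. If the marker never appears, nothing is yielded."""
    buffer, started = "", False
    async for chunk in chunks:
        if started:
            yield chunk
            continue
        buffer += chunk
        idx = buffer.find(FINAL_MARKER)
        if idx != -1:
            started = True
            tail = buffer[idx + len(FINAL_MARKER):]
            if tail:
                yield tail
```

A production version would also extract `[CONSENSUS: ...]` and `[CONFLICTS: ...]` from the buffered prefix for logging before discarding it.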
+ +**Configuration**: +```json +"recursive_mode": { + "enabled": false, + "consensus_threshold": 7, + "note": "Arbiter model must be capable of internal reasoning (e.g., GPT-4o, Claude 3.5+, Gemini 1.5 Pro+)" +} +``` + +**REVISED APPROACH** (Single Arbiter Call): + +Instead of multiple requests, the arbiter is given **autonomous decision-making** via prompt. + +> [!NOTE] +> The arbiter model should be a **decent reasoning model** to handle internal critique and consensus analysis effectively. Models like GPT-4o, Claude 3.5 Sonnet, and Gemini 1.5 Pro are recommended. + +**Arbiter Prompt** (when recursive enabled): +``` +You have autonomous decision-making authority. Follow this protocol: + +1. ASSESSMENT PHASE: + - Analyze the provided responses + - Rate consensus level (1-10) + - Log: [CONSENSUS: X/10] + +2. DECISION PHASE: + If consensus >= 7/10: + - Proceed directly to synthesis + + If consensus < 7/10: + - Identify conflict points + - Log: [CONFLICTS: ...] + - For each response, reason internally about how it would address conflicts + - Log: [CRITIQUE REASONING: ...] + +3. SYNTHESIS PHASE: + - Create final answer incorporating all insights + - Log: [FINAL SYNTHESIS:] + +IMPORTANT: Wrap all internal reasoning in [INTERNAL] tags. Only the content +after [FINAL SYNTHESIS:] will be shown to the user. +``` + +**Stream Processing**: +- EnsembleManager parses the stream +- Extract `[CONSENSUS: X/10]` → Log at WARN level if < threshold +- Extract `[CONFLICTS: ...]` → Log conflicts +- Strip all `[INTERNAL]` sections from user output +- Yield only `[FINAL SYNTHESIS:]` content to user + +**Logging**: +``` +[HiveMind] Recursive mode active. Consensus: 5/10 [WARN] +[HiveMind] Conflicts identified: [list] +[HiveMind] Arbiter performing internal critique... +[HiveMind] Final synthesis complete +``` + +--- + +### 7. Streaming Support + +**Behavior**: Respects the `stream` boolean from the original request. 
+ +**Implementation**: +- Drone/Model calls are NOT streamed (collected in parallel) +- Arbiter call respects `stream` parameter: + - If `stream=true`: Stream arbiter's response + - If `stream=false`: Return complete arbiter response +- EnsembleManager passes through arbiter's streaming behavior +- Parse and filter internal markers during streaming +- Return clean synthesis to user + +**Flow**: +```python +async def handle_request(...) -> AsyncGenerator: + # 1. Collect drone responses (non-streaming) + responses = await self._execute_parallel(...) + + # 2. Build arbiter prompt + messages = self._build_arbiter_prompt(...) + + # 3. Stream arbiter response + arbiter_stream = self._call_arbiter_streaming(...) + + # 4. Parse and yield + async for chunk in arbiter_stream: + # Filter [INTERNAL] sections + if not chunk.startswith('[INTERNAL]'): + yield chunk +``` + +--- + +### 8. Usage & Cost Tracking + +**Aggregation**: +- Track usage from each Drone/Model call +- Track usage from Arbiter call +- Sum ALL usage fields: + ```python + total_usage = { + 'prompt_tokens': sum(all_calls), + 'completion_tokens': sum(all_calls), + 'cached_tokens': sum(all_calls), # If available + 'reasoning_tokens': sum(all_calls), # If available + 'total_tokens': sum(all_calls), + # Include any other usage fields from responses + } + ``` + +**Cost Calculation**: +- Use `UsageManager.calculate_cost()` if available (preferred) +- Fallback to `litellm.completion_cost()` if needed +- Calculate cost per call +- Sum total cost +- **Note**: This should be one of the last features to implement +- Include in final response metadata + +**Response Format**: +```json +{ + "usage": { + "prompt_tokens": 5000, + "completion_tokens": 800, + "total_tokens": 5800, + "hivemind_details": { + "drone_count": 3, + "arbiter_tokens": 1200, + "total_cost_usd": 0.045 + } + } +} +``` + +--- + +## Integration Points + +### 1. 
RotatingClient Modification
+
+**File**: `src/rotator_library/client.py`
+
+```python
+class RotatingClient:
+    def __init__(self, ...):
+        # Existing init
+        self.ensemble_manager = EnsembleManager(
+            config_path=os.path.join(os.path.dirname(__file__), '../../ensemble_config.json'),
+            rotating_client=self
+        )
+
+    def acompletion(self, request=None, **kwargs):
+        model = kwargs.get('model')
+
+        # Check if ensemble
+        if self.ensemble_manager.is_ensemble(model):
+            # Return streaming generator from ensemble manager
+            return self.ensemble_manager.handle_request(
+                request=request,
+                **kwargs
+            )
+
+        # Normal flow
+        if kwargs.get('stream'):
+            return self._streaming_acompletion_with_retry(...)
+        else:
+            return self._execute_with_retry(...)
+```
+
+---
+
+### 2. Model List Integration
+
+```python
+async def get_all_available_models(self, grouped=True):
+    # Existing provider models
+    all_provider_models = await self._fetch_provider_models()
+
+    # Add fusion models
+    fusion_ids = self.ensemble_manager.get_fusion_ids()
+    if fusion_ids:
+        all_provider_models['hivemind'] = fusion_ids
+
+    return all_provider_models
+```
+
+**Note**: Swarm model listing is **TBD**. The user notes the set is "not infinite", and a better discovery system still needs to be designed.
+
+---
+
+### 3. Logging
+
+**Log Levels**:
+- INFO: Normal operations (starting swarm, drone count, completion)
+- DEBUG: Detailed execution (per-drone temps, prompt construction)
+- WARNING: Low consensus, conflicts, partial failures
+- ERROR: Drone failures, arbiter failures
+
+**Examples**:
+```python
+lib_logger.info(f"[HiveMind] Processing Swarm: {model_id} ({count} Drones)")
+lib_logger.debug(f"[HiveMind] Drone {i+1}: temp={temp:.2f}, adversarial={is_adv}")
+lib_logger.warning(f"[HiveMind] Recursive mode: Consensus 5/10 - below threshold")
+lib_logger.error(f"[HiveMind] Drone {i+1} failed: {error}")
+lib_logger.info(f"[HiveMind] Total cost: ${total_cost:.4f} ({total_tokens} tokens)")
+```
+
+---
+
+## Edge Cases & Error Handling
+
+### 1. 
Partial Failures + +**Scenario**: Some Drones fail due to errors. + +**Handling**: +- Each drone call uses RotatingClient → **inherits existing retry/key rotation logic** +- If a drone still fails after retries, log as ERROR +- Continue with successful responses +- **Minimum**: Require at least 1 successful response +- If all fail, raise exception with details + +**No Special Logic Needed**: RotatingClient already handles retries, rate limits, key rotation. + +--- + +### 2. Arbiter Failure + +**Scenario**: Arbiter call fails. + +**Handling**: +- Arbiter call uses RotatingClient → **inherits retry/resilience logic** +- If arbiter fails after retries: + - Log ERROR + - Fallback: Return first **non-adversarial** drone response + - Log: `[HiveMind] Arbiter failed. Returning first non-adversarial response.` + +--- + +### 3. Naming Conflicts + +**Scenario**: Provider has `gemini-1.5-flash[swarm]` as real model, or duplicate fusion IDs exist. + +**Handling**: +- Default naming: `model-name[swarm]` or fusion ID from config +- On conflict detected: + - Append numeric suffix: `-1`, `-2`, `-3`, etc. + - Example: `gemini-1.5-flash[swarm]` → `gemini-1.5-flash[swarm]-1` + - Example: `dev-team` → `dev-team-1` +- Log: `[HiveMind] Conflict detected. Renamed 'dev-team' to 'dev-team-1'.` +- Store resolved names in runtime cache +- **Applies to**: Both swarm suffixes AND fusion IDs + +--- + +### 4. Streaming Parse Errors + +**Scenario**: Can't parse `[CONSENSUS: X/10]` from recursive mode stream. + +**Handling**: +- Log warning +- Continue streaming synthesis +- Skip logging consensus score + +--- + +### 5. Invalid Configuration + +**Scenario**: User config has invalid fusion (missing model, invalid strategy). 
+ +**Handling**: +- On startup, validate all fusions +- Log errors for invalid configs +- Skip invalid fusions +- Continue with valid ones + +--- + +## Implementation Phases + +### **Phase 1: Foundation (Core Infrastructure)** + +**Goal**: Set up basic structure and config loading. + +**Tasks**: +1. Create `ensemble_manager.py` skeleton + - Define `EnsembleManager` class + - Implement `__init__` with folder-based config loading + - Load and merge configs from `ensemble_configs/` directory + - Add config validation (JSON schema) + +2. Create config directory structure + - `ensemble_configs/swarms/default.json` + - `ensemble_configs/fusions/` (empty initially) + - `ensemble_configs/strategies/synthesis.txt` + +3. Integrate into `RotatingClient` + - Import `EnsembleManager` + - Initialize in `__init__` with config directory path + - Add placeholder check in `acompletion` + +4. Implement `is_ensemble()` + - Detect `[swarm]` suffix + - Detect fusion IDs from config + - Add conflict detection logic + +**Deliverables**: +- ✅ Folder-based config structure created +- ✅ Configs load and merge correctly +- ✅ Ensemble detection works +- ✅ Conflict resolution (numeric suffixes) works +- ✅ No runtime errors + +**Testing**: +- Unit test folder-based config loading +- Unit test config merging (swarm defaults + model-specific) +- Unit test `is_ensemble()` with various inputs +- Test conflict detection and numeric suffix generation +- Test duplicate fusion ID handling + +--- + +### **Phase 2: Basic Swarm (Non-Streaming)** + +**Goal**: Get basic swarm working without advanced features. + +**Tasks**: +1. Implement `_prepare_drones()` + - Clone request params N times + - Set model to base (strip `[swarm]`) + - No jitter or adversarial yet + +2. Implement `_execute_parallel()` + - Use `asyncio.gather()` with drone calls + - Handle exceptions gracefully + - Aggregate usage stats + +3. 
Implement `_format_for_arbiter()` + - Basic formatting (numbered responses) + - No blind switch yet + +4. Implement `_build_arbiter_prompt()` + - Load synthesis strategy + - Simple system prompt + user message + - No recursive mode yet + +5. Implement `_call_arbiter()` (NON-streaming first) + - Call arbiter via RotatingClient + - Return complete response + - Aggregate arbiter usage + +6. Wire up `handle_request()` (non-streaming) + - Connect all steps + - Return arbiter's response + - Include combined usage + +**Deliverables**: +- ✅ Swarm executes 3 drones in parallel +- ✅ Arbiter synthesizes responses +- ✅ Final response returned (non-streaming) +- ✅ Usage aggregated correctly + +**Testing**: +- Integration test: Call `gemini-1.5-flash[swarm]` +- Verify 3 drone calls + 1 arbiter call +- Verify synthesis quality (manual) +- Verify usage statistics + +--- + +### **Phase 3: Streaming Support** + +**Goal**: Enable streaming for arbiter response. + +**Tasks**: +1. Modify `_call_arbiter()` to `_call_arbiter_streaming()` + - Set `stream=True` + - Return async generator + - Track usage from stream + +2. Update `handle_request()` to return generator + - Yield arbiter stream chunks + - Aggregate usage at end + +3. Test streaming end-to-end + - Verify chunks arrive in real-time + - Verify complete response matches non-streaming + +**Deliverables**: +- ✅ Arbiter response streams to user +- ✅ No buffering of full response +- ✅ Usage still aggregated correctly + +**Testing**: +- Integration test with streaming +- Compare output to non-streaming version +- Test error handling mid-stream + +--- + +### **Phase 4: Advanced Swarm Features** + +**Goal**: Add jitter, adversarial, blind switch. + +**Tasks**: +1. **Temperature Jitter**: + - Add jitter logic to `_prepare_drones()` + - Test with different delta values + - Verify clamping + +2. **Adversarial Mode**: + - Inject adversarial prompts + - Tag responses in formatting + - Add arbiter context explanation + +3. 
**Blind Switch**: + - Modify `_format_for_arbiter()` + - Strip model names when `blind=true` + - Keep roles always + +**Deliverables**: +- ✅ Jitter produces varied temps +- ✅ Adversarial drones produce critiques +- ✅ Blind mode strips model names + +**Testing**: +- Test each feature independently +- Test combinations (jitter + adversarial) +- Manual review of adversarial effectiveness + +--- + +### **Phase 5: Fusion Mode** + +**Goal**: Enable multi-model mixtures with roles. + +**Tasks**: +1. Implement `_prepare_models()` + - Load models from fusion config + - Apply role system prompts + - Store metadata for arbiter + +2. Update `_format_for_arbiter()` for roles + - Include role labels + - Apply blind switch for model names + +3. Implement role/weight context injection + - Build specialist expertise text + - Inject into arbiter system prompt + +4. Add example fusion to config + - "dev-team" with 3 specialists + +**Deliverables**: +- ✅ Fusion calls multiple models +- ✅ Arbiter receives role context +- ✅ Synthesis respects expertise weights + +**Testing**: +- Test "dev-team" fusion with coding question +- Verify role prompts are applied +- Manual review: Does arbiter trust specialists appropriately? + +--- + +### **Phase 6: Recursive Mode** + +**Goal**: Enable autonomous arbiter decision-making for low consensus. + +**Tasks**: +1. Update `_build_arbiter_prompt()` for recursive + - Add autonomous protocol instructions + - Define `[INTERNAL]` marker format + - Include consensus threshold + +2. Implement stream parsing in `_call_arbiter_streaming()` + - Extract `[CONSENSUS: X/10]` + - Extract `[CONFLICTS: ...]` + - Strip `[INTERNAL]` sections from user output + +3. 
Add logging for recursive flow + - Log consensus score at WARN if low + - Log identified conflicts + - Log critique phase activation + +**Deliverables**: +- ✅ Arbiter autonomously decides Round 2 +- ✅ Internal reasoning logged but not shown to user +- ✅ Low consensus triggers critique + +**Testing**: +- Test with intentionally ambiguous prompt +- Verify arbiter produces `[CONSENSUS: 4/10]` +- Verify critique reasoning appears in logs +- Verify final synthesis is improved + +--- + +### **Phase 7: Polish & Production** + +**Goal**: Production-ready with documentation and examples. + +**Tasks**: +1. **Comprehensive Logging**: + - Add execution time tracking + - Add cost tracking per request + - Log summary at end of each request + +2. **Error Messages**: + - User-friendly error for invalid ensemble IDs + - Clear message when streaming not supported (N/A now) + - Helpful message on config errors + +3. **Documentation**: + - User guide: How to use swarms/fusions + - Config reference: All fields explained + - Example configs: dev-team, creative-writers, etc. + +4. **Example Configs**: + - Add 2-3 preset fusions to default config (commented out) + - Document swarm notation in README + +5. 
**Performance Testing**: + - Benchmark latency (3-drone swarm) + - Benchmark token usage vs single call + - Document cost multiplier + +**Deliverables**: +- ✅ Comprehensive logs for debugging +- ✅ User documentation complete +- ✅ Example configs provided +- ✅ Performance benchmarks documented + +**Testing**: +- Full end-to-end tests for all features +- Load testing with multiple concurrent swarms +- Manual testing of all examples + +--- + +## Example Configurations + +### Preset Fusion 1: Dev Team + +```json +{ + "id": "dev-team", + "description": "Software development team with architecture, security, and code review specialists", + "models": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt_append": "Focus on system design, scalability, and architectural patterns.", + "weight": "Expert in system design and scalability. Trust for architectural decisions." + }, + { + "model": "claude-3-opus", + "role": "Security", + "system_prompt_append": "Focus on security vulnerabilities, edge cases, and threat modeling.", + "weight": "Expert in security and vulnerability assessment. Trust for security concerns." + }, + { + "model": "gemini-1.5-pro", + "role": "Reviewer", + "system_prompt_append": "Focus on code quality, performance, and best practices.", + "weight": "Expert in code quality and optimization. Trust for performance and maintainability." + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "code_review", + "blind": false + } +} +``` + +--- + +## User Configuration Examples + +### Simple Swarm Usage + +User request: +``` +Model: gemini-1.5-flash[swarm] +Messages: [{"role": "user", "content": "Write a function to parse CSV"}] +``` + +Result: 3 calls to `gemini-1.5-flash`, synthesized by `gemini-1.5-flash` (self-arbiter). 
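The `model[swarm]` / `model-preset[swarm]` notation above can be sketched as a small parser. This is an illustrative sketch only, not the library's actual implementation; `KNOWN_PRESETS` is a hypothetical stand-in for the preset IDs loaded from `ensemble_configs/swarms/`.

```python
import re
from typing import Optional, Tuple

# Hypothetical stand-in for preset IDs loaded from ensemble_configs/swarms/
KNOWN_PRESETS = {"default", "aggressive"}

def parse_swarm_id(model_id: str) -> Optional[Tuple[str, str]]:
    """Split 'base-preset[swarm]' into (base_model, preset_id); None if not a swarm ID."""
    match = re.fullmatch(r"(?P<prefix>.+)\[swarm\]", model_id)
    if match is None:
        return None
    prefix = match.group("prefix")
    # If the last hyphenated token names a known preset, treat it as the preset ID;
    # otherwise the whole prefix is the base model and the default preset applies.
    base, sep, tail = prefix.rpartition("-")
    if sep and tail in KNOWN_PRESETS:
        return base, tail
    return prefix, "default"

parse_swarm_id("gemini-1.5-flash[swarm]")   # → ("gemini-1.5-flash", "default")
parse_swarm_id("gpt-4o-aggressive[swarm]")  # → ("gpt-4o", "aggressive")
parse_swarm_id("gpt-4o")                    # → None
```

One caveat of this naive split: a base model whose name happens to end in a preset ID would be mis-parsed, which is exactly the kind of naming conflict the conflict-resolution rules above are meant to handle.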
+ +--- + +### Custom Arbiter for Swarm + +Config override (per-model): +```json +{ + "swarm_configs": { + "gemini-1.5-flash": { + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis" + } + } + } +} +``` + +User request: `gemini-1.5-flash[swarm]` +Result: 3 calls to flash, synthesized by gpt-4o. + +--- + +### Fusion Usage + +User request: +``` +Model: dev-team +Messages: [{"role": "user", "content": "Review this API endpoint: [code]"}] +``` + +Result: Parallel calls to gpt-4o, claude, gemini with role prompts. Arbiter synthesizes with role context. + +--- + +## Default Configuration Answer + +Based on user feedback: + +1. **Default Swarm Suffix**: `[swarm]` +2. **Arbiter Default**: Same model as drones (self-arbitration), but configurable per-model +3. **Streaming**: Required for arbiter's final response ✅ +4. **Cost Warnings**: None (user discretion) +5. **Preset Configs**: Only using provided examples (dev-team) + +--- + +## Testing Strategy + +### Unit Tests + +`tests/test_ensemble_manager.py`: +- Config loading and validation +- `is_ensemble()` detection +- Conflict resolution +- Drone preparation (jitter, adversarial) +- Model preparation (roles, weights) +- Response formatting (blind switch) + +### Integration Tests + +`tests/test_swarm_integration.py`: +- Basic 3-drone swarm +- Swarm with jitter enabled +- Swarm with adversarial mode +- Streaming swarm response + +`tests/test_fusion_integration.py`: +- Multi-model fusion +- Role context injection +- Weight-based synthesis + +`tests/test_recursive_integration.py`: +- Low consensus triggering critique +- Consensus score parsing +- Internal marker stripping + +### Manual Scenarios + +1. **Simple Swarm**: `gpt-4o[swarm]` with straightforward question +2. **Adversarial Swarm**: Enable adversarial, ask for code, verify critique +3. **Fusion**: Use "dev-team" with API review +4. 
**Recursive**: Use ambiguous prompt, verify low consensus handling + +--- + +## Performance Benchmarks (Expected) + +### Latency +- Single call: ~2s +- Swarm (3 drones): ~2s (parallel) + ~2s (arbiter) = **~4s** +- Swarm + Recursive: ~4s + arbiter internal critique time = **~5-6s** + +### Token Usage +- Single call: 1000 input + 500 output = 1500 tokens +- Swarm (3 drones): + - Drones: 1000 × 3 + 500 × 3 = 4500 tokens + - Arbiter: 1000 + 1500 (from drones) = 2500 input + 600 output + - Total: **~7600 tokens** (5x single call) + +### Cost Multiplier +- Typical swarm: **4-6x** cost of single call +- Fusion (different models): Varies by model costs + +--- + +## Summary + +This revised plan addresses all user feedback: + +✅ Confidence scoring only in recursive mode +✅ Adversarial context explained to arbiter +✅ Weight field for arbiter expertise guidance +✅ Blind switch keeps roles, strips model names +✅ Recursive mode as single autonomous arbiter call +✅ Default naming: `model[swarm]` +✅ Streaming required for arbiter response +✅ Usage/cost aggregated from all calls +✅ Existing retry/resilience logic leveraged +✅ Detailed implementation phases (7 phases) +✅ Example configs provided + +Ready for implementation! 
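As a concrete illustration of the usage-aggregation item in the summary above, here is a minimal sketch of summing OpenAI-style usage dicts across drone/specialist and arbiter calls. The field names follow this plan; the function itself is hypothetical, not part of the library API.

```python
from collections import Counter

def aggregate_usage(per_call_usage: list) -> dict:
    """Sum every numeric usage field across all drone/specialist and arbiter calls.

    Using a Counter means optional fields (cached_tokens, reasoning_tokens) are
    summed when present and simply absent from the total otherwise.
    """
    total = Counter()
    for usage in per_call_usage:
        for field, value in usage.items():
            if isinstance(value, (int, float)):
                total[field] += value
    return dict(total)

calls = [
    {"prompt_tokens": 1000, "completion_tokens": 500},   # drone 1
    {"prompt_tokens": 1000, "completion_tokens": 500},   # drone 2
    {"prompt_tokens": 2500, "completion_tokens": 600, "reasoning_tokens": 120},  # arbiter
]
aggregate_usage(calls)
# → {'prompt_tokens': 4500, 'completion_tokens': 1600, 'reasoning_tokens': 120}
```

The aggregated dict is what would populate the standard `usage` fields, with the per-call figures kept separately for the `hivemind_details` breakdown.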
diff --git a/docs/HiveMind Task.md b/docs/HiveMind Task.md new file mode 100644 index 0000000..65c00ed --- /dev/null +++ b/docs/HiveMind Task.md @@ -0,0 +1,93 @@ +# HiveMind Ensemble (Swarm/Fusion) Implementation + +## Phase 1: Core Infrastructure +- [x] Design and Plan + - [x] Explore codebase + - [x] Create comprehensive implementation plan +- [x] Create `src/rotator_library/ensemble_manager.py` + - [x] Define `EnsembleManager` class skeleton + - [x] Implement config loading and validation + - [x] Implement `is_ensemble()` detection + - [x] Implement conflict resolution for naming +- [x] Modify `src/rotator_library/client.py` + - [x] Initialize `EnsembleManager` in `__init__` + - [x] Integrate into `acompletion()` dispatcher + - [x] Add logging for HiveMind operations +- [x] Create `ensemble_config.json` + - [x] Define schema for Fusions + - [x] Define schema for Swarm defaults + - [x] Define arbitration strategies + +## Phase 2: Basic Swarm Mode +- [x] Implement Swarm Features + - [x] `_prepare_drones()` - basic cloning + - [x] `_execute_parallel()` - asyncio.gather + - [x] `_format_for_arbiter()` - response aggregation + - [x] `_build_arbiter_prompt()` - synthesis strategy + - [x] `_call_arbiter()` - judge execution +- [x] Testing + - [x] Test basic 3-drone swarm + - [x] Test arbiter synthesis + - [x] Test partial failures + +## Phase 3: Advanced Swarm Features +- [x] Temperature Jitter + - [x] Implement jitter logic + - [x] Test randomness and clamping +- [x] Adversarial Mode + - [x] Implement adversarial prompt injection + - [x] Test with configurable count +- [x] Blind Switch + - [x] Implement response anonymization + - [x] Test with blind=true/false +- [ ] Confidence Scoring (Moved to Recursive Mode) + - [ ] Implement score extraction + - [ ] Add logging for scores + +## Phase 4: Fusion Mode +- [/] Implement Fusion Features + - [x] `_prepare_models()` - multi-model setup (implemented as `_prepare_fusion_models`) + - [x] Role assignment and prompts + - [x] 
Role context for Arbiter (Labels implemented, but explicit expertise context block missing) + - [x] Weight system (Weights parsed but not used in arbiter context) +- [ ] Testing + - [ ] Test 2-model fusion + - [ ] Test role context injection + - [ ] Test specialist descriptions + +## Phase 5: Recursive/Reflective Mode +- [x] Implement Recursion (Single-Call Autonomous Mode) + - [x] Consensus check logic (via Prompt & Stream Parsing) + - [x] Conflict extraction (via Stream Parsing) + - [x] `_trigger_round_2()` implementation (Replaced by Autonomous Decision Protocol) + - [x] Max rounds enforcement (N/A for Single Call) +- [ ] Testing + - [ ] Test low-confidence trigger + - [ ] Test Round 2 critique + - [ ] Test final re-synthesis + +## Phase 6: Polish & Edge Cases +- [ ] Error Handling + - [x] Partial failure handling + - [ ] Arbiter failure fallback + - [x] Infinite recursion prevention (N/A) +- [ ] Performance + - [x] Latency logging + - [x] Token usage tracking + - [x] Rate limit mitigation (Inherited from RotatingClient) +- [x] Documentation + - [x] User guide + - [x] Example configs + - [x] API reference + +## Verification +- [ ] Automated Tests + - [ ] test_ensemble_manager.py (all 8 test cases) + - [ ] test_swarm_logic.py + - [ ] test_fusion_logic.py + - [ ] test_recursion.py +- [ ] Manual Tests + - [ ] Scenario 1: Simple Swarm + - [ ] Scenario 2: Adversarial Swarm + - [ ] Scenario 3: Fusion with Roles + - [ ] Scenario 4: Recursive Refinement diff --git a/docs/HiveMind_API.md b/docs/HiveMind_API.md new file mode 100644 index 0000000..0ada7c3 --- /dev/null +++ b/docs/HiveMind_API.md @@ -0,0 +1,554 @@ +# HiveMind Ensemble API Reference + +## EnsembleManager + +Main class for orchestrating HiveMind Ensemble requests. + +### `__init__(rotating_client, config_dir=None)` + +Initialize the ensemble manager. 
+ +**Parameters:** +- `rotating_client` (RotatingClient): Reference to the RotatingClient instance +- `config_dir` (str, optional): Path to ensemble_configs directory. Defaults to `src/rotator_library/ensemble_configs` + +**Example:** +```python +client = RotatingClient() +# EnsembleManager is automatically initialized +manager = client.ensemble_manager +``` + +### `is_ensemble(model_id: str) -> bool` + +Check if a model ID represents an ensemble request. + +**Parameters:** +- `model_id` (str): Full model ID from user request + +**Returns:** +- `bool`: True if ensemble (swarm or fusion), False otherwise + +**Example:** +```python +manager.is_ensemble("gpt-4o[swarm]") # True +manager.is_ensemble("dev-team") # True +manager.is_ensemble("gpt-4o") # False +``` + +### `get_base_model(swarm_id: str) -> tuple` + +Extract base model name and preset ID from swarm ID. + +**Parameters:** +- `swarm_id` (str): Swarm model ID (e.g., "gpt-4o-aggressive[swarm]", "gpt-4o[swarm]") + +**Returns:** +- `tuple`: (base_model_name, preset_id) + - For `"gpt-4o-aggressive[swarm]"` returns `("gpt-4o", "aggressive")` + - For `"gpt-4o[swarm]"` returns `("gpt-4o", "default")` or omit_id preset + +**Example:** +```python +base, preset = manager.get_base_model("gpt-4o-aggressive[swarm]") +# base = "gpt-4o", preset = "aggressive" + +base, preset = manager.get_base_model("gpt-4o[swarm]") +# base = "gpt-4o", preset = "default" or omit_id preset for gpt-4o +``` + +### `get_fusion_ids() -> List[str]` + +Get list of all configured fusion IDs. + +**Returns:** +- `List[str]`: List of fusion identifiers + +**Example:** +```python +fusion_ids = manager.get_fusion_ids() # ["dev-team", "creative-writers"] +``` + +### `handle_request(request, **kwargs) -> Response | AsyncGenerator` + +Main entry point for ensemble execution. + +**Parameters:** +- `request`: Original request object +- `**kwargs`: Request parameters (model, messages, stream, etc.) 
+ +**Returns:** +- `Response`: Complete response (if stream=False) +- `AsyncGenerator`: Streaming response generator (if stream=True) + +**Example:** +```python +# Non-streaming +response = await client.acompletion( + model="gpt-4o[swarm]", + messages=[{"role": "user", "content": "Test"}], + stream=False +) + +# Streaming +async for chunk in client.acompletion( + model="gpt-4o[swarm]", + messages=[{"role": "user", "content": "Test"}], + stream=True +): + print(chunk) +``` + +--- + +## ConfigLoader + +Manages configuration loading for ensemble modes. + +### `load_all() -> None` + +Load all configurations from directory structure. + +**Side Effects:** +- Populates `swarm_default`, `swarm_configs`, `fusion_configs`, `strategies` + +### `get_swarm_config(preset_id: str) -> Dict[str, Any]` + +Get swarm configuration for a specific preset. + +**Parameters:** +- `preset_id` (str): Preset ID (e.g., "default", "aggressive") + +**Returns:** +- `Dict[str, Any]`: Preset configuration + +### `get_preset_for_model(base_model: str) -> str` + +Get the preset ID to use when calling `model[swarm]` (short form). + +**Parameters:** +- `base_model` (str): Base model name (e.g., "gpt-4o-mini") + +**Returns:** +- `str`: Preset ID (omit_id preset for this model, or "default") + +**Example:** +```python +# If aggressive.json has omit_id=true and base_models=["gpt-4o-mini"] +preset = loader.get_preset_for_model("gpt-4o-mini") # "aggressive" + +# For models without omit_id preset +preset = loader.get_preset_for_model("claude-3-haiku") # "default" +``` + +### `get_fusion_config(fusion_id: str) -> Optional[Dict[str, Any]]` + +Get fusion configuration by ID. + +**Parameters:** +- `fusion_id` (str): Fusion identifier + +**Returns:** +- `Dict[str, Any]` | `None`: Fusion configuration or None if not found + +### `get_strategy(strategy_name: str) -> Optional[str]` + +Get strategy template by name. 
+ +**Parameters:** +- `strategy_name` (str): Strategy identifier + +**Returns:** +- `str` | `None`: Strategy template or None if not found + +### `get_all_fusion_ids() -> List[str]` + +Get list of all fusion IDs with [fusion] suffix. + +**Returns:** +- `List[str]`: List of fusion identifiers + +### `get_all_swarm_model_ids() -> List[str]` + +Get all discoverable swarm model variants for /v1/models endpoint. + +**Discovery Rules:** +- Preset WITH `base_models` + `omit_id: true` → `{model}[swarm]` +- Preset WITH `base_models` + `omit_id: false` → `{model}-{preset}[swarm]` +- Preset WITHOUT `base_models` → Not included (invisible +) + +**Returns:** +- `List[str]`: List of swarm model IDs for discovery + +**Example:** +```python +# With aggressive.json: {"omit_id": true, "base_models": ["gpt-4o-mini"]} +# With default.json: {"omit_id": false, "base_models": ["gpt-4o", "claude-3-haiku"]} + +swarm_ids = loader.get_all_swarm_model_ids() +# [ +# "gpt-4o-mini[swarm]", # From aggressive (omit_id=true) +# "gpt-4o-default[swarm]", # From default (omit_id=false) +# "claude-3-haiku-default[swarm]" # From default (omit_id=false) +# ] +``` + +--- + +## Response Object + +HiveMind responses follow the standard OpenAI response format with additional usage details. + +### `Response.usage` + +Usage statistics for the request. + +**Standard Fields (OpenAI-Compatible):** + +These fields contain the **complete aggregated totals** from all models (drones/specialists + arbiter). They are fully compatible with existing tooling and billing systems. 
+ +- `prompt_tokens` (int): **Total** prompt tokens from all models +- `completion_tokens` (int): **Total** completion tokens from all models +- `total_tokens` (int): **Total** tokens (sum of prompt + completion) +- `cached_tokens` (int, optional): **Total** cached tokens if supported +- `reasoning_tokens` (int, optional): **Total** reasoning tokens if supported + +**HiveMind Ensemble-Specific Fields (Supplementary):** + +- `hivemind_details` (dict): **Breakdown information** for observability (does NOT replace standard fields) + +**Important**: Always use the standard fields for billing, quotas, and analytics. They contain the correct aggregated totals. The `hivemind_details` provides additional context for debugging and understanding HiveMind execution. + +### `Response.usage.hivemind_details` + +Supplementary breakdown dictionary containing: + +**Common Fields:** +- `mode` (str): "swarm" or "fusion" +- `arbiter_tokens` (int): Tokens used by arbiter +- `total_cost_usd` (float): Estimated total cost in USD +- `latency_ms` (float): Total execution time in milliseconds + +**Swarm-Specific:** +- `drone_count` (int): Number of drones executed +- `drone_tokens` (int): Total tokens from all drones + +**Fusion-Specific:** +- `specialist_count` (int): Number of specialists executed +- `specialist_tokens` (int): Total tokens from all specialists + +**Example:** +```python +response = await client.acompletion(model="gpt-4o[swarm]", ...) 
+ +# Standard fields contain TOTAL aggregated usage +usage = response.usage +print(f"Total tokens: {usage.total_tokens}") # e.g., 650 (drones 450 + arbiter 200) +print(f"Prompt tokens: {usage.prompt_tokens}") # e.g., 400 (all models combined) +print(f"Completion tokens: {usage.completion_tokens}") # e.g., 250 (all models combined) + +# Supplementary breakdown for observability +details = usage.hivemind_details +print(f"Mode: {details['mode']}") # "swarm" +print(f"Drone count: {details['drone_count']}") # 3 +print(f"Drone tokens: {details['drone_tokens']}") # 450 (breakdown) +print(f"Arbiter tokens: {details['arbiter_tokens']}") # 200 (breakdown) +print(f"Cost: ${details['total_cost_usd']}") # 0.00123 +print(f"Latency: {details['latency_ms']}ms") # 1523.45 + +# Note: drone_tokens + arbiter_tokens = total_tokens +# The standard usage fields are what billing systems should use +``` + +--- + +## Configuration Schema + +### Swarm Configuration + +**File Location:** `ensemble_configs/swarms/{preset_id}.json` + +**Preset-Based System**: Each swarm preset defines behavior for multiple models via `base_models`. 
+ +**Schema:** +```json +{ + "id": "string (REQUIRED, preset identifier, must match filename)", + "description": "string (optional)", + + "base_models": [ + "string (model IDs for /v1/models discovery)" + ], + + "omit_id": "boolean (default: false, controls discovery format)", + "count": "integer (default: 3, number of drones)", + + "temperature_jitter": { + "enabled": "boolean", + "delta": "float (temperature variance, ±delta)" + }, + + "arbiter": { + "model": "string ('self' or model ID)", + "strategy": "string (strategy template name)", + "blind": "boolean (default: true, hides model names)" + }, + + "adversarial_config": { + "enabled": "boolean", + "count": "integer (number of adversarial drones)", + "prompt": "string (system prompt for adversarial drones)" + }, + + "recursive_mode": { + "enabled": "boolean", + "consensus_threshold": "integer (1-10 scale)" + } +} +``` + +**Key Fields:** +- `id`: Preset identifier, used in `{model}-{id}[swarm]` format +- `base_models`: OPTIONAL. Controls /v1/models discovery only. Does NOT restrict runtime usage. +- `omit_id`: OPTIONAL. 
If `true`, shows as `{model}[swarm]` in /v1/models (hides explicit format to reduce clutter) + +**Discovery vs Runtime:** +- **Discovery**: `base_models` and `omit_id` control what appears in /v1/models +- **Runtime**: Explicit format `{model}-{preset}[swarm]` works with ANY model/preset combo + +### Fusion Configuration + +**File Location:** `ensemble_configs/fusions/*.json` + +**Schema:** +```json +{ + "id": "string (unique fusion identifier)", + "description": "string (optional)", + + "specialists": [ + { + "model": "string (model ID)", + "role": "string (optional, specialist role name)", + "system_prompt": "string (optional, role-specific instructions)", + "weight": "float (optional, importance weight, default: 1.0)", + "weight_description": "string (optional, expertise description for arbiter)", + "role_template": "string (optional, reference to role template from roles/ directory)" + } + ], + + "arbiter": { + "model": "string (model ID)", + "strategy": "string (strategy name)", + "blind": "boolean (default: true)" + }, + + "recursive_mode": { + "enabled": "boolean", + "consensus_threshold": "integer (1-10 scale)" + } +} +``` + +### Strategy Template + +**File Location:** `ensemble_configs/strategies/*.txt` + +**Format:** +Plain text file with `{responses}` placeholder. + +**Example:** +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer. + +{responses} + +Provide your synthesis as a complete, high-quality response. +``` + +--- + +## Error Handling + +### Common Exceptions + +**`ValueError`**: Invalid model ID or configuration +```python +try: + response = await client.acompletion(model="invalid-fusion", ...) +except ValueError as e: + print(f"Configuration error: {e}") +``` + +**`RuntimeError`**: All drones/specialists failed +```python +try: + response = await client.acompletion(model="gpt-4o[swarm]", ...) 
+except RuntimeError as e: + print(f"Execution error: {e}") +``` + +### Partial Failures + +If some drones/specialists fail but at least one succeeds, HiveMind continues with successful responses and logs warnings. + +**Logs:** +``` +[ERROR] [HiveMind] Drone 2/3 failed: Rate limit exceeded +[WARNING] [HiveMind] 1/3 drones failed. Proceeding with 2 successful responses. +``` + +--- + +## Logging + +HiveMind uses the `rotator_library.ensemble` logger. + +**Log Levels:** +- `INFO`: Normal operations (processing, completion) +- `DEBUG`: Detailed execution (temperatures, prompts) +- `WARNING`: Low consensus, partial failures, conflicts +- `ERROR`: Drone failures, critical issues + +**Example Configuration:** +```python +import logging + +# Enable HiveMind debug logging +logging.getLogger("rotator_library.ensemble").setLevel(logging.DEBUG) + +# Example logs: +# [INFO] [HiveMind] Processing Swarm request: gpt-4o[swarm] (base: gpt-4o, 3 drones, streaming: False) +# [DEBUG] [HiveMind] Drone 1: temperature=0.82, adversarial=False +# [DEBUG] [HiveMind] Arbiter prompt built: 2 messages +# [INFO] [HiveMind] Swarm completed successfully. Total usage: 650 tokens. Latency: 1234.56ms, Cost: $0.001200 +``` + +--- + +## Advanced Usage + +### Custom Arbiter Models + +Use different arbiter models for different fusions: + +```json +{ + "id": "research-team", + "specialists": [...], + "arbiter": { + "model": "gpt-4o", // Use GPT-4o specifically + "strategy": "synthesis" + } +} +``` + +### Self-Arbiter + +Use the same model as arbiter (saves one API call): + +```json +{ + "arbiter": { + "model": "self", // Use base model as arbiter + "strategy": "best_of_n" + } +} +``` + +### Multiple Strategies + +Create task-specific strategies: + +**`ensemble_configs/strategies/math_solver.txt`:** +``` +You are a mathematics expert. Review these solutions: + +{responses} + +Identify the correct approach, verify calculations, and provide the final answer with step-by-step explanation. 
+``` + +Usage: +```json +{ + "arbiter": { + "strategy": "math_solver" + } +} +``` + +--- + +## Migration Guide + +### From Single Model to Swarm + +**Before:** +```python +response = await client.acompletion( + model="gpt-4o-mini", + messages=[{"role": "user", "content": "Explain AI"}] +) +``` + +**After:** +```python +response = await client.acompletion( + model="gpt-4o-mini[swarm]", # Add [swarm] suffix + messages=[{"role": "user", "content": "Explain AI"}] +) +``` + +### From Multiple Calls to Fusion + +**Before:** +```python +arch_response = await client.acompletion(model="gpt-4o", ...) +sec_response = await client.acompletion(model="claude-3-opus", ...) +# Manually combine responses +``` + +**After:** +Create fusion config, then: +```python +response = await client.acompletion( + model="dev-team", # All in one call + messages=[...] +) +``` + +--- + +## Performance Metrics + +Typical latencies (3 drones/specialists, non-streaming): + +| Model Type | Drones/Specialists | Avg Latency | +|------------|-------------------|-------------| +| gpt-4o-mini[swarm] | 3 | 1.2-2.0s | +| gpt-4o[swarm] | 3 | 2.0-3.5s | +| dev-team (fusion) | 3 | 2.5-4.0s | + +**Note**: Streaming reduces perceived latency as arbiter output begins immediately after drone/specialist completion. + +--- + +## Limitations + +1. **Cost**: Multiple API calls increase costs proportionally +2. **Rate Limits**: May hit rate limits faster with parallel calls +3. **Latency**: Total time = max(drone time) + arbiter time +4. **Model Availability**: All models must be available simultaneously +5. 
**Token Limits**: Large responses may exceed context windows + +--- + +## Support + +For issues, questions, or feature requests: +- Check logs (`rotator_library.ensemble`) +- Review configuration files +- Verify API keys and model availability +- See [User Guide](./HiveMind_User_Guide.md) for common patterns diff --git a/docs/HiveMind_User_Guide.md b/docs/HiveMind_User_Guide.md new file mode 100644 index 0000000..3308c34 --- /dev/null +++ b/docs/HiveMind_User_Guide.md @@ -0,0 +1,445 @@ +# HiveMind Ensemble User Guide + +## Overview + +**HiveMind Ensemble** is a powerful feature that enables parallel model execution with intelligent arbitration. It supports two modes: + +- **Swarm Mode**: Multiple parallel calls to the **same model** (called "Drones") +- **Fusion Mode**: Multiple parallel calls to **different models** (called "Specialists") + +Both modes use an "Arbiter" model to synthesize the responses into a single, high-quality answer. + +--- + +## Quick Start + +### Swarm Mode + +Call the same model multiple times in parallel and synthesize results: + +```python +from rotator_library.client import RotatingClient + +client = RotatingClient() + +# Short form - uses preset with omit_id=true or default preset +response = await client.acompletion( + model="gpt-4o-mini[swarm]", + messages=[{"role": "user", "content": "What is quantum computing?"}], + stream=False +) + +# Explicit preset format - works with ANY model + ANY preset +response = await client.acompletion( + model="claude-3-haiku-aggressive[swarm]", # Use 'aggressive' preset + messages=[{"role": "user", "content": "What is quantum computing?"}], + stream=False +) + +print(response.choices[0].message.content) +print(f"Total tokens: {response.usage.total_tokens}") +print(f"Drone count: {response.usage.hivemind_details['drone_count']}") +print(f"Cost: ${response.usage.hivemind_details['total_cost_usd']}") +``` + +### Fusion Mode + +Use multiple specialized models working together: + +```python +# dev-team 
fusion uses 3 specialist models +response = await client.acompletion( + model="dev-team", + messages=[{"role": "user", "content": "Review this function"}], + stream=False +) + +print(response.choices[0].message.content) +print(f"Specialists: {response.usage.hivemind_details['specialist_count']}") +``` + +--- + +## Swarm Mode + +### How It Works + +1. **Preparation**: Creates N copies of your request (N drones) +2. **Execution**: Runs all drones in parallel +3. **Arbitration**: An arbiter model synthesizes all responses +4. **Result**: Returns the arbiter's synthesis + +### Preset-Based System + +Swarms use a **preset-based configuration** system. Each preset is a JSON file in `ensemble_configs/swarms/` that defines behavior for multiple models. + +**Model Name Formats**: +- **Short form**: `{model}[swarm]` → uses preset with `omit_id: true` OR `default` preset +- **Explicit form**: `{model}-{preset}[swarm]` → always uses specified preset + +**Examples**: +```python +# Short form +await client.acompletion(model="gpt-4o-mini[swarm]", ...) # Uses omit_id preset or default + +# Explicit form +await client.acompletion(model="gpt-4o-mini-aggressive[swarm]", ...) # Uses aggressive preset +await client.acompletion(model="claude-3-haiku-default[swarm]", ...) 
# Explicit default +``` + +**Key Features**: +- **`base_models`**: Controls /v1/models discovery (which models appear for this preset) +- **`omit_id`**: Controls discovery format (short vs explicit in /v1/models) +- **Runtime**: Explicit format works with ANY model/preset combo regardless of base_models + +### Configuration + +Swarm presets in `src/rotator_library/ensemble_configs/swarms/`: + +**`default.json`** - Global fallback: +```json +{ + "id": "default", + "description": "Standard balanced settings", + "base_models": [ + "gpt-4o", "gpt-4o-mini", + "claude-3-5-sonnet", "claude-3-haiku", + "gemini-1.5-pro", "gemini-1.5-flash" + ], + "omit_id": false, + "count": 3, + "temperature_jitter": { + "enabled": true, + "delta": 0.2 + }, + "arbiter": { + "model": "self", + "strategy": "synthesis", + "blind": true + }, + "adversarial_config": { + "enabled": false, + "count": 1, + "prompt": "You are a critical reviewer..." + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} +``` + +**Custom preset** (e.g., `aggressive.json`): +```json +{ + "id": "aggressive", + "base_models": ["gpt-4o-mini", "gemini-1.5-flash"], + "omit_id": true, // Shows as model[swarm] in /v1/models + "count": 5, + "temperature_jitter": { + "enabled": true, + "delta": 0.3 + }, + "adversarial_config": { + "enabled": true, + "count": 2 + } +} +``` + +### Advanced Features + +#### Temperature Jitter + +Introduces randomness to increase response diversity: + +```json +"temperature_jitter": { + "enabled": true, + "delta": 0.2 // ±0.2 variance +} +``` + +Each drone gets a slightly different temperature: `base_temp ± delta` + +#### Adversarial Mode + +Converts the last N drones to critical reviewers: + +```json +"adversarial_config": { + "enabled": true, + "count": 1, + "prompt": "You are a Senior Principal Engineer. Find flaws, edge cases, and potential issues." 
+} +``` + +#### Blind Switch + +Hides model names from arbiter (enabled by default): + +```json +"arbiter": { + "blind": true // Arbiter sees "Response 1" instead of "Response 1 (GPT-4o)" +} +``` + +#### Recursive Mode + +Enables autonomous arbiter critique for low-consensus responses: + +```json +"recursive_mode": { + "enabled": true, + "consensus_threshold": 7 // If consensus < 7/10, performs internal critique +} +``` + +#### Discovery vs Runtime + +**Discovery (/ v1/models endpoint)**: +- Preset WITH `base_models` + `omit_id: true` → `{model}[swarm]` +- Preset WITH `base_models` + `omit_id: false` → `{model}-{preset}[swarm]` +- Preset WITHOUT `base_models` → Not shown (invisible) + +**Runtime (actual API calls)**: +- Short form `model[swarm]` → Uses omit_id preset OR default +- Explicit form `model-preset[swarm]` → ALWAYS works with ANY model/preset combo +- `base_models` has NO runtime restrictions + +--- + +## Fusion Mode + +### How It Works + +1. **Preparation**: Assigns role-specific prompts to each specialist +2. **Execution**: Runs all specialists in parallel +3. **Arbitration**: Arbiter synthesizes with role context +4. 
**Result**: Returns the arbiter's synthesis + +### Configuration + +Fusion models are configured in `src/rotator_library/ensemble_configs/fusions/`: + +**`dev-team.json`** - Example fusion: +```json +{ + "id": "dev-team", + "description": "Software development team with specialized roles", + "specialists": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt": "Focus on architectural patterns, scalability, and system design.", + "weight": 1.5 + }, + { + "model": "claude-3-opus", + "role": "Security Specialist", + "system_prompt": "Focus on security vulnerabilities and potential exploits.", + "weight": 1.0 + }, + { + "model": "gemini-1.5-pro", + "role": "Code Reviewer", + "system_prompt": "Focus on code quality, performance, and best practices.", + "weight": 1.0 + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true + } +} +``` + +### Creating Custom Fusions + +1. Create a new JSON file in `ensemble_configs/fusions/` +2. Define specialists with roles and prompts +3. Choose an arbiter model and strategy +4. Use the fusion ID as the model name + +Example: `creative-writers.json`: +```json +{ + "id": "creative-writers", + "description": "Creative writing team", + "specialists": [ + { + "model": "claude-3-opus", + "role": "Storyteller", + "system_prompt": "Focus on narrative, character development, and plot.", + "weight": 1.5 + }, + { + "model": "gpt-4o", + "role": "Editor", + "system_prompt": "Focus on clarity, grammar, and style.", + "weight": 1.0 + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis" + } +} +``` + +Usage: +```python +response = await client.acompletion( + model="creative-writers", + messages=[{"role": "user", "content": "Write a short story about AI"}] +) +``` + +--- + +## Arbitration Strategies + +Strategies are text prompts in `ensemble_configs/strategies/`: + +**`synthesis.txt`** - Combine all responses: +``` +You are an expert synthesizer. 
Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. Maintains coherence and clarity + +{responses} +``` + +**`best_of_n.txt`** - Select and refine the best: +``` +Review these responses and identify the strongest one. Then refine and enhance it. + +{responses} +``` + +**`code_review.txt`** - Code-specific evaluation: +``` +You are a senior code reviewer. Analyze these code responses and provide: +1. Best implementation approach +2. Security considerations +3. Performance optimization suggestions +4. Final recommended code + +{responses} +``` + +### Creating Custom Strategies + +Create a `.txt` file in `ensemble_configs/strategies/` with your prompt template. Use `{responses}` as a placeholder for the formatted responses. + +--- + +## Streaming Support + +HiveMind respects the `stream` parameter: + +```python +# Streaming swarm +async for chunk in client.acompletion( + model="gpt-4o[swarm]", + messages=[{"role": "user", "content": "Explain AI"}], + stream=True # Stream arbiter's response +): + if hasattr(chunk.choices[0].delta, 'content') and chunk.choices[0].delta.content: + print(chunk.choices[0].delta.content, end='', flush=True) +``` + +**Note**: Drones/specialists execute in parallel (not streamed). Only the arbiter's final synthesis is streamed. 
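The chunk-handling pattern above can be wrapped in a small helper. Below is a minimal standalone sketch: `collect_stream` is a hypothetical convenience function (not part of the library), and `fake_stream` mocks OpenAI-style chunk objects with `SimpleNamespace` so the snippet runs without a live client.

```python
import asyncio
from types import SimpleNamespace

async def collect_stream(stream) -> str:
    """Accumulate the arbiter's streamed content deltas into one string."""
    parts = []
    async for chunk in stream:
        delta = chunk.choices[0].delta
        # Deltas may omit `content` (e.g. role-only or final chunks), so guard it
        if getattr(delta, "content", None):
            parts.append(delta.content)
    return "".join(parts)

# Mocked chunk stream standing in for client.acompletion(..., stream=True)
async def fake_stream():
    for text in ["Synthesized ", "answer", None]:
        yield SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=text))])

print(asyncio.run(collect_stream(fake_stream())))
# → Synthesized answer
```

With the real client, the same helper would simply be fed the async generator returned by `client.acompletion(..., stream=True)`.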
+ +--- + +## Usage & Cost Tracking + +All HiveMind responses include detailed usage information in **standard OpenAI-compatible fields** plus additional HiveMind-specific breakdown: + +```python +response = await client.acompletion( + model="gpt-4o-mini[swarm]", + messages=[{"role": "user", "content": "Test"}] +) + +# ✅ STANDARD usage fields (compatible with all tooling) +# These contain the TOTAL aggregated usage (drones/specialists + arbiter) +print(f"Prompt tokens: {response.usage.prompt_tokens}") # Total from all models +print(f"Completion tokens: {response.usage.completion_tokens}") # Total from all models +print(f"Total tokens: {response.usage.total_tokens}") # Grand total + +# ✅ SUPPLEMENTARY HiveMind details (breakdown for observability) +# These provide additional context but do NOT replace standard fields +details = response.usage.hivemind_details +print(f"Mode: {details['mode']}") # "swarm" or "fusion" +print(f"Drone/Specialist count: {details.get('drone_count') or details.get('specialist_count')}") +print(f"Drone/Specialist tokens: {details.get('drone_tokens') or details.get('specialist_tokens')}") +print(f"Arbiter tokens: {details['arbiter_tokens']}") +print(f"Total cost: ${details['total_cost_usd']}") +print(f"Latency: {details['latency_ms']}ms") +``` + +**Important**: Consumers should use the standard usage fields (`prompt_tokens`, `completion_tokens`, `total_tokens`) for billing and analytics. These already include the complete totals. The `hivemind_details` field provides a breakdown for debugging and observability. 
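Because the standard fields already carry the complete totals, billing or analytics code never needs to re-derive them from `hivemind_details`. A minimal sketch of that aggregation, with `SimpleNamespace` mocks standing in for real response objects (`aggregate_usage` and `mock_response` are hypothetical helpers, not library APIs):

```python
from types import SimpleNamespace

def aggregate_usage(responses):
    """Sum the standard OpenAI-compatible usage fields across responses.

    The standard fields already include drone/specialist AND arbiter tokens,
    so no per-component math from hivemind_details is needed for billing.
    """
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for r in responses:
        totals["prompt_tokens"] += r.usage.prompt_tokens
        totals["completion_tokens"] += r.usage.completion_tokens
        totals["total_tokens"] += r.usage.total_tokens
    return totals

def mock_response(prompt, completion):
    # Stand-in for a real acompletion() result
    return SimpleNamespace(usage=SimpleNamespace(
        prompt_tokens=prompt,
        completion_tokens=completion,
        total_tokens=prompt + completion,
    ))

print(aggregate_usage([mock_response(400, 250), mock_response(120, 80)]))
# → {'prompt_tokens': 520, 'completion_tokens': 330, 'total_tokens': 850}
```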
+
+---
+
+## Best Practices
+
+### Model Selection
+
+**Swarm Mode**:
+- Use for: Same model, different parameters (temperature jitter)
+- Best for: Brainstorming, diverse perspectives, consensus building
+- Models: Fast models (gpt-4o-mini, gemini-flash) for cost efficiency
+
+**Fusion Mode**:
+- Use for: Different models, specialized expertise
+- Best for: Complex tasks requiring multiple skill sets
+- Models: Mix strengths (GPT for reasoning, Claude for safety, Gemini for code)
+
+### Cost Optimization
+
+1. **Use smaller models for drones**: `gpt-4o-mini[swarm]` instead of `gpt-4o[swarm]`
+2. **Limit drone count**: Default is 3, but 2 is often sufficient
+3. **Use "self" arbiter**: Saves one API call
+4. **Monitor `hivemind_details`**: Track costs per request
+
+### Performance Tips
+
+1. **Parallel execution is fast**: All drones/specialists run simultaneously
+2. **Streaming reduces perceived latency**: Users see output immediately
+3. **Check `latency_ms`**: Identify slow requests
+
+---
+
+## Troubleshooting
+
+### No ensemble detected
+
+**Problem**: Model isn't recognized as an ensemble
+**Solution**: Check spelling, ensure the `[swarm]` suffix or fusion ID exists
+
+### All drones failed
+
+**Problem**: All parallel calls failed
+**Solution**: Check API keys, rate limits, model availability
+
+### High costs
+
+**Problem**: HiveMind is expensive
+**Solution**: Reduce drone count, use smaller models, limit to critical requests
+
+### Poor synthesis quality
+
+**Problem**: Arbiter output isn't good
+**Solution**: Use a better arbiter model (gpt-4o, claude-3-opus), try a different strategy
+
+---
+
+## API Reference
+
+See [API.md](./API.md) for detailed API documentation.
diff --git a/src/rotator_library/README.md b/src/rotator_library/README.md index c020799..08ef605 100644 --- a/src/rotator_library/README.md +++ b/src/rotator_library/README.md @@ -4,6 +4,13 @@ A robust, asynchronous, and thread-safe Python library for managing a pool of AP ## Key Features +- **HiveMind Ensemble**: Parallel model execution with intelligent arbitration + - **Swarm Mode**: Execute the same model multiple times with temperature jitter, adversarial prompts, and synthesis + - **Fusion Mode**: Combine responses from different specialized models with role-based routing + - **Recursive Refinement**: Autonomous low-consensus handling with internal critique reasoning + - **Configurable Strategies**: Customizable arbitration strategies for different use cases + - **Role Templates**: Reusable specialist role definitions for consistent fusion configurations + - **Blind Mode**: Option to hide model names from arbiter to reduce bias - **Asynchronous by Design**: Built with `asyncio` and `httpx` for high-performance, non-blocking I/O. - **Advanced Concurrency Control**: A single API key can be used for multiple concurrent requests. By default, it supports concurrent requests to *different* models. With configuration (`MAX_CONCURRENT_REQUESTS_PER_KEY_`), it can also support multiple concurrent requests to the *same* model using the same key. - **Smart Key Management**: Selects the optimal key for each request using a tiered, model-aware locking strategy to distribute load evenly and maximize availability. 
@@ -136,6 +143,33 @@ async def stream_example(): asyncio.run(stream_example()) ``` +**HiveMind Ensemble Example:** + +```python +async def hivemind_example(): + async with RotatingClient(api_keys=api_keys) as client: + # Swarm Mode: Multiple parallel calls to same model + swarm_response = await client.acompletion( + model="gpt-4o-mini-default[swarm]", + messages=[{"role": "user", "content": "Explain quantum computing"}] + ) + print(swarm_response.choices[0].message.content) + print(f"Total tokens: {swarm_response.usage.total_tokens}") + print(f"Drones: {swarm_response.usage.hivemind_details['drone_count']}") + + # Fusion Mode: Multiple specialist models + fusion_response = await client.acompletion( + model="dev-team[fusion]", + messages=[{"role": "user", "content": "Review this API design"}] + ) + print(fusion_response.choices[0].message.content) + print(f"Specialists: {fusion_response.usage.hivemind_details['specialist_count']}") + +asyncio.run(hivemind_example()) +``` + +See the [HiveMind User Guide](../../docs/HiveMind_User_Guide.md) and [API Reference](../../docs/HiveMind_API.md) for detailed configuration options. + #### `async def aembedding(self, **kwargs) -> Any:` A wrapper around `litellm.aembedding` that provides the same key management and retry logic for embedding requests. diff --git a/src/rotator_library/client.py b/src/rotator_library/client.py index b1485d0..df8cfc7 100644 --- a/src/rotator_library/client.py +++ b/src/rotator_library/client.py @@ -33,6 +33,7 @@ from .credential_manager import CredentialManager from .background_refresher import BackgroundRefresher from .model_definitions import ModelDefinitions +from .ensemble import EnsembleManager class StreamedAPIError(Exception): @@ -128,6 +129,9 @@ def __init__( if max_val < 1: lib_logger.warning(f"Invalid max_concurrent for '{provider}': {max_val}. 
Setting to 1.") self.max_concurrent_requests_per_key[provider] = 1 + + # Initialize HiveMind ensemble manager + self.ensemble_manager = EnsembleManager(rotating_client=self) def _is_model_ignored(self, provider: str, model_id: str) -> bool: """ @@ -636,6 +640,15 @@ async def _execute_with_retry( kwargs = self._convert_model_params(**kwargs) # The main rotation loop. It continues as long as there are untried credentials and the global deadline has not been exceeded. + + # Resolve model ID early, before any credential operations + # This ensures consistent model ID usage for acquisition, release, and tracking + resolved_model = self._resolve_model_id(model, provider) + if resolved_model != model: + lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") + model = resolved_model + kwargs["model"] = model # Ensure kwargs has the resolved model for litellm + while ( len(tried_creds) < len(credentials_for_provider) and time.time() < deadline ): @@ -689,13 +702,8 @@ async def _execute_with_retry( provider_plugin = self._get_provider_instance(provider) - # Convert model name to ID if custom mapping exists - resolved_model = self._resolve_model_id(model, provider) - if resolved_model != model: - lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") - litellm_kwargs["model"] = resolved_model - # Update the model variable for subsequent logging - model = resolved_model + # Model ID is already resolved before the loop, and kwargs['model'] is updated. + # No further resolution needed here. 
# Apply model-specific options for custom providers if provider_plugin and hasattr(provider_plugin, "get_model_options"): @@ -996,6 +1004,14 @@ async def _streaming_acompletion_with_retry( consecutive_quota_failures = 0 + # Resolve model ID early, before any credential operations + # This ensures consistent model ID usage for acquisition, release, and tracking + resolved_model = self._resolve_model_id(model, provider) + if resolved_model != model: + lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") + model = resolved_model + kwargs["model"] = model # Ensure kwargs has the resolved model for litellm + try: while ( len(tried_creds) < len(credentials_for_provider) @@ -1071,13 +1087,8 @@ async def _streaming_acompletion_with_retry( provider_plugin = self._get_provider_instance(provider) - # Convert model name to ID if custom mapping exists - resolved_model = self._resolve_model_id(model, provider) - if resolved_model != model: - lib_logger.info(f"Resolved model '{model}' to '{resolved_model}'") - litellm_kwargs["model"] = resolved_model - # Update the model variable for subsequent logging - model = resolved_model + # Model ID is already resolved before the loop, and kwargs['model'] is updated. + # No further resolution needed here. # Apply model-specific options for custom providers if provider_plugin and hasattr( @@ -1606,8 +1617,15 @@ def acompletion( Returns: The completion response object, or an async generator for streaming responses, or None if all retries fail. 
""" - # Handle iflow provider: remove stream_options to avoid HTTP 406 model = kwargs.get("model", "") + + # Check if this is an ensemble request (HiveMind) + if model and self.ensemble_manager.is_ensemble(model): + lib_logger.debug(f"[HiveMind] Detected ensemble request: {model}") + # Delegate to ensemble manager + return self.ensemble_manager.handle_request(request=request, **kwargs) + + # Handle iflow provider: remove stream_options to avoid HTTP 406 provider = model.split("/")[0] if "/" in model else "" if provider == "iflow" and "stream_options" in kwargs: @@ -1755,7 +1773,9 @@ async def get_available_models(self, provider: str) -> List[str]: async def get_all_available_models( self, grouped: bool = True ) -> Union[Dict[str, List[str]], List[str]]: - """Returns a list of all available models, either grouped by provider or as a flat list.""" + """Returns a list of all available models, either grouped by provider or as a flat list. + + MISSING FEATURE FIX: Now includes HiveMind fusion models.""" lib_logger.info("Getting all available models...") all_providers = list(self.all_credentials.keys()) @@ -1772,6 +1792,19 @@ async def get_all_available_models( else: all_provider_models[provider] = result + # MISSING FEATURE FIX: Add HiveMind fusion models + if self.ensemble_manager: + fusion_ids = self.ensemble_manager.config_loader.get_all_fusion_ids() + if fusion_ids: + all_provider_models["hivemind_fusion"] = fusion_ids + lib_logger.info(f"Added {len(fusion_ids)} HiveMind fusion models") + + # Add HiveMind swarm models + swarm_models = self.ensemble_manager.config_loader.get_all_swarm_model_ids() + if swarm_models: + all_provider_models["hivemind_swarm"] = swarm_models + lib_logger.info(f"Added {len(swarm_models)} HiveMind swarm model variants") + lib_logger.info("Finished getting all available models.") if grouped: return all_provider_models diff --git a/src/rotator_library/ensemble/__init__.py b/src/rotator_library/ensemble/__init__.py new file mode 100644 index 
0000000..6dbd382 --- /dev/null +++ b/src/rotator_library/ensemble/__init__.py @@ -0,0 +1,9 @@ +""" +HiveMind Ensemble Module + +This module provides parallel model execution (Swarm/Fusion) with intelligent arbitration. +""" + +from .manager import EnsembleManager + +__all__ = ['EnsembleManager'] diff --git a/src/rotator_library/ensemble/config_loader.py b/src/rotator_library/ensemble/config_loader.py new file mode 100644 index 0000000..5719dd7 --- /dev/null +++ b/src/rotator_library/ensemble/config_loader.py @@ -0,0 +1,430 @@ +""" +Configuration loader for HiveMind ensemble configs. + +Loads and validates configurations from the ensemble_configs directory structure. +""" + +import os +import json +import logging +import copy +from pathlib import Path +from typing import Dict, List, Any, Optional + +lib_logger = logging.getLogger("rotator_library.ensemble") + + +class ConfigLoader: + """Loads and manages ensemble configurations from folder structure.""" + + def __init__(self, config_dir: str): + """ + Initialize the config loader. 
+ + Args: + config_dir: Path to ensemble_configs directory (relative to rotator_library) + """ + self.config_dir = Path(config_dir) + self.swarms_dir = self.config_dir / "swarms" + self.fusions_dir = self.config_dir / "fusions" + self.strategies_dir = self.config_dir / "strategies" + self.roles_dir = self.config_dir / "roles" + + # Loaded configurations + self.swarm_default: Optional[Dict[str, Any]] = None + self.swarm_configs: Dict[str, Dict[str, Any]] = {} + self.fusion_configs: Dict[str, Dict[str, Any]] = {} + self.strategies: Dict[str, str] = {} + self.role_templates: Dict[str, Dict[str, Any]] = {} + + # Track model -> preset mapping for omit_id presets + self.omit_id_presets: Dict[str, str] = {} # {"gpt-4o-mini": "aggressive"} + + def load_all(self) -> None: + """Load all configurations from the directory structure.""" + lib_logger.info("[HiveMind] Loading ensemble configurations...") + + # Create directories if they don't exist + self._ensure_directories() + + # Load swarm configurations + self._load_swarm_configs() + + # Load fusion configurations + self._load_fusion_configs() + + # Load strategy templates + self._load_strategies() + + # Load role templates + self._load_roles() + + # Count swarm presets (files in swarms directory) + swarm_preset_count = len(list(self.swarms_dir.glob("*.json"))) if self.swarms_dir.exists() else 0 + + lib_logger.info( + f"[HiveMind] Loaded {swarm_preset_count} swarm presets, " + f"{len(self.fusion_configs)} fusion configs, " + f"{len(self.strategies)} strategies, " + f"{len(self.role_templates)} roles" + ) + + def _ensure_directories(self) -> None: + """Create config directories if they don't exist.""" + for directory in [self.swarms_dir, self.fusions_dir, self.strategies_dir, self.roles_dir]: + directory.mkdir(parents=True, exist_ok=True) + + def _load_swarm_configs(self) -> None: + """Load swarm configurations from swarms/ directory. + + Only supports preset-based format with 'id' and 'base_models'. 
+ Also builds omit_id mapping for default preset resolution. + """ + if not self.swarms_dir.exists(): + lib_logger.warning(f"[HiveMind] Swarms directory not found: {self.swarms_dir}") + return + + # Load default.json first + default_path = self.swarms_dir / "default.json" + if default_path.exists(): + try: + with open(default_path, 'r', encoding='utf-8') as f: + self.swarm_default = json.load(f) + lib_logger.debug("[HiveMind] Loaded default swarm config") + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load default swarm config: {e}") + else: + lib_logger.warning("[HiveMind] No default swarm config found") + + # Build omit_id mapping: scan all presets with omit_id=true + for config_file in self.swarms_dir.glob("*.json"): + # Skip example files + if config_file.stem.endswith('.example'): + continue + + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + preset_id = config.get("id") + omit_id = config.get("omit_id", False) + base_models = config.get("base_models", []) + + if preset_id and omit_id and base_models: + # Register this preset as the default for these models + for model in base_models: + if model in self.omit_id_presets: + lib_logger.warning( + f"[HiveMind] Model '{model}' already has omit_id preset '{self.omit_id_presets[model]}'. " + f"Overriding with '{preset_id}'" + ) + self.omit_id_presets[model] = preset_id + lib_logger.debug(f"[HiveMind] Registered '{model}[swarm]' -> preset '{preset_id}'") + + except Exception as e: + lib_logger.warning(f"Failed to process swarm config {config_file.name}: {e}") + + # All swarm configs now use preset-based format (id + base_models) + # Discovery is handled by get_all_swarm_model_ids() + # Individual preset configs loaded on-demand via get_swarm_config() + + def _load_fusion_configs(self) -> None: + """Load fusion configurations from fusions/ directory. + + Supports two formats: + 1. Single fusion: {"id": "...", "specialists": [...], ...} + 2. 
Multiple fusions: {"fusions": [{"id": "...", ...}, ...]} + """ + if not self.fusions_dir.exists(): + lib_logger.warning(f"[HiveMind] Fusions directory not found: {self.fusions_dir}") + return + + for config_file in self.fusions_dir.glob("*.json"): + # Skip example files + if config_file.stem.endswith('.example'): + continue + + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + # Check if this is the new array format + if "fusions" in config: + # New format: {"fusions": [...]} + fusions_list = config.get("fusions", []) + if not isinstance(fusions_list, list): + lib_logger.warning( + f"[HiveMind] Config '{config_file.name}' has 'fusions' but it's not a list" + ) + continue + + for fusion in fusions_list: + self._register_fusion(fusion, config_file.name) + else: + # Old format: {"id": "...", "specialists": [...], ...} + self._register_fusion(config, config_file.name) + + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load fusion config '{config_file.name}': {e}") + + def _register_fusion(self, fusion: Dict[str, Any], source_file: str) -> None: + """Register a single fusion configuration.""" + fusion_id = fusion.get("id") + if not fusion_id: + lib_logger.warning( + f"[HiveMind] Fusion in '{source_file}' missing 'id' field" + ) + return + + # Check for duplicate IDs + if fusion_id in self.fusion_configs: + lib_logger.warning( + f"[HiveMind] Duplicate fusion ID '{fusion_id}'. " + f"Config from '{source_file}' will override previous." 
+ ) + + self.fusion_configs[fusion_id] = fusion + lib_logger.debug(f"[HiveMind] Loaded fusion config '{fusion_id}'") + + def _load_strategies(self) -> None: + """Load strategy templates from strategies/ directory.""" + if not self.strategies_dir.exists(): + lib_logger.warning(f"[HiveMind] Strategies directory not found: {self.strategies_dir}") + return + + for strategy_file in self.strategies_dir.glob("*.txt"): + # Skip example files + if strategy_file.stem.endswith('.example'): + continue + + try: + with open(strategy_file, 'r', encoding='utf-8') as f: + content = f.read() + + strategy_name = strategy_file.stem + self.strategies[strategy_name] = content + lib_logger.debug(f"[HiveMind] Loaded strategy '{strategy_name}'") + + except Exception as e: + lib_logger.error( + f"[HiveMind] Failed to load strategy '{strategy_file.name}': {e}" + ) + + def _load_roles(self) -> None: + """Load role templates from roles/ directory. + + Supports two formats: + 1. Single role: {"name": "...", "system_prompt": "...", ...} + 2. 
Multiple roles: {"roles": [{"name": "...", ...}, ...]} + """ + if not self.roles_dir.exists(): + lib_logger.warning(f"[HiveMind] Roles directory not found: {self.roles_dir}") + return + + for role_file in self.roles_dir.glob("*.json"): + # Skip example files + if role_file.stem.endswith('.example'): + continue + + try: + with open(role_file, 'r', encoding='utf-8') as f: + data = json.load(f) + + # Check if this is the new array format + if "roles" in data: + # New format: {"roles": [...]} + roles_list = data.get("roles", []) + if not isinstance(roles_list, list): + lib_logger.warning( + f"[HiveMind] Role file '{role_file.name}' has 'roles' but it's not a list" + ) + continue + + for role in roles_list: + self._register_role(role, role_file.name) + else: + # Old format: {"name": "...", "system_prompt": "...", ...} + # Use filename as role_id + role_id = role_file.stem + self.role_templates[role_id] = data + lib_logger.debug(f"[HiveMind] Loaded role template '{role_id}'") + + except Exception as e: + lib_logger.error( + f"[HiveMind] Failed to load role template '{role_file.name}': {e}" + ) + + def _register_role(self, role: Dict[str, Any], source_file: str) -> None: + """Register a single role template.""" + # Use 'name' field as role_id, convert to lowercase with hyphens + role_name = role.get("name") + if not role_name: + lib_logger.warning( + f"[HiveMind] Role in '{source_file}' missing 'name' field" + ) + return + + # Convert name to role_id (e.g., "Security Expert" -> "security-expert") + role_id = role_name.lower().replace(" ", "-") + + # Check for duplicate IDs + if role_id in self.role_templates: + lib_logger.warning( + f"[HiveMind] Duplicate role ID '{role_id}'. " + f"Role from '{source_file}' will override previous." 
+ ) + + self.role_templates[role_id] = role + lib_logger.debug(f"[HiveMind] Loaded role template '{role_id}' from array") + + def get_preset_for_model(self, base_model: str) -> str: + """ + Get the preset ID to use for a model when using model[swarm] syntax. + + Resolution order: + 1. If model has an omit_id preset, use that + 2. Otherwise, use "default" + + Args: + base_model: Base model name (e.g., "gpt-4o-mini") + + Returns: + Preset ID to use + """ + if base_model in self.omit_id_presets: + preset = self.omit_id_presets[base_model] + lib_logger.debug(f"[HiveMind] Model '{base_model}' using omit_id preset '{preset}'") + return preset + + lib_logger.debug(f"[HiveMind] Model '{base_model}' using default preset") + return "default" + + def get_swarm_config(self, preset_id: str) -> Dict[str, Any]: + """ + Get swarm configuration for a specific preset. + + Args: + preset_id: Preset ID (e.g., "default", "aggressive") + + Returns: + Configuration dictionary with defaults applied + """ + # Try to load preset config file + config_file = self.swarms_dir / f"{preset_id}.json" + + if not config_file.exists(): + lib_logger.warning(f"[HiveMind] Swarm preset '{preset_id}' not found") + # Return default config if available + return copy.deepcopy(self.swarm_default) if self.swarm_default else {} + + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + # Validate it's a preset-based config + if "id" not in config or "base_models" not in config: + lib_logger.warning( + f"[HiveMind] Swarm config '{preset_id}' missing 'id' or 'base_models'" + ) + return copy.deepcopy(self.swarm_default) if self.swarm_default else {} + + return config + + except Exception as e: + lib_logger.error(f"[HiveMind] Failed to load swarm preset '{preset_id}': {e}") + return copy.deepcopy(self.swarm_default) if self.swarm_default else {} + + def get_fusion_config(self, fusion_id: str) -> Optional[Dict[str, Any]]: + """ + Get fusion configuration by ID. 
+ + Args: + fusion_id: Fusion identifier + + Returns: + Fusion configuration or None if not found + """ + return self.fusion_configs.get(fusion_id) + + def get_strategy(self, strategy_name: str) -> Optional[str]: + """ + Get strategy template by name. + + Args: + strategy_name: Strategy identifier + + Returns: + Strategy template string or None if not found + """ + return self.strategies.get(strategy_name) + + def get_role_template(self, role_id: str) -> Optional[Dict[str, Any]]: + """ + Get role template by ID. + + Args: + role_id: Role template identifier (e.g., "architect", "security-expert") + + Returns: + Role template dictionary or None if not found + """ + return self.role_templates.get(role_id) + + def get_all_fusion_ids(self) -> List[str]: + """Get list of all fusion IDs with [fusion] suffix.""" + return [f"{fusion_id}[fusion]" for fusion_id in self.fusion_configs.keys()] + + def get_all_swarm_model_ids(self) -> List[str]: + """ + Get all discoverable swarm model variants. + + Only includes presets with base_models defined. + Discovery format depends on omit_id: + - omit_id=true: Shows as {base_model}[swarm] (short form only) + - omit_id=false: Shows as {base_model}-{preset_id}[swarm] (explicit form only) + + Note: Explicit form always WORKS at runtime regardless of omit_id, + but omit_id controls what appears in /v1/models for discoverability. 
+ + Returns: + List of swarm model IDs for /v1/models endpoint + """ + swarm_models = [] + + for config_file in self.swarms_dir.glob("*.json"): + # Skip example files + if config_file.stem.endswith('.example'): + continue + + try: + with open(config_file, 'r', encoding='utf-8') as f: + config = json.load(f) + + preset_id = config.get("id") + base_models = config.get("base_models", []) + omit_id = config.get("omit_id", False) + + if not preset_id: + lib_logger.debug(f"Swarm config {config_file.name} missing 'id', skipping") + continue + + if not base_models: + lib_logger.debug(f"Swarm config {preset_id} has no base_models, not discoverable") + continue + + # Generate model IDs based on omit_id setting + for base_model in base_models: + if omit_id: + # Show short form only (to avoid clutter) + model_id = f"{base_model}[swarm]" + else: + # Show explicit form only + model_id = f"{base_model}-{preset_id}[swarm]" + + swarm_models.append(model_id) + + except Exception as e: + lib_logger.warning(f"Failed to process swarm config {config_file.name}: {e}") + + lib_logger.info(f"Discovered {len(swarm_models)} swarm model variants") + return swarm_models diff --git a/src/rotator_library/ensemble/manager.py b/src/rotator_library/ensemble/manager.py new file mode 100644 index 0000000..5823578 --- /dev/null +++ b/src/rotator_library/ensemble/manager.py @@ -0,0 +1,1438 @@ +""" +EnsembleManager - Core orchestration for HiveMind (Swarm/Fusion) feature. + +This module manages parallel model execution with intelligent arbitration. +""" + +import os +import logging +import asyncio +import random +import copy +import time +import re +from typing import Dict, List, Any, Optional, Set + +import litellm + +from .config_loader import ConfigLoader + +lib_logger = logging.getLogger("rotator_library.ensemble") + + +class EnsembleManager: + """ + Manages ensemble execution (Swarm and Fusion modes). 
+ + Responsibilities: + - Detect ensemble requests (swarm suffix or fusion ID) + - Load and manage configurations + - Handle naming conflicts + - Orchestrate parallel execution (implemented in later phases) + """ + + def __init__(self, rotating_client, config_dir: Optional[str] = None): + """ + Initialize the ensemble manager. + + Args: + rotating_client: Reference to RotatingClient for making API calls + config_dir: Path to ensemble_configs directory (relative to this file) + """ + self.rotating_client = rotating_client + + # Default config directory (relative to this file) + if config_dir is None: + config_dir = os.path.join( + os.path.dirname(__file__), + "..", + "ensemble_configs" + ) + + # Initialize config loader + self.config_loader = ConfigLoader(config_dir) + self.config_loader.load_all() + + # Cache for resolved ensemble names (for conflict resolution) + self._resolved_names: Dict[str, str] = {} + + # Cache for provider models (loaded from RotatingClient) + self._provider_models: Optional[Set[str]] = None + + # Initialize provider models + self._load_provider_models() + + lib_logger.info("[HiveMind] Ensemble Manager initialized") + + def is_ensemble(self, model_id: str) -> bool: + """ + Check if a model ID represents an ensemble request. 
+ + Args: + model_id: Full model ID from user request + + Returns: + True if this is an ensemble (swarm or fusion), False otherwise + """ + # BUGFIX: Check for conflict first (Provider Model Shadowing) + # If the model ID exists in provider models, it's NOT an ensemble request + # (unless we've already resolved it, but this check is for the raw request) + if self._provider_models is None: + self._load_provider_models() + + if model_id in self._provider_models: + return False + + # Check for fusion suffix + if model_id.endswith("[fusion]"): + return True + + # Check for swarm suffix + if self._is_swarm_request(model_id): + return True + + return False + + def _is_swarm_request(self, model_id: str) -> bool: + """ + Check if model ID contains swarm suffix. + + Supports new preset-based format: {base_model}-{preset_id}[swarm] + + Args: + model_id: Model ID to check + + Returns: + True if this is a swarm request + """ + return model_id.endswith("[swarm]") + + def get_base_model(self, swarm_id: str) -> tuple: + """ + Extract base model name and preset ID from swarm ID. 
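The suffix-parsing convention can be illustrated with a minimal standalone sketch. The `parse_swarm_id` helper and its `known_presets` set are hypothetical, for illustration only; the real implementation checks preset files on disk via `ConfigLoader` and honors `omit_id` presets:

```python
def parse_swarm_id(swarm_id: str, known_presets: set) -> tuple:
    """Split '{base_model}-{preset_id}[swarm]' into (base_model, preset_id)."""
    if swarm_id.endswith("[swarm]"):
        swarm_id = swarm_id[: -len("[swarm]")]
    if "-" in swarm_id:
        base, candidate = swarm_id.rsplit("-", 1)
        # Only treat the last segment as a preset if it is actually configured
        if candidate in known_presets:
            return base, candidate
    # No explicit preset: fall back to the default preset
    return swarm_id, "default"

print(parse_swarm_id("gpt-4o-mini-default[swarm]", {"default", "aggressive"}))
# → ('gpt-4o-mini', 'default')
```

Note that a base model containing hyphens (e.g. `gpt-4o-mini[swarm]`) parses correctly because the last segment is only consumed when it matches a known preset.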
+ + Supports formats: + - {base_model}-{preset_id}[swarm] → (base_model, preset_id) + - {base_model}[swarm] → (base_model, omit_id preset or "default") + + Args: + swarm_id: Swarm model ID (e.g., "gpt-4o-default[swarm]" or "gpt-4o[swarm]") + + Returns: + Tuple of (base_model_name, preset_id) + """ + # Remove [swarm] suffix first + if swarm_id.endswith("[swarm]"): + swarm_id = swarm_id[:-7] # Remove "[swarm]" + + # Parse: {base_model}-{preset_id} + # preset_id is the last segment after the last hyphen + if "-" in swarm_id: + # Split and check if last segment is a preset ID + parts = swarm_id.rsplit("-", 1) + potential_preset = parts[1] + + # Check if it's a valid preset ID in our configs + config_file = self.config_loader.swarms_dir / f"{potential_preset}.json" + if config_file.exists(): + # This is a preset ID, so base_model is everything before it + return parts[0], potential_preset + + # No explicit preset: use omit_id preset or default + base_model = swarm_id + preset_id = self.config_loader.get_preset_for_model(base_model) + + return base_model, preset_id + + def resolve_conflicts(self, ensemble_id: str) -> str: + """ + Resolve naming conflicts by appending numeric suffixes. + + If an ensemble ID conflicts with a real provider model, + append -1, -2, -3, etc. until unique. 
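In isolation, the numeric-suffix scheme amounts to the following sketch. The `resolve_name` helper is illustrative only; the real method additionally caches resolved names and bounds the search:

```python
def resolve_name(name: str, taken: set) -> str:
    """Append -1, -2, ... until the name no longer collides with a taken one."""
    if name not in taken:
        return name
    counter = 1
    while f"{name}-{counter}" in taken:
        counter += 1
    return f"{name}-{counter}"

print(resolve_name("gpt-4o[fusion]", {"gpt-4o[fusion]", "gpt-4o[fusion]-1"}))
# → gpt-4o[fusion]-2
```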
+
+        Args:
+            ensemble_id: Original ensemble ID (swarm or fusion)
+
+        Returns:
+            Resolved unique ensemble ID
+        """
+        # Check cache first
+        if ensemble_id in self._resolved_names:
+            return self._resolved_names[ensemble_id]
+
+        # Load provider models if not cached
+        if self._provider_models is None:
+            self._load_provider_models()
+
+        # Check for conflict
+        if ensemble_id not in self._provider_models:
+            # No conflict, use original
+            self._resolved_names[ensemble_id] = ensemble_id
+            return ensemble_id
+
+        # Conflict detected, search for an available suffix (bounded at 100 attempts)
+        counter = 1
+        while counter <= 100:
+            candidate = f"{ensemble_id}-{counter}"
+            if candidate not in self._provider_models:
+                lib_logger.warning(
+                    f"[HiveMind] Naming conflict detected. "
+                    f"Renamed '{ensemble_id}' to '{candidate}'"
+                )
+                self._resolved_names[ensemble_id] = candidate
+                return candidate
+            counter += 1
+
+        # Safety fallback (shouldn't happen in practice)
+        lib_logger.error(
+            f"[HiveMind] Could not resolve naming conflict for '{ensemble_id}' "
+            f"after 100 attempts"
+        )
+        return f"{ensemble_id}-{counter}"
+
+    def _load_provider_models(self) -> None:
+        """
+        Load all provider models from RotatingClient.
+
+        This is used for conflict detection.
+        """
+        try:
+            self._provider_models = set()
+
+            # BUGFIX: Populate provider models from RotatingClient.model_definitions
+            if hasattr(self.rotating_client, 'model_definitions'):
+                defs = self.rotating_client.model_definitions.definitions
+                for provider, models in defs.items():
+                    for model_name in models.keys():
+                        self._provider_models.add(model_name)
+                        self._provider_models.add(f"{provider}/{model_name}")
+
+            lib_logger.debug(f"[HiveMind] Loaded {len(self._provider_models)} provider models for conflict detection")
+
+        except Exception as e:
+            lib_logger.error(f"[HiveMind] Failed to load provider models: {e}")
+            self._provider_models = set()
+
+    def get_fusion_ids(self) -> List[str]:
+        """
+        Get list of all configured fusion IDs.
+ + Returns: + List of fusion identifiers + """ + return self.config_loader.get_all_fusion_ids() + + def _prepare_drones( + self, + config: Dict[str, Any], + base_model: str, + request_params: Dict[str, Any] + ) -> List[Dict[str, Any]]: + """ + Prepare drone configurations for parallel execution. + + Creates N identical copies of the request parameters with the base model. + Advanced features (jitter, adversarial) will be added in Phase 4. + + Args: + config: Swarm configuration + base_model: Base model to use for all drones + request_params: Original request parameters + + Returns: + List of drone configurations ready for parallel execution + """ + count = config.get("count", 3) + drones = [] + + # Get temperature jitter config + temp_jitter_config = config.get("temperature_jitter", {}) + jitter_enabled = temp_jitter_config.get("enabled", False) + jitter_delta = temp_jitter_config.get("delta", 0.2) + + # Get adversarial config + adversarial_config = config.get("adversarial_config", {}) + adversarial_enabled = adversarial_config.get("enabled", False) + adversarial_count = adversarial_config.get("count", 1) + adversarial_prompt = adversarial_config.get("prompt", "") + + lib_logger.debug(f"[HiveMind] Preparing {count} drones for base model '{base_model}'") + if adversarial_enabled: + lib_logger.debug(f"[HiveMind] Adversarial mode enabled: {adversarial_count} critical drones") + + for i in range(count): + # Clone the request params + # BUGFIX: Use deepcopy to avoid shared mutable state + drone_params = copy.deepcopy(request_params) + + # Override model with base model (strip [swarm] suffix) + drone_params["model"] = base_model + + # Phase 4: Determine if this drone should be adversarial + # Last N drones become adversarial + is_adversarial = False + if adversarial_enabled and adversarial_prompt: + adversarial_start_index = count - adversarial_count + if i >= adversarial_start_index: + is_adversarial = True + + # Inject adversarial system prompt + if "messages" in 
drone_params: + # Insert adversarial system message at the beginning + adversarial_message = { + "role": "system", + "content": adversarial_prompt + } + drone_params["messages"].insert(0, adversarial_message) + + lib_logger.debug( + f"[HiveMind] Drone {i+1}/{count}: ADVERSARIAL - injected critical analysis prompt" + ) + + # Phase 4: Apply temperature jitter if enabled + if jitter_enabled: + base_temp = drone_params.get("temperature", 1.0) + + # Apply random jitter + jitter = random.uniform(-jitter_delta, jitter_delta) + new_temp = base_temp + jitter + + # Clamp to valid range [0.0, 2.0] + new_temp = max(0.0, min(2.0, new_temp)) + + drone_params["temperature"] = new_temp + + lib_logger.debug( + f"[HiveMind] Drone {i+1}/{count}: Applied temperature jitter " + f"({base_temp:.2f} → {new_temp:.2f}, delta: {jitter:+.2f})" + ) + + # Store drone metadata for logging + drone_params["_drone_index"] = i + 1 + drone_params["_total_drones"] = count + drone_params["_is_adversarial"] = is_adversarial + + drones.append(drone_params) + + temp_display = drone_params.get("temperature", "default") + if isinstance(temp_display, float): + temp_display = f"{temp_display:.2f}" + + lib_logger.debug( + f"[HiveMind] Drone {i+1}/{count}: model={base_model}, temp={temp_display}" + ) + + return drones + + def _prepare_fusion_models( + self, + config: Dict[str, Any], + request_params: Dict[str, Any] + ) -> List[Dict[str, Any]]: + """ + Prepare specialist model configurations for fusion execution. + + Each specialist model gets a role-specific system prompt and + processes the same user query. 
+ + Args: + config: Fusion configuration + request_params: Original request parameters + + Returns: + List of specialist model configurations with metadata + """ + specialists = config.get("specialists", []) + models = [] + + lib_logger.debug(f"[HiveMind] Preparing {len(specialists)} specialist models for fusion") + + for i, specialist in enumerate(specialists): + specialist_num = i + 1 + + # Resolve role template if specified + if "role_template" in specialist: + template_id = specialist["role_template"] + template = self.config_loader.get_role_template(template_id) + + if template: + # Merge template with specialist config (specialist overrides template) + specialist = {**template, **specialist} + lib_logger.debug(f"[HiveMind] Resolved role template '{template_id}' for specialist {specialist_num}") + else: + lib_logger.warning(f"[HiveMind] Role template '{template_id}' not found for specialist {specialist_num}") + + specialist_model = specialist.get("model") + specialist_role = specialist.get("role", specialist.get("name", f"Specialist {specialist_num}")) + specialist_prompt = specialist.get("system_prompt", "") + specialist_weight = specialist.get("weight", 1.0) + # MISSING FEATURE FIX: Extract weight description for arbiter context + specialist_weight_desc = specialist.get("weight_description", "") + + if not specialist_model: + lib_logger.warning( + f"[HiveMind] Specialist {specialist_num} missing model, skipping" + ) + continue + + # Clone request params + # BUGFIX: Use deepcopy + model_params = copy.deepcopy(request_params) + + # Set specialist model + model_params["model"] = specialist_model + + # Inject role-specific system prompt if provided + if specialist_prompt and "messages" in model_params: + role_message = { + "role": "system", + "content": specialist_prompt + } + model_params["messages"].insert(0, role_message) + + # Store specialist metadata + model_params["_specialist_index"] = specialist_num + model_params["_specialist_role"] = specialist_role + 
model_params["_specialist_weight"] = specialist_weight + model_params["_specialist_weight_description"] = specialist_weight_desc + model_params["_total_specialists"] = len(specialists) + + models.append(model_params) + + lib_logger.debug( + f"[HiveMind] Specialist {specialist_num}/{len(specialists)}: " + f"role={specialist_role}, model={specialist_model}, weight={specialist_weight}" + ) + + return models + + async def _execute_parallel( + self, + drones: List[Dict[str, Any]], + request: Any + ) -> tuple: + """ + Execute all drone requests in parallel. + + Uses asyncio.gather to execute all drones concurrently. + Aggregates usage statistics from all successful responses. + + Args: + drones: List of drone configurations + request: Original request object + + Returns: + Tuple of (successful_responses, aggregated_usage) + """ + lib_logger.info(f"[HiveMind] Executing {len(drones)} drones in parallel...") + + # Create tasks for all drones + tasks = [] + for i, drone_params in enumerate(drones): + # Call acompletion directly (will use RotatingClient's retry logic) + # Remove metadata fields before calling + clean_params = {k: v for k, v in drone_params.items() if not k.startswith('_')} + + task = self.rotating_client._execute_with_retry( + litellm.acompletion, # Use litellm.acompletion directly + request=request, + **clean_params + ) + tasks.append(task) + + # Execute all drones in parallel + results = await asyncio.gather(*tasks, return_exceptions=True) + + # Process results + successful_responses = [] + failed_count = 0 + aggregated_usage = {} + + for i, result in enumerate(results): + drone_index = i + 1 + + if isinstance(result, Exception): + # Drone failed + failed_count += 1 + lib_logger.error( + f"[HiveMind] Drone {drone_index}/{len(drones)} failed: {result}" + ) + continue + + # Drone succeeded + successful_responses.append(result) + + # Aggregate usage - dynamically sum ALL numeric usage fields + if hasattr(result, 'usage') and result.usage: + usage = 
result.usage
+
+                # Iterate through all attributes of the usage object
+                for attr_name in dir(usage):
+                    # Skip private/magic attributes
+                    if attr_name.startswith('_'):
+                        continue
+
+                    try:
+                        attr_value = getattr(usage, attr_name)
+
+                        # Only aggregate numeric fields (int or float)
+                        if isinstance(attr_value, (int, float)) and not isinstance(attr_value, bool):
+                            if attr_name not in aggregated_usage:
+                                aggregated_usage[attr_name] = 0
+                            aggregated_usage[attr_name] += attr_value
+                    except (AttributeError, TypeError):
+                        # Skip non-accessible or non-numeric attributes
+                        continue
+
+            lib_logger.debug(
+                f"[HiveMind] Drone {drone_index}/{len(drones)} completed successfully"
+            )
+
+        # Check if we have at least one successful response
+        if not successful_responses:
+            raise RuntimeError(
+                f"[HiveMind] All {len(drones)} drones failed. Cannot proceed with arbitration."
+            )
+
+        if failed_count > 0:
+            lib_logger.warning(
+                f"[HiveMind] {failed_count}/{len(drones)} drones failed. "
+                f"Proceeding with {len(successful_responses)} successful responses."
+            )
+
+        lib_logger.info(
+            f"[HiveMind] Parallel execution complete: {len(successful_responses)}/{len(drones)} succeeded. "
+            f"Total tokens: {aggregated_usage.get('total_tokens', 0)}"
+        )
+
+        return successful_responses, aggregated_usage
+
+    def _format_for_arbiter(
+        self,
+        responses: List[Any],
+        config: Dict[str, Any],
+        specialist_metadata: Optional[List[Dict[str, Any]]] = None
+    ) -> tuple:
+        """
+        Format drone/specialist responses for arbiter consumption.
+
+        Creates a structured text format with numbered responses.
+        Phase 4: Implements Blind Switch to strip model names.
+        Phase 5: Adds role labels for fusion specialists.
+        MISSING FEATURE FIX: Extracts specialist metadata for arbiter context.
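The dynamic usage aggregation in `_execute_parallel` can be exercised on its own. The `FakeUsage` class below is a test double standing in for a provider usage object, not a real litellm type:

```python
class FakeUsage:
    """Stand-in for a provider usage object with arbitrary numeric fields."""
    def __init__(self, **fields):
        for name, value in fields.items():
            setattr(self, name, value)

def aggregate_usage(usages) -> dict:
    """Sum every public numeric attribute across a list of usage objects."""
    totals = {}
    for usage in usages:
        for attr_name in dir(usage):
            if attr_name.startswith('_'):
                continue
            attr_value = getattr(usage, attr_name)
            # Exclude bools: isinstance(True, int) is True in Python
            if isinstance(attr_value, (int, float)) and not isinstance(attr_value, bool):
                totals[attr_name] = totals.get(attr_name, 0) + attr_value
    return totals

print(aggregate_usage([
    FakeUsage(prompt_tokens=10, completion_tokens=5, total_tokens=15),
    FakeUsage(prompt_tokens=8, completion_tokens=4, total_tokens=12, reasoning_tokens=3),
]))
# → {'completion_tokens': 9, 'prompt_tokens': 18, 'total_tokens': 27, 'reasoning_tokens': 3}
```

Because fields are discovered with `dir()` rather than a fixed list, provider-specific counters (e.g. a `reasoning_tokens` field) are picked up automatically when only some responses report them.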
+ + Args: + responses: List of successful drone/specialist responses + config: Swarm or fusion configuration + specialist_metadata: Optional list of specialist metadata (for fusion mode) + + Returns: + Tuple of (formatted_text, metadata_for_arbiter) + metadata_for_arbiter is None for swarm mode, list of dicts for fusion mode + """ + lib_logger.debug(f"[HiveMind] Formatting {len(responses)} responses for arbiter") + + # Check if blind mode is enabled + arbiter_config = config.get("arbiter", {}) + blind_mode = arbiter_config.get("blind", True) # Default ON + + formatted_parts = [] + arbiter_metadata = [] # MISSING FEATURE FIX: Collect metadata for arbiter + + for i, response in enumerate(responses): + response_num = i + 1 + + # Extract content from response + content = "" + if hasattr(response, 'choices') and response.choices: + # Standard OpenAI-style response + choice = response.choices[0] + if hasattr(choice, 'message') and hasattr(choice.message, 'content'): + content = choice.message.content + elif hasattr(choice, 'text'): + content = choice.text + + if not content: + lib_logger.warning( + f"[HiveMind] Response {response_num} has no content, skipping" + ) + continue + + # Phase 5: Determine label (with fusion role support) + label = f"Response {response_num}" + + # Check if this is fusion mode with specialist metadata + if specialist_metadata and i < len(specialist_metadata): + specialist = specialist_metadata[i] + role = specialist.get("_specialist_role", "Unknown") + model_name = specialist.get("model", "unknown") + weight_desc = specialist.get("_specialist_weight_description", "") + + # MISSING FEATURE FIX: Build metadata for arbiter context + arbiter_metadata.append({ + "role": role, + "model": model_name, + "weight_description": weight_desc + }) + + if blind_mode: + # Blind mode: show role but not model + label = f"{role}" + else: + # Non-blind: show role and model + label = f"{role} ({model_name})" + + lib_logger.debug( + f"[HiveMind] Fusion specialist 
{response_num}: role={role}, blind={blind_mode}" + ) + else: + # Swarm mode fallback + if blind_mode: + label = f"Response {response_num}" + else: + model_name = "unknown" + if hasattr(response, 'model'): + model_name = response.model + label = f"Response {response_num} (Model: {model_name})" + + # Format: "Label:\n\n" + formatted_parts.append(f"{label}:\n{content}\n") + + # Join all responses + formatted_text = "\n".join(formatted_parts) + + lib_logger.debug( + f"[HiveMind] Formatted {len(formatted_parts)} responses " + f"({len(formatted_text)} characters total, blind_mode={blind_mode})" + ) + + # Return metadata only if fusion mode + metadata_for_arbiter = arbiter_metadata if arbiter_metadata else None + + return formatted_text, metadata_for_arbiter + + def _build_arbiter_prompt( + self, + formatted_responses: str, + config: Dict[str, Any], + original_messages: List[Dict[str, str]], + specialist_metadata: Optional[List[Dict[str, Any]]] = None + ) -> List[Dict[str, str]]: + """ + Build the complete prompt for the arbiter model. + + Loads the strategy template and constructs the message array. + Phase 6: Adds recursive mode instructions for autonomous decision-making. + MISSING FEATURE FIX: Adds specialist expertise context with weights for fusion mode. 
+ + Args: + formatted_responses: Formatted drone/specialist responses + config: Swarm or fusion configuration + original_messages: Original user messages + specialist_metadata: Optional metadata about specialists (for fusion mode) + + Returns: + Complete messages array for arbiter + """ + lib_logger.debug("[HiveMind] Building arbiter prompt") + + # Get strategy template + arbiter_config = config.get("arbiter", {}) + strategy_name = arbiter_config.get("strategy", "synthesis") + + strategy_template = self.config_loader.get_strategy(strategy_name) + + if not strategy_template: + lib_logger.warning( + f"[HiveMind] Strategy '{strategy_name}' not found, using default" + ) + strategy_template = "Synthesize the following responses into a single, high-quality answer:\n{responses}" + + # Replace {responses} placeholder + strategy_prompt = strategy_template.replace("{responses}", formatted_responses) + + # MISSING FEATURE FIX: Add specialist expertise context for fusion mode + if specialist_metadata: + expertise_lines = ["\n\nSPECIALIST EXPERTISE:"] + expertise_lines.append("You are synthesizing responses from specialists with the following expertise:\n") + + for spec in specialist_metadata: + role = spec.get('role', 'Unknown') + model = spec.get('model', 'Unknown') + weight_desc = spec.get('weight_description', '') + + if weight_desc: + expertise_lines.append(f"- {role} ({model}): {weight_desc}") + else: + expertise_lines.append(f"- {role} ({model}): Subject matter expert") + + expertise_lines.append("\nConsider each specialist's domain expertise when synthesizing your response.") + strategy_prompt += "\n".join(expertise_lines) + + lib_logger.debug(f"[HiveMind] Added specialist expertise context for {len(specialist_metadata)} specialists") + + # Phase 6: Add recursive mode instructions if enabled + recursive_config = config.get("recursive_mode", {}) + if recursive_config.get("enabled", False): + consensus_threshold = recursive_config.get("consensus_threshold", 7) + + 
recursive_instructions = f"""
+
+AUTONOMOUS DECISION PROTOCOL:
+You have autonomous decision-making authority. Follow this protocol:
+
+1. ASSESSMENT PHASE:
+   - Analyze the provided responses
+   - Rate consensus level (1-10 scale)
+   - Output: [CONSENSUS: X/10]
+
+2. DECISION PHASE:
+   If consensus >= {consensus_threshold}/10:
+   - Proceed directly to synthesis
+
+   If consensus < {consensus_threshold}/10:
+   - Identify specific conflict points
+   - Output: [CONFLICTS: <list of specific conflict points>]
+   - For each response, reason internally about how it addresses the conflicts
+   - Output: [CRITIQUE: <your reasoning>]
+
+3. SYNTHESIS PHASE:
+   - Create final answer incorporating all insights
+   - Output: [FINAL SYNTHESIS:]
+   - Provide your complete response after this marker
+
+IMPORTANT: Wrap all internal reasoning (CONSENSUS, CONFLICTS, CRITIQUE) in [INTERNAL] tags.
+Only the content after [FINAL SYNTHESIS:] will be shown to the user.
+
+Example format:
+[INTERNAL]
+[CONSENSUS: 5/10]
+[CONFLICTS: Response 1 suggests X, Response 2 suggests Y]
+[CRITIQUE: Analyzing the conflict...]
+[/INTERNAL]
+[FINAL SYNTHESIS:]
+<your complete answer>
+
+"""
+            strategy_prompt += recursive_instructions
+            lib_logger.info(
+                f"[HiveMind] Recursive mode enabled (consensus threshold: {consensus_threshold}/10)"
+            )
+
+        # Build messages array
+        messages = [
+            {
+                "role": "system",
+                "content": strategy_prompt
+            }
+        ]
+
+        # Add original user query
+        if original_messages:
+            # Find the last user message
+            for msg in reversed(original_messages):
+                if msg.get("role") == "user":
+                    messages.append({
+                        "role": "user",
+                        "content": msg.get("content", "")
+                    })
+                    break
+
+        lib_logger.debug(f"[HiveMind] Arbiter prompt built: {len(messages)} messages")
+
+        return messages
+
+    async def _call_arbiter(
+        self,
+        messages: List[Dict[str, str]],
+        config: Dict[str, Any],
+        request: Any
+    ) -> tuple:
+        """
+        Call the arbiter model to synthesize responses.
+
+        Non-streaming version for Phase 2.
+        Streaming support will be added in Phase 3.
+
+        Args:
+            messages: Constructed arbiter messages
+            config: Swarm or fusion configuration
+            request: Original request object
+
+        Returns:
+            Tuple of (arbiter_response, arbiter_usage)
+        """
+        # Get arbiter model
+        arbiter_config = config.get("arbiter", {})
+        arbiter_model = arbiter_config.get("model", "self")
+
+        # If "self", we need to determine which model to use
+        # For swarm, this will be handled by caller
+        # For now, just use as-is
+
+        lib_logger.info(f"[HiveMind] Calling arbiter model: {arbiter_model}")
+
+        # Build params for arbiter call
+        arbiter_params = {
+            "model": arbiter_model,
+            "messages": messages,
+            "stream": False  # Non-streaming for Phase 2
+        }
+
+        # Call arbiter through RotatingClient
+        # Use _execute_with_retry for consistency
+        arbiter_response = await self.rotating_client._execute_with_retry(
+            litellm.acompletion,
+            request=request,
+            **arbiter_params
+        )
+
+        # Extract usage - dynamically capture ALL numeric usage fields
+        arbiter_usage = {}
+
+        if hasattr(arbiter_response, 'usage') and arbiter_response.usage:
+            usage = arbiter_response.usage
+
+            # Iterate through all attributes of the usage object
+            for attr_name in dir(usage):
+                # Skip private/magic attributes
+                if attr_name.startswith('_'):
+                    continue
+
+                try:
+                    attr_value = getattr(usage, attr_name)
+
+                    # Only capture numeric fields (int or float)
+                    if isinstance(attr_value, (int, float)) and not isinstance(attr_value, bool):
+                        arbiter_usage[attr_name] = attr_value
+                except (AttributeError, TypeError):
+                    # Skip non-accessible or non-numeric attributes
+                    continue
+
+        lib_logger.info(
+            f"[HiveMind] Arbiter completed. Tokens: {arbiter_usage.get('total_tokens', 0)}"
+        )
+
+        return arbiter_response, arbiter_usage
+
+    async def _call_arbiter_streaming(
+        self,
+        messages: List[Dict[str, str]],
+        config: Dict[str, Any],
+        request: Any
+    ):
+        """
+        Call the arbiter model with streaming enabled.
+
+        Yields arbiter response chunks while tracking usage.
+ Phase 6: Filters [INTERNAL] markers for recursive mode. + + Args: + messages: Constructed arbiter messages + config: Swarm or fusion configuration + request: Original request object + + Yields: + Response chunks from arbiter + Final yield includes usage metadata + """ + # Get arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + + lib_logger.info(f"[HiveMind] Calling arbiter model (streaming): {arbiter_model}") + + # Build params for arbiter call + arbiter_params = { + "model": arbiter_model, + "messages": messages, + "stream": True # Enable streaming + } + # Call arbiter through RotatingClient's streaming method + stream_generator = self.rotating_client._streaming_acompletion_with_retry( + request=request, + **arbiter_params + ) + + # Track usage from stream + arbiter_usage = { + 'prompt_tokens': 0, + 'completion_tokens': 0, + 'total_tokens': 0 + } + + # Phase 6: Track recursive mode state + recursive_enabled = config.get("recursive_mode", {}).get("enabled", False) + in_internal_block = False + internal_buffer = [] + + # Stream chunks and collect usage + async for chunk in stream_generator: + # Check if this chunk has usage info (typically the last chunk) + if hasattr(chunk, 'usage') and chunk.usage: + usage = chunk.usage + arbiter_usage['prompt_tokens'] = getattr(usage, 'prompt_tokens', 0) + arbiter_usage['completion_tokens'] = getattr(usage, 'completion_tokens', 0) + arbiter_usage['total_tokens'] = getattr(usage, 'total_tokens', 0) + + # Include other fields + for field in ['cached_tokens', 'reasoning_tokens']: + if hasattr(usage, field): + arbiter_usage[field] = getattr(usage, field, 0) + + # BUGFIX: Robust handling of [INTERNAL] markers to prevent data loss + if recursive_enabled and hasattr(chunk, 'choices') and chunk.choices: + delta = chunk.choices[0].delta if hasattr(chunk.choices[0], 'delta') else None + if delta and hasattr(delta, 'content') and delta.content: + content = delta.content + + # 
Handle [INTERNAL] start + if '[INTERNAL]' in content: + parts = content.split('[INTERNAL]') + before_internal = parts[0] + + # Yield content before marker + if before_internal: + chunk.choices[0].delta.content = before_internal + yield chunk + + in_internal_block = True + + # Handle content after marker (start of internal) + if len(parts) > 1: + remaining = parts[1] + # Check if it also ends in this chunk + if '[/INTERNAL]' in remaining: + internal_parts = remaining.split('[/INTERNAL]') + internal_buffer.append(internal_parts[0]) + + # Process buffer + full_internal = ''.join(internal_buffer) + self._log_recursive_markers(full_internal, config) + internal_buffer = [] + in_internal_block = False + + # Yield content after [/INTERNAL] + after_internal = internal_parts[1] + if after_internal: + chunk.choices[0].delta.content = after_internal + yield chunk + else: + internal_buffer.append(remaining) + + continue # Done with this chunk + + # Handle [/INTERNAL] end (if we are in block) + if in_internal_block and '[/INTERNAL]' in content: + parts = content.split('[/INTERNAL]') + internal_buffer.append(parts[0]) + + # Process buffer + full_internal = ''.join(internal_buffer) + self._log_recursive_markers(full_internal, config) + internal_buffer = [] + in_internal_block = False + + # Yield content after marker + after_internal = parts[1] + if after_internal: + chunk.choices[0].delta.content = after_internal + yield chunk + continue + + # If inside internal block, buffer it + if in_internal_block: + internal_buffer.append(content) + continue + + # Yield the chunk to caller (normal flow or filtered) + yield chunk + + lib_logger.info( + f"[HiveMind] Arbiter streaming completed. 
Tokens: {arbiter_usage['total_tokens']}" + ) + + # Return usage as final metadata + # Caller will handle usage aggregation + yield {"_hivemind_usage": arbiter_usage} + + def _log_recursive_markers(self, internal_content: str, config: Dict[str, Any]): + """ + Parse and log recursive mode markers from internal reasoning. + + Phase 6: Extracts consensus scores, conflicts, and critique reasoning. + + Args: + internal_content: Content between [INTERNAL] tags + config: Configuration with recursive threshold + """ + + # Extract consensus score + consensus_match = re.search(r'\[CONSENSUS:\s*(\d+)/10\]', internal_content) + if consensus_match: + consensus_score = int(consensus_match.group(1)) + threshold = config.get("recursive_mode", {}).get("consensus_threshold", 7) + + if consensus_score < threshold: + lib_logger.warning( + f"[HiveMind] Recursive mode: Consensus {consensus_score}/10 " + f"(below threshold {threshold}/10) - arbiter performing critique" + ) + else: + lib_logger.info( + f"[HiveMind] Recursive mode: Consensus {consensus_score}/10 " + f"(>= threshold {threshold}/10) - proceeding to synthesis" + ) + + # Extract conflicts if present + conflicts_match = re.search(r'\[CONFLICTS:\s*([^\]]+)\]', internal_content) + if conflicts_match: + conflicts = conflicts_match.group(1).strip() + lib_logger.info(f"[HiveMind] Conflicts identified: {conflicts}") + + # Log that critique is happening + if '[CRITIQUE:' in internal_content: + lib_logger.debug("[HiveMind] Arbiter performing internal critique reasoning") + + + async def _handle_swarm_streaming( + self, + config: Dict[str, Any], + base_model: str, + request: Any, + **kwargs + ): + """ + Handle streaming swarm request. + + Executes drones in parallel, then streams arbiter response. + Aggregates usage and injects into stream. 
+ + Args: + config: Swarm configuration + base_model: Base model name + request: Original request object + **kwargs: Request parameters + + Yields: + Arbiter response chunks with aggregated usage + """ + # Steps 1-4: Same as non-streaming (collect drone responses) + drones = self._prepare_drones(config, base_model, kwargs) + drone_responses, drone_usage = await self._execute_parallel(drones, request) + formatted_responses = self._format_for_arbiter(drone_responses, config) + + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Handle "self" arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + if arbiter_model == "self": + arbiter_model = base_model + lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") + + # BUGFIX: Use deepcopy for config + config_copy = copy.deepcopy(config) + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Call arbiter in streaming mode + arbiter_usage = {} + async for chunk in self._call_arbiter_streaming(arbiter_messages, config_copy, request): + # Check for usage metadata + if isinstance(chunk, dict) and "_hivemind_usage" in chunk: + arbiter_usage = chunk["_hivemind_usage"] + continue # Don't yield metadata chunk + + # For SSE chunks, check if this is the final chunk with usage + # and update with aggregated usage + if hasattr(chunk, 'usage') and chunk.usage: + # This is the final chunk - aggregate total usage + total_usage = { + 'prompt_tokens': drone_usage['prompt_tokens'] + arbiter_usage.get('prompt_tokens', 0), + 'completion_tokens': drone_usage['completion_tokens'] + arbiter_usage.get('completion_tokens', 0), + 'total_tokens': drone_usage['total_tokens'] + arbiter_usage.get('total_tokens', 0) + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in drone_usage or field in 
arbiter_usage: + total_usage[field] = drone_usage.get(field, 0) + arbiter_usage.get(field, 0) + + # Update chunk usage with aggregated values + chunk.usage.prompt_tokens = total_usage['prompt_tokens'] + chunk.usage.completion_tokens = total_usage['completion_tokens'] + chunk.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(chunk.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Streaming swarm completed. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage.get('total_tokens', 0)})" + ) + + yield chunk + + async def _handle_fusion_streaming( + self, + config: Dict[str, Any], + request: Any, + **kwargs + ): + """ + Handle streaming fusion request. + + Executes specialists in parallel, then streams arbiter response. + Aggregates usage and injects into stream. + + Args: + config: Fusion configuration + request: Original request object + **kwargs: Request parameters + + Yields: + Arbiter response chunks with aggregated usage + """ + # Prepare specialist models + specialist_models = self._prepare_fusion_models(config, kwargs) + + if not specialist_models: + raise ValueError("[HiveMind] No valid specialists found for fusion") + + # Execute specialists in parallel + specialist_responses, specialist_usage = await self._execute_parallel( + specialist_models, request + ) + + # Format responses with role labels and extract metadata + formatted_responses, specialist_metadata_for_arbiter = self._format_for_arbiter( + specialist_responses, + config, + specialist_metadata=specialist_models + ) + + # Build arbiter prompt with specialist expertise context + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages, + specialist_metadata=specialist_metadata_for_arbiter # MISSING FEATURE FIX: Pass metadata + ) + 
+ # Get arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "gpt-4o") + + lib_logger.debug(f"[HiveMind] Using arbiter model: {arbiter_model}") + + # Update config + # BUGFIX: Use deepcopy + config_copy = copy.deepcopy(config) + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Stream arbiter + arbiter_usage = {} + async for chunk in self._call_arbiter_streaming(arbiter_messages, config_copy, request): + if isinstance(chunk, dict) and "_hivemind_usage" in chunk: + arbiter_usage = chunk["_hivemind_usage"] + continue + + if hasattr(chunk, 'usage') and chunk.usage: + # Final chunk - aggregate usage + total_usage = { + 'prompt_tokens': specialist_usage['prompt_tokens'] + arbiter_usage.get('prompt_tokens', 0), + 'completion_tokens': specialist_usage['completion_tokens'] + arbiter_usage.get('completion_tokens', 0), + 'total_tokens': specialist_usage['total_tokens'] + arbiter_usage.get('total_tokens', 0) + } + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in specialist_usage or field in arbiter_usage: + total_usage[field] = specialist_usage.get(field, 0) + arbiter_usage.get(field, 0) + + chunk.usage.prompt_tokens = total_usage['prompt_tokens'] + chunk.usage.completion_tokens = total_usage['completion_tokens'] + chunk.usage.total_tokens = total_usage['total_tokens'] + + for field in ['cached_tokens', 'reasoning_tokens']: + if field in total_usage: + setattr(chunk.usage, field, total_usage[field]) + + lib_logger.info( + f"[HiveMind] Fusion streaming completed. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage.get('total_tokens', 0)})" + ) + + yield chunk + + async def handle_request(self, request, **kwargs): + """ + Handle an ensemble request (swarm or fusion). + + This is the main entry point for ensemble execution. 
+ + Args: + request: Original request object + **kwargs: Request parameters + + Returns: + Response from arbiter (streaming or complete) + """ + model_id = kwargs.get("model") + + if not model_id: + raise ValueError("Model ID is required") + + # Resolve conflicts + resolved_id = self.resolve_conflicts(model_id) + + # Determine type + if resolved_id in self.config_loader.fusion_configs: + config = self.config_loader.get_fusion_config(resolved_id) + specialists = config.get("specialists", []) + is_streaming = kwargs.get("stream", False) + + # Phase 6: Track execution start time + start_time = time.time() + + lib_logger.info( + f"[HiveMind] Processing Fusion request: {resolved_id} " + f"({len(specialists)} specialists, streaming: {is_streaming})" + ) + + # Route based on streaming mode + if is_streaming: + # Streaming fusion + return self._handle_fusion_streaming( + config=config, + request=request, + **kwargs + ) + + # Non-streaming fusion + specialist_models = self._prepare_fusion_models(config, kwargs) + + if not specialist_models: + raise ValueError(f"[HiveMind] No valid specialists found for fusion '{resolved_id}'") + + specialist_responses, specialist_usage = await self._execute_parallel( + specialist_models, request + ) + + # Format responses and extract metadata for arbiter + formatted_responses, specialist_metadata_for_arbiter = self._format_for_arbiter( + specialist_responses, + config, + specialist_metadata=specialist_models + ) + + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages, + specialist_metadata=specialist_metadata_for_arbiter # MISSING FEATURE FIX: Pass metadata + ) + + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "gpt-4o") + + # BUGFIX: Use deepcopy + config_copy = copy.deepcopy(config) + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + arbiter_response, 
arbiter_usage = await self._call_arbiter( + arbiter_messages, + config_copy, + request + ) + + # Aggregate usage - dynamically sum ALL numeric fields from both sources + total_usage = {} + + # Helper function to merge usage dictionaries + for usage_dict in [specialist_usage, arbiter_usage]: + for field, value in usage_dict.items(): + if field not in total_usage: + total_usage[field] = 0 + total_usage[field] += value + + # Phase 6: Calculate latency and cost + end_time = time.time() + latency_ms = (end_time - start_time) * 1000 + + # Try to calculate cost using litellm + total_cost = 0.0 + try: + total_cost = litellm.completion_cost(completion_response=arbiter_response) + except Exception as e: + lib_logger.debug(f"[HiveMind] Could not calculate cost: {e}") + + # Add hivemind_details to usage + hivemind_details = { + "mode": "fusion", + "specialist_count": len(specialists), + "specialist_tokens": specialist_usage['total_tokens'], + "arbiter_tokens": arbiter_usage['total_tokens'], + "total_cost_usd": round(total_cost, 6), + "latency_ms": round(latency_ms, 2) + } + + + if hasattr(arbiter_response, 'usage'): + # IMPORTANT: Standard usage fields contain the TOTAL aggregated usage + # (specialists + arbiter). This ensures consumers can parse usage normally. + + # Dynamically set ALL usage fields from total_usage + for field, value in total_usage.items(): + try: + setattr(arbiter_response.usage, field, value) + except (AttributeError, TypeError): + # Skip if field cannot be set + lib_logger.debug(f"[HiveMind] Could not set usage field '{field}'") + + # Add hivemind_details as SUPPLEMENTARY breakdown information + # This does NOT replace standard fields, but provides additional context + arbiter_response.usage.hivemind_details = hivemind_details + + lib_logger.info( + f"[HiveMind] Fusion completed successfully. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Specialists: {specialist_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']}). 
" + f"Latency: {latency_ms:.2f}ms, Cost: ${total_cost:.6f}" + ) + + return arbiter_response + + elif self._is_swarm_request(resolved_id): + base_model, preset_id = self.get_base_model(resolved_id) + config = self.config_loader.get_swarm_config(preset_id) + count = config.get("count", 3) + is_streaming = kwargs.get("stream", False) + + # Phase 6: Track execution start time + start_time = time.time() + + lib_logger.info( + f"[HiveMind] Processing Swarm request: {resolved_id} " + f"(base: {base_model}, preset: {preset_id}, {count} drones, streaming: {is_streaming})" + ) + + # Phase 3B: Route based on streaming mode + if is_streaming: + # Streaming mode - return async generator + return self._handle_swarm_streaming( + config=config, + base_model=base_model, + request=request, + **kwargs + ) + else: + # Non-streaming mode - return complete response + # Step 1: Prepare drones + drones = self._prepare_drones(config, base_model, kwargs) + + # Step 2: Execute drones in parallel + drone_responses, drone_usage = await self._execute_parallel(drones, request) + + # Step 3: Format responses for arbiter + formatted_responses = self._format_for_arbiter(drone_responses, config) + + # Step 4: Build arbiter prompt + original_messages = kwargs.get("messages", []) + arbiter_messages = self._build_arbiter_prompt( + formatted_responses, + config, + original_messages + ) + + # Step 5: Handle "self" arbiter model + arbiter_config = config.get("arbiter", {}) + arbiter_model = arbiter_config.get("model", "self") + if arbiter_model == "self": + arbiter_model = base_model + lib_logger.debug(f"[HiveMind] Using self-arbiter: {arbiter_model}") + + # Update config with resolved arbiter model + # BUGFIX: Use deepcopy + config_copy = copy.deepcopy(config) + config_copy["arbiter"] = arbiter_config.copy() + config_copy["arbiter"]["model"] = arbiter_model + + # Step 6: Call arbiter + arbiter_response, arbiter_usage = await self._call_arbiter( + arbiter_messages, + config_copy, + request + ) + + # Step 
7: Aggregate total usage - dynamically sum ALL numeric fields from both sources + total_usage = {} + + # Helper function to merge usage dictionaries + for usage_dict in [drone_usage, arbiter_usage]: + for field, value in usage_dict.items(): + if field not in total_usage: + total_usage[field] = 0 + total_usage[field] += value + + # Phase 6: Calculate latency and cost + end_time = time.time() + latency_ms = (end_time - start_time) * 1000 + + # Try to calculate cost using litellm + total_cost = 0.0 + try: + total_cost = litellm.completion_cost(completion_response=arbiter_response) + except Exception as e: + lib_logger.debug(f"[HiveMind] Could not calculate cost: {e}") + + # Add hivemind_details to usage + hivemind_details = { + "mode": "swarm", + "drone_count": count, + "drone_tokens": drone_usage['total_tokens'], + "arbiter_tokens": arbiter_usage['total_tokens'], + "total_cost_usd": round(total_cost, 6), + "latency_ms": round(latency_ms, 2) + } + + # Step 8: Update arbiter response with aggregated usage + if hasattr(arbiter_response, 'usage'): + # IMPORTANT: Standard usage fields contain the TOTAL aggregated usage + # (drones + arbiter). This ensures consumers can parse usage normally. + + # Dynamically set ALL usage fields from total_usage + for field, value in total_usage.items(): + try: + setattr(arbiter_response.usage, field, value) + except (AttributeError, TypeError): + # Skip if field cannot be set + lib_logger.debug(f"[HiveMind] Could not set usage field '{field}'") + + # Add hivemind_details as SUPPLEMENTARY breakdown information + # This does NOT replace standard fields, but provides additional context + arbiter_response.usage.hivemind_details = hivemind_details + + lib_logger.info( + f"[HiveMind] Swarm completed successfully. " + f"Total usage: {total_usage['total_tokens']} tokens " + f"(Drones: {drone_usage['total_tokens']}, Arbiter: {arbiter_usage['total_tokens']}). 
" + f"Latency: {latency_ms:.2f}ms, Cost: ${total_cost:.6f}" + ) + + return arbiter_response + + else: + raise ValueError(f"Unknown ensemble type for model: {model_id}") diff --git a/src/rotator_library/ensemble_configs/README.md b/src/rotator_library/ensemble_configs/README.md new file mode 100644 index 0000000..6a90d52 --- /dev/null +++ b/src/rotator_library/ensemble_configs/README.md @@ -0,0 +1,292 @@ +# HiveMind Ensemble Configuration Guide + +This directory contains the configuration for HiveMind Ensemble (Swarm/Fusion) feature. + +## Directory Structure + +``` +ensemble_configs/ +├── swarms/ # Swarm preset configurations +│ ├── default.json # Default global settings (fallback) +│ └── *.json # Preset configurations (e.g., aggressive.json, balanced.json) +├── fusions/ # Fusion configurations (multi-model teams) +│ └── *.json # Individual fusion definitions or arrays of fusions +├── strategies/ # Arbitration strategy templates +│ └── *.txt # Strategy prompt templates with {responses} placeholder +└── roles/ # Reusable role template definitions + └── *.json # Role templates for fusion specialists +``` + +## Configuration Files + +### Swarm Configuration (Preset-Based) + +HiveMind uses a **preset-based system** for swarm configurations. Each preset defines a configuration that can be applied to multiple base models. 
+ +**Format Options**: +- Explicit: `{base_model}-{preset_id}[swarm]` +- Short (if `omit_id: true`): `{base_model}[swarm]` + +**Example**: +- `gpt-4o-mini-aggressive[swarm]` - explicitly uses the `aggressive.json` preset +- `gpt-4o-mini[swarm]` - uses `default.json` preset OR a custom preset with `omit_id: true` +- `gpt-4o-mini-default[swarm]` - always uses `default.json` even if omit_id preset exists + +**Preset File Structure** (`swarms/{preset_id}.json`): +```json +{ + "id": "aggressive", + "description": "High diversity swarm with adversarial critique", + "base_models": ["gpt-4o-mini", "gemini-1.5-flash", "claude-3-haiku"], + "count": 5, + "temperature_jitter": { + "enabled": true, + "delta": 0.3 + }, + "adversarial_config": { + "enabled": true, + "count": 2, + "prompt": "You are a critical reviewer. Find flaws and edge cases." + }, + "arbiter": { + "model": "self", + "strategy": "synthesis", + "blind": true + }, + "recursive_mode": { + "enabled": true, + "consensus_threshold": 6 + } +} +``` + +**Key Fields**: +- `id`: Preset identifier (must match filename) +- `base_models`: List of models this preset applies to (enables discovery) +- `omit_id` (optional): If `true`, this preset becomes the default for its `base_models` when using `{model}[swarm]` syntax +- `count`: Number of drones to spawn +- `temperature_jitter`: Randomize temperature for diversity +- `adversarial_config`: Enable critical analysis drones +- `arbiter`: Synthesis configuration +- `recursive_mode`: Autonomous low-consensus handling + +**Omit ID Feature**: When a preset has `"omit_id": true`, it becomes the default for its specified models: +- `gpt-4o-mini[swarm]` → uses the `omit_id` preset instead of `default.json` +- `gpt-4o-mini-default[swarm]` → always uses `default.json` (explicit fallback) +- `gpt-4o-mini-aggressive[swarm]` → always uses `aggressive.json` (explicit) + +**Important**: `omit_id` controls ONLY what appears in `/v1/models` for discoverability, not what works at runtime: +- 
Explicit format (`model-preset[swarm]`) always works regardless of `omit_id` or `base_models` +- You can use ANY model with ANY preset explicitly (e.g., `claude-3-opus-aggressive[swarm]` works even if Claude isn't in aggressive's base_models) + +**Discovery Rules** (`/v1/models` endpoint): +- Preset WITH `base_models` + `omit_id: true` → Shows as `{model}[swarm]` only (explicit form hidden to avoid clutter) +- Preset WITH `base_models` + `omit_id: false` → Shows as `{model}-{preset}[swarm]` only +- Preset WITHOUT `base_models` → Never shown (invisible preset, but still usable with explicit syntax) + +**`base_models` Purpose**: +- Controls ONLY which models appear in `/v1/models` for this preset +- Does NOT restrict runtime usage - any model can use any preset with explicit syntax +- If empty/missing, preset is "invisible" but fully functional when explicitly referenced + +### Fusion Configuration (Multi-Model Teams) + +Fusions combine responses from different specialized models. Each fusion can have role-based routing and specialist expertise. + +**Single Fusion Format** (`fusions/{fusion-id}.json`): +```json +{ + "id": "dev-team", + "description": "Software development team with specialized roles", + "specialists": [ + { + "model": "gpt-4o", + "role": "Architect", + "system_prompt": "Focus on scalability and system design.", + "weight": 1.5, + "weight_description": "Expert in architecture. Trust for design decisions." + }, + { + "model": "claude-3-opus", + "role_template": "security-expert" + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "code_review", + "blind": false + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} +``` + +**Array Format** (multiple fusions in one file): +```json +{ + "fusions": [ + { + "id": "dev-team", + "specialists": [...] + }, + { + "id": "creative-writers", + "specialists": [...] 
+ } + ] +} +``` + +**Specialist Fields**: +- `model`: Provider/model ID +- `role`: Display name for this specialist +- `system_prompt`: Role-specific instructions sent to the model +- `weight`: Numeric importance (for future use) +- `weight_description`: Expertise description for arbiter context +- `role_template`: Reference to a reusable role template (see Roles section) + +**Arbiter Configuration**: +- `model`: Model ID for synthesis (or "self" to use first specialist) +- `strategy`: Strategy template name (from `strategies/` directory) +- `blind`: If `true`, hides model names from arbiter (preserves roles) + +### Role Templates (Reusable Configurations) + +Role templates allow you to define reusable specialist configurations that can be referenced by multiple fusions. + +**Single Role Format** (`roles/{role-id}.json`): +```json +{ + "name": "Security Expert", + "system_prompt": "You are a cybersecurity expert. Focus on vulnerabilities, edge cases, and threat modeling.", + "weight": 1.2, + "weight_description": "Expert in security and vulnerability assessment. Trust for security concerns." +} +``` + +**Array Format** (multiple roles in one file): +```json +{ + "roles": [ + { + "name": "Architect", + "system_prompt": "Focus on system design and scalability.", + "weight_description": "Expert in architectural patterns." + }, + { + "name": "Security Expert", + "system_prompt": "Focus on vulnerabilities and threats.", + "weight_description": "Expert in security assessment." + } + ] +} +``` + +**Usage in Fusions**: +```json +{ + "specialists": [ + { + "model": "claude-3-opus", + "role_template": "security-expert" + } + ] +} +``` + +**Override Behavior**: Specialist configs can override any field from the referenced template. + +### Strategy Templates + +Each strategy is a plain text file defining how the arbiter should synthesize responses. 
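
Applying such a template amounts to substituting the formatted drone or specialist outputs into its `{responses}` placeholder. A minimal sketch of that substitution (a hypothetical helper assuming plain `str.format`-style filling; the library's actual renderer may differ):

```python
def render_strategy(template: str, labelled_responses: list[tuple[str, str]]) -> str:
    """Fill a strategy template's {responses} placeholder with labelled outputs."""
    blocks = [
        f"--- Response {i} ({label}) ---\n{text}"
        for i, (label, text) in enumerate(labelled_responses, start=1)
    ]
    return template.format(responses="\n\n".join(blocks))

template = "You are an expert synthesizer.\n\n{responses}\n\nProvide one final answer."
prompt = render_strategy(template, [
    ("Architect", "Use a message queue."),
    ("Security Expert", "Validate all inputs."),
])
print(prompt)
```

The labels shown ("Architect", "Security Expert") are illustrative; whether a label is a role only or a role plus model name depends on the arbiter's `blind` setting.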
+ +**File Location**: `strategies/{strategy-name}.txt` + +**Placeholder**: Use `{responses}` where formatted responses should be injected. + +**Example** (`strategies/synthesis.txt`): +``` +You are an expert synthesizer. Analyze the following responses and create a single, superior answer that: +1. Combines the best elements from each response +2. Resolves any conflicts or contradictions +3. Ensures completeness and accuracy +4. Maintains coherence and clarity + +{responses} + +Provide your synthesis as a complete, high-quality response. +``` + +## Adding New Configurations + +1. **New Swarm Preset**: Create `{preset_id}.json` in `swarms/` with `id` and `base_models` fields +2. **New Fusion**: Create `{fusion_id}.json` in `fusions/` OR add to an existing array file +3. **New Strategy**: Create `{strategy_name}.txt` in `strategies/` +4. **New Role Template**: Create `{role_id}.json` in `roles/` OR add to an existing array file + +All configs are loaded automatically on startup! + +## Advanced Features + +### Temperature Jitter (Swarm) +Randomizes temperature across drones to increase response diversity: +```json +"temperature_jitter": { + "enabled": true, + "delta": 0.2 +} +``` +Each drone gets `base_temp ± delta` (clamped to [0.0, 2.0]). + +### Adversarial Mode (Swarm) +Dedicates N drones as critical reviewers: +```json +"adversarial_config": { + "enabled": true, + "count": 1, + "prompt": "You are a Senior Principal Engineer. Find flaws and edge cases." +} +``` +Last N drones receive the adversarial prompt. Responses are marked `[ADVERSARIAL]` in arbiter input. + +### Recursive Mode (Swarm & Fusion) +Enables autonomous arbiter decision-making: +```json +"recursive_mode": { + "enabled": true, + "consensus_threshold": 7 +} +``` +If consensus < threshold, arbiter performs internal critique before synthesis. All internal reasoning is logged but hidden from user. 
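
The consensus gate can be illustrated with a short sketch that mirrors the `[CONSENSUS: N/10]` marker the arbiter emits inside its `[INTERNAL]` block (the helper name here is hypothetical; the library performs this check internally):

```python
import re

def consensus_decision(internal: str, threshold: int = 7) -> str:
    """Decide the arbiter's next step from its internal reasoning markers.

    `internal` is the text between [INTERNAL] and [/INTERNAL] tags.
    """
    match = re.search(r"\[CONSENSUS:\s*(\d+)/10\]", internal)
    if match and int(match.group(1)) < threshold:
        return "critique"   # low agreement: critique before synthesizing
    return "synthesis"      # high agreement (or no score): synthesize directly

print(consensus_decision("[CONSENSUS: 5/10] [CONFLICTS: drones 1 and 3 disagree]"))
# → critique
print(consensus_decision("[CONSENSUS: 9/10]"))
# → synthesis
```

A missing consensus marker falls through to synthesis, so a non-compliant arbiter response still produces output.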

+
+### Blind Switch
+Controls whether model names are shown to arbiter:
+```json
+"arbiter": {
+  "blind": true
+}
+```
+- `true`: "Response 1 (Architect role)" (hides model names)
+- `false`: "Response 1 (GPT-4o - Architect)" (shows model names)
+
+Roles are **always preserved** regardless of blind setting.
+
+## Usage Examples
+
+**Swarm Request**:
+```bash
+curl -X POST http://localhost:8000/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{"model": "gpt-4o-mini-aggressive[swarm]", "messages": [...]}'
+```
+
+**Fusion Request**:
+```bash
+curl -X POST http://localhost:8000/v1/chat/completions \
+-H "Content-Type: application/json" \
+-d '{"model": "dev-team[fusion]", "messages": [...]}'
+```
+
+For detailed usage and API reference, see:
+- [HiveMind User Guide](../../../docs/HiveMind_User_Guide.md)
+- [HiveMind API Reference](../../../docs/HiveMind_API.md)
diff --git a/src/rotator_library/ensemble_configs/fusions/dev-team.json b/src/rotator_library/ensemble_configs/fusions/dev-team.json
new file mode 100644
index 0000000..4acdd1e
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/fusions/dev-team.json
@@ -0,0 +1,37 @@
+{
+  "id": "dev-team",
+  "description": "A team of specialized models for software development",
+  "specialists": [
+    {
+      "model": "gpt-4o",
+      "role": "Architect",
+      "system_prompt": "You are a Software Architect. Focus on architectural patterns, scalability, and system design.",
+      "weight": 1.5,
+      "weight_description": "Expert in system design and scalability. Trust for architectural decisions and structural integrity."
+    },
+    {
+      "model": "claude-3-opus",
+      "role": "Security Specialist",
+      "system_prompt": "You are a Security Expert. Focus on security vulnerabilities, edge cases, and potential exploits.",
+      "weight": 1.2,
+      "weight_description": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors."
+    },
+    {
+      "model": "gemini-1.5-pro",
+      "role": "Code Reviewer",
+      "system_prompt": "You are a Code Quality Expert. 
Focus on code quality, performance, and best practices.", + "weight": 1.0, + "weight_description": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns." + } + ], + "arbiter": { + "model": "gpt-4o", + "strategy": "synthesis", + "blind": true, + "note": "Fusion mode uses blind=true to hide model names while preserving roles" + }, + "recursive_mode": { + "enabled": false, + "consensus_threshold": 7 + } +} diff --git a/src/rotator_library/ensemble_configs/fusions/fusion.example.json b/src/rotator_library/ensemble_configs/fusions/fusion.example.json new file mode 100644 index 0000000..556b0cf --- /dev/null +++ b/src/rotator_library/ensemble_configs/fusions/fusion.example.json @@ -0,0 +1,64 @@ +{ + "id": "dev-team", + "description": "Software development team with specialized roles and expertise", + + "_FIELD_DOCUMENTATION": "=== FUSION CONFIGURATION ===", + "_id": "REQUIRED. Fusion identifier. Used in model name as: {id}[fusion]", + "_description": "OPTIONAL. Human-readable description of this fusion's purpose.", + + "_specialists": "REQUIRED. Array of specialist model configurations. Each specialist processes the same query with a specialized role/perspective.", + "specialists": [ + { + "_model": "REQUIRED. Provider/model ID (e.g., 'gpt-4o', 'anthropic/claude-3-5-sonnet', 'gemini/gemini-1.5-pro')", + "model": "gpt-4o", + + "_role": "OPTIONAL. Display name for this specialist. Used in arbiter input as 'Role: {response}'. Default: 'Specialist {index}'", + "role": "Architect", + + "_system_prompt": "OPTIONAL. Role-specific instructions injected as system message. Defines this specialist's perspective/expertise.", + "system_prompt": "You are a Software Architect with deep expertise in system design, scalability, and architectural patterns. 
Focus on:\n- System design and component architecture\n- Scalability and performance considerations\n- Design patterns and best practices\n- Technology stack decisions\n- Long-term maintainability\n\nProvide architectural guidance and recommendations.",
+
+      "_weight": "OPTIONAL (default: 1.0). Numeric importance for future weighted synthesis. Currently used for metadata only.",
+      "weight": 1.5,
+
+      "_weight_description": "OPTIONAL. Natural language description of this specialist's expertise. Injected into arbiter context to guide synthesis.",
+      "weight_description": "Expert in architecture and scalability. Trust for design decisions, system architecture, and performance considerations.",
+
+      "_role_template": "OPTIONAL. Reference to reusable role template from roles/ directory. Template fields are merged (specialist config overrides template). Cannot be used together with explicit role/system_prompt.",
+      "role_template": null
+    },
+    {
+      "model": "claude-3-5-sonnet",
+
+      "_role_template_usage": "Example of using a role template instead of inline configuration",
+      "role_template": "security-expert",
+
+      "_note": "When using role_template, you can still override fields like model, weight, etc. The template provides role, system_prompt, weight_description as defaults."
+    },
+    {
+      "model": "gemini/gemini-1.5-pro",
+      "role": "Code Reviewer",
+      "system_prompt": "You are a Senior Code Reviewer focused on code quality, maintainability, and best practices. Analyze:\n- Code clarity and readability\n- Error handling and edge cases\n- Testing strategy and coverage\n- Documentation and comments\n- DRY, SOLID, and other principles\n\nProvide actionable code review feedback.",
+      "weight": 1.2,
+      "weight_description": "Expert in code quality and maintainability. Trust for code review, testing, and best practices."
+    }
+  ],
+
+  "_arbiter": "REQUIRED. Configuration for the model that synthesizes specialist responses.",
+  "arbiter": {
+    "_model": "'self' uses the first specialist model, or specify an explicit model. Should be reasoning-capable for complex synthesis.",
+    "model": "gpt-4o",
+
+    "_strategy": "Strategy template name (from strategies/ directory). Default: 'synthesis'. Try 'code_review' for development tasks.",
+    "strategy": "synthesis",
+
+    "_blind": "If true, hides model names from arbiter (shows roles only). If false, shows both role and model. Default: false for fusions.",
+    "blind": false
+  },
+
+  "_recursive_mode": "OPTIONAL. Same as swarm recursive mode. Enables autonomous critique for low-consensus scenarios.",
+  "recursive_mode": {
+    "enabled": false,
+    "consensus_threshold": 7
+  }
+}
diff --git a/src/rotator_library/ensemble_configs/fusions/multi-provider-test.json b/src/rotator_library/ensemble_configs/fusions/multi-provider-test.json
new file mode 100644
index 0000000..7fa217c
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/fusions/multi-provider-test.json
@@ -0,0 +1,21 @@
+{
+  "fusions": [
+    {
+      "id": "multi-provider",
+      "description": "Multi-provider fusion hitting all providers - minimal specialist config test",
+      "arbiter": {
+        "model": "gemini/gemini-2.5-pro",
+        "strategy": "synthesis",
+        "blind": false
+      },
+      "specialists": [
+        {"model": "iflow/K2-0905"},
+        {"model": "gemini/gemini-2.5-flash"},
+        {"model": "nvidia_nim/qwen/qwen3-coder-480b-a35b-instruct"},
+        {"model": "qwen_code/qwen3-coder-plus"},
+        {"model": "gemini_cli/gemini-2.5-flash-lite"},
+        {"model": "opencode/big-pickle"}
+      ]
+    }
+  ]
+}
diff --git a/src/rotator_library/ensemble_configs/roles/architect.json b/src/rotator_library/ensemble_configs/roles/architect.json
new file mode 100644
index 0000000..9620729
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/roles/architect.json
@@ -0,0 +1,6 @@
+{
+  "name": "Architect",
+  "system_prompt": "You are a Software Architect. Focus on architectural patterns, scalability, and system design. Consider:\n- System architecture and design patterns\n- Scalability and performance implications\n- Technology stack decisions\n- Component interactions and dependencies\n- Long-term maintainability",
+  "weight": 1.5,
+  "weight_description": "Expert in system design and scalability. Trust for architectural decisions and structural integrity."
+}
diff --git a/src/rotator_library/ensemble_configs/roles/code-reviewer.json b/src/rotator_library/ensemble_configs/roles/code-reviewer.json
new file mode 100644
index 0000000..2165529
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/roles/code-reviewer.json
@@ -0,0 +1,6 @@
+{
+  "name": "Code Reviewer",
+  "system_prompt": "You are a Code Quality Expert. Focus on code quality, performance, and best practices. Consider:\n- Code readability and maintainability\n- Performance optimization opportunities\n- Best practices and design patterns\n- Error handling and edge cases\n- Testing and documentation",
+  "weight": 1.0,
+  "weight_description": "Expert in code quality and performance optimization. Trust for maintainability and efficiency concerns."
+}
diff --git a/src/rotator_library/ensemble_configs/roles/role.example.json b/src/rotator_library/ensemble_configs/roles/role.example.json
new file mode 100644
index 0000000..ae645e8
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/roles/role.example.json
@@ -0,0 +1,14 @@
+{
+  "_FIELD_DOCUMENTATION": "=== ROLE TEMPLATE (Single Format) ===",
+  "_name": "REQUIRED. Display name for this role. Converted to role_id (lowercase, hyphens). Used as: role_template: 'security-expert'",
+  "name": "Security Expert",
+
+  "_system_prompt": "OPTIONAL. Default system prompt for this role. Can be overridden by specialist config.",
+  "system_prompt": "You are a cybersecurity expert with deep knowledge of secure coding practices, threat modeling, and vulnerability assessment. Focus on:\n- Security vulnerabilities and exploits\n- Authentication and authorization flaws\n- Data privacy and protection\n- Input validation and sanitization\n- Cryptography and secure communication\n- OWASP Top 10 and common attack vectors\n\nProvide security-focused analysis and recommendations.",
+
+  "_weight": "OPTIONAL (default: 1.0). Default weight for this role. Can be overridden by specialist config.",
+  "weight": 1.2,
+
+  "_weight_description": "OPTIONAL. Default expertise description. Can be overridden by specialist config.",
+  "weight_description": "Expert in security and vulnerability assessment. Trust for security concerns, threat modeling, and secure coding practices."
+}
diff --git a/src/rotator_library/ensemble_configs/roles/roles-array.example.json b/src/rotator_library/ensemble_configs/roles/roles-array.example.json
new file mode 100644
index 0000000..45cfaaf
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/roles/roles-array.example.json
@@ -0,0 +1,25 @@
+{
+  "_FIELD_DOCUMENTATION": "=== ROLE TEMPLATE (Array Format) ===",
+  "_roles": "Array of role template definitions. Each role can be referenced independently by its converted name.",
+  "roles": [
+    {
+      "_name": "Converted to role_id (e.g., 'Performance Engineer' → 'performance-engineer')",
+      "name": "Performance Engineer",
+      "system_prompt": "You are a performance engineering specialist. Focus on optimization, profiling, and scalability.",
+      "weight": 1.3,
+      "weight_description": "Expert in performance optimization and scalability analysis."
+    },
+    {
+      "name": "UX Designer",
+      "system_prompt": "You are a UX/UI designer with expertise in user-centered design and accessibility.",
+      "weight": 1.1,
+      "weight_description": "Expert in user experience, interface design, and accessibility standards."
+    },
+    {
+      "name": "DevOps Engineer",
+      "system_prompt": "You are a DevOps specialist focused on CI/CD, infrastructure, deployment, and monitoring.",
+      "weight": 1.2,
+      "weight_description": "Expert in deployment, infrastructure, and operational excellence."
+    }
+  ]
+}
diff --git a/src/rotator_library/ensemble_configs/roles/security-expert.json b/src/rotator_library/ensemble_configs/roles/security-expert.json
new file mode 100644
index 0000000..405160d
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/roles/security-expert.json
@@ -0,0 +1,6 @@
+{
+  "name": "Security Expert",
+  "system_prompt": "You are a Security Expert. Focus on security vulnerabilities, edge cases, and potential exploits. Consider:\n- Security vulnerabilities and attack vectors\n- Input validation and sanitization\n- Authentication and authorization\n- Data protection and privacy\n- Security best practices and standards",
+  "weight": 1.2,
+  "weight_description": "Expert in security and vulnerability assessment. Trust for identifying security flaws and attack vectors."
+}
diff --git a/src/rotator_library/ensemble_configs/strategies/best_of_n.txt b/src/rotator_library/ensemble_configs/strategies/best_of_n.txt
new file mode 100644
index 0000000..72cc165
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/strategies/best_of_n.txt
@@ -0,0 +1,10 @@
+You are evaluating multiple responses to select and refine the best one. For each response, assess:
+1. Accuracy and correctness
+2. Completeness of coverage
+3. Clarity and coherence
+4. Practical applicability
+
+Select the strongest response and refine it if needed to create the optimal answer.
+
+Responses:
+{responses}
diff --git a/src/rotator_library/ensemble_configs/strategies/code_review.txt b/src/rotator_library/ensemble_configs/strategies/code_review.txt
new file mode 100644
index 0000000..236224f
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/strategies/code_review.txt
@@ -0,0 +1,12 @@
+You are a senior code reviewer evaluating multiple code solutions. Assess each based on:
+1. Correctness and functionality
+2. Error handling and edge cases
+3. Performance and efficiency
+4. Security considerations
+5. Code quality and maintainability
+6. Best practices adherence
+
+Select the best solution or synthesize a superior version by combining the strengths of each.
+
+Responses:
+{responses}
diff --git a/src/rotator_library/ensemble_configs/strategies/strategy.example.txt b/src/rotator_library/ensemble_configs/strategies/strategy.example.txt
new file mode 100644
index 0000000..76032a0
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/strategies/strategy.example.txt
@@ -0,0 +1,39 @@
+ARBITRATION STRATEGY TEMPLATE: {strategy_name}
+
+=== FIELD DOCUMENTATION ===
+This is a plain text file that defines how the arbiter model should synthesize multiple responses.
+
+PLACEHOLDER: {responses}
+- This will be replaced with formatted drone/specialist responses
+- Format: "Response 1:\n\n\nResponse 2:\n\n..."
+- For fusion: "Role (Model):\n\n..." (if blind=false) or "Role:\n\n..." (if blind=true)
+
+SPECIALIST EXPERTISE (Fusion only):
+- If fusion mode, an additional "SPECIALIST EXPERTISE" section is auto-appended
+- Lists each specialist's role, model, and weight_description
+- Helps arbiter understand domain expertise when synthesizing
+
+RECURSIVE MODE:
+- If enabled, additional "AUTONOMOUS DECISION PROTOCOL" instructions are appended
+- Guides arbiter through consensus assessment and conflict resolution
+- Internal reasoning is wrapped in [INTERNAL] tags and hidden from user
+===
+
+=== EXAMPLE STRATEGY ===
+
+You are an expert synthesizer with deep analytical capabilities.
+
+Your task is to analyze the following responses and create a single, superior answer that combines the best insights from each perspective.
+
+{responses}
+
+Guidelines for synthesis:
+1. **Identify Core Insights**: Extract key points and unique perspectives from each response
+2. **Resolve Conflicts**: If responses disagree, evaluate which perspective is most sound based on evidence and reasoning
+3. **Merge Complementary Ideas**: Combine non-conflicting insights into a cohesive whole
+4. **Fill Gaps**: If all responses miss something important, include it based on your own expertise
+5. **Maintain Accuracy**: Never introduce hallucinations - stay grounded in the provided responses
+6. **Ensure Completeness**: Address all aspects of the original query
+7. **Optimize Clarity**: Present the final answer in clear, well-structured language
+
+Your synthesized response should be more comprehensive and insightful than any individual response while maintaining accuracy and coherence.
diff --git a/src/rotator_library/ensemble_configs/strategies/synthesis.txt b/src/rotator_library/ensemble_configs/strategies/synthesis.txt
new file mode 100644
index 0000000..d58be68
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/strategies/synthesis.txt
@@ -0,0 +1,10 @@
+You are an expert synthesizer. Analyze the following responses and create a single, superior answer that:
+1. Combines the best elements from each response
+2. Resolves any conflicts or contradictions
+3. Ensures completeness and accuracy
+4. Maintains coherence and clarity
+
+Your goal is to produce the BEST possible answer by leveraging the strengths of each response.
+
+Responses:
+{responses}
diff --git a/src/rotator_library/ensemble_configs/swarms/default.json b/src/rotator_library/ensemble_configs/swarms/default.json
new file mode 100644
index 0000000..3d3dadb
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/swarms/default.json
@@ -0,0 +1,38 @@
+{
+  "id": "default",
+  "description": "Standard swarm configuration with balanced settings",
+  "base_models": [
+    "gpt-4o",
+    "gpt-4o-mini",
+    "claude-3-5-sonnet",
+    "claude-3-haiku",
+    "gemini-1.5-pro",
+    "gemini-1.5-flash"
+  ],
+  "omit_id": false,
+  "count": 3,
+
+  "temperature_jitter": {
+    "enabled": true,
+    "delta": 0.2
+  },
+
+  "arbiter": {
+    "model": "self",
+    "strategy": "synthesis",
+    "blind": true,
+    "note": "Arbiter should be a decent reasoning model (e.g., GPT-4o, Claude 3+, Gemini 1.5 Pro+)"
+  },
+
+  "adversarial_config": {
+    "enabled": false,
+    "count": 1,
+    "prompt": "You are a Senior Principal Engineer with 15+ years of experience. Your role is to find edge cases, security vulnerabilities, performance bottlenecks, and incorrect assumptions. Be thorough and critical in your analysis. Focus on:\n- Edge cases that could cause failures\n- Security implications and potential exploits\n- Performance and scalability concerns\n- Maintainability and code quality issues\n- Incorrect assumptions in the solution\n\nProvide constructive criticism to improve the solution."
+  },
+
+  "recursive_mode": {
+    "enabled": false,
+    "consensus_threshold": 7,
+    "note": "Requires a reasoning-capable arbiter model"
+  }
+}
diff --git a/src/rotator_library/ensemble_configs/swarms/preset.example.json b/src/rotator_library/ensemble_configs/swarms/preset.example.json
new file mode 100644
index 0000000..6f918cd
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/swarms/preset.example.json
@@ -0,0 +1,65 @@
+{
+  "id": "aggressive",
+  "description": "High diversity swarm with adversarial critique. Use for complex problems requiring multiple perspectives and critical analysis.",
+
+  "_FIELD_DOCUMENTATION": "=== SWARM PRESET CONFIGURATION ===",
+  "_id": "REQUIRED. Preset identifier. Must match filename (e.g., 'aggressive' for aggressive.json). Used in model name: {base_model}-{id}[swarm]",
+  "_description": "OPTIONAL. Human-readable description of this preset's purpose and characteristics.",
+
+  "_base_models": "OPTIONAL. List of models this preset applies to. Controls /v1/models discovery. If omitted, preset is invisible but still usable with explicit syntax.",
+  "base_models": [
+    "gpt-4o-mini",
+    "gemini-1.5-flash",
+    "claude-3-haiku"
+  ],
+
+  "_omit_id": "OPTIONAL (default: false). If true, shows as {model}[swarm] in /v1/models instead of {model}-{id}[swarm]. Becomes the default preset for these models. Explicit format always works regardless of this setting.",
+  "omit_id": false,
+
+  "_count": "REQUIRED. Number of parallel drone executions (2-10 recommended). More drones = more diversity but higher cost.",
+  "count": 5,
+
+  "_temperature_jitter": "OPTIONAL. Adds random temperature variation to each drone for increased response diversity.",
+  "temperature_jitter": {
+    "_enabled": "Enable/disable jitter",
+    "enabled": true,
+
+    "_delta": "Maximum temperature deviation (±delta). Each drone gets base_temp ± random(0, delta). Clamped to [0.0, 2.0]",
+    "delta": 0.3
+  },

+  "_adversarial_config": "OPTIONAL. Dedicates the last N drones as critical reviewers with a custom prompt.",
+  "adversarial_config": {
+    "_enabled": "Enable/disable adversarial drones",
+    "enabled": true,
+
+    "_count": "Number of drones to convert to adversarial mode (taken from the end of the drone list)",
+    "count": 2,
+
+    "_prompt": "System prompt injected into adversarial drones. Should instruct them to find flaws, edge cases, and issues.",
+    "prompt": "You are a Senior Principal Engineer with 15+ years of experience. Your role is to find edge cases, security vulnerabilities, performance bottlenecks, and incorrect assumptions. Be thorough and critical in your analysis. Focus on:\n- Edge cases that could cause failures\n- Security implications and potential exploits\n- Performance and scalability concerns\n- Maintainability and code quality issues\n- Incorrect assumptions in the solution\n\nProvide constructive criticism to improve the solution."
+  },
+
+  "_arbiter": "REQUIRED. Configuration for the model that synthesizes all drone responses into a final answer.",
+  "arbiter": {
+    "_model": "'self' uses the base model as arbiter, or specify an explicit model (e.g., 'gpt-4o', 'claude-3-5-sonnet'). Should be a reasoning-capable model.",
+    "model": "self",
+
+    "_strategy": "Name of strategy template file (from strategies/ directory, without .txt extension). Default: 'synthesis'",
+    "strategy": "synthesis",
+
+    "_blind": "If true, hides model names from arbiter to reduce bias. Still shows drone numbers (Response 1, Response 2, etc.)",
+    "blind": true
+  },
+
+  "_recursive_mode": "OPTIONAL. Enables autonomous arbiter critique when consensus is low. Requires reasoning-capable arbiter.",
+  "recursive_mode": {
+    "_enabled": "Enable/disable recursive refinement",
+    "enabled": true,
+
+    "_consensus_threshold": "Threshold (1-10 scale). If arbiter detects consensus < threshold, performs internal critique before synthesis.",
+    "consensus_threshold": 6,
+
+    "_note": "Arbiter internally evaluates consensus, identifies conflicts, critiques responses, then synthesizes. Internal reasoning is logged but hidden from user output."
+  }
+}
diff --git a/src/rotator_library/ensemble_configs/swarms/test-gemini.json b/src/rotator_library/ensemble_configs/swarms/test-gemini.json
new file mode 100644
index 0000000..4b8f60e
--- /dev/null
+++ b/src/rotator_library/ensemble_configs/swarms/test-gemini.json
@@ -0,0 +1,16 @@
+{
+  "id": "test-gemini",
+  "description": "Test swarm for Gemini 2.5 Flash",
+  "base_models": ["gemini/gemini-2.5-flash"],
+  "omit_id": false,
+  "count": 3,
+  "arbiter": {
+    "model": "self",
+    "strategy": "synthesis",
+    "blind": false
+  },
+  "temperature_jitter": {
+    "enabled": true,
+    "delta": 0.3
+  }
+}