feat: adversarial council subsystem for multi-perspective evaluation #848

Closed
Ridwannurudeen wants to merge 1 commit into NousResearch:main from Ridwannurudeen:feat/council-subsystem

@Ridwannurudeen Ridwannurudeen commented Mar 10, 2026

Summary

Adds an adversarial multi-perspective council as a general-purpose evaluation subsystem for hermes-agent. Five personas (Advocate, Skeptic, Oracle, Contrarian, Arbiter) deliberate from distinct intellectual traditions, producing structured verdicts with confidence scores, evidence links, and DPO preference pairs.

This fills a real gap — hermes-agent currently has no way to evaluate open-ended agent output quality. The council provides structured adversarial deliberation, automatic DPO pair generation, and confidence-gated safety.

What's included

3 new tools (tools/council_tool.py, tools/council_personas.py):

  • council_query — full 5-persona deliberation on any question
  • council_evaluate — evaluate content quality through adversarial critique
  • council_gate — quick safety review (Skeptic + Oracle + Arbiter) before high-stakes actions

RL integration (environments/council_evaluator.py, environments/ouroboros_env.py):

  • CouncilEvaluator — drop-in evaluator any HermesAgentBaseEnv can import for compute_reward()
  • OuroborosEnv — research agent environment with council-based multi-signal rewards and DPO extraction

3 skills (skills/council/):

  • multi-perspective-analysis — guide for using council deliberation
  • bayesian-synthesis — Arbiter's explicit prior/evidence/posterior methodology
  • adversarial-critique — stress-testing and safety gating workflows

Config (datagen-config-examples/ouroboros.yaml)

Architecture

Layer 0: council_personas.py     (pure data, zero deps)
Layer 1: council_tool.py         (core logic, imports personas)
Layer 2: council_evaluator.py    (RL integration) + registration (model_tools, toolsets)
Layer 3: ouroboros_env.py        (RL environment) + skills + datagen config

Changes to existing files

  • model_tools.py — added "tools.council_tool" to _discover_tools() (+1 line)
  • toolsets.py — added "council" toolset definition + "council_query" to _HERMES_CORE_TOOLS (+7 lines)

Key design decisions

  • LLM-agnostic: Uses OpenAI-compatible API via openai library (already a dependency). Works with OpenRouter, NousResearch API, or any OpenAI-compatible endpoint.
  • Gated availability: check_fn returns True only if OPENROUTER_API_KEY, OPENAI_API_KEY, or NOUS_API_KEY is set.
  • Custom personas: Users can add/override personas via ~/.hermes/config.yaml.
  • DPO-native: Every deliberation automatically extracts preference pairs (Arbiter-aligned = chosen, overruled dissenter = rejected).
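
The DPO-native rule above (Arbiter-aligned = chosen, overruled dissenter = rejected) can be sketched roughly as follows. This is a minimal illustration, not the code in tools/council_tool.py: the `PersonaResponse` type, its `dissents` field, and `extract_dpo_pairs` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class PersonaResponse:
    """Hypothetical container for one persona's contribution."""
    persona: str
    content: str
    dissents: bool  # True if the Arbiter overruled this persona


def extract_dpo_pairs(question: str, responses: List[PersonaResponse]) -> List[Dict[str, str]]:
    """Pair every Arbiter-aligned response with every overruled one."""
    chosen = [r for r in responses if not r.dissents]
    rejected = [r for r in responses if r.dissents]
    return [
        {"prompt": question, "chosen": c.content, "rejected": r.content}
        for c in chosen
        for r in rejected
    ]


responses = [
    PersonaResponse("Advocate", "Ship it; the evidence supports the claim.", dissents=False),
    PersonaResponse("Skeptic", "The claim is unfalsifiable as stated.", dissents=True),
]
pairs = extract_dpo_pairs("Is the claim supported?", responses)
print(len(pairs))  # 1
```

If no persona dissents, the sketch yields no pairs, so the quality of the pairs depends entirely on how reliably the dissent flag is determined.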

Test plan

  • 37 tool tests passing (tests/tools/test_council.py)
  • 16 environment tests passing (tests/environments/test_ouroboros.py)
  • Tool registration verified (3 tools in council toolset)
  • Toolset resolution verified
  • council_query included in _HERMES_CORE_TOOLS
  • End-to-end: hermes --toolsets council,web with council_query
  • RL: python environments/ouroboros_env.py serve with Atropos

Add a 5-persona adversarial deliberation system (Advocate, Skeptic, Oracle,
Contrarian, Arbiter) as a general-purpose evaluation subsystem for hermes-agent.

New tools:
- council_query: full 5-persona deliberation on any question
- council_evaluate: evaluate content quality through adversarial critique
- council_gate: quick safety review before high-stakes actions

New environments:
- CouncilEvaluator: reusable drop-in evaluator for any RL environment
- OuroborosEnv: RL environment using council-based rewards + DPO extraction

New skills:
- multi-perspective-analysis: guide for using council deliberation
- bayesian-synthesis: Arbiter's Bayesian reasoning methodology
- adversarial-critique: stress-testing and safety gating workflows

53 tests passing (37 tool tests + 16 environment tests).
@teknium1
Contributor

Thanks for this PR — the persona design is genuinely creative (Popperian falsificationism for Skeptic, Kuhnian paradigm critique for Contrarian, Bayesian synthesis for Arbiter). The concept has real potential, but it doesn't belong in hermes-agent core. Here's why and what to do instead.

Why not core?

1. _HERMES_CORE_TOOLS inclusion

Adding council_query to _HERMES_CORE_TOOLS means it appears in every session across all platforms (CLI, Telegram, Discord, etc.). The check_fn gates on OPENROUTER_API_KEY / OPENAI_API_KEY / NOUS_API_KEY — which are basically always set since the agent needs them to function. So in practice every user gets this tool injected into every API call whether they want it or not.
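
To make the point concrete, here is a rough sketch of a gate of this shape. The key names come from the PR description; the function name `council_available` and the `env` parameter are assumptions for illustration, not the PR's actual check_fn.

```python
import os
from typing import Mapping, Optional

# Key names as listed in the PR description.
_GATE_KEYS = ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "NOUS_API_KEY")


def council_available(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if any of the gate keys is set to a non-empty value."""
    env = os.environ if env is None else env
    return any(env.get(key) for key in _GATE_KEYS)


# An agent that can talk to its own provider already has one of these keys,
# so the gate passes for essentially every working installation.
print(council_available({"OPENROUTER_API_KEY": "sk-example"}))  # True
print(council_available({}))                                    # False
```

Because the same keys are what the agent itself needs to run, a gate like this only excludes installations that could not make LLM calls anyway.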

2. Bypasses the agent's provider chain

The tool creates its own AsyncOpenAI client with its own API key resolution (_get_api_config), completely ignoring the agent's configured provider, model, base_url, and api_mode. If a user is on Codex, Anthropic direct, or any non-OpenRouter provider, the council would silently use a different provider/model than the rest of the session.

3. Cost: 5 hidden LLM calls per invocation

council_query makes 5 full LLM calls (4 deliberators + arbiter). council_gate makes 3. These costs are invisible to the user and could add up fast — especially if the agent starts using council_gate before actions (which the tool description encourages).

4. Brittle regex parsing

Confidence, dissent flags, and key points are parsed via regex from free-form LLM text. Different models format differently — this will silently fall back to defaults (confidence=0.5, dissent=false) and produce unreliable DPO pairs. Use structured output / JSON mode instead.
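
A small sketch of the failure mode, under assumptions: the regex below is illustrative (not the pattern used in the PR), and the JSON field names `confidence` and `dissent` are hypothetical. The regex path returns the default whenever the model phrases things differently, while the JSON path fails loudly.

```python
import json
import re

# Illustrative pattern only; the PR's actual regexes may differ.
CONF_RE = re.compile(r"confidence:\s*([01](?:\.\d+)?)", re.IGNORECASE)


def parse_confidence_regex(text: str) -> float:
    """Regex approach: silently falls back to 0.5 when nothing matches."""
    match = CONF_RE.search(text)
    return float(match.group(1)) if match else 0.5


def parse_verdict_json(text: str) -> dict:
    """Structured approach: malformed or missing fields raise an error."""
    data = json.loads(text)
    confidence = float(data["confidence"])
    dissent = data["dissent"]
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence out of range: {confidence}")
    if not isinstance(dissent, bool):
        raise ValueError("dissent must be a boolean")
    return {"confidence": confidence, "dissent": dissent}


print(parse_confidence_regex("Confidence: 0.9"))          # 0.9
print(parse_confidence_regex("I'm about 90% confident"))  # 0.5 (silent default)
print(parse_verdict_json('{"confidence": 0.9, "dissent": false}'))
```

With the structured path, a caller can decide explicitly whether to retry, skip the sample, or exclude it from DPO extraction instead of training on a default value.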

The right path: MCP server

Same recommendation as PR #819 — Hermes has a native MCP client, and an MCP server is the right extension mechanism:

hermes-council/
├── pyproject.toml
├── server.py              # FastMCP server
├── council/
│   ├── personas.py        # Your persona definitions (keep as-is)
│   ├── deliberation.py    # Core logic (keep as-is)
│   └── config.py          # Model/provider config
├── skills/
│   ├── multi-perspective-analysis/
│   ├── bayesian-synthesis/
│   └── adversarial-critique/
└── tests/

Key changes for the MCP version:

  • Use the server's own config for model/provider (not env var sniffing)
  • Pool the OpenAI client instead of creating one per call
  • Use JSON mode / structured output instead of regex parsing
  • Tools only appear when the user configures the MCP server — no core bloat
  • The RL environments (CouncilEvaluator, OuroborosEnv) can import from the package independently
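
One possible way to approach the "own config" and "pooled client" points is to cache one client per configuration. The sketch below uses only the standard library: `CouncilConfig`, `get_client`, and the stand-in `FakeClient` are hypothetical names, and a real server would presumably construct `openai.AsyncOpenAI(base_url=..., api_key=...)` in place of the stand-in.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)  # frozen makes the config hashable, so it can be a cache key
class CouncilConfig:
    """Hypothetical server-side config, loaded once at startup."""
    base_url: str
    api_key: str
    model: str


class FakeClient:
    """Stand-in for an OpenAI-compatible async client."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.api_key = api_key


@lru_cache(maxsize=None)
def get_client(config: CouncilConfig) -> FakeClient:
    """Create the client on first use and reuse it for every later call."""
    return FakeClient(config.base_url, config.api_key)


cfg = CouncilConfig("https://example.invalid/v1", "sk-example", "example-model")
print(get_client(cfg) is get_client(cfg))  # True
```

Reusing one client lets all persona calls in a deliberation share a connection pool rather than opening a new one per call.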

User installation

pip install hermes-council

Add to ~/.hermes/config.yaml:

mcp:
  servers:
    council:
      command: hermes-council-server

All 3 tools appear automatically in the next session.


We'd like to see this as an MCP server package — the persona framework and deliberation architecture are solid. Happy to review the MCP version when it's ready.

@teknium1 teknium1 closed this Mar 11, 2026
@Ridwannurudeen
Author

Thanks for the detailed review — all four points are well taken.

The core tools injection and provider bypass were the biggest oversights on my end. You're right that the check_fn gate is effectively a no-op since those keys are almost always set, and rolling a separate AsyncOpenAI client completely sidesteps the agent's configured provider chain. The hidden multi-call cost and regex parsing brittleness are fair calls too — JSON mode / structured output is the obvious fix there.

MCP server is the right path. I'll restructure it as a standalone hermes-council package with its own config for model/provider, pooled client, structured output instead of regex, and the skills bundled alongside. Will open a new PR when it's ready for review.

@Ridwannurudeen
Author

MCP server version ready for review: https://github.com/Ridwannurudeen/hermes-council

Changes from the original PR:

  • Standalone FastMCP stdio server (pip install hermes-council)
  • Own config via COUNCIL_* env vars (no provider bypass)
  • JSON mode structured output with Pydantic validation (regex fallback for non-supporting providers)
  • Cost transparency: _meta block in every response (calls_made, model, total_tokens)
  • CouncilEvaluator ships as library, OuroborosEnv as example template
  • 90 tests, all mocked
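
For illustration, a `_meta` block carrying the three fields named above could be attached along these lines. This is a sketch under assumptions: the helper name `with_meta` and the per-call token list are hypothetical, and the actual response schema in the hermes-council package may differ.

```python
from typing import Any, Dict, List


def with_meta(verdict: Dict[str, Any], model: str, tokens_per_call: List[int]) -> Dict[str, Any]:
    """Return a copy of the verdict with cost information attached under `_meta`."""
    return {
        **verdict,
        "_meta": {
            "calls_made": len(tokens_per_call),
            "model": model,
            "total_tokens": sum(tokens_per_call),
        },
    }


# A full deliberation makes 5 calls (4 deliberators + Arbiter).
response = with_meta(
    {"verdict": "approve", "confidence": 0.8},
    model="example-model",
    tokens_per_call=[900, 850, 920, 880, 1400],
)
print(response["_meta"])
# {'calls_made': 5, 'model': 'example-model', 'total_tokens': 4950}
```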

Install:

pip install git+https://github.com/Ridwannurudeen/hermes-council.git

Config:

mcp_servers:
  council:
    command: hermes-council-server
