feat: adversarial council subsystem for multi-perspective evaluation by Ridwannurudeen · Pull Request #848 · NousResearch/hermes-agent

Ridwannurudeen · 2026-03-10T17:53:24Z

Summary

Adds an adversarial multi-perspective council as a general-purpose evaluation subsystem for hermes-agent. Five personas (Advocate, Skeptic, Oracle, Contrarian, Arbiter) deliberate from distinct intellectual traditions, producing structured verdicts with confidence scores, evidence links, and DPO preference pairs.

This fills a real gap — hermes-agent currently has no way to evaluate open-ended agent output quality. The council provides structured adversarial deliberation, automatic DPO pair generation, and confidence-gated safety.

What's included

3 new tools (tools/council_tool.py, tools/council_personas.py):

council_query — full 5-persona deliberation on any question
council_evaluate — evaluate content quality through adversarial critique
council_gate — quick safety review (Skeptic + Oracle + Arbiter) before high-stakes actions

RL integration (environments/council_evaluator.py, environments/ouroboros_env.py):

CouncilEvaluator — drop-in evaluator any HermesAgentBaseEnv can import for compute_reward()
OuroborosEnv — research agent environment with council-based multi-signal rewards and DPO extraction

3 skills (skills/council/):

multi-perspective-analysis — guide for using council deliberation
bayesian-synthesis — Arbiter's explicit prior/evidence/posterior methodology
adversarial-critique — stress-testing and safety gating workflows

Config (datagen-config-examples/ouroboros.yaml)

Architecture

Layer 0: council_personas.py     (pure data, zero deps)
Layer 1: council_tool.py         (core logic, imports personas)
Layer 2: council_evaluator.py    (RL integration) + registration (model_tools, toolsets)
Layer 3: ouroboros_env.py         (RL environment) + skills + datagen config

Changes to existing files

model_tools.py — added "tools.council_tool" to _discover_tools() (+1 line)
toolsets.py — added "council" toolset definition + "council_query" to _HERMES_CORE_TOOLS (+7 lines)

Key design decisions

LLM-agnostic: Uses OpenAI-compatible API via openai library (already a dependency). Works with OpenRouter, NousResearch API, or any OpenAI-compatible endpoint.
Gated availability: check_fn returns True only if OPENROUTER_API_KEY, OPENAI_API_KEY, or NOUS_API_KEY is set.
Custom personas: Users can add/override personas via ~/.hermes/config.yaml.
DPO-native: Every deliberation automatically extracts preference pairs (Arbiter-aligned = chosen, overruled dissenter = rejected).
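
The DPO-native rule above can be illustrated with a minimal sketch. It is not the PR's code: `PersonaResponse`, `extract_dpo_pairs`, and the `dissents` flag are hypothetical names, and the sketch assumes each deliberator's response records whether the Arbiter overruled it.

```python
from dataclasses import dataclass


@dataclass
class PersonaResponse:
    # Hypothetical container; field names are assumptions for illustration.
    persona: str
    content: str
    dissents: bool  # True if the Arbiter overruled this persona


def extract_dpo_pairs(question: str, responses: list[PersonaResponse]) -> list[dict]:
    """Pair every Arbiter-aligned response with every overruled one."""
    chosen = [r for r in responses if not r.dissents]
    rejected = [r for r in responses if r.dissents]
    return [
        {"prompt": question, "chosen": c.content, "rejected": r.content}
        for c in chosen
        for r in rejected
    ]


responses = [
    PersonaResponse("Advocate", "The plan is sound.", dissents=False),
    PersonaResponse("Skeptic", "The plan is untested.", dissents=True),
    PersonaResponse("Oracle", "Similar plans have worked.", dissents=False),
]
pairs = extract_dpo_pairs("Should we ship?", responses)
```

With two aligned responses and one overruled dissenter, this yields two preference pairs. If nobody dissents, the list is empty, so a unanimous deliberation contributes no training signal under this scheme.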

Test plan

37 tool tests passing (tests/tools/test_council.py)
16 environment tests passing (tests/environments/test_ouroboros.py)
Tool registration verified (3 tools in council toolset)
Toolset resolution verified
council_query included in _HERMES_CORE_TOOLS
End-to-end: hermes --toolsets council,web with council_query
RL: python environments/ouroboros_env.py serve with Atropos

Add a 5-persona adversarial deliberation system (Advocate, Skeptic, Oracle, Contrarian, Arbiter) as a general-purpose evaluation subsystem for hermes-agent.

New tools:
- council_query: full 5-persona deliberation on any question
- council_evaluate: evaluate content quality through adversarial critique
- council_gate: quick safety review before high-stakes actions

New environments:
- CouncilEvaluator: reusable drop-in evaluator for any RL environment
- OuroborosEnv: RL environment using council-based rewards + DPO extraction

New skills:
- multi-perspective-analysis: guide for using council deliberation
- bayesian-synthesis: Arbiter's Bayesian reasoning methodology
- adversarial-critique: stress-testing and safety gating workflows

53 tests passing (37 tool tests + 16 environment tests).

teknium1 · 2026-03-11T16:05:47Z

Thanks for this PR — the persona design is genuinely creative (Popperian falsificationism for Skeptic, Kuhnian paradigm critique for Contrarian, Bayesian synthesis for Arbiter). The concept has real potential, but it doesn't belong in hermes-agent core. Here's why and what to do instead.

Why not core?

1. `_HERMES_CORE_TOOLS` inclusion

Adding council_query to _HERMES_CORE_TOOLS means it appears in every session across all platforms (CLI, Telegram, Discord, etc.). The check_fn gates on OPENROUTER_API_KEY / OPENAI_API_KEY / NOUS_API_KEY — which are basically always set since the agent needs them to function. So in practice every user gets this tool injected into every API call whether they want it or not.
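
To make the point concrete, here is a rough sketch of a gate of the kind described. The PR's actual `check_fn` is not reproduced here; `council_available` is a hypothetical stand-in, and only the three key names come from the PR description.

```python
import os
from collections.abc import Mapping

# Key names are from the PR description; the function is a hypothetical stand-in.
_API_KEY_VARS = ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "NOUS_API_KEY")


def council_available(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if any of the listed API keys is set to a non-empty value."""
    return any(env.get(name) for name in _API_KEY_VARS)


# A working session already carries a provider key, so the gate passes.
typical_session = {"OPENROUTER_API_KEY": "sk-example"}
no_keys: dict[str, str] = {}
```

Because the agent itself needs one of these keys to run, the `no_keys` case rarely occurs in practice, which is why the gate filters out almost nobody.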

2. Bypasses the agent's provider chain

The tool creates its own AsyncOpenAI client with its own API key resolution (_get_api_config), completely ignoring the agent's configured provider, model, base_url, and api_mode. If a user is on Codex, Anthropic direct, or any non-OpenRouter provider, the council would silently use a different provider/model than the rest of the session.

3. Cost: 5 hidden LLM calls per invocation

council_query makes 5 full LLM calls (4 deliberators + arbiter). council_gate makes 3. These costs are invisible to the user and could add up fast — especially if the agent starts using council_gate before actions (which the tool description encourages).
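
A back-of-the-envelope sketch of how these calls accumulate over a session. The per-tool counts are the ones stated above; `hidden_calls` and the example session are hypothetical, and `council_evaluate` is left out because its call count is not stated here.

```python
# Calls per invocation as stated in the review (council_evaluate not stated).
CALLS_PER_INVOCATION = {"council_query": 5, "council_gate": 3}


def hidden_calls(invocations: dict[str, int]) -> int:
    """Total LLM calls triggered by the given number of tool invocations."""
    return sum(CALLS_PER_INVOCATION[tool] * n for tool, n in invocations.items())


# Hypothetical session: two full deliberations, and a gate before ten actions.
session = {"council_query": 2, "council_gate": 10}
total = hidden_calls(session)  # 2 * 5 + 10 * 3 = 40
```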

4. Brittle regex parsing

Confidence, dissent flags, and key points are parsed via regex from free-form LLM text. Different models format differently — this will silently fall back to defaults (confidence=0.5, dissent=false) and produce unreliable DPO pairs. Use structured output / JSON mode instead.
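
As a minimal sketch of the structured alternative, assuming each persona is asked to reply with a JSON object: the `Verdict` fields and the `parse_verdict` helper are hypothetical, chosen to mirror the three values named above. The sketch uses only the standard library and fails loudly instead of substituting defaults.

```python
import json
from dataclasses import dataclass


@dataclass
class Verdict:
    # Hypothetical schema mirroring the values the regex parser extracts.
    confidence: float
    dissent: bool
    key_points: list[str]


def parse_verdict(raw: str) -> Verdict:
    """Parse a JSON reply; raise ValueError rather than fall back to defaults."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    confidence = data.get("confidence")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    dissent = data.get("dissent")
    if not isinstance(dissent, bool):
        raise ValueError("dissent must be true or false")

    key_points = data.get("key_points")
    if not isinstance(key_points, list) or not all(isinstance(p, str) for p in key_points):
        raise ValueError("key_points must be a list of strings")

    return Verdict(float(confidence), dissent, key_points)


verdict = parse_verdict('{"confidence": 0.8, "dissent": true, "key_points": ["untested"]}')
```

A malformed reply then surfaces as an error the caller can retry or drop, so it never becomes a DPO pair built on `confidence=0.5, dissent=false`.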

The right path: MCP server

Same recommendation as PR #819 — Hermes has a native MCP client, and an MCP server is the right extension mechanism:

hermes-council/
├── pyproject.toml
├── server.py              # FastMCP server
├── council/
│   ├── personas.py        # Your persona definitions (keep as-is)
│   ├── deliberation.py    # Core logic (keep as-is)
│   └── config.py          # Model/provider config
├── skills/
│   ├── multi-perspective-analysis/
│   ├── bayesian-synthesis/
│   └── adversarial-critique/
└── tests/

Key changes for the MCP version:

Use the server's own config for model/provider (not env var sniffing)
Pool the OpenAI client instead of creating one per call
Use JSON mode / structured output instead of regex parsing
Tools only appear when the user configures the MCP server — no core bloat
The RL environments (CouncilEvaluator, OuroborosEnv) can import from the package independently
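
The client-pooling item could look roughly like this. It is a sketch under assumptions: `StubClient` stands in for a real client such as `openai.AsyncOpenAI`, and `get_client` and the example URLs are hypothetical.

```python
from functools import lru_cache


class StubClient:
    """Stand-in for a real API client (assumption, for illustration only)."""

    def __init__(self, base_url: str, api_key: str) -> None:
        self.base_url = base_url
        self.api_key = api_key


@lru_cache(maxsize=None)
def get_client(base_url: str, api_key: str) -> StubClient:
    """Create one client per (base_url, api_key) and reuse it afterwards."""
    return StubClient(base_url, api_key)


first = get_client("https://example.invalid/v1", "sk-example")
second = get_client("https://example.invalid/v1", "sk-example")
other = get_client("https://other.invalid/v1", "sk-example")
```

Repeated calls with the same settings return the same object, so a five-persona deliberation would share one client and its connections instead of constructing five.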

User installation

pip install hermes-council

Add to ~/.hermes/config.yaml:

mcp:
  servers:
    council:
      command: hermes-council-server

All 3 tools appear automatically in the next session.

We'd like to see this as an MCP server package — the persona framework and deliberation architecture are solid. Happy to review the MCP version when it's ready.

Ridwannurudeen · 2026-03-11T16:15:23Z

Thanks for the detailed review — all four points are well taken.

The core tools injection and provider bypass were the biggest oversights on my end. You're right that the check_fn gate is effectively a no-op since those keys are almost always set, and rolling a separate AsyncOpenAI client completely sidesteps the agent's configured provider chain. The hidden multi-call cost and regex parsing brittleness are fair calls too — JSON mode / structured output is the obvious fix there.

MCP server is the right path. I'll restructure it as a standalone hermes-council package with its own config for model/provider, pooled client, structured output instead of regex, and the skills bundled alongside. Will open a new PR when it's ready for review.

Ridwannurudeen · 2026-03-11T23:31:08Z

MCP server version ready for review: https://github.com/Ridwannurudeen/hermes-council

Changes from the original PR:

Standalone FastMCP stdio server (pip install hermes-council)
Own config via COUNCIL_* env vars (no provider bypass)
JSON mode structured output with Pydantic validation (regex fallback for non-supporting providers)
Cost transparency: _meta block in every response (calls_made, model, total_tokens)
CouncilEvaluator ships as library, OuroborosEnv as example template
90 tests, all mocked
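
For illustration, a `_meta` block with the three fields listed above might be assembled like this. The helper `with_meta`, the surrounding response shape, and the sample values are assumptions, not the package's actual output.

```python
import json


def with_meta(result: dict, usage_per_call: list[int], model: str) -> dict:
    """Attach a _meta block; usage_per_call holds the tokens used by each LLM call."""
    return {
        **result,
        "_meta": {
            "calls_made": len(usage_per_call),
            "model": model,
            "total_tokens": sum(usage_per_call),
        },
    }


# Hypothetical three-call gate review with made-up token counts.
response = with_meta({"verdict": "proceed"}, [410, 395, 520], model="example-model")
print(json.dumps(response, indent=2))
```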

Install:

pip install git+https://github.com/Ridwannurudeen/hermes-council.git

Config:

mcp_servers:
  council:
    command: hermes-council-server

teknium1 closed this Mar 11, 2026

Ridwannurudeen wants to merge 1 commit into NousResearch:main from Ridwannurudeen:feat/council-subsystem
