feat: adversarial council subsystem for multi-perspective evaluation #848
Ridwannurudeen wants to merge 1 commit into NousResearch:main from
Add a 5-persona adversarial deliberation system (Advocate, Skeptic, Oracle, Contrarian, Arbiter) as a general-purpose evaluation subsystem for hermes-agent.

New tools:
- council_query: full 5-persona deliberation on any question
- council_evaluate: evaluate content quality through adversarial critique
- council_gate: quick safety review before high-stakes actions

New environments:
- CouncilEvaluator: reusable drop-in evaluator for any RL environment
- OuroborosEnv: RL environment using council-based rewards + DPO extraction

New skills:
- multi-perspective-analysis: guide for using council deliberation
- bayesian-synthesis: Arbiter's Bayesian reasoning methodology
- adversarial-critique: stress-testing and safety gating workflows

53 tests passing (37 tool tests + 16 environment tests).
Thanks for this PR. The persona design is genuinely creative (Popperian falsificationism for Skeptic, Kuhnian paradigm critique for Contrarian, Bayesian synthesis for Arbiter). The concept has real potential, but it doesn't belong in hermes-agent core. Here's why and what to do instead.
Why not core?
1.
Thanks for the detailed review. All four points are well taken. The core tools injection and provider bypass were the biggest oversights on my end. You're right that the MCP server is the right path. I'll restructure it as a standalone
MCP server version ready for review: https://github.com/Ridwannurudeen/hermes-council
Changes from the original PR:
Install: pip install git+https://github.com/Ridwannurudeen/hermes-council.git
Config:
mcp_servers:
  council:
    command: hermes-council-server
Summary
Adds an adversarial multi-perspective council as a general-purpose evaluation subsystem for hermes-agent. Five personas (Advocate, Skeptic, Oracle, Contrarian, Arbiter) deliberate from distinct intellectual traditions, producing structured verdicts with confidence scores, evidence links, and DPO preference pairs.
This fills a real gap — hermes-agent currently has no way to evaluate open-ended agent output quality. The council provides structured adversarial deliberation, automatic DPO pair generation, and confidence-gated safety.
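As a rough illustration of how council confidence scores could be turned into DPO preference pairs, here is a minimal sketch. The `ScoredResponse` type, the `extract_dpo_pair` function, and the `min_gap` threshold are hypothetical names chosen for this example; they are not taken from the PR's code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ScoredResponse:
    """A candidate response with a council confidence score (hypothetical type)."""
    text: str
    confidence: float  # assumed to lie in [0, 1]


def extract_dpo_pair(
    prompt: str,
    responses: List[ScoredResponse],
    min_gap: float = 0.2,  # assumed threshold, not a value from the PR
) -> Optional[Dict[str, str]]:
    """Return a chosen/rejected pair, or None if the scores are too close to call."""
    if len(responses) < 2:
        return None
    ranked = sorted(responses, key=lambda r: r.confidence, reverse=True)
    best, worst = ranked[0], ranked[-1]
    # Confidence gating: skip pairs the council cannot clearly separate.
    if best.confidence - worst.confidence < min_gap:
        return None
    return {"prompt": prompt, "chosen": best.text, "rejected": worst.text}
```

Gating on the score gap is one plausible way to keep noisy comparisons out of a preference dataset; a real implementation might instead gate on the Arbiter's stated confidence.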
What's included
3 new tools (tools/council_tool.py, tools/council_personas.py):
- council_query: full 5-persona deliberation on any question
- council_evaluate: evaluate content quality through adversarial critique
- council_gate: quick safety review (Skeptic + Oracle + Arbiter) before high-stakes actions

RL integration (environments/council_evaluator.py, environments/ouroboros_env.py):
- CouncilEvaluator: drop-in evaluator any HermesAgentBaseEnv can import for compute_reward()
- OuroborosEnv: research agent environment with council-based multi-signal rewards and DPO extraction

3 skills (skills/council/):
- multi-perspective-analysis: guide for using council deliberation
- bayesian-synthesis: Arbiter's explicit prior/evidence/posterior methodology
- adversarial-critique: stress-testing and safety gating workflows

Config (datagen-config-examples/ouroboros.yaml)

Architecture
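The deliberation flow described above (a full council for council_query, a Skeptic + Oracle subset for council_gate, and an Arbiter that combines a prior with evidence into a posterior) can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: modelling each persona as a function that returns a likelihood ratio, running the Arbiter last, and all names below are hypothetical.

```python
from math import prod
from typing import Callable, Dict, Sequence

# Assumption: a persona maps a claim to a likelihood ratio.
# A ratio > 1 supports the claim, a ratio < 1 undermines it.
Persona = Callable[[str], float]

FULL_COUNCIL = ("advocate", "skeptic", "oracle", "contrarian")
GATE_COUNCIL = ("skeptic", "oracle")  # the quick safety-review subset


def arbiter_posterior(prior: float, likelihood_ratios: Sequence[float]) -> float:
    """Bayesian update in odds form: posterior odds = prior odds * product of ratios."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * prod(likelihood_ratios)
    return posterior_odds / (1.0 + posterior_odds)


def deliberate(
    claim: str,
    personas: Dict[str, Persona],
    members: Sequence[str] = FULL_COUNCIL,
    prior: float = 0.5,
) -> float:
    """Collect each member's ratio, then let the Arbiter synthesize a posterior."""
    ratios = [personas[name](claim) for name in members]
    return arbiter_posterior(prior, ratios)
```

In practice each persona would be an LLM call rather than a pure function, and treating the personas' evidence as independent (which the product of ratios implies) is a simplification.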
Changes to existing files
- model_tools.py: added "tools.council_tool" to _discover_tools() (+1 line)
- toolsets.py: added "council" toolset definition + "council_query" to _HERMES_CORE_TOOLS (+7 lines)

Key design decisions
- openai library (already a dependency). Works with OpenRouter, NousResearch API, or any OpenAI-compatible endpoint.
- check_fn returns True only if OPENROUTER_API_KEY, OPENAI_API_KEY, or NOUS_API_KEY is set.
- ~/.hermes/config.yaml.

Test plan
- tests/tools/test_council.py)
- tests/environments/test_ouroboros.py)
- council_query included in _HERMES_CORE_TOOLS
- hermes --toolsets council,web with council_query
- python environments/ouroboros_env.py serve with Atropos
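
The availability check mentioned under Key design decisions (check_fn returns True only if one of three API keys is set) might look something like the sketch below. The environment variable names come from the PR description; the optional `env` parameter and the choice to treat an empty string as unset are assumptions made here for testability, and the real signature may differ.

```python
import os
from typing import Mapping, Optional

# Variable names as listed in the PR description.
COUNCIL_API_KEY_VARS = ("OPENROUTER_API_KEY", "OPENAI_API_KEY", "NOUS_API_KEY")


def check_fn(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only if at least one supported API key is set and non-empty."""
    # The env parameter is hypothetical; it lets tests pass a plain dict
    # instead of mutating the real process environment.
    env = os.environ if env is None else env
    return any(env.get(name) for name in COUNCIL_API_KEY_VARS)
```

For example, `check_fn({"NOUS_API_KEY": "abc"})` evaluates to True, while `check_fn({})` evaluates to False.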