A practical implementation of the "Multi-Agent Debate" architecture to reduce hallucinations and improve LLM factuality through iterative consensus.
AI Consensus Validator is a research and verification tool that subjects Large Language Model (LLM) responses to peer scrutiny. Instead of relying on a single output, this system orchestrates an autonomous debate among different models (such as GPT-4, Claude 3.5, and Gemini) until they reach a consensus validated by an impartial judge.
This project is an applied implementation of the concepts presented in the academic paper: 📄 "Improving Factuality and Reasoning in Language Models through Multiagent Debate" (Yilun Du et al., MIT/Google).
- Multi-Agent Architecture: Graph-based orchestration using LangGraph.
- Model Agnostic: Native support for:
- 🟢 OpenAI (GPT-4o, GPT-3.5)
- 🟠 Anthropic (Claude 3.5 Sonnet)
- 🔵 Google (Gemini 2.5 Pro/Flash)
- ⚡ Groq (Llama 4, Qwen 3, Kimi K2)
- Feedback Loop: Models view their rivals' responses, critique them, and self-correct errors in real-time.
- Dynamic Judge: You can configure which specific model acts as the arbiter of truth.
- Modern GUI: UI built with Streamlit to visualize the step-by-step thinking process.
- State Management: Persistent chat history and secure API Key configuration via the interface.
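The model-agnostic routing described above can be sketched as a simple dispatch table. Everything below is illustrative: the `call_*` helpers stand in for the real provider SDK clients, and the model names are just examples, not the project's actual API.

```python
# Minimal sketch of model-agnostic client routing (hypothetical names).
# In the real app, graph_validator.py routes each model to its provider SDK;
# here the call_* helpers are placeholders that echo instead of calling an API.

def call_openai(model: str, prompt: str) -> str:
    return f"[openai:{model}] answer to: {prompt}"  # stand-in for a real API call

def call_anthropic(model: str, prompt: str) -> str:
    return f"[anthropic:{model}] answer to: {prompt}"

# Dispatch table mapping model identifiers to their provider client.
PROVIDERS = {
    "gpt-4o": call_openai,
    "claude-3-5-sonnet": call_anthropic,
}

def route(model: str, prompt: str) -> str:
    """Dispatch a prompt to the client registered for the given model."""
    try:
        return PROVIDERS[model](model, prompt)
    except KeyError:
        raise ValueError(f"No client configured for model '{model}'")
```

Adding a new provider then amounts to registering one more entry in the dispatch table.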
The system utilizes a cyclic state graph (StateGraph) following this logical flow:
```mermaid
graph TD
    A[Start: User Question] --> B["Parallel Generation (Debaters)"]
    B --> C[Judge Evaluates Responses]
    C --> D{Consensus?}
    D -- Yes --> E["✅ Final Validated Answer"]
    D -- "No (Discrepancy)" --> F[Inject Cross-Feedback]
    F --> B
```
- Round 0: Selected models answer the user's question independently.
- Judgment: The Judge analyzes semantic and factual coherence between answers.
- Debate: If discrepancies exist, the system injects the other models' answers into each agent's context.
- Refinement: Models reconsider their answers based on peer critique.
- Iteration: The process repeats until consensus is reached or the round limit is hit.
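The five steps above amount to a bounded consensus loop. A minimal sketch of that control flow, using plain stub functions in place of real LLM calls and of the LangGraph wiring (the function names are illustrative, not the project's actual API):

```python
from typing import Callable

def debate(question: str,
           debaters: dict[str, Callable[[str], str]],
           judge: Callable[[dict[str, str]], bool],
           max_rounds: int = 3) -> dict[str, str]:
    """Run the generate -> judge -> cross-feedback loop until consensus
    is reached or the round limit is hit."""
    # Round 0: every model starts from the bare question.
    prompts = {name: question for name in debaters}
    answers: dict[str, str] = {}
    for _ in range(max_rounds + 1):
        # Generation: each model answers its current prompt independently.
        answers = {name: model(prompts[name]) for name, model in debaters.items()}
        # Judgment: the judge decides whether the answers agree.
        if judge(answers):
            return answers
        # Debate: inject every rival's answer into each agent's next prompt.
        for name in debaters:
            rivals = "\n".join(f"{n}: {a}" for n, a in answers.items() if n != name)
            prompts[name] = (f"{question}\n\nOther agents answered:\n{rivals}\n"
                             "Reconsider and refine your answer.")
    return answers  # round limit hit without consensus
```

With two stub agents where one self-corrects after seeing its rival's answer, the loop converges in a single debate round; the real system replaces the stubs with API calls and the boolean judge with an LLM arbiter.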
- Clone the repository:
  git clone https://github.com/your-username/ai-consensus-validator.git
  cd ai-consensus-validator
- Create a virtual environment (recommended):
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install dependencies:
  pip install -r requirements.txt
- Run the Streamlit application:
  streamlit run app.py
- Access the interface: open your browser at http://localhost:8501.
- App Configuration:
- Step 1: Enter your API Keys in the sidebar (OpenAI, Anthropic, Google, Groq).
- Step 2: Select at least 2 models to participate in the debate.
- Step 3: Select 1 model to act as the Judge/Validator.
- Step 4: Enter your question in the chat input and watch the debate unfold step-by-step.
ai-consensus-validator/
├── app.py # Frontend (Streamlit UI & Session Management)
├── graph_validator.py # Graph Logic (LangGraph, Nodes & Client Routing)
├── requirements.txt # Python Dependencies
└── README.md # Documentation
Contributions are welcome! If you have ideas to improve the judge's logic, add new model providers, or enhance the UI:
- Fork the project.
- Create a branch (git checkout -b feature/AmazingFeature).
- Commit your changes (git commit -m 'Add some AmazingFeature').
- Push to the branch (git push origin feature/AmazingFeature).
- Open a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Note: This project uses commercial APIs. Be sure to check the costs associated with using GPT-4, Claude, and other models during intensive debate loops (as each round generates multiple API calls).