
add: LLM Prisoner's Dilemma — when agents reason about trust, not just strategy#378

Open
abhinavk0220 wants to merge 11 commits into mesa:main from abhinavk0220:add/llm-prisoners-dilemma

Conversation

@abhinavk0220

@abhinavk0220 abhinavk0220 commented Mar 12, 2026

Summary

Adds an iterated Prisoner's Dilemma simulation where agents use LLM
Chain-of-Thought reasoning
to decide whether to cooperate or defect
each round — instead of following fixed strategies like tit-for-tat,
always-defect, or random.


How this differs from classical Prisoner's Dilemma models

Classical PD models hardcode strategies — agents follow predefined
behavioral rules (tit-for-tat, always-defect, Pavlov) based on parameters.

This model takes a fundamentally different approach: agents have
no fixed strategy. Instead they use LLM Chain-of-Thought reasoning
to decide whether to cooperate or defect at each step based on their
interaction history, partner behavior, and long-term payoff reasoning.

The distinction matters because:

  • Fixed-strategy models are faster and analytically tractable
  • LLM-reasoning models produce emergent strategic behavior
    that fixed rules cannot capture
  • Both approaches are valid — they answer different research questions

What makes this different from a classical PD model

In a standard iterated PD model, strategies are fixed at initialization.
Here, agents reason about their situation at each step — observing partner
history, weighing trust against exploitation risk — and choose an action:

  • cooperate — Work together for mutual benefit (3, 3 payoff)
  • defect — Betray partner for personal gain (5, 0 payoff)

Payoff matrix:

                Partner Cooperates   Partner Defects
You Cooperate   3, 3                 0, 5
You Defect      5, 0                 1, 1

This produces emergent negotiation dynamics — reputation building,
trust signaling, strategic retaliation — that fixed rules cannot replicate.
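The payoff matrix above can be expressed as a small lookup table. This is a minimal sketch; the names are illustrative, not the PR's actual identifiers:

```python
# Payoff matrix from the README: (my score, partner score) per action pair.
# Illustrative sketch only — not the identifiers used in the PR's code.
PAYOFFS = {
    ("cooperate", "cooperate"): (3, 3),
    ("cooperate", "defect"): (0, 5),
    ("defect", "cooperate"): (5, 0),
    ("defect", "defect"): (1, 1),
}

def payoff(my_action: str, partner_action: str) -> int:
    """Return my score for one round, given both actions."""
    return PAYOFFS[(my_action, partner_action)][0]
```

Note the dilemma structure: defecting strictly dominates in a single round (5 > 3 and 1 > 0), yet mutual cooperation (3, 3) beats mutual defection (1, 1).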


Visualization — Emergent Game Theory from Pure LLM Reasoning

Round 1 — First cooperation attempt:

Round 1

After 5 rounds of LLM-driven reasoning:

Step 5 — full arc

What the charts show — round by round:

Round   Cooperation Rate   What happened
1       0.5                One agent tried cooperation to build trust; the other exploited it
2       0.5                Cooperating agent gave a second chance; exploiter defected again
3       0.0                Exploited agent switched to permanent defection — "I tried twice, got burned twice"
4–5     0.0                Stable mutual defection — Nash equilibrium lock-in
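The cooperation rate tracked in the table is simply the fraction of agents that chose to cooperate in a given round. A minimal sketch, not the PR's actual code:

```python
def cooperation_rate(actions: list[str]) -> float:
    """Fraction of agents whose action this round was 'cooperate'."""
    return sum(a == "cooperate" for a in actions) / len(actions)
```

With two agents, one cooperating and one defecting, this yields the 0.5 seen in rounds 1–2; once both defect, it drops to 0.0.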

Why this is significant: This reproduces the core result of Axelrod's
The Evolution of Cooperation (1984) with zero hardcoded strategy. No
tit-for-tat rule, no punishment parameter, no threshold. The LLM reasoned
its way to this behavior by reflecting on interaction history at each step.


How to Run

cp .env.example .env  # fill in your API key
pip install -r requirements.txt
solara run app.py

Supported LLM Providers

Gemini, OpenAI, Anthropic, Ollama (local) — configured via .env.
A .env.example is included for easy setup.


Reference

Axelrod, R. (1984). The Evolution of Cooperation. Basic Books.


GSoC contributor checklist

Context & motivation

Classical PD models use fixed strategies. Real strategic behavior is
driven by reasoning — agents assess trust, weigh past interactions, and
decide whether to cooperate or retaliate. This model replaces fixed
strategies with LLM reasoning to produce more behaviorally realistic
game dynamics.

What I learned

LLM agents independently converge to the Nash equilibrium in early
rounds (defect when no trust exists) and develop punishment behavior
after exploitation — without being told to. The emergent tit-for-tat
pattern from Axelrod's foundational work appears naturally from
language-based reasoning, not from a hardcoded rule.

Learning repo

🔗 My learning repo: https://github.com/abhinavk0220/GSoC-learning-space
🔗 Relevant model(s): https://github.com/abhinavk0220/GSoC-learning-space/tree/main/models/llm_prisoners_dilemma

Readiness checks

  • This PR addresses an agreed-upon problem (linked issue or discussion with maintainer approval), or is a small/trivial fix
  • I have read the contributing guide and deprecation policy
  • I have performed a self-review: I reviewed my own PR
  • Another GSoC contributor has reviewed this PR
  • Tests pass locally (pytest --cov=mesa tests/)
  • Code is formatted (ruff check . --fix)
  • If applicable: documentation, examples, and/or migration guide are updated

AI Assistance Disclosure

This PR was developed with AI assistance (Claude) for code generation
and debugging. All code has been reviewed, tested, and understood by
the contributor.


Mesa Examples Review Checklist (#390)

Does it belong?

  • No significant overlap with existing examples
  • Well-scoped simplest model that demonstrates the idea
  • Showcases Mesa features not already well-covered
  • Showcases interesting ABM mechanics (LLM reasoning vs fixed strategies)

Is it correct?

  • Uses current Mesa APIs (DataCollector, Model)
  • Runs and visualizes out of the box (requires API key — see .env.example)
  • Deterministic with fixed rng seed — LLM outputs are non-deterministic by nature
  • Moved to llm/ directory

Is it clean?

  • No dead code or unused imports
  • Clear naming, logic readable
  • README explains what it does, what it demonstrates, how to run it
  • PR follows template, commits reasonably clean

abhinavKumar0206 and others added 2 commits March 12, 2026 10:40
…terated Prisoner's Dilemma simulation where agents use LLM Chain-of-Thought reasoning to decide whether to cooperate or defect each round, instead of fixed strategies like tit-for-tat or always-defect. Agents reason about partner history, trust signals, and long-term payoff before deciding — producing emergent negotiation and trust-building behavior that fixed-strategy models cannot capture. Payoff matrix: both cooperate: 3, 3; defect vs cooperate: 5, 0; both defect: 1, 1. Visualization tracks cooperation rate and average score over rounds. Includes .env.example for Gemini, OpenAI, Anthropic, and Ollama. Reference: Axelrod, R. (1984). The Evolution of Cooperation.
@AdityaChauhanX07

Nice concept, Prisoner's Dilemma with LLM reasoning makes a lot of sense.
I went through the code and noticed a couple things:

Decisions are hardcoded in model.py - on lines 91-92, both actions are just set to "cooperate":

action1 = "cooperate"
action2 = "cooperate"

_parse_action() exists in the agent and there's a step_prompt ready to go, but model.step() never calls reasoning.plan() or _parse_action(). So the LLM never actually runs during the game. Every round both agents just cooperate no matter what, which means the trust/betrayal/reputation dynamics from the description can't happen yet.

llm_model not wired to agents - the model takes llm_model as a param and there's a selector in the UI, but PrisonerAgent.__init__ doesn't accept llm_model. It just uses the LLMAgent default (gemini/gemini-2.0-flash). So switching models in the UI doesn't do anything.

The agent design looks good though - payoff matrix, _parse_action(), apply_decision(), internal state tracking are all clean. Just needs the reasoning hooked up in model.step().

@EwoutH
Member

EwoutH commented Mar 15, 2026

Thanks for the effort. Could you:

  • Add a screenshot of the visualisation
  • target a new llm directory (similar to how GIS has its own directory)

@abhinavk0220
Author

> Thanks for the effort. Could you:
>
>   • Add a screenshot of the visualisation
>   • target a new llm directory (similar to how GIS has its own directory)

Thanks for the review @EwoutH! Will add the screenshot and move
to the llm/ directory. Will push the update tonight.

@abhinavk0220
Author

> Nice concept, Prisoner's Dilemma with LLM reasoning makes a lot of sense. I went through the code and noticed a couple things:
>
> Decisions are hardcoded in model.py - on lines 91-92, both actions are just set to "cooperate":
>
>     action1 = "cooperate"
>     action2 = "cooperate"
>
> _parse_action() exists in the agent and there's a step_prompt ready to go, but model.step() never calls reasoning.plan() or _parse_action(). So the LLM never actually runs during the game. Every round both agents just cooperate no matter what, which means the trust/betrayal/reputation dynamics from the description can't happen yet.
>
> llm_model not wired to agents - the model takes llm_model as a param and there's a selector in the UI, but PrisonerAgent.__init__ doesn't accept llm_model. It just uses the LLMAgent default (gemini/gemini-2.0-flash). So switching models in the UI doesn't do anything.
>
> The agent design looks good though - payoff matrix, _parse_action(), apply_decision(), internal state tracking are all clean. Just needs the reasoning hooked up in model.step().

Thanks @AdityaChauhanX07 for the thorough review! All three issues are now fixed:

  1. model.step() now calls agent.reasoning.plan() and
     agent._parse_action(), so LLM reasoning actually runs each round
  2. PrisonerAgent.__init__ now accepts an llm_model parameter
  3. llm_model is properly wired from PrisonersDilemmaModel to
     PrisonerAgent

Glad the agent design looked clean; it just needed the reasoning hooked
up in model.step(), as you pointed out. The model now actually produces
the trust and reputation dynamics described. Appreciate the careful read!

@abhinavk0220
Author

> Thanks for the effort. Could you:
>
>   • Add a screenshot of the visualisation
>   • target a new llm directory (similar to how GIS has its own directory)

Hi @EwoutH! Both requests addressed:

  1. Screenshot: added prisoners_dilemma_dashboard.png showing
     the Solara visualization with cooperation rate and defection
     tracking charts, the LLM model selector, and agent count controls.

  2. llm/ directory: moved from examples/llm_prisoners_dilemma/
     to llm/llm_prisoners_dilemma/, following the same pattern as gis/.
     Will apply the same structure to the other three LLM example PRs
     (feat: add LLM Opinion Dynamics example using mesa-llm #360, feat: add LLM Schelling Segregation example using mesa-llm #363, add: LLM Epidemic (SIR) model using Chain-of-Thought reasoning #372) as well.

Also, peer review is complete. @AdityaChauhanX07 reviewed the PR and
caught that LLM reasoning wasn't actually being called in model.step()
(actions were hardcoded). That's now fixed: agents genuinely reason
using CoT before each round, and llm_model is properly wired to the
UI selector.
Feel free to remove the peer review label!

@EwoutH
Member

EwoutH commented Mar 15, 2026

Thanks!

Could you:

  • Add a screenshot or gif of the visualisation to the PR description
  • Request a peer-review (and maybe do one or two yourself)

@abhinavk0220
Author

abhinavk0220 commented Mar 16, 2026

> Thanks!
>
> Could you:
>
>   • Add a screenshot or gif of the visualisation to the PR description
>   • Request a peer-review (and maybe do one or two yourself)

Hi @EwoutH! Updates done:

Also did a self-review on the Files changed tab.

Ready to merge!

@EwoutH
Member

EwoutH commented Mar 16, 2026

Thanks both for your initial efforts. You’re on the right track.

However, it still needs a serious peer review and maintainer review. Showing how you handle this process is more important for GSoC than whether the PR is merged.

@abhinavk0220
Author

> Thanks both for your initial efforts. You're on the right track.
>
> However, it still needs a serious peer review and maintainer review. Showing how you handle this process is more important for GSoC than whether the PR is merged.

Thanks @EwoutH! Understood: the process matters more than the merge.

I'll get a more thorough peer review done on #378 using the #390
checklist. I've already reviewed #384 and plan to do one or two more
this week.

Also working on the proposal; excited about both Mesa-LLM Iteration
and Mesa Examples Revival.

@apfine

apfine commented Mar 17, 2026

@abhinavk0220
I will get back with a well thought review soon.

@ZhehaoZhao423

Hi @abhinavk0220,

Thank you for this impressive contribution! Moving the Prisoner's Dilemma from rigid, rule-based strategies to LLM-driven Chain-of-Thought reasoning is a fantastic showcase for the mesa-llm extension.

I've run the model locally, reviewed the code, and evaluated it strictly against the Mesa Examples #390 Review Guidelines. Here are my notes:

Does it belong?

Absolutely. It is well-scoped, doesn't unnecessarily overlap with existing examples, and perfectly demonstrates the emergent social dynamics (trust, betrayal, forgiveness) that happen when agents have memory and reasoning capabilities.

Is it correct and current?

The model visualizes correctly, but there are two critical vulnerabilities that need to be addressed to ensure both scientific correctness and UI stability.

  • Brittle Output Parsing (Logic Flaw): In agent._parse_action(), the logic checks for the substring:

    if "defect" in content:
        return "defect"

    Since LLMs are highly verbose, if an agent reasons: "I value our trust, therefore I will not defect today", your parser will incorrectly classify this as a defection, breaking the core game theory mechanics.
    Suggestion: Make the parsing robust. You could prompt the LLM to output a strict JSON format (e.g., {"action": "cooperate"}) or mandate a specific ending phrase like <ACTION>: COOPERATE and use regex to extract it.

  • Sequential IO & Simultaneity ("Dirty Read" Risk): In model.step(), you iterate through pairs, calling plan() and applying decisions sequentially. This creates two major issues:

    1. Performance: For 6 agents, that's 6 synchronous network calls per step, causing the Solara UI to completely freeze.
    2. Scientific Correctness: The Prisoner's Dilemma relies on simultaneous decision-making. By calculating plans and applying decisions in the same loop, you risk "Dirty Reads" in strict ABM—where later pairs in the loop might theoretically observe mid-step state changes caused by earlier pairs.

    Suggestion: You should separate the "thinking" phase from the "acting" phase by executing API calls concurrently before applying any decisions. ThreadPoolExecutor is the safest, most standard way to do this without disrupting Solara's existing event loop.

    Here is a quick refactor for your step() function that solves both the latency and the simultaneity issue:

    from concurrent.futures import ThreadPoolExecutor
    
    def get_agent_action(agent):
        """Helper to fetch LLM response for a single agent."""
        agent._update_internal_state()
        plan = agent.reasoning_instance.plan(obs=agent.generate_obs())
        content = str(getattr(plan.llm_plan, "content", plan.llm_plan))
        return agent, agent._parse_action(content)
    
    # Inside model.step(self):
    pairs = self._pair_agents()
    
    # 1. Fetch all LLM decisions concurrently (Massive speedup + Simultaneous reasoning)
    with ThreadPoolExecutor(max_workers=self.num_agents) as executor:
        # Returns a dict of {agent_instance: "cooperate"/"defect"}
        agent_actions = dict(executor.map(get_agent_action, self.agents))
        
    # 2. Apply outcomes simultaneously (Prevents Dirty Reads)
    for agent1, agent2 in pairs:
        action1, action2 = agent_actions[agent1], agent_actions[agent2]
        agent1.apply_decision(action1, action2)
        agent2.apply_decision(action2, action1)
        # ... update your model counters here ...
  • Mesa Time Tracking: You initialized a custom self.round_number = 0.
    Suggestion: In Mesa, it is best practice to use the built-in self.steps (or Mesa 4.0's model.time) rather than maintaining a custom counter, as modern DataCollectors rely on standard time attributes.

Is it clean?

The code is mostly very readable, but the developer experience could be slightly improved:

  • Separation of Concerns: SYSTEM_PROMPT is currently a large global string hardcoded inside agent.py.
    Suggestion: To keep the agent logic clean and make it easier for researchers to "tune" the agent's personality, consider moving this string into a separate prompts.py file or a YAML config.

This is a really exciting example! Once the concurrent API calling and the parsing logic are hardened, this will be an excellent template for future LLM-based ABMs. Great work so far!

abhinavKumar0206 and others added 2 commits March 18, 2026 22:37
- Robust action parsing: replace brittle substring match with regex on
  mandatory <ACTION>: COOPERATE/DEFECT tag in system prompt, preventing
  false defection when LLM says "I will not defect"
- Concurrent LLM calls: use ThreadPoolExecutor so all agents reason in
  parallel before any outcome is applied, fixing Solara UI freeze and
  ensuring true simultaneity (no dirty reads between pairs)
- Mesa time: replace custom round_number with super().step() so model
  uses Mesa's built-in time tracking via model.time

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@abhinavk0220
Author

Thanks @ZhehaoZhao423 for the thorough review — all three points addressed in the latest commit:

1. Brittle output parsing — fixed
Updated _parse_action() to use regex matching on a mandatory <ACTION>: COOPERATE / <ACTION>: DEFECT tag that the system prompt now explicitly requires at the end of every LLM response. Reasoning like "I will not defect" will no longer trigger a false defection.

2. Sequential IO & simultaneity — fixed
Refactored step() into two explicit phases:

  • Phase 1: All agents call the LLM concurrently via ThreadPoolExecutor, so all decisions are locked in before anyone acts. No dirty reads, no UI freeze.
  • Phase 2: Outcomes are applied simultaneously using the precomputed agent_actions dict.

3. Mesa time tracking — fixed
Removed custom round_number counter and replaced with super().step() so the model uses Mesa's built-in model.time.
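The regex-based parsing described in fix 1 could look roughly like this. A hypothetical sketch assuming the `<ACTION>:` tag convention from the review, not the PR's actual implementation:

```python
import re

# Match a mandatory "<ACTION>: COOPERATE" / "<ACTION>: DEFECT" tag that the
# system prompt requires at the end of every LLM response. (Hypothetical
# helper names — the PR's real function is agent._parse_action().)
ACTION_RE = re.compile(r"<ACTION>:\s*(COOPERATE|DEFECT)", re.IGNORECASE)

def parse_action(content: str, default: str = "cooperate") -> str:
    """Extract the declared action; fall back to a default if the tag is missing."""
    match = ACTION_RE.search(content)
    return match.group(1).lower() if match else default
```

The key property is that free-form reasoning like "I will not defect today" can no longer trigger a false defection, because only the explicit tag is consulted.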

@apfine

apfine commented Mar 19, 2026

@abhinavk0220
I analyzed the pull request over a couple of days and here is my review:

Summary: What I think about this

This is a valuable and interesting addition to the Mesa Examples repository. By using a language model to drive Chain-of-Thought reasoning instead of fixed game-theoretic rules, the model captures the complex parts of social interaction, like forgiveness, trust, and reputation. The code has improved considerably through the review process and is now a robust simulation that correctly handles multiple agents making decisions at the same time.

Details I saw

  • Concurrent Execution for Simultaneity: The change to using ThreadPoolExecutor for the large language model API calls is a great idea. It stops the Solara UI from freezing and ensures the Prisoner's Dilemma is simulated correctly by preventing agents from seeing mid-step changes made by other agents.

  • Robust Output Parsing: Changing from substring matching to regex parsing with an ACTION tag is a good fix. It stops the model from breaking when the large language model gives a verbose response.

  • Mesa Standard Alignment: Using the built-in Mesa time attributes instead of a custom round counter makes the code work seamlessly with DataCollectors.

  • Clean UI Integration: The Solara dashboard surfaces the key parts of the simulation, making it easy to see things like cooperation rates and collective welfare.

Improvements that can be done

  • Inline Documentation: There are no comments in the code right now. Adding comments to explain the ThreadPoolExecutor implementation and the prompt-parsing logic will be important for contributors to understand the model.

  • Prompt Abstraction: The SYSTEM_PROMPT is embedded in the agent logic. Moving it to a file would make it easier for researchers to change the personalities or behaviors of the agents.

  • Token/Cost Tracking: The large language model calls can become expensive. Adding a token counter or cost tracker would help users manage their API budgets when scaling up the simulation.

Design questions

  • Memory Decay: As the simulation runs, the internal context history will keep growing. Should we add a limit to how many rounds are remembered, to mimic human memory limitations?

  • Cheap Talk Mechanics: Right now agents only see past actions. Would the simulation change if agents could communicate with each other before making their decisions?

@savirpatil

Does it belong?
This is a well-chosen domain for an LLM-agent example. The Prisoner's Dilemma is a classic and widely understood problem, which makes it a good method for demonstrating how LLM reasoning produces different dynamics from fixed strategies. The model is also well-scoped: no grid, no spatial complexity, just agents reasoning and interacting, which keeps the focus on the LLM behavior.

Is it correct and current?
The Mesa APIs used are current and correct. The two-phase step design, where all LLM decisions are gathered in parallel via ThreadPoolExecutor before any outcomes are applied, is a thoughtful implementation that correctly enforces simultaneous decision-making. One thing worth checking is that calling super().step() at the top of model.step() increments the Mesa step counter but the agents are never activated via Mesa's scheduler, which may cause minor inconsistencies in step tracking depending on how DataCollector uses the counter.

Is it clean?
The code is clean, well-named, and easy to follow. The README explains the game, the payoff matrix, the motivation for using LLMs, and how to run it, all concisely. The app.py is appropriately short. The only minor loose end is that the spatial component (make_mpl_space_component) is imported in app.py but the model has no grid, so agents have no positions to render; the unused space component could confuse readers if it was left in accidentally.

Summary
This is a well-executed example where the simultaneous-decision architecture is elegant, the structured action tag in the system prompt shows real care for reliability, and the README does an excellent job of explaining why LLM reasoning is more interesting here than fixed strategies. It is probably the most polished of the LLM examples submitted so far, and with the minor points above addressed it would be a great addition to the repo.

abhinavKumar0206 and others added 3 commits March 21, 2026 10:31
- Add prisoners_dilemma_dashboard.png (round 1: all defect — Nash equilibrium)
- Add prisoners_dilemma_initial.png (step 0 empty state)
- Expand README Visualization section: explain why 100% defection in
  round 1 is the correct game-theoretic outcome, not a bug

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Replace placeholder screenshot with 5-round run showing cooperation
collapse after exploitation — emergent Nash equilibrium lock-in with
no hardcoded strategy. Expand README with round-by-round analysis
table and connection to Axelrod (1984).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- prisoners_dilemma_initial.png: step 1 showing first cooperation attempt
- prisoners_dilemma_dashboard.png: step 5 showing full arc (coop peak → collapse)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@abhinavk0220
Author

abhinavk0220 commented Mar 21, 2026

Thanks @apfine for the detailed review and design questions. Addressing each point:

Inline Documentation: Fair point. The ThreadPoolExecutor block and the <ACTION> tag parsing are the two non-obvious pieces. Will add comments explaining both.

Prompt Abstraction: Acknowledged. Moving SYSTEM_PROMPT to prompts.py would make it cleaner for researchers who want to experiment with agent personalities. For this initial PR I kept it in agent.py for simplicity (one less file for contributors to trace), but I'm happy to move it if maintainers prefer.

Token/Cost Tracking: Good idea for a future enhancement. Not in scope for this PR, but worth noting in the README as a suggested extension.

Memory Decay: Already handled. ShortTermMemory(agent=agent, n=5, display=False) limits each agent's memory to 5 rounds. Older interactions are automatically dropped, which naturally models human memory limitations. The n=5 parameter can be tuned via the model constructor.

Cheap Talk (pre-decision communication): This is the most interesting design question. Mesa-LLM has a speak_to() tool in inbuilt_tools.py that lets agents send messages before acting. Adding it here would let agents signal intentions ("I'll cooperate if you do") and potentially sustain cooperation clusters. That's probably a follow-up example rather than a change to this one — it would shift the model from pure Prisoner's Dilemma to something closer to a negotiation game.

@abhinavk0220
Author

abhinavk0220 commented Mar 21, 2026

Thanks @savirpatil for the careful review. Two specific responses:

On super().step() and step tracking:

super().step() is the Mesa 4.x standard for incrementing model.time; it replaces the old custom self.round_number counter from an earlier version (removed after @ZhehaoZhao423's review). The agents aren't activated via Mesa's scheduler intentionally: this model doesn't use self.agents.do("step") because activation order matters in the Prisoner's Dilemma, where all agents must decide before anyone acts (the two-phase ThreadPoolExecutor design). The DataCollector uses model.time, which super().step() increments correctly. Happy to add a comment in the code clarifying this design choice.

On make_mpl_space_component:

That import was removed in an earlier commit; the current app.py only uses make_plot_component for the two chart panels (CoopPlot and ScorePlot). There's no grid in this model, so no space component. You may have reviewed a previous version of the file. The current state has no unused imports (ruff passes clean).

Appreciate the thorough read; this is exactly the kind of review that improves the model.
