## PR #378: LLM Prisoner's Dilemma: when agents reason about trust (#379)
abhinavk0220 started this conversation in Show and tell
The Prisoner's Dilemma has fascinated game theorists, economists,
and biologists for 70 years because it captures something fundamental:
individual rationality and collective optimality are often in conflict.
Classical ABM tackles this beautifully with fixed strategies:
tit-for-tat, always-defect, Pavlov. Axelrod's tournaments showed
that tit-for-tat wins in repeated games. Clean, elegant, reproducible.
But here's what fixed strategies can't do: change their mind based
on context.
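To make the contrast concrete, here is a minimal sketch of a classical fixed strategy. Tit-for-tat is a pure lookup on the partner's last move; nothing in the agent's state lets it reinterpret that move in context. (The function name and history representation are illustrative, not from PR #378.)

```python
def tit_for_tat(partner_history):
    """Classical fixed strategy: cooperate first, then mirror
    the partner's previous move. Returns 'C' or 'D'."""
    if not partner_history:       # first round: cooperate
        return "C"
    return partner_history[-1]    # otherwise copy the last move
```

However the game has unfolded, the same input always yields the same output; there is no mechanism for the strategy to "change its mind".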
In PR #378, agents use LLM Chain-of-Thought reasoning at each round.
They receive their full interaction history (score, last action,
partner's recent moves) and reason about what to do next. Not
lookup. Reasoning.
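A rough sketch of what that per-round step might look like, assuming the internal state described above. The prompt wording, function names, and `query_llm` callback are all hypothetical stand-ins for whatever PR #378 actually uses:

```python
def build_prompt(score, last_action, partner_moves):
    """Serialize the agent's internal state into a CoT prompt.
    Hypothetical format, not taken from PR #378."""
    return (
        "You are playing an iterated Prisoner's Dilemma.\n"
        f"Your current score: {score}\n"
        f"Your last action: {last_action}\n"
        f"Partner's recent moves: {', '.join(partner_moves)}\n"
        "Think step by step about your partner's likely strategy, "
        "then answer with exactly COOPERATE or DEFECT."
    )

def decide(score, last_action, partner_moves, query_llm):
    """One round of reasoning: prompt the LLM, parse its reply.
    `query_llm` is any callable mapping a prompt string to a reply."""
    reply = query_llm(build_prompt(score, last_action, partner_moves))
    # Conservative parse: default to cooperation on a malformed reply.
    return "DEFECT" if "DEFECT" in reply.upper() else "COOPERATE"
```

The key design point is that the decision is a function of the whole serialized context, not a table lookup on the last move.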
This means an agent can distinguish a partner who defected once
out of apparent caution from one who is systematically exploiting it.
It can choose to forgive. It can bluff. It can build reputation.
The cooperation-rate and average-score plots tell the story:
you can watch trust emerge or collapse in real time, driven purely
by how agents reason about each other.
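For reference, the cooperation-rate curve in such plots is typically just the fraction of agents choosing to cooperate at each round. A minimal sketch, assuming actions are logged as 'C'/'D' per agent per round (the data layout here is an assumption, not PR #378's actual format):

```python
def cooperation_rate(rounds):
    """rounds: list of per-round action lists, e.g.
    [['C', 'D', 'C'], ['C', 'C', 'C'], ...].
    Returns the fraction of 'C' actions in each round."""
    return [sum(a == "C" for a in acts) / len(acts) for acts in rounds]
```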
Would love feedback on the reasoning prompt design, specifically
whether the internal state passed to the LLM gives enough context
for meaningful strategic reasoning.
PR: #378