A reinforcement learning framework for developing and evaluating Gomoku (Five in a Row) agents on a 9×9 board.
```shell
pip install -e .
```

Requirements: Python >= 3.11, NumPy, Pygame, PyTorch
```text
agents/
├── base_agent.py          # BaseAgent abstract class
├── random_agent.py        # Uniform-random baseline agent
├── threatening_agent.py   # Curriculum opponent: blocks threats at tunable probability
├── strategic_agent.py     # Curriculum opponent: full offence + defence heuristics
├── dqn_simple_jeson.py    # DQN agent (Simple CNN, production model)
└── dqn_jeson.py           # DQN agent (Residual CNN, experimental architecture)
game/
├── logic.py               # Core Gomoku rules (default board size: 9×9)
├── gomoku_env.py          # RL environment wrapper (sparse + shaped rewards)
├── board.py               # Pygame visualisation (window: 650×550)
├── match.py               # Headless evaluation utility
└── threat_detector.py     # Standalone threat-detection helpers
train_sparse_jeson.py      # Stage 1: baseline training vs RandomAgent (sparse rewards)
train_phase1_shaped.py     # Stage 2: shaped-reward fine-tuning vs RandomAgent
train_phase2_continue.py   # Stage 3: curriculum vs ThreateningAgent
train_phase2_selfplay.py   # Stage 3 (alt): self-play training
train_phase3_mixed.py      # Stage 4: mixed-opponent curriculum
train_phase4_threeway.py   # Stage 5: three-way curriculum (Random + Threatening + Strategic)
evaluate_baseline.py       # Evaluate agent win-rate against baselines
evaluate_threatening.py    # Evaluate agent vs ThreateningAgent at various skill levels
test_agent.py              # Quick sanity-check script
progress_log.md            # Detailed training notes and results per stage
main.py                    # Entry point (visual Pygame mode or headless evaluation)
```
The model that achieved 95–100% win rate vs RandomAgent and strong performance across curriculum stages.
```text
Input: (batch, 3, 9, 9)
  Channel 0: Agent's own pieces
  Channel 1: Opponent's pieces
  Channel 2: Constant plane = player ID (+1 or -1)

Conv2D(3 → 64, kernel=3, pad=1) + BatchNorm + ReLU
Conv2D(64 → 128, kernel=3, pad=1) + BatchNorm + ReLU
Conv2D(128 → 128, kernel=3, pad=1) + BatchNorm + ReLU
Flatten → FC(128×9×9 → 512) + BatchNorm + ReLU
FC(512 → 81) ← Q-value for each board cell
```
Training algorithm: Double DQN
- Online network selects actions; target network evaluates them
- Replay buffer: 100,000 experiences
- Optimizer: Adam (lr = 1e-4), gamma = 0.99
- Gradient clipping: max norm 1.0
- Target network sync: every 1,000 steps
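The architecture and the Double DQN update described above can be sketched in PyTorch. Layer sizes follow the listing, but the class and function names below are illustrative, not the project's actual code.

```python
import torch
import torch.nn as nn

class SimpleDQN(nn.Module):
    """Sketch of the Simple CNN listed above (illustrative name)."""

    def __init__(self, board_size: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.Conv2d(128, 128, kernel_size=3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * board_size * board_size, 512), nn.BatchNorm1d(512), nn.ReLU(),
            nn.Linear(512, board_size * board_size),  # one Q-value per cell
        )

    def forward(self, x):
        return self.head(self.features(x))

def double_dqn_target(online, target, next_states, rewards, dones, gamma=0.99):
    """Double DQN: the online net selects the argmax action, the target net evaluates it."""
    with torch.no_grad():
        best = online(next_states).argmax(dim=1, keepdim=True)
        q_next = target(next_states).gather(1, best).squeeze(1)
        return rewards + gamma * q_next * (1.0 - dones)
```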
Deeper architecture using residual blocks, developed as an alternative for stronger strategic play.
```text
Input: (batch, 3, board_size, board_size)

Conv2D(3 → 128, kernel=3, pad=1) + ReLU
5 × ResidualBlock(128)
└─ Conv2D(128→128) + ReLU → Conv2D(128→128) + skip connection
Conv2D(128 → 32, kernel=1) + ReLU
Flatten → FC(32×9×9 → 81) ← Q-values
```
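The residual block outlined above can be sketched as follows; the class name is illustrative.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with a skip connection, as in the diagram above."""

    def __init__(self, channels: int = 128):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        out = torch.relu(self.conv1(x))
        out = self.conv2(out)
        return out + x  # skip connection preserves the input signal
```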
| Agent | File | Strategy |
|---|---|---|
| RandomAgent | `agents/random_agent.py` | Uniform-random valid moves |
| ThreateningAgent | `agents/threatening_agent.py` | Blocks 4-in-a-row with configurable probability |
| StrategicAgent | `agents/strategic_agent.py` | Wins, blocks, extends sequences, uses opening patterns |
Parameterised by `block_probability` (0.0–1.0). Only detects and blocks immediate 4-in-a-row threats. Designed for gradual curriculum learning.
```python
from agents.threatening_agent import ThreateningAgent

opp = ThreateningAgent(player_id=-1, block_probability=0.5, board_size=9)
```
Priority: win immediately → block opponent win → extend 4-in-a-row → block 4-in-a-row → extend 3-in-a-row → opening pattern → random fallback. Parameterised by `skill_level` (0.0 = random, 1.0 = always strategic).
```python
from agents.strategic_agent import StrategicAgent

opp = StrategicAgent(player_id=-1, skill_level=0.8, board_size=9)
```
Training progressed through multiple stages. All models target a 9×9 board with a 5-in-a-row win condition.
- Rewards: +1 win, -1 loss, 0 draw/ongoing
- Episodes: ~20,000
- Epsilon: 1.0 → 0.02
- Result: 95–97% win rate vs RandomAgent
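The 1.0 → 0.02 epsilon decay can be written as a simple schedule. The linear shape and the step horizon are assumptions; only the endpoints come from the notes above.

```python
def epsilon(step: int, total_steps: int, start: float = 1.0, end: float = 0.02) -> float:
    """Linearly anneal exploration from `start` down to `end` over `total_steps`."""
    frac = min(step / max(total_steps, 1), 1.0)  # clamp to [0, 1]
    return start + frac * (end - start)
```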
Fine-tunes the Stage 1 model with intermediate rewards:
- Created 3-in-a-row: +0.15
- Created 4-in-a-row: +0.40
- Blocked opponent 3-in-a-row: +0.10
- Blocked opponent 4-in-a-row: +0.30
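These shaped rewards can be sketched as a difference in threat counts between consecutive board states. The reward values match the list above, but the line-counting helper is a hypothetical stand-in for the project's `_evaluate_threat_value`, and the blocking rewards are omitted for brevity.

```python
import numpy as np

def count_lines(board: np.ndarray, player: int, length: int) -> int:
    """Count windows of `length` consecutive stones belonging to `player`."""
    h, w = board.shape
    total = 0
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):  # four line directions
        for r in range(h):
            for c in range(w):
                cells = [(r + i * dr, c + i * dc) for i in range(length)]
                if all(0 <= rr < h and 0 <= cc < w and board[rr, cc] == player
                       for rr, cc in cells):
                    total += 1
    return total

def shaped_reward(before: np.ndarray, after: np.ndarray, player: int) -> float:
    """Reward newly created 3- and 4-in-a-row threats (values from the list above)."""
    reward = 0.0
    if count_lines(after, player, 3) > count_lines(before, player, 3):
        reward += 0.15  # created a 3-in-a-row
    if count_lines(after, player, 4) > count_lines(before, player, 4):
        reward += 0.40  # created a 4-in-a-row
    return reward
```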
- Gradually increases the opponent's `block_probability`
- Also includes a self-play variant for diversity
- Mix of RandomAgent and ThreateningAgent at varying skill levels
- Prevents over-fitting to a single opponent type
Final stage using RandomAgent, ThreateningAgent, and StrategicAgent simultaneously. Best model saved to `models_phase4_v2/phase4_best_strategic.pt`.
`game/gomoku_env.py` wraps `GomokuLogic` with a standard RL interface.
```python
from game.logic import GomokuLogic
from game.gomoku_env import GomokuEnv

env = GomokuEnv(GomokuLogic(board_size=9), use_sparse_rewards=True)
state = env.reset()
next_state, reward, done, info = env.step((row, col))
```
- `use_sparse_rewards=True` (default): only terminal rewards (±1).
- `use_sparse_rewards=False`: adds shaped intermediate rewards via `_evaluate_threat_value` and `_evaluate_blocking_move`.
```shell
python main.py
```
This launches a Pygame window where you, as the human player, face the trained DQN agent loaded from `models_phase4_v2/phase4_best_strategic.pt`.
```shell
python main.py --headless
```
Runs 100 games between two agents and prints win/loss/draw statistics.
```shell
python train_sparse_jeson.py     # Stage 1: baseline
python train_phase1_shaped.py    # Stage 2: shaped rewards
python train_phase2_continue.py  # Stage 3: curriculum
python train_phase3_mixed.py     # Stage 4: mixed
python train_phase4_threeway.py  # Stage 5: three-way curriculum
```
```shell
python evaluate_baseline.py      # Win rate vs RandomAgent
python evaluate_threatening.py   # Win rate vs ThreateningAgent
python test_agent.py             # Quick sanity check
```
Create `agents/my_agent.py` and inherit from `BaseAgent`:
```python
from agents.base_agent import BaseAgent
import numpy as np

class MyAgent(BaseAgent):
    def __init__(self, player_id):
        super().__init__(player_id)

    def predict(self, board_state):
        """
        Return the next move as a (row, col) tuple.

        board_state: numpy array (9×9)
             1 = your pieces
            -1 = opponent pieces
             0 = empty cells
        """
        valid_moves = list(zip(*np.where(board_state == 0)))
        return valid_moves[0]  # Replace with your logic
```
| Class | Location | Description |
|---|---|---|
| `BaseAgent` | `agents/base_agent.py` | Abstract base; implement `predict(board_state)` |
| `GomokuLogic` | `game/logic.py` | Game rules, `make_move()`, win detection |
| `GomokuEnv` | `game/gomoku_env.py` | RL env: `reset()`, `step(action)` |
| `DQNAgent` (simple) | `agents/dqn_simple_jeson.py` | Production DQN agent |
| `DQNAgent` (residual) | `agents/dqn_jeson.py` | Experimental deeper DQN agent |
| `eval_agents()` | `game/match.py` | Headless evaluation; alternates first move |
| Area | Change |
|---|---|
| Board size | Default changed from 15×15 to 9×9 |
| Window size | Pygame window reduced from 900×700 to 650×550 |
| `GomokuEnv` | Added shaped-reward methods and the `use_sparse_rewards` flag |
| `main.py` | Now loads the trained DQN agent for human-vs-AI play |
| `.gitignore` | Added `*.pt`, model directories, archive, and dev artefacts |
| New agents | `dqn_jeson.py`, `dqn_simple_jeson.py`, `threatening_agent.py`, `strategic_agent.py` |
| New scripts | Full training pipeline (5 stages) + evaluation scripts |