e-alphazero

This repository contains an implementation of Epistemic AlphaZero, a modification of AlphaZero that uses Epistemic Monte Carlo Tree Search (E-MCTS). We use JAX to make efficient use of GPU acceleration. Our framework is compatible with pgx environments, and we implement two new ones: DeepSea and Subleq (see src/envs/).
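DeepSea is a standard hard-exploration benchmark. The repository's own version in src/envs/ is a pgx (JAX) environment and may differ in details such as observation encoding and how actions are randomized. The sketch below is a plain-Python illustration of the deterministic bsuite-style dynamics only. `DeepSea`, `move_cost`, `right_action` and `rollout` are hypothetical names chosen for this example.

```python
import random


class DeepSea:
    """Deterministic bsuite-style DeepSea on a size x size grid, one row per step."""

    def __init__(self, size: int = 10, seed: int = 0, move_cost: float = 0.01):
        self.size = size
        self.move_cost = move_cost
        rng = random.Random(seed)
        # For every cell, pick at random which raw action (0 or 1) means "right",
        # so the agent cannot rely on repeating one fixed action.
        self.right_action = [[rng.randint(0, 1) for _ in range(size)] for _ in range(size)]
        self.reset()

    def reset(self):
        self.row, self.col = 0, 0
        return (self.row, self.col)

    def step(self, action: int):
        """Apply a raw action in {0, 1} and return (observation, reward, done)."""
        reward = 0.0
        if action == self.right_action[self.row][self.col]:
            # The last column is only reachable on the final row, so this
            # reward requires moving right on every step of the episode.
            if self.col == self.size - 1:
                reward += 1.0
            reward -= self.move_cost / self.size  # small penalty per right move
            self.col = min(self.col + 1, self.size - 1)
        else:
            self.col = max(self.col - 1, 0)
        self.row += 1
        done = self.row == self.size
        return (self.row, self.col), reward, done


def rollout(env, policy):
    """Play one episode; `policy` maps the environment to a raw action."""
    env.reset()
    total, done = 0.0, False
    while not done:
        _, reward, done = env.step(policy(env))
        total += reward
    return total
```

Always choosing the right-moving action returns 1 - 0.01 = 0.99, while never moving right returns 0. A uniformly random policy reaches the treasure with probability 2^-size, which is why DeepSea is used to test deep exploration.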

See also emctx, a fork of mctx which supports epistemic uncertainty propagation as described in the E-MCTS paper.
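As a rough illustration of what propagating epistemic uncertainty through a search tree can look like, here is a simplified plain-Python sketch. It is not the emctx API and not necessarily the exact E-MCTS formulation. It assumes that uncertainty is summarized as a variance, that rewards along the path are known exactly, and that one step of discounting (V ← r + γV) therefore scales the variance by γ². The optimistic bonus `beta * sqrt(variance)` added to a PUCT-style score is an illustrative choice. `Node`, `backup` and `select_child` are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float = 1.0
    visit_count: int = 0
    value_sum: float = 0.0
    variance_sum: float = 0.0  # running sum of backed-up epistemic variances
    children: dict = field(default_factory=dict)

    def value(self) -> float:
        return self.value_sum / self.visit_count if self.visit_count else 0.0

    def variance(self) -> float:
        return self.variance_sum / self.visit_count if self.visit_count else 0.0


def backup(path, rewards, leaf_value, leaf_variance, discount):
    """Propagate a leaf's value and variance from the leaf back to the root.

    path = [root, ..., leaf]; rewards[i] is the reward on the edge
    path[i] -> path[i + 1], assumed known exactly (zero variance).
    """
    assert len(rewards) == len(path) - 1
    value, variance = leaf_value, leaf_variance
    for i in range(len(path) - 1, -1, -1):
        node = path[i]
        node.visit_count += 1
        node.value_sum += value
        node.variance_sum += variance
        if i > 0:
            # One step of discounting: V <- r + gamma * V, so Var <- gamma^2 * Var.
            value = rewards[i - 1] + discount * value
            variance = discount ** 2 * variance


def select_child(node, c_puct, beta):
    """PUCT-style selection plus an optimistic bonus beta * std for uncertain children."""
    total = sum(child.visit_count for child in node.children.values())

    def score(child):
        # Simplification: the edge's Q-value is taken as the child's mean value,
        # ignoring the reward on the edge itself.
        exploit = child.value()
        explore = c_puct * child.prior * math.sqrt(total) / (1 + child.visit_count)
        return exploit + explore + beta * math.sqrt(child.variance())

    return max(node.children, key=lambda action: score(node.children[action]))
```

With `beta = 0` this reduces to ordinary PUCT selection. A positive `beta` steers the search toward children whose value estimates are still uncertain.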

Structure

The program in src/:
- The entry point is main.py.
- Self-play (i.e. environment interaction) is in selfplay.py.
- Replay buffer reanalysis is in reanalyze.py.
- Evaluation (i.e. determining strength) is in evaluate.py.
- Network training (i.e. policy and value improvement) is in train.py.
- Config options are in config.py, and the context created from them is in context.py.
- Custom environments are in envs/.
- Network architectures and hashing algorithms (for uncertainty estimation) are in network/.
- Scripts for submitting experiments and analysis are in scripts/.
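The README does not say which hashing scheme network/ uses for uncertainty estimation. One common approach, shown here purely as an assumption, is count-based novelty with SimHash-style locality-sensitive hashing: states are projected with a fixed random Gaussian matrix, the sign pattern becomes a bucket key, and uncertainty shrinks as 1/√(n+1) with the bucket's visit count n. `SimHashCounter` and its methods are hypothetical names; this plain-Python version only illustrates the idea.

```python
import math
import random
from collections import Counter


class SimHashCounter:
    """Count-based novelty estimate via random-projection (SimHash) codes."""

    def __init__(self, input_dim: int, num_bits: int = 16, seed: int = 0):
        rng = random.Random(seed)
        # Fixed Gaussian projection: num_bits rows, each of length input_dim.
        self.projection = [
            [rng.gauss(0.0, 1.0) for _ in range(input_dim)] for _ in range(num_bits)
        ]
        self.counts = Counter()

    def code(self, features):
        """Sign pattern of the projected features, used as a hashable bucket key."""
        return tuple(
            int(sum(w * x for w, x in zip(row, features)) >= 0.0)
            for row in self.projection
        )

    def update(self, features):
        """Record one visit to the bucket containing `features`."""
        self.counts[self.code(features)] += 1

    def uncertainty(self, features):
        # Unseen buckets give 1.0; frequently visited ones decay as 1/sqrt(n + 1).
        return 1.0 / math.sqrt(self.counts[self.code(features)] + 1)
```

Because only the signs of the projections matter, nearby feature vectors (and any positive rescaling of a vector) tend to share a bucket, so visit counts generalize across similar states.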

Usage

- Install Python.
- Install pipenv with pip install --user pipenv.
- Run pipenv install in this directory to install the required dependencies.
- Run pipenv run python src/main.py, optionally followed by configuration overrides given as space-separated parameter=value pairs.
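To illustrate the parameter=value convention, here is a minimal sketch of how such overrides can be parsed into a typed config. The real option names and parsing logic live in src/config.py and may differ. `Config`, its fields and `parse_overrides` are hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class Config:
    # Hypothetical options for illustration; the real ones are in src/config.py.
    env_name: str = "deep_sea"
    num_simulations: int = 64
    discount: float = 0.997
    use_uncertainty: bool = True


def _coerce(text: str, typ):
    """Convert a command-line string to the field's declared type."""
    if typ is bool:
        lowered = text.lower()
        if lowered in ("true", "1", "yes"):
            return True
        if lowered in ("false", "0", "no"):
            return False
        raise ValueError(f"not a boolean: {text!r}")
    return typ(text)


def parse_overrides(args):
    """Build a Config from space-separated parameter=value arguments."""
    types = {f.name: f.type for f in fields(Config)}
    overrides = {}
    for arg in args:
        key, sep, value = arg.partition("=")
        if not sep or key not in types:
            raise ValueError(f"unknown or malformed option: {arg!r}")
        overrides[key] = _coerce(value, types[key])
    return Config(**overrides)
```

Under these assumptions, `pipenv run python src/main.py num_simulations=128 discount=0.99` would correspond to `parse_overrides(sys.argv[1:])`. Unknown keys fail loudly instead of being silently ignored.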
