From ba2c85284bf3ca4b31c52b73fbad0d01153309a0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 26 Sep 2017 14:43:08 -0700 Subject: [PATCH 01/45] updated .gitignore --- .gitignore | 104 ++++++++++++++++- .pylintrc | 325 ++++++++++++++++++++++++++++++++++++++++++++++++++++ .travis.yml | 30 +++++ 3 files changed, 455 insertions(+), 4 deletions(-) create mode 100644 .pylintrc create mode 100644 .travis.yml diff --git a/.gitignore b/.gitignore index ba50ec8..8de6bbf 100644 --- a/.gitignore +++ b/.gitignore @@ -1,9 +1,105 @@ +# Personal *.DS_Store - -# log and output files *.hlt *.log - -# developer environment .idea/ + +### Python template __pycache__/ +*.py[cod] +*$py.class + +# C extensions +*.so + +# Distribution / packaging +.Python +build/ +develop-eggs/ +dist/ +downloads/ +eggs/ +.eggs/ +lib/ +lib64/ +parts/ +sdist/ +var/ +wheels/ +*.egg-info/ +.installed.cfg +*.egg + +# PyInstaller +# Usually these files are written by a python script from a template +# before PyInstaller builds the exe, so as to inject date/other infos into it. +*.manifest +*.spec + +# Installer logs +pip-log.txt +pip-delete-this-directory.txt + +# Unit test / coverage reports +htmlcov/ +.tox/ +.coverage +.coverage.* +.cache +nosetests.xml +coverage.xml +*.cover +.hypothesis/ + +# Translations +*.mo +*.pot + +# Django stuff: +*.log +local_settings.py + +# Flask stuff: +instance/ +.webassets-cache + +# Scrapy stuff: +.scrapy + +# Sphinx documentation +docs/_build/ + +# PyBuilder +target/ + +# Jupyter Notebook +.ipynb_checkpoints + +# pyenv +.python-version + +# celery beat schedule file +celerybeat-schedule + +# SageMath parsed files +*.sage.py + +# Environments +.env +.venv +env/ +venv/ +ENV/ + +# Spyder project settings +.spyderproject +.spyproject + +# Rope project settings +.ropeproject + +# mkdocs documentation +/site + +# mypy +.mypy_cache/ \ No newline at end of file diff --git a/.pylintrc b/.pylintrc new file mode 100644 index 0000000..03b76bb --- /dev/null +++ b/.pylintrc @@ -0,0 +1,325 @@ +[MASTER] + +# Specify a configuration file. +#rcfile= + +# Python code to execute, usually for sys.path manipulation such as +# pygtk.require(). +#init-hook= + +# Profiled execution. +profile=no + +# Add files or directories to the blacklist. They should be base names, not +# paths. +ignore=CVS + +# Pickle collected data for later comparisons. +persistent=yes + +# List of plugins (as comma separated values of python modules names) to load, +# usually to register additional checkers. +load-plugins= + + +[MESSAGES CONTROL] + +# Enable the message, report, category or checker with the given id(s). You can +# either give multiple identifier separated by comma (,) or put this option +# multiple time. See also the "--disable" option for examples. +enable=indexing-exception,old-raise-syntax + +# Disable the message, report, category or checker with the given id(s). You +# can either give multiple identifiers separated by comma (,) or put this +# option multiple times (only on the command line, not in the configuration +# file where it should appear only once).You can also use "--disable=all" to +# disable everything first and then reenable specific checks. For example, if +# you want to run only the similarities checker, you can use "--disable=all +# --enable=similarities". 
If you want to run only the classes checker, but have +# no Warning level messages displayed, use"--disable=all --enable=classes +# --disable=W" +disable=design,similarities,no-self-use,attribute-defined-outside-init,locally-disabled,star-args,pointless-except,bad-option-value,global-statement,fixme,suppressed-message,useless-suppression,locally-enabled,no-member,no-name-in-module,import-error,unsubscriptable-object,unbalanced-tuple-unpacking,undefined-variable,not-context-manager + + +# Set the cache size for astng objects. +cache-size=500 + + +[REPORTS] + +# Set the output format. Available formats are text, parseable, colorized, msvs +# (visual studio) and html. You can also give a reporter class, eg +# mypackage.mymodule.MyReporterClass. +output-format=text + +# Put messages in a separate file for each module / package specified on the +# command line instead of printing them on stdout. Reports (if any) will be +# written in a file name "pylint_global.[txt|html]". +files-output=yes + +# Tells whether to display a full report or only the messages +reports=yes + +# Python expression which should return a note less than 10 (10 is the highest +# note). You have access to the variables errors warning, statement which +# respectively contain the number of errors / warnings messages and the total +# number of statements analyzed. This is used by the global evaluation report +# (RP0004). +evaluation=10.0 - ((float(5 * error + warning + refactor + convention) / statement) * 10) + +# Add a comment according to your evaluation note. This is used by the global +# evaluation report (RP0004). +comment=yes + +# Template used to display messages. This is a python new-style format string +# used to format the message information. See doc for all details +#msg-template= + + +[TYPECHECK] + +# Tells whether missing members accessed in mixin class should be ignored. A +# mixin class is detected if its name ends with "mixin" (case insensitive). +ignore-mixin-members=yes + +# List of classes names for which member attributes should not be checked +# (useful for classes with attributes dynamically set). +ignored-classes=SQLObject + +# When zope mode is activated, add a predefined set of Zope acquired attributes +# to generated-members. +zope=no + +# List of members which are set dynamically and missed by pylint inference +# system, and so shouldn't trigger E0201 when accessed. Python regular +# expressions are accepted. +generated-members=REQUEST,acl_users,aq_parent + +# List of decorators that create context managers from functions, such as +# contextlib.contextmanager. +contextmanager-decorators=contextlib.contextmanager,contextlib2.contextmanager + + +[VARIABLES] + +# Tells whether we should check for unused import in __init__ files. +init-import=yes + +# A regular expression matching the beginning of the name of dummy variables +# (i.e. not used). +dummy-variables-rgx=^\*{0,2}(_$|unused_|dummy_) + +# List of additional names supposed to be defined in builtins. Remember that +# you should avoid to define new builtins when possible. +additional-builtins= + + +[BASIC] + +# Required attributes for module, separated by a comma +required-attributes= + +# List of builtins function names that should not be used, separated by a comma +bad-functions=apply,input,reduce + + +# Disable the report(s) with the given id(s). +# All non-Google reports are disabled by default. 
+disable-report=R0001,R0002,R0003,R0004,R0101,R0102,R0201,R0202,R0220,R0401,R0402,R0701,R0801,R0901,R0902,R0903,R0904,R0911,R0912,R0913,R0914,R0915,R0921,R0922,R0923 + +# Regular expression which should only match correct module names +module-rgx=(([a-z_][a-z0-9_]*)|([A-Z][a-zA-Z0-9]+))$ + +# Regular expression which should only match correct module level names +const-rgx=^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$ + +# Regular expression which should only match correct class names +class-rgx=^_?[A-Z][a-zA-Z0-9]*$ + +# Regular expression which should only match correct function names +function-rgx=^(?:(?P_?[A-Z][a-zA-Z0-9]*)|(?P_?[a-z][a-z0-9_]*))$ + +# Regular expression which should only match correct method names +method-rgx=^(?:(?P__[a-z0-9_]+__|next)|(?P_{0,2}[A-Z][a-zA-Z0-9]*)|(?P_{0,2}[a-z][a-z0-9_]*))$ + +# Regular expression which should only match correct instance attribute names +attr-rgx=^_{0,2}[a-z][a-z0-9_]*$ + +# Regular expression which should only match correct argument names +argument-rgx=^[a-z][a-z0-9_]*$ + +# Regular expression which should only match correct variable names +variable-rgx=^[a-z][a-z0-9_]*$ + +# Regular expression which should only match correct attribute names in class +# bodies +class-attribute-rgx=^(_?[A-Z][A-Z0-9_]*|__[a-z0-9_]+__|_?[a-z][a-z0-9_]*)$ + +# Regular expression which should only match correct list comprehension / +# generator expression variable names +inlinevar-rgx=^[a-z][a-z0-9_]*$ + +# Good variable names which should always be accepted, separated by a comma +good-names=main,_ + +# Bad variable names which should always be refused, separated by a comma +bad-names= + +# Regular expression which should only match function or class names that do +# not require a docstring. +no-docstring-rgx=(__.*__|main) + +# Minimum line length for functions/classes that require docstrings, shorter +# ones are exempt. +docstring-min-length=10 + + +[FORMAT] + +# Maximum number of characters on a single line. +max-line-length=120 + +# Regexp for a line that is allowed to be longer than the limit. +ignore-long-lines=^\s*(# )??$ + +# Allow the body of an if to be on the same line as the test if there is no +# else. +single-line-if-stmt=y + +# List of optional constructs for which whitespace checking is disabled +no-space-check= + +# Maximum number of lines in a module +max-module-lines=99999 + +# String used as indentation unit. This is usually " " (4 spaces) or "\t" (1 +# tab). +indent-string=' ' + + +[SIMILARITIES] + +# Minimum lines number of a similarity. +min-similarity-lines=4 + +# Ignore comments when computing similarities. +ignore-comments=yes + +# Ignore docstrings when computing similarities. +ignore-docstrings=yes + +# Ignore imports when computing similarities. +ignore-imports=no + + +[MISCELLANEOUS] + +# List of note tags to take in consideration, separated by a comma. +notes= + + +[IMPORTS] + +# Deprecated modules which should not be used, separated by a comma +deprecated-modules=regsub,TERMIOS,Bastion,rexec,sets + +# Create a graph of every (i.e. internal and external) dependencies in the +# given file (report RP0402 must not be disabled) +import-graph= + +# Create a graph of external dependencies in the given file (report RP0402 must +# not be disabled) +ext-import-graph= + +# Create a graph of internal dependencies in the given file (report RP0402 must +# not be disabled) +int-import-graph= + + +[CLASSES] + +# List of interface methods to ignore, separated by a comma. 
This is used for +# instance to not check methods defines in Zope's Interface base class. +ignore-iface-methods=isImplementedBy,deferred,extends,names,namesAndDescriptions,queryDescriptionFor,getBases,getDescriptionFor,getDoc,getName,getTaggedValue,getTaggedValueTags,isEqualOrExtendedBy,setTaggedValue,isImplementedByInstancesOf,adaptWith,is_implemented_by + +# List of method names used to declare (i.e. assign) instance attributes. +defining-attr-methods=__init__,__new__,setUp + +# List of valid names for the first argument in a class method. +valid-classmethod-first-arg=cls,class_ + +# List of valid names for the first argument in a metaclass class method. +valid-metaclass-classmethod-first-arg=mcs + + +[DESIGN] + +# Maximum number of arguments for function / method +max-args=5 + +# Argument names that match this expression will be ignored. Default to name +# with leading underscore +ignored-argument-names=_.* + +# Maximum number of locals for function / method body +max-locals=15 + +# Maximum number of return / yield for function / method body +max-returns=6 + +# Maximum number of branch for function / method body +max-branches=12 + +# Maximum number of statements in function / method body +max-statements=50 + +# Maximum number of parents for a class (see R0901). +max-parents=7 + +# Maximum number of attributes for a class (see R0902). +max-attributes=7 + +# Minimum number of public methods for a class (see R0903). +min-public-methods=2 + +# Maximum number of public methods for a class (see R0904). +max-public-methods=20 + + +[EXCEPTIONS] + +# Exceptions that will emit a warning when being caught. Defaults to +# "Exception" +overgeneral-exceptions=Exception,StandardError,BaseException + + +[AST] + +# Maximum line length for lambdas +short-func-length=1 + +# List of module members that should be marked as deprecated. +# All of the string functions are listed in 4.1.4 Deprecated string functions +# in the Python 2.4 docs. +deprecated-members=string.atof,string.atoi,string.atol,string.capitalize,string.expandtabs,string.find,string.rfind,string.index,string.rindex,string.count,string.lower,string.split,string.rsplit,string.splitfields,string.join,string.joinfields,string.lstrip,string.rstrip,string.strip,string.swapcase,string.translate,string.upper,string.ljust,string.rjust,string.center,string.zfill,string.replace,sys.exitfunc + + +[DOCSTRING] + +# List of exceptions that do not need to be mentioned in the Raises section of +# a docstring. +ignore-exceptions=AssertionError,NotImplementedError,StopIteration,TypeError + + + +[TOKENS] + +# Number of spaces of indent required when the last token on the preceding line +# is an open (, [, or {. +indent-after-paren=4 + + +[Louis LINES] + +# Regexp for a proper copyright notice. +copyright=Copyright \d{4} Louis R?mus\. +All [Rr]ights [Rr]eserved\. diff --git a/.travis.yml b/.travis.yml new file mode 100644 index 0000000..315e19a --- /dev/null +++ b/.travis.yml @@ -0,0 +1,30 @@ +sudo: false + +language: python + +python: + - 3.5 + +install: + - pip install tox + - pip install coveralls + - pip install pylint + - pip install -r requirements.txt + +env: + - $COVPYYAML=cov41-pyyaml,coveralls41 + +script: + #- pytest # or py.test for Python versions 3.5 and below + #- tox -e $(echo py$TRAVIS_PYTHON_VERSION | tr -d . 
| sed -e 's/pypypy/pypy/')-$COVPYYAML + - coverage run --source=featureEngineering setup.py test + - pylint main.py + - python -m unittest discover -v + +after_success: + coveralls + +notifications: + email: + on_success: change + on_failure: change From 9932d9f946b65dce232fc46df05f48c81c6ce0f7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 26 Sep 2017 15:07:40 -0700 Subject: [PATCH 02/45] reformated --- .travis.yml | 9 +++------ networking/hlt_networking.py | 3 ++- networking/pipe_socket_translator.py | 3 +-- public/models/agent/agent.py | 2 -- public/models/agent/vanillaAgent.py | 1 - public/models/bot/improvedBot.py | 2 +- public/models/bot/randomBot.py | 2 +- public/models/bot/trainedBot.py | 3 +-- public/models/visualize_score.py | 2 +- requirements.txt | 1 + tests/__init__.py | 5 +++++ train/worker.py | 2 +- 12 files changed, 17 insertions(+), 18 deletions(-) create mode 100644 requirements.txt create mode 100644 tests/__init__.py diff --git a/.travis.yml b/.travis.yml index 315e19a..b590579 100644 --- a/.travis.yml +++ b/.travis.yml @@ -15,14 +15,11 @@ env: - $COVPYYAML=cov41-pyyaml,coveralls41 script: - #- pytest # or py.test for Python versions 3.5 and below - #- tox -e $(echo py$TRAVIS_PYTHON_VERSION | tr -d . | sed -e 's/pypypy/pypy/')-$COVPYYAML - - coverage run --source=featureEngineering setup.py test - - pylint main.py + # Tests - python -m unittest discover -v + # Style checks + - pylint main.py -after_success: - coveralls notifications: email: diff --git a/networking/hlt_networking.py b/networking/hlt_networking.py index 5e79010..6a0d250 100644 --- a/networking/hlt_networking.py +++ b/networking/hlt_networking.py @@ -1,5 +1,6 @@ import socket -from public.hlt import translate_cardinal, GameMap + +from public.hlt import GameMap, translate_cardinal class HLT: diff --git a/networking/pipe_socket_translator.py b/networking/pipe_socket_translator.py index 1f324a7..df3a7a9 100644 --- a/networking/pipe_socket_translator.py +++ b/networking/pipe_socket_translator.py @@ -1,6 +1,5 @@ import socket -import sys, traceback -import logging +import sys # logging.basicConfig(filename='example.log', level=logging.DEBUG) diff --git a/public/models/agent/agent.py b/public/models/agent/agent.py index 323ed5d..a545a5b 100644 --- a/public/models/agent/agent.py +++ b/public/models/agent/agent.py @@ -1,6 +1,4 @@ import numpy as np -import tensorflow as tf -import tensorflow.contrib.slim as slim from train.reward import localStateFromGlobal diff --git a/public/models/agent/vanillaAgent.py b/public/models/agent/vanillaAgent.py index 5889ff0..46bb265 100644 --- a/public/models/agent/vanillaAgent.py +++ b/public/models/agent/vanillaAgent.py @@ -2,7 +2,6 @@ import tensorflow as tf import tensorflow.contrib.slim as slim -from train.reward import localStateFromGlobal from public.models.agent.agent import Agent diff --git a/public/models/bot/improvedBot.py b/public/models/bot/improvedBot.py index eaea6d3..874a978 100644 --- a/public/models/bot/improvedBot.py +++ b/public/models/bot/improvedBot.py @@ -1,7 +1,7 @@ import random +from public.hlt import Move, NORTH, STILL, WEST from public.models.bot.bot import Bot -from public.hlt import NORTH, EAST, SOUTH, WEST, STILL, Move class ImprovedBot(Bot): diff --git a/public/models/bot/randomBot.py b/public/models/bot/randomBot.py index 844e41e..be16972 100644 --- a/public/models/bot/randomBot.py +++ b/public/models/bot/randomBot.py @@ -1,7 +1,7 @@ import random +from public.hlt import EAST, 
Move, NORTH, SOUTH, STILL, WEST from public.models.bot.bot import Bot -from public.hlt import NORTH, EAST, SOUTH, WEST, STILL, Move class RandomBot(Bot): diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index 59531b5..d942c55 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -1,7 +1,6 @@ from public.models.agent.vanillaAgent import VanillaAgent from public.models.bot.bot import Bot -from train.reward import getGameState, formatMoves -import tensorflow as tf +from train.reward import formatMoves, getGameState class TrainedBot(Bot): diff --git a/public/models/visualize_score.py b/public/models/visualize_score.py index acf0367..74d95a9 100644 --- a/public/models/visualize_score.py +++ b/public/models/visualize_score.py @@ -1,6 +1,6 @@ +import matplotlib.pyplot as plt import numpy as np import pandas as pd -import matplotlib.pyplot as plt rewards = [np.load('./models/vanilla.npy')] diff --git a/requirements.txt b/requirements.txt new file mode 100644 index 0000000..b3a60b6 --- /dev/null +++ b/requirements.txt @@ -0,0 +1 @@ +tensorflow \ No newline at end of file diff --git a/tests/__init__.py b/tests/__init__.py new file mode 100644 index 0000000..f3bba3e --- /dev/null +++ b/tests/__init__.py @@ -0,0 +1,5 @@ +# -*- coding: utf-8 -*- +""" +Contributors: + - Louis Rémus +""" \ No newline at end of file diff --git a/train/worker.py b/train/worker.py index edaafec..e2ac963 100644 --- a/train/worker.py +++ b/train/worker.py @@ -5,7 +5,7 @@ import tensorflow as tf from networking.hlt_networking import HLT -from train.reward import getGameState, formatMoves +from train.reward import formatMoves, getGameState def update_target_graph(from_scope, to_scope): From 61960b3979fc70c3f205e50f3ca64d01cb42c420 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 26 Sep 2017 15:11:29 -0700 Subject: [PATCH 03/45] Added docs and tests folder --- docs/README.md | 1 + tests/README.md | 1 + 2 files changed, 2 insertions(+) create mode 100644 docs/README.md create mode 100644 tests/README.md diff --git a/docs/README.md b/docs/README.md new file mode 100644 index 0000000..7d8b3af --- /dev/null +++ b/docs/README.md @@ -0,0 +1 @@ +# Documentation \ No newline at end of file diff --git a/tests/README.md b/tests/README.md new file mode 100644 index 0000000..cae503b --- /dev/null +++ b/tests/README.md @@ -0,0 +1 @@ +# Tests \ No newline at end of file From decdce157404fe45125d7ec8eba86be16ec9fdfa Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 26 Sep 2017 15:23:18 -0700 Subject: [PATCH 04/45] Initial travis file --- .travis.yml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/.travis.yml b/.travis.yml index b590579..013d582 100644 --- a/.travis.yml +++ b/.travis.yml @@ -18,7 +18,7 @@ script: # Tests - python -m unittest discover -v # Style checks - - pylint main.py + - pylint train/experience.py notifications: From 238c35f79d661d282b40b06554a5b99908a20eee Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 26 Sep 2017 15:23:30 -0700 Subject: [PATCH 05/45] initiated test repo --- tests/README.md | 2 +- tests/__init__.py | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/README.md b/tests/README.md index cae503b..007eb95 100644 --- a/tests/README.md +++ b/tests/README.md @@ -1 +1 @@ -# Tests \ No newline at end of file +# 
Tests
diff --git a/tests/__init__.py b/tests/__init__.py
index f3bba3e..7d80e79 100644
--- a/tests/__init__.py
+++ b/tests/__init__.py
@@ -2,4 +2,4 @@
 """
 Contributors:
     - Louis Rémus
-"""
\ No newline at end of file
+"""

From 437b33129dca5350ad5c24cae1c7a12660b44f06 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com>
Date: Tue, 26 Sep 2017 15:23:47 -0700
Subject: [PATCH 06/45] Embedded status image

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 3fd2a7a..12c8b95 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,4 @@
+[![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL)
 # Halite-Python-RL

Halite Challenge Overview
From 6f7e8c5249e19889179f0f468e79779edbdb96fd Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com>
Date: Tue, 26 Sep 2017 15:29:56 -0700
Subject: [PATCH 07/45] PyLint compliant

---
 train/experience.py | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/train/experience.py b/train/experience.py
index 9b0c485..67721c6 100644
--- a/train/experience.py
+++ b/train/experience.py
@@ -1,9 +1,15 @@
+"""
+Experience class definition
+"""
 import numpy as np

 from train.reward import allRewards, rawRewards


 class Experience:
+    """
+    Experience class to store moves, rewards and metric values
+    """
     def __init__(self):
         self.moves = np.array([])
         self.rewards = np.array([])
@@ -11,6 +17,9 @@ def __init__(self):
         self.metric = np.array([])

     def add_episode(self, game_states, moves):
+        # moves is not used here, kept for inheritance reasons
+        # TODO Edouard to act on this
+        # pylint: disable=W0612,W0613
         production_increments = np.sum(np.sum(rawRewards(game_states), axis=2), axis=1)
         self.metric = np.append(self.metric,
                                 production_increments.dot(np.linspace(2.0, 1.0, num=len(game_states) - 1)))
@@ -22,6 +31,9 @@ def save_metric(self, name):


 class ExperienceVanilla(Experience):
+    """
+    Stores states in addition to the inherited attributes of Experience
+    """
     def __init__(self):
         super(ExperienceVanilla, self).__init__()
         self.states = np.array([]).reshape(0, 27)

From 6dd82cd71a8e5db42518addadf0d2efb967d2bc8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com>
Date: Tue, 26 Sep 2017 15:40:01 -0700
Subject: [PATCH 08/45] Why this change was necessary: There was no commit
 message template for this repository

This change addresses the need by: Creating a template .gitmessage for
git commits

Potential side-effects:

*
---
 .gitmessage | 15 +++++++++++++++
 README.md   |  1 +
 2 files changed, 16 insertions(+)
 create mode 100644 .gitmessage

diff --git a/.gitmessage b/.gitmessage
new file mode 100644
index 0000000..ea6cf73
--- /dev/null
+++ b/.gitmessage
@@ -0,0 +1,15 @@
+Why:
+
+*
+
+This change addresses the need by:
+
+*
+
+Potential side-effects:
+
+*
+
+# 50-character subject line
+#
+# 72-character wrapped longer description.
\ No newline at end of file
diff --git a/README.md b/README.md
index 12c8b95..22685e4 100644
--- a/README.md
+++ b/README.md
@@ -1,4 +1,5 @@
 [![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL)
+Test
 # Halite-Python-RL

Halite Challenge Overview
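(Note on the .gitmessage template created in the patch above: the series itself never shows how the template is activated, so the command below is an assumption about the intended setup rather than part of any commit. With the file at the repository root, as created here, a per-repository configuration would be:

    git config commit.template .gitmessage

After that, running "git commit" without -m opens the editor pre-filled with the Why / addresses-the-need / side-effects skeleton.)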
From 51d42112d224a42bcba06890a639cf520105fa2a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com>
Date: Tue, 26 Sep 2017 15:42:48 -0700
Subject: [PATCH 09/45] Why this change was necessary:

* Testing the behavior of commit templates has been done

This change addresses the need by:

* Removing the test

Potential side-effects:

*
---
 README.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/README.md b/README.md
index 22685e4..12c8b95 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,4 @@
 [![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL)
-Test
 # Halite-Python-RL

Halite Challenge Overview
From 4c6e5a219ecc2a40ca56fa351c34fa638087a69d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 26 Sep 2017 15:51:08 -0700 Subject: [PATCH 10/45] Reformatting .gitmessage to have first line as a title Why this change was necessary: * Otherwise purpose of the commit was unclear This change addresses the need by: * Addind an empty line at the beginning Potential side-effects: * --- .gitmessage | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/.gitmessage b/.gitmessage index ea6cf73..6194a76 100644 --- a/.gitmessage +++ b/.gitmessage @@ -1,4 +1,5 @@ -Why: + +Why this change was necessary: * @@ -12,4 +13,4 @@ Potential side-effects: # 50-character subject line # -# 72-character wrapped longer description. \ No newline at end of file +# 72-character wrapped longer description. From f54ee4de07ac9c60a90e83d50e3d6738ddc2057b Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 07:38:37 +0200 Subject: [PATCH 11/45] Adding coverage tests --- .travis.yml | 7 ++++--- requirements.txt | 8 +++++++- tests/reward_test.py | 17 +++++++++++++++++ 3 files changed, 28 insertions(+), 4 deletions(-) create mode 100644 tests/reward_test.py diff --git a/.travis.yml b/.travis.yml index 013d582..2bf5fdf 100644 --- a/.travis.yml +++ b/.travis.yml @@ -6,9 +6,6 @@ python: - 3.5 install: - - pip install tox - - pip install coveralls - - pip install pylint - pip install -r requirements.txt env: @@ -19,7 +16,11 @@ script: - python -m unittest discover -v # Style checks - pylint train/experience.py + # Coverage checks + - py.test --cov=train tests/ +after_success: + coveralls notifications: email: diff --git a/requirements.txt b/requirements.txt index b3a60b6..f9764c2 100644 --- a/requirements.txt +++ b/requirements.txt @@ -1 +1,7 @@ -tensorflow \ No newline at end of file +tensorflow +coverage>=3.6 +pytest-cov +pytest-xdist +tox +coveralls +pylint \ No newline at end of file diff --git a/tests/reward_test.py b/tests/reward_test.py new file mode 100644 index 0000000..3c6bc74 --- /dev/null +++ b/tests/reward_test.py @@ -0,0 +1,17 @@ +""" +Tests the reward function +""" +from train.reward import discount_rewards +import unittest + +import numpy as np + + +class TestReward(unittest.TestCase): + def test_length_discount_rewards(self): + self.assertTrue(len(discount_rewards(np.array([1]))) == 1) + self.assertTrue(len(discount_rewards(np.array([1, 3]))) == 2) + + +if __name__ == '__main__': + unittest.main() From 7e87b033ac60276e4fae05c162a48f22ecb0f8d3 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 07:41:45 +0200 Subject: [PATCH 12/45] removing tox --- requirements.txt | 1 - 1 file changed, 1 deletion(-) diff --git a/requirements.txt b/requirements.txt index f9764c2..46b8d63 100644 --- a/requirements.txt +++ b/requirements.txt @@ -2,6 +2,5 @@ tensorflow coverage>=3.6 pytest-cov pytest-xdist -tox coveralls pylint \ No newline at end of file From e9a53d8724bbb50606d45b90820e21a03a854f87 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 09:20:28 +0200 Subject: [PATCH 13/45] Adding the docs Why this change was necessary: * This change addresses the need by: * Potential side-effects: * --- docs/Makefile | 230 +++++++++++++++++++++++++++++++++++++ docs/conf.py | 287 +++++++++++++++++++++++++++++++++++++++++++++++ docs/edouard.rst | 9 ++ docs/index.rst | 27 +++++ docs/louis.rst | 10 ++ 5 files changed, 563 insertions(+) create mode 100644 docs/Makefile create mode 100644 docs/conf.py create mode 100644 
docs/edouard.rst create mode 100644 docs/index.rst create mode 100644 docs/louis.rst diff --git a/docs/Makefile b/docs/Makefile new file mode 100644 index 0000000..7fdf53c --- /dev/null +++ b/docs/Makefile @@ -0,0 +1,230 @@ +# Makefile for Sphinx documentation +# + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +PAPER = +BUILDDIR = _build + +# User-friendly check for sphinx-build +ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) + $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don\'t have Sphinx installed, grab it from http://sphinx-doc.org/) +endif + +# Internal variables. +PAPEROPT_a4 = -D latex_paper_size=a4 +PAPEROPT_letter = -D latex_paper_size=letter +ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . +# the i18n builder cannot share the environment and doctrees with the others +I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . + +.PHONY: help +help: + @echo "Please use \`make ' where is one of" + @echo " html to make standalone HTML files" + @echo " dirhtml to make HTML files named index.html in directories" + @echo " singlehtml to make a single large HTML file" + @echo " pickle to make pickle files" + @echo " json to make JSON files" + @echo " htmlhelp to make HTML files and a HTML help project" + @echo " qthelp to make HTML files and a qthelp project" + @echo " applehelp to make an Apple Help Book" + @echo " devhelp to make HTML files and a Devhelp project" + @echo " epub to make an epub" + @echo " epub3 to make an epub3" + @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" + @echo " latexpdf to make LaTeX files and run them through pdflatex" + @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" + @echo " text to make text files" + @echo " man to make manual pages" + @echo " texinfo to make Texinfo files" + @echo " info to make Texinfo files and run them through makeinfo" + @echo " gettext to make PO message catalogs" + @echo " changes to make an overview of all changed/added/deprecated items" + @echo " xml to make Docutils-native XML files" + @echo " pseudoxml to make pseudoxml-XML files for display purposes" + @echo " linkcheck to check all external links for integrity" + @echo " doctest to run all doctests embedded in the documentation (if enabled)" + @echo " coverage to run coverage check of the documentation (if enabled)" + @echo " dummy to check syntax errors of document sources" + +.PHONY: clean +clean: + rm -rf $(BUILDDIR)/* + +.PHONY: html +html: + $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." + +.PHONY: dirhtml +dirhtml: + $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml + @echo + @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." + +.PHONY: singlehtml +singlehtml: + $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml + @echo + @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." + +.PHONY: pickle +pickle: + $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle + @echo + @echo "Build finished; now you can process the pickle files." 
+ +.PHONY: json +json: + $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json + @echo + @echo "Build finished; now you can process the JSON files." + +.PHONY: htmlhelp +htmlhelp: + $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp + @echo + @echo "Build finished; now you can run HTML Help Workshop with the" \ + ".hhp project file in $(BUILDDIR)/htmlhelp." + +.PHONY: qthelp +qthelp: + $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp + @echo + @echo "Build finished; now you can run "qcollectiongenerator" with the" \ + ".qhcp project file in $(BUILDDIR)/qthelp, like this:" + @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Halite-Python-RL.qhcp" + @echo "To view the help file:" + @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Halite-Python-RL.qhc" + +.PHONY: applehelp +applehelp: + $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp + @echo + @echo "Build finished. The help book is in $(BUILDDIR)/applehelp." + @echo "N.B. You won't be able to view it unless you put it in" \ + "~/Library/Documentation/Help or install it in your application" \ + "bundle." + +.PHONY: devhelp +devhelp: + $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp + @echo + @echo "Build finished." + @echo "To view the help file:" + @echo "# mkdir -p $$HOME/.local/share/devhelp/Halite-Python-RL" + @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Halite-Python-RL" + @echo "# devhelp" + +.PHONY: epub +epub: + $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub + @echo + @echo "Build finished. The epub file is in $(BUILDDIR)/epub." + +.PHONY: epub3 +epub3: + $(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3 + @echo + @echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3." + +.PHONY: latex +latex: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo + @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." + @echo "Run \`make' in that directory to run these through (pdf)latex" \ + "(use \`make latexpdf' here to do that automatically)." + +.PHONY: latexpdf +latexpdf: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through pdflatex..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +.PHONY: latexpdfja +latexpdfja: + $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex + @echo "Running LaTeX files through platex and dvipdfmx..." + $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja + @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." + +.PHONY: text +text: + $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text + @echo + @echo "Build finished. The text files are in $(BUILDDIR)/text." + +.PHONY: man +man: + $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man + @echo + @echo "Build finished. The manual pages are in $(BUILDDIR)/man." + +.PHONY: texinfo +texinfo: + $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo + @echo + @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." + @echo "Run \`make' in that directory to run these through makeinfo" \ + "(use \`make info' here to do that automatically)." + +.PHONY: info +info: + $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo + @echo "Running Texinfo files through makeinfo..." + make -C $(BUILDDIR)/texinfo info + @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." 
+ +.PHONY: gettext +gettext: + $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale + @echo + @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." + +.PHONY: changes +changes: + $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes + @echo + @echo "The overview file is in $(BUILDDIR)/changes." + +.PHONY: linkcheck +linkcheck: + $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck + @echo + @echo "Link check complete; look for any errors in the above output " \ + "or in $(BUILDDIR)/linkcheck/output.txt." + +.PHONY: doctest +doctest: + $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest + @echo "Testing of doctests in the sources finished, look at the " \ + "results in $(BUILDDIR)/doctest/output.txt." + +.PHONY: coverage +coverage: + $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage + @echo "Testing of coverage in the sources finished, look at the " \ + "results in $(BUILDDIR)/coverage/python.txt." + +.PHONY: xml +xml: + $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml + @echo + @echo "Build finished. The XML files are in $(BUILDDIR)/xml." + +.PHONY: pseudoxml +pseudoxml: + $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml + @echo + @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." + +.PHONY: dummy +dummy: + $(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy + @echo + @echo "Build finished. Dummy builder generates no files." diff --git a/docs/conf.py b/docs/conf.py new file mode 100644 index 0000000..6795197 --- /dev/null +++ b/docs/conf.py @@ -0,0 +1,287 @@ +#!/usr/bin/env python3 +# -*- coding: utf-8 -*- +# +# Halite-Python-RL documentation build configuration file, created by +# sphinx-quickstart on Wed Sep 27 07:49:44 2017. +# +# This file is execfile()d with the current directory set to its +# containing dir. +# +# Note that not all possible configuration values are present in this +# autogenerated file. +# +# All configuration values have a default; values that are commented out +# serve to show the default. + +import sys +import os + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +#sys.path.insert(0, os.path.abspath('.')) + +# -- General configuration ------------------------------------------------ + +# If your documentation needs a minimal Sphinx version, state it here. +#needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# source_suffix = ['.rst', '.md'] +source_suffix = '.rst' + +# The encoding of source files. +#source_encoding = 'utf-8-sig' + +# The master toctree document. +master_doc = 'index' + +# General information about the project. +project = 'Halite-Python-RL' +copyright = '2017, Edouard Mehlman' +author = 'Edouard Mehlman' + +# The version info for the project you're documenting, acts as replacement for +# |version| and |release|, also used in various other places throughout the +# built documents. +# +# The short X.Y version. +version = '1.0' +# The full version, including alpha/beta/rc tags. 
+release = '1.0' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = None + +# There are two options for replacing |today|: either, you set today to some +# non-false value, then it is used: +#today = '' +# Else, today_fmt is used as the format for a strftime call. +#today_fmt = '%B %d, %Y' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This patterns also effect to html_static_path and html_extra_path +exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] + +# The reST default role (used for this markup: `text`) to use for all +# documents. +#default_role = None + +# If true, '()' will be appended to :func: etc. cross-reference text. +#add_function_parentheses = True + +# If true, the current module name will be prepended to all description +# unit titles (such as .. function::). +#add_module_names = True + +# If true, sectionauthor and moduleauthor directives will be shown in the +# output. They are ignored by default. +#show_authors = False + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + +# A list of ignored prefixes for module index sorting. +#modindex_common_prefix = [] + +# If true, keep warnings as "system message" paragraphs in the built documents. +#keep_warnings = False + +# If true, `todo` and `todoList` produce output, else they produce nothing. +todo_include_todos = False + + +# -- Options for HTML output ---------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +html_theme = 'alabaster' + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +#html_theme_options = {} + +# Add any paths that contain custom themes here, relative to this directory. +#html_theme_path = [] + +# The name for this set of Sphinx documents. +# " v documentation" by default. +#html_title = 'Halite-Python-RL v1.0' + +# A shorter title for the navigation bar. Default is the same as html_title. +#html_short_title = None + +# The name of an image file (relative to this directory) to place at the top +# of the sidebar. +#html_logo = None + +# The name of an image file (relative to this directory) to use as a favicon of +# the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 +# pixels large. +#html_favicon = None + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['_static'] + +# Add any extra paths that contain custom files (such as robots.txt or +# .htaccess) here, relative to this directory. These files are copied +# directly to the root of the documentation. +#html_extra_path = [] + +# If not None, a 'Last updated on:' timestamp is inserted at every page +# bottom, using the given strftime format. +# The empty string is equivalent to '%b %d, %Y'. +#html_last_updated_fmt = None + +# If true, SmartyPants will be used to convert quotes and dashes to +# typographically correct entities. 
+#html_use_smartypants = True + +# Custom sidebar templates, maps document names to template names. +#html_sidebars = {} + +# Additional templates that should be rendered to pages, maps page names to +# template names. +#html_additional_pages = {} + +# If false, no module index is generated. +#html_domain_indices = True + +# If false, no index is generated. +#html_use_index = True + +# If true, the index is split into individual pages for each letter. +#html_split_index = False + +# If true, links to the reST sources are added to the pages. +#html_show_sourcelink = True + +# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. +#html_show_sphinx = True + +# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. +#html_show_copyright = True + +# If true, an OpenSearch description file will be output, and all pages will +# contain a tag referring to it. The value of this option must be the +# base URL from which the finished HTML is served. +#html_use_opensearch = '' + +# This is the file name suffix for HTML files (e.g. ".xhtml"). +#html_file_suffix = None + +# Language to be used for generating the HTML full-text search index. +# Sphinx supports the following languages: +# 'da', 'de', 'en', 'es', 'fi', 'fr', 'h', 'it', 'ja' +# 'nl', 'no', 'pt', 'ro', 'r', 'sv', 'tr', 'zh' +#html_search_language = 'en' + +# A dictionary with options for the search language support, empty by default. +# 'ja' uses this config value. +# 'zh' user can custom change `jieba` dictionary path. +#html_search_options = {'type': 'default'} + +# The name of a javascript file (relative to the configuration directory) that +# implements a search results scorer. If empty, the default will be used. +#html_search_scorer = 'scorer.js' + +# Output file base name for HTML help builder. +htmlhelp_basename = 'Halite-Python-RLdoc' + +# -- Options for LaTeX output --------------------------------------------- + +latex_elements = { +# The paper size ('letterpaper' or 'a4paper'). +#'papersize': 'letterpaper', + +# The font size ('10pt', '11pt' or '12pt'). +#'pointsize': '10pt', + +# Additional stuff for the LaTeX preamble. +#'preamble': '', + +# Latex figure (float) alignment +#'figure_align': 'htbp', +} + +# Grouping the document tree into LaTeX files. List of tuples +# (source start file, target name, title, +# author, documentclass [howto, manual, or own class]). +latex_documents = [ + (master_doc, 'Halite-Python-RL.tex', 'Halite-Python-RL Documentation', + 'Edouard Mehlman', 'manual'), +] + +# The name of an image file (relative to this directory) to place at the top of +# the title page. +#latex_logo = None + +# For "manual" documents, if this is true, then toplevel headings are parts, +# not chapters. +#latex_use_parts = False + +# If true, show page references after internal links. +#latex_show_pagerefs = False + +# If true, show URL addresses after external links. +#latex_show_urls = False + +# Documents to append as an appendix to all manuals. +#latex_appendices = [] + +# If false, no module index is generated. +#latex_domain_indices = True + + +# -- Options for manual page output --------------------------------------- + +# One entry per manual page. List of tuples +# (source start file, name, description, authors, manual section). +man_pages = [ + (master_doc, 'halite-python-rl', 'Halite-Python-RL Documentation', + [author], 1) +] + +# If true, show URL addresses after external links. 
+#man_show_urls = False + + +# -- Options for Texinfo output ------------------------------------------- + +# Grouping the document tree into Texinfo files. List of tuples +# (source start file, target name, title, author, +# dir menu entry, description, category) +texinfo_documents = [ + (master_doc, 'Halite-Python-RL', 'Halite-Python-RL Documentation', + author, 'Halite-Python-RL', 'One line description of project.', + 'Miscellaneous'), +] + +# Documents to append as an appendix to all manuals. +#texinfo_appendices = [] + +# If false, no module index is generated. +#texinfo_domain_indices = True + +# How to display URL addresses: 'footnote', 'no', or 'inline'. +#texinfo_show_urls = 'footnote' + +# If true, do not generate a @detailmenu in the "Top" node's menu. +#texinfo_no_detailmenu = False diff --git a/docs/edouard.rst b/docs/edouard.rst new file mode 100644 index 0000000..b37bc89 --- /dev/null +++ b/docs/edouard.rst @@ -0,0 +1,9 @@ +.. _edouard: + + + +Edouard +======================= + +Edouard + diff --git a/docs/index.rst b/docs/index.rst new file mode 100644 index 0000000..a1fa54b --- /dev/null +++ b/docs/index.rst @@ -0,0 +1,27 @@ +.. Halite-Python-RL documentation master file, created by + sphinx-quickstart on Wed Sep 27 07:49:44 2017. + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +Welcome to Halite-Python-RL's documentation! +============================================ + +Contents: + +.. toctree:: + :maxdepth: 2 + :caption: Contributors + + louis + edouard + + +Indices and tables +================== + +* :ref:`genindex` +* :ref:`modindex` +* :ref:`search` + + + diff --git a/docs/louis.rst b/docs/louis.rst new file mode 100644 index 0000000..6fa90bd --- /dev/null +++ b/docs/louis.rst @@ -0,0 +1,10 @@ +.. _louis: + + + +Louis +======================= + +Louis + + From 24710e10dcfd7ce2d6ce35dd3f116bed356a8a7e Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 09:26:05 +0200 Subject: [PATCH 14/45] Adding badge Why this change was necessary: * To add the badge This change addresses the need by: * Adding the badge Potential side-effects: * When merging with master, the name of the badge needs to be changes... No longer coverage --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 12c8b95..f037a95 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,5 @@ -[![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL) +[![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=coverage)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=coverage) [![Documentation Status](https://readthedocs.org/projects/halite-python-rl/badge/?version=latest)](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest) + # Halite-Python-RL

Halite Challenge Overview
From b011c7c33b31151cfa745699d1c86dc077f0b325 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 15:34:11 +0200 Subject: [PATCH 15/45] Adding tests and docs, as well as the right compiler for halite file This change addresses the need by: * Create util.py to download games for testing purposes * Adding a template for .rst file * Adding test about the train folder to increase coverage * The right version of gcc compiler g++-4.9 --- .travis.yml | 1 + README.md | 2 +- tests/reward_test.py | 28 +++++++++++++++++++++++++++- tests/util.py | 21 +++++++++++++++++++++ train/experience.py | 14 ++++++++------ train/reward.py | 2 +- 6 files changed, 59 insertions(+), 9 deletions(-) create mode 100644 tests/util.py diff --git a/.travis.yml b/.travis.yml index 2bf5fdf..03abe5f 100644 --- a/.travis.yml +++ b/.travis.yml @@ -7,6 +7,7 @@ python: install: - pip install -r requirements.txt + - make env: - $COVPYYAML=cov41-pyyaml,coveralls41 diff --git a/README.md b/README.md index f037a95..5e1d25c 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -[![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=coverage)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=coverage) [![Documentation Status](https://readthedocs.org/projects/halite-python-rl/badge/?version=latest)](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest) +[![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=master)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=master) [![Documentation Status](https://readthedocs.org/projects/halite-python-rl/badge/?version=latest)](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest) # Halite-Python-RL diff --git a/tests/reward_test.py b/tests/reward_test.py index 3c6bc74..6398ff0 100644 --- a/tests/reward_test.py +++ b/tests/reward_test.py @@ -1,10 +1,13 @@ """ Tests the reward function """ -from train.reward import discount_rewards +from train.reward import discount_rewards, rawRewards, allRewards +from train.experience import ExperienceVanilla +from train.worker import Worker import unittest import numpy as np +from tests.util import game_states_from_url class TestReward(unittest.TestCase): @@ -12,6 +15,29 @@ def test_length_discount_rewards(self): self.assertTrue(len(discount_rewards(np.array([1]))) == 1) self.assertTrue(len(discount_rewards(np.array([1, 3]))) == 2) + def test_reward(self): + GAME_URL = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' + game_states, moves = game_states_from_url(GAME_URL) + + raw_rewards = rawRewards(game_states) + self.assertTrue(len(raw_rewards) == len(game_states) - 1) + + all_states, all_moves, all_rewards = allRewards(game_states, moves) + self.assertTrue(len(all_states) >= len(game_states) - 1) + self.assertTrue(len(all_moves) >= len(moves)) + self.assertTrue(len(all_rewards) == len(all_moves) and len(all_states) == len(all_moves)) + experience = ExperienceVanilla() + experience.add_episode(game_states, moves) + experience.add_episode(game_states, moves) + self.assertTrue(len(experience.moves) == 2 * len(all_moves)) + batch_states, batch_moves, batch_rewards = experience.batch() + self.assertTrue(len(batch_rewards) 
== len(batch_moves) and len(batch_states) == len(batch_moves)) + + def test_worker(self): + worker = Worker(2000, 2, None) + self.assertTrue(worker.port == 2002) + worker.p.terminate() + if __name__ == '__main__': unittest.main() diff --git a/tests/util.py b/tests/util.py new file mode 100644 index 0000000..21d662b --- /dev/null +++ b/tests/util.py @@ -0,0 +1,21 @@ +import json +import urllib.request +import numpy as np + + +def game_states_from_url(GAME_URL): + """ + We host known games on aws server and we run the tests according to these games, from which we know the output + :param GAME_URL: The url of the game on the server (string). + :return: + """ + game = json.loads(urllib.request.urlopen(GAME_URL).readline().decode("utf-8")) + + owner_frames = np.array(game["frames"])[:, :, :, 0][:, np.newaxis, :, :] + strength_frames = np.array(game["frames"])[:, :, :, 1][:, np.newaxis, :, :] + production_frames = np.repeat(np.array(game["productions"])[np.newaxis, np.newaxis, :, :], len(owner_frames), + axis=0) + moves = np.array(game['moves']) + + game_states = np.concatenate(([owner_frames, strength_frames, production_frames]), axis=1) + return game_states / np.array([1, 255, 10])[:, np.newaxis, np.newaxis], moves diff --git a/train/experience.py b/train/experience.py index 67721c6..3d4c94b 100644 --- a/train/experience.py +++ b/train/experience.py @@ -10,6 +10,7 @@ class Experience: """ Experience class to store moves, rewards and metric values """ + def __init__(self): self.moves = np.array([]) self.rewards = np.array([]) @@ -17,15 +18,15 @@ def __init__(self): self.metric = np.array([]) def add_episode(self, game_states, moves): - # moves is not used here, kept for inheritance reasons - # TODO Edouard to act on this - # pylint: disable=W0612,W0613 - production_increments = np.sum(np.sum(rawRewards(game_states), axis=2), axis=1) - self.metric = np.append(self.metric, production_increments.dot(np.linspace(2.0, 1.0, num=len(game_states) - 1))) + pass def batch(self, size): pass + def compute_metric(self, game_states): + production_increments = np.sum(np.sum(rawRewards(game_states), axis=2), axis=1) + self.metric = np.append(self.metric, production_increments.dot(np.linspace(2.0, 1.0, num=len(game_states) - 1))) + def save_metric(self, name): np.save(name, self.metric) @@ -34,12 +35,13 @@ class ExperienceVanilla(Experience): """ Stores states in addition to the inherited attributes of Experience """ + def __init__(self): super(ExperienceVanilla, self).__init__() self.states = np.array([]).reshape(0, 27) def add_episode(self, game_states, moves): - super(ExperienceVanilla, self).add_episode(game_states, moves) + self.compute_metric(game_states) all_states, all_moves, all_rewards = allRewards(game_states, moves) self.states = np.concatenate((self.states, all_states.reshape(-1, 27)), axis=0) diff --git a/train/reward.py b/train/reward.py index 99a1ad1..81cf603 100644 --- a/train/reward.py +++ b/train/reward.py @@ -12,7 +12,7 @@ def getGameState(game_map, myID): [[(square.owner == myID) + 0, square.strength, square.production] for square in game_map], [game_map.width, game_map.height, 3]) return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) * ( - 1 / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis, np.newaxis]) + 1 / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis, np.newaxis]) def getGameProd(game_state): From a125fb39a44a6739d2267121416bfe4f9e87b3ca Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 16:36:08 +0200 Subject: [PATCH 16/45] Removing 
tensorflow dependency for certain Bot Why this change was necessary: * To simplify the Bot files, and preserves invariance. The same steps are executed for each different bots, and if one Bot needs to import tensorflow, open a session, ect... it deals with it alone This change addresses the need by: * Editing all the bot classes --- public/MyBot.py | 35 +++++++++----------------------- public/OpponentBot.py | 4 +++- public/models/bot/bot.py | 5 +---- public/models/bot/improvedBot.py | 3 --- public/models/bot/trainedBot.py | 29 ++++++++++++++++++++------ 5 files changed, 37 insertions(+), 39 deletions(-) diff --git a/public/MyBot.py b/public/MyBot.py index 0a9ccbe..217a1c0 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -2,6 +2,7 @@ mode = 'server' if (len(sys.argv) == 1) else 'local' mode = 'local' # TODO remove forcing + if mode == 'server': # 'server' mode import hlt else: # 'local' mode @@ -12,30 +13,14 @@ from public.models.bot.trainedBot import TrainedBot -import tensorflow as tf - -tf.reset_default_graph() - -with tf.device("/cpu:0"): - with tf.variable_scope('global'): - bot = TrainedBot() - global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') - saver = tf.train.Saver(global_variables) - init = tf.global_variables_initializer() - -with tf.Session() as sess: - sess.run(init) - try: - saver.restore(sess, 'models/' + bot.agent.name) - except Exception: - print("Model not found - initiating new one") +bot = TrainedBot() - while True: - myID, game_map = hlt.get_init() - bot.setID(myID) - hlt.send_init("OpponentBot") +while True: + myID, game_map = hlt.get_init() + bot.setID(myID) + hlt.send_init("MyBot") - while (mode == 'server' or hlt.get_string() == 'Get map and play!'): - game_map.get_frame(hlt.get_string()) - moves = bot.compute_moves(game_map, sess) - hlt.send_frame(moves) + while (mode == 'server' or hlt.get_string() == 'Get map and play!'): + game_map.get_frame(hlt.get_string()) + moves = bot.compute_moves(game_map) + hlt.send_frame(moves) diff --git a/public/OpponentBot.py b/public/OpponentBot.py index 55d8629..49bd507 100644 --- a/public/OpponentBot.py +++ b/public/OpponentBot.py @@ -13,10 +13,12 @@ from public.models.bot.improvedBot import ImprovedBot +bot = ImprovedBot() + while True: myID, game_map = hlt.get_init() hlt.send_init("OpponentBot") - bot = ImprovedBot(myID) + bot.setID(myID) while (mode == 'server' or hlt.get_string() == 'Get map and play!'): game_map.get_frame(hlt.get_string()) diff --git a/public/models/bot/bot.py b/public/models/bot/bot.py index d298dae..4fa4cd5 100644 --- a/public/models/bot/bot.py +++ b/public/models/bot/bot.py @@ -1,8 +1,5 @@ class Bot: - def __init__(self, myID=1): - self.myID = myID - - def compute_moves(self, game_map, sess=None): + def compute_moves(self, game_map): pass def setID(self, myID): diff --git a/public/models/bot/improvedBot.py b/public/models/bot/improvedBot.py index 874a978..b56a706 100644 --- a/public/models/bot/improvedBot.py +++ b/public/models/bot/improvedBot.py @@ -5,9 +5,6 @@ class ImprovedBot(Bot): - def __init__(self, myID): - super(ImprovedBot, self).__init__(myID) - def compute_moves(self, game_map, sess=None): moves = [] for square in game_map: diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index d942c55..e05c7bf 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -1,17 +1,34 @@ from public.models.agent.vanillaAgent import VanillaAgent from public.models.bot.bot import Bot from train.reward import 
formatMoves, getGameState - +import tensorflow as tf class TrainedBot(Bot): - def __init__(self, myID=None): + def __init__(self): lr = 1e-3; s_size = 9 * 3; a_size = 5; h_size = 50 - self.agent = VanillaAgent(None, lr, s_size, a_size, h_size) - super(TrainedBot, self).__init__(myID) + tf.reset_default_graph() + + with tf.device("/cpu:0"): + with tf.variable_scope('global'): + self.agent = VanillaAgent(None, lr, s_size, a_size, h_size) + + global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') + saver = tf.train.Saver(global_variables) + init = tf.global_variables_initializer() - def compute_moves(self, game_map, sess=None): + self.sess = tf.Session() + self.sess.run(init) + try: + saver.restore(self.sess, 'models/' + self.bot.agent.name) + except Exception: + print("Model not found - initiating new one") + + def compute_moves(self, game_map): game_state = getGameState(game_map, self.myID) - return formatMoves(game_map, self.agent.choose_actions(sess, game_state)) + return formatMoves(game_map, self.agent.choose_actions(self.sess, game_state)) + + def close(self): + self.sess.close() \ No newline at end of file
From 6a673033aefca28f4a5e7b6bd1a5f8ecde5ce829 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 17:01:02 +0200 Subject: [PATCH 18/45] Dependency for building with make Why this change was necessary: * g++-4.9 compiler --- .travis.yml | 14 +++++++++++--- 1 file changed, 11 insertions(+), 3 deletions(-) diff --git a/.travis.yml b/.travis.yml index 03abe5f..ae492e7 100644 --- a/.travis.yml +++ b/.travis.yml @@ -5,13 +5,21 @@ language: python python: - 3.5 +addons: + apt: + sources: + - ubuntu-toolchain-r-test + packages: + - g++-4.9 + +env: + global: + - CXX=g++-4.9 + install: - pip install -r requirements.txt - make -env: - - $COVPYYAML=cov41-pyyaml,coveralls41 - script: # Tests - python -m unittest discover -v From 681d2e9649423a4ab815c3a2313d1ea2b2dfc817 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 27 Sep 2017 21:04:24 +0200 Subject: [PATCH 19/45] Python function for starting game instead of script + add docs to explain it Why this change was necessary: * It is easier to use * It is easy to adapt afterwards This change addresses the need by: *
The start_game.py file * Changing the calls in worker.py - which is used for train.py * Writing the commands to execute Potential side-effects: * We still have to figure out the problem with imports - see MyBot.py l.3-5 --- docs/conf.py | 107 +++++++++++++++--------------- docs/edouard.rst | 9 --- docs/index.rst | 13 ++-- docs/louis.rst | 10 --- docs/run.rst | 13 ++++ networking/hlt_networking.py | 4 -- networking/runGameDebugConfig.sh | 4 +- networking/start_game.py | 36 ++++++++++ public/MyBot.py | 5 ++ public/models/agent/agent.py | 4 +- public/models/bot/trainedBot.py | 2 +- public/models/variables/README.md | 3 + train/main.py | 2 +- train/worker.py | 13 ++-- 14 files changed, 129 insertions(+), 96 deletions(-) delete mode 100644 docs/edouard.rst delete mode 100644 docs/louis.rst create mode 100644 docs/run.rst create mode 100644 networking/start_game.py create mode 100644 public/models/variables/README.md diff --git a/docs/conf.py b/docs/conf.py index 6795197..82c9aa3 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -19,12 +19,12 @@ # If extensions (or modules to document with autodoc) are in another directory, # add these directories to sys.path here. If the directory is relative to the # documentation root, use os.path.abspath to make it absolute, like shown here. -#sys.path.insert(0, os.path.abspath('.')) +# sys.path.insert(0, os.path.abspath('.')) # -- General configuration ------------------------------------------------ # If your documentation needs a minimal Sphinx version, state it here. -#needs_sphinx = '1.0' +# needs_sphinx = '1.0' # Add any Sphinx extension module names here, as strings. They can be # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom @@ -40,7 +40,7 @@ source_suffix = '.rst' # The encoding of source files. -#source_encoding = 'utf-8-sig' +# source_encoding = 'utf-8-sig' # The master toctree document. master_doc = 'index' @@ -68,9 +68,9 @@ # There are two options for replacing |today|: either, you set today to some # non-false value, then it is used: -#today = '' +# today = '' # Else, today_fmt is used as the format for a strftime call. -#today_fmt = '%B %d, %Y' +# today_fmt = '%B %d, %Y' # List of patterns, relative to source directory, that match files and # directories to ignore when looking for source files. @@ -79,61 +79,60 @@ # The reST default role (used for this markup: `text`) to use for all # documents. -#default_role = None +# default_role = None # If true, '()' will be appended to :func: etc. cross-reference text. -#add_function_parentheses = True +# add_function_parentheses = True # If true, the current module name will be prepended to all description # unit titles (such as .. function::). -#add_module_names = True +# add_module_names = True # If true, sectionauthor and moduleauthor directives will be shown in the # output. They are ignored by default. -#show_authors = False +# show_authors = False # The name of the Pygments (syntax highlighting) style to use. pygments_style = 'sphinx' # A list of ignored prefixes for module index sorting. -#modindex_common_prefix = [] +# modindex_common_prefix = [] # If true, keep warnings as "system message" paragraphs in the built documents. -#keep_warnings = False +# keep_warnings = False # If true, `todo` and `todoList` produce output, else they produce nothing. todo_include_todos = False - # -- Options for HTML output ---------------------------------------------- # The theme to use for HTML and HTML Help pages. See the documentation for # a list of builtin themes. 
-html_theme = 'alabaster' +html_theme = 'sphinx_rtd_theme' # Theme options are theme-specific and customize the look and feel of a theme # further. For a list of options available for each theme, see the # documentation. -#html_theme_options = {} +# html_theme_options = {} # Add any paths that contain custom themes here, relative to this directory. -#html_theme_path = [] +# html_theme_path = [] # The name for this set of Sphinx documents. # " v documentation" by default. -#html_title = 'Halite-Python-RL v1.0' +# html_title = 'Halite-Python-RL v1.0' # A shorter title for the navigation bar. Default is the same as html_title. -#html_short_title = None +# html_short_title = None # The name of an image file (relative to this directory) to place at the top # of the sidebar. -#html_logo = None +# html_logo = None # The name of an image file (relative to this directory) to use as a favicon of # the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 # pixels large. -#html_favicon = None +# html_favicon = None # Add any paths that contain custom static files (such as style sheets) here, # relative to this directory. They are copied after the builtin static files, @@ -143,64 +142,64 @@ # Add any extra paths that contain custom files (such as robots.txt or # .htaccess) here, relative to this directory. These files are copied # directly to the root of the documentation. -#html_extra_path = [] +# html_extra_path = [] # If not None, a 'Last updated on:' timestamp is inserted at every page # bottom, using the given strftime format. # The empty string is equivalent to '%b %d, %Y'. -#html_last_updated_fmt = None +# html_last_updated_fmt = None # If true, SmartyPants will be used to convert quotes and dashes to # typographically correct entities. -#html_use_smartypants = True +# html_use_smartypants = True # Custom sidebar templates, maps document names to template names. -#html_sidebars = {} +# html_sidebars = {} # Additional templates that should be rendered to pages, maps page names to # template names. -#html_additional_pages = {} +# html_additional_pages = {} # If false, no module index is generated. -#html_domain_indices = True +# html_domain_indices = True # If false, no index is generated. -#html_use_index = True +# html_use_index = True # If true, the index is split into individual pages for each letter. -#html_split_index = False +# html_split_index = False # If true, links to the reST sources are added to the pages. -#html_show_sourcelink = True +# html_show_sourcelink = True # If true, "Created using Sphinx" is shown in the HTML footer. Default is True. -#html_show_sphinx = True +# html_show_sphinx = True # If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. -#html_show_copyright = True +# html_show_copyright = True # If true, an OpenSearch description file will be output, and all pages will # contain a tag referring to it. The value of this option must be the # base URL from which the finished HTML is served. -#html_use_opensearch = '' +# html_use_opensearch = '' # This is the file name suffix for HTML files (e.g. ".xhtml"). -#html_file_suffix = None +# html_file_suffix = None # Language to be used for generating the HTML full-text search index. # Sphinx supports the following languages: # 'da', 'de', 'en', 'es', 'fi', 'fr', 'h', 'it', 'ja' # 'nl', 'no', 'pt', 'ro', 'r', 'sv', 'tr', 'zh' -#html_search_language = 'en' +# html_search_language = 'en' # A dictionary with options for the search language support, empty by default. 
# 'ja' uses this config value. # 'zh' user can custom change `jieba` dictionary path. -#html_search_options = {'type': 'default'} +# html_search_options = {'type': 'default'} # The name of a javascript file (relative to the configuration directory) that # implements a search results scorer. If empty, the default will be used. -#html_search_scorer = 'scorer.js' +# html_search_scorer = 'scorer.js' # Output file base name for HTML help builder. htmlhelp_basename = 'Halite-Python-RLdoc' @@ -208,17 +207,17 @@ # -- Options for LaTeX output --------------------------------------------- latex_elements = { -# The paper size ('letterpaper' or 'a4paper'). -#'papersize': 'letterpaper', + # The paper size ('letterpaper' or 'a4paper'). + # 'papersize': 'letterpaper', -# The font size ('10pt', '11pt' or '12pt'). -#'pointsize': '10pt', + # The font size ('10pt', '11pt' or '12pt'). + # 'pointsize': '10pt', -# Additional stuff for the LaTeX preamble. -#'preamble': '', + # Additional stuff for the LaTeX preamble. + # 'preamble': '', -# Latex figure (float) alignment -#'figure_align': 'htbp', + # Latex figure (float) alignment + # 'figure_align': 'htbp', } # Grouping the document tree into LaTeX files. List of tuples @@ -231,23 +230,23 @@ # The name of an image file (relative to this directory) to place at the top of # the title page. -#latex_logo = None +# latex_logo = None # For "manual" documents, if this is true, then toplevel headings are parts, # not chapters. -#latex_use_parts = False +# latex_use_parts = False # If true, show page references after internal links. -#latex_show_pagerefs = False +# latex_show_pagerefs = False # If true, show URL addresses after external links. -#latex_show_urls = False +# latex_show_urls = False # Documents to append as an appendix to all manuals. -#latex_appendices = [] +# latex_appendices = [] # If false, no module index is generated. -#latex_domain_indices = True +# latex_domain_indices = True # -- Options for manual page output --------------------------------------- @@ -260,7 +259,7 @@ ] # If true, show URL addresses after external links. -#man_show_urls = False +# man_show_urls = False # -- Options for Texinfo output ------------------------------------------- @@ -275,13 +274,13 @@ ] # Documents to append as an appendix to all manuals. -#texinfo_appendices = [] +# texinfo_appendices = [] # If false, no module index is generated. -#texinfo_domain_indices = True +# texinfo_domain_indices = True # How to display URL addresses: 'footnote', 'no', or 'inline'. -#texinfo_show_urls = 'footnote' +# texinfo_show_urls = 'footnote' # If true, do not generate a @detailmenu in the "Top" node's menu. -#texinfo_no_detailmenu = False +# texinfo_no_detailmenu = False diff --git a/docs/edouard.rst b/docs/edouard.rst deleted file mode 100644 index b37bc89..0000000 --- a/docs/edouard.rst +++ /dev/null @@ -1,9 +0,0 @@ -.. _edouard: - - - -Edouard -======================= - -Edouard - diff --git a/docs/index.rst b/docs/index.rst index a1fa54b..01d62dc 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,7 +1,7 @@ .. Halite-Python-RL documentation master file, created by - sphinx-quickstart on Wed Sep 27 07:49:44 2017. - You can adapt this file completely to your liking, but it should at least - contain the root `toctree` directive. +sphinx-quickstart on Wed Sep 27 07:49:44 2017. +You can adapt this file completely to your liking, but it should at least +contain the root `toctree` directive. Welcome to Halite-Python-RL's documentation! 
============================================ @@ -9,11 +9,10 @@ Welcome to Halite-Python-RL's documentation! Contents: .. toctree:: - :maxdepth: 2 - :caption: Contributors +:maxdepth: 2 + :caption: Contributors - louis - edouard + run Indices and tables diff --git a/docs/louis.rst b/docs/louis.rst deleted file mode 100644 index 6fa90bd..0000000 --- a/docs/louis.rst +++ /dev/null @@ -1,10 +0,0 @@ -.. _louis: - - - -Louis -======================= - -Louis - - diff --git a/docs/run.rst b/docs/run.rst new file mode 100644 index 0000000..a8d042e --- /dev/null +++ b/docs/run.rst @@ -0,0 +1,13 @@ +.. _run: + + + +Run the Bot +======================= + +cd networking +python start_game.py +cd public +python MyBot.py + +This will run infinite games until one of the program stops. \ No newline at end of file diff --git a/networking/hlt_networking.py b/networking/hlt_networking.py index 6a0d250..e9a404c 100644 --- a/networking/hlt_networking.py +++ b/networking/hlt_networking.py @@ -36,7 +36,3 @@ def send_frame(self, moves): self.sendString(' '.join( str(move.square.x) + ' ' + str(move.square.y) + ' ' + str(translate_cardinal(move.direction)) for move in moves)) - - # - # if __name__ =="__main__": - # HLT(2000) diff --git a/networking/runGameDebugConfig.sh b/networking/runGameDebugConfig.sh index fd1cdcd..6a28f81 100755 --- a/networking/runGameDebugConfig.sh +++ b/networking/runGameDebugConfig.sh @@ -1,5 +1,3 @@ #!/bin/bash -if hash python3 2>/dev/null; then - kill $(ps aux | grep python | grep $1| awk '{print $2}'); ./public/halite -j -z 25 -n 1 -x 25 -t -d "10 10" "python3 networking/pipe_socket_translator.py $1"; -fi +kill $(ps aux | grep python | grep $1| awk '{print $2}'); ./public/halite -j -t -n 1 -z 25 -x 25 -d "10 10" "python3 ./networking/pipe_socket_translator.py $1"; \ No newline at end of file diff --git a/networking/start_game.py b/networking/start_game.py new file mode 100644 index 0000000..c0bb6a7 --- /dev/null +++ b/networking/start_game.py @@ -0,0 +1,36 @@ +import subprocess +import argparse + + +def start_game(port, path_to_root, dim=10, max_strength=25, max_turn=25, silent_bool=True, timeout=False): + subprocess.call([path_to_root + "networking/kill.sh", str(port)]) + halite = path_to_root + 'public/halite ' + dimensions = '-d "' + str(dim) + ' ' + str(dim) + '" ' + + max_strength = '-z ' + str(max_strength) + ' ' + max_turn = '-x ' + str(max_turn) + ' ' + silent_bool = '-j ' if silent_bool else '' + timeout = '-t ' if timeout else '' + players = [ + "python3 " + path_to_root + "networking/pipe_socket_translator.py " + str(port) + ] + n_player = '' if len(players) > 1 else '-n 1 ' + + players = '"' + '" "'.join(players) + '"' + + print("Launching process") + subprocess.call(halite + dimensions + n_player + max_strength + max_turn + silent_bool + timeout + players, + shell=True) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser() + parser.add_argument("-p", "--port", type=int, help="the port for the simulation", default=2000) + parser.add_argument("-t", "--timeout", help="timeout", action="store_true") + parser.add_argument("-j", "--silent", help="silent", action="store_true", default=True) + parser.add_argument("-s", "--strength", help="max strength", type=int, default=25) + parser.add_argument("-d", "--dimension", help="max dimension", type=int, default=10) + parser.add_argument("-m", "--maxturn", help="max turn", type=int, default=25) + args = parser.parse_args() + start_game(str(args.port), '../', dim=args.dimension, max_strength=args.strength, 
max_turn=args.maxturn, + silent_bool=args.silent, timeout=args.timeout) diff --git a/public/MyBot.py b/public/MyBot.py index 217a1c0..a016b22 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -1,4 +1,9 @@ import sys +import os + +module_path = os.path.abspath(os.path.join('..')) +if module_path not in sys.path: + sys.path.append(module_path) mode = 'server' if (len(sys.argv) == 1) else 'local' mode = 'local' # TODO remove forcing diff --git a/public/models/agent/agent.py b/public/models/agent/agent.py index a545a5b..9610482 100644 --- a/public/models/agent/agent.py +++ b/public/models/agent/agent.py @@ -9,9 +9,9 @@ def __init__(self, name, experience): self.experience = experience if self.experience is not None: try: - self.experience.metric = np.load('models/' + self.name + '.npy') + self.experience.metric = np.load('../public/models/variables/'+self.name +'/'+self.name+'.npy') except: - print("New metric file created") + print("Metric file not found") self.experience.metric = np.array([]) def choose_actions(self, sess, game_state, debug=False): diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index e05c7bf..dcb954d 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -22,7 +22,7 @@ def __init__(self): self.sess = tf.Session() self.sess.run(init) try: - saver.restore(self.sess, 'models/' + self.bot.agent.name) + saver.restore(self.sess, './models/variables/' + self.agent.name+'/'+ self.agent.name) except Exception: print("Model not found - initiating new one") diff --git a/public/models/variables/README.md b/public/models/variables/README.md new file mode 100644 index 0000000..d1b81c2 --- /dev/null +++ b/public/models/variables/README.md @@ -0,0 +1,3 @@ +# Variables + +Here are, for instance, the stored Tensorflow models, with the convention of using the name of the agent for **both the folder and the files**. 
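For reference, this convention means every agent's files live under variables/<agent name>/ and are themselves named after the agent. A minimal sketch of the path rule it implies (the variables_path helper below is illustrative only, not part of the repository):

    import os

    def variables_path(agent_name, root='public/models/variables'):
        # e.g. public/models/variables/vanilla/vanilla      -> TensorFlow checkpoint prefix
        #      public/models/variables/vanilla/vanilla.npy  -> saved metric history
        return os.path.join(root, agent_name, agent_name)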
diff --git a/train/main.py b/train/main.py index 3b7e621..79df5c8 100644 --- a/train/main.py +++ b/train/main.py @@ -43,7 +43,7 @@ with tf.Session() as sess: sess.run(init) try: - saver.restore(sess, './public/models/' + master_agent.name) + saver.restore(sess, '../public/models/variables/' + master_agent.name+'/'+master_agent.name) except Exception: print("Model not found - initiating new one") diff --git a/train/worker.py b/train/worker.py index e2ac963..316a14b 100644 --- a/train/worker.py +++ b/train/worker.py @@ -1,12 +1,12 @@ import multiprocessing -import subprocess import time +import os import tensorflow as tf from networking.hlt_networking import HLT from train.reward import formatMoves, getGameState - +from networking.start_game import start_game def update_target_graph(from_scope, to_scope): from_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, from_scope) @@ -25,7 +25,7 @@ def __init__(self, port, number, agent): self.port = port + number def worker(): - subprocess.call(['./networking/runGameDebugConfig.sh', str(self.port)]) # runSimulation + start_game(self.port,'../') self.p = multiprocessing.Process(target=worker) self.p.start() @@ -59,7 +59,10 @@ def work(self, sess, coord, saver, n_simultations): self.agent.update_agent(sess) if self.number == 0: - saver.save(sess, './public/models/' + self.agent.name) - self.agent.experience.save_metric('./public/models/' + self.agent.name) + directory = '../public/models/variables/' + self.agent.name+'/' + if not os.path.exists(directory): + os.makedirs(directory) + saver.save(sess, directory+self.agent.name) + self.agent.experience.save_metric(directory+self.agent.name) self.p.terminate() From 5c840f93a4cda305584baf23b783edb0ada7279e Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Thu, 28 Sep 2017 17:29:17 +0200 Subject: [PATCH 20/45] Resolved all path and import issues Why this change was necessary: * Scripts needed to be executed from PyCharm (that understands the module architecture) * Relative path were problematic This change addresses the need by: * For the module either inserting in sys.path - see main.py - or by creating a context.py file, a practice explained in http://docs.python-guide.org/en/latest/writing/structure/, which is worth what it is... Potential side-effects: * This will work for any setting but if we need to restructure, all path will need to be modified manually... 
--- networking/kill.sh | 3 +++ networking/start_game.py | 21 +++++++++++++-------- public/MyBot.py | 11 +++-------- public/OpponentBot.py | 4 ++-- public/__init__.py | 0 public/context.py | 4 ++++ public/models/__init__.py | 0 public/models/agent/agent.py | 4 ++-- public/models/bot/trainedBot.py | 3 ++- src/main.cpp | 8 +++++--- train/__init__.py | 0 train/experience.py | 2 +- train/main.py | 9 ++++++--- train/worker.py | 5 +++-- 14 files changed, 44 insertions(+), 30 deletions(-) create mode 100755 networking/kill.sh delete mode 100644 public/__init__.py create mode 100644 public/context.py delete mode 100644 public/models/__init__.py delete mode 100644 train/__init__.py diff --git a/networking/kill.sh b/networking/kill.sh new file mode 100755 index 0000000..04014ce --- /dev/null +++ b/networking/kill.sh @@ -0,0 +1,3 @@ +#!/bin/bash + +kill $(ps aux | grep python | grep -v start_game.py | grep $1| awk '{print $2}'); \ No newline at end of file diff --git a/networking/start_game.py b/networking/start_game.py index c0bb6a7..2683028 100644 --- a/networking/start_game.py +++ b/networking/start_game.py @@ -1,25 +1,28 @@ import subprocess import argparse +import os - -def start_game(port, path_to_root, dim=10, max_strength=25, max_turn=25, silent_bool=True, timeout=False): - subprocess.call([path_to_root + "networking/kill.sh", str(port)]) - halite = path_to_root + 'public/halite ' +def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_bool=True, timeout=False, quiet=True): + path_to_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) + subprocess.call([path_to_root + "/networking/kill.sh", str(port)]) + halite = path_to_root + '/public/halite ' dimensions = '-d "' + str(dim) + ' ' + str(dim) + '" ' max_strength = '-z ' + str(max_strength) + ' ' max_turn = '-x ' + str(max_turn) + ' ' + max_game = '-g ' + str(max_game) + ' ' silent_bool = '-j ' if silent_bool else '' timeout = '-t ' if timeout else '' + quiet = '-q ' if quiet else '' players = [ - "python3 " + path_to_root + "networking/pipe_socket_translator.py " + str(port) + "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port) ] n_player = '' if len(players) > 1 else '-n 1 ' players = '"' + '" "'.join(players) + '"' print("Launching process") - subprocess.call(halite + dimensions + n_player + max_strength + max_turn + silent_bool + timeout + players, + subprocess.call(halite + dimensions + n_player + max_strength + max_turn + silent_bool + timeout + quiet + max_game +players, shell=True) @@ -28,9 +31,11 @@ def start_game(port, path_to_root, dim=10, max_strength=25, max_turn=25, silent_ parser.add_argument("-p", "--port", type=int, help="the port for the simulation", default=2000) parser.add_argument("-t", "--timeout", help="timeout", action="store_true") parser.add_argument("-j", "--silent", help="silent", action="store_true", default=True) + parser.add_argument("-q", "--quiet", help="quiet", action="store_true", default=False) parser.add_argument("-s", "--strength", help="max strength", type=int, default=25) parser.add_argument("-d", "--dimension", help="max dimension", type=int, default=10) parser.add_argument("-m", "--maxturn", help="max turn", type=int, default=25) + parser.add_argument("-g", "--maxgame", help="max game", type=int, default=1) # -1 for infinite game args = parser.parse_args() - start_game(str(args.port), '../', dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, - silent_bool=args.silent, timeout=args.timeout) + 
start_game(str(args.port), dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, + silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet) diff --git a/public/MyBot.py b/public/MyBot.py index a016b22..a649c51 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -1,9 +1,4 @@ import sys -import os - -module_path = os.path.abspath(os.path.join('..')) -if module_path not in sys.path: - sys.path.append(module_path) mode = 'server' if (len(sys.argv) == 1) else 'local' mode = 'local' # TODO remove forcing @@ -11,10 +6,10 @@ if mode == 'server': # 'server' mode import hlt else: # 'local' mode - from networking.hlt_networking import HLT + import context port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 - hlt = HLT(port=port) + hlt = context.HLT(port=port) from public.models.bot.trainedBot import TrainedBot @@ -22,8 +17,8 @@ while True: myID, game_map = hlt.get_init() - bot.setID(myID) hlt.send_init("MyBot") + bot.setID(myID) while (mode == 'server' or hlt.get_string() == 'Get map and play!'): game_map.get_frame(hlt.get_string()) diff --git a/public/OpponentBot.py b/public/OpponentBot.py index 49bd507..f6f002e 100644 --- a/public/OpponentBot.py +++ b/public/OpponentBot.py @@ -6,10 +6,10 @@ if mode == 'server': # 'server' mode import hlt else: # 'local' mode - from networking.hlt_networking import HLT + import context port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 - hlt = HLT(port=port) + hlt = context.HLT(port=port) from public.models.bot.improvedBot import ImprovedBot diff --git a/public/__init__.py b/public/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/public/context.py b/public/context.py new file mode 100644 index 0000000..8c7dc43 --- /dev/null +++ b/public/context.py @@ -0,0 +1,4 @@ +import sys +import os +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +from networking.hlt_networking import HLT \ No newline at end of file diff --git a/public/models/__init__.py b/public/models/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/public/models/agent/agent.py b/public/models/agent/agent.py index 9610482..7971c8e 100644 --- a/public/models/agent/agent.py +++ b/public/models/agent/agent.py @@ -1,5 +1,5 @@ import numpy as np - +import os from train.reward import localStateFromGlobal @@ -9,7 +9,7 @@ def __init__(self, name, experience): self.experience = experience if self.experience is not None: try: - self.experience.metric = np.load('../public/models/variables/'+self.name +'/'+self.name+'.npy') + self.experience.metric = np.load(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))+'/variables/'+self.name +'/'+self.name+'.npy') except: print("Metric file not found") self.experience.metric = np.array([]) diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index dcb954d..34cc077 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -2,6 +2,7 @@ from public.models.bot.bot import Bot from train.reward import formatMoves, getGameState import tensorflow as tf +import os class TrainedBot(Bot): def __init__(self): @@ -22,7 +23,7 @@ def __init__(self): self.sess = tf.Session() self.sess.run(init) try: - saver.restore(self.sess, './models/variables/' + self.agent.name+'/'+ self.agent.name) + saver.restore(self.sess, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) +'/variables/'+ self.agent.name+'/'+ self.agent.name) except Exception: print("Model not found - initiating new one") diff 
--git a/src/main.cpp b/src/main.cpp index 8ec037b..2bb2760 100644 --- a/src/main.cpp +++ b/src/main.cpp @@ -47,6 +47,7 @@ int main(int argc, char ** argv) { TCLAP::ValueArg nPlayersArg("n", "nplayers", "Create a map that will accommodate n players [SINGLE PLAYER MODE ONLY].", false, 1, "{1,2,3,4,5,6}", cmd); TCLAP::ValueArg< std::pair > dimensionArgs("d", "dimensions", "The dimensions of the map.", false, { 0, 0 }, "a string containing two space-seprated positive integers", cmd); TCLAP::ValueArg seedArg("s", "seed", "The seed for the map generator.", false, 0, "positive integer", cmd); + TCLAP::ValueArg max_game_args("g", "maxgame", "The max number of games.", false, 0, "integer", cmd); TCLAP::ValueArg custom_max_strength_args("z", "maxstrength", "The max strength.", false, 0, "positive integer", cmd); TCLAP::ValueArg customMaxTurnNumberArg("x", "maxturn", "The number of turns.", false, 0, "positive integer", cmd); //Remaining Args, be they start commands and/or override names. Description only includes start commands since it will only be seen on local testing. @@ -136,8 +137,9 @@ int main(int argc, char ** argv) { std::cout << std::endl << "A map can only accommodate between 1 and 6 players." << std::endl << std::endl; exit(1); } - - while(true){ + int ng = max_game_args.getValue(); + bool infinite_loop = ng==-1; + while(infinite_loop || ng-- > 0){ seed = (std::chrono::duration_cast(std::chrono::system_clock::now().time_since_epoch()).count() % 4294967295); my_game = new Halite(mapWidth, mapHeight, seed, n_players_for_map_creation, networking, ignore_timeout, custom_max_strength_args.getValue()); @@ -145,7 +147,7 @@ int main(int argc, char ** argv) { if(names != NULL) delete names; - //delete my_game; + } if(names != NULL) delete names; diff --git a/train/__init__.py b/train/__init__.py deleted file mode 100644 index e69de29..0000000 diff --git a/train/experience.py b/train/experience.py index 3d4c94b..aa1b501 100644 --- a/train/experience.py +++ b/train/experience.py @@ -50,4 +50,4 @@ def add_episode(self, game_states, moves): def batch(self, size=128): indices = np.random.randint(len(self.states), size=min(int(len(self.states) / 2), size)) - return self.states[indices], self.moves[indices], self.rewards[indices] + return self.states[indices], self.moves[indices], self.rewards[indices] \ No newline at end of file diff --git a/train/main.py b/train/main.py index 79df5c8..cf21c6d 100644 --- a/train/main.py +++ b/train/main.py @@ -1,9 +1,12 @@ import multiprocessing -import sys import threading - +import os +import sys import tensorflow as tf +os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) + from public.models.agent.vanillaAgent import VanillaAgent from train.experience import ExperienceVanilla from train.worker import Worker @@ -43,7 +46,7 @@ with tf.Session() as sess: sess.run(init) try: - saver.restore(sess, '../public/models/variables/' + master_agent.name+'/'+master_agent.name) + saver.restore(sess, os.path.abspath(os.path.dirname(__file__))+'/../public/models/variables/' + master_agent.name+'/'+master_agent.name) except Exception: print("Model not found - initiating new one") diff --git a/train/worker.py b/train/worker.py index 316a14b..9bafd89 100644 --- a/train/worker.py +++ b/train/worker.py @@ -25,7 +25,7 @@ def __init__(self, port, number, agent): self.port = port + number def worker(): - start_game(self.port,'../') + start_game(self.port, quiet=True, max_game=-1) # Infinite games self.p = 
multiprocessing.Process(target=worker) self.p.start() @@ -59,8 +59,9 @@ def work(self, sess, coord, saver, n_simultations): self.agent.update_agent(sess) if self.number == 0: - directory = '../public/models/variables/' + self.agent.name+'/' + directory = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))+'/public/models/variables/'+self.agent.name+'/' if not os.path.exists(directory): + print("Creating directory for agent :"+self.agent.name) os.makedirs(directory) saver.save(sess, directory+self.agent.name) self.agent.experience.save_metric(directory+self.agent.name) From 25339688a874368975077d1302071f808b6acc64 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Thu, 28 Sep 2017 17:56:10 +0200 Subject: [PATCH 21/45] Minor docs and PEP8 commit --- README.md | 2 +- docs/run.rst | 11 ++++++++++- public/context.py | 3 ++- public/models/agent/agent.py | 3 ++- public/models/bot/trainedBot.py | 6 ++++-- train/experience.py | 2 +- train/reward.py | 2 +- 7 files changed, 21 insertions(+), 8 deletions(-) diff --git a/README.md b/README.md index 5e1d25c..c0981cc 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -[![Build Status](https://travis-ci.org/louis-r/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/louis-r/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=master)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=master) [![Documentation Status](https://readthedocs.org/projects/halite-python-rl/badge/?version=latest)](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest) +[![Build Status](https://travis-ci.org/Edouard360/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/Edouard360/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=master)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=master) [![Documentation Status](https://readthedocs.org/projects/halite-python-rl/badge/?version=latest)](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest) # Halite-Python-RL diff --git a/docs/run.rst b/docs/run.rst index a8d042e..19c5ed4 100644 --- a/docs/run.rst +++ b/docs/run.rst @@ -5,9 +5,18 @@ Run the Bot ======================= +In your console: + cd networking python start_game.py + +In another tab + cd public python MyBot.py -This will run infinite games until one of the program stops. \ No newline at end of file +This will run 1 game. Options can be added to starting the game, among which: + +python start_game.py -g 5 -x 30 -z 50 + +Will run 5 games, of at most 30 turns, which at most squares of strength 50. 
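Note that, as of this patch, start_game.py exposes the turn and strength limits as -m/--maxturn and -s/--strength (the -x and -z letters above are flags of the underlying halite binary). The same configuration can also be launched directly from Python through the start_game function; a usage sketch, assuming the halite binary has been built:

    from networking.start_game import start_game

    # Five games of at most 30 turns, with squares capped at strength 50.
    start_game(2000, dim=10, max_strength=50, max_turn=30, max_game=5)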
\ No newline at end of file diff --git a/public/context.py b/public/context.py index 8c7dc43..6aa2548 100644 --- a/public/context.py +++ b/public/context.py @@ -1,4 +1,5 @@ import sys import os + sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) -from networking.hlt_networking import HLT \ No newline at end of file +from networking.hlt_networking import HLT diff --git a/public/models/agent/agent.py b/public/models/agent/agent.py index 7971c8e..a2c96e8 100644 --- a/public/models/agent/agent.py +++ b/public/models/agent/agent.py @@ -9,7 +9,8 @@ def __init__(self, name, experience): self.experience = experience if self.experience is not None: try: - self.experience.metric = np.load(os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))+'/variables/'+self.name +'/'+self.name+'.npy') + self.experience.metric = np.load(os.path.abspath(os.path.join(os.path.dirname(__file__), + '..')) + '/variables/' + self.name + '/' + self.name + '.npy') except: print("Metric file not found") self.experience.metric = np.array([]) diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index 34cc077..7cc0ec3 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -4,6 +4,7 @@ import tensorflow as tf import os + class TrainedBot(Bot): def __init__(self): lr = 1e-3; @@ -23,7 +24,8 @@ def __init__(self): self.sess = tf.Session() self.sess.run(init) try: - saver.restore(self.sess, os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) +'/variables/'+ self.agent.name+'/'+ self.agent.name) + saver.restore(self.sess, os.path.abspath(os.path.join(os.path.dirname(__file__), + '..')) + '/variables/' + self.agent.name + '/' + self.agent.name) except Exception: print("Model not found - initiating new one") @@ -32,4 +34,4 @@ def compute_moves(self, game_map): return formatMoves(game_map, self.agent.choose_actions(self.sess, game_state)) def close(self): - self.sess.close() \ No newline at end of file + self.sess.close() diff --git a/train/experience.py b/train/experience.py index aa1b501..3d4c94b 100644 --- a/train/experience.py +++ b/train/experience.py @@ -50,4 +50,4 @@ def add_episode(self, game_states, moves): def batch(self, size=128): indices = np.random.randint(len(self.states), size=min(int(len(self.states) / 2), size)) - return self.states[indices], self.moves[indices], self.rewards[indices] \ No newline at end of file + return self.states[indices], self.moves[indices], self.rewards[indices] diff --git a/train/reward.py b/train/reward.py index 81cf603..8b1c56a 100644 --- a/train/reward.py +++ b/train/reward.py @@ -1,7 +1,7 @@ import numpy as np +from public.hlt import NORTH, EAST, SOUTH, WEST, Move gamma = 0.8 -from public.hlt import NORTH, EAST, SOUTH, WEST, Move STRENGTH_SCALE = 255 PRODUCTION_SCALE = 10 From df8c3be8e32975e802f4fd25bbb9d43a1b59b0d9 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Thu, 28 Sep 2017 18:09:20 +0200 Subject: [PATCH 22/45] Correcting index.rst formatting --- docs/index.rst | 8 +++----- 1 file changed, 3 insertions(+), 5 deletions(-) diff --git a/docs/index.rst b/docs/index.rst index 01d62dc..bbd0227 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -9,17 +9,15 @@ Welcome to Halite-Python-RL's documentation! Contents: .. 
toctree:: -:maxdepth: 2 - :caption: Contributors + :maxdepth: 2 + :caption: Contributors - run + run Indices and tables ================== -* :ref:`genindex` -* :ref:`modindex` * :ref:`search` From c13c87379afec00db49ea9b70b28b5dfa2461e42 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Thu, 28 Sep 2017 20:55:01 -0700 Subject: [PATCH 23/45] Created a file to store all Python files to PyLint check Why this change was necessary: * For now, rewrite all Python files to make them PyLint compliant is a pain. * Rather, let us add a file to the list of files to PyLint check as soon as it is updated This change addresses the need by: * Creating the pylint_checks.txt file, and modifying .travis_yaml Potential side-effects: * None --- .travis.yml | 6 +++++- pylint_checks.txt | 2 ++ train/__init__.py | 5 +++++ 3 files changed, 12 insertions(+), 1 deletion(-) create mode 100644 pylint_checks.txt create mode 100644 train/__init__.py diff --git a/.travis.yml b/.travis.yml index ae492e7..4091153 100644 --- a/.travis.yml +++ b/.travis.yml @@ -24,7 +24,11 @@ script: # Tests - python -m unittest discover -v # Style checks - - pylint train/experience.py + # Temporary workaround + - while read line; do + pylint $line + done < pylint_checks.txt + # Coverage checks - py.test --cov=train tests/ diff --git a/pylint_checks.txt b/pylint_checks.txt new file mode 100644 index 0000000..eecdc7e --- /dev/null +++ b/pylint_checks.txt @@ -0,0 +1,2 @@ +train/__init__.py +train/experience.py diff --git a/train/__init__.py b/train/__init__.py new file mode 100644 index 0000000..7d80e79 --- /dev/null +++ b/train/__init__.py @@ -0,0 +1,5 @@ +# -*- coding: utf-8 -*- +""" +Contributors: + - Louis Rémus +""" From ec962ac67d0eb0f64fa3fccb48b802cbaa676ccb Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Thu, 28 Sep 2017 21:05:36 -0700 Subject: [PATCH 24/45] Corrected bug on Travis Why this change was necessary: * Bug This change addresses the need by: * Using for loop with semicolons Potential side-effects: * None --- .travis.yml | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/.travis.yml b/.travis.yml index 4091153..0beac11 100644 --- a/.travis.yml +++ b/.travis.yml @@ -25,9 +25,7 @@ script: - python -m unittest discover -v # Style checks # Temporary workaround - - while read line; do - pylint $line - done < pylint_checks.txt + - for i in `cat pylint_checks.txt` ; do pylint $i ;done # Coverage checks - py.test --cov=train tests/ From 57446962366c74bcfcaa3c7d05218ae59551c687 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Thu, 28 Sep 2017 21:15:16 -0700 Subject: [PATCH 25/45] Small test --- prout.lol | 2 ++ 1 file changed, 2 insertions(+) create mode 100644 prout.lol diff --git a/prout.lol b/prout.lol new file mode 100644 index 0000000..f9fb0f0 --- /dev/null +++ b/prout.lol @@ -0,0 +1,2 @@ +g +x From 847f4cd4166c8f226b1eba67f8f54df8645fa385 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Thu, 28 Sep 2017 21:17:52 -0700 Subject: [PATCH 26/45] Removing test file --- prout.lol | 2 -- 1 file changed, 2 deletions(-) delete mode 100644 prout.lol diff --git a/prout.lol b/prout.lol deleted file mode 100644 index f9fb0f0..0000000 --- a/prout.lol +++ /dev/null @@ -1,2 +0,0 @@ -g -x From 9e9bda6cf6c756de7d876d4a9180f0281b62a60f Mon Sep 17 00:00:00 2001 From: Edouard360 
Date: Tue, 3 Oct 2017 21:29:58 +0200 Subject: [PATCH 27/45] Visualization tools Why this change was necessary: * Better understanding how to set the reward. * Debugging purpose This change addresses the need by: * Add visualize folder providing powerful debugging features * Add the necessary requirements for it * Add options for start_game.py * Add links to docs * Complete docs index * Slightly changed the reward file * Change normalisation process (important) --- README.md | 4 ++ docs/README.md | 4 +- docs/index.rst | 11 ++--- docs/visualize.rst | 17 ++++++++ networking/runGameDebugConfig.sh | 3 -- networking/start_game.py | 12 +++--- public/models/agent/agent.py | 19 ++++++++- public/models/agent/vanillaAgent.py | 8 +++- public/models/bot/trainedBot.py | 4 ++ public/models/visualize_score.py | 14 ------- requirements.txt | 3 +- tests/util.py | 2 +- train/reward.py | 62 +++++++++++++++++++---------- 13 files changed, 109 insertions(+), 54 deletions(-) create mode 100644 docs/visualize.rst delete mode 100755 networking/runGameDebugConfig.sh delete mode 100644 public/models/visualize_score.py diff --git a/README.md b/README.md index c0981cc..c6605fa 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,10 @@ Halite is an open source artificial intelligence programming challenge, created by Two Sigma, where players build bots using the coding language of their choice to battle on a two-dimensional virtual board. The last bot standing or the bot with all the territory wins. Victory will require micromanaging of the movement of pieces, optimizing a bot’s combat ability, and braving a branching factor billions of times higher than that of Go. +## Documentation + +The documentation is available here. + ## Objective The objective of the project is to apply **Reinforcement Learning** strategies to teach the Bot to perform as well as possible. We teach an agent to learn the best actions to play at each turn. More precisely, given the game state, our untrained Bot **initially performs random actions, but gets rewarded for the good one**. Over time, the Bot automatically learns how to conquer efficiently the map. diff --git a/docs/README.md b/docs/README.md index 7d8b3af..5b3b98b 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1 +1,3 @@ -# Documentation \ No newline at end of file +# Documentation + +Go read the documentation [here](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest). \ No newline at end of file diff --git a/docs/index.rst b/docs/index.rst index bbd0227..de7dabf 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,18 +1,19 @@ .. Halite-Python-RL documentation master file, created by -sphinx-quickstart on Wed Sep 27 07:49:44 2017. -You can adapt this file completely to your liking, but it should at least -contain the root `toctree` directive. Welcome to Halite-Python-RL's documentation! ============================================ -Contents: - .. toctree:: :maxdepth: 2 :caption: Contributors run + visualize + +.. toctree:: + :maxdepth: 2 + :caption: Contributors + Indices and tables diff --git a/docs/visualize.rst b/docs/visualize.rst new file mode 100644 index 0000000..ded38d8 --- /dev/null +++ b/docs/visualize.rst @@ -0,0 +1,17 @@ +.. _visualize: + + + +Visualize the Bot +======================= + +In your console: + +cd visualize +export FLASK_APP=visualize.py;flask run + +Then either: + +Look at http://127.0.0.1:5000/performance.png for performance insights. + +Or at http://127.0.0.1:5000/ for games replay. 
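Besides the Flask viewer, the get_policies helper this patch adds to the agent and to TrainedBot (see the diffs below) can be called directly when debugging the reward shaping. A rough sketch of that usage on a hand-built toy state (illustrative only, not part of the patch):

    import numpy as np
    from public.models.bot.trainedBot import TrainedBot

    bot = TrainedBot()

    # Toy 10x10 state in the getGameState layout:
    # plane 0 = ownership (1 on squares we own), plane 1 = strength, plane 2 = production.
    game_state = np.zeros((3, 10, 10))
    game_state[0, 5, 5] = 1
    game_state[1, 5, 5] = 20
    game_state[2] = 1

    # One softmax over the five moves for each owned square, zeros elsewhere.
    policies = bot.get_policies(game_state)  # shape (10, 10, 5)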
\ No newline at end of file diff --git a/networking/runGameDebugConfig.sh b/networking/runGameDebugConfig.sh deleted file mode 100755 index 6a28f81..0000000 --- a/networking/runGameDebugConfig.sh +++ /dev/null @@ -1,3 +0,0 @@ -#!/bin/bash - -kill $(ps aux | grep python | grep $1| awk '{print $2}'); ./public/halite -j -t -n 1 -z 25 -x 25 -d "10 10" "python3 ./networking/pipe_socket_translator.py $1"; \ No newline at end of file diff --git a/networking/start_game.py b/networking/start_game.py index 2683028..e29be76 100644 --- a/networking/start_game.py +++ b/networking/start_game.py @@ -2,9 +2,10 @@ import argparse import os -def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_bool=True, timeout=False, quiet=True): +def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_bool=True, timeout=True, quiet=True): path_to_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) subprocess.call([path_to_root + "/networking/kill.sh", str(port)]) + # subprocess.call([path_to_root + "/networking/kill.sh", str(port+1)]) # TODO automatic call to subprocess halite = path_to_root + '/public/halite ' dimensions = '-d "' + str(dim) + ' ' + str(dim) + '" ' @@ -15,8 +16,9 @@ def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_boo timeout = '-t ' if timeout else '' quiet = '-q ' if quiet else '' players = [ - "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port) + "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port), ] + # "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port+1) n_player = '' if len(players) > 1 else '-n 1 ' players = '"' + '" "'.join(players) + '"' @@ -29,13 +31,13 @@ def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_boo if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument("-p", "--port", type=int, help="the port for the simulation", default=2000) - parser.add_argument("-t", "--timeout", help="timeout", action="store_true") - parser.add_argument("-j", "--silent", help="silent", action="store_true", default=True) + parser.add_argument("-t", "--timeout", help="timeout", action="store_true", default=False) + parser.add_argument("-j", "--silent", help="silent", action="store_true", default=False) parser.add_argument("-q", "--quiet", help="quiet", action="store_true", default=False) parser.add_argument("-s", "--strength", help="max strength", type=int, default=25) parser.add_argument("-d", "--dimension", help="max dimension", type=int, default=10) parser.add_argument("-m", "--maxturn", help="max turn", type=int, default=25) parser.add_argument("-g", "--maxgame", help="max game", type=int, default=1) # -1 for infinite game args = parser.parse_args() - start_game(str(args.port), dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, + start_game(port = args.port, dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet) diff --git a/public/models/agent/agent.py b/public/models/agent/agent.py index a2c96e8..563b647 100644 --- a/public/models/agent/agent.py +++ b/public/models/agent/agent.py @@ -1,6 +1,6 @@ import numpy as np import os -from train.reward import localStateFromGlobal +from train.reward import localStateFromGlobal, normalizeGameState class Agent: @@ -15,13 +15,28 @@ def __init__(self, name, experience): print("Metric file not found") self.experience.metric 
= np.array([]) + def get_policies(self,sess, game_state): + policies = np.zeros(game_state[0].shape + (5,)) + for y in range(len(game_state[0])): + for x in range(len(game_state[0][0])): + if (game_state[0][y][x] == 1): + policies[y][x] = self.get_policy(sess, normalizeGameState(localStateFromGlobal(game_state, x, y))) + return policies + + def get_policy(self,sess, state): + pass + def choose_actions(self, sess, game_state, debug=False): + # Here the state is not yet normalized ! moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 for y in range(len(game_state[0])): for x in range(len(game_state[0][0])): if (game_state[0][y][x] == 1): - moves[y][x] = self.choose_action(sess, localStateFromGlobal(game_state, x, y), debug=debug) + moves[y][x] = self.choose_action(sess, normalizeGameState(localStateFromGlobal(game_state, x, y)), debug=debug) return moves + def choose_action(self, sess, state, frac_progress=1.0, debug=False): + pass + def update_agent(self, sess): pass diff --git a/public/models/agent/vanillaAgent.py b/public/models/agent/vanillaAgent.py index 46bb265..b952f0d 100644 --- a/public/models/agent/vanillaAgent.py +++ b/public/models/agent/vanillaAgent.py @@ -6,8 +6,8 @@ class VanillaAgent(Agent): - def __init__(self, experience, lr, s_size, a_size, h_size): # all these are optional ? - super(VanillaAgent, self).__init__('vanilla', experience) + def __init__(self, experience, lr = 1e-3, s_size = 9 * 3, a_size = 5, h_size = 50): # all these are optional ? + super(VanillaAgent, self).__init__('vanilla-ter', experience) # These lines established the feed-forward part of the network. The agent takes a state and produces an action. self.state_in = tf.placeholder(shape=[None, s_size], dtype=tf.float32) @@ -40,7 +40,11 @@ def __init__(self, experience, lr, s_size, a_size, h_size): # all these are opt self.updateGlobal = optimizer.apply_gradients(zip(self.gradientHolders, global_vars)) # self.tvars + def get_policy(self,sess, state): + return sess.run(self.policy, feed_dict={self.state_in: [state.reshape(-1)]}) + def choose_action(self, sess, state, frac_progress=1.0, debug=False): # it only a state, not the game state... + # Here the state is normalized ! if (np.random.uniform() >= frac_progress): a = np.random.choice(range(5)) else: diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index 7cc0ec3..894337a 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -33,5 +33,9 @@ def compute_moves(self, game_map): game_state = getGameState(game_map, self.myID) return formatMoves(game_map, self.agent.choose_actions(self.sess, game_state)) + def get_policies(self, game_state): + # Warning this is not hereditary + return self.agent.get_policies(self.sess, game_state) + def close(self): self.sess.close() diff --git a/public/models/visualize_score.py b/public/models/visualize_score.py deleted file mode 100644 index 74d95a9..0000000 --- a/public/models/visualize_score.py +++ /dev/null @@ -1,14 +0,0 @@ -import matplotlib.pyplot as plt -import numpy as np -import pandas as pd - -rewards = [np.load('./models/vanilla.npy')] - -max_len = max([len(reward) for reward in rewards]) -for i in range(len(rewards)): - rewards[i] = np.append(rewards[i], np.repeat(np.nan, max_len - len(rewards[i]))) - -pd.DataFrame(np.array(rewards).T, columns=['vanilla']).rolling(100).mean().plot( - title="Weighted reward at each game. 
(Rolling average)") - -plt.show() diff --git a/requirements.txt b/requirements.txt index 46b8d63..8b7417f 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,4 +3,5 @@ coverage>=3.6 pytest-cov pytest-xdist coveralls -pylint \ No newline at end of file +pylint +flask \ No newline at end of file diff --git a/tests/util.py b/tests/util.py index 21d662b..6681bb4 100644 --- a/tests/util.py +++ b/tests/util.py @@ -18,4 +18,4 @@ def game_states_from_url(GAME_URL): moves = np.array(game['moves']) game_states = np.concatenate(([owner_frames, strength_frames, production_frames]), axis=1) - return game_states / np.array([1, 255, 10])[:, np.newaxis, np.newaxis], moves + return game_states, moves diff --git a/train/reward.py b/train/reward.py index 8b1c56a..f545dd6 100644 --- a/train/reward.py +++ b/train/reward.py @@ -1,7 +1,5 @@ import numpy as np -from public.hlt import NORTH, EAST, SOUTH, WEST, Move - -gamma = 0.8 +from public.hlt import NORTH, EAST, SOUTH, WEST, STILL, Move STRENGTH_SCALE = 255 PRODUCTION_SCALE = 10 @@ -11,20 +9,28 @@ def getGameState(game_map, myID): game_state = np.reshape( [[(square.owner == myID) + 0, square.strength, square.production] for square in game_map], [game_map.width, game_map.height, 3]) - return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) * ( - 1 / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis, np.newaxis]) + return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) + + +def normalizeGameState(game_state): + return game_state / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis, np.newaxis] def getGameProd(game_state): - return PRODUCTION_SCALE * np.sum(game_state[0] * game_state[2]) + return np.sum(game_state[0] * game_state[2]) def getStrength(game_state): - return game_state[1][1][ - 1] * STRENGTH_SCALE # np.sum([square.strength for square in game_map if square.owner == myID]) + return np.sum(game_state[0] * game_state[1]) + # np.sum([square.strength for square in game_map if square.owner == myID]) + + +def getNumber(game_state): + return np.sum(game_state[0]) + # np.sum([square.strength for square in game_map if square.owner == myID]) -def discount_rewards(r): +def discount_rewards(r, gamma=0.8): """ take 1D float array of rewards and compute discounted reward """ discounted_r = np.zeros_like(r, dtype=np.float64) running_add = 0 @@ -41,30 +47,46 @@ def localStateFromGlobal(game_state, x, y, size=1): def rawRewards(game_states): - return np.array([game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2] for i in - range(len(game_states) - 1)]) + return np.array([game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2] + for i in range(len(game_states) - 1)]) -def discountedReward(next_reward, move_before, discount_factor=1.0): +def strengthRewards(game_states): + return np.array([(getStrength(game_states[i + 1]) - getStrength(game_states[i])) + for i in range(len(game_states) - 1)]) + + +def discountedReward(next_reward, move_before, strength_before, discount_factor=1.0): reward = np.zeros_like(next_reward) + + def take_value(matrix, x, y): + return np.take(np.take(matrix, x, axis=1, mode='wrap'), y, axis=0, mode='wrap') + for y in range(len(reward)): for x in range(len(reward[0])): d = move_before[y][x] if d != -1: dy = (-1 if d == NORTH else 1) if (d == SOUTH or d == NORTH) else 0 dx = (-1 if d == WEST else 1) if (d == WEST or d == EAST) else 0 - reward[y][x] = discount_factor * np.take(np.take(next_reward, x + dx, axis=1, mode='wrap'), y + dy, - axis=0, mode='wrap') 
+ discount_factor = discount_factor if (d != STILL or discount_factor == 1.0) else 0.9 + reward[y][x] = discount_factor * take_value(next_reward, x + dx, y + dy) if strength_before[y][ + x] >= take_value( + strength_before, x + dx, y + dy) else 0 + return reward -def discountedRewards(raw_rewards, moves): +def discountedRewards(game_states, moves): + raw_rewards = rawRewards(game_states) + # strength_rewards = strengthRewards(game_states) discounted_rewards = np.zeros_like(raw_rewards, dtype=np.float64) running_reward = np.zeros_like(raw_rewards[0]) for t in reversed(range(0, len(raw_rewards))): - running_reward = discountedReward(running_reward, moves[t], discount_factor=0.8) + discountedReward( - raw_rewards[t], moves[t]) - discounted_rewards[t] = running_reward + running_reward = discountedReward(running_reward, moves[t], game_states[t][1], + discount_factor=0.2) + discountedReward( + raw_rewards[t], moves[t], game_states[t][1]) + discounted_rewards[t] = running_reward # + 0.2*(moves[t]==STILL)*(game_states[t][2]) + ##TODO : HERE FOR STRENGTH ! INDEPENDENT return discounted_rewards @@ -75,7 +97,7 @@ def individualStatesAndRewards(game_state, move, discounted_reward): for y in range(len(game_state[0])): for x in range(len(game_state[0][0])): if (game_state[0][y][x] == 1): - states += [localStateFromGlobal(game_state, x, y)] + states += [normalizeGameState(localStateFromGlobal(game_state, x, y))] moves += [move[y][x]] rewards += [discounted_reward[y][x]] return states, moves, rewards @@ -95,7 +117,7 @@ def allIndividualStatesAndRewards(game_states, moves, discounted_rewards): def allRewards(game_states, moves): # game_states n+1, moves n - discounted_rewards = discountedRewards(rawRewards(game_states), moves) + discounted_rewards = discountedRewards(game_states, moves) return allIndividualStatesAndRewards(game_states[:-1], moves, discounted_rewards) From df2b30b32836508e39f4959420f7b3de06f3589d Mon Sep 17 00:00:00 2001 From: Eddie Date: Wed, 4 Oct 2017 07:57:47 +0200 Subject: [PATCH 28/45] Set theme jekyll-theme-cayman --- docs/_config.yml | 1 + 1 file changed, 1 insertion(+) create mode 100644 docs/_config.yml diff --git a/docs/_config.yml b/docs/_config.yml new file mode 100644 index 0000000..c419263 --- /dev/null +++ b/docs/_config.yml @@ -0,0 +1 @@ +theme: jekyll-theme-cayman \ No newline at end of file From 4541f014f9e1662587ce42bb6b57b17fa1b57b86 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 4 Oct 2017 08:08:58 +0200 Subject: [PATCH 29/45] Migrating to GitHub pages --- README.md | 4 +- docs/README.md | 30 ++++- docs/conf.py | 286 --------------------------------------------- docs/index.rst | 25 ---- docs/run.rst | 22 ---- docs/visualize.rst | 17 --- 6 files changed, 31 insertions(+), 353 deletions(-) delete mode 100644 docs/conf.py delete mode 100644 docs/index.rst delete mode 100644 docs/run.rst delete mode 100644 docs/visualize.rst diff --git a/README.md b/README.md index c6605fa..b99baed 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -[![Build Status](https://travis-ci.org/Edouard360/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/Edouard360/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=master)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=master) [![Documentation Status](https://readthedocs.org/projects/halite-python-rl/badge/?version=latest)](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest) +[![Build 
Status](https://travis-ci.org/Edouard360/Halite-Python-RL.svg?branch=master)](https://travis-ci.org/Edouard360/Halite-Python-RL) [![Coverage Status](https://coveralls.io/repos/github/Edouard360/Halite-Python-RL/badge.svg?branch=master)](https://coveralls.io/github/Edouard360/Halite-Python-RL?branch=master) # Halite-Python-RL @@ -13,7 +13,7 @@ ## Documentation -The documentation is available here. +The documentation is available here. ## Objective diff --git a/docs/README.md b/docs/README.md index 5b3b98b..bdfed08 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,3 +1,31 @@ # Documentation -Go read the documentation [here](http://halite-python-rl.readthedocs.io/en/latest/?badge=latest). \ No newline at end of file +Go read the documentation [here](https://edouard360.github.io/Halite-Python-RL/). + +## Run the Bot + +In your console: + +`cd networking; python start_game.py` + +In another tab: + +`cd public; python MyBot.py` + +This will run 1 game. Options can be added when starting the game, for example: + +`python start_game.py -g 5 -x 30 -z 50` + +This will run 5 games of at most 30 turns, with squares of strength at most 50. + +## Visualize the Bot + +In your console: + +`cd visualize; export FLASK_APP=visualize.py; flask run` + +Then either: + +Look at http://127.0.0.1:5000/performance.png for performance insights. + +Or at http://127.0.0.1:5000/ for game replays. \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py deleted file mode 100644 index 82c9aa3..0000000 --- a/docs/conf.py +++ /dev/null @@ -1,286 +0,0 @@ -#!/usr/bin/env python3 -# -*- coding: utf-8 -*- -# -# Halite-Python-RL documentation build configuration file, created by -# sphinx-quickstart on Wed Sep 27 07:49:44 2017. -# -# This file is execfile()d with the current directory set to its -# containing dir. -# -# Note that not all possible configuration values are present in this -# autogenerated file. -# -# All configuration values have a default; values that are commented out -# serve to show the default. - -import sys -import os - -# If extensions (or modules to document with autodoc) are in another directory, -# add these directories to sys.path here. If the directory is relative to the -# documentation root, use os.path.abspath to make it absolute, like shown here. -# sys.path.insert(0, os.path.abspath('.')) - -# -- General configuration ------------------------------------------------ - -# If your documentation needs a minimal Sphinx version, state it here. -# needs_sphinx = '1.0' - -# Add any Sphinx extension module names here, as strings. They can be -# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom -# ones. -extensions = [] - -# Add any paths that contain templates here, relative to this directory. -templates_path = ['_templates'] - -# The suffix(es) of source filenames. -# You can specify multiple suffix as a list of string: -# source_suffix = ['.rst', '.md'] -source_suffix = '.rst' - -# The encoding of source files. -# source_encoding = 'utf-8-sig' - -# The master toctree document. -master_doc = 'index' - -# General information about the project. -project = 'Halite-Python-RL' -copyright = '2017, Edouard Mehlman' -author = 'Edouard Mehlman' - -# The version info for the project you're documenting, acts as replacement for -# |version| and |release|, also used in various other places throughout the -# built documents. -# -# The short X.Y version. -version = '1.0' -# The full version, including alpha/beta/rc tags. -release = '1.0' - -# The language for content autogenerated by Sphinx.
Refer to documentation -# for a list of supported languages. -# -# This is also used if you do content translation via gettext catalogs. -# Usually you set "language" from the command line for these cases. -language = None - -# There are two options for replacing |today|: either, you set today to some -# non-false value, then it is used: -# today = '' -# Else, today_fmt is used as the format for a strftime call. -# today_fmt = '%B %d, %Y' - -# List of patterns, relative to source directory, that match files and -# directories to ignore when looking for source files. -# This patterns also effect to html_static_path and html_extra_path -exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store'] - -# The reST default role (used for this markup: `text`) to use for all -# documents. -# default_role = None - -# If true, '()' will be appended to :func: etc. cross-reference text. -# add_function_parentheses = True - -# If true, the current module name will be prepended to all description -# unit titles (such as .. function::). -# add_module_names = True - -# If true, sectionauthor and moduleauthor directives will be shown in the -# output. They are ignored by default. -# show_authors = False - -# The name of the Pygments (syntax highlighting) style to use. -pygments_style = 'sphinx' - -# A list of ignored prefixes for module index sorting. -# modindex_common_prefix = [] - -# If true, keep warnings as "system message" paragraphs in the built documents. -# keep_warnings = False - -# If true, `todo` and `todoList` produce output, else they produce nothing. -todo_include_todos = False - -# -- Options for HTML output ---------------------------------------------- - -# The theme to use for HTML and HTML Help pages. See the documentation for -# a list of builtin themes. -html_theme = 'sphinx_rtd_theme' - -# Theme options are theme-specific and customize the look and feel of a theme -# further. For a list of options available for each theme, see the -# documentation. -# html_theme_options = {} - -# Add any paths that contain custom themes here, relative to this directory. -# html_theme_path = [] - -# The name for this set of Sphinx documents. -# " v documentation" by default. -# html_title = 'Halite-Python-RL v1.0' - -# A shorter title for the navigation bar. Default is the same as html_title. -# html_short_title = None - -# The name of an image file (relative to this directory) to place at the top -# of the sidebar. -# html_logo = None - -# The name of an image file (relative to this directory) to use as a favicon of -# the docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32 -# pixels large. -# html_favicon = None - -# Add any paths that contain custom static files (such as style sheets) here, -# relative to this directory. They are copied after the builtin static files, -# so a file named "default.css" will overwrite the builtin "default.css". -html_static_path = ['_static'] - -# Add any extra paths that contain custom files (such as robots.txt or -# .htaccess) here, relative to this directory. These files are copied -# directly to the root of the documentation. -# html_extra_path = [] - -# If not None, a 'Last updated on:' timestamp is inserted at every page -# bottom, using the given strftime format. -# The empty string is equivalent to '%b %d, %Y'. -# html_last_updated_fmt = None - -# If true, SmartyPants will be used to convert quotes and dashes to -# typographically correct entities. 
-# html_use_smartypants = True - -# Custom sidebar templates, maps document names to template names. -# html_sidebars = {} - -# Additional templates that should be rendered to pages, maps page names to -# template names. -# html_additional_pages = {} - -# If false, no module index is generated. -# html_domain_indices = True - -# If false, no index is generated. -# html_use_index = True - -# If true, the index is split into individual pages for each letter. -# html_split_index = False - -# If true, links to the reST sources are added to the pages. -# html_show_sourcelink = True - -# If true, "Created using Sphinx" is shown in the HTML footer. Default is True. -# html_show_sphinx = True - -# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True. -# html_show_copyright = True - -# If true, an OpenSearch description file will be output, and all pages will -# contain a tag referring to it. The value of this option must be the -# base URL from which the finished HTML is served. -# html_use_opensearch = '' - -# This is the file name suffix for HTML files (e.g. ".xhtml"). -# html_file_suffix = None - -# Language to be used for generating the HTML full-text search index. -# Sphinx supports the following languages: -# 'da', 'de', 'en', 'es', 'fi', 'fr', 'h', 'it', 'ja' -# 'nl', 'no', 'pt', 'ro', 'r', 'sv', 'tr', 'zh' -# html_search_language = 'en' - -# A dictionary with options for the search language support, empty by default. -# 'ja' uses this config value. -# 'zh' user can custom change `jieba` dictionary path. -# html_search_options = {'type': 'default'} - -# The name of a javascript file (relative to the configuration directory) that -# implements a search results scorer. If empty, the default will be used. -# html_search_scorer = 'scorer.js' - -# Output file base name for HTML help builder. -htmlhelp_basename = 'Halite-Python-RLdoc' - -# -- Options for LaTeX output --------------------------------------------- - -latex_elements = { - # The paper size ('letterpaper' or 'a4paper'). - # 'papersize': 'letterpaper', - - # The font size ('10pt', '11pt' or '12pt'). - # 'pointsize': '10pt', - - # Additional stuff for the LaTeX preamble. - # 'preamble': '', - - # Latex figure (float) alignment - # 'figure_align': 'htbp', -} - -# Grouping the document tree into LaTeX files. List of tuples -# (source start file, target name, title, -# author, documentclass [howto, manual, or own class]). -latex_documents = [ - (master_doc, 'Halite-Python-RL.tex', 'Halite-Python-RL Documentation', - 'Edouard Mehlman', 'manual'), -] - -# The name of an image file (relative to this directory) to place at the top of -# the title page. -# latex_logo = None - -# For "manual" documents, if this is true, then toplevel headings are parts, -# not chapters. -# latex_use_parts = False - -# If true, show page references after internal links. -# latex_show_pagerefs = False - -# If true, show URL addresses after external links. -# latex_show_urls = False - -# Documents to append as an appendix to all manuals. -# latex_appendices = [] - -# If false, no module index is generated. -# latex_domain_indices = True - - -# -- Options for manual page output --------------------------------------- - -# One entry per manual page. List of tuples -# (source start file, name, description, authors, manual section). -man_pages = [ - (master_doc, 'halite-python-rl', 'Halite-Python-RL Documentation', - [author], 1) -] - -# If true, show URL addresses after external links. 
-# man_show_urls = False - - -# -- Options for Texinfo output ------------------------------------------- - -# Grouping the document tree into Texinfo files. List of tuples -# (source start file, target name, title, author, -# dir menu entry, description, category) -texinfo_documents = [ - (master_doc, 'Halite-Python-RL', 'Halite-Python-RL Documentation', - author, 'Halite-Python-RL', 'One line description of project.', - 'Miscellaneous'), -] - -# Documents to append as an appendix to all manuals. -# texinfo_appendices = [] - -# If false, no module index is generated. -# texinfo_domain_indices = True - -# How to display URL addresses: 'footnote', 'no', or 'inline'. -# texinfo_show_urls = 'footnote' - -# If true, do not generate a @detailmenu in the "Top" node's menu. -# texinfo_no_detailmenu = False diff --git a/docs/index.rst b/docs/index.rst deleted file mode 100644 index de7dabf..0000000 --- a/docs/index.rst +++ /dev/null @@ -1,25 +0,0 @@ -.. Halite-Python-RL documentation master file, created by - -Welcome to Halite-Python-RL's documentation! -============================================ - -.. toctree:: - :maxdepth: 2 - :caption: Contributors - - run - visualize - -.. toctree:: - :maxdepth: 2 - :caption: Contributors - - - -Indices and tables -================== - -* :ref:`search` - - - diff --git a/docs/run.rst b/docs/run.rst deleted file mode 100644 index 19c5ed4..0000000 --- a/docs/run.rst +++ /dev/null @@ -1,22 +0,0 @@ -.. _run: - - - -Run the Bot -======================= - -In your console: - -cd networking -python start_game.py - -In another tab - -cd public -python MyBot.py - -This will run 1 game. Options can be added to starting the game, among which: - -python start_game.py -g 5 -x 30 -z 50 - -Will run 5 games, of at most 30 turns, which at most squares of strength 50. \ No newline at end of file diff --git a/docs/visualize.rst b/docs/visualize.rst deleted file mode 100644 index ded38d8..0000000 --- a/docs/visualize.rst +++ /dev/null @@ -1,17 +0,0 @@ -.. _visualize: - - - -Visualize the Bot -======================= - -In your console: - -cd visualize -export FLASK_APP=visualize.py;flask run - -Then either: - -Look at http://127.0.0.1:5000/performance.png for performance insights. - -Or at http://127.0.0.1:5000/ for games replay. \ No newline at end of file From 052e2682a1996b3d15703f44c6eac80e8bd18c1b Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 4 Oct 2017 08:10:33 +0200 Subject: [PATCH 30/45] Adding visualize folder commit --- visualize/static/localVisualizer.js | 34 ++ visualize/static/parsereplay.js | 380 +++++++++++++++ visualize/static/visualizer.js | 716 ++++++++++++++++++++++++++++ visualize/templates/reward.py | 10 + visualize/templates/visualizer.html | 28 ++ visualize/visualize.py | 89 ++++ 6 files changed, 1257 insertions(+) create mode 100644 visualize/static/localVisualizer.js create mode 100644 visualize/static/parsereplay.js create mode 100644 visualize/static/visualizer.js create mode 100644 visualize/templates/reward.py create mode 100644 visualize/templates/visualizer.html create mode 100644 visualize/visualize.py diff --git a/visualize/static/localVisualizer.js b/visualize/static/localVisualizer.js new file mode 100644 index 0000000..e18cb86 --- /dev/null +++ b/visualize/static/localVisualizer.js @@ -0,0 +1,34 @@ +$(function () { + var $dropZone = $("html"); + var $filePicker = $("#filePicker"); + function handleFiles(files) { + // only use the first file. 
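+ // Read the dropped replay file as text; once loading completes, parse it with textToGame and render it with showGame.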
+ file = files[0]; + console.log(file) + var reader = new FileReader(); + + reader.onload = (function(filename) { // finished reading file data. + return function(e2) { + $("#displayArea").empty(); + var fsHeight = $("#fileSelect").outerHeight(); + showGame(textToGame(e2.target.result, filename), $("#displayArea"), null, -fsHeight, true, false, true); + }; + })(file.name); + reader.readAsText(file); // start reading the file data. + } + + $dropZone.on('dragover', function(e) { + e.stopPropagation(); + e.preventDefault(); + }); + $dropZone.on('drop', function(e) { + e.stopPropagation(); + e.preventDefault(); + var files = e.originalEvent.dataTransfer.files; // Array of all files + handleFiles(files) + }); + $filePicker.on('change', function(e) { + var files = e.target.files + handleFiles(files) + }); +}) diff --git a/visualize/static/parsereplay.js b/visualize/static/parsereplay.js new file mode 100644 index 0000000..9a114a7 --- /dev/null +++ b/visualize/static/parsereplay.js @@ -0,0 +1,380 @@ +function processFrame(game, frameNum) { + var checkSim = false; + var gameMap = game.frames[frameNum]; + if(checkSim) { + gameMap = _.cloneDeep(game.frames[frameNum]); + } + var moves = game.moves[frameNum]; + var productions = game.productions; + var width = game.width; + var height = game.height; + var numPlayers = game.num_players; + + var STILL = 0; + var NORTH = 1; + var EAST = 2; + var SOUTH = 3; + var WEST = 4; + + var pieces = []; + var stats = []; + + var p, q, y, x; + + function getLocation(loc, direction) { + if (direction === STILL) { + // nothing + } else if (direction === NORTH) { + loc.y -= 1; + } else if (direction === EAST) { + loc.x += 1; + } else if (direction === SOUTH) { + loc.y += 1; + } else if (direction === WEST) { + loc.x -= 1; + } + + if (loc.x < 0) { + loc.x = width - 1; + } else { + loc.x %= width; + } + + if (loc.y < 0) { + loc.y = height - 1; + } else { + loc.y %= height; + } + } + + for (p = 0; p < numPlayers; p++) { + pieces[p] = []; + stats[p] = { + actualProduction: 0, + playerDamageDealt: 0, + environmentDamageDealt: 0, + damageTaken: 0, + capLosses: 0, + overkillDamage: 0, + }; + for (y = 0; y < height; y++) { + pieces[p][y] = []; + } + } + + for (y = 0; y < height; y++) { + for (x = 0; x < width; x++) { + var direction = moves[y][x]; + var cell = gameMap[y][x]; + var player = gameMap[y][x].owner - 1; + var production = productions[y][x]; + + if (gameMap[y][x].owner == 0) continue + + if (direction === STILL) { + cell = { owner: gameMap[y][x].owner, strength: gameMap[y][x].strength }; + if (cell.strength + production <= 255) { + stats[player].actualProduction += production; + cell.strength += production; + } else { + stats[player].actualProduction += cell.strength - 255; + stats[player].capLosses += cell.strength + production - 255; + cell.strength = 255; + } + } + + var newLoc = { x: x, y: y }; + getLocation(newLoc, direction); + if (!_.isUndefined(pieces[player][newLoc.y][newLoc.x])) { + if (pieces[player][newLoc.y][newLoc.x] + cell.strength <= 255) { + pieces[player][newLoc.y][newLoc.x] += cell.strength; + } else { + stats[player].capLosses += pieces[player][newLoc.y][newLoc.x] + cell.strength - 255; + pieces[player][newLoc.y][newLoc.x] = 255; + } + } else { + pieces[player][newLoc.y][newLoc.x] = cell.strength; + } + + // add in a new piece with a strength of 0 if necessary + if (_.isUndefined(pieces[player][y][x])) { + pieces[player][y][x] = 0; + } + + // erase from the game map so that the player can't make another move with the same piece + // On second 
thought, trust that the original game took care of that. + if(checkSim) { + gameMap[y][x] = { owner: 0, strength: 0 }; + } + } + } + + var toInjure = []; + var injureMap = []; + + for (p = 0; p < numPlayers; p++) { + toInjure[p] = []; + for (y = 0; y < height; y++) { + toInjure[p][y] = []; + } + } + + for (y = 0; y < height; y++) { + injureMap[y] = []; + for (x = 0; x < width; x++) { + injureMap[y][x] = 0; + } + } + + for (y = 0; y < height; y++) { + for (x = 0; x < width; x++) { + for (p = 0; p < numPlayers; p++) { + // if player p has a piece at these coords + if (!_.isUndefined(pieces[p][y][x])) { + var damageDone = 0; + // look for other players with pieces here + for (q = 0; q < numPlayers; q++) { + // exclude the same player + if (p !== q) { + for (var dir = STILL; dir <= WEST; dir++) { + // check STILL square + var loc = { x: x, y: y }; + getLocation(loc, dir); + + // if the other player has a piece here + if (!_.isUndefined(pieces[q][loc.y][loc.x])) { + // add player p's damage + if (!_.isUndefined(toInjure[q][loc.y][loc.x])) { + toInjure[q][loc.y][loc.x] += pieces[p][y][x]; + stats[p].playerDamageDealt += pieces[p][y][x]; + damageDone += Math.min(pieces[p][y][x], pieces[q][loc.y][loc.x]); + } else { + toInjure[q][loc.y][loc.x] = pieces[p][y][x]; + stats[p].playerDamageDealt += pieces[p][y][x]; + damageDone += Math.min(pieces[p][y][x], pieces[q][loc.y][loc.x]); + } + } + } + } + } + + // if the environment can do damage back + if (gameMap[y][x].owner == 0 && gameMap[y][x].strength > 0) { + if (!_.isUndefined(toInjure[p][y][x])) { + toInjure[p][y][x] += gameMap[y][x].strength; + } else { + toInjure[p][y][x] = gameMap[y][x].strength; + } + // and apply damage to the environment + injureMap[y][x] += pieces[p][y][x]; + damageDone += Math.min(pieces[p][y][x], gameMap[y][x].strength); + stats[p].environmentDamageDealt += Math.min(pieces[p][y][x], gameMap[y][x].strength); + } + + if (damageDone > pieces[p][y][x]) { + stats[p].overkillDamage += damageDone - pieces[p][y][x]; + } + } + } + } + } + + // injure and/or delete pieces. Note >= rather than > indicates that pieces with a strength of 0 are killed. 
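+ // e.g. two adjacent strength-0 enemy pieces each record 0 incoming damage, and since 0 >= 0 both are removed.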
+ for (p = 0; p < numPlayers; p++) { + for (y = 0; y < height; y++) { + for (x = 0; x < width; x++) { + if (!_.isUndefined(toInjure[p][y][x])) { + if (toInjure[p][y][x] >= pieces[p][y][x]) { + stats[p].damageTaken += pieces[p][y][x]; + pieces[p][y][x] = undefined; + } else { + stats[p].damageTaken += toInjure[p][y][x]; + pieces[p][y][x] -= toInjure[p][y][x]; + } + } + } + } + } + + if(checkSim) { + // apply damage to map pieces + for (y = 0; y < height; y++) { + for (x = 0; x < width; x++) { + if (gameMap[y][x].strength < injureMap[y][x]) { + gameMap[y][x].strength = 0; + } else { + gameMap[y][x].strength -= injureMap[y][x] + } + gameMap[y][x].owner = 0; + } + } + + // add pieces back into the map + for (p = 0; p < numPlayers; p++) { + for (y = 0; y < height; y++) { + for (x = 0; x < width; x++) { + if (!_.isUndefined(pieces[p][y][x])) { + gameMap[y][x].owner = p + 1; + gameMap[y][x].strength = pieces[p][y][x]; + } + } + } + } + + if (frameNum + 1 < gameMap.num_frames - 1) { + if (!_.isEqual(gameMap, game.frames[frameNum + 1])) { + throw new Error("Evaluated frame did not match actual game map for frame number " + frameNum); + } + } + } + + return stats; +} + +function textToGame(text, seed) { + var startParse = new Date(); + console.log("Starting parse at", startParse); + var game = JSON.parse(text) + + if (game.version != 11) { + alert("Invalid version number: " + json_game.version); + } + + //Adds determinism (when used with https://github.com/davidbau/seedrandom) to color scramble. + console.log(seed); + + //Hardcoding colors: + var colors = [ + '0x04E6F2', + '0x424C8F',//, + '0xF577F2', + '0x23D1DE', + '0xB11243', + '0xFF704B', + '0x00B553', + '0xF8EC31' + ]; + + var x, i; + + game.players = [] + game.players.push({name: 'NULL', color: "0x888888"}); + for(i = 0; i < game.num_players; i++) { + game.players.push({name: game.player_names[i], color: colors[i] }); + console.log(game.players[game.players.length - 1].color); + } + delete game.player_names; + + console.log(game.players); + + var maxProd = 0; + for(var a = 0; a < game.height; a++) { + for(var b = 0; b < game.width; b++) { + if(game.productions[a][b] > maxProd) maxProd = game.productions[a][b]; + } + } + + game.productionNormals = [] + for(var a = 0; a < game.height; a++) { + var row = [] + for(var b = 0; b < game.width; b++) { + row.push(game.productions[a][b] / maxProd); + } + game.productionNormals.push(row) + } + + for(var a = 0; a < game.num_frames; a++) { + for(var b = 0; b < game.height; b++) { + for(var c = 0; c < game.width; c++) { + var array = game.frames[a][b][c]; + game.frames[a][b][c] = { owner: array[0], strength: array[1] }; + } + } + } + + var stats = []; + for(var a = 0; a < game.num_frames - 1; a++) { + stats[a+1] = processFrame(game, a); + } + + //Get game statistics: + for(var a = 1; a <= game.num_players; a++) { + game.players[a].territories = []; + game.players[a].productions = []; + game.players[a].strengths = []; + game.players[a].actualProduction = []; + game.players[a].playerDamageDealt = []; + game.players[a].environmentDamageDealt = []; + game.players[a].damageTaken = []; + game.players[a].capLosses = []; + + for(var b = 0; b < game.num_frames; b++) { + var ter = 0, prod = 0, str = 0; + for(var c = 0; c < game.height; c++) for(var d = 0; d < game.width; d++) { + if(game.frames[b][c][d].owner == a) { + ter++; + prod += game.productions[c][d]; + str += game.frames[b][c][d].strength; + } + } + game.players[a].territories.push(ter); + game.players[a].productions.push(prod); + 
game.players[a].strengths.push(str); + if (b == 0) { + game.players[a].actualProduction.push(0); + game.players[a].environmentDamageDealt.push(0); + game.players[a].damageTaken.push(0); + game.players[a].playerDamageDealt.push(0); + game.players[a].capLosses.push(0); + } + else { + game.players[a].actualProduction.push(game.players[a].actualProduction[b - 1] + stats[b][a - 1].actualProduction); + game.players[a].environmentDamageDealt.push(game.players[a].environmentDamageDealt[b - 1] + stats[b][a - 1].environmentDamageDealt); + game.players[a].damageTaken.push(game.players[a].damageTaken[b - 1] + stats[b][a - 1].damageTaken - stats[b][a - 1].environmentDamageDealt); + game.players[a].playerDamageDealt.push(game.players[a].playerDamageDealt[b - 1] + stats[b][a - 1].overkillDamage); + game.players[a].capLosses.push(game.players[a].capLosses[b - 1] + stats[b][a - 1].capLosses); + } + } + } + + //Normalize game statistics for display + var maxPlayerTer = 0, maxPlayerProd = 0, maxPlayerStr = 0, maxActProd = 0; + var maxPlrDmgDlt = 0, maxEnvDmgDlt = 0, maxDmgTkn = 0, maxCapLoss = 0; + for(var a = 1; a <= game.num_players; a++) { + for(var b = 0; b < game.num_frames; b++) { + if(game.players[a].territories[b] > maxPlayerTer) maxPlayerTer = game.players[a].territories[b]; + if(game.players[a].productions[b] > maxPlayerProd) maxPlayerProd = game.players[a].productions[b]; + if(game.players[a].strengths[b] > maxPlayerStr) maxPlayerStr = game.players[a].strengths[b]; + if(game.players[a].actualProduction[b] > maxActProd) maxActProd = game.players[a].actualProduction[b]; + if(game.players[a].playerDamageDealt[b] > maxPlrDmgDlt) maxPlrDmgDlt = game.players[a].playerDamageDealt[b]; + if(game.players[a].environmentDamageDealt[b] > maxEnvDmgDlt) maxEnvDmgDlt = game.players[a].environmentDamageDealt[b]; + if(game.players[a].damageTaken[b] > maxDmgTkn) maxDmgTkn = game.players[a].damageTaken[b]; + if(game.players[a].capLosses[b] > maxCapLoss) maxCapLoss = game.players[a].capLosses[b]; + } + } + for(var a = 1; a <= game.num_players; a++) { + game.players[a].normTers = []; + game.players[a].normProds = []; + game.players[a].normStrs = []; + game.players[a].normActProd = []; + game.players[a].normPlrDmgDlt = []; + game.players[a].normEnvDmgDlt = []; + game.players[a].normDmgTkn = []; + game.players[a].normCapLoss = []; + for(var b = 0; b < game.num_frames; b++) { + game.players[a].normTers.push(game.players[a].territories[b] / maxPlayerTer); + game.players[a].normProds.push(game.players[a].productions[b] / maxPlayerProd); + game.players[a].normStrs.push(game.players[a].strengths[b] / maxPlayerStr); + game.players[a].normActProd.push(game.players[a].actualProduction[b] / maxActProd); + game.players[a].normPlrDmgDlt.push(game.players[a].playerDamageDealt[b] / maxPlrDmgDlt); + game.players[a].normEnvDmgDlt.push(game.players[a].environmentDamageDealt[b] / maxEnvDmgDlt); + game.players[a].normDmgTkn.push(game.players[a].damageTaken[b] / maxDmgTkn); + game.players[a].normCapLoss.push(game.players[a].capLosses[b] / maxCapLoss); + } + } + + var endParse = new Date(); + console.log("Finished parse at", endParse); + console.log("Parse took", (endParse - startParse) / 1000); + return game +} diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js new file mode 100644 index 0000000..4b34b13 --- /dev/null +++ b/visualize/static/visualizer.js @@ -0,0 +1,716 @@ +var renderer; + +function initPixi() { + //Create the root of the scene: stage: + stage = new PIXI.Container(); + + // Initialize the pixi 
graphics class for the map: + mapGraphics = new PIXI.Graphics(); + + // Initialize the pixi graphics class for the graphs: + graphGraphics = new PIXI.Graphics(); + + // Initialize the text container; + prodContainer = new PIXI.Container(); + + possessContainer = new PIXI.Container(); + + // Initialize the text container; + strengthContainer = new PIXI.Container(); + + // Initialize the text container; + rewardContainer = new PIXI.Container(); + + // Initialize the text container; + policyContainer = new PIXI.Container(); + + renderer = PIXI.autoDetectRenderer(0, 0, { backgroundColor: 0x000000, antialias: true, transparent: true }); +} + +function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal, offline, seconds) { + if(renderer == null) initPixi(); + + $container.empty(); + + if(!isminimal) { + var $row = $("

"); + $row.append($("
")); + $row.append($("
").append($("

"+game.players.slice(1, game.num_players+1).map(function(p) { + var nameComponents = p.name.split(" "); + var name = nameComponents.slice(0, nameComponents.length-1).join(" ").trim(); + console.log(name); + var user = offline ? null : getUser(null, name); + if(user) { + return ""+p.name+"" + } else { + return ""+p.name+"" + } + }).join(" vs ")+"

"))); + $container.append($row); + } + $container.append(renderer.view); + $container.append($("
")); + + var showExtended = false; + var frame = 0; + var transit = 0; + var framespersec = seconds == null ? 3 : game.num_frames / seconds; + var shouldplay = true; + var xOffset = 0, yOffset = 0; + var zoom = 8; + if(game.num_frames / zoom < 3) zoom = game.num_frames / 3; + if(zoom < 1) zoom = 1; + function centerStartPositions() { + var minX = game.width, maxX = 0, minY = game.height, maxY = 0; + // find the initial bounding box of all players + for(var x=0; x < game.width; x++) { + for(var y=0; y < game.height; y++) { + if(game.frames[0][y][x].owner != 0) { + if(x < minX) { minX = x; } + if(x > maxX) { maxX = x; } + if(y < minY) { minY = y; } + if(y > maxY) { maxY = y; } + } + } + } + // offset by half the difference from the edges rounded toward zero + xOffset = ((game.width - 1 - maxX - minX) / 2) | 0; + yOffset = ((game.height - 1 - maxY - minY) / 2) | 0; + } + centerStartPositions(); + + discountedRewards = undefined + $.ajax({ + type: "POST", + url: '/post_discounted_rewards', + data: JSON.stringify(game), + success: function(data) {discountedRewards = JSON.parse(data)['discountedRewards']}, + contentType: "application/json; charset=utf-8", + //dataType: "json" + }) + + policies = undefined + $.ajax({ + type: "POST", + url: '/post_policies', + data: JSON.stringify(game), + success: function(data) {policies = JSON.parse(data)['policies']}, + contentType: "application/json; charset=utf-8", + //dataType: "json" + }) + + window.onresize = function() { + var allowedWidth = (maxWidth == null ? $container.width() : maxWidth); + var allowedHeight = window.innerHeight - (25 + $("canvas").offset().top); + if(maxHeight != null) { + if(maxHeight > 0) { + allowedHeight = maxHeight - ($("canvas").offset().top - $container.offset().top); + } else { + // A negative maxHeight signifies extra space to leave for + // other page elements following the visualizer + allowedHeight += maxHeight; + } + } + + console.log(window.innerHeight) + console.log(allowedHeight) + var definingDimension = Math.min(allowedWidth, allowedHeight); + if(isminimal) { + if(allowedWidth < allowedHeight) { + sw = allowedWidth, sh = allowedWidth; + } else { + sw = allowedHeight, sh = allowedHeight; + } + mw = sh, mh = sh; + renderer.resize(sw, sh); + rw = mw / game.width, rh = mh / game.height; //Sizes of rectangles for rendering tiles. + } + else { + var splits = showExtended ? 5 : 4; + if(allowedWidth < allowedHeight*splits/3) { + sw = allowedWidth, sh = allowedWidth*3/splits; + } else { + sw = allowedHeight*splits/3, sh = allowedHeight; + } + mw = sh, mh = sh; + renderer.resize(sw, sh); + rw = mw / game.width, rh = mh / game.height; //Sizes of rectangles for rendering tiles. 
+ if(showExtended) { + LEFT_GRAPH_LEFT = mw * 1.025, LEFT_GRAPH_RIGHT = LEFT_GRAPH_LEFT + sw * 0.17; + } else { + LEFT_GRAPH_LEFT = mw * 1.025, LEFT_GRAPH_RIGHT = sw - 1; + } + RIGHT_GRAPH_LEFT = mw * 1.35, RIGHT_GRAPH_RIGHT = RIGHT_GRAPH_LEFT + sw * 0.17; + + if(showExtended) { + TER_TOP = sh * 0.09, TER_BTM = sh * 0.29; + PROD_TOP = sh * 0.33, PROD_BTM = sh * 0.53; + STR_TOP = sh * 0.57, STR_BTM = sh * 0.77; + } else { + TER_TOP = sh * 0.09, TER_BTM = sh * 0.36; + PROD_TOP = sh * 0.41, PROD_BTM = sh * 0.675; + STR_TOP = sh * 0.725, STR_BTM = sh * 0.99; + } + + ENV_DMG_TOP = sh * 0.09, ENV_DMG_BTM = sh * 0.29; + ACT_PROD_TOP = sh * 0.33, ACT_PROD_BTM = sh * 0.53; + CAP_LOSS_TOP = sh * 0.57, CAP_LOSS_BTM = sh * 0.77; + + PLR_DMG_TOP = sh * 0.81, PLR_DMG_BTM = sh * 0.99; + DMG_TKN_TOP = sh * 0.81, DMG_TKN_BTM = sh * 0.99; + + //Create the text for rendering the terrritory, strength, and prod graphs. + stage.removeChildren(); + terText = new PIXI.Text('Territory', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + terText.anchor = new PIXI.Point(0, 1); + terText.position = new PIXI.Point(mw + sh / 32, TER_TOP - sh * 0.005); + stage.addChild(terText); + prodText = new PIXI.Text('Production', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + prodText.anchor = new PIXI.Point(0, 1); + prodText.position = new PIXI.Point(mw + sh / 32, PROD_TOP - sh * 0.005); + stage.addChild(prodText); + strText = new PIXI.Text('Strength', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + strText.anchor = new PIXI.Point(0, 1); + strText.position = new PIXI.Point(mw + sh / 32, STR_TOP - sh * 0.005); + stage.addChild(strText); + if(showExtended) { + envDmgText = new PIXI.Text('Environment Damage', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + envDmgText.anchor = new PIXI.Point(0, 1); + envDmgText.position = new PIXI.Point(mw + sh / 2.75, ENV_DMG_TOP - sh * 0.005); + stage.addChild(envDmgText); + actProdText = new PIXI.Text('Realized Production', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + actProdText.anchor = new PIXI.Point(0, 1); + actProdText.position = new PIXI.Point(mw + sh / 2.75, ACT_PROD_TOP - sh * 0.005); + stage.addChild(actProdText); + capLossText = new PIXI.Text('Strength Loss to Cap', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + capLossText.anchor = new PIXI.Point(0, 1); + capLossText.position = new PIXI.Point(mw + sh / 2.75, CAP_LOSS_TOP - sh * 0.005); + stage.addChild(capLossText); + plrDmgDltText = new PIXI.Text('Overkill Damage', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + plrDmgDltText.anchor = new PIXI.Point(0, 1); + plrDmgDltText.position = new PIXI.Point(mw + sh / 32, PLR_DMG_TOP - sh * 0.005); + stage.addChild(plrDmgDltText); + dmgTknText = new PIXI.Text('Damage Taken', { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + dmgTknText.anchor = new PIXI.Point(0, 1); + dmgTknText.position = new PIXI.Point(mw + sh / 2.75, DMG_TKN_TOP - sh * 0.005); + stage.addChild(dmgTknText); + } + infoText = new PIXI.Text('Frame #' + frame.toString(), { font: (sh / 38).toString() + 'px Arial', fill: 0xffffff }); + infoText.anchor = new PIXI.Point(0, 1); + infoText.position = new PIXI.Point(mw + sh / 32, TER_TOP - sh * 0.05); + stage.addChild(infoText); + stage.addChild(graphGraphics); + } + + textStr = new Array(game.height); + textProd = new Array(game.height); + textPossess = new Array(game.height); + textReward = new Array(game.height); + textPolicy = new Array(game.height); + for 
(var i = 0; i < 10; i++) { + textProd[i] = new Array(game.width); + textStr[i] = new Array(game.width); + textPossess[i] = new Array(game.width); + textReward[i] = new Array(game.width); + textPolicy[i] = new Array(game.width); + for(var j = 0; j < 10; j++){ + textPolicy[i][j] = new Array(5); + } + } + loc=0 + + var sY = Math.round(yOffset); + for(var a = 0; a < game.height; a++) { + var sX = Math.round(xOffset); + for(var b = 0; b < game.width; b++) { + var sty = new PIXI.TextStyle({ + fontFamily: 'Arial', + fontSize: 40 + }); + site = game.frames[frame][Math.floor(loc / game.width)][loc % game.width]; + textStr[a][b] = new PIXI.Text(site.strength.toString(),sty); + textStr[a][b].anchor = new PIXI.Point(0.5, +1.5); + textStr[a][b].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); + textStr[a][b].style.fill = "#ffffff"//"#f54601"; + + textProd[a][b] = new PIXI.Text((10*game.productionNormals[Math.floor(loc / game.width)][loc % game.width]).toString(),sty) + textProd[a][b].anchor = new PIXI.Point(0.5, -0.5); + textProd[a][b].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); + textProd[a][b].style.fill = "#ffffff"; + + textPossess[a][b] = new PIXI.Text(site.owner.toString(),sty) + textPossess[a][b].anchor = new PIXI.Point(0.5, 0.5); + textPossess[a][b].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); + textPossess[a][b].style.fill = "#ffffff"; + + textReward[a][b] = new PIXI.Text(site.owner.toString(),sty) + textReward[a][b].anchor = new PIXI.Point(0.5, 0.5); + textReward[a][b].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); + textReward[a][b].style.fill = "#ffffff"; + + var style_2 = new PIXI.TextStyle({ + fontFamily: 'Roboto', + fontSize: 10 + }); + for(var j = 0; j < 5; j++){ + textPolicy[a][b][j] = new PIXI.Text(site.owner.toString(),style_2) + textPolicy[a][b][j].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); + textPolicy[a][b][j].style.fill = "#ABD4FF"; + } + //NORTH, EAST, SOUTH, WEST, STILL + textPolicy[a][b][0].anchor = new PIXI.Point(0.5, +2.0); + textPolicy[a][b][1].anchor = new PIXI.Point(-1.0, 0.5); + textPolicy[a][b][2].anchor = new PIXI.Point(0.5, -1.0); + textPolicy[a][b][3].anchor = new PIXI.Point(2.0, 0.5); + textPolicy[a][b][4].anchor = new PIXI.Point(0.5, 0.5); + + + prodContainer.addChild(textProd[a][b]) + strengthContainer.addChild(textStr[a][b]) + possessContainer.addChild(textPossess[a][b]) + rewardContainer.addChild(textReward[a][b]) + for(var j = 0; j < 5; j++) { + policyContainer.addChild(textPolicy[a][b][j]) + } + loc++; + sX++; + if(sX == game.width) sX = 0; + } + sY++; + if(sY == game.height) sY = 0; + } + + // function postData(input) { + // $.ajax({ + // type: "POST", + // url: "http://127.0.0.1:5000/",//http://127.0.0.1:8080/reward.py + // data: { param: input }, + // success: callbackFunc + // }); + // } + + // function callbackFunc(response) { + // // do something with the response + // console.log(response); + // } + + // postData('data to process'); + + + // $.getJSON('/getpythondata', JSON.stringify(game), function(data) { + // console.log(data) + // }); + // $.ajax({ + // type: "POST", + // contentType: "application/json", + // url: '/getpythondata', + // data: { name: 'norm' }, + // dataType: "json" + // // }); + // $.post( "/getpythondata", { + // javascript_data: data + // }); + + + stage.addChild(mapGraphics); + //stage.addChild(prodContainer); + //stage.addChild(strengthContainer); + //stage.addChild(possessContainer); + //stage.addChild(rewardContainer); + stage.addChild(policyContainer); + 
console.log(renderer.width, renderer.height); + } + window.onresize(); + + var manager = new PIXI.interaction.InteractionManager(renderer); + var mousePressed = false; + document.onmousedown = function(e) { + mousePressed = true; + }; + document.onmouseup = function(e) { + mousePressed = false; + }; + + renderer.animateFunction = animate; + requestAnimationFrame(animate); + + var pressed={}; + document.onkeydown=function(e){ + e = e || window.event; + pressed[e.keyCode] = true; + if(e.keyCode == 32) { //Space + shouldplay = !shouldplay; + } + else if(e.keyCode == 69) { //e + showExtended = !showExtended; + mapGraphics.clear(); + graphGraphics.clear(); + renderer.render(stage); + window.onresize(); + } + else if(e.keyCode == 90) { //z + frame = 0; + transit = 0; + } + else if(e.keyCode == 88) { //x + frame = game.num_frames - 1; + transit = 0; + } + else if(e.keyCode == 188) { //, + if(transit == 0) frame--; + else transit = 0; + if(frame < 0) frame = 0; + shouldplay = false; + } + else if(e.keyCode == 190) { //. + frame++; + transit = 0; + if(frame >= game.num_frames - 1) frame = game.num_frames - 1; + shouldplay = false; + } + else if(e.keyCode == 65 || e.keyCode == 68 || e.keyCode == 87 || e.keyCode == 83) { //wasd + xOffset = Math.round(xOffset); + yOffset = Math.round(yOffset); + } + else if(e.keyCode == 79) { //o + xOffset = 0; + yOffset = 0; + } + else if(e.keyCode == 187 || e.keyCode == 107) { //= or + + zoom *= 1.41421356237; + if(game.num_frames / zoom < 3) zoom = game.num_frames / 3; + } + else if(e.keyCode == 189 || e.keyCode == 109) { //- or - (dash or subtract) + zoom /= 1.41421356237; + if(zoom < 1) zoom = 1; + } + else if(e.keyCode == 49) { //1 + framespersec = 1; + } + else if(e.keyCode == 50) { //2 + framespersec = 3; + } + else if(e.keyCode == 51) { //3 + framespersec = 6; + } + else if(e.keyCode == 52) { //4 + framespersec = 10; + } + else if(e.keyCode == 53) { //5 + framespersec = 15; + } + } + + document.onkeyup=function(e){ + e = e || window.event; + delete pressed[e.keyCode]; + } + + var lastTime = Date.now(); + + function interpolate(c1, c2, v) { + var c = { r: v * c2.r + (1 - v) * c1.r, g: v * c2.g + (1 - v) * c1.g, b: v * c2.b + (1- v) * c1.b }; + function compToHex(c) { var hex = c.toString(16); return hex.length == 1 ? "0" + hex : hex; }; + return "0x" + compToHex(Math.round(c.r)) + compToHex(Math.round(c.g)) + compToHex(Math.round(c.b)); + } + + function animate() { + + if(renderer.animateFunction !== animate) { return; } + + if(!isminimal) { + //Clear graphGraphics so that we can redraw freely. + graphGraphics.clear(); + + //Draw the graphs. + var nf = Math.round(game.num_frames / zoom), graphMidFrame = frame; + var nf2 = Math.floor(nf / 2); + if(graphMidFrame + nf2 >= game.num_frames) graphMidFrame -= ((nf2 + graphMidFrame) - game.num_frames); + else if(Math.ceil(graphMidFrame - nf2) < 0) graphMidFrame = nf2; + var firstFrame = graphMidFrame - nf2, lastFrame = graphMidFrame + nf2; + if(firstFrame < 0) firstFrame = 0; + if(lastFrame >= game.num_frames) lastFrame = game.num_frames - 1; + nf = lastFrame - firstFrame; + var dw = (LEFT_GRAPH_RIGHT - LEFT_GRAPH_LEFT) / (nf); + //Normalize values with respect to the range of frames seen by the graph. 
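+ // The 1.01 factor pads each per-window maximum slightly so the tallest curve stays just below the top of its graph.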
+ var maxTer = 0, maxProd = 0, maxStr = 0, maxActProd = 0; + var maxPlrDmgDlt = 0, maxEnvDmgDlt = 0, maxDmgTkn = 0, maxCapLoss = 0; + for(var a = 1; a <= game.num_players; a++) { + for(var b = firstFrame; b <= lastFrame; b++) { + if(game.players[a].territories[b] > maxTer) maxTer = game.players[a].territories[b] * 1.01; + if(game.players[a].productions[b] > maxProd) maxProd = game.players[a].productions[b] * 1.01; + if(game.players[a].strengths[b] > maxStr) maxStr = game.players[a].strengths[b] * 1.01; + if(game.players[a].actualProduction[b] > maxActProd) maxActProd = game.players[a].actualProduction[b] * 1.01; + if(game.players[a].playerDamageDealt[b] > maxPlrDmgDlt) maxPlrDmgDlt = game.players[a].playerDamageDealt[b] * 1.01; + if(game.players[a].environmentDamageDealt[b] > maxEnvDmgDlt) maxEnvDmgDlt = game.players[a].environmentDamageDealt[b] * 1.01; + if(game.players[a].damageTaken[b] > maxDmgTkn) maxDmgTkn = game.players[a].damageTaken[b] * 1.01; + if(game.players[a].capLosses[b] > maxCapLoss) maxCapLoss = game.players[a].capLosses[b] * 1.01; + } + } + function drawGraph(left, top, bottom, data, maxData) { + graphGraphics.moveTo(left, (top - bottom) * data[firstFrame] / maxData + bottom); + for(var b = firstFrame + 1; b <= lastFrame; b++) { + graphGraphics.lineTo(left + dw * (b - firstFrame), (top - bottom) * data[b] / maxData + bottom); + } + } + for(var a = 1; a <= game.num_players; a++) { + graphGraphics.lineStyle(1, game.players[a].color); + //Draw ter graph. + drawGraph(LEFT_GRAPH_LEFT, TER_TOP, TER_BTM, game.players[a].territories, maxTer); + //Draw prod graph. + drawGraph(LEFT_GRAPH_LEFT, PROD_TOP, PROD_BTM, game.players[a].productions, maxProd); + //Draw str graph. + drawGraph(LEFT_GRAPH_LEFT, STR_TOP, STR_BTM, game.players[a].strengths, maxStr); + if(showExtended) { + //Draw env dmg graph. + drawGraph(RIGHT_GRAPH_LEFT, ENV_DMG_TOP, ENV_DMG_BTM, game.players[a].environmentDamageDealt, maxEnvDmgDlt); + //Draw act prod graph. + drawGraph(RIGHT_GRAPH_LEFT, ACT_PROD_TOP, ACT_PROD_BTM, game.players[a].actualProduction, maxActProd); + //Draw str loss graph. + drawGraph(RIGHT_GRAPH_LEFT, CAP_LOSS_TOP, CAP_LOSS_BTM, game.players[a].capLosses, maxCapLoss); + //Draw plr dmg dealt. + drawGraph(LEFT_GRAPH_LEFT, PLR_DMG_TOP, PLR_DMG_BTM, game.players[a].playerDamageDealt, maxPlrDmgDlt); + //Draw damage taken. + drawGraph(RIGHT_GRAPH_LEFT, DMG_TKN_TOP, DMG_TKN_BTM, game.players[a].damageTaken, maxDmgTkn); + } + } + //Draw borders. + graphGraphics.lineStyle(1, '0xffffff'); + function drawGraphBorder(left, right, top, bottom) { + graphGraphics.moveTo(left + dw * (frame - firstFrame), top); + graphGraphics.lineTo(left + dw * (frame - firstFrame), bottom); + if((frame - firstFrame) > 0) graphGraphics.lineTo(left, bottom); //Deals with odd disappearing line.; + graphGraphics.lineTo(left, top); + graphGraphics.lineTo(right, top); + graphGraphics.lineTo(right, bottom); + graphGraphics.lineTo(left + dw * (frame - firstFrame), bottom); + } + + //Draw ter border. + drawGraphBorder(LEFT_GRAPH_LEFT, LEFT_GRAPH_RIGHT, TER_TOP, TER_BTM); + //Draw prod border. + drawGraphBorder(LEFT_GRAPH_LEFT, LEFT_GRAPH_RIGHT, PROD_TOP, PROD_BTM); + //Draw str border. + drawGraphBorder(LEFT_GRAPH_LEFT, LEFT_GRAPH_RIGHT, STR_TOP, STR_BTM); + if(showExtended) { + //Draw env dmg border. + drawGraphBorder(RIGHT_GRAPH_LEFT, RIGHT_GRAPH_RIGHT, ENV_DMG_TOP, ENV_DMG_BTM); + //Draw act prod border. + drawGraphBorder(RIGHT_GRAPH_LEFT, RIGHT_GRAPH_RIGHT, ACT_PROD_TOP, ACT_PROD_BTM); + //Draw str loss border. 
+ drawGraphBorder(RIGHT_GRAPH_LEFT, RIGHT_GRAPH_RIGHT, CAP_LOSS_TOP, CAP_LOSS_BTM); + //Draw plr damage dealt. + drawGraphBorder(LEFT_GRAPH_LEFT, LEFT_GRAPH_RIGHT, PLR_DMG_TOP, PLR_DMG_BTM); + //Draw plr damage taken. + drawGraphBorder(RIGHT_GRAPH_LEFT, RIGHT_GRAPH_RIGHT, DMG_TKN_TOP, DMG_TKN_BTM); + } + //Draw frame/ter text seperator. + graphGraphics.moveTo(LEFT_GRAPH_LEFT, TER_TOP - sh * 0.045); + graphGraphics.lineTo(RIGHT_GRAPH_RIGHT, TER_TOP - sh * 0.045); + } + + //Clear mapGraphics so that we can redraw freely. + mapGraphics.clear(); + + if(pressed[80]) { //Render productions. Don't update frames or transits. [Using p now for testing] + var loc = 0; + var pY = Math.round(yOffset); + for(var a = 0; a < game.height; a++) { + var pX = Math.round(xOffset); + for(var b = 0; b < game.width; b++) { + // VISU + if(game.productionNormals[Math.floor(loc / game.width)][loc % game.width] < 0.33333) mapGraphics.beginFill(interpolate({ r: 40, g: 40, b: 40 }, { r: 128, g: 80, b: 144 }, game.productionNormals[Math.floor(loc / game.width)][loc % game.width] * 3)); + else if(game.productionNormals[Math.floor(loc / game.width)][loc % game.width] < 0.66667) mapGraphics.beginFill(interpolate({ r: 128, g: 80, b: 144 }, { r: 176, g: 48, b: 48 }, game.productionNormals[Math.floor(loc / game.width)][loc % game.width] * 3 - 1)); + else mapGraphics.beginFill(interpolate({ r: 176, g: 48, b: 48 }, { r: 255, g: 240, b: 16 }, game.productionNormals[Math.floor(loc / game.width)][loc % game.width] * 3 - 2)); + mapGraphics.drawRect(rw * pX, rh * pY, rw, rh); + mapGraphics.endFill(); + loc++; + pX++; + if(pX == game.width) pX = 0; + } + pY++; + if(pY == game.height) pY = 0; + } + } + else { //Render game and update frames and transits. + var loc = 0; + var tY = Math.round(yOffset); + for(var a = 0; a < game.height; a++) { + var tX = Math.round(xOffset); + for(var b = 0; b < game.width; b++) { + var site = game.frames[frame][Math.floor(loc / game.width)][loc % game.width]; + mapGraphics.beginFill(game.players[site.owner].color, game.productionNormals[Math.floor(loc / game.width)][loc % game.width] * 0.4 + 0.15); + mapGraphics.drawRect(rw * tX, rh * tY, rw, rh); // Production + mapGraphics.endFill(); + loc++; + tX++; + if(tX == game.width) tX = 0; + } + tY++; + if(tY == game.height) tY = 0; + } + + var t = showmovement ? (-Math.cos(transit * Math.PI) + 1) / 2 : 0; + loc = 0; + var sY = Math.round(yOffset); + for(var a = 0; a < game.height; a++) { + var sX = Math.round(xOffset); + for(var b = 0; b < game.width; b++) { + var site = game.frames[frame][Math.floor(loc / game.width)][loc % game.width]; + if(site.strength == 255) mapGraphics.lineStyle(1, '0xfffff0'); + if(site.strength != 0) mapGraphics.beginFill(game.players[site.owner].color); + textStr[a][b].text = site.strength.toString() + textPossess[a][b].text = site.owner.toString() + textProd[a][b].style.fill = (site.owner.toString()=="1")?"#04e6f2":"#ffffff"; + + textReward[a][b].text =(discountedRewards!= undefined)?discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]:''; + + + //policies[a][b].text = policies[frame][a][b] In fact there are five... 
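+ // The five values fetched from /post_policies are the bot's policy values for NORTH, EAST, SOUTH, WEST and STILL; they are drawn around each square in scientific notation, and zero entries are left blank.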
+ var debug_direction = ["NORTH", "EAST", "SOUTH", "WEST", "STILL"] + for(var i = 0; i < 5; i++) { + var value = (policies!= undefined)?policies[frame][a][b][i].toExponential(1):0 + textPolicy[a][b][i].text = (value==0)?'':value.toString() + //textPolicy[a][b][i].text = (value==0)?'':debug_direction[i] + } + + + + //console.log(discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]) + var pw = rw * Math.sqrt(site.strength > 0 ? site.strength / 255 : 0.1) / 2 + var ph = rh * Math.sqrt(site.strength > 0 ? site.strength / 255 : 0.1) / 2; + var direction = frame < game.moves.length ? game.moves[frame][Math.floor(loc / game.width)][loc % game.width] : 0; + var move = t > 0 ? direction : 0; + var sY2 = move == 1 ? sY - 1 : move == 3 ? sY + 1 : sY; + var sX2 = move == 2 ? sX + 1 : move == 4 ? sX - 1 : sX; + if(site.strength == 0 && direction != 0) mapGraphics.lineStyle(1, '0x888888') + var center = new PIXI.Point(rw * ((t * sX2 + (1 - t) * sX) + 0.5), rh * ((t * sY2 + (1 - t) * sY) + 0.5)); + var pts = new Array(); + const squarescale = 0.75; + pts.push(new PIXI.Point(center.x + squarescale * pw, center.y + squarescale * ph)); + pts.push(new PIXI.Point(center.x + squarescale * pw, center.y - squarescale * ph)); + pts.push(new PIXI.Point(center.x - squarescale * pw, center.y - squarescale * ph)); + pts.push(new PIXI.Point(center.x - squarescale * pw, center.y + squarescale * ph)); + mapGraphics.drawPolygon(pts); + if(site.strength != 0) mapGraphics.endFill(); + mapGraphics.lineStyle(0, '0xffffff'); + loc++; + sX++; + if(sX == game.width) sX = 0; + } + sY++; + if(sY == game.height) sY = 0; + } + + var time = Date.now(); + var dt = time - lastTime; + lastTime = time; + + // If we are embedding a game, + // we want people to be able to scroll with + // the arrow keys + if(!isminimal) { + //Update frames per sec if up or down arrows are pressed. + if(pressed[38]) { + framespersec += 0.05; + } else if(pressed[40]) { + framespersec -= 0.05; + } + } + + if(pressed[39]) { + transit = 0; + frame++; + } + else if(pressed[37]) { + if(transit != 0) transit = 0; + else frame--; + } + else if(shouldplay) { + transit += dt / 1000 * framespersec; + } + } + + if(!isminimal) { + //Update info text: + var mousepos = manager.mouse.global; + if(mousepos.x < 0 || mousepos.x > sw || mousepos.y < 0 || mousepos.y > sh) { //Mouse is not over renderer. + infoText.text = 'Frame #' + frame.toString(); + } + else if(!mousePressed) { + infoText.text = 'Frame #' + frame.toString(); + if(mousepos.x < mw && mousepos.y < mh) { //Over map + var x = (Math.floor(mousepos.x / rw) - xOffset) % game.width, y = (Math.floor(mousepos.y / rh) - yOffset) % game.height; + if(x < 0) x += game.width; + if(y < 0) y += game.height; + infoText.text += ' | Loc: ' + x.toString() + ',' + y.toString(); + } + } + else { //Mouse is clicked and over renderer. 
+ if(mousepos.x < mw && mousepos.y < mh) { //Over map: + var x = (Math.floor(mousepos.x / rw) - xOffset) % game.width, y = (Math.floor(mousepos.y / rh) - yOffset) % game.height; + if(x < 0) x += game.width; + if(y < 0) y += game.height; + str = game.frames[frame][y][x].strength; + prod = game.productions[y][x]; + infoText.text = 'Str: ' + str.toString() + ' | Prod: ' + prod.toString(); + if(frame < game.moves.length && game.frames[frame][y][x].owner != 0) { + move = game.moves[frame][y][x]; + if(move >= 0 && move < 5) { + move = "0NESW"[move]; + } + infoText.text += ' | Mv: ' + move.toString(); + } + } + else if(mousepos.x < RIGHT_GRAPH_RIGHT && mousepos.x > LEFT_GRAPH_LEFT) { + frame = firstFrame + Math.round((mousepos.x - LEFT_GRAPH_LEFT) / dw); + if(frame < 0) frame = 0; + if(frame >= game.num_frames) frame = game.num_frames - 1; + transit = 0; + if(mousepos.y > TER_TOP & mousepos.y < TER_BTM) { + } + } + } + } + + //Advance frame if transit moves far enough. Ensure all are within acceptable bounds. + while(transit >= 1) { + transit--; + frame++; + } + if(frame >= game.num_frames - 1) { + frame = game.num_frames - 1; + transit = 0; + } + while(transit < 0) { + transit++; + frame--; + } + if(frame < 0) { + frame = 0; + transit = 0; + } + + //Pan if desired. + const PAN_SPEED = 1; + if(pressed[65]) xOffset += PAN_SPEED; + if(pressed[68]) xOffset -= PAN_SPEED + if(pressed[87]) yOffset += PAN_SPEED; + if(pressed[83]) yOffset -= PAN_SPEED; + + //Reset pan to be in normal bounds: + if(Math.round(xOffset) >= game.width) xOffset -= game.width; + else if(Math.round(xOffset) < 0) xOffset += game.width; + if(Math.round(yOffset) >= game.height) yOffset -= game.height; + else if(Math.round(yOffset) < 0) yOffset += game.height; + + //Actually render. + renderer.render(stage); + + //Of course, we want to render in the future as well. + var idle = (Object.keys(pressed).length === 0) && !shouldplay; + setTimeout(function() { + requestAnimationFrame(animate); + }, 1000 / (idle ? 20.0 : 80.0)); + } +} \ No newline at end of file diff --git a/visualize/templates/reward.py b/visualize/templates/reward.py new file mode 100644 index 0000000..6df6140 --- /dev/null +++ b/visualize/templates/reward.py @@ -0,0 +1,10 @@ +import csv +from numpy import genfromtxt +from numpy import matrix + +def main(): + return 1 + +if __name__ == "__main__": + x=main() + return x; \ No newline at end of file diff --git a/visualize/templates/visualizer.html b/visualize/templates/visualizer.html new file mode 100644 index 0000000..95d9de6 --- /dev/null +++ b/visualize/templates/visualizer.html @@ -0,0 +1,28 @@ + + + + + Visualizer + + + + + +
+
+

Drag your file


+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/visualize/visualize.py b/visualize/visualize.py new file mode 100644 index 0000000..fdd1b7f --- /dev/null +++ b/visualize/visualize.py @@ -0,0 +1,89 @@ +import json +import os +import sys +from io import BytesIO + +import matplotlib.pyplot as plt +import numpy as np +from flask import Flask, render_template, request, make_response +from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas +from matplotlib.figure import Figure + +app = Flask(__name__) + +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +from train.reward import discountedRewards +from public.models.bot.trainedBot import TrainedBot + + +@app.route("/") +def home(): + return render_template('visualizer.html') + + +print("Look at http://127.0.0.1:5000/performance.png for performance insights") + + +@app.route("/performance.png") +def performance_plot(): + fig = Figure() + sub1 = fig.add_subplot(111) + import pandas as pd + path_to_variables = os.path.abspath(os.path.dirname(__file__)) + '/../public/models/variables/' + list_variables = [name for name in os.listdir(path_to_variables) if name != "README.md"] + path_to_npy = [path_to_variables + name + '/' + name + '.npy' for name in list_variables] + + rewards = [np.load(path) for path in path_to_npy] + + max_len = max([len(reward) for reward in rewards]) + for i in range(len(rewards)): + rewards[i] = np.append(rewards[i], np.repeat(np.nan, max_len - len(rewards[i]))) + + pd.DataFrame(np.array(rewards).T, columns=list_variables).rolling(100).mean().plot( + title="Weighted reward at each game. (Rolling average)", ax=sub1) + + plt.show() + canvas = FigureCanvas(fig) + png_output = BytesIO() + canvas.print_png(png_output) + response = make_response(png_output.getvalue()) + response.headers['Content-Type'] = 'image/png' + return response + + +def convert(request): + def get_owner(square): + return square['owner'] + + def get_strength(square): + return square['strength'] + + get_owner = np.vectorize(get_owner) + get_strength = np.vectorize(get_strength) + owner_frames = get_owner(request.json["frames"])[:, np.newaxis, :, :] + strength_frames = get_strength(request.json["frames"])[:, np.newaxis, :, :] + production_frames = np.repeat(np.array(request.json["productions"])[np.newaxis, np.newaxis, :, :], + len(owner_frames), + axis=0) + + moves = np.array(request.json['moves']) + + game_states = np.concatenate(([owner_frames, strength_frames, production_frames]), axis=1) + + moves = ((-5 + 5 * game_states[:-1, 0, :]) + ((moves - 1) % 5)) + return game_states, moves + + +@app.route('/post_discounted_rewards', methods=['POST']) +def post_discounted_rewards(): + game_states, moves = convert(request) + discounted_rewards = discountedRewards(game_states, moves) + return json.dumps({'discountedRewards': discounted_rewards.tolist()}) + + +@app.route('/post_policies', methods=['POST']) +def post_policies(): + game_states, _ = convert(request) + bot = TrainedBot() + policies = np.array([bot.get_policies(game_state) for game_state in game_states]) + return json.dumps({'policies': policies.tolist()}) From e0f3883ca9af3d4bb9348e1a2d5cb6f9c62081a2 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 4 Oct 2017 19:14:41 +0200 Subject: [PATCH 31/45] Visualizer improvement This change addresses the need by: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * Remove bug at end of visualiser that prevented playback. 
* Remove Children Of Container * Add make commands * Introduce the notion of pipe_players and slave_players. Slave players are automatically handled by the Halite program, whereas pipe_players are handle by the player himself. * Remove tensorflow warnings that prevented MyBot to be a “slave_player” --- Makefile | 26 +++- docs/Makefile | 230 ---------------------------- docs/README.md | 16 +- networking/start_game.py | 36 +++-- public/MyBot.py | 5 +- public/OpponentBot.py | 5 +- public/models/bot/trainedBot.py | 1 + src/core/Halite.cpp | 10 +- visualize/static/localVisualizer.js | 0 visualize/static/parsereplay.js | 0 visualize/static/visualizer.js | 79 ++++------ visualize/templates/reward.py | 10 -- visualize/templates/visualizer.html | 0 visualize/visualize.py | 0 14 files changed, 110 insertions(+), 308 deletions(-) delete mode 100644 docs/Makefile mode change 100644 => 100755 visualize/static/localVisualizer.js mode change 100644 => 100755 visualize/static/parsereplay.js mode change 100644 => 100755 visualize/static/visualizer.js delete mode 100644 visualize/templates/reward.py mode change 100644 => 100755 visualize/templates/visualizer.html mode change 100644 => 100755 visualize/visualize.py diff --git a/Makefile b/Makefile index 5ecfaae..6a6d9c4 100644 --- a/Makefile +++ b/Makefile @@ -4,4 +4,28 @@ all: .PHONY: clean clean: - rm public/halite; cd src/; make clean; cd ..; \ No newline at end of file + rm *.hlt *.log public/halite; cd src/; make clean; cd ..; + +.PHONY: sync-nefeli +sync-nefeli: + rsync -a --exclude 'public/halite' --exclude '*.o' . mehlman@nefeli.math-info.univ-paris5.fr:/home/mehlman/Halite-Python-RL/ #--delete + +.PHONY: get-nefeli +get-nefeli: + rsync -a --exclude 'public/halite' --exclude '*.o' mehlman@nefeli.math-info.univ-paris5.fr:/home/mehlman/Halite-Python-RL/ . #--delete + +.PHONY: sync-solon +sync-solon: + rsync -a --exclude 'public/halite' --exclude '*.o' . solon:/home/mehlman/Halite-Python-RL/ #--delete + +.PHONY: get-solon +get-solon: + rsync -a --exclude 'public/halite' --exclude '*.o' solon:/home/mehlman/Halite-Python-RL/ . #--delete + +.PHONY: clear-agent +clear-agent: + rm -r './public/models/variables/$(AGENT)' + +.PHONY: server +server: + cd visualize;export FLASK_APP=visualize.py;flask run \ No newline at end of file diff --git a/docs/Makefile b/docs/Makefile deleted file mode 100644 index 7fdf53c..0000000 --- a/docs/Makefile +++ /dev/null @@ -1,230 +0,0 @@ -# Makefile for Sphinx documentation -# - -# You can set these variables from the command line. -SPHINXOPTS = -SPHINXBUILD = sphinx-build -PAPER = -BUILDDIR = _build - -# User-friendly check for sphinx-build -ifeq ($(shell which $(SPHINXBUILD) >/dev/null 2>&1; echo $$?), 1) - $(error The '$(SPHINXBUILD)' command was not found. Make sure you have Sphinx installed, then set the SPHINXBUILD environment variable to point to the full path of the '$(SPHINXBUILD)' executable. Alternatively you can add the directory with the executable to your PATH. If you don\'t have Sphinx installed, grab it from http://sphinx-doc.org/) -endif - -# Internal variables. -PAPEROPT_a4 = -D latex_paper_size=a4 -PAPEROPT_letter = -D latex_paper_size=letter -ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . -# the i18n builder cannot share the environment and doctrees with the others -I18NSPHINXOPTS = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) . 
- -.PHONY: help -help: - @echo "Please use \`make ' where is one of" - @echo " html to make standalone HTML files" - @echo " dirhtml to make HTML files named index.html in directories" - @echo " singlehtml to make a single large HTML file" - @echo " pickle to make pickle files" - @echo " json to make JSON files" - @echo " htmlhelp to make HTML files and a HTML help project" - @echo " qthelp to make HTML files and a qthelp project" - @echo " applehelp to make an Apple Help Book" - @echo " devhelp to make HTML files and a Devhelp project" - @echo " epub to make an epub" - @echo " epub3 to make an epub3" - @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter" - @echo " latexpdf to make LaTeX files and run them through pdflatex" - @echo " latexpdfja to make LaTeX files and run them through platex/dvipdfmx" - @echo " text to make text files" - @echo " man to make manual pages" - @echo " texinfo to make Texinfo files" - @echo " info to make Texinfo files and run them through makeinfo" - @echo " gettext to make PO message catalogs" - @echo " changes to make an overview of all changed/added/deprecated items" - @echo " xml to make Docutils-native XML files" - @echo " pseudoxml to make pseudoxml-XML files for display purposes" - @echo " linkcheck to check all external links for integrity" - @echo " doctest to run all doctests embedded in the documentation (if enabled)" - @echo " coverage to run coverage check of the documentation (if enabled)" - @echo " dummy to check syntax errors of document sources" - -.PHONY: clean -clean: - rm -rf $(BUILDDIR)/* - -.PHONY: html -html: - $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html - @echo - @echo "Build finished. The HTML pages are in $(BUILDDIR)/html." - -.PHONY: dirhtml -dirhtml: - $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml - @echo - @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml." - -.PHONY: singlehtml -singlehtml: - $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml - @echo - @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml." - -.PHONY: pickle -pickle: - $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle - @echo - @echo "Build finished; now you can process the pickle files." - -.PHONY: json -json: - $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json - @echo - @echo "Build finished; now you can process the JSON files." - -.PHONY: htmlhelp -htmlhelp: - $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp - @echo - @echo "Build finished; now you can run HTML Help Workshop with the" \ - ".hhp project file in $(BUILDDIR)/htmlhelp." - -.PHONY: qthelp -qthelp: - $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp - @echo - @echo "Build finished; now you can run "qcollectiongenerator" with the" \ - ".qhcp project file in $(BUILDDIR)/qthelp, like this:" - @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/Halite-Python-RL.qhcp" - @echo "To view the help file:" - @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/Halite-Python-RL.qhc" - -.PHONY: applehelp -applehelp: - $(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp - @echo - @echo "Build finished. The help book is in $(BUILDDIR)/applehelp." - @echo "N.B. You won't be able to view it unless you put it in" \ - "~/Library/Documentation/Help or install it in your application" \ - "bundle." - -.PHONY: devhelp -devhelp: - $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp - @echo - @echo "Build finished." 
- @echo "To view the help file:" - @echo "# mkdir -p $$HOME/.local/share/devhelp/Halite-Python-RL" - @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/Halite-Python-RL" - @echo "# devhelp" - -.PHONY: epub -epub: - $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub - @echo - @echo "Build finished. The epub file is in $(BUILDDIR)/epub." - -.PHONY: epub3 -epub3: - $(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3 - @echo - @echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3." - -.PHONY: latex -latex: - $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex - @echo - @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex." - @echo "Run \`make' in that directory to run these through (pdf)latex" \ - "(use \`make latexpdf' here to do that automatically)." - -.PHONY: latexpdf -latexpdf: - $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex - @echo "Running LaTeX files through pdflatex..." - $(MAKE) -C $(BUILDDIR)/latex all-pdf - @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." - -.PHONY: latexpdfja -latexpdfja: - $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex - @echo "Running LaTeX files through platex and dvipdfmx..." - $(MAKE) -C $(BUILDDIR)/latex all-pdf-ja - @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex." - -.PHONY: text -text: - $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text - @echo - @echo "Build finished. The text files are in $(BUILDDIR)/text." - -.PHONY: man -man: - $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man - @echo - @echo "Build finished. The manual pages are in $(BUILDDIR)/man." - -.PHONY: texinfo -texinfo: - $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo - @echo - @echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo." - @echo "Run \`make' in that directory to run these through makeinfo" \ - "(use \`make info' here to do that automatically)." - -.PHONY: info -info: - $(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo - @echo "Running Texinfo files through makeinfo..." - make -C $(BUILDDIR)/texinfo info - @echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo." - -.PHONY: gettext -gettext: - $(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale - @echo - @echo "Build finished. The message catalogs are in $(BUILDDIR)/locale." - -.PHONY: changes -changes: - $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes - @echo - @echo "The overview file is in $(BUILDDIR)/changes." - -.PHONY: linkcheck -linkcheck: - $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck - @echo - @echo "Link check complete; look for any errors in the above output " \ - "or in $(BUILDDIR)/linkcheck/output.txt." - -.PHONY: doctest -doctest: - $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest - @echo "Testing of doctests in the sources finished, look at the " \ - "results in $(BUILDDIR)/doctest/output.txt." - -.PHONY: coverage -coverage: - $(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage - @echo "Testing of coverage in the sources finished, look at the " \ - "results in $(BUILDDIR)/coverage/python.txt." - -.PHONY: xml -xml: - $(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml - @echo - @echo "Build finished. The XML files are in $(BUILDDIR)/xml." - -.PHONY: pseudoxml -pseudoxml: - $(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml - @echo - @echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml." 
- -.PHONY: dummy -dummy: - $(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy - @echo - @echo "Build finished. Dummy builder generates no files." diff --git a/docs/README.md b/docs/README.md index bdfed08..7924f40 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,3 +1,11 @@ +--- +title: Sidebar Navigation +summary: "My man!" +sidebar: mydoc_sidebar +permalink: mydoc_sidebar_navigation.html +folder: mydoc +--- + # Documentation Go read the documentation [here](https://edouard360.github.io/Halite-Python-RL/). @@ -28,4 +36,10 @@ Then either: Look at http://127.0.0.1:5000/performance.png for performance insights. -Or at http://127.0.0.1:5000/ for games replay. \ No newline at end of file +Or at http://127.0.0.1:5000/ for games replay. + +## Working with PyCharm + +To run the Bot in Pycharm, you should provide a **mute** argument, since `MyBot.py` needs to know it's not on the Halite server, but running locally. + +Go to edit configuration and add the script argument 2000 (It could be any other number). \ No newline at end of file diff --git a/networking/start_game.py b/networking/start_game.py index e29be76..750539e 100644 --- a/networking/start_game.py +++ b/networking/start_game.py @@ -2,9 +2,12 @@ import argparse import os -def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_bool=True, timeout=True, quiet=True): + +def start_game(port=2000, dim=10, max_strength=25, max_turn=25, max_game=1, silent_bool=True, timeout=True, quiet=True, + n_pipe_players=1, slave_players=[]): path_to_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) - subprocess.call([path_to_root + "/networking/kill.sh", str(port)]) + for i in range(n_pipe_players): + subprocess.call([path_to_root + "/networking/kill.sh", str(port + i)]) # Free the necessary ports # subprocess.call([path_to_root + "/networking/kill.sh", str(port+1)]) # TODO automatic call to subprocess halite = path_to_root + '/public/halite ' dimensions = '-d "' + str(dim) + ' ' + str(dim) + '" ' @@ -15,29 +18,40 @@ def start_game(port, dim=10, max_strength=25, max_turn=25, max_game=1,silent_boo silent_bool = '-j ' if silent_bool else '' timeout = '-t ' if timeout else '' quiet = '-q ' if quiet else '' - players = [ - "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port), + pipe_players = [ + "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port + i) for i in + range(n_pipe_players) ] + slave_players = [ + "python3 " + path_to_root + "/public/" + slave_player + ' slave' for slave_player in slave_players + ] # slave is the slave argument + players = pipe_players + slave_players # "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port+1) n_player = '' if len(players) > 1 else '-n 1 ' players = '"' + '" "'.join(players) + '"' - + print(players) print("Launching process") - subprocess.call(halite + dimensions + n_player + max_strength + max_turn + silent_bool + timeout + quiet + max_game +players, - shell=True) + + subprocess.call( + halite + dimensions + n_player + max_strength + max_turn + silent_bool + timeout + quiet + max_game + players, + shell=True) if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument("-p", "--port", type=int, help="the port for the simulation", default=2000) parser.add_argument("-t", "--timeout", help="timeout", action="store_true", default=False) - parser.add_argument("-j", "--silent", help="silent", action="store_true", default=False) + parser.add_argument("-j", "--silent", 
help="Doesn't print *.hlt file", action="store_true", default=False) parser.add_argument("-q", "--quiet", help="quiet", action="store_true", default=False) parser.add_argument("-s", "--strength", help="max strength", type=int, default=25) parser.add_argument("-d", "--dimension", help="max dimension", type=int, default=10) parser.add_argument("-m", "--maxturn", help="max turn", type=int, default=25) - parser.add_argument("-g", "--maxgame", help="max game", type=int, default=1) # -1 for infinite game + parser.add_argument("-g", "--maxgame", help="max game", type=int, default=1) # -1 for infinite game + parser.add_argument("-pp", "--n_pipe_players", type=int, default=0) + parser.add_argument("-sp", "--slave_players", nargs='+', default=[]) args = parser.parse_args() - start_game(port = args.port, dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, - silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet) + start_game(port=args.port, dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, + silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet, + n_pipe_players=args.n_pipe_players, + slave_players=args.slave_players) diff --git a/public/MyBot.py b/public/MyBot.py index a649c51..9429dac 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -1,9 +1,10 @@ import sys +import os +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) mode = 'server' if (len(sys.argv) == 1) else 'local' -mode = 'local' # TODO remove forcing -if mode == 'server': # 'server' mode +if mode == 'server' or sys.argv[1]=='slave': # 'server' mode import hlt else: # 'local' mode import context diff --git a/public/OpponentBot.py b/public/OpponentBot.py index f6f002e..32ca428 100644 --- a/public/OpponentBot.py +++ b/public/OpponentBot.py @@ -1,9 +1,10 @@ import sys +import os +sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) mode = 'server' if (len(sys.argv) == 1) else 'local' -mode = 'local' # TODO remove forcing -if mode == 'server': # 'server' mode +if mode == 'server' or sys.argv[1]=='slave': # 'server' mode import hlt else: # 'local' mode import context diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index 894337a..39592e3 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -11,6 +11,7 @@ def __init__(self): s_size = 9 * 3; a_size = 5; h_size = 50 + os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' tf.reset_default_graph() with tf.device("/cpu:0"): diff --git a/src/core/Halite.cpp b/src/core/Halite.cpp index bcc4e8f..6bc0e19 100644 --- a/src/core/Halite.cpp +++ b/src/core/Halite.cpp @@ -389,14 +389,18 @@ GameStatistics Halite::runGame(std::vector * names_, unsigned int s stats.timeout_tags = timeout_tags; stats.timeout_log_filenames = std::vector(timeout_tags.size()); //Output gamefile. First try the replays folder; if that fails, just use the straight filename. 
- stats.output_filename = "Replays/" + std::to_string(id) + '-' + std::to_string(seed) + ".hlt"; + stats.output_filename = "visualize/hlt/" + std::to_string(id) + '-' + std::to_string(seed) + ".hlt"; if(!no_file_output){ try { output(stats.output_filename); } catch(std::runtime_error & e) { - stats.output_filename = stats.output_filename.substr(8); - output(stats.output_filename); + try{ + output("../"+stats.output_filename); + }catch(std::runtime_error & e){ + stats.output_filename = stats.output_filename.substr(8); + output(stats.output_filename); + } } if(!quiet_output) std::cout << "Map seed was " << seed << std::endl << "Opening a file at " << stats.output_filename << std::endl; diff --git a/visualize/static/localVisualizer.js b/visualize/static/localVisualizer.js old mode 100644 new mode 100755 diff --git a/visualize/static/parsereplay.js b/visualize/static/parsereplay.js old mode 100644 new mode 100755 diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js old mode 100644 new mode 100755 index 4b34b13..26382a5 --- a/visualize/static/visualizer.js +++ b/visualize/static/visualizer.js @@ -219,6 +219,11 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal } loc=0 + prodContainer.removeChildren() + strengthContainer.removeChildren() + possessContainer.removeChildren() + rewardContainer.removeChildren() + policyContainer.removeChildren() var sY = Math.round(yOffset); for(var a = 0; a < game.height; a++) { var sX = Math.round(xOffset); @@ -280,43 +285,11 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal if(sY == game.height) sY = 0; } - // function postData(input) { - // $.ajax({ - // type: "POST", - // url: "http://127.0.0.1:5000/",//http://127.0.0.1:8080/reward.py - // data: { param: input }, - // success: callbackFunc - // }); - // } - - // function callbackFunc(response) { - // // do something with the response - // console.log(response); - // } - - // postData('data to process'); - - - // $.getJSON('/getpythondata', JSON.stringify(game), function(data) { - // console.log(data) - // }); - // $.ajax({ - // type: "POST", - // contentType: "application/json", - // url: '/getpythondata', - // data: { name: 'norm' }, - // dataType: "json" - // // }); - // $.post( "/getpythondata", { - // javascript_data: data - // }); - - stage.addChild(mapGraphics); //stage.addChild(prodContainer); //stage.addChild(strengthContainer); //stage.addChild(possessContainer); - //stage.addChild(rewardContainer); + stage.addChild(rewardContainer); stage.addChild(policyContainer); console.log(renderer.width, renderer.height); } @@ -368,10 +341,10 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal if(frame >= game.num_frames - 1) frame = game.num_frames - 1; shouldplay = false; } - else if(e.keyCode == 65 || e.keyCode == 68 || e.keyCode == 87 || e.keyCode == 83) { //wasd - xOffset = Math.round(xOffset); - yOffset = Math.round(yOffset); - } + // else if(e.keyCode == 65 || e.keyCode == 68 || e.keyCode == 87 || e.keyCode == 83) { //wasd + // xOffset = Math.round(xOffset); + // yOffset = Math.round(yOffset); + // } else if(e.keyCode == 79) { //o xOffset = 0; yOffset = 0; @@ -559,23 +532,21 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal var site = game.frames[frame][Math.floor(loc / game.width)][loc % game.width]; if(site.strength == 255) mapGraphics.lineStyle(1, '0xfffff0'); if(site.strength != 0) mapGraphics.beginFill(game.players[site.owner].color); + 
textStr[a][b].text = site.strength.toString() textPossess[a][b].text = site.owner.toString() textProd[a][b].style.fill = (site.owner.toString()=="1")?"#04e6f2":"#ffffff"; - textReward[a][b].text =(discountedRewards!= undefined)?discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]:''; + textReward[a][b].text =(pressed[65] && discountedRewards!= undefined && frame!=lastFrame && site.owner.toString()=="1")?discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]:''; //policies[a][b].text = policies[frame][a][b] In fact there are five... - var debug_direction = ["NORTH", "EAST", "SOUTH", "WEST", "STILL"] + //var debug_direction = ["NORTH", "EAST", "SOUTH", "WEST", "STILL"] for(var i = 0; i < 5; i++) { - var value = (policies!= undefined)?policies[frame][a][b][i].toExponential(1):0 - textPolicy[a][b][i].text = (value==0)?'':value.toString() - //textPolicy[a][b][i].text = (value==0)?'':debug_direction[i] + //var value = (policies!= undefined)?policies[frame][a][b][i].toExponential(1):0 + textPolicy[a][b][i].text = '' //(value==0)?'':value.toString() } - - //console.log(discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]) var pw = rw * Math.sqrt(site.strength > 0 ? site.strength / 255 : 0.1) / 2 var ph = rh * Math.sqrt(site.strength > 0 ? site.strength / 255 : 0.1) / 2; @@ -654,6 +625,18 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal str = game.frames[frame][y][x].strength; prod = game.productions[y][x]; infoText.text = 'Str: ' + str.toString() + ' | Prod: ' + prod.toString(); + + //policies[a][b].text = policies[frame][a][b] In fact there are five... + var debug_direction = ["NORTH", "EAST", "SOUTH", "WEST", "STILL"] + for(var i = 0; i < 5; i++) { + var value = (policies != undefined) ? policies[frame][y][x][i].toExponential(1) : 0 + textPolicy[y][x][i].text = (value == 0) ? '' : value.toString() + } + if(pressed[85]){//u pressed + textReward[y][x].text =(discountedRewards!= undefined && frame!=lastFrame)?discountedRewards[frame][y][x]:''; + } + + if(frame < game.moves.length && game.frames[frame][y][x].owner != 0) { move = game.moves[frame][y][x]; if(move >= 0 && move < 5) { @@ -693,10 +676,10 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal //Pan if desired. 
const PAN_SPEED = 1; - if(pressed[65]) xOffset += PAN_SPEED; - if(pressed[68]) xOffset -= PAN_SPEED - if(pressed[87]) yOffset += PAN_SPEED; - if(pressed[83]) yOffset -= PAN_SPEED; + // if(pressed[65]) xOffset += PAN_SPEED; + // if(pressed[68]) xOffset -= PAN_SPEED + // if(pressed[87]) yOffset += PAN_SPEED; + // if(pressed[83]) yOffset -= PAN_SPEED; //Reset pan to be in normal bounds: if(Math.round(xOffset) >= game.width) xOffset -= game.width; diff --git a/visualize/templates/reward.py b/visualize/templates/reward.py deleted file mode 100644 index 6df6140..0000000 --- a/visualize/templates/reward.py +++ /dev/null @@ -1,10 +0,0 @@ -import csv -from numpy import genfromtxt -from numpy import matrix - -def main(): - return 1 - -if __name__ == "__main__": - x=main() - return x; \ No newline at end of file diff --git a/visualize/templates/visualizer.html b/visualize/templates/visualizer.html old mode 100644 new mode 100755 diff --git a/visualize/visualize.py b/visualize/visualize.py old mode 100644 new mode 100755 From efdc266a63bc4a76851ac3367ac67b1183032f8e Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 4 Oct 2017 22:30:38 +0200 Subject: [PATCH 32/45] Handier start_game and vis + minor bug correction: This change addresses the need by: * For trained Bot, act greedily (no more randomness) * Start_game now specify width or height * Resolve bug related to width/height transfer from game_map to game_state * Reward, separate, rawRewardMetric for comparing model, and rawReward which differs according to the agent. * Visualizer even handier * Convenience for the server --- Makefile | 6 +++++- docs/.config.yml | 1 + networking/start_game.py | 9 +++++---- public/models/agent/vanillaAgent.py | 4 ++-- public/models/bot/trainedBot.py | 4 ++-- train/experience.py | 4 ++-- train/reward.py | 30 +++++++++++++++++++++++------ visualize/static/visualizer.js | 12 +++++++++--- 8 files changed, 50 insertions(+), 20 deletions(-) create mode 100644 docs/.config.yml diff --git a/Makefile b/Makefile index 6a6d9c4..ee5ecf1 100644 --- a/Makefile +++ b/Makefile @@ -28,4 +28,8 @@ clear-agent: .PHONY: server server: - cd visualize;export FLASK_APP=visualize.py;flask run \ No newline at end of file + cd visualize;export FLASK_APP=visualize.py;flask run + +.PHONY: debug-server +debug-server: + cd visualize;FLASK_APP=visualize.py FLASK_DEBUG=1 python -m flask run \ No newline at end of file diff --git a/docs/.config.yml b/docs/.config.yml new file mode 100644 index 0000000..c419263 --- /dev/null +++ b/docs/.config.yml @@ -0,0 +1 @@ +theme: jekyll-theme-cayman \ No newline at end of file diff --git a/networking/start_game.py b/networking/start_game.py index 750539e..5fcd8d8 100644 --- a/networking/start_game.py +++ b/networking/start_game.py @@ -3,14 +3,14 @@ import os -def start_game(port=2000, dim=10, max_strength=25, max_turn=25, max_game=1, silent_bool=True, timeout=True, quiet=True, +def start_game(port=2000, width=10,height=10, max_strength=25, max_turn=25, max_game=1, silent_bool=True, timeout=True, quiet=True, n_pipe_players=1, slave_players=[]): path_to_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) for i in range(n_pipe_players): subprocess.call([path_to_root + "/networking/kill.sh", str(port + i)]) # Free the necessary ports # subprocess.call([path_to_root + "/networking/kill.sh", str(port+1)]) # TODO automatic call to subprocess halite = path_to_root + '/public/halite ' - dimensions = '-d "' + str(dim) + ' ' + str(dim) + '" ' + dimensions = '-d "' + str(height) + ' ' + str(width) + '" 
' max_strength = '-z ' + str(max_strength) + ' ' max_turn = '-x ' + str(max_turn) + ' ' @@ -45,13 +45,14 @@ def start_game(port=2000, dim=10, max_strength=25, max_turn=25, max_game=1, sile parser.add_argument("-j", "--silent", help="Doesn't print *.hlt file", action="store_true", default=False) parser.add_argument("-q", "--quiet", help="quiet", action="store_true", default=False) parser.add_argument("-s", "--strength", help="max strength", type=int, default=25) - parser.add_argument("-d", "--dimension", help="max dimension", type=int, default=10) + parser.add_argument("-dw", "--width", help="max width", type=int, default=10) + parser.add_argument("-dh", "--height", help="max height", type=int, default=10) parser.add_argument("-m", "--maxturn", help="max turn", type=int, default=25) parser.add_argument("-g", "--maxgame", help="max game", type=int, default=1) # -1 for infinite game parser.add_argument("-pp", "--n_pipe_players", type=int, default=0) parser.add_argument("-sp", "--slave_players", nargs='+', default=[]) args = parser.parse_args() - start_game(port=args.port, dim=args.dimension, max_strength=args.strength, max_turn=args.maxturn, + start_game(port=args.port, width=args.width,height=args.height, max_strength=args.strength, max_turn=args.maxturn, silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet, n_pipe_players=args.n_pipe_players, slave_players=args.slave_players) diff --git a/public/models/agent/vanillaAgent.py b/public/models/agent/vanillaAgent.py index b952f0d..77a03c8 100644 --- a/public/models/agent/vanillaAgent.py +++ b/public/models/agent/vanillaAgent.py @@ -6,8 +6,8 @@ class VanillaAgent(Agent): - def __init__(self, experience, lr = 1e-3, s_size = 9 * 3, a_size = 5, h_size = 50): # all these are optional ? - super(VanillaAgent, self).__init__('vanilla-ter', experience) + def __init__(self, experience, lr = 1e-2, s_size = 9 * 3, a_size = 5, h_size = 50): # all these are optional ? + super(VanillaAgent, self).__init__('vanilla-cin', experience) # These lines established the feed-forward part of the network. The agent takes a state and produces an action. 
self.state_in = tf.placeholder(shape=[None, s_size], dtype=tf.float32) diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py index 39592e3..2ea9ba5 100644 --- a/public/models/bot/trainedBot.py +++ b/public/models/bot/trainedBot.py @@ -7,7 +7,7 @@ class TrainedBot(Bot): def __init__(self): - lr = 1e-3; + lr = 5*1e-3; s_size = 9 * 3; a_size = 5; h_size = 50 @@ -32,7 +32,7 @@ def __init__(self): def compute_moves(self, game_map): game_state = getGameState(game_map, self.myID) - return formatMoves(game_map, self.agent.choose_actions(self.sess, game_state)) + return formatMoves(game_map, self.agent.choose_actions(self.sess, game_state, debug=True)) def get_policies(self, game_state): # Warning this is not hereditary diff --git a/train/experience.py b/train/experience.py index 3d4c94b..146933d 100644 --- a/train/experience.py +++ b/train/experience.py @@ -3,7 +3,7 @@ """ import numpy as np -from train.reward import allRewards, rawRewards +from train.reward import allRewards, rawRewardsMetric class Experience: @@ -24,7 +24,7 @@ def batch(self, size): pass def compute_metric(self, game_states): - production_increments = np.sum(np.sum(rawRewards(game_states), axis=2), axis=1) + production_increments = np.sum(np.sum(rawRewardsMetric(game_states), axis=2), axis=1) self.metric = np.append(self.metric, production_increments.dot(np.linspace(2.0, 1.0, num=len(game_states) - 1))) def save_metric(self, name): diff --git a/train/reward.py b/train/reward.py index f545dd6..36a8233 100644 --- a/train/reward.py +++ b/train/reward.py @@ -8,7 +8,7 @@ def getGameState(game_map, myID): game_state = np.reshape( [[(square.owner == myID) + 0, square.strength, square.production] for square in game_map], - [game_map.width, game_map.height, 3]) + [game_map.height, game_map.width, 3]) return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) @@ -39,6 +39,21 @@ def discount_rewards(r, gamma=0.8): discounted_r[t] = running_add return discounted_r +def take_surrounding_square(game_state, x, y, size = 1): + return np.take(np.take(game_state, range(y - size, y + size + 1), axis=1, mode='wrap'), + range(x - size, x + size + 1), axis=2, mode='wrap') + +def take_surrounding_losange(game_state, x, y, size = 2): + np.take(np.take(game_state, y, axis=1, mode='wrap'), + range(x - 2, x + 2 + 1), axis=2, mode='wrap') + np.take(np.take(game_state, y+1, axis=1, mode='wrap'), + range(x - 1, x + 1 + 1), axis=2, mode='wrap') + np.take(np.take(game_state, y-1, axis=1, mode='wrap'), + range(x - 1, x + 1 + 1), axis=2, mode='wrap') + np.take(np.take(game_state, y+2, axis=1, mode='wrap'), + x, axis=2, mode='wrap') + np.take(np.take(game_state, y-2, axis=1, mode='wrap'), + x, axis=2, mode='wrap') def localStateFromGlobal(game_state, x, y, size=1): # TODO: for now we still take a square, but a more complex shape could be better. 
@@ -46,10 +61,14 @@ def localStateFromGlobal(game_state, x, y, size=1): range(x - size, x + size + 1), axis=2, mode='wrap') -def rawRewards(game_states): +def rawRewardsMetric(game_states): return np.array([game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2] for i in range(len(game_states) - 1)]) +def rawRewards(game_states): + return np.array([0.0001*np.power(game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2],4) + for i in range(len(game_states) - 1)]) + def strengthRewards(game_states): return np.array([(getStrength(game_states[i + 1]) - getStrength(game_states[i])) @@ -68,7 +87,6 @@ def take_value(matrix, x, y): if d != -1: dy = (-1 if d == NORTH else 1) if (d == SOUTH or d == NORTH) else 0 dx = (-1 if d == WEST else 1) if (d == WEST or d == EAST) else 0 - discount_factor = discount_factor if (d != STILL or discount_factor == 1.0) else 0.9 reward[y][x] = discount_factor * take_value(next_reward, x + dx, y + dy) if strength_before[y][ x] >= take_value( strength_before, x + dx, y + dy) else 0 @@ -80,12 +98,12 @@ def discountedRewards(game_states, moves): raw_rewards = rawRewards(game_states) # strength_rewards = strengthRewards(game_states) discounted_rewards = np.zeros_like(raw_rewards, dtype=np.float64) - running_reward = np.zeros_like(raw_rewards[0]) + running_reward = np.zeros_like(raw_rewards[0], dtype=np.float64) for t in reversed(range(0, len(raw_rewards))): running_reward = discountedReward(running_reward, moves[t], game_states[t][1], - discount_factor=0.2) + discountedReward( + discount_factor=0.6) + discountedReward( raw_rewards[t], moves[t], game_states[t][1]) - discounted_rewards[t] = running_reward # + 0.2*(moves[t]==STILL)*(game_states[t][2]) + discounted_rewards[t] = running_reward ##TODO : HERE FOR STRENGTH ! 
INDEPENDENT return discounted_rewards diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js index 26382a5..81c8068 100755 --- a/visualize/static/visualizer.js +++ b/visualize/static/visualizer.js @@ -248,7 +248,12 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal textPossess[a][b].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); textPossess[a][b].style.fill = "#ffffff"; - textReward[a][b] = new PIXI.Text(site.owner.toString(),sty) + var style_1 = new PIXI.TextStyle({ + fontFamily: 'Roboto', + fontSize: 20 + }); + + textReward[a][b] = new PIXI.Text(site.owner.toString(),style_1) textReward[a][b].anchor = new PIXI.Point(0.5, 0.5); textReward[a][b].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); textReward[a][b].style.fill = "#ffffff"; @@ -257,6 +262,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal fontFamily: 'Roboto', fontSize: 10 }); + for(var j = 0; j < 5; j++){ textPolicy[a][b][j] = new PIXI.Text(site.owner.toString(),style_2) textPolicy[a][b][j].position = new PIXI.Point(rw * (sX+0.5) , rh * (sY+0.5)); @@ -537,7 +543,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal textPossess[a][b].text = site.owner.toString() textProd[a][b].style.fill = (site.owner.toString()=="1")?"#04e6f2":"#ffffff"; - textReward[a][b].text =(pressed[65] && discountedRewards!= undefined && frame!=lastFrame && site.owner.toString()=="1")?discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]:''; + textReward[a][b].text =(pressed[65] && discountedRewards!= undefined && frame!=lastFrame && site.owner.toString()=="1")?discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width].toPrecision(2):''; //policies[a][b].text = policies[frame][a][b] In fact there are five... @@ -633,7 +639,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal textPolicy[y][x][i].text = (value == 0) ? '' : value.toString() } if(pressed[85]){//u pressed - textReward[y][x].text =(discountedRewards!= undefined && frame!=lastFrame)?discountedRewards[frame][y][x]:''; + textReward[y][x].text =(discountedRewards!= undefined && frame!=lastFrame)?discountedRewards[frame][y][x].toPrecision(2):''; } From a460c4342005fcab1e5b7fb6b1ecf0a2d9530a82 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Thu, 5 Oct 2017 14:18:52 +0200 Subject: [PATCH 33/45] Documentation Why this change was necessary: * Simplify original README.md, removing the `blog article` * Remove undesirable advertisements * Ugly templates --- README.md | 93 +------------------ docs/.config.yml | 1 - docs/README.md | 44 +-------- docs/_config.yml | 14 +++ docs/_documentation/first_steps.md | 40 ++++++++ .../2017-09-26-simple-approach.markdown | 93 +++++++++++++++++++ docs/index.html | 20 ++++ 7 files changed, 170 insertions(+), 135 deletions(-) delete mode 100644 docs/.config.yml create mode 100644 docs/_config.yml create mode 100644 docs/_documentation/first_steps.md create mode 100644 docs/_posts/2017-09-26-simple-approach.markdown create mode 100644 docs/index.html diff --git a/README.md b/README.md index b99baed..ff705a0 100644 --- a/README.md +++ b/README.md @@ -11,10 +11,6 @@ Halite is an open source artificial intelligence programming challenge, created by Two Sigma, where players build bots using the coding language of their choice to battle on a two-dimensional virtual board. The last bot standing or the bot with all the territory wins. 
Victory will require micromanaging of the movement of pieces, optimizing a bot’s combat ability, and braving a branching factor billions of times higher than that of Go. -## Documentation - -The documentation is available here. - ## Objective The objective of the project is to apply **Reinforcement Learning** strategies to teach the Bot to perform as well as possible. We teach an agent to learn the best actions to play at each turn. More precisely, given the game state, our untrained Bot **initially performs random actions, but gets rewarded for the good one**. Over time, the Bot automatically learns how to conquer efficiently the map. @@ -31,91 +27,6 @@ Indeed, unlike chess or go, in the Halite turn-based game, we can do **multiple In this repository, we will mainly explore the solutions based on **Neural Networks**, and will start by a very simple MLP. This is inspired from a tutorial on Reinforcement Learning agent. +## Documentation & Articles -## Detailing the approach step by step - -We will explain the rules of the game in this section, along with our strategy for training the agent. To start simple, we will try to conquer a 3*3 map, where we are the only player (cf below). As we can see, this trained agent is already pretty efficient at conquering the map. - -
-

-conquermap -

- -### How does it start ? - -Each player starts with a single square of the map, and can either decide: - -- To **stay** in order to increase the strength of its square (action = STILL). - -- To **move** (/conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST). - -Conquering is only possible once the square's strength is high enough, such that a wise bot would first wait for its strength to increase before attacking any adjacent square, since **squares don't produce when they attack**. - -> To conquer a square, we must move in its direction having a strictly superior strength (action = NORTH, SOUTH, EAST, WEST) - -
- -The white numbers on the map below represent the current strength of the squares. On the left is just a snap of the initial state of the game. On the right you can see the strength of the blue square increment over time. This is because our agent decides to stay (action = STILL). - -

-the strength map - -

- -The increase in production is computed according to a fixed production map. In our example, we can see the blue square's strength increases by 4 at each turn. Each square has a different production speed, as represented by the white numbers below the squares. (cf below). On the left is also a snap of the initial game, whereas the game's dynamic is on the right. - -

-production map - -

- -This production map production is invariant over time, and is an information we should use to train our agent. Since we are interesting in maximizing our production, we should intuitively train our agent to target the squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer. - -

- -

- -### The Agent - -We will teach our agent with: - -- The successive **Game States**. -- The agent's **Moves** (initially random). -- The corresponding **Reward** for each Move (that we have to compute). - -For now, the Game State is a (3 * 3) * 3 matrix (width * height) * n_features, the features being: - -- The **Strength** of the Square -- The **Production** of the Square -- The **Owner** of the Square - -

-matrix - -

- -### The Reward - -
-As for the reward, we focus on the production. Since each square being conquered increase the total production of our land, the action leading to the conquest is rewarded according to the production rate of the conquered square. This strategy will best reward the conquest of highly productive squares. - -

- -

- -### Current results - -We train over 500 games and get significant improvements of the total reward obtained over time. - -

-screen shot 2017-09-26 at 17 34 04 -

- -On the right, you can observe the behaviour of the original, untrained bot, with random actions, whereas on the right, you can see the trained bot. - -

- - -

- -#### Isn't that amazing ? \ No newline at end of file +To get started, blog articles and documentation are available at this page. \ No newline at end of file diff --git a/docs/.config.yml b/docs/.config.yml deleted file mode 100644 index c419263..0000000 --- a/docs/.config.yml +++ /dev/null @@ -1 +0,0 @@ -theme: jekyll-theme-cayman \ No newline at end of file diff --git a/docs/README.md b/docs/README.md index 7924f40..886d01a 100644 --- a/docs/README.md +++ b/docs/README.md @@ -1,45 +1,3 @@ ---- -title: Sidebar Navigation -summary: "My man!" -sidebar: mydoc_sidebar -permalink: mydoc_sidebar_navigation.html -folder: mydoc ---- - # Documentation -Go read the documentation [here](https://edouard360.github.io/Halite-Python-RL/). - -## Run the Bot - -In your console: - -`cd networking python start_game.py` - -In another tab - -`cd public python MyBot.py` - -This will run 1 game. Options can be added to starting the game, among which: - -`python start_game.py -g 5 -x 30 -z 50` - -Will run 5 games, of at most 30 turns, which at most squares of strength 50. - -## Visualize the Bot - -In your console: - -`cd visualize export FLASK_APP=visualize.py;flask run` - -Then either: - -Look at http://127.0.0.1:5000/performance.png for performance insights. - -Or at http://127.0.0.1:5000/ for games replay. - -## Working with PyCharm - -To run the Bot in Pycharm, you should provide a **mute** argument, since `MyBot.py` needs to know it's not on the Halite server, but running locally. - -Go to edit configuration and add the script argument 2000 (It could be any other number). \ No newline at end of file +To see the docs, click [here](https://edouard360.github.io/Halite-Python-RL/). diff --git a/docs/_config.yml b/docs/_config.yml new file mode 100644 index 0000000..39a14be --- /dev/null +++ b/docs/_config.yml @@ -0,0 +1,14 @@ +# Setup +theme: jekyll-theme-cayman + +title: Halite Challenge +tagline: A data science project + +author: + name: Edouard Mehlman + url: edouard.mehlman@polytechnique.edu + +collections: + documentation: + output: true + permalink: /:collection/:name # This is just display diff --git a/docs/_documentation/first_steps.md b/docs/_documentation/first_steps.md new file mode 100644 index 0000000..f0ccd29 --- /dev/null +++ b/docs/_documentation/first_steps.md @@ -0,0 +1,40 @@ +--- +layout: default +title: "First Steps" + +--- + + +## Run the Bot + +In your console: + +`cd networking python start_game.py` + +In another tab + +`cd public python MyBot.py` + +This will run 1 game. Options can be added to starting the game, among which: + +`python start_game.py -g 5 -x 30 -z 50` + +Will run 5 games, of at most 30 turns, which at most squares of strength 50. + +## Visualize the Bot + +In your console: + +`cd visualize export FLASK_APP=visualize.py;flask run` + +Then either: + +Look at http://127.0.0.1:5000/performance.png for performance insights. + +Or at http://127.0.0.1:5000/ for games replay. + +## Working with PyCharm + +To run the Bot in Pycharm, you should provide a **mute** argument, since `MyBot.py` needs to know it's not on the Halite server, but running locally. + +Go to edit configuration and add the script argument `slave` (so that the bot knows it is in slave mode). 
\ No newline at end of file diff --git a/docs/_posts/2017-09-26-simple-approach.markdown b/docs/_posts/2017-09-26-simple-approach.markdown new file mode 100644 index 0000000..7b50fd1 --- /dev/null +++ b/docs/_posts/2017-09-26-simple-approach.markdown @@ -0,0 +1,93 @@ +--- +layout: default +title: "A simple approach" +date: 2016-02-12 17:50:00 +categories: main +--- + +## Detailing the approach step by step + +We will explain the rules of the game in this section, along with our strategy for training the agent. To start simple, we will try to conquer a 3*3 map, where we are the only player (cf below). As we can see, this trained agent is already pretty efficient at conquering the map. + +
+

+conquermap +

+ + +### How does it start ? + +Each player starts with a single square of the map, and can either decide: + +- To **stay** in order to increase the strength of its square (action = STILL). + +- To **move** (/conquer) a neighboring square (action = NORTH, SOUTH, EAST, WEST). + +Conquering is only possible once the square's strength is high enough, such that a wise bot would first wait for its strength to increase before attacking any adjacent square, since **squares don't produce when they attack**. + +> To conquer a square, we must move in its direction having a strictly superior strength (action = NORTH, SOUTH, EAST, WEST) + +
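These two mechanics are easy to express in code. Below is a minimal, illustrative sketch of the rules just described; the helper names and the 255 strength cap are assumptions made for the example, not the engine's actual code.

```python
MAX_STRENGTH = 255  # assumed cap on a square's strength

def next_strength(strength, production, stays_still):
    """A square that stays still grows by its production; a moving square does not produce."""
    return min(strength + production, MAX_STRENGTH) if stays_still else strength

def can_conquer(attacker_strength, defender_strength):
    """Conquering a neighbouring square requires strictly superior strength."""
    return attacker_strength > defender_strength
```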
+ +The white numbers on the map below represent the current strength of the squares. On the left is a snapshot of the initial state of the game. On the right you can see the strength of the blue square increase over time, because our agent decides to stay (action = STILL). + +

+the strength map + +

+ +The increase in strength is computed according to a fixed production map. In our example, we can see the blue square's strength increases by 4 at each turn. Each square has a different production rate, represented by the white numbers below the squares (cf. below). On the left is again a snapshot of the initial game, whereas the game's dynamics are shown on the right. + +

+production map + +

+ +This production map is invariant over time, and is information we should use to train our agent. Since we are interested in maximizing our production, we should intuitively train our agent to target the squares with a high production rate. On the other hand, we should also consider the strength map, since squares with low strength are easier to conquer. + +

+ +

+ +### The Agent + +We will train our agent using: + +- The successive **Game States**. +- The agent's **Moves** (initially random). +- The corresponding **Reward** for each Move (which we have to compute). + +For now, the Game State is a (3 * 3) * 3 matrix, i.e. (width * height) * n_features, the features being: + +- The **Strength** of the Square +- The **Production** of the Square +- The **Owner** of the Square + +
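As a rough sketch of what this representation could look like in code (the function name and arguments below are illustrative assumptions, not the project's exact API), the three features can be stacked into a single channels-first NumPy array:

```python
import numpy as np

def build_game_state(owner_grid, strength_grid, production_grid, my_id):
    """Stack owner/strength/production into one (3, height, width) array.

    A hedged sketch of the state described above: for the 3 * 3 map this
    gives a 3 x 3 x 3 tensor, one channel per feature.
    """
    owner = (np.asarray(owner_grid) == my_id).astype(np.float32)  # 1 where we own the square
    strength = np.asarray(strength_grid, dtype=np.float32)        # current strength of each square
    production = np.asarray(production_grid, dtype=np.float32)    # fixed production map
    return np.stack([owner, strength, production])                # shape: (3, height, width)
```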

+matrix + +

+ +### The Reward + +
+As for the reward, we focus on the production. Since each conquered square increases the total production of our land, the action leading to the conquest is rewarded according to the production rate of the conquered square. This strategy will best reward the conquest of highly productive squares. + +
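A minimal sketch of this reward follows; it assumes the channels-first game states sketched above (channel 0 is the owner mask, channel 2 the production map) and mirrors the idea behind the repository's raw reward functions in a simplified form.

```python
import numpy as np

def raw_rewards(game_states):
    """Production gained between consecutive frames.

    For each transition, the reward is the production we own after the moves
    minus the production we owned before them, so the move that conquers a
    highly productive square yields a large reward.
    """
    return np.array([
        game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2]
        for i in range(len(game_states) - 1)
    ])
```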

+ +

+ +### Current results + +We train over 500 games and see significant improvements in the total reward obtained over time. + +

+screen shot 2017-09-26 at 17 34 04 +

+ +On the left, you can observe the behaviour of the original, untrained bot, with random actions, whereas on the right, you can see the trained bot. + +

+ + +

diff --git a/docs/index.html b/docs/index.html new file mode 100644 index 0000000..e7a48ab --- /dev/null +++ b/docs/index.html @@ -0,0 +1,20 @@ +--- +layout: default +title: {{ site.name }} +--- + +
+

Documentation

+ +

Blog Posts

+
    + {% for post in site.posts %} +
  • {{ post.title }} ({{ post.date | date_to_string }})
  • + {% endfor %} +
+ +
\ No newline at end of file From 8f78b885b0625300cac64a35c48f7469602b72f0 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Thu, 5 Oct 2017 19:43:15 +0200 Subject: [PATCH 34/45] Post article added + start_game.py improved Why this change was necessary: * To clarify the reward process * start_game.py -h now lists the available commands and gives detailled explanation for all of them --- docs/_documentation/first_steps.md | 8 +- docs/_includes/center.css | 17 ++ .../2017-09-26-simple-approach.markdown | 3 +- .../2017-10-05-reward-importance.markdown | 160 ++++++++++++++++++ networking/start_game.py | 20 +-- train/main.py | 4 +- visualize/static/visualizer.js | 6 +- 7 files changed, 200 insertions(+), 18 deletions(-) create mode 100644 docs/_includes/center.css create mode 100644 docs/_posts/2017-10-05-reward-importance.markdown diff --git a/docs/_documentation/first_steps.md b/docs/_documentation/first_steps.md index f0ccd29..7bbfe13 100644 --- a/docs/_documentation/first_steps.md +++ b/docs/_documentation/first_steps.md @@ -9,11 +9,11 @@ title: "First Steps" In your console: -`cd networking python start_game.py` +`cd networking; python start_game.py` In another tab -`cd public python MyBot.py` +`cd public; python MyBot.py` This will run 1 game. Options can be added to starting the game, among which: @@ -21,6 +21,10 @@ This will run 1 game. Options can be added to starting the game, among which: Will run 5 games, of at most 30 turns, which at most squares of strength 50. +All the options available for start_game might by listed (_with a clear description_) using the -h flag: + +`python start_game.py -h` + ## Visualize the Bot In your console: diff --git a/docs/_includes/center.css b/docs/_includes/center.css new file mode 100644 index 0000000..671898c --- /dev/null +++ b/docs/_includes/center.css @@ -0,0 +1,17 @@ +.list-unstyled { + padding-left: 0; + list-style: none; + } +.list-inline { + padding-left: 0; + margin-left: -5px; + list-style: none; +} +.list-inline > li { + display: inline-block; + padding-right: 5px; + padding-left: 5px; +} +.text-center { + text-align: center; +} \ No newline at end of file diff --git a/docs/_posts/2017-09-26-simple-approach.markdown b/docs/_posts/2017-09-26-simple-approach.markdown index 7b50fd1..9f0a803 100644 --- a/docs/_posts/2017-09-26-simple-approach.markdown +++ b/docs/_posts/2017-09-26-simple-approach.markdown @@ -11,7 +11,7 @@ We will explain the rules of the game in this section, along with our strategy f

-conquermap +conquermap

@@ -91,3 +91,4 @@ On the right, you can observe the behaviour of the original, untrained bot, with

+ diff --git a/docs/_posts/2017-10-05-reward-importance.markdown b/docs/_posts/2017-10-05-reward-importance.markdown new file mode 100644 index 0000000..1c446e1 --- /dev/null +++ b/docs/_posts/2017-10-05-reward-importance.markdown @@ -0,0 +1,160 @@ +--- +layout: default +title: "The reward importance" +date: 2016-02-12 16:30:00 +categories: main +--- + + + +# The reward importance + +## The impact of the reward choice + +We will see how important it is to set a proper reward, playing with two hyperparameters, ie: + +* The **discount factor** - for the discounted rewards +* The **reward expression**, as a function of the **production** - the hyperparameter being the function itself + +The results of a well-chosen reward can lead to significant improvements ! Our bot, only trained for conquering the map as fast as possible, now systematically wins against the trained bot (that applies heuristics). This is best exemplified by the 3 games below. + +

+ + + +

+ +## Devising the reward + +The reward is complex to devise since we take **multiple actions** at each turn, and we have to compute the reward for **each of these individual actions**. + +Below is an insightful illustration of the process. The number written on each square is **the reward associated with the current action of that square**. Notice that, at each turn, **these rewards differ from square to square**, and that, when a square is about to conquer an adjacent square, **the reward for its action is high**. + +It is even higher when the conquered square is more productive. + +> HINT: highly productive squares have a brighter background, and poorly productive ones have a darker one. + +Observe how the rewards evolve over time: a **discount factor** is already applied, because we encourage (/reward) actions **that will eventually lead to a reward over time**. Indeed, the `STILL` squares are earning rewards! + +
+ +
    +
  • + reward1 +
    Discount = 0.6
    +
  • +
+ +## Understanding the discount factor + +To better understand the discount factor, let's push it to its **limits**, and look at the corresponding rewards for the exact same game. + +* On the left, notice that when the discount factor is set to 0, only the moves that conquer a square are rewarded. This means that the `STILL` action for a square never gets rewarded - which is undesirable. +* On the other end, with a discount rate of 0.9, the rewards tend to be **overall much higher**. Yet this excessively uniform pattern **doesn't do much to favor the actions that are actually good**. Too many actions are rewarded, even though they were potentially not efficient. + +As expected, these reward strategies fare badly compared to a more balanced discount factor. See the comparison below, followed by a short sketch of the discounting. + +
    +
  • + reward2 +
    Discount = 0.0
    +
  • +
  • + reward3 +
    Discount = 0.9
    +
  • +
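The temporal side of this discounting can be sketched in a few lines. This is only the scalar analogue of the project's per-square computation (which also propagates rewards to the square that actually moved), and the example values are purely illustrative, but it shows why a discount factor of 0 rewards nothing but the conquering turn while a factor of 0.9 rewards almost every earlier turn.

```python
import numpy as np

def discount_rewards(rewards, gamma=0.6):
    """Propagate each reward backwards in time with factor gamma."""
    discounted = np.zeros_like(rewards, dtype=np.float64)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = running * gamma + rewards[t]
        discounted[t] = running
    return discounted

# A single conquest reward of 1.0 on the last of four turns:
conquest = np.array([0.0, 0.0, 0.0, 1.0])
print(discount_rewards(conquest, gamma=0.0))  # only the conquering turn: 0, 0, 0, 1
print(discount_rewards(conquest, gamma=0.6))  # earlier STILL turns share it: 0.216, 0.36, 0.6, 1
print(discount_rewards(conquest, gamma=0.9))  # almost every turn rewarded: 0.729, 0.81, 0.9, 1
```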
+ +## Variation of the raw reward + +Each reward is computed according to the production of the conquered square, and then "backtracked" to the actions that led to this reward. + +But should this reward be proportional to the production? Wouldn't it be better to make it **proportional to the square of the production**? Or even to a higher power? + +Indeed, we want to strongly encourage our bot to conquer highly productive squares, and a way to enforce this learning efficiently is by **giving significantly greater rewards for the highly productive squares**. + +All the examples above used a reward proportional to the power 4 of the production. But let's also look at a **power 2** and a **linear** reward. + +
    +
  • + reward4 +
    Power: 2 (Discount = 0.6)
    +
  • +
  • + reward5 +
    Power: 1 (Discount = 0.6)
    +
  • +
+ +### The ratio changes + +Let's **extract one frame of the above** (*see gifs below*). Let's not focus on the absolute value of the rewards, but rather on **the ratio between the rewards of different actions**. + +The two actions that we compare here are: + +* The square on the top left that conquers its left neighbour (1) +* The square on the bottom right that conquers the neighbour above it (2) + +We would want action (1) to be better rewarded than action (2). Indeed, look at the background color of the conquered square: the square conquered in (1) is **brighter** than the square conquered in (2), and therefore **more productive**. + +In all cases, `reward(1) > reward(2)`. But if we look at the ratio (*see gifs below*), we have, from left to right: + +* 0.65/0.24 ≈ 2.7 +* 0.93/0.49 ≈ 1.9 +* 1.1/0.7 ≈ 1.6 + +This illustrates that the **higher the exponent** used for the reward, **the greater the difference between the rewards** of good and very good actions (a small numerical sketch of this trend follows the gifs below). + +
    +
  • + reward1-bis +
    Power: 4 (D = 0.6)
    +
  • +
  • + reward4-bis +
    Power: 2 (D = 0.6)
    +
  • +
  • + reward5-bis +
    Power: 1 (D = 0.6)
    +
  • +
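The trend in these ratios can be reproduced with a couple of lines. The production values below are made up for illustration, and the ratios observed in the gifs also include the discounting, so the exact numbers differ, but the effect of the exponent is the same.

```python
# Hypothetical productions of the two conquered squares (illustrative values only).
prod_good, prod_ok = 3, 2

for power in (1, 2, 4):
    ratio = (prod_good ** power) / (prod_ok ** power)
    print(f"power {power}: reward ratio = {ratio:.2f}")
# power 1: reward ratio = 1.50
# power 2: reward ratio = 2.25
# power 4: reward ratio = 5.06
```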
+ +## The performance + +Depending on the choice of reward, the training can be much slower, or even converge to a worse equilibrium. We should keep this in mind as we explore new strategies in the future. + +
+ +

+performance +

+ +
+ +## Scaling up + +What about the results on a larger map? + +Our trained Bot **still wins** all the games against the OpponentBot when we increase the map size. + +
    +
  • + +
  • +
  • + +
  • +
  • + +
  • +
+ +However, we notice that: + +* This solution is too long to compute for each square individually + * Maybe we should only apply it for **the squares on the border** (and find another strategy for the squares in the center) + * We could gain time if we made **only one call to the tensorflow session**. Besides, the extraction of the local game states would probably be faster on the tensorflow side. +* Squares in the middle have a **suboptimal behaviour** - seems like they tend to move to the left systematically. diff --git a/networking/start_game.py b/networking/start_game.py index 5fcd8d8..7ff8b48 100644 --- a/networking/start_game.py +++ b/networking/start_game.py @@ -40,17 +40,17 @@ def start_game(port=2000, width=10,height=10, max_strength=25, max_turn=25, max_ if __name__ == '__main__': parser = argparse.ArgumentParser() - parser.add_argument("-p", "--port", type=int, help="the port for the simulation", default=2000) - parser.add_argument("-t", "--timeout", help="timeout", action="store_true", default=False) + parser.add_argument("-p", "--port", type=int, help="the port for the simulation - Useless if there are no pipe_players", default=2000) + parser.add_argument("-t", "--timeout", help="Doens't timeout if you set this flag is set", action="store_true", default=False) parser.add_argument("-j", "--silent", help="Doesn't print *.hlt file", action="store_true", default=False) - parser.add_argument("-q", "--quiet", help="quiet", action="store_true", default=False) - parser.add_argument("-s", "--strength", help="max strength", type=int, default=25) - parser.add_argument("-dw", "--width", help="max width", type=int, default=10) - parser.add_argument("-dh", "--height", help="max height", type=int, default=10) - parser.add_argument("-m", "--maxturn", help="max turn", type=int, default=25) - parser.add_argument("-g", "--maxgame", help="max game", type=int, default=1) # -1 for infinite game - parser.add_argument("-pp", "--n_pipe_players", type=int, default=0) - parser.add_argument("-sp", "--slave_players", nargs='+', default=[]) + parser.add_argument("-q", "--quiet", help="Doesn't output information to the console", action="store_true", default=False) + parser.add_argument("-s", "--strength", help="The max strength of the squares, if needed", type=int, default=25) + parser.add_argument("-dw", "--width", help="The width of the game", type=int, default=10) + parser.add_argument("-dh", "--height", help="The height of the game", type=int, default=10) + parser.add_argument("-m", "--maxturn", help="The total number of turns per game (maximum)", type=int, default=25) + parser.add_argument("-g", "--maxgame", help="The total number of games to play", type=int, default=1) # -1 for infinite game + parser.add_argument("-pp", "--n_pipe_players",help="The number of pipe players. You need to handle these players yourself. Each of them has a port assigned.", type=int, default=0) + parser.add_argument("-sp", "--slave_players", help="The slave players. Handled by the halite.exe. 
You should write one of these two strings: 'MyBot.py' or 'OpponentBot.py' (multiple time if desired) ",nargs='+', default=[]) args = parser.parse_args() start_game(port=args.port, width=args.width,height=args.height, max_strength=args.strength, max_turn=args.maxturn, silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet, diff --git a/train/main.py b/train/main.py index cf21c6d..1739bb3 100644 --- a/train/main.py +++ b/train/main.py @@ -25,8 +25,8 @@ master_experience = ExperienceVanilla() master_agent = VanillaAgent(master_experience, lr, s_size, a_size, h_size) - num_workers = 1 # multiprocessing.cpu_count()# (2) Maybe set max number of workers / number of available CPU threads - n_simultations = 15 + num_workers = 5 # multiprocessing.cpu_count()# (2) Maybe set max number of workers / number of available CPU threads + n_simultations = 500 workers = [] if num_workers > 1: diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js index 81c8068..29c530a 100755 --- a/visualize/static/visualizer.js +++ b/visualize/static/visualizer.js @@ -207,13 +207,13 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal textPossess = new Array(game.height); textReward = new Array(game.height); textPolicy = new Array(game.height); - for (var i = 0; i < 10; i++) { + for (var i = 0; i < game.height; i++) { textProd[i] = new Array(game.width); textStr[i] = new Array(game.width); textPossess[i] = new Array(game.width); textReward[i] = new Array(game.width); textPolicy[i] = new Array(game.width); - for(var j = 0; j < 10; j++){ + for(var j = 0; j < game.width; j++){ textPolicy[i][j] = new Array(5); } } @@ -296,7 +296,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal //stage.addChild(strengthContainer); //stage.addChild(possessContainer); stage.addChild(rewardContainer); - stage.addChild(policyContainer); + //stage.addChild(policyContainer); console.log(renderer.width, renderer.height); } window.onresize(); From 125329b17c9168c5a4fcaa511a5f86af1599f0ab Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Thu, 5 Oct 2017 14:35:08 -0700 Subject: [PATCH 35/45] PyLint compliance, resolves #21 Why this change was necessary: * Make sure we adopt a standard style This change addresses the need by: * Adding tests .py files to check in .travis_yaml. 
Potential side-effects: * --- pylint_checks.txt | 3 +++ tests/reward_test.py | 14 +++++++++++--- tests/util.py | 16 ++++++++++++---- 3 files changed, 26 insertions(+), 7 deletions(-) diff --git a/pylint_checks.txt b/pylint_checks.txt index eecdc7e..b91526e 100644 --- a/pylint_checks.txt +++ b/pylint_checks.txt @@ -1,2 +1,5 @@ train/__init__.py train/experience.py +tests/__init__.py +tests/reward_test.py +tests/util.py \ No newline at end of file diff --git a/tests/reward_test.py b/tests/reward_test.py index 6398ff0..de5f672 100644 --- a/tests/reward_test.py +++ b/tests/reward_test.py @@ -1,23 +1,31 @@ """ Tests the reward function """ +import unittest from train.reward import discount_rewards, rawRewards, allRewards from train.experience import ExperienceVanilla from train.worker import Worker -import unittest import numpy as np from tests.util import game_states_from_url class TestReward(unittest.TestCase): + """ + Test reward, exprience and worker + """ def test_length_discount_rewards(self): self.assertTrue(len(discount_rewards(np.array([1]))) == 1) self.assertTrue(len(discount_rewards(np.array([1, 3]))) == 2) def test_reward(self): - GAME_URL = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' - game_states, moves = game_states_from_url(GAME_URL) + """ + Tests rawRewards + Returns: test case + + """ + game_url = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' + game_states, moves = game_states_from_url(game_url) raw_rewards = rawRewards(game_states) self.assertTrue(len(raw_rewards) == len(game_states) - 1) diff --git a/tests/util.py b/tests/util.py index 6681bb4..1ee55cd 100644 --- a/tests/util.py +++ b/tests/util.py @@ -1,15 +1,23 @@ +""" +Utility functions for test cases +""" import json import urllib.request + import numpy as np -def game_states_from_url(GAME_URL): +def game_states_from_url(game_url): """ We host known games on aws server and we run the tests according to these games, from which we know the output - :param GAME_URL: The url of the game on the server (string). - :return: + + Args: + game_url (basestring): The url of the game on the server (string). 
+ + Returns: game_states, moves + """ - game = json.loads(urllib.request.urlopen(GAME_URL).readline().decode("utf-8")) + game = json.loads(urllib.request.urlopen(game_url).readline().decode("utf-8")) owner_frames = np.array(game["frames"])[:, :, :, 0][:, np.newaxis, :, :] strength_frames = np.array(game["frames"])[:, :, :, 1][:, np.newaxis, :, :] From 80f537e9b01d1e6abbe1bfd396a7cc395b645155 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Fri, 6 Oct 2017 11:10:09 +0200 Subject: [PATCH 36/45] Correction of the pylint tests Why this change was necessary: * Norms are important * Camel case instead of dash * for loops instead of enumerate * change some file names * import grouped and ordered correctly * warning about spacing and hyphens --- .pylintrc | 2 +- .travis.yml | 4 +- networking/hlt_networking.py | 28 ++-- networking/pipe_socket_translator.py | 57 ++++---- networking/start_game.py | 53 +++++--- public/MyBot.py | 24 ++-- public/OpponentBot.py | 24 ++-- public/context.py | 5 - public/hlt.py | 17 ++- public/models/agent/Agent.py | 52 ++++++++ .../{vanillaAgent.py => VanillaAgent.py} | 22 +-- public/models/agent/__init__.py | 1 - public/models/agent/agent.py | 42 ------ public/models/bot/Bot.py | 7 + .../bot/{improvedBot.py => ImprovedBot.py} | 10 +- public/models/bot/RandomBot.py | 12 ++ public/models/bot/TrainedBot.py | 47 +++++++ public/models/bot/__init__.py | 1 - public/models/bot/bot.py | 6 - public/models/bot/randomBot.py | 13 -- public/models/bot/trainedBot.py | 42 ------ pylint_checks.txt | 2 - requirements.txt | 2 +- tests/reward_test.py | 27 +++- tests/util.py | 7 +- train/experience.py | 6 +- train/main.py | 34 ++--- train/reward.py | 125 ++++++++++-------- train/worker.py | 46 +++++-- visualize/static/visualizer.js | 2 +- visualize/visualize.py | 43 +++--- 31 files changed, 429 insertions(+), 334 deletions(-) delete mode 100644 public/context.py create mode 100644 public/models/agent/Agent.py rename public/models/agent/{vanillaAgent.py => VanillaAgent.py} (80%) delete mode 100644 public/models/agent/agent.py create mode 100644 public/models/bot/Bot.py rename public/models/bot/{improvedBot.py => ImprovedBot.py} (64%) create mode 100644 public/models/bot/RandomBot.py create mode 100644 public/models/bot/TrainedBot.py delete mode 100644 public/models/bot/__init__.py delete mode 100644 public/models/bot/bot.py delete mode 100644 public/models/bot/randomBot.py delete mode 100644 public/models/bot/trainedBot.py delete mode 100644 pylint_checks.txt diff --git a/.pylintrc b/.pylintrc index 03b76bb..abe13e9 100644 --- a/.pylintrc +++ b/.pylintrc @@ -38,7 +38,7 @@ enable=indexing-exception,old-raise-syntax # --enable=similarities". 
If you want to run only the classes checker, but have # no Warning level messages displayed, use"--disable=all --enable=classes # --disable=W" -disable=design,similarities,no-self-use,attribute-defined-outside-init,locally-disabled,star-args,pointless-except,bad-option-value,global-statement,fixme,suppressed-message,useless-suppression,locally-enabled,no-member,no-name-in-module,import-error,unsubscriptable-object,unbalanced-tuple-unpacking,undefined-variable,not-context-manager +disable=invalid-unary-operand-type,design,similarities,no-self-use,attribute-defined-outside-init,locally-disabled,star-args,pointless-except,bad-option-value,global-statement,fixme,suppressed-message,useless-suppression,locally-enabled,no-member,no-name-in-module,import-error,unsubscriptable-object,unbalanced-tuple-unpacking,undefined-variable,not-context-manager # Set the cache size for astng objects. diff --git a/.travis.yml b/.travis.yml index 0beac11..590f538 100644 --- a/.travis.yml +++ b/.travis.yml @@ -23,9 +23,7 @@ install: script: # Tests - python -m unittest discover -v - # Style checks - # Temporary workaround - - for i in `cat pylint_checks.txt` ; do pylint $i ;done + - find . -iname "*.py" | xargs pylint # Coverage checks - py.test --cov=train tests/ diff --git a/networking/hlt_networking.py b/networking/hlt_networking.py index e9a404c..3cecc94 100644 --- a/networking/hlt_networking.py +++ b/networking/hlt_networking.py @@ -1,38 +1,40 @@ +"""The HLT class to handle the connection""" import socket from public.hlt import GameMap, translate_cardinal class HLT: + """The HLT class to handle the connection""" def __init__(self, port): - _connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM) - _connection.connect(('localhost', port)) + connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM) + connection.connect(('localhost', port)) print('Connected to intermediary on port #' + str(port)) - self._connection = _connection + self.connection = connection def get_string(self): - newString = "" + new_string = "" buffer = '\0' while True: - buffer = self._connection.recv(1).decode('ascii') + buffer = self.connection.recv(1).decode('ascii') if buffer != '\n': - newString += str(buffer) + new_string += str(buffer) else: - return newString + return new_string - def sendString(self, s): + def send_string(self, s): s += '\n' - self._connection.sendall(bytes(s, 'ascii')) + self.connection.sendall(bytes(s, 'ascii')) def get_init(self): - myID = int(self.get_string()) + my_id = int(self.get_string()) game_map = GameMap(self.get_string(), self.get_string(), self.get_string()) - return myID, game_map + return my_id, game_map def send_init(self, name): - self.sendString(name) + self.send_string(name) def send_frame(self, moves): - self.sendString(' '.join( + self.send_string(' '.join( str(move.square.x) + ' ' + str(move.square.y) + ' ' + str(translate_cardinal(move.direction)) for move in moves)) diff --git a/networking/pipe_socket_translator.py b/networking/pipe_socket_translator.py index df3a7a9..2f11d16 100644 --- a/networking/pipe_socket_translator.py +++ b/networking/pipe_socket_translator.py @@ -1,11 +1,11 @@ +""" +To be launched by the Halite program as an intermediary, +in order to enable a pipe player to join. 
+""" import socket import sys -# logging.basicConfig(filename='example.log', level=logging.DEBUG) - try: - # Connect - # logging.warning("connecting") socket_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM) socket_.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) socket_.bind(('localhost', int(sys.argv[1]))) # This is where the port is selected @@ -13,50 +13,47 @@ connection, _ = socket_.accept() - # IO Functions - def sendStringPipe(toBeSent): - sys.stdout.write(toBeSent + '\n') + def send_string_pipe(to_be_sent): + sys.stdout.write(to_be_sent + '\n') sys.stdout.flush() - def getStringPipe(): - str = sys.stdin.readline().rstrip('\n') - return (str) + def get_string_pipe(): + str_pipe = sys.stdin.readline().rstrip('\n') + return str_pipe - def sendStringSocket(toBeSent): - global connection - toBeSent += '\n' - connection.sendall(bytes(toBeSent, 'ascii')) + def send_string_socket(to_be_sent): + to_be_sent += '\n' + connection.sendall(bytes(to_be_sent, 'ascii')) - def getStringSocket(): - global connection - newString = "" + def get_string_socket(): + new_string = "" buffer = '\0' while True: buffer = connection.recv(1).decode('ascii') if buffer != '\n': - newString += str(buffer) + new_string += str(buffer) else: - return newString + return new_string while True: # Handle Init IO - sendStringSocket(getStringPipe()) # Player ID - sendStringSocket(getStringPipe()) # Map Dimensions - sendStringSocket(getStringPipe()) # Productions - sendStringSocket(getStringPipe()) # Starting Map - sendStringPipe(getStringSocket()) # Player Name / Ready Response + send_string_socket(get_string_pipe()) # Player ID + send_string_socket(get_string_pipe()) # Map Dimensions + send_string_socket(get_string_pipe()) # Productions + send_string_socket(get_string_pipe()) # Starting Map + send_string_pipe(get_string_socket()) # Player Name / Ready Response # Run Frame Loop - while (getStringPipe() == 'Get map and play!'): # while True: - sendStringSocket('Get map and play!') - sendStringSocket(getStringPipe()) # Frame Map - sendStringPipe(getStringSocket()) # Move List - sendStringSocket('Stop playing!') + while get_string_pipe() == 'Get map and play!': # while True: + send_string_socket('Get map and play!') + send_string_socket(get_string_pipe()) # Frame Map + send_string_pipe(get_string_socket()) # Move List + send_string_socket('Stop playing!') -except Exception as e: +except ConnectionError as e: # logging.warning(traceback.format_exc()) pass diff --git a/networking/start_game.py b/networking/start_game.py index 7ff8b48..cee7983 100644 --- a/networking/start_game.py +++ b/networking/start_game.py @@ -1,10 +1,16 @@ +"""The start_game function to launch the halite.exe""" import subprocess import argparse import os -def start_game(port=2000, width=10,height=10, max_strength=25, max_turn=25, max_game=1, silent_bool=True, timeout=True, quiet=True, - n_pipe_players=1, slave_players=[]): +def start_game(port=2000, width=10, height=10, max_strength=25, max_turn=25, max_game=1, + silent_bool=True, timeout=True, quiet=True, + n_pipe_players=1, slave_players=None): + """ + The start_game function to launch the halite.exe. + Execute with the -h option for help. 
+ """ path_to_root = os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) for i in range(n_pipe_players): subprocess.call([path_to_root + "/networking/kill.sh", str(port + i)]) # Free the necessary ports @@ -24,7 +30,7 @@ def start_game(port=2000, width=10,height=10, max_strength=25, max_turn=25, max_ ] slave_players = [ "python3 " + path_to_root + "/public/" + slave_player + ' slave' for slave_player in slave_players - ] # slave is the slave argument + ] if slave_players is not None else [] # slave is the slave argument players = pipe_players + slave_players # "python3 " + path_to_root + "/networking/pipe_socket_translator.py " + str(port+1) n_player = '' if len(players) > 1 else '-n 1 ' @@ -40,19 +46,36 @@ def start_game(port=2000, width=10,height=10, max_strength=25, max_turn=25, max_ if __name__ == '__main__': parser = argparse.ArgumentParser() - parser.add_argument("-p", "--port", type=int, help="the port for the simulation - Useless if there are no pipe_players", default=2000) - parser.add_argument("-t", "--timeout", help="Doens't timeout if you set this flag is set", action="store_true", default=False) - parser.add_argument("-j", "--silent", help="Doesn't print *.hlt file", action="store_true", default=False) - parser.add_argument("-q", "--quiet", help="Doesn't output information to the console", action="store_true", default=False) - parser.add_argument("-s", "--strength", help="The max strength of the squares, if needed", type=int, default=25) - parser.add_argument("-dw", "--width", help="The width of the game", type=int, default=10) - parser.add_argument("-dh", "--height", help="The height of the game", type=int, default=10) - parser.add_argument("-m", "--maxturn", help="The total number of turns per game (maximum)", type=int, default=25) - parser.add_argument("-g", "--maxgame", help="The total number of games to play", type=int, default=1) # -1 for infinite game - parser.add_argument("-pp", "--n_pipe_players",help="The number of pipe players. You need to handle these players yourself. Each of them has a port assigned.", type=int, default=0) - parser.add_argument("-sp", "--slave_players", help="The slave players. Handled by the halite.exe. You should write one of these two strings: 'MyBot.py' or 'OpponentBot.py' (multiple time if desired) ",nargs='+', default=[]) + parser.add_argument("-p", "--port", type=int, + help="the port for the simulation - Useless if there are no pipe_players", + default=2000) + parser.add_argument("-t", "--timeout", help="Doens't timeout if you set this flag is set", + action="store_true", default=False) + parser.add_argument("-j", "--silent", help="Doesn't print *.hlt file", + action="store_true", default=False) + parser.add_argument("-q", "--quiet", help="Doesn't output information to the console", + action="store_true", default=False) + parser.add_argument("-s", "--strength", help="The max strength of the squares, if needed", + type=int, default=25) + parser.add_argument("-dw", "--width", help="The width of the game", + type=int, default=10) + parser.add_argument("-dh", "--height", help="The height of the game", + type=int, default=10) + parser.add_argument("-m", "--maxturn", help="The total number of turns per game (maximum)", + type=int, default=25) + parser.add_argument("-g", "--maxgame", help="The total number of games to play", + type=int, default=1) # -1 for infinite game + parser.add_argument("-pp", "--n_pipe_players", + help="The number of pipe players. You need to handle these players yourself. 
" + "Each of them has a port assigned.", + type=int, default=0) + parser.add_argument("-sp", "--slave_players", + help="The slave players. Handled by the halite.exe. " + "You should write one of these two strings: " + "'MyBot.py' or 'OpponentBot.py' (multiple time if desired) ", + nargs='+', default=[]) args = parser.parse_args() - start_game(port=args.port, width=args.width,height=args.height, max_strength=args.strength, max_turn=args.maxturn, + start_game(port=args.port, width=args.width, height=args.height, max_strength=args.strength, max_turn=args.maxturn, silent_bool=args.silent, timeout=args.timeout, max_game=args.maxgame, quiet=args.quiet, n_pipe_players=args.n_pipe_players, slave_players=args.slave_players) diff --git a/public/MyBot.py b/public/MyBot.py index 9429dac..ecadf7c 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -1,27 +1,29 @@ -import sys +"""The MyBot.py file that executes the TrainedBot.py""" import os +import sys + sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +try: + from public.models.bot.TrainedBot import TrainedBot + from networking.hlt_networking import HLT +except: + raise mode = 'server' if (len(sys.argv) == 1) else 'local' - -if mode == 'server' or sys.argv[1]=='slave': # 'server' mode +if mode == 'server' or sys.argv[1] == 'slave': # 'server' mode import hlt else: # 'local' mode - import context - port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 - hlt = context.HLT(port=port) - -from public.models.bot.trainedBot import TrainedBot + hlt = HLT(port=port) bot = TrainedBot() while True: - myID, game_map = hlt.get_init() + my_id, game_map = hlt.get_init() hlt.send_init("MyBot") - bot.setID(myID) + bot.set_id(my_id) - while (mode == 'server' or hlt.get_string() == 'Get map and play!'): + while mode == 'server' or hlt.get_string() == 'Get map and play!': game_map.get_frame(hlt.get_string()) moves = bot.compute_moves(game_map) hlt.send_frame(moves) diff --git a/public/OpponentBot.py b/public/OpponentBot.py index 32ca428..67344c3 100644 --- a/public/OpponentBot.py +++ b/public/OpponentBot.py @@ -1,27 +1,29 @@ -import sys +"""The Opponent.py file that executes the ImprovedBot.py""" import os +import sys + sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) +try: + from public.models.bot.ImprovedBot import ImprovedBot + from networking.hlt_networking import HLT +except: + raise mode = 'server' if (len(sys.argv) == 1) else 'local' - -if mode == 'server' or sys.argv[1]=='slave': # 'server' mode +if mode == 'server' or sys.argv[1] == 'slave': # 'server' mode import hlt else: # 'local' mode - import context - port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 - hlt = context.HLT(port=port) - -from public.models.bot.improvedBot import ImprovedBot + hlt = HLT(port=port) bot = ImprovedBot() while True: - myID, game_map = hlt.get_init() + my_id, game_map = hlt.get_init() hlt.send_init("OpponentBot") - bot.setID(myID) + bot.set_id(my_id) - while (mode == 'server' or hlt.get_string() == 'Get map and play!'): + while mode == 'server' or hlt.get_string() == 'Get map and play!': game_map.get_frame(hlt.get_string()) moves = bot.compute_moves(game_map) hlt.send_frame(moves) diff --git a/public/context.py b/public/context.py deleted file mode 100644 index 6aa2548..0000000 --- a/public/context.py +++ /dev/null @@ -1,5 +0,0 @@ -import sys -import os - -sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) -from networking.hlt_networking import HLT diff --git a/public/hlt.py 
b/public/hlt.py index 26093ed..acc0a1d 100644 --- a/public/hlt.py +++ b/public/hlt.py @@ -1,3 +1,4 @@ +"""The original but corrected hlt.py file for communication with halite.""" import sys from collections import namedtuple from itertools import chain, zip_longest @@ -24,6 +25,8 @@ def opposite_cardinal(direction): class GameMap: + """The GameMap on which to play.""" + def __init__(self, size_string, production_string, map_string=None): self.width, self.height = tuple(map(int, size_string.split())) self.production = tuple( @@ -57,12 +60,14 @@ def __iter__(self): return chain.from_iterable(self.contents) def neighbors(self, square, n=1, include_self=False): - "Iterable over the n-distance neighbors of a given square. For single-step neighbors, the enumeration index provides the direction associated with the neighbor." + """Iterable over the n-distance neighbors of a given square. + For single-step neighbors, the enumeration index provides + the direction associated with the neighbor. + """ assert isinstance(include_self, bool) assert isinstance(n, int) and n > 0 if n == 1: - combos = ((0, -1), (1, 0), (0, 1), (-1, 0), (0, - 0)) # NORTH, EAST, SOUTH, WEST, STILL ... matches indices provided by enumerate(game_map.neighbors(square)) + combos = ((0, -1), (1, 0), (0, 1), (-1, 0), (0, 0)) else: combos = ((dx, dy) for dy in range(-n, n + 1) for dx in range(-n, n + 1) if abs(dx) + abs(dy) <= n) return (self.contents[(square.y + dy) % self.height][(square.x + dx) % self.width] for dx, dy in combos if @@ -96,9 +101,9 @@ def get_string(): def get_init(): - playerID = int(get_string()) + player_id = int(get_string()) m = GameMap(get_string(), get_string()) - return playerID, m + return player_id, m def send_init(name): @@ -106,7 +111,7 @@ def send_init(name): def translate_cardinal(direction): - "Translate direction constants used by this Python-based bot framework to that used by the official Halite game environment." + "Beware the direction are changed! Important for visualization" return (direction + 1) % 5 diff --git a/public/models/agent/Agent.py b/public/models/agent/Agent.py new file mode 100644 index 0000000..b41ee37 --- /dev/null +++ b/public/models/agent/Agent.py @@ -0,0 +1,52 @@ +"""The Agent general class""" +import os + +import numpy as np + +from train.reward import local_state_from_global, normalize_game_state + + +class Agent: + """The Agent general class""" + + def __init__(self, name, experience): + self.name = name + self.experience = experience + if self.experience is not None: + try: + self.experience.metric = np.load(os.path.abspath( + os.path.join(os.path.dirname(__file__), '..')) + + '/variables/' + self.name + '/' + + self.name + '.npy') + except FileNotFoundError: + print("Metric file not found") + self.experience.metric = np.array([]) + + def get_policies(self, sess, game_state): + policies = np.zeros(game_state[0].shape + (5,)) + for y in range(len(game_state[0])): + for x in range(len(game_state[0][0])): + if game_state[0][y][x] == 1: + policies[y][x] = self.get_policy(sess, + normalize_game_state(local_state_from_global(game_state, x, y))) + return policies + + def get_policy(self, sess, state): + pass + + def choose_actions(self, sess, game_state, debug=False): + # Here the state is not yet normalized ! 
+ moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 + for y in range(len(game_state[0])): + for x in range(len(game_state[0][0])): + if game_state[0][y][x] == 1: + moves[y][x] = self.choose_action(sess, + normalize_game_state(local_state_from_global(game_state, x, y)), + debug=debug) + return moves + + def choose_action(self, sess, state, frac_progress=1.0, debug=False): + pass + + def update_agent(self, sess): + pass diff --git a/public/models/agent/vanillaAgent.py b/public/models/agent/VanillaAgent.py similarity index 80% rename from public/models/agent/vanillaAgent.py rename to public/models/agent/VanillaAgent.py index 77a03c8..8779dab 100644 --- a/public/models/agent/vanillaAgent.py +++ b/public/models/agent/VanillaAgent.py @@ -1,12 +1,14 @@ +"""The Vanilla Agent""" import numpy as np import tensorflow as tf import tensorflow.contrib.slim as slim -from public.models.agent.agent import Agent +from public.models.agent.Agent import Agent class VanillaAgent(Agent): - def __init__(self, experience, lr = 1e-2, s_size = 9 * 3, a_size = 5, h_size = 50): # all these are optional ? + """The Vanilla Agent""" + def __init__(self, experience=None, lr=1e-2, s_size=9 * 3, a_size=5, h_size=50): # all these are optional ? super(VanillaAgent, self).__init__('vanilla-cin', experience) # These lines established the feed-forward part of the network. The agent takes a state and produces an action. @@ -30,22 +32,22 @@ def __init__(self, experience, lr = 1e-2, s_size = 9 * 3, a_size = 5, h_size = 5 self.tvars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=tf.get_variable_scope().name) self.gradients = tf.gradients(loss, self.tvars) - self.gradientHolders = [] - for idx, var in enumerate(self.tvars): + self.gradient_holders = [] + for idx in range(len(self.tvars)): placeholder = tf.placeholder(tf.float32, name=str(idx) + '_holder') - self.gradientHolders.append(placeholder) + self.gradient_holders.append(placeholder) global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'global') optimizer = tf.train.AdamOptimizer(learning_rate=lr) - self.updateGlobal = optimizer.apply_gradients(zip(self.gradientHolders, global_vars)) # self.tvars + self.update_global = optimizer.apply_gradients(zip(self.gradient_holders, global_vars)) # self.tvars - def get_policy(self,sess, state): + def get_policy(self, sess, state): return sess.run(self.policy, feed_dict={self.state_in: [state.reshape(-1)]}) def choose_action(self, sess, state, frac_progress=1.0, debug=False): # it only a state, not the game state... # Here the state is normalized ! 
- if (np.random.uniform() >= frac_progress): + if np.random.uniform() >= frac_progress: a = np.random.choice(range(5)) else: a_dist = sess.run(self.policy, feed_dict={self.state_in: [state.reshape(-1)]}) @@ -64,5 +66,5 @@ def update_agent(self, sess): self.action_holder: moves, self.reward_holder: rewards} grads = sess.run(self.gradients, feed_dict=feed_dict) - feed_dict = dict(zip(self.gradientHolders, grads)) - _ = sess.run(self.updateGlobal, feed_dict=feed_dict) + feed_dict = dict(zip(self.gradient_holders, grads)) + _ = sess.run(self.update_global, feed_dict=feed_dict) diff --git a/public/models/agent/__init__.py b/public/models/agent/__init__.py index 849b75f..e69de29 100644 --- a/public/models/agent/__init__.py +++ b/public/models/agent/__init__.py @@ -1 +0,0 @@ -# TODO: import via the agent package diff --git a/public/models/agent/agent.py b/public/models/agent/agent.py deleted file mode 100644 index 563b647..0000000 --- a/public/models/agent/agent.py +++ /dev/null @@ -1,42 +0,0 @@ -import numpy as np -import os -from train.reward import localStateFromGlobal, normalizeGameState - - -class Agent: - def __init__(self, name, experience): - self.name = name - self.experience = experience - if self.experience is not None: - try: - self.experience.metric = np.load(os.path.abspath(os.path.join(os.path.dirname(__file__), - '..')) + '/variables/' + self.name + '/' + self.name + '.npy') - except: - print("Metric file not found") - self.experience.metric = np.array([]) - - def get_policies(self,sess, game_state): - policies = np.zeros(game_state[0].shape + (5,)) - for y in range(len(game_state[0])): - for x in range(len(game_state[0][0])): - if (game_state[0][y][x] == 1): - policies[y][x] = self.get_policy(sess, normalizeGameState(localStateFromGlobal(game_state, x, y))) - return policies - - def get_policy(self,sess, state): - pass - - def choose_actions(self, sess, game_state, debug=False): - # Here the state is not yet normalized ! 
- moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 - for y in range(len(game_state[0])): - for x in range(len(game_state[0][0])): - if (game_state[0][y][x] == 1): - moves[y][x] = self.choose_action(sess, normalizeGameState(localStateFromGlobal(game_state, x, y)), debug=debug) - return moves - - def choose_action(self, sess, state, frac_progress=1.0, debug=False): - pass - - def update_agent(self, sess): - pass diff --git a/public/models/bot/Bot.py b/public/models/bot/Bot.py new file mode 100644 index 0000000..f553835 --- /dev/null +++ b/public/models/bot/Bot.py @@ -0,0 +1,7 @@ +"""The General Bot class""" +class Bot: + def compute_moves(self, game_map): + pass + + def set_id(self, my_id): + self.my_id = my_id diff --git a/public/models/bot/improvedBot.py b/public/models/bot/ImprovedBot.py similarity index 64% rename from public/models/bot/improvedBot.py rename to public/models/bot/ImprovedBot.py index b56a706..c655eea 100644 --- a/public/models/bot/improvedBot.py +++ b/public/models/bot/ImprovedBot.py @@ -1,16 +1,18 @@ +"""The Improved Bot""" import random from public.hlt import Move, NORTH, STILL, WEST -from public.models.bot.bot import Bot +from public.models.bot.Bot import Bot class ImprovedBot(Bot): - def compute_moves(self, game_map, sess=None): + def compute_moves(self, game_map): + """Compute the moves given a game_map""" moves = [] for square in game_map: - if square.owner == self.myID: + if square.owner == self.my_id: for direction, neighbor in enumerate(game_map.neighbors(square)): - if neighbor.owner != self.myID and neighbor.strength < square.strength: + if neighbor.owner != self.my_id and neighbor.strength < square.strength: moves += [Move(square, direction)] if square.strength < 5 * square.production: moves += [Move(square, STILL)] diff --git a/public/models/bot/RandomBot.py b/public/models/bot/RandomBot.py new file mode 100644 index 0000000..1827185 --- /dev/null +++ b/public/models/bot/RandomBot.py @@ -0,0 +1,12 @@ +"""The Random Bot""" +import random + +from public.hlt import EAST, Move, NORTH, SOUTH, STILL, WEST +from public.models.bot.Bot import Bot + + +class RandomBot(Bot): + def compute_moves(self, game_map): + """Compute the moves given a game_map""" + return [Move(square, random.choice((NORTH, EAST, SOUTH, WEST, STILL))) for square in game_map if + square.owner == self.my_id] diff --git a/public/models/bot/TrainedBot.py b/public/models/bot/TrainedBot.py new file mode 100644 index 0000000..f02205f --- /dev/null +++ b/public/models/bot/TrainedBot.py @@ -0,0 +1,47 @@ +"""The Trained Bot""" +import os + +import tensorflow as tf + +from public.models.agent.VanillaAgent import VanillaAgent +from public.models.bot.Bot import Bot +from train.reward import format_moves, get_game_state + + +class TrainedBot(Bot): + """The trained bot""" + + def __init__(self): + os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' + tf.reset_default_graph() + + with tf.device("/cpu:0"): + with tf.variable_scope('global'): + self.agent = VanillaAgent() + + global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') + saver = tf.train.Saver(global_variables) + init = tf.global_variables_initializer() + + self.sess = tf.Session() + self.sess.run(init) + try: + saver.restore(self.sess, os.path.abspath( + os.path.join(os.path.dirname(__file__), '..')) + + '/variables/' + self.agent.name + '/' + + self.agent.name) + except FileNotFoundError: + print("Model not found - initiating new one") + + def compute_moves(self, game_map): + """Compute the moves given a game_map""" + 
game_state = get_game_state(game_map, self.my_id) + return format_moves(game_map, self.agent.choose_actions(self.sess, game_state, debug=True)) + + def get_policies(self, game_state): + """Compute the policies given a game_state""" + return self.agent.get_policies(self.sess, game_state) + + def close(self): + """Close the tensorflow session""" + self.sess.close() diff --git a/public/models/bot/__init__.py b/public/models/bot/__init__.py deleted file mode 100644 index 1a1aa6a..0000000 --- a/public/models/bot/__init__.py +++ /dev/null @@ -1 +0,0 @@ -# TODO: import via the bot package diff --git a/public/models/bot/bot.py b/public/models/bot/bot.py deleted file mode 100644 index 4fa4cd5..0000000 --- a/public/models/bot/bot.py +++ /dev/null @@ -1,6 +0,0 @@ -class Bot: - def compute_moves(self, game_map): - pass - - def setID(self, myID): - self.myID = myID diff --git a/public/models/bot/randomBot.py b/public/models/bot/randomBot.py deleted file mode 100644 index be16972..0000000 --- a/public/models/bot/randomBot.py +++ /dev/null @@ -1,13 +0,0 @@ -import random - -from public.hlt import EAST, Move, NORTH, SOUTH, STILL, WEST -from public.models.bot.bot import Bot - - -class RandomBot(Bot): - def __init__(self, myID): - super(RandomBot, self).__init__(myID) - - def compute_moves(self, game_map, sess=None): - [Move(square, random.choice((NORTH, EAST, SOUTH, WEST, STILL))) for square in game_map if - square.owner == self.myID] diff --git a/public/models/bot/trainedBot.py b/public/models/bot/trainedBot.py deleted file mode 100644 index 2ea9ba5..0000000 --- a/public/models/bot/trainedBot.py +++ /dev/null @@ -1,42 +0,0 @@ -from public.models.agent.vanillaAgent import VanillaAgent -from public.models.bot.bot import Bot -from train.reward import formatMoves, getGameState -import tensorflow as tf -import os - - -class TrainedBot(Bot): - def __init__(self): - lr = 5*1e-3; - s_size = 9 * 3; - a_size = 5; - h_size = 50 - os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' - tf.reset_default_graph() - - with tf.device("/cpu:0"): - with tf.variable_scope('global'): - self.agent = VanillaAgent(None, lr, s_size, a_size, h_size) - - global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') - saver = tf.train.Saver(global_variables) - init = tf.global_variables_initializer() - - self.sess = tf.Session() - self.sess.run(init) - try: - saver.restore(self.sess, os.path.abspath(os.path.join(os.path.dirname(__file__), - '..')) + '/variables/' + self.agent.name + '/' + self.agent.name) - except Exception: - print("Model not found - initiating new one") - - def compute_moves(self, game_map): - game_state = getGameState(game_map, self.myID) - return formatMoves(game_map, self.agent.choose_actions(self.sess, game_state, debug=True)) - - def get_policies(self, game_state): - # Warning this is not hereditary - return self.agent.get_policies(self.sess, game_state) - - def close(self): - self.sess.close() diff --git a/pylint_checks.txt b/pylint_checks.txt deleted file mode 100644 index eecdc7e..0000000 --- a/pylint_checks.txt +++ /dev/null @@ -1,2 +0,0 @@ -train/__init__.py -train/experience.py diff --git a/requirements.txt b/requirements.txt index 8b7417f..0a080ab 100644 --- a/requirements.txt +++ b/requirements.txt @@ -3,5 +3,5 @@ coverage>=3.6 pytest-cov pytest-xdist coveralls -pylint +pylint>=1.6 flask \ No newline at end of file diff --git a/tests/reward_test.py b/tests/reward_test.py index 6398ff0..56e5815 100644 --- a/tests/reward_test.py +++ b/tests/reward_test.py @@ -1,28 +1,38 @@ """ Tests the 
reward function """ -from train.reward import discount_rewards, rawRewards, allRewards -from train.experience import ExperienceVanilla -from train.worker import Worker import unittest import numpy as np + from tests.util import game_states_from_url +from train.experience import ExperienceVanilla +from train.reward import discount_rewards, raw_rewards_function, all_rewards_function +from train.worker import Worker class TestReward(unittest.TestCase): + """ + Tests the reward function + """ def test_length_discount_rewards(self): + """ + Test the length of the discount reward + """ self.assertTrue(len(discount_rewards(np.array([1]))) == 1) self.assertTrue(len(discount_rewards(np.array([1, 3]))) == 2) def test_reward(self): - GAME_URL = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' - game_states, moves = game_states_from_url(GAME_URL) + """ + Test the length of the discount reward + """ + game_url = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' + game_states, moves = game_states_from_url(game_url) - raw_rewards = rawRewards(game_states) + raw_rewards = raw_rewards_function(game_states) self.assertTrue(len(raw_rewards) == len(game_states) - 1) - all_states, all_moves, all_rewards = allRewards(game_states, moves) + all_states, all_moves, all_rewards = all_rewards_function(game_states, moves) self.assertTrue(len(all_states) >= len(game_states) - 1) self.assertTrue(len(all_moves) >= len(moves)) self.assertTrue(len(all_rewards) == len(all_moves) and len(all_states) == len(all_moves)) @@ -34,6 +44,9 @@ def test_reward(self): self.assertTrue(len(batch_rewards) == len(batch_moves) and len(batch_states) == len(batch_moves)) def test_worker(self): + """ + Test if the worker port initiate and terminate with good port + """ worker = Worker(2000, 2, None) self.assertTrue(worker.port == 2002) worker.p.terminate() diff --git a/tests/util.py b/tests/util.py index 6681bb4..3e4137e 100644 --- a/tests/util.py +++ b/tests/util.py @@ -1,15 +1,16 @@ +"""Importing the game from aws""" import json import urllib.request import numpy as np -def game_states_from_url(GAME_URL): +def game_states_from_url(game_url): """ We host known games on aws server and we run the tests according to these games, from which we know the output - :param GAME_URL: The url of the game on the server (string). + :param game_url: The url of the game on the server (string). 
:return: """ - game = json.loads(urllib.request.urlopen(GAME_URL).readline().decode("utf-8")) + game = json.loads(urllib.request.urlopen(game_url).readline().decode("utf-8")) owner_frames = np.array(game["frames"])[:, :, :, 0][:, np.newaxis, :, :] strength_frames = np.array(game["frames"])[:, :, :, 1][:, np.newaxis, :, :] diff --git a/train/experience.py b/train/experience.py index 146933d..49704b5 100644 --- a/train/experience.py +++ b/train/experience.py @@ -3,7 +3,7 @@ """ import numpy as np -from train.reward import allRewards, rawRewardsMetric +from train.reward import all_rewards_function, raw_rewards_metric class Experience: @@ -24,7 +24,7 @@ def batch(self, size): pass def compute_metric(self, game_states): - production_increments = np.sum(np.sum(rawRewardsMetric(game_states), axis=2), axis=1) + production_increments = np.sum(np.sum(raw_rewards_metric(game_states), axis=2), axis=1) self.metric = np.append(self.metric, production_increments.dot(np.linspace(2.0, 1.0, num=len(game_states) - 1))) def save_metric(self, name): @@ -42,7 +42,7 @@ def __init__(self): def add_episode(self, game_states, moves): self.compute_metric(game_states) - all_states, all_moves, all_rewards = allRewards(game_states, moves) + all_states, all_moves, all_rewards = all_rewards_function(game_states, moves) self.states = np.concatenate((self.states, all_states.reshape(-1, 27)), axis=0) self.moves = np.concatenate((self.moves, all_moves)) diff --git a/train/main.py b/train/main.py index 1739bb3..e693ffb 100644 --- a/train/main.py +++ b/train/main.py @@ -1,40 +1,39 @@ -import multiprocessing +"""This main.py file runs the training.""" import threading import os import sys import tensorflow as tf +from public.models.agent.VanillaAgent import VanillaAgent +from train.experience import ExperienceVanilla +from train.worker import Worker + os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) -from public.models.agent.vanillaAgent import VanillaAgent -from train.experience import ExperienceVanilla -from train.worker import Worker port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 tf.reset_default_graph() # Clear the Tensorflow graph. 
with tf.device("/cpu:0"): - lr = 1e-3; - s_size = 9 * 3; - a_size = 5; + lr = 1e-3 + s_size = 9 * 3 + a_size = 5 h_size = 50 with tf.variable_scope('global'): master_experience = ExperienceVanilla() master_agent = VanillaAgent(master_experience, lr, s_size, a_size, h_size) - num_workers = 5 # multiprocessing.cpu_count()# (2) Maybe set max number of workers / number of available CPU threads + num_workers = 5 n_simultations = 500 workers = [] if num_workers > 1: for i in range(num_workers): with tf.variable_scope('worker_' + str(i)): - experience = ExperienceVanilla() - agent = VanillaAgent(experience, lr, s_size, a_size, h_size) - workers.append(Worker(port, i, agent)) + workers.append(Worker(port, i, VanillaAgent(ExperienceVanilla(), lr, s_size, a_size, h_size))) else: workers.append(Worker(port, 0, master_agent)) # We need only to save the global @@ -46,19 +45,20 @@ with tf.Session() as sess: sess.run(init) try: - saver.restore(sess, os.path.abspath(os.path.dirname(__file__))+'/../public/models/variables/' + master_agent.name+'/'+master_agent.name) - except Exception: + saver.restore(sess, os.path.abspath( + os.path.dirname(__file__)) + '/../public/models/variables/' + master_agent.name + '/' + master_agent.name) + except FileNotFoundError: print("Model not found - initiating new one") coord = tf.train.Coordinator() worker_threads = [] - print("I'm the main thread running on CPU #%s" % multiprocessing.current_process().name) + print("I'm the main thread running on CPU") - if (num_workers == 1): - workers[0].work(sess, coord, saver, n_simultations) + if num_workers == 1: + workers[0].work(sess, saver, n_simultations) else: for worker in workers: - worker_work = lambda: worker.work(sess, coord, saver, n_simultations) + worker_work = lambda worker=worker: worker.work(sess, saver, n_simultations) t = threading.Thread(target=(worker_work)) # Process instead of threading.Thread multiprocessing.Process t.start() worker_threads.append(t) diff --git a/train/reward.py b/train/reward.py index 36a8233..4ef8143 100644 --- a/train/reward.py +++ b/train/reward.py @@ -1,33 +1,34 @@ +"""The reward.py file to compute the reward""" import numpy as np -from public.hlt import NORTH, EAST, SOUTH, WEST, STILL, Move +from public.hlt import NORTH, EAST, SOUTH, WEST, Move STRENGTH_SCALE = 255 PRODUCTION_SCALE = 10 -def getGameState(game_map, myID): +def get_game_state(game_map, my_id): game_state = np.reshape( - [[(square.owner == myID) + 0, square.strength, square.production] for square in game_map], + [[(square.owner == my_id) + 0, square.strength, square.production] for square in game_map], [game_map.height, game_map.width, 3]) return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) -def normalizeGameState(game_state): +def normalize_game_state(game_state): return game_state / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis, np.newaxis] -def getGameProd(game_state): +def get_game_prod(game_state): return np.sum(game_state[0] * game_state[2]) -def getStrength(game_state): +def get_strength(game_state): return np.sum(game_state[0] * game_state[1]) - # np.sum([square.strength for square in game_map if square.owner == myID]) + # np.sum([square.strength for square in game_map if square.owner == my_id]) -def getNumber(game_state): +def get_number(game_state): return np.sum(game_state[0]) - # np.sum([square.strength for square in game_map if square.owner == myID]) + # np.sum([square.strength for square in game_map if square.owner == my_id]) def discount_rewards(r, gamma=0.8): @@ -39,107 +40,117 @@ def 
discount_rewards(r, gamma=0.8): discounted_r[t] = running_add return discounted_r -def take_surrounding_square(game_state, x, y, size = 1): + +def take_surrounding_square(game_state, x, y, size=1): return np.take(np.take(game_state, range(y - size, y + size + 1), axis=1, mode='wrap'), range(x - size, x + size + 1), axis=2, mode='wrap') -def take_surrounding_losange(game_state, x, y, size = 2): - np.take(np.take(game_state, y, axis=1, mode='wrap'), - range(x - 2, x + 2 + 1), axis=2, mode='wrap') - np.take(np.take(game_state, y+1, axis=1, mode='wrap'), - range(x - 1, x + 1 + 1), axis=2, mode='wrap') - np.take(np.take(game_state, y-1, axis=1, mode='wrap'), - range(x - 1, x + 1 + 1), axis=2, mode='wrap') - np.take(np.take(game_state, y+2, axis=1, mode='wrap'), - x, axis=2, mode='wrap') - np.take(np.take(game_state, y-2, axis=1, mode='wrap'), - x, axis=2, mode='wrap') - -def localStateFromGlobal(game_state, x, y, size=1): + +def local_state_from_global(game_state, x, y, size=1): # TODO: for now we still take a square, but a more complex shape could be better. return np.take(np.take(game_state, range(y - size, y + size + 1), axis=1, mode='wrap'), range(x - size, x + size + 1), axis=2, mode='wrap') -def rawRewardsMetric(game_states): +def raw_rewards_metric(game_states): return np.array([game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2] for i in range(len(game_states) - 1)]) -def rawRewards(game_states): - return np.array([0.0001*np.power(game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2],4) - for i in range(len(game_states) - 1)]) +def raw_rewards_function(game_states): + return np.array( + [0.0001 * np.power(game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2], 4) + for i in range(len(game_states) - 1)]) -def strengthRewards(game_states): - return np.array([(getStrength(game_states[i + 1]) - getStrength(game_states[i])) + +def strength_rewards(game_states): + return np.array([(get_strength(game_states[i + 1]) - get_strength(game_states[i])) for i in range(len(game_states) - 1)]) -def discountedReward(next_reward, move_before, strength_before, discount_factor=1.0): +def discounted_reward_function(next_reward, move_before, strength_before, discount_factor=1.0): + """ + Given all the below arguments, return the discounted reward. 
+ :param next_reward: + :param move_before: + :param strength_before: + :param discount_factor: + :return: + """ reward = np.zeros_like(next_reward) def take_value(matrix, x, y): return np.take(np.take(matrix, x, axis=1, mode='wrap'), y, axis=0, mode='wrap') - for y in range(len(reward)): - for x in range(len(reward[0])): - d = move_before[y][x] - if d != -1: - dy = (-1 if d == NORTH else 1) if (d == SOUTH or d == NORTH) else 0 - dx = (-1 if d == WEST else 1) if (d == WEST or d == EAST) else 0 - reward[y][x] = discount_factor * take_value(next_reward, x + dx, y + dy) if strength_before[y][ - x] >= take_value( - strength_before, x + dx, y + dy) else 0 - + for (y, x), d in np.ndenumerate(move_before): + if d != -1: + dy = (-1 if d == NORTH else 1) if (d == SOUTH or d == NORTH) else 0 + dx = (-1 if d == WEST else 1) if (d == WEST or d == EAST) else 0 + reward[y][x] = discount_factor * take_value(next_reward, x + dx, y + dy) \ + if strength_before[y][x] >= take_value(strength_before, x + dx, y + dy) \ + else 0 return reward -def discountedRewards(game_states, moves): - raw_rewards = rawRewards(game_states) - # strength_rewards = strengthRewards(game_states) +def discounted_rewards_function(game_states, moves): + """ + Compute height*width matrices of rewards - not yet individualized + :param game_states: The list of game states + :param moves: The list of moves + :return: + """ + raw_rewards = raw_rewards_function(game_states) + # strength_rewards = strength_rewards(game_states) discounted_rewards = np.zeros_like(raw_rewards, dtype=np.float64) running_reward = np.zeros_like(raw_rewards[0], dtype=np.float64) - for t in reversed(range(0, len(raw_rewards))): - running_reward = discountedReward(running_reward, moves[t], game_states[t][1], - discount_factor=0.6) + discountedReward( - raw_rewards[t], moves[t], game_states[t][1]) + for t, (raw_reward, move, game_state) in reversed(list(enumerate(zip(raw_rewards, moves, game_states)))): + running_reward = discounted_reward_function(running_reward, move, game_state[1], + discount_factor=0.6) + \ + discounted_reward_function(raw_reward, move, game_state[1]) discounted_rewards[t] = running_reward - ##TODO : HERE FOR STRENGTH ! INDEPENDENT return discounted_rewards -def individualStatesAndRewards(game_state, move, discounted_reward): +def individual_states_and_rewards(game_state, move, discounted_reward): + """ + Return the triplet states, moves, rewards for each of the square in one frame. 
+ :param game_state: One game state - still a 3*3*3 matrix + :param move: The move for the given square + :param discounted_reward: The global matrix of discounted reward at time t, + from we we extract one frame + :return: + """ states = [] moves = [] rewards = [] for y in range(len(game_state[0])): for x in range(len(game_state[0][0])): - if (game_state[0][y][x] == 1): - states += [normalizeGameState(localStateFromGlobal(game_state, x, y))] + if game_state[0][y][x] == 1: + states += [normalize_game_state(local_state_from_global(game_state, x, y))] moves += [move[y][x]] rewards += [discounted_reward[y][x]] return states, moves, rewards -def allIndividualStatesAndRewards(game_states, moves, discounted_rewards): +def all_individual_states_and_rewards(game_states, moves, discounted_rewards): all_states = [] all_moves = [] all_rewards = [] for game_state, move, discounted_reward in zip(game_states, moves, discounted_rewards): - states_, moves_, rewards_ = individualStatesAndRewards(game_state, move, discounted_reward) + states_, moves_, rewards_ = individual_states_and_rewards(game_state, move, discounted_reward) all_states += states_ all_moves += moves_ all_rewards += rewards_ return np.array(all_states), np.array(all_moves), np.array(all_rewards) -def allRewards(game_states, moves): +def all_rewards_function(game_states, moves): # game_states n+1, moves n - discounted_rewards = discountedRewards(game_states, moves) - return allIndividualStatesAndRewards(game_states[:-1], moves, discounted_rewards) + discounted_rewards = discounted_rewards_function(game_states, moves) + return all_individual_states_and_rewards(game_states[:-1], moves, discounted_rewards) -def formatMoves(game_map, moves): +def format_moves(game_map, moves): moves_to_send = [] for y in range(len(game_map.contents)): for x in range(len(game_map.contents[0])): diff --git a/train/worker.py b/train/worker.py index 9bafd89..708d8c7 100644 --- a/train/worker.py +++ b/train/worker.py @@ -1,12 +1,13 @@ +"""The worker class for training and parallel operations""" import multiprocessing import time import os import tensorflow as tf -from networking.hlt_networking import HLT -from train.reward import formatMoves, getGameState +from train.reward import format_moves, get_game_state from networking.start_game import start_game +from networking.hlt_networking import HLT def update_target_graph(from_scope, to_scope): from_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, from_scope) @@ -19,51 +20,68 @@ def update_target_graph(from_scope, to_scope): class Worker(): + """ + The Worker class for training. Each worker has an individual port, number, and agent. + Each of them work with the global session, and use the global saver. 
+ """ def __init__(self, port, number, agent): self.name = 'worker_' + str(number) self.number = number self.port = port + number def worker(): - start_game(self.port, quiet=True, max_game=-1) # Infinite games + start_game(self.port, quiet=True, max_game=-1) # Infinite games self.p = multiprocessing.Process(target=worker) self.p.start() time.sleep(1) - self.hlt = HLT(port=self.port) + self.hlt = HLT(port=self.port) # Launching the pipe operation self.agent = agent self.update_local_ops = update_target_graph('global', self.name) - def work(self, sess, coord, saver, n_simultations): + def work(self, sess, saver, n_simultations): + """ + Using the pipe operation launched at initialization, + the worker works `n_simultations` games to train the + agent + :param sess: The global session + :param saver: The saver + :param n_simultations: Number of max simulations to run. + Afterwards the process is stopped. + :return: + """ print("Starting worker " + str(self.number)) with sess.as_default(), sess.graph.as_default(): for i in range(n_simultations): # while not coord.should_stop(): - if (i % 10 == 1 and self.number == 0): + if i % 10 == 1 and self.number == 0: print("Simulation: " + str(i)) # self.port) sess.run(self.update_local_ops) # GET THE WORK DONE FROM OTHER - myID, game_map = self.hlt.get_init() + my_id, game_map = self.hlt.get_init() self.hlt.send_init("MyPythonBot") moves = [] game_states = [] - while (self.hlt.get_string() == 'Get map and play!'): + while self.hlt.get_string() == 'Get map and play!': game_map.get_frame(self.hlt.get_string()) - game_states += [getGameState(game_map, myID)] + game_states += [get_game_state(game_map, my_id)] moves += [self.agent.choose_actions(sess, game_states[-1])] - self.hlt.send_frame(formatMoves(game_map, moves[-1])) + self.hlt.send_frame(format_moves(game_map, moves[-1])) self.agent.experience.add_episode(game_states, moves) self.agent.update_agent(sess) if self.number == 0: - directory = os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))+'/public/models/variables/'+self.agent.name+'/' + directory = os.path.abspath( + os.path.join(os.path.dirname(__file__), '..')) \ + + '/public/models/variables/' \ + + self.agent.name + '/' if not os.path.exists(directory): - print("Creating directory for agent :"+self.agent.name) + print("Creating directory for agent :" + self.agent.name) os.makedirs(directory) - saver.save(sess, directory+self.agent.name) - self.agent.experience.save_metric(directory+self.agent.name) + saver.save(sess, directory + self.agent.name) + self.agent.experience.save_metric(directory + self.agent.name) self.p.terminate() diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js index 29c530a..8b50fca 100755 --- a/visualize/static/visualizer.js +++ b/visualize/static/visualizer.js @@ -553,7 +553,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal textPolicy[a][b][i].text = '' //(value==0)?'':value.toString() } - //console.log(discountedRewards[frame][Math.floor(loc / game.width)][loc % game.width]) + //console.log(discounted_rewards_function[frame][Math.floor(loc / game.width)][loc % game.width]) var pw = rw * Math.sqrt(site.strength > 0 ? site.strength / 255 : 0.1) / 2 var ph = rh * Math.sqrt(site.strength > 0 ? site.strength / 255 : 0.1) / 2; var direction = frame < game.moves.length ? 
game.moves[frame][Math.floor(loc / game.width)][loc % game.width] : 0; diff --git a/visualize/visualize.py b/visualize/visualize.py index fdd1b7f..6f2b352 100755 --- a/visualize/visualize.py +++ b/visualize/visualize.py @@ -1,19 +1,24 @@ +"""The visualize main file to launch the server""" import json import os import sys from io import BytesIO -import matplotlib.pyplot as plt import numpy as np +import pandas as pd from flask import Flask, render_template, request, make_response +import matplotlib.pyplot as plt from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas from matplotlib.figure import Figure -app = Flask(__name__) - sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) -from train.reward import discountedRewards -from public.models.bot.trainedBot import TrainedBot +try: + from train.reward import discounted_rewards_function + from public.models.bot.trainedBot import TrainedBot +except: + raise + +app = Flask(__name__) @app.route("/") @@ -26,9 +31,12 @@ def home(): @app.route("/performance.png") def performance_plot(): + """ + Plot the performance at this address + :return: + """ fig = Figure() sub1 = fig.add_subplot(111) - import pandas as pd path_to_variables = os.path.abspath(os.path.dirname(__file__)) + '/../public/models/variables/' list_variables = [name for name in os.listdir(path_to_variables) if name != "README.md"] path_to_npy = [path_to_variables + name + '/' + name + '.npy' for name in list_variables] @@ -36,8 +44,8 @@ def performance_plot(): rewards = [np.load(path) for path in path_to_npy] max_len = max([len(reward) for reward in rewards]) - for i in range(len(rewards)): - rewards[i] = np.append(rewards[i], np.repeat(np.nan, max_len - len(rewards[i]))) + for i, reward in enumerate(rewards): + rewards[i] = np.append(reward, np.repeat(np.nan, max_len - len(reward))) pd.DataFrame(np.array(rewards).T, columns=list_variables).rolling(100).mean().plot( title="Weighted reward at each game. (Rolling average)", ax=sub1) @@ -51,7 +59,12 @@ def performance_plot(): return response -def convert(request): +def convert(r): + """ + Convert the r to the game_states/moves tuple. 
+ :param r: + :return: + """ def get_owner(square): return square['owner'] @@ -60,13 +73,13 @@ def get_strength(square): get_owner = np.vectorize(get_owner) get_strength = np.vectorize(get_strength) - owner_frames = get_owner(request.json["frames"])[:, np.newaxis, :, :] - strength_frames = get_strength(request.json["frames"])[:, np.newaxis, :, :] - production_frames = np.repeat(np.array(request.json["productions"])[np.newaxis, np.newaxis, :, :], + owner_frames = get_owner(r.json["frames"])[:, np.newaxis, :, :] + strength_frames = get_strength(r.json["frames"])[:, np.newaxis, :, :] + production_frames = np.repeat(np.array(r.json["productions"])[np.newaxis, np.newaxis, :, :], len(owner_frames), axis=0) - moves = np.array(request.json['moves']) + moves = np.array(r.json['moves']) game_states = np.concatenate(([owner_frames, strength_frames, production_frames]), axis=1) @@ -77,8 +90,8 @@ def get_strength(square): @app.route('/post_discounted_rewards', methods=['POST']) def post_discounted_rewards(): game_states, moves = convert(request) - discounted_rewards = discountedRewards(game_states, moves) - return json.dumps({'discountedRewards': discounted_rewards.tolist()}) + discounted_rewards = discounted_rewards_function(game_states, moves) + return json.dumps({'discounted_rewards_function': discounted_rewards.tolist()}) @app.route('/post_policies', methods=['POST']) From 72085ec00c02694c007235bfd33da4dc2e837afb Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Sat, 7 Oct 2017 10:11:41 +0200 Subject: [PATCH 37/45] Improve visualization tool Why this change was necessary: * To increase productivity while debugging This change addresses the need by: * Changing html: we can now navigate between performance and visu * Auto-loading: at opening the window, the last game autoloads (no more need to drag and drop, though still possible) * A list of the games are available, and can be clicked to display to game a soon as desired * The policy can be seen by clicking on a square --- visualize/static/localVisualizer.js | 31 ++++++++++++++++++++++ visualize/static/visualizer.js | 4 +-- visualize/templates/performance.html | 25 ++++++++++++++++++ visualize/templates/visualizer.html | 22 +++++++++++++++- visualize/visualize.py | 39 +++++++++++++++++++++++----- 5 files changed, 111 insertions(+), 10 deletions(-) create mode 100755 visualize/templates/performance.html diff --git a/visualize/static/localVisualizer.js b/visualize/static/localVisualizer.js index e18cb86..534aec4 100755 --- a/visualize/static/localVisualizer.js +++ b/visualize/static/localVisualizer.js @@ -31,4 +31,35 @@ $(function () { var files = e.target.files handleFiles(files) }); + + $("li").on("click",function() { + $.ajax({ + type: "GET", + url: this.id, + success: function(text) { + $("#displayArea").empty(); + var fsHeight = $("#fileSelect").outerHeight(); + showGame(textToGame(text, "OK"), $("#displayArea"), null, -fsHeight, true, false, true); + // `text` is the file text + }, + error: function() { + // An error occurred + } + }); + }); + if($("li").length>=1){ + $.ajax({ + type: "GET", + url: $("li")[$("li").length-1].id, + success: function(text) { + $("#displayArea").empty(); + var fsHeight = $("#fileSelect").outerHeight(); + showGame(textToGame(text, "OK"), $("#displayArea"), null, -fsHeight, true, false, true); + // `text` is the file text + }, + error: function() { + // An error occurred + } + }); + } }) diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js index 8b50fca..7f1392e 100755 --- 
a/visualize/static/visualizer.js +++ b/visualize/static/visualizer.js @@ -84,7 +84,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal type: "POST", url: '/post_discounted_rewards', data: JSON.stringify(game), - success: function(data) {discountedRewards = JSON.parse(data)['discountedRewards']}, + success: function(data) {discountedRewards = JSON.parse(data)['discounte_rewards']}, contentType: "application/json; charset=utf-8", //dataType: "json" }) @@ -296,7 +296,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal //stage.addChild(strengthContainer); //stage.addChild(possessContainer); stage.addChild(rewardContainer); - //stage.addChild(policyContainer); + stage.addChild(policyContainer); console.log(renderer.width, renderer.height); } window.onresize(); diff --git a/visualize/templates/performance.html b/visualize/templates/performance.html new file mode 100755 index 0000000..586ef5a --- /dev/null +++ b/visualize/templates/performance.html @@ -0,0 +1,25 @@ + + + + + Visualizer + + + + + +
+ \ No newline at end of file diff --git a/visualize/templates/visualizer.html b/visualize/templates/visualizer.html index 95d9de6..402e54c 100755 --- a/visualize/templates/visualizer.html +++ b/visualize/templates/visualizer.html @@ -12,7 +12,27 @@
-    Drag your file
+    Drag your file
+    Go to performance plot
+    {%- for item in tree.children recursive %}
+      • {{ item.name }}
+    {%- endfor %}
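The recursive loop above only needs a nested dict with `name` and `children` keys, where leaf entries also carry a `path`; that is the shape produced by the `make_tree` helper added to visualize.py below. A minimal Python sketch of the same traversal, with made-up file names, just to illustrate the structure the template walks:

```
# Sketch only: mirrors the dict shape built by make_tree() in visualize.py.
# The file names below are hypothetical placeholders.
example_tree = {
    'name': 'hlt',
    'children': [
        {'path': 'hlt/example.hlt', 'name': 'example.hlt'},
        {'path': 'hlt/replay_042.hlt', 'name': 'replay_042.hlt'},
    ],
}


def render(node, depth=0):
    """Print one bullet per entry, recursing into sub-directories like the template does."""
    for item in node['children']:
        print('  ' * depth + '• ' + item['name'])
        if 'children' in item:
            render(item, depth + 1)


render(example_tree)
```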
diff --git a/visualize/visualize.py b/visualize/visualize.py index 6f2b352..2184e01 100755 --- a/visualize/visualize.py +++ b/visualize/visualize.py @@ -6,7 +6,7 @@ import numpy as np import pandas as pd -from flask import Flask, render_template, request, make_response +from flask import Flask, render_template, request, make_response, send_from_directory import matplotlib.pyplot as plt from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas from matplotlib.figure import Figure @@ -14,20 +14,45 @@ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: from train.reward import discounted_rewards_function - from public.models.bot.trainedBot import TrainedBot + from public.models.bot.TrainedBot import TrainedBot except: raise app = Flask(__name__) +hlt_root = os.path.join(app.root_path, 'hlt') +@app.route('/hlt/') +def send_hlt(path): + return send_from_directory('hlt', path) @app.route("/") def home(): - return render_template('visualizer.html') - - -print("Look at http://127.0.0.1:5000/performance.png for performance insights") + return render_template('visualizer.html',tree=make_tree(hlt_root)) +@app.route("/performance.html") +def performance(): + """ + Return the page for the performance + :return: + """ + return render_template('performance.html') + +def make_tree(path): + tree = dict(name=os.path.basename(path), children=[]) + try: + lst = os.listdir(path) + except OSError: + pass + else: + for name in lst: + fn = os.path.join(path, name) + if os.path.isdir(fn): + tree['children'].append(make_tree(fn)) + else: + if name != ".DS_Store": + tree['children'].append(dict(path='hlt/'+name,name=name)) + print(np) + return tree @app.route("/performance.png") def performance_plot(): @@ -91,7 +116,7 @@ def get_strength(square): def post_discounted_rewards(): game_states, moves = convert(request) discounted_rewards = discounted_rewards_function(game_states, moves) - return json.dumps({'discounted_rewards_function': discounted_rewards.tolist()}) + return json.dumps({'discounted_rewards': discounted_rewards.tolist()}) @app.route('/post_policies', methods=['POST']) From 5fc66a371aea61e4a3010801106f7897d33d081c Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Sat, 7 Oct 2017 10:46:01 +0200 Subject: [PATCH 38/45] Dijkstra algorithm + moving init sess and agent Why this change was necessary: * Very powerful insight into the topology of our game * Easy to compute with heuristics This change addresses the need by: * Creating a graph class and tools for building graph from state * Creating benchmark test for naive/updated graph * TrainedBot now initialized with an **agent_class** ! 
Independant of the agent * The code for TrainedBot and the tensorflow tests could be factorized and has been put as a function in agent --- public/MyBot.py | 3 +- public/models/agent/Agent.py | 27 +++++ public/models/bot/TrainedBot.py | 29 +----- public/util/dijkstra.py | 146 ++++++++++++++++++++++++++++ tests/dijkstra_speed_test.py | 47 +++++++++ tests/dijkstra_test.py | 37 +++++++ tests/tensorflow_call_speed_test.py | 50 ++++++++++ tests/util.py | 18 +++- visualize/hlt/README.md | 0 9 files changed, 329 insertions(+), 28 deletions(-) create mode 100644 public/util/dijkstra.py create mode 100644 tests/dijkstra_speed_test.py create mode 100644 tests/dijkstra_test.py create mode 100644 tests/tensorflow_call_speed_test.py create mode 100644 visualize/hlt/README.md diff --git a/public/MyBot.py b/public/MyBot.py index ecadf7c..62e35dc 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -5,6 +5,7 @@ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: from public.models.bot.TrainedBot import TrainedBot + from public.models.agent.VanillaAgent import VanillaAgent from networking.hlt_networking import HLT except: raise @@ -16,7 +17,7 @@ port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 hlt = HLT(port=port) -bot = TrainedBot() +bot = TrainedBot(VanillaAgent) while True: my_id, game_map = hlt.get_init() diff --git a/public/models/agent/Agent.py b/public/models/agent/Agent.py index b41ee37..55304e0 100644 --- a/public/models/agent/Agent.py +++ b/public/models/agent/Agent.py @@ -2,6 +2,8 @@ import os import numpy as np +import tensorflow as tf +from tensorflow.python.framework.errors_impl import InvalidArgumentError from train.reward import local_state_from_global, normalize_game_state @@ -50,3 +52,28 @@ def choose_action(self, sess, state, frac_progress=1.0, debug=False): def update_agent(self, sess): pass + + +def start_agent(agent_class): + """Start and return a tf session and its corresponding agent""" + os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' + tf.reset_default_graph() + + with tf.device("/cpu:0"): + with tf.variable_scope('global'): + agent = agent_class() + + global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') + saver = tf.train.Saver(global_variables) + init = tf.global_variables_initializer() + + sess = tf.Session() + sess.run(init) + try: + saver.restore(sess, os.path.abspath( + os.path.join(os.path.dirname(__file__), '..')) + + '/variables/' + agent.name + '/' + + agent.name) + except InvalidArgumentError: + print("Model not found - initiating new one") + return sess, agent diff --git a/public/models/bot/TrainedBot.py b/public/models/bot/TrainedBot.py index f02205f..a271663 100644 --- a/public/models/bot/TrainedBot.py +++ b/public/models/bot/TrainedBot.py @@ -1,9 +1,5 @@ """The Trained Bot""" -import os - -import tensorflow as tf - -from public.models.agent.VanillaAgent import VanillaAgent +from public.models.agent.Agent import start_agent from public.models.bot.Bot import Bot from train.reward import format_moves, get_game_state @@ -11,27 +7,8 @@ class TrainedBot(Bot): """The trained bot""" - def __init__(self): - os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' - tf.reset_default_graph() - - with tf.device("/cpu:0"): - with tf.variable_scope('global'): - self.agent = VanillaAgent() - - global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') - saver = tf.train.Saver(global_variables) - init = tf.global_variables_initializer() - - self.sess = tf.Session() - self.sess.run(init) - try: - 
saver.restore(self.sess, os.path.abspath( - os.path.join(os.path.dirname(__file__), '..')) - + '/variables/' + self.agent.name + '/' - + self.agent.name) - except FileNotFoundError: - print("Model not found - initiating new one") + def __init__(self, agent_class): + self.sess, self.agent = start_agent(agent_class) def compute_moves(self, game_map): """Compute the moves given a game_map""" diff --git a/public/util/dijkstra.py b/public/util/dijkstra.py new file mode 100644 index 0000000..50f4265 --- /dev/null +++ b/public/util/dijkstra.py @@ -0,0 +1,146 @@ +"""The dijkstra module""" +import numpy as np + + +class Prioritydictionary(dict): + """A priority dictionary""" + def __init__(self): + """Initialize Prioritydictionary by creating binary heap of + pairs (value,key). Note that changing or removing a dict entry + will not remove the old pair from the heap until it is found by + smallest() or until the heap is rebuilt.""" + self.__heap = [] + dict.__init__(self) + + def smallest(self): + """Find smallest item after removing deleted items from front of + heap.""" + if len(self) == 0: + raise IndexError("smallest of empty Prioritydictionary") + heap = self.__heap + while heap[0][1] not in self or self[heap[0][1]] != heap[0][0]: + last_item = heap.pop() + insertion_point = 0 + while 1: + small_child = 2 * insertion_point + 1 + if small_child + 1 < len(heap) and \ + heap[small_child] > heap[small_child + 1]: + small_child += 1 + if small_child >= len(heap) or last_item <= heap[small_child]: + heap[insertion_point] = last_item + break + heap[insertion_point] = heap[small_child] + insertion_point = small_child + return heap[0][1] + + def __iter__(self): + """Create destructive sorted iterator of Prioritydictionary.""" + + def iterfn(): + while len(self) > 0: + x = self.smallest() + yield x + del self[x] + + return iterfn() + + def __setitem__(self, key, val): + """Change value stored in dictionary and add corresponding pair + to heap. 
Rebuilds the heap if the number of deleted items gets + large, to avoid memory leakage.""" + dict.__setitem__(self, key, val) + heap = self.__heap + if len(heap) > 2 * len(self): + self.__heap = [(v, k) for k, v in self.items()] + self.__heap.sort() + # builtin sort probably faster than O(n)-time heapify + else: + new_pair = (val, key) + insertion_point = len(heap) + heap.append(None) + while insertion_point > 0 and \ + new_pair < heap[(insertion_point - 1) // 2]: + heap[insertion_point] = heap[(insertion_point - 1) // 2] + insertion_point = (insertion_point - 1) // 2 + heap[insertion_point] = new_pair + + def setdefault(self, key, val): + """Reimplement setdefault to pass through our customized __setitem__.""" + if key not in self: + self[key] = val + return self[key] + + +def dijkstra(g, start, end=None): + """The dijkstra algorithm""" + d = {} # dictionary of final distances + p = {} # dictionary of predecessors + q = Prioritydictionary() # estimated distances of non-final vertices + q[start] = 0 + + for v in q: + d[v] = q[v] + if v == end: + break + + for w in g[v]: + vw_length = d[v] + g[v][w] + if w in d: + if vw_length < d[w]: + raise ValueError("Dijkstra: found better path to already-final vertex") + elif w not in q or vw_length < q[w]: + q[w] = vw_length + p[w] = v + + return d, p + + +class Graph: + """The Graph object""" + def __init__(self): + self.g = {} + + def add_node(self, value): + if value not in self.g: + self.g[value] = {} + + def add_edge(self, from_node, to_node): + self.g[from_node][to_node] = 1 + self.g[to_node][from_node] = 1 + + def remove_node(self, value): + for to in self.g[value]: + self.g[to].pop(value, None) + self.g.pop(value, None) + + def update(self, new_state, previous_state): + for (i, j), k in np.ndenumerate(new_state - previous_state): + if k == 1: + self.add_node((i, j)) + for i1, j1 in [(i - 1, j), (i, j + 1), (i, j - 1), (i + 1, j)]: + if (i1, j1) in self.g: + self.add_edge((i, j), (i1, j1)) + elif k == -1: + self.remove_node((i, j)) + + +def build_graph_from_state(state): + """Build the graph from the state""" + def take(state, i, j): + return np.take(np.take(state, j, axis=1, mode='wrap'), i, axis=0, mode='wrap') + + g = Graph() + g.add_node(0) + for (i, j), k in np.ndenumerate(state): + if k == 1: + g.add_node((i, j)) + n, e, w, s = take(state, i - 1, j), take(state, i, j + 1), take(state, i, j - 1), take(state, i + 1, j) + if s: + g.add_node(((i + 1) % state.shape[0], j)) + g.add_edge((i, j), ((i + 1) % state.shape[0], j)) + if e: + g.add_node((i, (j + 1) % state.shape[1])) + g.add_edge((i, j), (i, (j + 1) % state.shape[1])) + if n * e * w * s == 0: + g.add_edge((i, j), 0) + return g diff --git a/tests/dijkstra_speed_test.py b/tests/dijkstra_speed_test.py new file mode 100644 index 0000000..eed9fde --- /dev/null +++ b/tests/dijkstra_speed_test.py @@ -0,0 +1,47 @@ +"""Testing the speed of 2 dijkstra alternative""" +import numpy as np +import pytest + +from public.util.dijkstra import build_graph_from_state, dijkstra +from tests.util import game_states_from_file + + +def dijkstra_naive(game_states): + for game_state in game_states: + g = build_graph_from_state(game_state[0]) + dist_dict, _ = dijkstra(g.g, 0) + + dist = np.zeros_like(game_state[0]) + for key, value in dist_dict.items(): + dist[key] = value + + +def dijkstra_update(game_states): + g = build_graph_from_state(game_states[0][0]) + for i in range(1, len(game_states)): + g.update(game_states[i][0], game_states[i - 1][0]) + dist_dict, _ = dijkstra(g.g, 0) + + dist = 
np.zeros_like(game_states[i][0]) + for key, value in dist_dict.items(): + dist[key] = value + + +@pytest.mark.benchmark(group="dijkstra") +def test_dijkstra_naive_speed(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(dijkstra_naive, game_states=game_states) + assert True + + +@pytest.mark.benchmark(group="dijkstra") +def test_dijkstra_update_speed(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(dijkstra_update, game_states=game_states) + assert True diff --git a/tests/dijkstra_test.py b/tests/dijkstra_test.py new file mode 100644 index 0000000..9c49016 --- /dev/null +++ b/tests/dijkstra_test.py @@ -0,0 +1,37 @@ +""" +Tests the dijkstra function +""" +import unittest + +import numpy as np +from public.util.dijkstra import build_graph_from_state, dijkstra + + +class TestDijkstra(unittest.TestCase): + """ + Tests the dijkstra algorithm + """ + + def test_dijkstra(self): + """ + Test the dijkstra algorithm + """ + state = np.array([[0, 0, 0, 0, 1, 1], + [0, 1, 1, 1, 1, 1], + [0, 1, 1, 1, 1, 1], + [0, 1, 1, 1, 1, 1], + [0, 1, 1, 1, 1, 1], + [0, 0, 0, 1, 1, 1]]) + + print(state) + g = build_graph_from_state(state) + dist_dict, _ = dijkstra(g.g, 0) + + dist = np.zeros_like(state) + for key, value in dist_dict.items(): + dist[key] = value + print(dist) + + +if __name__ == '__main__': + unittest.main() diff --git a/tests/tensorflow_call_speed_test.py b/tests/tensorflow_call_speed_test.py new file mode 100644 index 0000000..ca05b62 --- /dev/null +++ b/tests/tensorflow_call_speed_test.py @@ -0,0 +1,50 @@ +"""Testing the speed of 2 tensorflow alternatives""" +import numpy as np +import pytest + +from public.models.agent.Agent import start_agent +from public.models.agent.VanillaAgent import VanillaAgent +from tests.util import game_states_from_file +from train.reward import normalize_game_state, local_state_from_global + + +def tensorflow_naive(game_states, sess, agent): + for game_state in game_states: + for y in range(len(game_state[0])): + for x in range(len(game_state[0][0])): + if game_state[0][y][x] == 1: + game_state_n = normalize_game_state(local_state_from_global(game_state, x, y)).reshape(1, -1) + sess.run(agent.policy, feed_dict={agent.state_in: game_state_n}) + + +def tensorflow_combined(game_states, sess, agent): + for game_state in game_states: + all_game_state_n = np.array([]).reshape(0, 27) + for y in range(len(game_state[0])): + for x in range(len(game_state[0][0])): + if game_state[0][y][x] == 1: + game_state_n = normalize_game_state(local_state_from_global(game_state, x, y)).reshape(1, -1) + all_game_state_n = np.concatenate((all_game_state_n, game_state_n), axis=0) + sess.run(agent.policy, feed_dict={agent.state_in: all_game_state_n}) + + +@pytest.mark.benchmark(group="tf") +def test_tensorflow_naive_speed(benchmark): + """ + Benchmark the time of dijsktra + """ + sess, agent = start_agent(VanillaAgent) + game_states, _ = game_states_from_file() + benchmark(tensorflow_naive, game_states=game_states, sess=sess, agent=agent) + assert True + + +@pytest.mark.benchmark(group="tf") +def test_tensorflow_combined_speed(benchmark): + """ + Benchmark the time of dijsktra + """ + sess, agent = start_agent(VanillaAgent) + game_states, _ = game_states_from_file() + benchmark(tensorflow_combined, game_states=game_states, sess=sess, agent=agent) + assert True diff --git a/tests/util.py b/tests/util.py index 3e4137e..0348eed 100644 --- a/tests/util.py +++ b/tests/util.py 
@@ -1,4 +1,5 @@ """Importing the game from aws""" +import os import json import urllib.request import numpy as np @@ -10,7 +11,14 @@ def game_states_from_url(game_url): :param game_url: The url of the game on the server (string). :return: """ - game = json.loads(urllib.request.urlopen(game_url).readline().decode("utf-8")) + return text_to_game(urllib.request.urlopen(game_url).readline().decode("utf-8")) + + +def text_to_game(text): + """ + Transform the text from the *.hlt files into game_states + """ + game = json.loads(text) owner_frames = np.array(game["frames"])[:, :, :, 0][:, np.newaxis, :, :] strength_frames = np.array(game["frames"])[:, :, :, 1][:, np.newaxis, :, :] @@ -20,3 +28,11 @@ def game_states_from_url(game_url): game_states = np.concatenate(([owner_frames, strength_frames, production_frames]), axis=1) return game_states, moves + + +def game_states_from_file(filepath=None): + path_to_hlt = 'visualize/hlt/' if filepath is None else filepath # 'visualize/hlt/' + + hlt_files = [hlt_file for hlt_file in os.listdir(path_to_hlt) if hlt_file != '.DS_Store'] + filepath = hlt_files[0] + return text_to_game(open(path_to_hlt + filepath).read()) diff --git a/visualize/hlt/README.md b/visualize/hlt/README.md new file mode 100644 index 0000000..e69de29 From c4aa9e8fe7733153c4005722e51459840295a512 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Sat, 7 Oct 2017 10:50:54 +0200 Subject: [PATCH 39/45] Adding missing elements (pylint) to commit n-2 --- visualize/hlt/README.md | 3 +++ visualize/visualize.py | 15 +++++++++++++-- 2 files changed, 16 insertions(+), 2 deletions(-) diff --git a/visualize/hlt/README.md b/visualize/hlt/README.md index e69de29..b0c064f 100644 --- a/visualize/hlt/README.md +++ b/visualize/hlt/README.md @@ -0,0 +1,3 @@ +# HLT files + +Here should automatically go all the *.hlt files. \ No newline at end of file diff --git a/visualize/visualize.py b/visualize/visualize.py index 2184e01..37e6c7f 100755 --- a/visualize/visualize.py +++ b/visualize/visualize.py @@ -21,13 +21,17 @@ app = Flask(__name__) hlt_root = os.path.join(app.root_path, 'hlt') + + @app.route('/hlt/') def send_hlt(path): return send_from_directory('hlt', path) + @app.route("/") def home(): - return render_template('visualizer.html',tree=make_tree(hlt_root)) + return render_template('visualizer.html', tree=make_tree(hlt_root)) + @app.route("/performance.html") def performance(): @@ -37,7 +41,12 @@ def performance(): """ return render_template('performance.html') + def make_tree(path): + """ + For finding the halite file, we provide their directory tree. 
+ :return: + """ tree = dict(name=os.path.basename(path), children=[]) try: lst = os.listdir(path) @@ -50,10 +59,11 @@ def make_tree(path): tree['children'].append(make_tree(fn)) else: if name != ".DS_Store": - tree['children'].append(dict(path='hlt/'+name,name=name)) + tree['children'].append(dict(path='hlt/' + name, name=name)) print(np) return tree + @app.route("/performance.png") def performance_plot(): """ @@ -90,6 +100,7 @@ def convert(r): :param r: :return: """ + def get_owner(square): return square['owner'] From 755d789411c4dcb461eb8c699fce281b1051bdb4 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Sun, 8 Oct 2017 18:49:44 +0200 Subject: [PATCH 40/45] State object - Path + Dijkstra Why this change was necessary: * State should be an object, since we might choose to train with different states * Use of dijkstra algo + one single call -> drastically improves perf Potential side-effects: * We need to change the State at many places when working: train / test / visualize ... that is not good * The names of the agent are not sufficient to discriminate btw models (we might want vanilla + state2 + scope 1) * There is a need for a bigger object that agent, since at now it deals with dijkstra, which should be in a separate, strategy object... * There is not one central 'monitoring screen' to change all the hyperparameters at once and run experiments * The start_agent function in Agent.py factorizes the code, but doesn't look very correct --- .gitignore | 8 +- Makefile | 8 +- .../2017-09-26-simple-approach.markdown | 2 +- .../2017-10-05-reward-importance.markdown | 2 +- docs/_posts/2017-10-08-dijkstra.markdown | 52 ++++++ networking/hlt_networking.py | 5 + public/MyBot.py | 3 +- public/hlt.py | 14 ++ public/models/agent/Agent.py | 34 +--- public/models/agent/VanillaAgent.py | 60 +++++-- public/models/bot/TrainedBot.py | 10 +- public/state/state.py | 50 ++++++ public/util/dijkstra.py | 2 +- public/util/path.py | 32 ++++ requirements.txt | 1 + tests/path_test.py | 24 +++ tests/reward_test.py | 12 +- tests/state_speed_test.py | 73 ++++++++ tests/tensorflow_call_speed_test.py | 14 +- train/experience.py | 13 +- train/main.py | 37 ++-- train/reward.py | 159 ------------------ train/reward/__init__.py | 0 train/reward/reward.py | 76 +++++++++ train/reward/util.py | 37 ++++ train/worker.py | 19 ++- visualize/hlt/example.hlt | 1 + visualize/static/visualizer.js | 2 +- visualize/visualize.py | 18 +- 29 files changed, 509 insertions(+), 259 deletions(-) create mode 100644 docs/_posts/2017-10-08-dijkstra.markdown create mode 100644 public/state/state.py create mode 100644 public/util/path.py create mode 100644 tests/path_test.py create mode 100644 tests/state_speed_test.py delete mode 100644 train/reward.py create mode 100644 train/reward/__init__.py create mode 100644 train/reward/reward.py create mode 100644 train/reward/util.py create mode 100644 visualize/hlt/example.hlt diff --git a/.gitignore b/.gitignore index 8de6bbf..ae86ca3 100644 --- a/.gitignore +++ b/.gitignore @@ -102,4 +102,10 @@ ENV/ /site # mypy -.mypy_cache/ \ No newline at end of file +.mypy_cache/ +.benchmarks/ +public/halite +public/models/variables/vanilla/ +src/core/Halite.o +src/main.o +src/networking/Networking.o diff --git a/Makefile b/Makefile index ee5ecf1..47f6a4f 100644 --- a/Makefile +++ b/Makefile @@ -8,7 +8,7 @@ clean: .PHONY: sync-nefeli sync-nefeli: - rsync -a --exclude 'public/halite' --exclude '*.o' . mehlman@nefeli.math-info.univ-paris5.fr:/home/mehlman/Halite-Python-RL/ #--delete + rsync -a . 
mehlman@nefeli.math-info.univ-paris5.fr:/home/mehlman/Halite-Python-RL/ --delete .PHONY: get-nefeli get-nefeli: @@ -16,7 +16,7 @@ get-nefeli: .PHONY: sync-solon sync-solon: - rsync -a --exclude 'public/halite' --exclude '*.o' . solon:/home/mehlman/Halite-Python-RL/ #--delete + rsync -a --exclude 'public/halite' --exclude '*.o' . solon:/home/mehlman/Halite-Python-RL/ --delete .PHONY: get-solon get-solon: @@ -32,4 +32,6 @@ server: .PHONY: debug-server debug-server: - cd visualize;FLASK_APP=visualize.py FLASK_DEBUG=1 python -m flask run \ No newline at end of file + cd visualize;FLASK_APP=visualize.py FLASK_DEBUG=1 python -m flask run + +#scp mehlman@nefeli.math-info.univ-paris5.fr:/home/mehlman/Halite-Python-RL/public/models/variables/vanilla-2 public/models/variables/vanilla-2 \ No newline at end of file diff --git a/docs/_posts/2017-09-26-simple-approach.markdown b/docs/_posts/2017-09-26-simple-approach.markdown index 9f0a803..4107369 100644 --- a/docs/_posts/2017-09-26-simple-approach.markdown +++ b/docs/_posts/2017-09-26-simple-approach.markdown @@ -1,7 +1,7 @@ --- layout: default title: "A simple approach" -date: 2016-02-12 17:50:00 +date: 2017-09-26 17:50:00 categories: main --- diff --git a/docs/_posts/2017-10-05-reward-importance.markdown b/docs/_posts/2017-10-05-reward-importance.markdown index 1c446e1..5b0eb80 100644 --- a/docs/_posts/2017-10-05-reward-importance.markdown +++ b/docs/_posts/2017-10-05-reward-importance.markdown @@ -1,7 +1,7 @@ --- layout: default title: "The reward importance" -date: 2016-02-12 16:30:00 +date: 2017-10-05 16:30:00 categories: main --- diff --git a/docs/_posts/2017-10-08-dijkstra.markdown b/docs/_posts/2017-10-08-dijkstra.markdown new file mode 100644 index 0000000..6d667d0 --- /dev/null +++ b/docs/_posts/2017-10-08-dijkstra.markdown @@ -0,0 +1,52 @@ +--- +layout: default +title: "The dijkstra algorithm" +date: 2017-10-08 16:30:00 +categories: main +--- + + + +# The Dijsktra algorithm + +## The power of Dijsktra + +We had dealt with the problem of border squares, learning with a neural network. + +Dijsktra algorithm, which runs here in linear time, gives us the ability + +## Our currently trained vanilla model + +Trained for 3h with 8 workers, each simulating 2500 games. The batch_size was 128. + +Every worker would update his agent - and then the global agent - after each game / episode is played. + +``` +self.agent.experience.add_episode(game_states, moves) +self.agent.update_agent(sess) +``` + +Parameters: + +``` +lr=1e-3, a_size=5, h_size=50 +``` + +The main features of the reward function were the following: + +``` +np.array([np.power(get_prod(game_states[i + 1]) - get_prod(game_states[i]),1) + for i in range(len(game_states) - 1)]) +... +discount_factor=0.6 +... 
+discounted_rewards[t] = running_reward - 0.1 +``` + +And finally the chosen state was: + +``` +State1(scope=2) +``` \ No newline at end of file diff --git a/networking/hlt_networking.py b/networking/hlt_networking.py index 3cecc94..5433c67 100644 --- a/networking/hlt_networking.py +++ b/networking/hlt_networking.py @@ -6,6 +6,7 @@ class HLT: """The HLT class to handle the connection""" + def __init__(self, port): connection = socket.socket(socket.AF_INET, socket.SOCK_STREAM) connection.connect(('localhost', port)) @@ -38,3 +39,7 @@ def send_frame(self, moves): self.send_string(' '.join( str(move.square.x) + ' ' + str(move.square.y) + ' ' + str(translate_cardinal(move.direction)) for move in moves)) + + def send_frame_custom(self, moves): + self.send_string(' '.join( + str(x) + ' ' + str(y) + ' ' + str(translate_cardinal(direction)) for (x, y), direction in moves)) diff --git a/public/MyBot.py b/public/MyBot.py index 62e35dc..45e4f50 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -4,6 +4,7 @@ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: + from public.state.state import State1 from public.models.bot.TrainedBot import TrainedBot from public.models.agent.VanillaAgent import VanillaAgent from networking.hlt_networking import HLT @@ -17,7 +18,7 @@ port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 hlt = HLT(port=port) -bot = TrainedBot(VanillaAgent) +bot = TrainedBot(VanillaAgent, State1(scope=2)) while True: my_id, game_map = hlt.get_init() diff --git a/public/hlt.py b/public/hlt.py index acc0a1d..c4aad22 100644 --- a/public/hlt.py +++ b/public/hlt.py @@ -119,3 +119,17 @@ def send_frame(moves): send_string(' '.join( str(move.square.x) + ' ' + str(move.square.y) + ' ' + str(translate_cardinal(move.direction)) for move in moves)) + + +def send_frame_custom(moves): + send_string(' '.join( + str(x) + ' ' + str(y) + ' ' + str(translate_cardinal(direction)) for (x, y), direction in moves)) + + +def format_moves(game_map, moves): + moves_to_send = [] + for y in range(len(game_map.contents)): + for x in range(len(game_map.contents[0])): + if moves[y][x] != -1: + moves_to_send += [Move(game_map.contents[y][x], moves[y][x])] + return moves_to_send diff --git a/public/models/agent/Agent.py b/public/models/agent/Agent.py index 55304e0..21c3d00 100644 --- a/public/models/agent/Agent.py +++ b/public/models/agent/Agent.py @@ -5,15 +5,14 @@ import tensorflow as tf from tensorflow.python.framework.errors_impl import InvalidArgumentError -from train.reward import local_state_from_global, normalize_game_state - class Agent: """The Agent general class""" - def __init__(self, name, experience): + def __init__(self, name, state, experience): self.name = name self.experience = experience + self.state = state if self.experience is not None: try: self.experience.metric = np.load(os.path.abspath( @@ -24,44 +23,21 @@ def __init__(self, name, experience): print("Metric file not found") self.experience.metric = np.array([]) - def get_policies(self, sess, game_state): - policies = np.zeros(game_state[0].shape + (5,)) - for y in range(len(game_state[0])): - for x in range(len(game_state[0][0])): - if game_state[0][y][x] == 1: - policies[y][x] = self.get_policy(sess, - normalize_game_state(local_state_from_global(game_state, x, y))) - return policies - - def get_policy(self, sess, state): - pass - - def choose_actions(self, sess, game_state, debug=False): - # Here the state is not yet normalized ! 
- moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 - for y in range(len(game_state[0])): - for x in range(len(game_state[0][0])): - if game_state[0][y][x] == 1: - moves[y][x] = self.choose_action(sess, - normalize_game_state(local_state_from_global(game_state, x, y)), - debug=debug) - return moves - - def choose_action(self, sess, state, frac_progress=1.0, debug=False): + def choose_actions(self, sess, state, frac_progress=1.0, debug=False): pass def update_agent(self, sess): pass -def start_agent(agent_class): +def start_agent(agent_class, state): """Start and return a tf session and its corresponding agent""" os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' tf.reset_default_graph() with tf.device("/cpu:0"): with tf.variable_scope('global'): - agent = agent_class() + agent = agent_class(state) global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') saver = tf.train.Saver(global_variables) diff --git a/public/models/agent/VanillaAgent.py b/public/models/agent/VanillaAgent.py index 8779dab..b559b7c 100644 --- a/public/models/agent/VanillaAgent.py +++ b/public/models/agent/VanillaAgent.py @@ -1,18 +1,22 @@ """The Vanilla Agent""" + import numpy as np import tensorflow as tf import tensorflow.contrib.slim as slim from public.models.agent.Agent import Agent +from public.util.dijkstra import build_graph_from_state, dijkstra +from public.util.path import move_to, path_to class VanillaAgent(Agent): """The Vanilla Agent""" - def __init__(self, experience=None, lr=1e-2, s_size=9 * 3, a_size=5, h_size=50): # all these are optional ? - super(VanillaAgent, self).__init__('vanilla-cin', experience) + + def __init__(self, state, experience=None, lr=1e-4, a_size=5, h_size=200): # all these are optional ? + super(VanillaAgent, self).__init__('vanilla-debug', state, experience) # These lines established the feed-forward part of the network. The agent takes a state and produces an action. - self.state_in = tf.placeholder(shape=[None, s_size], dtype=tf.float32) + self.state_in = tf.placeholder(shape=[None, state.local_size], dtype=tf.float32) hidden = slim.fully_connected(self.state_in, h_size, activation_fn=tf.nn.relu) @@ -45,22 +49,56 @@ def __init__(self, experience=None, lr=1e-2, s_size=9 * 3, a_size=5, h_size=50): def get_policy(self, sess, state): return sess.run(self.policy, feed_dict={self.state_in: [state.reshape(-1)]}) - def choose_action(self, sess, state, frac_progress=1.0, debug=False): # it only a state, not the game state... + def get_policies(self, sess, game_state): + policies = np.zeros(game_state[0].shape + (5,)) + for (y, x), k in np.ndenumerate(game_state[0]): + if k == 1: + policies[y][x] = self.get_policy(sess, self.state.get_local_and_normalize(game_state, x, y)) + return policies + + def choose_action(self, sess, state, train=True): # Here the state is normalized ! 
- if np.random.uniform() >= frac_progress: - a = np.random.choice(range(5)) - else: + if train: # keep randomness a_dist = sess.run(self.policy, feed_dict={self.state_in: [state.reshape(-1)]}) a = np.random.choice(a_dist[0], p=a_dist[0]) a = np.argmax(a_dist == a) - if debug: + else: # act greedily a = sess.run(self.predict, feed_dict={self.state_in: [state.reshape(-1)]}) return a + def choose_actions(self, sess, game_state, train=True): + """Choose all actions using one call to tensorflow""" + g = build_graph_from_state(game_state[0]) + dist_dict, closest_dict = dijkstra(g.g, 0) + all_game_state_n = np.array([]).reshape(0, self.state.local_size) + moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 + moves_2 = np.zeros_like(game_state[0], dtype=np.int64) - 1 + moves_where = [] + for (y, x), k in np.ndenumerate(game_state[0]): + if k == 1: + if (y, x) in dist_dict and dist_dict[(y, x)] in [1, 2]: + moves_where += [(y, x)] + game_state_n = self.state.get_local_and_normalize(game_state, x, y).reshape(1, + self.state.local_size) + all_game_state_n = np.concatenate((all_game_state_n, game_state_n), axis=0) + else: + if game_state[1][y][x] > 10: # Set a minimum strength + y_t, x_t = y, x + y_t, x_t = closest_dict[(y_t, x_t)] + # while closest_dict[(y_t,x_t)]!=0: Pbbly unnecessary + # y_t, x_t = closest_dict[(y_t, x_t)] + moves_2[y][x] = move_to(path_to((x, y), (x_t, y_t), len(game_state[0][0]), len(game_state[0]))) + if train: + actions = sess.run(self.policy, feed_dict={self.state_in: all_game_state_n}) + actions = [np.argmax(action == np.random.choice(action, p=action)) for action in actions] + else: + actions = sess.run(self.predict, feed_dict={self.state_in: all_game_state_n}) + for (y, x), d in zip(moves_where, actions): + moves[y][x] = d + return moves, moves_2 + def update_agent(self, sess): - # batch_size = min(int(len(self.moves)/2),128) # Batch size - # indices = np.random.randint(len(self.moves)-1, size=batch_size) - states, moves, rewards = self.experience.batch(512) + states, moves, rewards = self.experience.batch() feed_dict = {self.state_in: states, self.action_holder: moves, diff --git a/public/models/bot/TrainedBot.py b/public/models/bot/TrainedBot.py index a271663..3d72e5e 100644 --- a/public/models/bot/TrainedBot.py +++ b/public/models/bot/TrainedBot.py @@ -1,19 +1,21 @@ """The Trained Bot""" +from public.hlt import format_moves from public.models.agent.Agent import start_agent from public.models.bot.Bot import Bot -from train.reward import format_moves, get_game_state +from public.state.state import get_game_state class TrainedBot(Bot): """The trained bot""" - def __init__(self, agent_class): - self.sess, self.agent = start_agent(agent_class) + def __init__(self, agent_class, state): + self.sess, self.agent = start_agent(agent_class, state) def compute_moves(self, game_map): """Compute the moves given a game_map""" game_state = get_game_state(game_map, self.my_id) - return format_moves(game_map, self.agent.choose_actions(self.sess, game_state, debug=True)) + moves1, moves2 = self.agent.choose_actions(self.sess, game_state, train=False) + return format_moves(game_map, -(moves1*moves2)) def get_policies(self, game_state): """Compute the policies given a game_state""" diff --git a/public/state/state.py b/public/state/state.py new file mode 100644 index 0000000..df705b8 --- /dev/null +++ b/public/state/state.py @@ -0,0 +1,50 @@ +"""The state file""" +import numpy as np + +STRENGTH_SCALE = 255 +PRODUCTION_SCALE = 10 + + +class State: + def __init__(self, local_size): + 
self.local_size = local_size + + def get_local(self, game_state, x, y): + pass + + def get_local_and_normalize(self, game_state, x, y): + return self.get_local(game_state, x, y) / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis] + + +class State1(State): + def __init__(self, scope=1): + self.scope = scope + super(State1, self).__init__(local_size=3 * ((2 * scope + 1) ** 2)) + + def get_local(self, game_state, x, y): + # all the axes remain through the operation, because of range + return np.take(np.take(game_state, range(y - self.scope, y + self.scope + 1), axis=1, mode='wrap'), + range(x - self.scope, x + self.scope + 1), axis=2, mode='wrap').reshape(3, -1) + + +class State2(State): + def __init__(self, scope=2): + self.scope = scope + super(State2, self).__init__(local_size=3 * (2 * (scope ** 2) + 2 * scope + 1)) + + def get_local(self, game_state, x, y): + to_concat = () + for i in range(self.scope + 1): + slice_s = np.take(np.take(game_state, + range(x - (self.scope - i), x + (self.scope - i) + 1), axis=2, mode='wrap'), + [y - i, y + i] if i != 0 else y, axis=1, mode='wrap') + slice_s = slice_s.reshape(3, -1) + to_concat += (slice_s,) + return np.concatenate(to_concat, axis=1) + + +def get_game_state(game_map, my_id): + game_state = np.reshape( + [[(square.owner == my_id) + 0, square.strength, square.production] for square in game_map], + [game_map.height, game_map.width, 3]) + return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) diff --git a/public/util/dijkstra.py b/public/util/dijkstra.py index 50f4265..244fa68 100644 --- a/public/util/dijkstra.py +++ b/public/util/dijkstra.py @@ -124,7 +124,7 @@ def update(self, new_state, previous_state): self.remove_node((i, j)) -def build_graph_from_state(state): +def build_graph_from_state(state): # The keys will be y, x, since we np.ndenumerate """Build the graph from the state""" def take(state, i, j): return np.take(np.take(state, j, axis=1, mode='wrap'), i, axis=0, mode='wrap') diff --git a/public/util/path.py b/public/util/path.py new file mode 100644 index 0000000..dbf3bee --- /dev/null +++ b/public/util/path.py @@ -0,0 +1,32 @@ +"""The path to another point""" +import random + +from public.hlt import EAST, WEST, NORTH, SOUTH + + +def path_to(start, end, width, height): + """Given start = (x,y), end = (x,y), end return dx, dy""" + x1, y1 = start + x2, y2 = end + + def settle(p1, p2, modulo): + dp = min(abs(p1 - p2), modulo - abs(p1 - p2)) + if p1 < p2 and p2 - p1 != dp: # TODO contract formula + dp = -dp + elif p1 > p2 and p1 - p2 == dp: + dp = -dp + return dp + + return settle(x1, x2, width), settle(y1, y2, height) + + +def move_to(dxy): + """Move to the closest square given the tuple (dx, dy)""" + dx, dy = dxy + assert abs(dx) > 0 or abs(dy) > 0, "No closer move possible" + prob_east_west = abs(dx) / (abs(dx) + abs(dy)) + if random.uniform(0, 1) < prob_east_west: # Act east_west + move = EAST if dx > 0 else WEST + else: # Act north_south + move = SOUTH if dy > 0 else NORTH + return move diff --git a/requirements.txt b/requirements.txt index 0a080ab..e21a3b0 100644 --- a/requirements.txt +++ b/requirements.txt @@ -2,6 +2,7 @@ tensorflow coverage>=3.6 pytest-cov pytest-xdist +pytest-benchmark>=3.1 coveralls pylint>=1.6 flask \ No newline at end of file diff --git a/tests/path_test.py b/tests/path_test.py new file mode 100644 index 0000000..d58b0c6 --- /dev/null +++ b/tests/path_test.py @@ -0,0 +1,24 @@ +"""Test the path_to function""" +import unittest + +from public.util.path import path_to + +class 
PathTo(unittest.TestCase): + """ + Tests the path_to function + """ + + def test_path_to(self): + """ + Test the path_to function + """ + self.assertTrue(path_to((0, 0), (0, 4), 5, 5) == (0, -1)) + self.assertTrue(path_to((4, 4), (0, 0), 5, 5) == (1, 1)) + self.assertTrue(path_to((0, 0), (4, 4), 5, 5) == (-1, -1)) + self.assertTrue(path_to((0, 0), (4, 4), 5, 10) == (-1, 4)) + self.assertTrue(path_to((0, 0), (4, 4), 6, 10) == (-2, 4)) + self.assertTrue(path_to((0, 0), (4, 4), 7, 10) == (-3, 4)) + + +if __name__ == '__main__': + unittest.main() diff --git a/tests/reward_test.py b/tests/reward_test.py index 56e5815..868d8c4 100644 --- a/tests/reward_test.py +++ b/tests/reward_test.py @@ -5,9 +5,11 @@ import numpy as np +from public.state.state import State1 from tests.util import game_states_from_url from train.experience import ExperienceVanilla -from train.reward import discount_rewards, raw_rewards_function, all_rewards_function +from train.reward.reward import Reward +from train.reward.util import discount_rewards from train.worker import Worker @@ -15,6 +17,7 @@ class TestReward(unittest.TestCase): """ Tests the reward function """ + def test_length_discount_rewards(self): """ Test the length of the discount reward @@ -29,14 +32,15 @@ def test_reward(self): game_url = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' game_states, moves = game_states_from_url(game_url) - raw_rewards = raw_rewards_function(game_states) + r = Reward(State1()) + raw_rewards = r.raw_rewards_function(game_states) self.assertTrue(len(raw_rewards) == len(game_states) - 1) - all_states, all_moves, all_rewards = all_rewards_function(game_states, moves) + all_states, all_moves, all_rewards = r.all_rewards_function(game_states, moves) self.assertTrue(len(all_states) >= len(game_states) - 1) self.assertTrue(len(all_moves) >= len(moves)) self.assertTrue(len(all_rewards) == len(all_moves) and len(all_states) == len(all_moves)) - experience = ExperienceVanilla() + experience = ExperienceVanilla(r) experience.add_episode(game_states, moves) experience.add_episode(game_states, moves) self.assertTrue(len(experience.moves) == 2 * len(all_moves)) diff --git a/tests/state_speed_test.py b/tests/state_speed_test.py new file mode 100644 index 0000000..18d8a33 --- /dev/null +++ b/tests/state_speed_test.py @@ -0,0 +1,73 @@ +"""Testing the speed of building different game states""" +import numpy as np +import pytest + +from public.state.state import State2, State1 +from tests.util import game_states_from_file + + +def build_game_state(game_states, state): + for game_state in game_states: + for (y, x), k in np.ndenumerate(game_state[0]): + if k == 1: + state.get_local(game_state, x, y) + + +@pytest.mark.benchmark(group="state") +def test_state1_scope1(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(build_game_state, game_states=game_states, state=State1(scope=1)) + assert True + + +@pytest.mark.benchmark(group="state") +def test_state1_scope2(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(build_game_state, game_states=game_states, state=State1(scope=2)) + assert True + + +@pytest.mark.benchmark(group="state") +def test_state1_scope3(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(build_game_state, game_states=game_states, state=State1(scope=3)) + assert True + + +@pytest.mark.benchmark(group="state") +def 
test_state2_scope2(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(build_game_state, game_states=game_states, state=State2(scope=2)) + assert True + + +@pytest.mark.benchmark(group="state") +def test_state2_scope3(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(build_game_state, game_states=game_states, state=State2(scope=3)) + assert True + + +@pytest.mark.benchmark(group="state") +def test_state2_scope4(benchmark): + """ + Benchmark the time of dijsktra + """ + game_states, _ = game_states_from_file() + benchmark(build_game_state, game_states=game_states, state=State2(scope=4)) + assert True diff --git a/tests/tensorflow_call_speed_test.py b/tests/tensorflow_call_speed_test.py index ca05b62..3d43572 100644 --- a/tests/tensorflow_call_speed_test.py +++ b/tests/tensorflow_call_speed_test.py @@ -4,26 +4,28 @@ from public.models.agent.Agent import start_agent from public.models.agent.VanillaAgent import VanillaAgent +from public.state.state import State1 from tests.util import game_states_from_file -from train.reward import normalize_game_state, local_state_from_global def tensorflow_naive(game_states, sess, agent): + state = State1(scope=1) for game_state in game_states: for y in range(len(game_state[0])): for x in range(len(game_state[0][0])): if game_state[0][y][x] == 1: - game_state_n = normalize_game_state(local_state_from_global(game_state, x, y)).reshape(1, -1) + game_state_n = state.get_local_and_normalize(game_state, x, y).reshape(1, -1) sess.run(agent.policy, feed_dict={agent.state_in: game_state_n}) def tensorflow_combined(game_states, sess, agent): + state = State1(scope=1) for game_state in game_states: - all_game_state_n = np.array([]).reshape(0, 27) + all_game_state_n = np.array([]).reshape(0, state.local_size) for y in range(len(game_state[0])): for x in range(len(game_state[0][0])): if game_state[0][y][x] == 1: - game_state_n = normalize_game_state(local_state_from_global(game_state, x, y)).reshape(1, -1) + game_state_n = state.get_local_and_normalize(game_state, x, y).reshape(1, -1) all_game_state_n = np.concatenate((all_game_state_n, game_state_n), axis=0) sess.run(agent.policy, feed_dict={agent.state_in: all_game_state_n}) @@ -33,7 +35,7 @@ def test_tensorflow_naive_speed(benchmark): """ Benchmark the time of dijsktra """ - sess, agent = start_agent(VanillaAgent) + sess, agent = start_agent(VanillaAgent, State1(scope=1)) game_states, _ = game_states_from_file() benchmark(tensorflow_naive, game_states=game_states, sess=sess, agent=agent) assert True @@ -44,7 +46,7 @@ def test_tensorflow_combined_speed(benchmark): """ Benchmark the time of dijsktra """ - sess, agent = start_agent(VanillaAgent) + sess, agent = start_agent(VanillaAgent, State1(scope=1)) game_states, _ = game_states_from_file() benchmark(tensorflow_combined, game_states=game_states, sess=sess, agent=agent) assert True diff --git a/train/experience.py b/train/experience.py index 49704b5..55707a3 100644 --- a/train/experience.py +++ b/train/experience.py @@ -3,7 +3,7 @@ """ import numpy as np -from train.reward import all_rewards_function, raw_rewards_metric +from train.reward.util import production_increments_function class Experience: @@ -24,7 +24,7 @@ def batch(self, size): pass def compute_metric(self, game_states): - production_increments = np.sum(np.sum(raw_rewards_metric(game_states), axis=2), axis=1) + production_increments = production_increments_function(game_states) 
self.metric = np.append(self.metric, production_increments.dot(np.linspace(2.0, 1.0, num=len(game_states) - 1))) def save_metric(self, name): @@ -36,15 +36,16 @@ class ExperienceVanilla(Experience): Stores states in addition to the inherited attributes of Experience """ - def __init__(self): + def __init__(self, reward): super(ExperienceVanilla, self).__init__() - self.states = np.array([]).reshape(0, 27) + self.reward = reward + self.states = np.array([]).reshape(0, self.reward.state.local_size) def add_episode(self, game_states, moves): self.compute_metric(game_states) - all_states, all_moves, all_rewards = all_rewards_function(game_states, moves) + all_states, all_moves, all_rewards = self.reward.all_rewards_function(game_states, moves) - self.states = np.concatenate((self.states, all_states.reshape(-1, 27)), axis=0) + self.states = np.concatenate((self.states, all_states.reshape(-1, self.reward.state.local_size)), axis=0) self.moves = np.concatenate((self.moves, all_moves)) self.rewards = np.concatenate((self.rewards, all_rewards)) diff --git a/train/main.py b/train/main.py index e693ffb..981b541 100644 --- a/train/main.py +++ b/train/main.py @@ -1,39 +1,42 @@ """This main.py file runs the training.""" -import threading import os import sys -import tensorflow as tf +import threading -from public.models.agent.VanillaAgent import VanillaAgent -from train.experience import ExperienceVanilla -from train.worker import Worker +import tensorflow as tf +from tensorflow.python.framework.errors_impl import InvalidArgumentError os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) - +try: + from public.models.agent.VanillaAgent import VanillaAgent + from public.state.state import State1 + from train.experience import ExperienceVanilla + from train.worker import Worker + from train.reward.reward import Reward +except: + raise port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 tf.reset_default_graph() # Clear the Tensorflow graph. 
with tf.device("/cpu:0"): - lr = 1e-3 - s_size = 9 * 3 - a_size = 5 - h_size = 50 - with tf.variable_scope('global'): - master_experience = ExperienceVanilla() - master_agent = VanillaAgent(master_experience, lr, s_size, a_size, h_size) + state = State1(scope=2) + master_experience = ExperienceVanilla(Reward(state)) + master_agent = VanillaAgent(state, master_experience) - num_workers = 5 - n_simultations = 500 + num_workers = 8 + n_simultations = 5000 workers = [] if num_workers > 1: for i in range(num_workers): with tf.variable_scope('worker_' + str(i)): - workers.append(Worker(port, i, VanillaAgent(ExperienceVanilla(), lr, s_size, a_size, h_size))) + state = State1(scope=2) + experience = ExperienceVanilla(Reward(state)) + workers.append(Worker(port, i, VanillaAgent(state, experience))) else: workers.append(Worker(port, 0, master_agent)) # We need only to save the global @@ -47,7 +50,7 @@ try: saver.restore(sess, os.path.abspath( os.path.dirname(__file__)) + '/../public/models/variables/' + master_agent.name + '/' + master_agent.name) - except FileNotFoundError: + except InvalidArgumentError: print("Model not found - initiating new one") coord = tf.train.Coordinator() diff --git a/train/reward.py b/train/reward.py deleted file mode 100644 index 4ef8143..0000000 --- a/train/reward.py +++ /dev/null @@ -1,159 +0,0 @@ -"""The reward.py file to compute the reward""" -import numpy as np -from public.hlt import NORTH, EAST, SOUTH, WEST, Move - -STRENGTH_SCALE = 255 -PRODUCTION_SCALE = 10 - - -def get_game_state(game_map, my_id): - game_state = np.reshape( - [[(square.owner == my_id) + 0, square.strength, square.production] for square in game_map], - [game_map.height, game_map.width, 3]) - return np.swapaxes(np.swapaxes(game_state, 2, 0), 1, 2) - - -def normalize_game_state(game_state): - return game_state / np.array([1, STRENGTH_SCALE, PRODUCTION_SCALE])[:, np.newaxis, np.newaxis] - - -def get_game_prod(game_state): - return np.sum(game_state[0] * game_state[2]) - - -def get_strength(game_state): - return np.sum(game_state[0] * game_state[1]) - # np.sum([square.strength for square in game_map if square.owner == my_id]) - - -def get_number(game_state): - return np.sum(game_state[0]) - # np.sum([square.strength for square in game_map if square.owner == my_id]) - - -def discount_rewards(r, gamma=0.8): - """ take 1D float array of rewards and compute discounted reward """ - discounted_r = np.zeros_like(r, dtype=np.float64) - running_add = 0 - for t in reversed(range(0, r.size)): - running_add = running_add * gamma + r[t] - discounted_r[t] = running_add - return discounted_r - - -def take_surrounding_square(game_state, x, y, size=1): - return np.take(np.take(game_state, range(y - size, y + size + 1), axis=1, mode='wrap'), - range(x - size, x + size + 1), axis=2, mode='wrap') - - -def local_state_from_global(game_state, x, y, size=1): - # TODO: for now we still take a square, but a more complex shape could be better. 
- return np.take(np.take(game_state, range(y - size, y + size + 1), axis=1, mode='wrap'), - range(x - size, x + size + 1), axis=2, mode='wrap') - - -def raw_rewards_metric(game_states): - return np.array([game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2] - for i in range(len(game_states) - 1)]) - - -def raw_rewards_function(game_states): - return np.array( - [0.0001 * np.power(game_states[i + 1][0] * game_states[i + 1][2] - game_states[i][0] * game_states[i][2], 4) - for i in range(len(game_states) - 1)]) - - -def strength_rewards(game_states): - return np.array([(get_strength(game_states[i + 1]) - get_strength(game_states[i])) - for i in range(len(game_states) - 1)]) - - -def discounted_reward_function(next_reward, move_before, strength_before, discount_factor=1.0): - """ - Given all the below arguments, return the discounted reward. - :param next_reward: - :param move_before: - :param strength_before: - :param discount_factor: - :return: - """ - reward = np.zeros_like(next_reward) - - def take_value(matrix, x, y): - return np.take(np.take(matrix, x, axis=1, mode='wrap'), y, axis=0, mode='wrap') - - for (y, x), d in np.ndenumerate(move_before): - if d != -1: - dy = (-1 if d == NORTH else 1) if (d == SOUTH or d == NORTH) else 0 - dx = (-1 if d == WEST else 1) if (d == WEST or d == EAST) else 0 - reward[y][x] = discount_factor * take_value(next_reward, x + dx, y + dy) \ - if strength_before[y][x] >= take_value(strength_before, x + dx, y + dy) \ - else 0 - return reward - - -def discounted_rewards_function(game_states, moves): - """ - Compute height*width matrices of rewards - not yet individualized - :param game_states: The list of game states - :param moves: The list of moves - :return: - """ - raw_rewards = raw_rewards_function(game_states) - # strength_rewards = strength_rewards(game_states) - discounted_rewards = np.zeros_like(raw_rewards, dtype=np.float64) - running_reward = np.zeros_like(raw_rewards[0], dtype=np.float64) - for t, (raw_reward, move, game_state) in reversed(list(enumerate(zip(raw_rewards, moves, game_states)))): - running_reward = discounted_reward_function(running_reward, move, game_state[1], - discount_factor=0.6) + \ - discounted_reward_function(raw_reward, move, game_state[1]) - discounted_rewards[t] = running_reward - return discounted_rewards - - -def individual_states_and_rewards(game_state, move, discounted_reward): - """ - Return the triplet states, moves, rewards for each of the square in one frame. 
- :param game_state: One game state - still a 3*3*3 matrix - :param move: The move for the given square - :param discounted_reward: The global matrix of discounted reward at time t, - from we we extract one frame - :return: - """ - states = [] - moves = [] - rewards = [] - for y in range(len(game_state[0])): - for x in range(len(game_state[0][0])): - if game_state[0][y][x] == 1: - states += [normalize_game_state(local_state_from_global(game_state, x, y))] - moves += [move[y][x]] - rewards += [discounted_reward[y][x]] - return states, moves, rewards - - -def all_individual_states_and_rewards(game_states, moves, discounted_rewards): - all_states = [] - all_moves = [] - all_rewards = [] - for game_state, move, discounted_reward in zip(game_states, moves, discounted_rewards): - states_, moves_, rewards_ = individual_states_and_rewards(game_state, move, discounted_reward) - all_states += states_ - all_moves += moves_ - all_rewards += rewards_ - return np.array(all_states), np.array(all_moves), np.array(all_rewards) - - -def all_rewards_function(game_states, moves): - # game_states n+1, moves n - discounted_rewards = discounted_rewards_function(game_states, moves) - return all_individual_states_and_rewards(game_states[:-1], moves, discounted_rewards) - - -def format_moves(game_map, moves): - moves_to_send = [] - for y in range(len(game_map.contents)): - for x in range(len(game_map.contents[0])): - if moves[y][x] != -1: - moves_to_send += [Move(game_map.contents[y][x], moves[y][x])] - return moves_to_send diff --git a/train/reward/__init__.py b/train/reward/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/train/reward/reward.py b/train/reward/reward.py new file mode 100644 index 0000000..ee09c7f --- /dev/null +++ b/train/reward/reward.py @@ -0,0 +1,76 @@ +"""The reward.py file to compute the reward""" +import numpy as np + +from public.hlt import NORTH, EAST, SOUTH, WEST +from train.reward.util import get_prod + + +class Reward: + """The reward class""" + + def __init__(self, state, discount_factor=0.6): + self.discount_factor = discount_factor + self.state = state + + def raw_rewards_function(self, game_states): + return np.array( + [0.1 * np.power(get_prod(game_states[i + 1]) - get_prod(game_states[i]), 2) + for i in range(len(game_states) - 1)]) + + def individual_states_and_rewards(self, game_state, move, discounted_reward): + """Self-explanatory""" + states = [] + moves = [] + rewards = [] + + for (y, x), k in np.ndenumerate(game_state[0]): + if k == 1 and move[y][x] != -1: + states += [self.state.get_local_and_normalize(game_state, x, y)] + moves += [move[y][x]] + rewards += [discounted_reward[y][x]] + return states, moves, rewards + + def discounted_reward_function(self, next_reward, move_before, strength_before, discount_factor=1.0): + """Self-explanatory""" + reward = np.zeros_like(next_reward) + + def take_value(matrix, x, y): + return np.take(np.take(matrix, x, axis=1, mode='wrap'), y, axis=0, mode='wrap') + + for (y, x), d in np.ndenumerate(move_before): + if d != -1: + dy = (-1 if d == NORTH else 1) if (d == SOUTH or d == NORTH) else 0 + dx = (-1 if d == WEST else 1) if (d == WEST or d == EAST) else 0 + reward[y][x] = discount_factor * take_value(next_reward, x + dx, y + dy) \ + if strength_before[y][x] >= take_value(strength_before, x + dx, y + dy) \ + else 0 + return reward + + def discounted_rewards_function(self, game_states, moves): + """Self-explanatory""" + raw_rewards = self.raw_rewards_function(game_states) + discounted_rewards = 
np.zeros_like(raw_rewards, dtype=np.float64) + running_reward = np.zeros_like(raw_rewards[0], dtype=np.float64) + for t, (raw_reward, move, game_state) in reversed(list(enumerate(zip(raw_rewards, moves, game_states)))): + running_reward = self.discounted_reward_function(running_reward, move, game_state[1], + discount_factor=self.discount_factor) + \ + self.discounted_reward_function(raw_reward, move, game_state[1]) + discounted_rewards[t] = running_reward - 0.01 + return discounted_rewards + + def all_individual_states_and_rewards(self, game_states, moves, discounted_rewards): + """Self-explanatory""" + all_states = [] + all_moves = [] + all_rewards = [] + for game_state, move, discounted_reward in zip(game_states, moves, discounted_rewards): + states_, moves_, rewards_ = self.individual_states_and_rewards( + game_state, move, discounted_reward) + all_states += states_ + all_moves += moves_ + all_rewards += rewards_ + return np.array(all_states), np.array(all_moves), np.array(all_rewards) + + def all_rewards_function(self, game_states, moves): + discounted_rewards = self.discounted_rewards_function(game_states, moves) + return self.all_individual_states_and_rewards(game_states[:-1], moves, discounted_rewards) diff --git a/train/reward/util.py b/train/reward/util.py new file mode 100644 index 0000000..de9046a --- /dev/null +++ b/train/reward/util.py @@ -0,0 +1,37 @@ +"""Useful for computing the rewards""" +import numpy as np + + +def discount_rewards(r, gamma=0.8): + """ take 1D float array of rewards and compute discounted reward """ + discounted_r = np.zeros_like(r, dtype=np.float64) + running_add = 0 + for t in reversed(range(0, r.size)): + running_add = running_add * gamma + r[t] + discounted_r[t] = running_add + return discounted_r + + +def get_total_prod(game_state): + return np.sum(game_state[0] * game_state[2]) + + +def get_prod(game_state): + return game_state[0] * game_state[2] + + +def get_total_strength(game_state): + return np.sum(game_state[0] * game_state[1]) + + +def get_strength(game_state): + return game_state[0] * game_state[1] + + +def get_total_number(game_state): + return np.sum(game_state[0]) + + +def production_increments_function(game_states): + return np.array([get_total_prod(game_states[i + 1]) - get_total_prod(game_states[i]) + for i in range(len(game_states) - 1)]) diff --git a/train/worker.py b/train/worker.py index 708d8c7..9abd146 100644 --- a/train/worker.py +++ b/train/worker.py @@ -1,13 +1,15 @@ """The worker class for training and parallel operations""" import multiprocessing -import time import os +import time import tensorflow as tf -from train.reward import format_moves, get_game_state -from networking.start_game import start_game from networking.hlt_networking import HLT +from networking.start_game import start_game +from public.hlt import format_moves +from public.state.state import get_game_state + def update_target_graph(from_scope, to_scope): from_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, from_scope) @@ -24,19 +26,21 @@ class Worker(): The Worker class for training. Each worker has an individual port, number, and agent. Each of them work with the global session, and use the global saver. 
""" + def __init__(self, port, number, agent): self.name = 'worker_' + str(number) self.number = number self.port = port + number def worker(): - start_game(self.port, quiet=True, max_game=-1) # Infinite games + start_game(self.port, quiet=True, max_game=-1, width=25, height=25, max_turn=50, + max_strength=60) # Infinite games self.p = multiprocessing.Process(target=worker) self.p.start() time.sleep(1) - self.hlt = HLT(port=self.port) # Launching the pipe operation + self.hlt = HLT(port=self.port) # Launching the pipe operation self.agent = agent self.update_local_ops = update_target_graph('global', self.name) @@ -67,8 +71,9 @@ def work(self, sess, saver, n_simultations): while self.hlt.get_string() == 'Get map and play!': game_map.get_frame(self.hlt.get_string()) game_states += [get_game_state(game_map, my_id)] - moves += [self.agent.choose_actions(sess, game_states[-1])] - self.hlt.send_frame(format_moves(game_map, moves[-1])) + moves1, moves2 = self.agent.choose_actions(sess, game_states[-1]) + moves += [moves1] # We only train on this + self.hlt.send_frame(format_moves(game_map, -(moves1 * moves2))) self.agent.experience.add_episode(game_states, moves) self.agent.update_agent(sess) diff --git a/visualize/hlt/example.hlt b/visualize/hlt/example.hlt new file mode 100644 index 0000000..50622e2 --- /dev/null +++ b/visualize/hlt/example.hlt @@ -0,0 +1 @@ +{"frames":[[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,5],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,10],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,15],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,
4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,20],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,25],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,30],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,35],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,40],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,45],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0
,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,50],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,55],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,60],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,55],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,0],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[1,46],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,0],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,5],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,1
6],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,30],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,4],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,10],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,11],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,0],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[1,5],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,8],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,15],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,17],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,6],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,12],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,20],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,23],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,12],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,16],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,25],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,29],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,18],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,20],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,30],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,35],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,24],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[
1,24],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,35],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,41],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,30],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,28],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,40],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,25],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,0],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,14],[1,0],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,32],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[1,34],[1,0],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,32],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,4],[1,6],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,6],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,5],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[1,22],[1,0],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[1,39],[1,5],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,39],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,4],[1,12],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,12],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,4],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[1,11],[1,0],[1,4],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[1,44],[1,10],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,0],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,4],[1,18],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,18],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,4],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[1,6],[1,0],[1,3],[1,8],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[1,37],[1,0],[1,15],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,2
3],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[1,16],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,7],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[1,14],[1,0],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,24],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,4],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[1,3],[1,0],[1,2],[1,6],[1,0],[1,3],[0,6],[0,5],[0,3]],[[0,2],[0,3],[1,33],[1,0],[1,5],[1,0],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[1,5],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[1,20],[0,22],[0,18],[0,15],[0,10]]]],"height":10,"moves":[[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0
,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,4,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,4,0,0,0,0,0,0],[0,0,0,0,4,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,4,0,0,2,0,0,0,0],[0,0,0,4,0,3,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]]],"num_frames":26,"num_players":1,"player_names":["MyBot"],"productions":[[4,6,5,3,4,7,6,4,2,3],[4,3,3,3,5,6,4,2,2,3],[4,4,3,4,7,6,4,2,3,3],[4,4,3,3,5,5,4,3,4,4],[5,3,2,2,3,4,4,4,4,4],[5,4,3,4,5,5,5,4,4,4],[7,5,5,4,4,4,6,5,3,4],[7,6,4,3,3,4,6,5,2,3],[3,3,3,2,2,4,9,8,3,2],[3,4,3,2,2,4,7,6,3,2]],"version":11,"width":10} \ No newline at end of file diff --git a/visualize/static/visualizer.js b/visualize/static/visualizer.js index 7f1392e..bbca207 100755 --- a/visualize/static/visualizer.js +++ b/visualize/static/visualizer.js @@ -84,7 +84,7 @@ function showGame(game, $container, maxWidth, maxHeight, showmovement, isminimal type: "POST", url: '/post_discounted_rewards', data: JSON.stringify(game), - success: function(data) {discountedRewards = JSON.parse(data)['discounte_rewards']}, 
+ success: function(data) {discountedRewards = JSON.parse(data)['discounted_rewards']}, contentType: "application/json; charset=utf-8", //dataType: "json" }) diff --git a/visualize/visualize.py b/visualize/visualize.py index 37e6c7f..e97dd55 100755 --- a/visualize/visualize.py +++ b/visualize/visualize.py @@ -4,17 +4,20 @@ import sys from io import BytesIO -import numpy as np -import pandas as pd -from flask import Flask, render_template, request, make_response, send_from_directory import matplotlib.pyplot as plt from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas from matplotlib.figure import Figure +import numpy as np +import pandas as pd +from flask import Flask, render_template, request, make_response, send_from_directory + sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: - from train.reward import discounted_rewards_function + from train.reward.reward import Reward + from public.state.state import State1 from public.models.bot.TrainedBot import TrainedBot + from public.models.agent.VanillaAgent import VanillaAgent except: raise @@ -58,7 +61,7 @@ def make_tree(path): if os.path.isdir(fn): tree['children'].append(make_tree(fn)) else: - if name != ".DS_Store": + if name not in [".DS_Store", "README.md"]: tree['children'].append(dict(path='hlt/' + name, name=name)) print(np) return tree @@ -126,13 +129,14 @@ def get_strength(square): @app.route('/post_discounted_rewards', methods=['POST']) def post_discounted_rewards(): game_states, moves = convert(request) - discounted_rewards = discounted_rewards_function(game_states, moves) + r = Reward(State1(scope=2)) + discounted_rewards = r.discounted_rewards_function(game_states, moves) return json.dumps({'discounted_rewards': discounted_rewards.tolist()}) @app.route('/post_policies', methods=['POST']) def post_policies(): game_states, _ = convert(request) - bot = TrainedBot() + bot = TrainedBot(VanillaAgent, State1(scope=2)) policies = np.array([bot.get_policies(game_state) for game_state in game_states]) return json.dumps({'policies': policies.tolist()}) From 364380bd678b82bf92f6457f6bf099aab9ba4fb5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 10 Oct 2017 12:30:20 -0700 Subject: [PATCH 41/45] Initiate the use of logging, custom decorator Why this change was necessary: * We have scripts, it is import to log what's happening (especially for undebuggable parts) This change addresses the need by: * Showcasing the logging module * Decorator to log function argument Potential side-effects: * None --- logger.ini | 32 +++++++++++++++++++++++ train/decorators.py | 64 +++++++++++++++++++++++++++++++++++++++++++++ train/worker.py | 36 ++++++++++++++++++++----- 3 files changed, 125 insertions(+), 7 deletions(-) create mode 100644 logger.ini create mode 100644 train/decorators.py diff --git a/logger.ini b/logger.ini new file mode 100644 index 0000000..ede64f2 --- /dev/null +++ b/logger.ini @@ -0,0 +1,32 @@ +[loggers] +keys = root + +[handlers] +keys = consoleHandler, fileHandler + +[formatters] +keys = simple, detailed + +[logger_root] +level = DEBUG +handlers = consoleHandler, fileHandler + +[handler_consoleHandler] +class = StreamHandler +level = INFO +formatter = simple +args = (sys.stdout,) + +[handler_fileHandler] +class = handlers.RotatingFileHandler +level = DEBUG +formatter = detailed +args = ('halite.log', 'a', 2000000, 5) + +[formatter_simple] +format = %(module)s - %(funcName)s - %(levelname)s - %(message)s 
+datefmt = %Y-%m-%d %H:%M:%S + +[formatter_detailed] +format = %(asctime)s %(name)s:%(lineno)s - %(module)s - %(funcName)s - %(levelname)s %(message)s +datefmt = %Y-%m-%d %H:%M:%S \ No newline at end of file diff --git a/train/decorators.py b/train/decorators.py new file mode 100644 index 0000000..135ad7b --- /dev/null +++ b/train/decorators.py @@ -0,0 +1,64 @@ +# -*- coding: utf-8 -*- +""" +Contributors: + - Louis Rémus +""" +import logging.config +import os + +log_file_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), + 'logger.ini') +logging.config.fileConfig(log_file_path, disable_existing_loggers=False) + + +def log_args(func): + """ + Decorator to print function call details - parameters names and effective values + Args: + func (function): + + Returns: + Decorated function + """ + logger = logging.getLogger(__name__) + + def wrapper(*func_args, **func_kwargs): + """ + Wrapper of our decorator + Args: + *func_args (tuple): + **func_kwargs (dict): + + Returns: + decorated function + """ + # Get the arguments' names + arg_names = func.__code__.co_varnames + params = [('args', dict(zip(arg_names, func_args))), + ('kwargs', func_kwargs)] + # Do not forget the default arguments + # Values + defaults = func.__defaults__ or () + # Their names (ordered) + default_names = set(arg_names) - set(params[0][1].keys()) - set(params[1][1].keys()) + # Map them to their names + defaults_mapped = dict(zip(default_names, defaults)) + # Add them to the list of parameters to print + params.append(('defaults', defaults_mapped)) + # Log our parameters + logger.debug('{} ({})'.format(func.__name__, ', '.join('%s = %r' % p for p in params))) + # Return our function execution + return func(*func_args, **func_kwargs) + + return wrapper + + +# Example function +@log_args +def awesome_function(a, b, c=1, d='Edouard', e='est pd'): + print(a + b + c) + print('{} {}'.format(d, e)) + +# For test purposes +if __name__ == '__main__': + awesome_function(1, b=3) diff --git a/train/worker.py b/train/worker.py index 708d8c7..58f803d 100644 --- a/train/worker.py +++ b/train/worker.py @@ -1,13 +1,17 @@ """The worker class for training and parallel operations""" +import logging +import logging.config import multiprocessing -import time import os +import time import tensorflow as tf +from train.decorators import log_args -from train.reward import format_moves, get_game_state -from networking.start_game import start_game from networking.hlt_networking import HLT +from networking.start_game import start_game +from train.reward import format_moves, get_game_state + def update_target_graph(from_scope, to_scope): from_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, from_scope) @@ -19,12 +23,23 @@ def update_target_graph(from_scope, to_scope): return op_holder -class Worker(): +class Worker: """ The Worker class for training. Each worker has an individual port, number, and agent. Each of them work with the global session, and use the global saver. 
""" + + @log_args def __init__(self, port, number, agent): + # Logger definition + log_file_path = os.path.join(os.path.dirname(os.path.dirname(os.path.abspath(__file__))), + 'logger.ini') + logging.config.fileConfig(log_file_path, disable_existing_loggers=False) + self.logger = logging.getLogger(__name__) + + # Log __init__ arguments to DEBUG level + self.logger.debug('port: {} number: {} agent: {}'.format(port, number, agent)) + self.name = 'worker_' + str(number) self.number = number self.port = port + number @@ -36,11 +51,14 @@ def worker(): self.p.start() time.sleep(1) - self.hlt = HLT(port=self.port) # Launching the pipe operation + self.hlt = HLT(port=self.port) # Launching the pipe operation self.agent = agent self.update_local_ops = update_target_graph('global', self.name) + # We finished correctly + self.logger.info('{} __init__ completed'.format(self.name)) + def work(self, sess, saver, n_simultations): """ Using the pipe operation launched at initialization, @@ -52,7 +70,7 @@ def work(self, sess, saver, n_simultations): Afterwards the process is stopped. :return: """ - print("Starting worker " + str(self.number)) + self.logger.info("Starting worker {}".format(self.number)) with sess.as_default(), sess.graph.as_default(): for i in range(n_simultations): # while not coord.should_stop(): @@ -79,9 +97,13 @@ def work(self, sess, saver, n_simultations): + '/public/models/variables/' \ + self.agent.name + '/' if not os.path.exists(directory): - print("Creating directory for agent :" + self.agent.name) + self.logger.info('Creating directory for agent : {}'.format(self.agent.name)) os.makedirs(directory) saver.save(sess, directory + self.agent.name) self.agent.experience.save_metric(directory + self.agent.name) self.p.terminate() + + +if __name__ == '__main__': + worker_ = Worker(1, 2, 3) From 4b6bfa8b1f035a4e1cbe4fbe0c9482329a3c31b5 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Louis=20R=C3=A9mus?= <15720130+louis-r@users.noreply.github.com> Date: Tue, 10 Oct 2017 12:50:35 -0700 Subject: [PATCH 42/45] Made files PyLint compliant --- train/decorators.py | 3 ++- train/worker.py | 14 +++++--------- 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/train/decorators.py b/train/decorators.py index 135ad7b..5cd170c 100644 --- a/train/decorators.py +++ b/train/decorators.py @@ -46,7 +46,7 @@ def wrapper(*func_args, **func_kwargs): # Add them to the list of parameters to print params.append(('defaults', defaults_mapped)) # Log our parameters - logger.debug('{} ({})'.format(func.__name__, ', '.join('%s = %r' % p for p in params))) + logger.debug('%s (%s)', func.__name__, ', '.join('%s = %r' % p for p in params)) # Return our function execution return func(*func_args, **func_kwargs) @@ -59,6 +59,7 @@ def awesome_function(a, b, c=1, d='Edouard', e='est pd'): print(a + b + c) print('{} {}'.format(d, e)) + # For test purposes if __name__ == '__main__': awesome_function(1, b=3) diff --git a/train/worker.py b/train/worker.py index 58f803d..9100eed 100644 --- a/train/worker.py +++ b/train/worker.py @@ -7,10 +7,10 @@ import tensorflow as tf from train.decorators import log_args +from train.reward import format_moves, get_game_state from networking.hlt_networking import HLT from networking.start_game import start_game -from train.reward import format_moves, get_game_state def update_target_graph(from_scope, to_scope): @@ -38,7 +38,7 @@ def __init__(self, port, number, agent): self.logger = logging.getLogger(__name__) # Log __init__ arguments to DEBUG level - self.logger.debug('port: {} number: {} 
agent: {}'.format(port, number, agent)) + self.logger.debug('port: %d number: %d agent: %d', port, number, agent) self.name = 'worker_' + str(number) self.number = number @@ -57,7 +57,7 @@ def worker(): self.update_local_ops = update_target_graph('global', self.name) # We finished correctly - self.logger.info('{} __init__ completed'.format(self.name)) + self.logger.info('%s __init__ completed', self.name) def work(self, sess, saver, n_simultations): """ @@ -70,7 +70,7 @@ def work(self, sess, saver, n_simultations): Afterwards the process is stopped. :return: """ - self.logger.info("Starting worker {}".format(self.number)) + self.logger.info("Starting worker %d", self.number) with sess.as_default(), sess.graph.as_default(): for i in range(n_simultations): # while not coord.should_stop(): @@ -97,13 +97,9 @@ def work(self, sess, saver, n_simultations): + '/public/models/variables/' \ + self.agent.name + '/' if not os.path.exists(directory): - self.logger.info('Creating directory for agent : {}'.format(self.agent.name)) + self.logger.info('Creating directory for agent : %s', self.agent.name) os.makedirs(directory) saver.save(sess, directory + self.agent.name) self.agent.experience.save_metric(directory + self.agent.name) self.p.terminate() - - -if __name__ == '__main__': - worker_ = Worker(1, 2, 3) From 94a84f10ee2d194fc4ac7926d4b23765ef28e47b Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Wed, 11 Oct 2017 16:32:45 +0200 Subject: [PATCH 43/45] Create a json config file for agent / state / hyperparameters /training Why this change was necessary: * Need to change every time the parameters, when training a model, playing the bot, or using the graphical debugging interface This change addresses the need by: * Decoupling the Experience, Reward, State object, and initializing all of these in a Strategy object which could potentially cover multiple agent. 
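In practice a single Strategy now owns its State, Reward, Experience and Agent, all
built from public/strategy.json. Roughly, the construction looks like this (a sketch
of what TrainedStrategy.__init__ in the diff below does):

    import json
    import tensorflow as tf
    from public.models.agent.VanillaAgent import VanillaAgent
    from public.state.state import State1
    from train.experience import ExperienceVanilla
    from train.reward.reward import Reward

    config = json.loads(open('public/strategy.json').read())
    state = State1(scope=config["agent"]["scope"])        # local window seen by the agent
    reward = Reward(state=state)                          # reward shaping tied to that state
    experience = ExperienceVanilla(state.local_size, config["saving_name"])
    with tf.variable_scope('global'):
        agent = VanillaAgent(s_size=state.local_size, h_size=config["agent"]["h_size"])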
--- .gitignore | 3 +- .pylintrc | 2 +- .travis.yml | 1 + docs/_posts/2017-10-08-dijkstra.markdown | 17 ++- networking/runGame.bat | 2 +- networking/runGame.sh | 4 +- public/MyBot.py | 7 +- public/OpponentBot.py | 4 +- public/models/agent/Agent.py | 48 +------ public/models/agent/VanillaAgent.py | 74 +++-------- public/models/bot/TrainedBot.py | 26 ---- .../ImprovedStrategy.py} | 4 +- .../RandomStrategy.py} | 4 +- .../{bot/Bot.py => strategy/Strategy.py} | 2 +- public/models/strategy/TrainedStrategy.py | 119 ++++++++++++++++++ public/strategy.json | 11 ++ tests/reward_test.py | 10 +- tests/tensorflow_call_speed_test.py | 20 ++- tests/util.py | 11 +- train/experience.py | 34 +++-- train/main.py | 43 +++---- train/worker.py | 34 ++--- visualize/hlt/example.hlt | 1 - visualize/visualize.py | 12 +- 24 files changed, 261 insertions(+), 232 deletions(-) delete mode 100644 public/models/bot/TrainedBot.py rename public/models/{bot/ImprovedBot.py => strategy/ImprovedStrategy.py} (89%) rename public/models/{bot/RandomBot.py => strategy/RandomStrategy.py} (79%) rename public/models/{bot/Bot.py => strategy/Strategy.py} (89%) create mode 100644 public/models/strategy/TrainedStrategy.py create mode 100644 public/strategy.json delete mode 100644 visualize/hlt/example.hlt diff --git a/.gitignore b/.gitignore index ae86ca3..6baa3ae 100644 --- a/.gitignore +++ b/.gitignore @@ -105,7 +105,8 @@ ENV/ .mypy_cache/ .benchmarks/ public/halite -public/models/variables/vanilla/ +public/models/variables/* src/core/Halite.o src/main.o src/networking/Networking.o +visualize/other_hlt/ diff --git a/.pylintrc b/.pylintrc index abe13e9..ed97dc9 100644 --- a/.pylintrc +++ b/.pylintrc @@ -38,7 +38,7 @@ enable=indexing-exception,old-raise-syntax # --enable=similarities". If you want to run only the classes checker, but have # no Warning level messages displayed, use"--disable=all --enable=classes # --disable=W" -disable=invalid-unary-operand-type,design,similarities,no-self-use,attribute-defined-outside-init,locally-disabled,star-args,pointless-except,bad-option-value,global-statement,fixme,suppressed-message,useless-suppression,locally-enabled,no-member,no-name-in-module,import-error,unsubscriptable-object,unbalanced-tuple-unpacking,undefined-variable,not-context-manager +disable=arguments-differ,len-as-condition,invalid-unary-operand-type,design,similarities,no-self-use,attribute-defined-outside-init,locally-disabled,star-args,pointless-except,bad-option-value,global-statement,fixme,suppressed-message,useless-suppression,locally-enabled,no-member,no-name-in-module,import-error,unsubscriptable-object,unbalanced-tuple-unpacking,undefined-variable,not-context-manager # Set the cache size for astng objects. diff --git a/.travis.yml b/.travis.yml index 590f538..58f7dbd 100644 --- a/.travis.yml +++ b/.travis.yml @@ -26,6 +26,7 @@ script: - find . -iname "*.py" | xargs pylint # Coverage checks + - python networking/start_game.py -sp "OpponentBot.py" - py.test --cov=train tests/ after_success: diff --git a/docs/_posts/2017-10-08-dijkstra.markdown b/docs/_posts/2017-10-08-dijkstra.markdown index 6d667d0..18f8598 100644 --- a/docs/_posts/2017-10-08-dijkstra.markdown +++ b/docs/_posts/2017-10-08-dijkstra.markdown @@ -17,7 +17,7 @@ We had dealt with the problem of border squares, learning with a neural network. Dijsktra algorithm, which runs here in linear time, gives us the ability -## Our currently trained vanilla model +## Our currently trained vanilla model with scope 1 Trained for 3h with 8 workers, each simulating 2500 games. 
The batch_size was 128. @@ -49,4 +49,19 @@ And finally the chosen state was: ``` State1(scope=2) +``` + +## With scope 2 + +``` +State1(scope=2) +lr=1e-4, a_size=5, h_size=200 +width=25, height=25, max_turn=50,max_strength=60 +... +0.1 * np.power(get_prod(game_states[i + 1]) - get_prod(game_states[i]), 2) +... +discount_factor=0.6 +... +0.01 +discounted_rewards[t] = running_reward - 0.01 ``` \ No newline at end of file diff --git a/networking/runGame.bat b/networking/runGame.bat index 7d9bd31..8b54593 100644 --- a/networking/runGame.bat +++ b/networking/runGame.bat @@ -1 +1 @@ -.\halite.exe -d "30 30" "python MyBot.py" "python RandomBot.py" +.\halite.exe -d "30 30" "python MyBot.py" "python RandomStrategy.py" diff --git a/networking/runGame.sh b/networking/runGame.sh index 275e549..61c4df4 100755 --- a/networking/runGame.sh +++ b/networking/runGame.sh @@ -1,7 +1,7 @@ #!/bin/bash if hash python3 2>/dev/null; then - ./halite -d "30 30" "python3 MyBot.py" "python3 RandomBot.py" + ./halite -d "30 30" "python3 MyBot.py" "python3 RandomStrategy.py" else - ./halite -d "30 30" "python MyBot.py" "python RandomBot.py" + ./halite -d "30 30" "python MyBot.py" "python RandomStrategy.py" fi diff --git a/public/MyBot.py b/public/MyBot.py index 45e4f50..8b196eb 100644 --- a/public/MyBot.py +++ b/public/MyBot.py @@ -4,9 +4,7 @@ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: - from public.state.state import State1 - from public.models.bot.TrainedBot import TrainedBot - from public.models.agent.VanillaAgent import VanillaAgent + from public.models.strategy.TrainedStrategy import TrainedStrategy from networking.hlt_networking import HLT except: raise @@ -18,7 +16,8 @@ port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 hlt = HLT(port=port) -bot = TrainedBot(VanillaAgent, State1(scope=2)) +bot = TrainedStrategy() +bot.init_session() while True: my_id, game_map = hlt.get_init() diff --git a/public/OpponentBot.py b/public/OpponentBot.py index 67344c3..fcac0ad 100644 --- a/public/OpponentBot.py +++ b/public/OpponentBot.py @@ -4,7 +4,7 @@ sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: - from public.models.bot.ImprovedBot import ImprovedBot + from public.models.strategy.ImprovedStrategy import ImprovedStrategy from networking.hlt_networking import HLT except: raise @@ -16,7 +16,7 @@ port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 hlt = HLT(port=port) -bot = ImprovedBot() +bot = ImprovedStrategy() while True: my_id, game_map = hlt.get_init() diff --git a/public/models/agent/Agent.py b/public/models/agent/Agent.py index 21c3d00..f4ba78d 100644 --- a/public/models/agent/Agent.py +++ b/public/models/agent/Agent.py @@ -1,55 +1,11 @@ """The Agent general class""" -import os - -import numpy as np -import tensorflow as tf -from tensorflow.python.framework.errors_impl import InvalidArgumentError class Agent: """The Agent general class""" - def __init__(self, name, state, experience): - self.name = name - self.experience = experience - self.state = state - if self.experience is not None: - try: - self.experience.metric = np.load(os.path.abspath( - os.path.join(os.path.dirname(__file__), '..')) - + '/variables/' + self.name + '/' - + self.name + '.npy') - except FileNotFoundError: - print("Metric file not found") - self.experience.metric = np.array([]) - - def choose_actions(self, sess, state, frac_progress=1.0, debug=False): + def choose_actions(self, sess, local_game_state_n, train=True): pass - def update_agent(self, sess): + def 
update_agent(self, sess, states, moves, rewards): pass - - -def start_agent(agent_class, state): - """Start and return a tf session and its corresponding agent""" - os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' - tf.reset_default_graph() - - with tf.device("/cpu:0"): - with tf.variable_scope('global'): - agent = agent_class(state) - - global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') - saver = tf.train.Saver(global_variables) - init = tf.global_variables_initializer() - - sess = tf.Session() - sess.run(init) - try: - saver.restore(sess, os.path.abspath( - os.path.join(os.path.dirname(__file__), '..')) - + '/variables/' + agent.name + '/' - + agent.name) - except InvalidArgumentError: - print("Model not found - initiating new one") - return sess, agent diff --git a/public/models/agent/VanillaAgent.py b/public/models/agent/VanillaAgent.py index b559b7c..eb23968 100644 --- a/public/models/agent/VanillaAgent.py +++ b/public/models/agent/VanillaAgent.py @@ -5,18 +5,15 @@ import tensorflow.contrib.slim as slim from public.models.agent.Agent import Agent -from public.util.dijkstra import build_graph_from_state, dijkstra -from public.util.path import move_to, path_to - class VanillaAgent(Agent): """The Vanilla Agent""" - def __init__(self, state, experience=None, lr=1e-4, a_size=5, h_size=200): # all these are optional ? - super(VanillaAgent, self).__init__('vanilla-debug', state, experience) + def __init__(self, lr=1e-4, s_size=50, h_size=200, a_size=5): # all these are optional ? + super(VanillaAgent, self).__init__() # These lines established the feed-forward part of the network. The agent takes a state and produces an action. - self.state_in = tf.placeholder(shape=[None, state.local_size], dtype=tf.float32) + self.state_in = tf.placeholder(shape=[None, s_size], dtype=tf.float32) hidden = slim.fully_connected(self.state_in, h_size, activation_fn=tf.nn.relu) @@ -29,33 +26,26 @@ def __init__(self, state, experience=None, lr=1e-4, a_size=5, h_size=200): # al self.action_holder = tf.placeholder(shape=[None], dtype=tf.int32) self.indexes = tf.range(0, tf.shape(self.policy)[0]) * tf.shape(self.policy)[1] + self.action_holder - self.responsible_outputs = tf.gather(tf.reshape(self.policy, [-1]), self.indexes) - if experience is not None: - loss = -tf.reduce_mean(tf.log(self.responsible_outputs) * self.reward_holder) + self.responsible_outputs = tf.gather(tf.reshape(self.policy, [-1]), self.indexes) #TODO... 
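        # A rough NumPy sketch of what the two gather lines above compute: flatten the
        # [batch, a_size] policy matrix and pick, for each sample i, the probability of
        # the action it actually took,
        #   responsible[i] = policy.reshape(-1)[i * a_size + actions[i]] == policy[i, actions[i]]
        # These per-action probabilities feed the policy-gradient loss just below.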
+ + loss = -tf.reduce_mean(tf.log(self.responsible_outputs) * self.reward_holder) - self.tvars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=tf.get_variable_scope().name) - self.gradients = tf.gradients(loss, self.tvars) + self.tvars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope=tf.get_variable_scope().name) + self.gradients = tf.gradients(loss, self.tvars) - self.gradient_holders = [] - for idx in range(len(self.tvars)): - placeholder = tf.placeholder(tf.float32, name=str(idx) + '_holder') - self.gradient_holders.append(placeholder) + self.gradient_holders = [] + for idx in range(len(self.tvars)): + placeholder = tf.placeholder(tf.float32, name=str(idx) + '_holder') + self.gradient_holders.append(placeholder) - global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'global') - optimizer = tf.train.AdamOptimizer(learning_rate=lr) + global_vars = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, 'global') + optimizer = tf.train.AdamOptimizer(learning_rate=lr) - self.update_global = optimizer.apply_gradients(zip(self.gradient_holders, global_vars)) # self.tvars + self.update_global = optimizer.apply_gradients(zip(self.gradient_holders, global_vars)) # self.tvars def get_policy(self, sess, state): return sess.run(self.policy, feed_dict={self.state_in: [state.reshape(-1)]}) - def get_policies(self, sess, game_state): - policies = np.zeros(game_state[0].shape + (5,)) - for (y, x), k in np.ndenumerate(game_state[0]): - if k == 1: - policies[y][x] = self.get_policy(sess, self.state.get_local_and_normalize(game_state, x, y)) - return policies - def choose_action(self, sess, state, train=True): # Here the state is normalized ! if train: # keep randomness @@ -66,40 +56,16 @@ def choose_action(self, sess, state, train=True): a = sess.run(self.predict, feed_dict={self.state_in: [state.reshape(-1)]}) return a - def choose_actions(self, sess, game_state, train=True): + def choose_actions(self, sess, local_game_state_n, train=True): """Choose all actions using one call to tensorflow""" - g = build_graph_from_state(game_state[0]) - dist_dict, closest_dict = dijkstra(g.g, 0) - all_game_state_n = np.array([]).reshape(0, self.state.local_size) - moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 - moves_2 = np.zeros_like(game_state[0], dtype=np.int64) - 1 - moves_where = [] - for (y, x), k in np.ndenumerate(game_state[0]): - if k == 1: - if (y, x) in dist_dict and dist_dict[(y, x)] in [1, 2]: - moves_where += [(y, x)] - game_state_n = self.state.get_local_and_normalize(game_state, x, y).reshape(1, - self.state.local_size) - all_game_state_n = np.concatenate((all_game_state_n, game_state_n), axis=0) - else: - if game_state[1][y][x] > 10: # Set a minimum strength - y_t, x_t = y, x - y_t, x_t = closest_dict[(y_t, x_t)] - # while closest_dict[(y_t,x_t)]!=0: Pbbly unnecessary - # y_t, x_t = closest_dict[(y_t, x_t)] - moves_2[y][x] = move_to(path_to((x, y), (x_t, y_t), len(game_state[0][0]), len(game_state[0]))) if train: - actions = sess.run(self.policy, feed_dict={self.state_in: all_game_state_n}) + actions = sess.run(self.policy, feed_dict={self.state_in: local_game_state_n}) actions = [np.argmax(action == np.random.choice(action, p=action)) for action in actions] else: - actions = sess.run(self.predict, feed_dict={self.state_in: all_game_state_n}) - for (y, x), d in zip(moves_where, actions): - moves[y][x] = d - return moves, moves_2 - - def update_agent(self, sess): - states, moves, rewards = self.experience.batch() + actions = sess.run(self.predict, 
feed_dict={self.state_in: local_game_state_n}) + return actions + def update_agent(self, sess, states, moves, rewards): feed_dict = {self.state_in: states, self.action_holder: moves, self.reward_holder: rewards} diff --git a/public/models/bot/TrainedBot.py b/public/models/bot/TrainedBot.py deleted file mode 100644 index 3d72e5e..0000000 --- a/public/models/bot/TrainedBot.py +++ /dev/null @@ -1,26 +0,0 @@ -"""The Trained Bot""" -from public.hlt import format_moves -from public.models.agent.Agent import start_agent -from public.models.bot.Bot import Bot -from public.state.state import get_game_state - - -class TrainedBot(Bot): - """The trained bot""" - - def __init__(self, agent_class, state): - self.sess, self.agent = start_agent(agent_class, state) - - def compute_moves(self, game_map): - """Compute the moves given a game_map""" - game_state = get_game_state(game_map, self.my_id) - moves1, moves2 = self.agent.choose_actions(self.sess, game_state, train=False) - return format_moves(game_map, -(moves1*moves2)) - - def get_policies(self, game_state): - """Compute the policies given a game_state""" - return self.agent.get_policies(self.sess, game_state) - - def close(self): - """Close the tensorflow session""" - self.sess.close() diff --git a/public/models/bot/ImprovedBot.py b/public/models/strategy/ImprovedStrategy.py similarity index 89% rename from public/models/bot/ImprovedBot.py rename to public/models/strategy/ImprovedStrategy.py index c655eea..e35d2ce 100644 --- a/public/models/bot/ImprovedBot.py +++ b/public/models/strategy/ImprovedStrategy.py @@ -2,10 +2,10 @@ import random from public.hlt import Move, NORTH, STILL, WEST -from public.models.bot.Bot import Bot +from public.models.strategy.Strategy import Strategy -class ImprovedBot(Bot): +class ImprovedStrategy(Strategy): def compute_moves(self, game_map): """Compute the moves given a game_map""" moves = [] diff --git a/public/models/bot/RandomBot.py b/public/models/strategy/RandomStrategy.py similarity index 79% rename from public/models/bot/RandomBot.py rename to public/models/strategy/RandomStrategy.py index 1827185..e0f0424 100644 --- a/public/models/bot/RandomBot.py +++ b/public/models/strategy/RandomStrategy.py @@ -2,10 +2,10 @@ import random from public.hlt import EAST, Move, NORTH, SOUTH, STILL, WEST -from public.models.bot.Bot import Bot +from public.models.strategy.Strategy import Strategy -class RandomBot(Bot): +class RandomStrategy(Strategy): def compute_moves(self, game_map): """Compute the moves given a game_map""" return [Move(square, random.choice((NORTH, EAST, SOUTH, WEST, STILL))) for square in game_map if diff --git a/public/models/bot/Bot.py b/public/models/strategy/Strategy.py similarity index 89% rename from public/models/bot/Bot.py rename to public/models/strategy/Strategy.py index f553835..69a932b 100644 --- a/public/models/bot/Bot.py +++ b/public/models/strategy/Strategy.py @@ -1,5 +1,5 @@ """The General Bot class""" -class Bot: +class Strategy: def compute_moves(self, game_map): pass diff --git a/public/models/strategy/TrainedStrategy.py b/public/models/strategy/TrainedStrategy.py new file mode 100644 index 0000000..b94465b --- /dev/null +++ b/public/models/strategy/TrainedStrategy.py @@ -0,0 +1,119 @@ +"""The Trained Bot""" +import json +import os +import warnings + +import numpy as np +import tensorflow as tf +from tensorflow.python.framework.errors_impl import InvalidArgumentError + +from public.hlt import format_moves +from public.models.agent.VanillaAgent import VanillaAgent +from 
public.models.strategy.Strategy import Strategy +from public.state.state import get_game_state, State1 +from public.util.dijkstra import build_graph_from_state, dijkstra +from public.util.path import move_to, path_to +from train.experience import ExperienceVanilla +from train.reward.reward import Reward + + +class TrainedStrategy(Strategy): + """The trained strategy""" + + def __init__(self, tf_scope='global'): + os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' + tf.reset_default_graph() + warnings.filterwarnings("ignore") + + config = open(os.path.abspath(os.path.join(os.path.dirname(__file__), '../../strategy.json'))).read() + config = json.loads(config) + self.name = config["saving_name"] + self.state = State1(scope=config["agent"]["scope"]) + self.reward = Reward(state=self.state) + self.experience = ExperienceVanilla(self.state.local_size, self.name) + with tf.variable_scope(tf_scope): + self.agent1 = VanillaAgent(s_size=self.state.local_size, h_size=config["agent"]["h_size"]) + + def init_session(self, sess=None, saver=None): + if sess is None: + global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') + self.saver = tf.train.Saver(global_variables) + init = tf.global_variables_initializer() + self.sess = tf.Session() + self.sess.run(init) + try: + self.saver.restore(self.sess, os.path.abspath( + os.path.join(os.path.dirname(__file__), '..')) + + '/variables/' + self.name + '/' + + self.name) + except InvalidArgumentError: + print("Model not found - initiating new one") + else: + self.sess = sess + self.saver = saver + + def set_id(self, my_id): + super(TrainedStrategy, self).set_id(my_id) + self.agent1_moves = [] + self.agent1_game_states = [] + + def compute_moves(self, game_map, train=False): + """Compute the moves given a game_map""" + game_state = get_game_state(game_map, self.my_id) + g = build_graph_from_state(game_state[0]) + dist_dict, closest_dict = dijkstra(g.g, 0) + self.agent1_game_states += [game_state] + self.agent1_moves += [np.zeros_like(game_state[0], dtype=np.int64) - 1] + self.agent1_local_game_states = np.array([]).reshape(0, self.state.local_size) + dijkstra_moves = np.zeros_like(game_state[0], dtype=np.int64) - 1 + agent1_positions = [] + for (y, x), k in np.ndenumerate(game_state[0]): + if k == 1: + if (y, x) in dist_dict and dist_dict[(y, x)] in [1, 2]: + agent1_positions += [(y, x)] + game_state_n = self.state.get_local_and_normalize(game_state, x, y).reshape(1, + self.state.local_size) + self.agent1_local_game_states = np.concatenate((self.agent1_local_game_states, game_state_n), + axis=0) + else: + if game_state[1][y][x] > 10: # Set a minimum strength + y_t, x_t = y, x + y_t, x_t = closest_dict[(y_t, x_t)] + dijkstra_moves[y][x] = move_to( + path_to((x, y), (x_t, y_t), len(game_state[0][0]), len(game_state[0]))) + actions = self.agent1.choose_actions(self.sess, self.agent1_local_game_states, train) + for (y, x), d in zip(agent1_positions, actions): + self.agent1_moves[-1][y][x] = d + + return format_moves(game_map, -(self.agent1_moves[-1] * dijkstra_moves)) + + def add_episode(self): + all_states, all_moves, all_rewards = self.reward.all_rewards_function(self.agent1_game_states, + self.agent1_moves) + self.experience.add_episode(self.agent1_game_states, all_states, all_moves, all_rewards) + + def update_agent(self): + train_states, train_moves, train_rewards = self.experience.batch() + self.agent1.update_agent(self.sess, train_states, train_moves, train_rewards) + + def save(self): + directory = 
os.path.abspath(os.path.join(os.path.dirname(__file__), '..')) + '/variables/' + self.name + if not os.path.exists(directory): + print("Creating directory named :" + self.name) + os.makedirs(directory) + self.saver.save(self.sess, directory + '/' + self.name) + self.experience.save_metric(directory + '/' + self.name) + + def get_policies(self, game_states): + policies = [] + for game_state in game_states: + policies += [np.zeros(game_state[0].shape + (5,))] + for (y, x), k in np.ndenumerate(game_state[0]): + if k == 1: + policies[-1][y][x] = self.agent1.get_policy( + self.sess, self.state.get_local_and_normalize(game_state, x, y)) + return np.array(policies) + + def close(self): + """Close the tensorflow session""" + self.sess.close() diff --git a/public/strategy.json b/public/strategy.json new file mode 100644 index 0000000..54a1625 --- /dev/null +++ b/public/strategy.json @@ -0,0 +1,11 @@ +{ + "agent":{ + "type":"vanilla", + "scope":2, + "h_size":200 + }, + "dijkstra":{ + "scope":2 + }, + "saving_name": "vanilla-scope-2" +} \ No newline at end of file diff --git a/tests/reward_test.py b/tests/reward_test.py index 868d8c4..268c256 100644 --- a/tests/reward_test.py +++ b/tests/reward_test.py @@ -32,7 +32,8 @@ def test_reward(self): game_url = 'https://s3.eu-central-1.amazonaws.com/halite-python-rl/hlt-games/trained-bot.hlt' game_states, moves = game_states_from_url(game_url) - r = Reward(State1()) + s = State1() + r = Reward(s) raw_rewards = r.raw_rewards_function(game_states) self.assertTrue(len(raw_rewards) == len(game_states) - 1) @@ -40,9 +41,10 @@ def test_reward(self): self.assertTrue(len(all_states) >= len(game_states) - 1) self.assertTrue(len(all_moves) >= len(moves)) self.assertTrue(len(all_rewards) == len(all_moves) and len(all_states) == len(all_moves)) - experience = ExperienceVanilla(r) - experience.add_episode(game_states, moves) - experience.add_episode(game_states, moves) + + experience = ExperienceVanilla(s.local_size, name='') + experience.add_episode(game_states, all_states, all_moves, all_rewards) + experience.add_episode(game_states, all_states, all_moves, all_rewards) self.assertTrue(len(experience.moves) == 2 * len(all_moves)) batch_states, batch_moves, batch_rewards = experience.batch() self.assertTrue(len(batch_rewards) == len(batch_moves) and len(batch_states) == len(batch_moves)) diff --git a/tests/tensorflow_call_speed_test.py b/tests/tensorflow_call_speed_test.py index 3d43572..2c7227d 100644 --- a/tests/tensorflow_call_speed_test.py +++ b/tests/tensorflow_call_speed_test.py @@ -2,14 +2,11 @@ import numpy as np import pytest -from public.models.agent.Agent import start_agent -from public.models.agent.VanillaAgent import VanillaAgent -from public.state.state import State1 +from public.models.strategy.TrainedStrategy import TrainedStrategy from tests.util import game_states_from_file -def tensorflow_naive(game_states, sess, agent): - state = State1(scope=1) +def tensorflow_naive(game_states, sess, agent, state): for game_state in game_states: for y in range(len(game_state[0])): for x in range(len(game_state[0][0])): @@ -18,8 +15,7 @@ def tensorflow_naive(game_states, sess, agent): sess.run(agent.policy, feed_dict={agent.state_in: game_state_n}) -def tensorflow_combined(game_states, sess, agent): - state = State1(scope=1) +def tensorflow_combined(game_states, sess, agent, state): for game_state in game_states: all_game_state_n = np.array([]).reshape(0, state.local_size) for y in range(len(game_state[0])): @@ -35,9 +31,10 @@ def 
test_tensorflow_naive_speed(benchmark): """ Benchmark the time of dijsktra """ - sess, agent = start_agent(VanillaAgent, State1(scope=1)) + bot = TrainedStrategy() + bot.init_session() game_states, _ = game_states_from_file() - benchmark(tensorflow_naive, game_states=game_states, sess=sess, agent=agent) + benchmark(tensorflow_naive, game_states=game_states, sess=bot.sess, agent=bot.agent1, state=bot.state) assert True @@ -46,7 +43,8 @@ def test_tensorflow_combined_speed(benchmark): """ Benchmark the time of dijsktra """ - sess, agent = start_agent(VanillaAgent, State1(scope=1)) + bot = TrainedStrategy() + bot.init_session() game_states, _ = game_states_from_file() - benchmark(tensorflow_combined, game_states=game_states, sess=sess, agent=agent) + benchmark(tensorflow_combined, game_states=game_states, sess=bot.sess, agent=bot.agent1, state=bot.state) assert True diff --git a/tests/util.py b/tests/util.py index 0348eed..7e8ca09 100644 --- a/tests/util.py +++ b/tests/util.py @@ -1,7 +1,8 @@ """Importing the game from aws""" -import os import json +import os import urllib.request + import numpy as np @@ -30,9 +31,9 @@ def text_to_game(text): return game_states, moves -def game_states_from_file(filepath=None): - path_to_hlt = 'visualize/hlt/' if filepath is None else filepath # 'visualize/hlt/' +def game_states_from_file(): + path_to_hlt = os.path.abspath(os.path.join(os.path.dirname(__file__), '../visualize/hlt/')) # 'visualize/hlt/' - hlt_files = [hlt_file for hlt_file in os.listdir(path_to_hlt) if hlt_file != '.DS_Store'] + hlt_files = [hlt_file for hlt_file in os.listdir(path_to_hlt) if hlt_file not in ['.DS_Store', 'README.md']] filepath = hlt_files[0] - return text_to_game(open(path_to_hlt + filepath).read()) + return text_to_game(open(path_to_hlt + '/' + filepath).read()) diff --git a/train/experience.py b/train/experience.py index 55707a3..8f83d25 100644 --- a/train/experience.py +++ b/train/experience.py @@ -1,6 +1,8 @@ """ Experience class definition """ +import os + import numpy as np from train.reward.util import production_increments_function @@ -11,13 +13,15 @@ class Experience: Experience class to store moves, rewards and metric values """ - def __init__(self): + def __init__(self, max_size=10000, min_size=5000): + self.max_size = max_size + self.min_size = min_size self.moves = np.array([]) self.rewards = np.array([]) self.metric = np.array([]) - def add_episode(self, game_states, moves): + def add_episode(self, game_states, all_states, all_moves, all_rewards): pass def batch(self, size): @@ -36,18 +40,32 @@ class ExperienceVanilla(Experience): Stores states in addition to the inherited attributes of Experience """ - def __init__(self, reward): + def __init__(self, s_size, name): super(ExperienceVanilla, self).__init__() - self.reward = reward - self.states = np.array([]).reshape(0, self.reward.state.local_size) + self.s_size = s_size + self.states = np.array([]).reshape(0, s_size) + try: + self.metric = np.load(os.path.abspath( + os.path.join(os.path.dirname(__file__), '..')) + + '/public/models/variables/' + name + '/' + + name + '.npy') + except FileNotFoundError: + print("Metric file not found") + self.metric = np.array([]) - def add_episode(self, game_states, moves): + def add_episode(self, game_states, all_states, all_moves, all_rewards): self.compute_metric(game_states) - all_states, all_moves, all_rewards = self.reward.all_rewards_function(game_states, moves) - self.states = np.concatenate((self.states, all_states.reshape(-1, self.reward.state.local_size)), axis=0) + 
self.states = np.concatenate((self.states, all_states.reshape(-1, self.s_size)), axis=0) self.moves = np.concatenate((self.moves, all_moves)) self.rewards = np.concatenate((self.rewards, all_rewards)) + if len(self.states) >= self.max_size: + self.resize() + + def resize(self): + self.states = self.states[self.min_size:] + self.moves = self.moves[self.min_size:] + self.rewards = self.rewards[self.min_size:] def batch(self, size=128): indices = np.random.randint(len(self.states), size=min(int(len(self.states) / 2), size)) diff --git a/train/main.py b/train/main.py index 981b541..1fbbcfe 100644 --- a/train/main.py +++ b/train/main.py @@ -6,50 +6,37 @@ import tensorflow as tf from tensorflow.python.framework.errors_impl import InvalidArgumentError -os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: - from public.models.agent.VanillaAgent import VanillaAgent - from public.state.state import State1 - from train.experience import ExperienceVanilla + from public.models.strategy.TrainedStrategy import TrainedStrategy from train.worker import Worker - from train.reward.reward import Reward except: raise port = int(sys.argv[1]) if len(sys.argv) > 1 else 2000 -tf.reset_default_graph() # Clear the Tensorflow graph. +strategy = TrainedStrategy(tf_scope='global') -with tf.device("/cpu:0"): - with tf.variable_scope('global'): - state = State1(scope=2) - master_experience = ExperienceVanilla(Reward(state)) - master_agent = VanillaAgent(state, master_experience) +num_workers = 1 +n_simultations = 100 - num_workers = 8 - n_simultations = 5000 - - workers = [] - if num_workers > 1: - for i in range(num_workers): - with tf.variable_scope('worker_' + str(i)): - state = State1(scope=2) - experience = ExperienceVanilla(Reward(state)) - workers.append(Worker(port, i, VanillaAgent(state, experience))) - else: - workers.append(Worker(port, 0, master_agent)) - # We need only to save the global - global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') - saver = tf.train.Saver(global_variables) - init = tf.global_variables_initializer() +workers = [] +if num_workers > 1: + for i in range(num_workers): + workers.append(Worker(port, i, TrainedStrategy(tf_scope='worker_' + str(i)))) +else: + workers.append(Worker(port, 0, strategy)) +# We need only to save the global +global_variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='global') +saver = tf.train.Saver(global_variables) +init = tf.global_variables_initializer() # Launch the tensorflow graph with tf.Session() as sess: sess.run(init) try: saver.restore(sess, os.path.abspath( - os.path.dirname(__file__)) + '/../public/models/variables/' + master_agent.name + '/' + master_agent.name) + os.path.dirname(__file__)) + '/../public/models/variables/' + strategy.name + '/' + strategy.name) except InvalidArgumentError: print("Model not found - initiating new one") diff --git a/train/worker.py b/train/worker.py index 9abd146..fec8b34 100644 --- a/train/worker.py +++ b/train/worker.py @@ -1,14 +1,11 @@ """The worker class for training and parallel operations""" import multiprocessing -import os import time import tensorflow as tf from networking.hlt_networking import HLT from networking.start_game import start_game -from public.hlt import format_moves -from public.state.state import get_game_state def update_target_graph(from_scope, to_scope): @@ -27,7 +24,7 @@ class Worker(): Each of them work with the global session, and use the global saver. 
""" - def __init__(self, port, number, agent): + def __init__(self, port, number, strategy): self.name = 'worker_' + str(number) self.number = number self.port = port + number @@ -41,8 +38,7 @@ def worker(): time.sleep(1) self.hlt = HLT(port=self.port) # Launching the pipe operation - self.agent = agent - + self.strategy = strategy self.update_local_ops = update_target_graph('global', self.name) def work(self, sess, saver, n_simultations): @@ -51,13 +47,13 @@ def work(self, sess, saver, n_simultations): the worker works `n_simultations` games to train the agent :param sess: The global session - :param saver: The saver :param n_simultations: Number of max simulations to run. Afterwards the process is stopped. :return: """ print("Starting worker " + str(self.number)) + self.strategy.init_session(sess, saver) with sess.as_default(), sess.graph.as_default(): for i in range(n_simultations): # while not coord.should_stop(): if i % 10 == 1 and self.number == 0: @@ -65,28 +61,14 @@ def work(self, sess, saver, n_simultations): sess.run(self.update_local_ops) # GET THE WORK DONE FROM OTHER my_id, game_map = self.hlt.get_init() self.hlt.send_init("MyPythonBot") - - moves = [] - game_states = [] + self.strategy.set_id(my_id) while self.hlt.get_string() == 'Get map and play!': game_map.get_frame(self.hlt.get_string()) - game_states += [get_game_state(game_map, my_id)] - moves1, moves2 = self.agent.choose_actions(sess, game_states[-1]) - moves += [moves1] # We only train on this - self.hlt.send_frame(format_moves(game_map, -(moves1 * moves2))) + self.hlt.send_frame(self.strategy.compute_moves(game_map, train=True)) - self.agent.experience.add_episode(game_states, moves) - self.agent.update_agent(sess) + self.strategy.add_episode() + self.strategy.update_agent() if self.number == 0: - directory = os.path.abspath( - os.path.join(os.path.dirname(__file__), '..')) \ - + '/public/models/variables/' \ - + self.agent.name + '/' - if not os.path.exists(directory): - print("Creating directory for agent :" + self.agent.name) - os.makedirs(directory) - saver.save(sess, directory + self.agent.name) - self.agent.experience.save_metric(directory + self.agent.name) - + self.strategy.save() self.p.terminate() diff --git a/visualize/hlt/example.hlt b/visualize/hlt/example.hlt deleted file mode 100644 index 50622e2..0000000 --- a/visualize/hlt/example.hlt +++ /dev/null @@ -1 +0,0 @@ 
-{"frames":[[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,5],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,10],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,15],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,20],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,25],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[
0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,30],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,35],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,40],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,45],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,50],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,55],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24]
,[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[0,5],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,60],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[0,9],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,55],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,0],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[0,16],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[1,46],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,0],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,5],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[0,19],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,30],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,4],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,10],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,11],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,0],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,14],[1,5],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,8],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,15],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0
,18],[1,17],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,6],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,12],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,20],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,23],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,12],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,16],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,25],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,29],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,18],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,20],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,30],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,35],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,24],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,24],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,35],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[0,16],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,41],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[0,16],[1,30],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,28],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[0,6],[1,40],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,25],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,18],[1,0],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,14],[1,0],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[0,10],[1,32],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[1
,34],[1,0],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,32],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,4],[1,6],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,6],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,9],[1,5],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[0,11],[1,22],[1,0],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[1,39],[1,5],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,39],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,4],[1,12],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,12],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,4],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[0,5],[1,11],[1,0],[1,4],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[0,7],[1,44],[1,10],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[0,23],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,0],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[0,4],[1,18],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,18],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,4],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[0,3],[1,6],[1,0],[1,3],[1,8],[0,5],[0,6],[0,5],[0,3]],[[0,2],[0,3],[0,4],[1,37],[1,0],[1,15],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[0,10],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[1,16],[0,22],[0,18],[0,15],[0,10]]],[[[0,3],[0,3],[0,6],[0,16],[0,16],[1,7],[0,15],[0,9],[0,6],[0,4]],[[0,2],[0,2],[0,3],[0,11],[1,14],[1,0],[0,16],[0,9],[0,5],[0,2]],[[0,2],[0,2],[0,3],[0,9],[1,0],[1,24],[0,14],[0,9],[0,4],[0,2]],[[0,2],[0,2],[0,4],[0,11],[0,4],[1,0],[0,8],[0,7],[0,5],[0,3]],[[0,2],[1,3],[1,0],[1,2],[1,6],[1,0],[1,3],[0,6],[0,5],[0,3]],[[0,2],[0,3],[1,33],[1,0],[1,5],[1,0],[0,9],[0,9],[0,6],[0,4]],[[0,2],[0,3],[0,5],[0,7],[0,8],[1,5],[0,18],[0,20],[0,9],[0,5]],[[0,3],[0,3],[0,5],[0,9],[0,10],[0,16],[0,24],[0,24],[0,10],[0,6]],[[0,4],[0,4],[0,8],[0,21],[0,20],[0,23],[0,23],[0,24],[0,14],[0,9]],[[0,5],[0,4],[0,10],[0,25],[0,24],[1,20],[0,22],[0,18],[0,15],[0,10]]]],"height":10,"moves":[[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,
0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0
,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,4,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,1,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,4,0,0,0,0,0,0],[0,0,0,0,4,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]],[[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,0,0,1,0,0,0,0,0],[0,0,0,0,0,4,0,0,0,0],[0,0,4,0,0,2,0,0,0,0],[0,0,0,4,0,3,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0],[0,0,0,0,0,0,0,0,0,0]]],"num_frames":26,"num_players":1,"player_names":["MyBot"],"productions":[[4,6,5,3,4,7,6,4,2,3],[4,3,3,3,5,6,4,2,2,3],[4,4,3,4,7,6,4,2,3,3],[4,4,3,3,5,5,4,3,4,4],[5,3,2,2,3,4,4,4,4,4],[5,4,3,4,5,5,5,4,4,4],[7,5,5,4,4,4,6,5,3,4],[7,6,4,3,3,4,6,5,2,3],[3,3,3,2,2,4,9,8,3,2],[3,4,3,2,2,4,7,6,3,2]],"version":11,"width":10} \ No newline at end of file diff --git a/visualize/visualize.py b/visualize/visualize.py index e97dd55..7d42fb4 100755 --- a/visualize/visualize.py +++ b/visualize/visualize.py @@ -11,13 +11,11 @@ import pandas as pd from flask import Flask, render_template, request, make_response, send_from_directory - sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), '..'))) try: from train.reward.reward import Reward from public.state.state import State1 - from public.models.bot.TrainedBot import TrainedBot - from public.models.agent.VanillaAgent import VanillaAgent + from public.models.strategy.TrainedStrategy import TrainedStrategy except: raise @@ -76,7 +74,8 @@ def performance_plot(): fig = Figure() sub1 = fig.add_subplot(111) path_to_variables = os.path.abspath(os.path.dirname(__file__)) + '/../public/models/variables/' - list_variables = [name for name in os.listdir(path_to_variables) if name != "README.md"] + list_variables = [name for name in os.listdir(path_to_variables) if name not in [".DS_Store", "README.md"]] + path_to_npy = [path_to_variables + name + '/' + name + '.npy' for name in list_variables] rewards = [np.load(path) for path in path_to_npy] @@ -137,6 +136,7 @@ def post_discounted_rewards(): @app.route('/post_policies', methods=['POST']) def post_policies(): game_states, _ = convert(request) - bot = TrainedBot(VanillaAgent, State1(scope=2)) - policies = np.array([bot.get_policies(game_state) for game_state in game_states]) + bot = TrainedStrategy() + bot.init_session() + policies = bot.get_policies(game_states) return json.dumps({'policies': policies.tolist()}) From d8fc62db1894614d9e5a21126125da93007d8d76 Mon Sep 17 00:00:00 2001 From: Edouard360 Date: Thu, 26 Oct 2017 08:45:37 +0200 Subject: [PATCH 44/45] Reverting attempt --- src/networking/Networking.cpp | 4 ---- 1 file changed, 4 
deletions(-) diff --git a/src/networking/Networking.cpp b/src/networking/Networking.cpp index 29a8ada..a862fea 100644 --- a/src/networking/Networking.cpp +++ b/src/networking/Networking.cpp @@ -326,12 +326,8 @@ int Networking::handleInitNetworking(unsigned char playerTag, const hlt::Map & m std::string response; try { - std::string readyToReplay = getString(playerTag); - std::cout << "I'm ready to replay"< Date: Sun, 26 Nov 2017 17:19:57 +0100 Subject: [PATCH 45/45] Updating dijkstra post --- docs/_posts/2017-10-08-dijkstra.markdown | 65 ++++++++---------------- 1 file changed, 20 insertions(+), 45 deletions(-) diff --git a/docs/_posts/2017-10-08-dijkstra.markdown b/docs/_posts/2017-10-08-dijkstra.markdown index 18f8598..33627ec 100644 --- a/docs/_posts/2017-10-08-dijkstra.markdown +++ b/docs/_posts/2017-10-08-dijkstra.markdown @@ -9,59 +9,34 @@ categories: main {% include center.css %} -# The Dijsktra algorithm - -## The power of Dijsktra - -We had dealt with the problem of border squares, learning with a neural network. +# Expansion at the border -Dijsktra algorithm, which runs here in linear time, gives us the ability +As detailed in the previous blog articles, we jointly train the individual agents at the border of the map. As you can see below, we obtain +an agent that performs well at small scale. Indeed, it has learnt to conquer the **highly productive squares** (the bright ones) **first**. -## Our currently trained vanilla model with scope 1

+ +

-Trained for 3h with 8 workers, each simulating 2500 games. The batch_size was 128. +To assess the confidence of our agent, we can look at the **entropy** of the learnt policy. For convenience, we implemented an interface that displays the **softmax probabilities** at time t when you click on an agent. +We can see the five probabilities associated with the NORTH, EAST, SOUTH, WEST and STILL moves, and how the agent greedily selects among them.

+ +

-``` -self.agent.experience.add_episode(game_states, moves) -self.agent.update_agent(sess) -``` - -Parameters: - -``` -lr=1e-3, a_size=5, h_size=50 -``` +# The Dijsktra algorithm -The main features of the reward function were the following: +## The power of Dijsktra -``` -np.array([np.power(get_prod(game_states[i + 1]) - get_prod(game_states[i]),1) - for i in range(len(game_states) - 1)]) -... -discount_factor=0.6 -... -discounted_rewards[t] = running_reward - 0.1 -``` +We had dealt with the problem of border squares, learning with a neural network. -And finally the chosen state was: +The Dijsktra algorithm, which runs here in linear time, gives us the ability to handle the squares in the middle of the map: -``` -State1(scope=2) -``` +Now, only the borders's behaviour is determined by our trained policy. We adopt a **deterministic strategy for the interior of the map**. -## With scope 2 +

+ + +

-``` -State1(scope=2) -lr=1e-4, a_size=5, h_size=200 -width=25, height=25, max_turn=50,max_strength=60 -... -0.1 * np.power(get_prod(game_states[i + 1]) - get_prod(game_states[i]), 2) -... -discount_factor=0.6 -... -0.01 -discounted_rewards[t] = running_reward - 0.01 -``` \ No newline at end of file
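As a complement to the entropy discussion in the updated Dijkstra post, here is a minimal sketch (not part of the patch itself) of how the per-square confidence could be computed from the softmax probabilities returned by `TrainedStrategy.get_policies`. The `(height, width, 5)` shape matches the per-state array built in that method; the helper names below are illustrative assumptions rather than project code.

```
import numpy as np


def policy_entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a probability vector over the five moves."""
    probs = np.asarray(probs, dtype=np.float64)
    return float(-np.sum(probs * np.log(probs + eps)))


def confidence_map(policies):
    """Per-square entropy for one game state; low entropy means a near-greedy agent."""
    height, width, _ = policies.shape
    entropy = np.zeros((height, width))
    for y in range(height):
        for x in range(width):
            entropy[y, x] = policy_entropy(policies[y, x])
    return entropy


# A uniform policy has maximal entropy log(5) ~ 1.61,
# while a near-deterministic policy has entropy close to 0.
print(policy_entropy([0.2] * 5))                       # ~1.61
print(policy_entropy([0.96, 0.01, 0.01, 0.01, 0.01]))  # ~0.22
```

Squares whose entropy is close to zero are exactly those where the displayed softmax bar is dominated by a single move, i.e. where the agent acts almost greedily.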