Merge pull request #15 from garethjns/initial_dqn
Adding dueling DQN, update readme
garethjns authored May 11, 2020
2 parents ced26ed + eff1d3a commit 4b6541d
Showing 4 changed files with 96 additions and 48 deletions.
59 changes: 39 additions & 20 deletions README.MD
@@ -3,28 +3,35 @@

Social distancing is an unfortunately unclear term; it means staying away from other people to avoid killing yourself and them.

But why?

![Example cats vs responsible](https://github.com/garethjns/social-distancing-sim/blob/master/images/joined.gif)

This package models disease spread through a population, allowing modification of many dynamics affecting spread. These simulations can be viewed as animations, or run many times to collect statistics. The simulation supports agent input, and can test the effect of policies such as mass vaccination, social distancing, and isolation. Some examples are shown below.
This package models disease spread through a population, allowing modification of many dynamics affecting spread. These simulations can be viewed as animations, or run many times to collect statistics, evaluate response strategies, etc. The simulation supports agent input, from either scripted policy agents enacting strategies such as mass vaccination and social distancing, or reinforcement learning agents that have learned their own strategies through experience.

The code aims to be as simple and understandable as possible, but is still WIP (along with the documentation). The documentation is mainly example driven; see below and the scripts/ folder for up-to-date usage examples.

# Population dynamics
![Example cats vs responsible](https://github.com/garethjns/social-distancing-sim/blob/master/images/joined.gif)

# Simulation dynamics

The dynamics of this simulation aim to be simple but interesting, with enough scope in the parameters to run experiments on many different environment setups.

Populations are randomly generated using a [networkx.random_partition_graph](https://networkx.github.io/documentation/stable/reference/generated/networkx.generators.community.random_partition_graph.html#networkx.generators.community.random_partition_graph). This creates a network consisting of communities where individual members have a given chance to be connected. Each individual member also has a lower chance to be connected to members of other communities.
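As a concrete illustration, the snippet below builds such a partitioned graph directly with networkx; the community sizes and connection probabilities are arbitrary example values rather than the package's defaults.

````python
import networkx as nx

# Three communities of 10 nodes each: dense connections within a community (p_in),
# sparse connections between communities (p_out). Values are illustrative only.
g = nx.random_partition_graph([10, 10, 10], p_in=0.2, p_out=0.01, seed=123)

print(g.number_of_nodes(), g.number_of_edges())
print(g.graph['partition'])  # community membership is stored as a graph attribute
````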

The connections between individuals (graph nodes) define opportunities for a member to infect another. Each day (step) every infected node has one chance to infect each of its neighbours; the chance of this happening is defined by the disease virulence.

Each day, infected nodes also have the chance to end their infection. The probability of this happening grows with the length of time the individual has been infected. If the infection ends, the individual either recovers and gains immunity, or dies. The chance of recovery is defined by the recovery rate of the disease, modified by the current burden on the healthcare system. When the healthcare system is below capacity, no penalty is applied to the recovery rate. When it's above, the recovery rate is reduced proportionally to the size of the burden. If a node survives, it may gain imperfect immunity that decays with time.

In addition to communities, populations define a healthcare capacity. When the number of infected nodes exceeds this capacity, the recovery rate from the disease is reduced.

The connections between individuals (graph nodes) define opportunities for a member to infect another. Each day (step) every infected node has one chance to infect each of its neighbours, the chance of this happening is defined by the disease virulence.

Each day, infected nodes also have the chance to end their infection. The chance of this happening grows with the length of time the individual has been infected. If the infection ends, the individual either recovers and gains immunity, or dies. The chance of recovery is defined by the recovery rate of the disease, modified by the current burden on the healthcare system. When the healthcare system is below capacity, no penalty is applied to the recovery rate. When it's above, the recovery rate is reduced proportionally to the size of the burden.
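To make these dynamics concrete, here is a minimal, self-contained sketch of one simulated day following the rules above. The graph, probabilities and bookkeeping are illustrative assumptions, not the package's actual implementation.

````python
import random

import networkx as nx

# Illustrative parameters only, not the package's defaults.
VIRULENCE = 0.01            # per-contact infection probability
BASE_RECOVERY_RATE = 0.95   # chance an ending infection resolves in recovery
HEALTHCARE_CAPACITY = 10    # infections treatable without penalty

g = nx.random_partition_graph([20, 20, 20], p_in=0.2, p_out=0.01, seed=0)
infected = {0: 1}           # node id -> days infected
recovered, dead = set(), set()

def step() -> None:
    # 1) Spread: each infected node gets one chance to infect each susceptible neighbour.
    for node in list(infected):
        for neighbour in g.neighbors(node):
            if neighbour not in infected and neighbour not in recovered and neighbour not in dead:
                if random.random() < VIRULENCE:
                    infected[neighbour] = 0

    # 2) Resolve: the chance of an infection ending grows with its duration;
    #    the recovery rate is penalised in proportion to the healthcare burden.
    burden = len(infected) / HEALTHCARE_CAPACITY
    recovery_rate = BASE_RECOVERY_RATE / max(1.0, burden)
    for node, days in list(infected.items()):
        if random.random() < min(1.0, 0.1 * days):
            del infected[node]
            (recovered if random.random() < recovery_rate else dead).add(node)
        else:
            infected[node] += 1
````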
# Agent interaction

The simulation environment defines an action space that allows agents to perform actions each turn and influence disease spread. This interface supports basic agents (social_distancing_sim.agent), "policy" agents with hardcoded logic, and reinforcement learning agents (supporting the OpenAI Gym API).

Agents are able to perform treatment, isolate, reconnect and vaccinate actions. Basic agents typically perform single actions in a semi-targeted fashion, and "policy" agents support multiple basic agents operating over different time periods. This allows for definition and experimentation with different strategies for managing outbreaks. (Note that "policy" here refers to a scripted strategy like isolating early, vaccinating when available, reconnecting nodes later on, etc., rather than a reinforcement learning agent's learned policy.)

A flexible scoring system allows setting action costs and environment rewards and penalties. This can be used for agent/policy evaluation, and for training of the included RL agents (social_distancing_sim.gym.agent).
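For orientation, the sketch below shows what this interaction loop looks like through the standard Gym API. The environment id is a made-up placeholder; see social_distancing_sim.gym for the real environment specs.

````python
import gym

# Hypothetical environment id, for illustration only; the registered
# environments live in social_distancing_sim.gym.
env = gym.make("SDS-small-v0")

observation = env.reset()
total_reward, done = 0.0, False
while not done:
    action = env.action_space.sample()               # stand-in for an agent's decision
    observation, reward, done, _ = env.step(action)  # reward reflects action costs and env penalties
    total_reward += reward

print(total_reward)
````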

## v0.4.0 Features and supported dynamics

## v0.7.1 Features and supported dynamics
- [NetworkX](https://networkx.github.io/) graph-based population environment of inter- and intra-connected communities, where edge probabilities can model connected or socially distanced communities. Examples: **scripts/visual_compare_two_populations.py**, **scripts/visual_run_single_population.py**.
- Disease virulence and imperfect and decaying immunity. Examples: **scripts/visual_compare_two_diseases_immunity.py**, **scripts/visual_compare_two_diseases_immunity_small.py**.
- Healthcare capacity, effects on survival when overburdened
@@ -34,10 +41,12 @@ Each day, infected nodes also have the chance to end their infection. The chance
- Visual simulation with history logging. Examples: **scripts/visual_*.py**.
- Statistical simulation for multiple runs of the same parameters, aggregate statistics, experiment comparison (using [MLFlow](https://mlflow.org/)). Examples **scripts/stats_*.py**.
- Basic (non-learning) agents to enact simple policies such as social distancing, vaccination, etc.
- OpenAI Gym compatibility
- Linear and deep Q reinforcement learning agents. Examples: **scripts/train_deep_q_learner.py, scripts/train_linear_q_learner.py**.
- A scoring system with settable action costs and environment rewards/penalties

## Planned features
- Open AI gym API compatibility
- Reinforcement learning agents
- Actor-critic reinforcement learning agent, and agents supporting specific node targeting.
- Less accurate testing, adding definable false positive and false negative rates
- Docker container and REST API

@@ -54,7 +63,7 @@ git clone https://github.com/garethjns/social-distancing-sim
````

# Simulation structure and components
The social_distancing_sim package is split into 3 main modules; .sim, .environment, and .agent. See docstrings for object parameters and details.
The social_distancing_sim package is split into 5 main modules: .sim, .environment, .agent, .gym, and .templates. See docstrings for object parameters and details.

## .environment
Contains the code for running the simulation, including the action space available to any agent. The top-level object, Environment, can be used to run and plot individual simulations. Actions can be fed to the environment manually (or not at all), or can be handled by the Sim class in the .sim submodule (see below).
@@ -78,15 +87,25 @@ Contains the code defining the agent interface and, currently, 4 basic agents.
- isolation_agent.**IsolationAgent** - An agent that randomly isolates a number of infected + connected nodes and randomly reconnects recovered + isolated nodes.
- vaccination_agent.**VaccinationAgent** - An agent that randomly vaccinates a number of currently non-infected nodes each turn.

## social_distancing_sim.sim
## .sim
Contains objects to handle running and logging experiments with agent input.
- .sim.**Sim** - Handles an Environment and an Agent. Steps the simulation, gets actions from the agent, passes them to the env, etc.
- .multi_sim.**MultiSim** - Handles running Sim objects multiple times with different seeds. Outputs MLflow logs and aggregated statistics.

## .gym
Contains environment and agent definitions designed to comply with the [OpenAI Gym API](https://gym.openai.com/). This includes the trainable reinforcement learning agents.
- .**gym_env** - Wrapper to make social_distancing_sim.environment.Environment objects Gym compatible
- .**gym_templates** - Gym environment specs for the example environment set-ups in social_distancing_sim.templates
- .**agent.rl** - Reinforcement learning agent implementations
- .**wrappers** - Various Gym environment wrappers

## .templates
Example environment set-ups.

The simulated environment consists of a number of parameterised objects.
# Example experiments
The rest of this readme contains a dump of example experiments with outputs, which can be run using the code below or the relevant script in scripts/.

# Run a single simulation
## Run a single simulation
![single simulation example](https://github.com/garethjns/social-distancing-sim/blob/master/images/single_simulation_example.gif)
To run a single passive, visual simulation, the Environment object can be defined and run without using the Sim and MultiSim handlers.

@@ -136,7 +155,7 @@ pop.replay()
print(pop.history.keys())
````

# Compare two populations: Social distancing
## Compare two populations: Social distancing
![Example cats vs responsible](https://github.com/garethjns/social-distancing-sim/blob/master/images/joined.gif)
([Discussion](https://new.reddit.com/r/dataisbeautiful/comments/fov56p/oc_comparing_the_effect_of_social_distancing_on/))

@@ -191,7 +210,7 @@ Parallel(n_jobs=2,



# Importance of testing: Modifying ObservationSpace test rate
## Importance of testing: Modifying ObservationSpace test rate
![Example testing rate](https://github.com/garethjns/social-distancing-sim/blob/master/images/testing_example.gif)
([Discussion](https://new.reddit.com/r/dataisbeautiful/comments/fse6l1/oc_the_importance_of_testing_and_effect_on/))

@@ -247,7 +266,7 @@ Parallel(n_jobs=2,
```


# Compare immunity effects
## Compare immunity effects
![Example testing rate](https://github.com/garethjns/social-distancing-sim/blob/master/images/joined_3.gif)

Version 0.2.0 adds incomplete immunity and decay of immunity. These are part of the disease definition, and allow reinfection after a node has survived infection.
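As a rough illustration of how decaying, imperfect immunity permits reinfection, the sketch below scales a per-contact infection probability by a node's current immunity. The names and values are assumptions for clarity, not the disease definition's actual fields.

````python
# Illustrative values only, not the package's parameters.
virulence = 0.01        # base per-contact infection probability
immunity = 0.8          # partial immunity gained on recovery
immunity_decay = 0.05   # fraction of immunity lost per day

for day in range(1, 31):
    p_reinfection = virulence * (1 - immunity)  # immunity reduces, but never removes, risk
    immunity *= (1 - immunity_decay)            # immunity fades over time
    print(day, round(p_reinfection, 5), round(immunity, 3))
````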
@@ -311,7 +330,7 @@ Parallel(n_jobs=2,
```


# Basic agents and strategy comparison
## Basic agents and strategy comparison

````bash
python3 -m social-distancing-sim.scripts.visual_compare_basic_agents
@@ -377,7 +396,7 @@ Parallel(n_jobs=4,

```

# MultiSims: Statistical comparisons - basic agents and strategy comparison
## MultiSims: Statistical comparisons - basic agents and strategy comparison
![Test basic agents](https://github.com/garethjns/social-distancing-sim/blob/master/images/agent_comparison_score_example.png)

````bash
````
15 changes: 13 additions & 2 deletions scripts/train_deep_q_learner.py
@@ -12,6 +12,14 @@
from social_distancing_sim.templates.small import Small


def prepare_tf(memory_limit: int = 1024):
import tensorflow as tf

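# Cap TensorFlow's GPU memory use at memory_limit MB by configuring a single virtual device on the first GPU.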
tf.config.experimental.set_virtual_device_configuration(tf.config.experimental.list_physical_devices('GPU')[0],
[tf.config.experimental.VirtualDeviceConfiguration(
memory_limit=memory_limit)])


def prepare(agent_gamma: float = 0.99,
agent_eps: float = 0.99,
agent_eps_decay: float = 0.001) -> Tuple[DeepQAgent, SummaryGraphObservationWrapper]:
@@ -69,7 +77,7 @@ def train(agent: DeepQAgent, env: SummaryGraphObservationWrapper,
ep_rewards.append(total_reward)
print(total_reward)

if not ep % 50:
if not ep % 5:
roll = 50
plt.plot(np.convolve(ep_rewards, np.ones(roll), 'valid') / roll)
plt.show()
@@ -78,11 +86,14 @@ def train(agent: DeepQAgent, env: SummaryGraphObservationWrapper,


if __name__ == "__main__":
prepare_tf(1024)

agent_, env_ = prepare(agent_gamma=0.98,
agent_eps=0.95,
agent_eps_decay=0.002)
agent_ = train(agent_, env_,
n_episodes=1000,
n_episodes=300,
max_episode_steps=200)

agent_.save('deep_q_learner.pkl')
DeepQAgent.load('deep_q_learner.pkl')
2 changes: 1 addition & 1 deletion social_distancing_sim/__init__.py
@@ -1,5 +1,5 @@
MAJOR = 0
MINOR = 7
PATCH = 0
PATCH = 1

__version__ = ".".join(str(v) for v in [MAJOR, MINOR, PATCH])
68 changes: 43 additions & 25 deletions social_distancing_sim/gym/agent/rl/q_learners/deep_q_agent.py
@@ -19,11 +19,13 @@ def __init__(self, env: GymEnv,
replay_buffer: ReplayBuffer = None,
gamma: float = 0.98,
replay_buffer_samples=75,
dueling: bool = True,
*args, **kwargs) -> None:

super().__init__(*args, **kwargs)
self.env = env
self.gamma = gamma
self.dueling = dueling
if replay_buffer is None:
replay_buffer = ReplayBuffer()
self.replay_buffer = replay_buffer
@@ -49,24 +51,37 @@ def _prep_pp(self) -> None:

def _build_model(self, model_name: str) -> keras.Model:

conv_shape = self.env.observation_space[1].sample().shape
graph_shape = self.env.observation_space[1].sample().shape
graph_nodes = graph_shape[0] * graph_shape[1]

fc_input = keras.layers.Input(name='fc_input', shape=self.env.observation_space[0].shape)
fc1 = keras.layers.Dense(units=12, name='fc1', activation='relu')(fc_input)
summary_input = keras.layers.Input(name='summary_input', shape=self.env.observation_space[0].shape)
summary_fc1 = keras.layers.Dense(units=12, name='summary_fc1', activation='relu')(summary_input)

conv_input = keras.layers.Input(name='conv_input', shape=(conv_shape[0], conv_shape[1], 1))
conv1 = keras.layers.Conv2D(24, kernel_size=(6, 6),
name='conv1', activation='relu', dtype=np.float32)(conv_input)
conv2 = keras.layers.Conv2D(12, kernel_size=(3, 3), name='conv2', activation='relu')(conv1)
flatten = keras.layers.Flatten(name='flatten')(conv2)
concat = keras.layers.Concatenate(name='concat')([fc1, flatten])
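# Graph branch (new): flatten the full graph observation and pass it through progressively narrower dense layers, replacing the previous convolutional branch.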
graph_input = keras.layers.Input(name='conv_input', shape=(graph_shape[0], graph_shape[1], 1))
flatten = keras.layers.Flatten(name='flatten')(graph_input)
graph_fc1 = keras.layers.Dense(units=int(graph_nodes), name='graph_fc1', activation='relu')(flatten)
graph_fc2 = keras.layers.Dense(units=int(graph_nodes / 2), name='graph_fc2', activation='relu')(graph_fc1)
graph_fc3 = keras.layers.Dense(units=int(graph_nodes / 4), name='graph_fc3', activation='relu')(graph_fc2)

fc2 = keras.layers.Dense(units=64, name='fc2', activation='relu')(concat)
fc3 = keras.layers.Dense(units=16, name='fc3', activation='relu')(fc2)
output = keras.layers.Dense(units=self.env.action_space.n, name='output', activation=None)(fc3)
concat = keras.layers.Concatenate(name='concat')([summary_fc1, graph_fc3])
fc1 = keras.layers.Dense(units=64, name='fc2', activation='relu')(concat)
fc2 = keras.layers.Dense(units=16, name='fc3', activation='relu')(fc1)

if self.dueling:
# Using dueling architecture (split value and action advantages)
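# Q(s, a) is recomposed as V(s) + A(s, a) - mean_a A(s, a); subtracting the mean advantage keeps the value/advantage split identifiable.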
v_layer = keras.layers.Dense(1, activation='linear')(fc2)
a_layer = keras.layers.Dense(self.env.action_space.n, activation='linear')(fc2)

def merge_layer(layer_inputs):
return layer_inputs[0] + layer_inputs[1] - keras.backend.mean(layer_inputs[1], axis=1, keepdims=True)

output = keras.layers.Lambda(merge_layer, output_shape=(self.env.action_space.n,),
name="output")([v_layer, a_layer])
else:
output = keras.layers.Dense(units=self.env.action_space.n, name='output', activation=None)(fc2)

opt = keras.optimizers.Adam(learning_rate=0.001)
model = keras.Model(inputs=[fc_input, conv_input], outputs=[output],
model = keras.Model(inputs=[summary_input, graph_input], outputs=[output],
name=model_name)
model.compile(opt, loss='mse')

@@ -143,8 +158,7 @@ def update(self, state1: Tuple[np.ndarray, np.ndarray], action: int, reward: flo
verbose=0)

def set_env(self, *args, **kwargs):
"""Pass for compatibility with set env used in AgentBase. Not necessary here as this agent only uses the env
to sample examples."""
"""Pass for compatibility with set env used in AgentBase. Not necessary here"""
pass

def _select_actions_targets(self) -> Dict[int, int]:
@@ -174,21 +188,25 @@ def update_action_model(self):
self._target_model.set_weights(self._policy_model.get_weights())

def save(self, fn: str):
model_to_save = copy.deepcopy(self)
model_to_save._policy_model = None
model_to_save._target_model = None
self._policy_model.save(f"{fn}.h5")
self._policy_model = None
self._target_model = None

name = fn.split('.')[0]
self._policy_model.save(f"{name}.h5")
pickle.dump(model_to_save, open(f"{name}.pkl", "wb"))
agent_to_save = copy.deepcopy(self)
pickle.dump(agent_to_save, open(f"{fn}.pkl", "wb"))

self._policy_model = keras.models.load_model(f"{fn}.h5")
self._target_model = keras.models.load_model(f"{fn}.h5")

@classmethod
def load(cls, fn: str) -> "DeepQAgent":
name = fn.split('.')[0]
loaded_model = pickle.load(open(f"{name}.pkl"))
keras.models.load_model(f"{name}.h5")
agent = pickle.load(open(f"{fn}.pkl", 'rb'))
model = keras.models.load_model(f"{fn}.h5")

agent._policy_model = model
agent._target_model = model

return loaded_model
return agent

def clone(self) -> "DeepQAgent":
return copy.deepcopy(self)
