# pysimkernel Specification

pysimkernel is a task definition framework that abstracts away the gory details
of implementing boilerplate code for parameter scans, repetitive simulation
runs, and result aggregation.

## What does it do?

- simulation experiments with parameter scans / iterations, possibly with
  several independent runs per parameter set (pleasingly parallel settings)
- Monte Carlo simulations with Common Random Numbers

## Limitations

- Limited to one parameter set per task
- Limited to at most one task per run

## Who is it for?


## Related Work


## Design Philosophy


## Integrations

## Model

pysimkernel defines tasks for a user-defined simulation experiment.
pysimkernel is not a task scheduler.

One should be able to use pysimkernel both interactively in a notebook (by
means of a decorator and a serial or concurrent task scheduler) and on a
cluster with Jug or similar parallelization frameworks (by means of a
central simulation control script).


### Data type

The *run* function outputs a [NumPy structured
array](http://docs.scipy.org/doc/numpy/user/basics.rec.html).
Other choices such as *pytables*, *xarray*, *pandas.DataFrame*, or *h5py* would
be possible.
This is a design decision: NumPy is considered stable, has multi-dimensional
arrays (like xarray, but unlike pandas.DataFrame), is resident in memory as
well as on disk (with memmap; h5py and pytables are disk-only), and does not
require additional libraries.
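
A single-run output could then be a structured array like the following minimal
sketch (the field names are purely illustrative):

```python
import numpy as np

# A single-run output could be, e.g., a time series of 100 observations
# with two illustrative fields.
run_output = np.zeros(100, dtype=[('time', np.float64), ('energy', np.float64)])
run_output['time'] = np.arange(100)
```

Fields are accessed by name (``run_output['energy']``), and the whole array can
be memory-mapped to disk with ``np.memmap`` if needed.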

### Experiment output

The output of the whole experiment is a NumPy structured array.
Its shape is linear in the parameter dimension.
If there are other dimensions present in the *single-run output*, they are
present in the experiment output as well.
The *aggregation graph* specifies the field names and data types.
The experiment also outputs a mapping (Python dictionary) that maps simulation
inputs to indices of the first dimension of the output array.
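
A sketch of what this pair of outputs could look like (the hashable key format
for the mapping is an assumption, not the actual pysimkernel convention):

```python
import numpy as np

# Experiment output: the first dimension enumerates simulation inputs.
experiment_output = np.zeros(2, dtype=[('energy_mean', np.float64)])

# Mapping from simulation inputs (made hashable) to row indices.
input_index = {(('temperature', 1.0),): 0,
               (('temperature', 2.0),): 1}

# Look up the row belonging to a given simulation input.
row = experiment_output[input_index[(('temperature', 2.0),)]]
```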

### The run function

The simulationist defines the *run function*.
For *multi-run experiments*, it takes a ``seed`` keyword argument.

It outputs a NumPy structured array (the **single-run output**).
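
A minimal run function for a multi-run experiment might look like this (the
Monte Carlo estimate is purely illustrative):

```python
import numpy as np

def run(n_samples=1000, seed=None):
    """Estimate the mean of a uniform random variable (illustrative).

    Returns the single-run output as a NumPy structured array.
    """
    rng = np.random.RandomState(seed)
    samples = rng.uniform(size=n_samples)
    output = np.empty(1, dtype=[('mean', np.float64)])
    output['mean'] = samples.mean()
    return output
```

With Common Random Numbers, the same ``seed`` values would be passed for every
simulation input, so runs with equal arguments are reproducible.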

### The task function

The task function first executes the specified number of runs.

If a *multi-run experiment* has multiple runs per task, the **task function**
subsequently merges the single-run outputs into a single NumPy array.
The merged array has the same data type as the original arrays, albeit a
different shape.
The single-run outputs are arranged along the first dimension (the first index
corresponds to the single run).
Note that this merging is just the *reduce function*.

In all other cases, the task function is the identity function for the
single-run output.

### The reduce function

The reduce function merges single-run outputs (or arrays of merged single-run
outputs) into a single array.
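
A sketch of the reduce step, assuming all single-run outputs share a common
shape and dtype:

```python
import numpy as np

def reduce_outputs(single_run_outputs):
    """Stack single-run outputs along a new first dimension.

    The merged array keeps the original dtype; the first index
    corresponds to the single run.
    """
    return np.stack(single_run_outputs)

# Four single-run outputs of shape (3,) with a common dtype ...
runs = []
for i in range(4):
    out = np.zeros(3, dtype=[('value', np.float64)])
    out['value'] = i
    runs.append(out)

# ... merged into one array of shape (4, 3).
merged = reduce_outputs(runs)
```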

### The job function

If a *multi-run experiment* has multiple tasks per job, the job function first
reduces the task outputs into a single array (*reduce function*).

In a *multi-run experiment*, the job function subsequently aggregates the
single-run outputs into statistics according to the *aggregation graph*.

In *mono-run experiments*, the job function is the identity function (as there
are no statistics to aggregate).
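
The aggregation step could be sketched as follows (the statistics and field
names chosen here are only an example):

```python
import numpy as np

# Merged single-run outputs: the first dimension is the run index.
merged = np.zeros(4, dtype=[('mean', np.float64)])
merged['mean'] = [0.1, 0.2, 0.3, 0.4]

# Aggregate across the run dimension into per-input statistics.
stats = np.empty(1, dtype=[('mean_mean', np.float64), ('mean_std', np.float64)])
stats['mean_mean'] = merged['mean'].mean(axis=0)
stats['mean_std'] = merged['mean'].std(axis=0)
```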

### The experiment function

The experiment function merges the statistics for all jobs (simulation inputs)
into a single NumPy array.
This is again just the *reduce function*.

### The task graph



A sample task graph for a multi-run experiment with multiple (2) tasks per job
and multiple (2) runs per task, and two sets of input parameters, leading to
8 runs in total (with Common Random Numbers).
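
In dask's dict-based format, such a task graph could be sketched as follows
(the function bodies are stubs and all names are illustrative):

```python
# A dask task graph is a plain dict: keys name tasks, tuples are
# (function, *arguments), and a key appearing as an argument denotes
# a dependency on that task's output.
def run(params, seed): ...
def task(*run_outputs): ...
def job(*task_outputs): ...
def experiment(*job_outputs): ...

inputs = [{'x': 1.0}, {'x': 2.0}]
seeds = [[42, 43], [44, 45]]  # seeds per task, reused across inputs (CRN)

graph = {}
for i, params in enumerate(inputs):
    for t, task_seeds in enumerate(seeds):
        for r, seed in enumerate(task_seeds):
            graph['run-%d-%d-%d' % (i, t, r)] = (run, params, seed)
        graph['task-%d-%d' % (i, t)] = (task,) + tuple(
            'run-%d-%d-%d' % (i, t, r) for r in range(len(task_seeds)))
    graph['job-%d' % i] = (job,) + tuple(
        'task-%d-%d' % (i, t) for t in range(len(seeds)))
graph['experiment'] = (experiment,) + tuple(
    'job-%d' % i for i in range(len(inputs)))
```

The graph contains 8 run tasks, 4 task nodes, 2 job nodes, and 1 experiment
node; the seed lists are identical for both inputs (Common Random Numbers).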

### The reduced task graph



A reduced version of the task graph.
These are the tasks that are actually up for external scheduling, as the
``task`` function executes the single runs.

## Terminology

- **simulation input**: a (single) combination of parameter values
- **multi-run** and **mono-run** experiments:
  A typical instance of a **multi-run** experiment is a stochastic experiment,
  which runs the simulation several times for each combination of parameter
  values, but with different random seeds.
  A **mono-run** experiment runs the simulation only once for each input.
  A typical instance of a mono-run experiment is a simulation of a
  deterministic system.
- With **Common Random Numbers**, the same seeds are used across all inputs in
  a multi-run experiment.
- **simulation control script**: canonical name ``simcontrol.py``

## API

### The Aggregation Graph / Decorator

The aggregation graph aggregates single-run simulation outputs of the *run*
function for each combination of parameter values in *multi-run experiments*.

It has no effect for *mono-run experiments*.

The aggregation graph is a list of dictionaries with the keys

- ``'run_output'``: a list of fields from the *single-run output*, or ``':'``
  (for all fields)
- ``'input_params'`` (optional, defaults to ``False``): a list of parameters or
  a Boolean (``True`` for all parameters)
- ``'function'``: the aggregation function; takes one argument if
  ``input_params`` is ``False`` and two otherwise. The first argument is the
  (selected) outputs of the single runs, the second argument is the requested
  input parameters.
- ``'output_dtype'`` (optional, defaults to ``np.float64``): the data type of
  the output field.
- ``'output_name'`` (optional): the name of the output field. Defaults to the
  first of the input fields and the function name, separated by an underscore.
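
An aggregation graph computing the mean and standard deviation of an
(illustrative) ``energy`` field might then look like this:

```python
import numpy as np

def energy_mean(outputs):
    # outputs: merged single-run outputs, first dimension = run index
    return outputs['energy'].mean(axis=0)

def energy_std(outputs):
    return outputs['energy'].std(axis=0)

aggregation = [
    {'run_output': ['energy'],
     'function': energy_mean,
     'output_dtype': np.float64,
     'output_name': 'energy_mean'},
    {'run_output': ['energy'],
     'function': energy_std},
    # output_dtype and output_name fall back to their defaults here
]

# Applying one entry to sample merged outputs:
runs = np.zeros(4, dtype=[('energy', np.float64)])
runs['energy'] = [1.0, 2.0, 3.0, 4.0]
mean_value = aggregation[0]['function'](runs)
```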

### The Experiment Class / Decorator

Takes as construction arguments:

- ``simparams`` (*simulation parameters*): an iterable of *simulation inputs*
  in the form of argument lists or keyword argument dictionaries to be supplied
  to the *run* function at each iteration run, excluding the *seed*
  (**simulation input**)
- ``aggregation``: the *aggregation graph* (**simulation output**)
- ``taskdef``: *task definition parameters*: the number of runs per task, the
  number of tasks per job, and how to store the results

All arguments are plain Python data structures.

The ``task_graph`` method provides a [dask task
graph](http://dask.pydata.org/en/latest/graphs.html).
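
Putting the construction arguments together (the ``taskdef`` key names here are
illustrative assumptions, not a fixed API):

```python
# Simulation inputs: keyword-argument dictionaries for the run
# function, excluding the seed.
simparams = [{'temperature': t} for t in (1.0, 1.5, 2.0)]

# Task definition parameters (illustrative key names).
taskdef = {'runs_per_task': 10,
           'tasks_per_job': 4,
           'store': 'results.npy'}
```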

### Utilities for Jug integration

The ``dask2jug`` function converts a [dask task
graph](http://dask.pydata.org/en/latest/graphs.html) to a list of [Jug
tasks](https://jug.readthedocs.org/en/latest/tasks.html).

The ``jugfile`` function returns a
[jugfile](https://jug.readthedocs.org/en/latest/tutorial.html?highlight=jugfile#task-generators)
to be written to disk and run with
[Jug](https://jug.readthedocs.org/en/latest/) or
[gridjug](http://gridjug.readthedocs.org/en/stable/).

To use a single control file with gridjug, we can pickle the ``simparams``,
``aggregation``, and ``taskdef`` dictionaries and store their pickled strings
in an automatically generated jugfile (using a template engine).
The control file could also be enhanced to [connect to
Sumatra](http://pythonhosted.org/Sumatra/using_the_api.html).
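
The pickling step could be sketched as follows (base64 encoding is one way,
chosen here for illustration, to embed the pickled bytes safely in a generated
text file):

```python
import base64
import pickle

simparams = [{'temperature': 1.0}, {'temperature': 2.0}]

# Pickle and encode, ready to be substituted into a jugfile template.
pickled = base64.b64encode(pickle.dumps(simparams)).decode('ascii')

# The generated jugfile would reconstruct the object at run time:
restored = pickle.loads(base64.b64decode(pickled))
```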