
Commit c22de86

Merge branch 'k8sRayServeDistributedExplanations'
# Conflicts:
#	dockerfiles/Makefile
#	explainers/distributed.py
#	explainers/kernel_shap.py
#	explainers/utils.py
#	poetry.lock
#	pyproject.toml
#	requirements.txt
#	scripts/fit_adult_model.py
2 parents 4ce90df + f09a9fc commit c22de86

14 files changed, +1180 -94 lines

README.md (+14 -11)
@@ -1,23 +1,26 @@
-# Running distributed KernelSHAP
+# Running distributed KernelSHAP with Ray Serve
 
-To create a virtual environment that allows you to run KernelSHAP in a distributed fashion with [`ray`](https://github.com/ray-project/ray) you need to configure your environment first, which requires [`conda`](https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/) to be installed. You can then run the command::
+To create a virtual environment that allows you to run KernelSHAP in a distributed fashion with [`ray serve`](https://github.com/ray-project/ray), you need to configure your environment first, which requires [`conda`](https://problemsolvingwithpython.com/01-Orientation/01.05-Installing-Anaconda-on-Linux/) to be installed. You can then run the command:
 
 `conda env create -f environment.yml -p /home/user/anaconda3/envs/env_name`
 
-to create the environment and then activate it with `conda activate shap`. If you don not wish to change the installation path then you can skip the `-p` option. You are now ready to run the experiments. The steps involved are:
+to create the environment and then activate it with `conda activate shap`. If you do not wish to change the installation path, you can skip the `-p` option. You are now ready to run the experiments. The steps involved are:
 
 1. data processing
-2. running the experiments
+2. predictor fitting
+3. running benchmarking experiments
 
-To process the data it is sufficient to run `python preprocess_data.py` with the default options. This will output a preprocessed version of the [`Adult`](http://archive.ics.uci.edu/ml/datasets/Adult) dataset and a partition of it that is used to initialise the KernelSHAP explainer. However, you can proceed to step 2 if you don't intend to change the default parameters as the same data will be automatically downloaded.
+_**Step 1 (optional):**_ Run `python scripts/preprocess_data.py`, specifying the training set size (in samples) via the `-n_train_examples` argument and the number of samples to include in the KernelSHAP background dataset via the `-n_background_samples` argument (which simply selects the first `n_background_samples` samples from the training set). This will output a preprocessed version of the [`Adult`](http://archive.ics.uci.edu/ml/datasets/Adult) dataset and a partition of it that is used to initialise the KernelSHAP explainer. However, you can proceed to step 3 if you don't intend to change the default parameters, as the same data will be automatically downloaded.
 
-You can run an experiment with the command `python experiment.py`. By default, this will run the explainer on the `2560` examples from the `Adult` dataset with a background dataset with `100` samples, sequentially (5 times if the `-benchmark 1` option is passed to it). The resuults are saved in the `results/` folder. If you wish to run the same explanations in parallel, then run the command
+_**Step 2 (optional):**_ A logistic regression predictor can be fit on the preprocessed data by running `python scripts/fit_adult_model.py`. The predictor will be saved in the `assets/` directory under the `predictor.pkl` filename. If you did not run the data processing script, it is not necessary to run this script as the predictor will be automatically downloaded and saved to `assets/`.
 
-`python experiment.py -cores 3`
 
-which will use `ray` to perform explanations across multiple cores.
+_**Step 3:**_ You can distribute the task of explaining `2560` examples from the Adult test split with KernelSHAP, configured with a background dataset of `100` samples, by running the `serve_explanations` script. The configurable options are:
 
-Other options for the script are:
+- `-replicas`: controls how many explainer replicas will serve the requests
 
-- `-benchmark`: if set to 1, `-cores` will be treated as the upper bound of number of cores to compute the explanations on. The lower bound is `2`, and the explanations are computed 5 times (by default) to provide runtime averages. The number of repetitions can be controlled using the `-nruns` argument.
-- `-batch_size`: controls how many instances are explained by a core at once. This parameter has an important bearing to the code runtime performance
+- `-max_batch_size`: sending a batch of requests, as opposed to a single request, to one replica can improve performance. Use this argument to tune the maximum size of a batch of requests sent to each replica.
+
+- `-benchmark`: if set to `1`, the script will run the experiment over an increasingly large number of replicas. The replicas range is `range(1, replicas + 1)`, where `replicas` is the value passed via `-replicas`. For each number of replicas and each value in `-max_batch_size`, the experiment is repeated to obtain runtime averages.
+
+- `-nruns`: controls how many times an experiment with a given `-replicas` setting is run for each value in the `-max_batch_size` array. This allows obtaining average runtimes for the task. This setting takes effect only if the option `-benchmark 1` is specified.
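For illustration, the three README steps might be chained as follows. This is a hedged sketch: the `-n_train_examples` value is a placeholder, and the serving invocation uses the flags defined in `benchmarks/k8s_serve_explanations.py` below, which assumes a Ray cluster with Ray Serve is already running; the local `serve_explanations` script referenced in the README may expose slightly different options.

```bash
# Sketch of the workflow; values are illustrative.
# Step 1 (optional): preprocess Adult; the sample counts below are placeholders.
python scripts/preprocess_data.py -n_train_examples 30000 -n_background_samples 100
# Step 2 (optional): fit the logistic regression predictor (saved to assets/predictor.pkl).
python scripts/fit_adult_model.py
# Step 3: serve and request the explanations; flags follow the argparse definitions in
# benchmarks/k8s_serve_explanations.py and assume a Ray cluster with Ray Serve is up.
python benchmarks/k8s_serve_explanations.py --replicas 2 --max_batch_size 5 -batch_mode ray --n_runs 5
```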

benchmarks/k8s_benchmark_serve.sh (+19)
@@ -0,0 +1,19 @@
#!/bin/bash
START=$1
END=$2
BATCH_MODE=$3
BATCH_SIZE=(1 2 5 10 15 20)
echo "Workers range tested: {$START..$END}"
echo "Batch mode: $BATCH_MODE"
cd ./cluster || exit
for i in $(seq "$START" "$END"); do
  for j in "${BATCH_SIZE[@]}"; do
    echo "Distributing explanations over $i workers"
    echo "Current batch size: $j instances"
    make -f Makefile.serve deploy
    make -f Makefile.serve upload-script
    make -f Makefile.serve run-experiment WORKERS="$i" BATCH="$j" BATCH_MODE="$BATCH_MODE"
    make -f Makefile.serve pull-results
    make -f Makefile.serve destroy
  done
done
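The benchmark driver above takes three positional arguments: the lower and upper bounds of the replicas range and the batching mode. A hypothetical invocation, assuming the `cluster/` directory and its `Makefile.serve` targets are available, might be:

```bash
# Sweep 1..5 explainer replicas over the preset BATCH_SIZE values (1 2 5 10 15 20),
# using ray serve batching ("ray"); results are pulled back after each deployment.
# Run it from whichever directory contains the cluster/ folder, since the script cd's into ./cluster.
bash k8s_benchmark_serve.sh 1 5 ray
```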

benchmarks/k8s_serve_explanations.py (+274)
@@ -0,0 +1,274 @@
import argparse
import logging
import os
import ray
import pickle
import requests
import numpy as np

import explainers.wrappers as wrappers

from collections import namedtuple
from ray import serve
from timeit import default_timer as timer
from typing import Any, Dict, List, Tuple
from explainers.utils import get_filename, batch, load_data, load_model


logging.basicConfig(level=logging.INFO)

PREDICTOR_URL = 'https://storage.googleapis.com/seldon-models/alibi/distributed_kernel_shap/predictor.pkl'
PREDICTOR_PATH = 'assets/predictor.pkl'
"""
str: The file containing the predictor. The predictor can be created by running `fit_adult_model.py` or output by
calling `explainers.utils.load_model()`, which will download a default predictor if `assets/` does not contain one.
"""


def endpont_setup(tag: str, backend_tag: str, route: str = "/"):
    """
    Creates an endpoint for serving explanations.

    Parameters
    ----------
    tag
        Endpoint tag.
    backend_tag
        A tag for the backend this explainer will connect to.
    route
        The URL where the explainer can be queried.
    """
    serve.create_endpoint(tag, backend=backend_tag, route=route, methods=["GET"])


def backend_setup(tag: str, worker_args: Tuple, replicas: int, max_batch_size: int) -> None:
    """
    Sets up the backend for the distributed explanation task.

    Parameters
    ----------
    tag
        A tag for the backend component. The same tag must be passed to `endpoint_setup`.
    worker_args
        A tuple containing the arguments for initialising the explainer and fitting it.
    replicas
        The number of backend replicas that serve explanations.
    max_batch_size
        Maximum number of requests to batch and send to a worker process.
    """

    if max_batch_size == 1:
        config = {'num_replicas': max(replicas, 1)}
        serve.create_backend(tag, wrappers.KernelShapModel, *worker_args)
    else:
        config = {'num_replicas': max(replicas, 1), 'max_batch_size': max_batch_size}
        serve.create_backend(tag, wrappers.BatchKernelShapModel, *worker_args)
    serve.update_backend_config(tag, config)

    logging.info(f"Backends: {serve.list_backends()}")


def prepare_explainer_args(data: Dict[str, Any]) -> Tuple[str, np.ndarray, dict, dict]:
    """
    Extracts the names of the features (`group_names`) and the columns corresponding to each feature in the feature
    matrix (`groups`) from the `data` dict and defines the explainer arguments. The background data necessary to
    initialise the explainer is also extracted from the same dictionary.

    Parameters
    ----------
    data
        A dictionary that contains all information necessary to initialise the explainer.

    Returns
    -------
    A tuple containing the positional and keyword arguments necessary for initialising the explainers.
    """

    groups = data['all']['groups']
    group_names = data['all']['group_names']
    background_data = data['background']['X']['preprocessed']
    assert background_data.shape[0] == 100
    init_kwargs = {'link': 'logit', 'feature_names': group_names, 'seed': 0}
    fit_kwargs = {'groups': groups, 'group_names': group_names}
    predictor = load_model(PREDICTOR_URL)
    worker_args = (predictor, background_data, init_kwargs, fit_kwargs)

    return worker_args


@ray.remote
def distribute_request(instance: np.ndarray, url: str = "http://localhost:8000/explain") -> str:
    """
    Task for distributing the explanations across the backend.

    Parameters
    ----------
    instance
        Instance to be explained.
    url
        The explainer URL.

    Returns
    -------
    A str representation of the explanation output json file.
    """

    resp = requests.get(url, json={"array": instance.tolist()})
    return resp.json()


def request_explanations(instances: List[np.ndarray], *, url: str) -> namedtuple:
    """
    Sends the instances to the explainer URL.

    Parameters
    ----------
    instances
        Array of instances to be explained.
    url
        Explainer endpoint.

    Returns
    -------
    responses
        A named tuple with a `responses` field and a `t_elapsed` field.
    """

    run_output = namedtuple('run_output', 'responses t_elapsed')
    tstart = timer()
    responses_id = [distribute_request.remote(instance, url=url) for instance in instances]
    responses = [ray.get(resp_id) for resp_id in responses_id]
    t_elapsed = timer() - tstart
    logging.info(f"Time elapsed: {t_elapsed}...")

    return run_output(responses=responses, t_elapsed=t_elapsed)


def run_explainer(X_explain: np.ndarray,
                  n_runs: int,
                  replicas: int,
                  max_batch_size: int,
                  batch_mode: str = 'ray',
                  url: str = "http://localhost:8000/explain"):
    """
    Sets up an endpoint and a backend and sends requests to the endpoint.

    Parameters
    ----------
    X_explain
        Instances to be explained. Each row is an instance that is explained independently of others.
    n_runs
        Number of times to run an experiment where the entire set of explanations is sent to the explainer endpoint.
        Used to determine the average runtime given the number of cores.
    replicas
        How many backend replicas should be used for distributing the workload.
    max_batch_size
        The maximum batch size the explainer accepts.
    batch_mode : {'ray', 'default'}
        If 'ray', ray serve components are leveraged for minibatches. Otherwise the input tensor is split into
        minibatches which are sent to the endpoint.
    url
        The url of the explainer endpoint.
    """

    result = {'t_elapsed': [], 'explanations': []}
    # extract instances to be explained from the dataset
    assert X_explain.shape[0] == 2560

    # split input into separate requests
    if batch_mode == 'ray':
        instances = np.split(X_explain, X_explain.shape[0])  # use ray serve to batch the requests
        logging.info(f"Explaining {len(instances)} instances...")
    else:
        instances = batch(X_explain, batch_size=max_batch_size)
        logging.info(f"Explaining {len(instances)} mini-batches of size {max_batch_size}...")

    # distribute it
    for run in range(n_runs):
        logging.info(f"Experiment run: {run}...")
        results = request_explanations(instances, url=url)
        result['t_elapsed'].append(results.t_elapsed)
        result['explanations'].append(results.responses)

    with open(get_filename(replicas, max_batch_size), 'wb') as f:
        pickle.dump(result, f)


def main():

    if not os.path.exists('results'):
        os.mkdir('results')

    data = load_data()
    X_explain = data['all']['X']['processed']['test'].toarray()

    max_batch_size = [int(elem) for elem in args.max_batch_size][0]
    batch_mode, replicas = args.batch_mode, args.replicas
    ray.init(address='auto')  # connect to the cluster
    serve.init(http_host='0.0.0.0')  # listen on 0.0.0.0 to make endpoint accessible from other machines
    host, route = os.environ.get("RAY_HEAD_SERVICE_HOST", args.host), "explain"
    url = f"http://{host}:{args.port}/{route}"
    backend_tag = "kernel_shap:b100"  # b100 means 100 background samples
    endpoint_tag = f"{backend_tag}_endpoint"
    worker_args = prepare_explainer_args(data)
    if batch_mode == 'ray':
        backend_setup(backend_tag, worker_args, replicas, max_batch_size)
        logging.info(f"Batching with max_batch_size of {max_batch_size} ...")
    else:  # minibatches are sent to the ray worker
        backend_setup(backend_tag, worker_args, replicas, 1)
        logging.info(f"Minibatches distributed of size {max_batch_size} ...")
    endpont_setup(endpoint_tag, backend_tag, route=f"/{route}")

    run_explainer(X_explain, args.n_runs, replicas, max_batch_size, batch_mode=batch_mode, url=url)


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-r",
        "--replicas",
        default=1,
        type=int,
        help="The number of backend replicas used to serve the explainer."
    )
    parser.add_argument(
        "-batch",
        "--max_batch_size",
        nargs='+',
        help="A list of values representing the maximum batch size of pending queries sent to the same worker. "
             "This should only contain one element as the backend is reset from `k8s_benchmark_serve.sh`.",
        required=True,
    )
    parser.add_argument(
        "-batch_mode",
        type=str,
        default='ray',
        help="If set to 'ray' the batching will be leveraging ray serve. Otherwise, the input array is split into "
             "minibatches that are sent to the endpoint.",
        required=True,
    )
    parser.add_argument(
        "-n",
        "--n_runs",
        default=5,
        type=int,
        help="Controls how many times an experiment is run (in benchmark mode) for a given number of cores to obtain "
             "run statistics."
    )
    parser.add_argument(
        "-ho",
        "--host",
        default="localhost",
        type=str,
        help="Hostname."
    )
    parser.add_argument(
        "-p",
        "--port",
        default="8000",
        type=str,
        help="Port."
    )
    args = parser.parse_args()
    main()
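For reference, the endpoint set up by this script can also be queried by hand. The sketch below mirrors the `requests.get(url, json={"array": instance.tolist()})` call in `distribute_request`; the feature values are placeholders, and the exact payload parsing depends on `explainers.wrappers`, which is not part of this diff.

```bash
# Hypothetical manual query of the explainer endpoint (assumes it is reachable on localhost:8000
# and that the wrapper reads a JSON body of the form {"array": [[...]]}, as distribute_request sends).
# The instance values are placeholders; a real instance must match the preprocessed Adult feature layout.
curl -s -X GET "http://localhost:8000/explain" \
     -H "Content-Type: application/json" \
     -d '{"array": [[0.0, 1.0, 0.0]]}'
```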
