Add toil demo notebook with access via TRS

david4096 · david4096 · commit 18fc55f6bd91 · 2018-04-23T10:36:17.000-07:00
diff --git a/notebooks/toil-wes-demo.ipynb b/notebooks/toil-wes-demo.ipynb
@@ -0,0 +1,256 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# UCSC Toil Workflow Execution Service Demonstration\n",
+    "\n",
+    "<img src=\"https://github.com/dockstore/dockstore-ui2/raw/develop/src/assets/images/sponsors/coloured/ga4gh.png\" width=\"180\" align=\"right\"/>\n",
+    "\n",
+    "<img src=\"https://cgl.genomics.ucsc.edu/wp-content/uploads/2017/07/TOIL-Slug-Logo-862x1116.jpg\" width=\"200\" align=\"right\"/>\n",
+    "\n",
+    "This notebook is meant to contain all of the necessary parts needed to install and run the [Workflow Execution Service](https://github.com/ga4gh/workflow-execution-service-schemas) backed by [UCSC Toil](https://toil.readthedocs.io/en/3.15.0/) on a linux compliant system with Python installed!\n",
+    "\n",
+    "The Workflow Execution Service attempts to present the interface for workflow execution over HTTP methods. Simple JSON requests including the inputs and outputs for a workflow are sent to a service. This allows us to \"ship code to the data,\" since data privacy and egress costs require that data is not shared.\n",
+    "\n",
+    "UCSC Toil is software for executing workflows. It presents a Python native API, which will not be demonstrated here, as well as a CWL compliant CLI interface. For that reason, any CWLRunner can easily be exposed by the workflow-service, demonstrated here.\n",
+    "\n",
+    "## Installing the Dependencies\n",
+    "\n",
+    "[Docker](https://docs.docker.com/install/) is a required dependency to make workflow execution portable in this example. So install that first.\n",
+    "\n",
+    "Once you have docker installed, you can follow the below instructions, which will use Python's package manager to download the requirements."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install toil git+git://github.com/common-workflow-language/workflow-service"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If you have trouble executing that line, try putting it into a terminal. Depending on your Python installation, you may need to enter your password."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Starting the Server"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Now that you have toil and the workflow-service installed, you just have to turn on the server, and it will be ready to accept requests!\n",
+    "\n",
+    "We'll have to tell the service which runner to use, the CWL runner which comes with the service, and the optional tool to use to run it, in this case `cwltoil`. Lastly, we lower the log output of toil so that the output JSON can be read by the wes-server (and returned to the client)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      " * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)\n",
+      "^C\n"
+     ]
+    }
+   ],
+   "source": [
+    "!wes-server --backend=wes_service.cwl_runner --opt runner=cwltoil --opt extra=--logLevel=CRITICAL"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Background processes aren't supported directly in notebooks, so we close it here. But you can paste this command in a terminal and it will bring up your very own Workflow Execution Service!"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Using the Client\n",
+    "\n",
+    "The server is now running, but Toil hasn't started yet as we haven't issued any Workflow Execution requests. Here, using the provided CLI client, we demonstrate a simple workflow which calculates an md5sum.\n",
+    "\n",
+    "The workflow description is provided in the workflow-service test data, and we specify local inputs and outputs. `File` is currently the only supported file system protocol of the workflow-service."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Accessing a workflow via Dockstore Tool Registry Service\n",
+    "\n",
+    "We will start by accessing the metadata for a workflow from dockstore.\n",
+    "\n",
+    "#### List Tools\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "{u'verified': True, u'name': u'master', u'url': u'https://dockstore.org:8443/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-md5sum/versions/master', u'image': u'7f82fc51fa35d36bbd61297ee0c05170ab4ba67c969a9a66b28e5ed3c100034b', u'meta-version': u'2017-07-23 15:45:37.0', u'descriptor-type': [u'CWL', u'WDL'], u'dockerfile': True, u'id': u'quay.io/briandoconnor/dockstore-tool-md5sum:master', u'verified-source': u'Phase 1 GA4GH Tool Execution Challenge'}\n"
+     ]
+    }
+   ],
+   "source": [
+    "import requests\n",
+    "response = requests.get('https://dockstore.org:8443/api/ga4gh/v1/tools/', params={\"name\": \"md5sum\"})\n",
+    "print(response.json()[0]['versions'][0])\n",
+    "md5sum_url = response.json()[0]['versions'][0]['url'] + '/plain-CWL/descriptor/%2FDockstore.cwl'"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We now have a URL we can pass too WES for execution!"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 18,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "https://dockstore.org:8443/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-md5sum/versions/master/plain-CWL/descriptor/%2FDockstore.cwl\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(md5sum_url)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Using the WES CLI client to Execute"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 19,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "INFO:root:Workflow id is 1b7cbeac80e84740a450f2c6bc12b7f2\n",
+      "INFO:root:State is COMPLETE\n",
+      "INFO:root:\n",
+      "{\n",
+      "    \"output_file\": {\n",
+      "        \"format\": \"http://edamontology.org/data_3671\", \n",
+      "        \"checksum\": \"sha1$5cd16de143136d95a0307bc1db27d88b57b033e9\", \n",
+      "        \"basename\": \"md5sum.txt\", \n",
+      "        \"nameext\": \".txt\", \n",
+      "        \"nameroot\": \"md5sum\", \n",
+      "        \"http://commonwl.org/cwltool#generation\": 0, \n",
+      "        \"location\": \"file:///home/david/git/workflow-service/workflows/1b7cbeac80e84740a450f2c6bc12b7f2/outdir/md5sum.txt\", \n",
+      "        \"class\": \"File\", \n",
+      "        \"size\": 33\n",
+      "    }\n",
+      "}"
+     ]
+    }
+   ],
+   "source": [
+    "!wes-client --host localhost:8080 --proto http $md5sum_url testdata/md5sum.cwl.json"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "As you can see, the wes-client routed a request and polled the service until the its state was `COMPLETE`. It then shows us the location of the outputs, so we can read them."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 20,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "b1946ac92492d2347c6235b4d2611184\r\n"
+     ]
+    }
+   ],
+   "source": [
+    "!cat /home/david/git/workflow-service/workflows/1b7cbeac80e84740a450f2c6bc12b7f2/outdir/md5sum.txt"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Future Work\n",
+    "\n",
+    "Because toil implements the cwl CLI interface, it can be easily exchanged for a number of CWL runners. Although this demonstration works only for local files, it should be possible to demonstrate provisioners like those in Toil.\n",
+    "\n",
+    "Both the workflow-service and Toil are Python native applications, and this suggests a deeper integration is possible. Future demonstrations like these could use native Python code to interact with WES.\n",
+    "\n",
+    "Dockstore and the Tool Registry Service API could be used to first find the workflow that will be run, demonstrating interoperability in these services.\n",
+    "\n",
+    "By provisioning using DOS URLs, it should be possible for systems to reason about file locations whether they are system local or on a cloud."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "env3",
+   "language": "python",
+   "name": "env3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 2
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython2",
+   "version": "2.7.12+"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}