Commit 18fc55f

Add toil demo notebook with access via TRS
1 parent 02b6ce9 commit 18fc55f

1 file changed

+256
-0
lines changed

notebooks/toil-wes-demo.ipynb

Lines changed: 256 additions & 0 deletions
@@ -0,0 +1,256 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"metadata": {},
6+
"source": [
7+
"# UCSC Toil Workflow Execution Service Demonstration\n",
8+
"\n",
9+
"<img src=\"https://github.com/dockstore/dockstore-ui2/raw/develop/src/assets/images/sponsors/coloured/ga4gh.png\" width=\"180\" align=\"right\"/>\n",
10+
"\n",
11+
"<img src=\"https://cgl.genomics.ucsc.edu/wp-content/uploads/2017/07/TOIL-Slug-Logo-862x1116.jpg\" width=\"200\" align=\"right\"/>\n",
12+
"\n",
13+
"This notebook contains everything needed to install and run the [Workflow Execution Service](https://github.com/ga4gh/workflow-execution-service-schemas) backed by [UCSC Toil](https://toil.readthedocs.io/en/3.15.0/) on a Linux-compatible system with Python installed.\n",
14+
"\n",
15+
"The Workflow Execution Service presents an interface for workflow execution over HTTP. A client sends the service simple JSON requests that describe a workflow's inputs and outputs. This allows us to \"ship code to the data\" when data privacy or egress costs make it impractical to share the data.\n",
16+
"\n",
17+
"UCSC Toil is software for executing workflows. It presents a Python-native API, which is not demonstrated here, as well as a CWL-compliant command-line interface. Because the workflow-service drives its runner through that command-line interface, any CWL runner can be exposed in the same way, as demonstrated here with Toil.\n",
18+
"\n",
19+
"## Installing the Dependencies\n",
20+
"\n",
21+
"[Docker](https://docs.docker.com/install/) is required to make workflow execution portable in this example, so install it first.\n",
22+
"\n",
23+
"Once Docker is installed, run the cell below, which uses pip, Python's package manager, to download the requirements."
24+
]
25+
},
26+
{
27+
"cell_type": "code",
28+
"execution_count": null,
29+
"metadata": {},
30+
"outputs": [],
31+
"source": [
32+
"!pip install toil git+https://github.com/common-workflow-language/workflow-service"
33+
]
34+
},
35+
{
36+
"cell_type": "markdown",
37+
"metadata": {},
38+
"source": [
39+
"If that cell fails, run the command in a terminal instead (without the leading `!`). Depending on your Python installation, you may be asked for your password."
40+
]
41+
},
42+
{
43+
"cell_type": "markdown",
44+
"metadata": {},
45+
"source": [
46+
"## Starting the Server"
47+
]
48+
},
49+
{
50+
"cell_type": "markdown",
51+
"metadata": {},
52+
"source": [
53+
"Now that Toil and the workflow-service are installed, start the server and it will be ready to accept requests.\n",
54+
"\n",
55+
"We have to tell the service which backend to use (here, the CWL runner that ships with the service) and, optionally, which tool that backend should invoke, in this case `cwltoil`. Lastly, we restrict Toil's log level to `CRITICAL` so that log messages do not mix with the output JSON that the wes-server reads and returns to the client."
56+
]
57+
},
58+
{
59+
"cell_type": "code",
60+
"execution_count": 5,
61+
"metadata": {},
62+
"outputs": [
63+
{
64+
"name": "stdout",
65+
"output_type": "stream",
66+
"text": [
67+
" * Running on http://0.0.0.0:8080/ (Press CTRL+C to quit)\n",
68+
"^C\n"
69+
]
70+
}
71+
],
72+
"source": [
73+
"!wes-server --backend=wes_service.cwl_runner --opt runner=cwltoil --opt extra=--logLevel=CRITICAL"
74+
]
75+
},
76+
{
77+
"cell_type": "markdown",
78+
"metadata": {},
79+
"source": [
80+
"Notebooks don't directly support background processes, so we interrupt the server here. Paste the command (without the leading `!`) into a terminal to bring up your own Workflow Execution Service and keep it running for the rest of this demonstration."
81+
]
82+
},
83+
{
84+
"cell_type": "markdown",
85+
"metadata": {},
86+
"source": [
87+
"## Using the Client\n",
88+
"\n",
89+
"The server is now running, but Toil has not started because we have not yet issued any workflow execution requests. Using the provided CLI client, we run a simple workflow that calculates an md5sum.\n",
90+
"\n",
91+
"The workflow description is fetched from Dockstore, the input parameters come from the workflow-service test data (`testdata/md5sum.cwl.json`), and both inputs and outputs are local files. `File` is currently the only file system protocol the workflow-service supports."
92+
]
93+
},
94+
{
95+
"cell_type": "markdown",
96+
"metadata": {},
97+
"source": [
98+
"### Accessing a workflow via Dockstore Tool Registry Service\n",
99+
"\n",
100+
"We will start by retrieving the metadata for a workflow from Dockstore.\n",
101+
"\n",
102+
"#### List Tools\n"
103+
]
104+
},
105+
{
106+
"cell_type": "code",
107+
"execution_count": 17,
108+
"metadata": {},
109+
"outputs": [
110+
{
111+
"name": "stdout",
112+
"output_type": "stream",
113+
"text": [
114+
"{u'verified': True, u'name': u'master', u'url': u'https://dockstore.org:8443/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-md5sum/versions/master', u'image': u'7f82fc51fa35d36bbd61297ee0c05170ab4ba67c969a9a66b28e5ed3c100034b', u'meta-version': u'2017-07-23 15:45:37.0', u'descriptor-type': [u'CWL', u'WDL'], u'dockerfile': True, u'id': u'quay.io/briandoconnor/dockstore-tool-md5sum:master', u'verified-source': u'Phase 1 GA4GH Tool Execution Challenge'}\n"
115+
]
116+
}
117+
],
118+
"source": [
119+
"import requests\n",
120+
"response = requests.get('https://dockstore.org:8443/api/ga4gh/v1/tools/', params={\"name\": \"md5sum\"})\n",
121+
"print(response.json()[0]['versions'][0])\n",
122+
"md5sum_url = response.json()[0]['versions'][0]['url'] + '/plain-CWL/descriptor/%2FDockstore.cwl'"
123+
]
124+
},
125+
{
126+
"cell_type": "markdown",
127+
"metadata": {},
128+
"source": [
129+
"We now have a URL we can pass to WES for execution!"
130+
]
131+
},
132+
{
133+
"cell_type": "code",
134+
"execution_count": 18,
135+
"metadata": {},
136+
"outputs": [
137+
{
138+
"name": "stdout",
139+
"output_type": "stream",
140+
"text": [
141+
"https://dockstore.org:8443/api/ga4gh/v2/tools/quay.io%2Fbriandoconnor%2Fdockstore-tool-md5sum/versions/master/plain-CWL/descriptor/%2FDockstore.cwl\n"
142+
]
143+
}
144+
],
145+
"source": [
146+
"print(md5sum_url)"
147+
]
148+
},
149+
{
150+
"cell_type": "markdown",
151+
"metadata": {},
152+
"source": [
153+
"### Using the WES CLI client to Execute"
154+
]
155+
},
156+
{
157+
"cell_type": "code",
158+
"execution_count": 19,
159+
"metadata": {},
160+
"outputs": [
161+
{
162+
"name": "stdout",
163+
"output_type": "stream",
164+
"text": [
165+
"INFO:root:Workflow id is 1b7cbeac80e84740a450f2c6bc12b7f2\n",
166+
"INFO:root:State is COMPLETE\n",
167+
"INFO:root:\n",
168+
"{\n",
169+
" \"output_file\": {\n",
170+
" \"format\": \"http://edamontology.org/data_3671\", \n",
171+
" \"checksum\": \"sha1$5cd16de143136d95a0307bc1db27d88b57b033e9\", \n",
172+
" \"basename\": \"md5sum.txt\", \n",
173+
" \"nameext\": \".txt\", \n",
174+
" \"nameroot\": \"md5sum\", \n",
175+
" \"http://commonwl.org/cwltool#generation\": 0, \n",
176+
" \"location\": \"file:///home/david/git/workflow-service/workflows/1b7cbeac80e84740a450f2c6bc12b7f2/outdir/md5sum.txt\", \n",
177+
" \"class\": \"File\", \n",
178+
" \"size\": 33\n",
179+
" }\n",
180+
"}"
181+
]
182+
}
183+
],
184+
"source": [
185+
"!wes-client --host localhost:8080 --proto http $md5sum_url testdata/md5sum.cwl.json"
186+
]
187+
},
188+
{
189+
"cell_type": "markdown",
190+
"metadata": {},
191+
"source": [
192+
"As you can see, the wes-client submitted a request and polled the service until the workflow's state was `COMPLETE`. It then printed the location of the outputs, so we can read them."
193+
]
194+
},
195+
{
196+
"cell_type": "code",
197+
"execution_count": 20,
198+
"metadata": {},
199+
"outputs": [
200+
{
201+
"name": "stdout",
202+
"output_type": "stream",
203+
"text": [
204+
"b1946ac92492d2347c6235b4d2611184\r\n"
205+
]
206+
}
207+
],
208+
"source": [
209+
"!cat /home/david/git/workflow-service/workflows/1b7cbeac80e84740a450f2c6bc12b7f2/outdir/md5sum.txt"
210+
]
211+
},
212+
{
213+
"cell_type": "markdown",
214+
"metadata": {},
215+
"source": [
216+
"## Future Work\n",
217+
"\n",
218+
"Because Toil implements the CWL command-line interface, it can easily be exchanged for any of a number of other CWL runners. Although this demonstration works only with local files, it should be possible to demonstrate provisioners like those in Toil.\n",
219+
"\n",
220+
"Both the workflow-service and Toil are Python-native applications, which suggests that a deeper integration is possible. Future demonstrations like this one could use native Python code to interact with WES.\n",
221+
"\n",
222+
"Dockstore and the Tool Registry Service API could be used to first find the workflow that will be run, demonstrating interoperability between these services.\n",
223+
"\n",
224+
"By provisioning with Data Object Service (DOS) URLs, systems should be able to reason about file locations whether the files are local or in a cloud."
225+
]
226+
},
227+
{
228+
"cell_type": "code",
229+
"execution_count": null,
230+
"metadata": {},
231+
"outputs": [],
232+
"source": []
233+
}
234+
],
235+
"metadata": {
236+
"kernelspec": {
237+
"display_name": "env3",
238+
"language": "python",
239+
"name": "env3"
240+
},
241+
"language_info": {
242+
"codemirror_mode": {
243+
"name": "ipython",
244+
"version": 2
245+
},
246+
"file_extension": ".py",
247+
"mimetype": "text/x-python",
248+
"name": "python",
249+
"nbconvert_exporter": "python",
250+
"pygments_lexer": "ipython2",
251+
"version": "2.7.12+"
252+
}
253+
},
254+
"nbformat": 4,
255+
"nbformat_minor": 2
256+
}
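
The notebook builds the descriptor URL by string concatenation and hard-codes the output path in the final `!cat` cell. Below is a minimal sketch of how both values might be derived instead. It assumes Python 3, whereas the notebook's kernel metadata reports Python 2.7. The helper names `descriptor_url` and `output_path` are hypothetical and not part of the workflow-service or Toil. The sample values are copied from the recorded cell outputs, so no network access is needed.

```python
import json
from urllib.parse import quote, unquote, urlparse


def descriptor_url(version_url, descriptor_type="CWL", path="/Dockstore.cwl"):
    """Build a plain-text descriptor URL for a TRS tool version (hypothetical helper)."""
    # The descriptor path is a single URL segment, so "/" must be percent-encoded.
    encoded_path = quote(path, safe="")
    return "{}/plain-{}/descriptor/{}".format(
        version_url.rstrip("/"), descriptor_type, encoded_path
    )


def output_path(outputs_json, key="output_file"):
    """Return the local path of a CWL File output (hypothetical helper)."""
    location = json.loads(outputs_json)[key]["location"]
    parsed = urlparse(location)
    if parsed.scheme != "file":
        # Only local files are handled, matching the demonstration above.
        raise ValueError("only file:// locations are handled: " + location)
    return unquote(parsed.path)


# Values copied from the notebook's recorded outputs.
version_url = (
    "https://dockstore.org:8443/api/ga4gh/v2/tools/"
    "quay.io%2Fbriandoconnor%2Fdockstore-tool-md5sum/versions/master"
)
outputs = json.dumps({
    "output_file": {
        "class": "File",
        "location": "file:///home/david/git/workflow-service/workflows/"
                    "1b7cbeac80e84740a450f2c6bc12b7f2/outdir/md5sum.txt",
    }
})

md5sum_url = descriptor_url(version_url)
md5sum_path = output_path(outputs)
print(md5sum_url)
print(md5sum_path)
```

`output_path` expects only the JSON object, so the `INFO:root:` log lines that the wes-client prints before it would have to be stripped first.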
