forked from blaze/blaze
Commit
Showing 15 changed files with 752 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,72 @@ | ||
<!doctype html> | ||
<html lang="en"> | ||
|
||
<head> | ||
<meta charset="utf-8"> | ||
|
||
<title>Slides</title> | ||
|
||
<link rel="stylesheet" href="css/reveal.css"> | ||
<link rel="stylesheet" href="css/theme/default.css" id="theme"> | ||
|
||
<link rel="stylesheet" href="lib/css/zenburn.css"> | ||
</head> | ||
|
||
<body> | ||
|
||
<div class="reveal"> | ||
|
||
<div class="slides"> | ||
|
||
<section data-markdown="markdown/icml-mloss.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/mloss/foundations.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/dask-array.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/dask-array-meteorology.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/mloss/dask-core.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/dask-svd.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/mloss/cross-validation.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
<section data-markdown="markdown/mloss/finish.md" | ||
data-separator="^\n\n\n" | ||
data-vertical="^\n\n"></section> | ||
</div> | ||
</div> | ||
|
||
<script src="lib/js/head.min.js"></script> | ||
<script src="js/reveal.js"></script> | ||
|
||
<script> | ||
|
||
Reveal.initialize({ | ||
controls: true, | ||
progress: true, | ||
history: true, | ||
center: true, | ||
|
||
// Optional libraries used to extend on reveal.js | ||
dependencies: [ | ||
{ src: 'lib/js/classList.js', condition: function() { return !document.body.classList; } }, | ||
{ src: 'marked.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } }, | ||
{ src: 'markdown.js', condition: function() { return !!document.querySelector( '[data-markdown]' ); } }, | ||
{ src: 'plugin/highlight/highlight.js', async: true, callback: function() { hljs.initHighlightingOnLoad(); } }, | ||
{ src: 'plugin/notes/notes.js' } | ||
] | ||
}); | ||
|
||
</script> | ||
|
||
</body> | ||
</html> |
Binary file not shown.
16 changes: 16 additions & 0 deletions
docs/source/_static/presentations/markdown/dask-dataframe.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,16 @@ | ||
## `dask.dataframe` is... | ||
|
||
* an out-of-core, multi-core, partitioned dataframe | ||
* that copies the `pandas` interface | ||
* using blocked algorithms | ||
* and task scheduling | ||
* to orchestrate many in-memory Pandas operations | ||
|
||
<img src="images/dataframe.png" alt="Partitioned DataFrame"> | ||
|
||
|
||
## `dask.dataframe` is... | ||
|
||
Very new. | ||
|
||
Ready for use, but known failures still pop up. |
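The "blocked algorithms" bullet above can be illustrated without pandas or dask. The following is only a sketch under stated assumptions: plain Python lists stand in for in-memory partitions, and `blocked_mean` is a hypothetical helper, not a `dask.dataframe` function. It shows how a global mean is assembled from per-partition partial results.

```python
def blocked_mean(partitions):
    """Mean over all values, computed one partition at a time."""
    # Each partition contributes only a (sum, count) pair, so no step
    # ever needs the full dataset in memory at once.
    partials = [(sum(p), len(p)) for p in partitions]
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


# Made-up data: three partitions of unequal size.
partitions = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
result = blocked_mean(partitions)
```

The per-partition step is independent work that a scheduler could run in parallel; only the final combination touches every partial result.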
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,85 @@ | ||
## Example: SVD | ||
|
||
|
||
## Most Parallel Computation is Simple | ||
|
||
>>> import json | ||
>>> import dask.bag as db | ||
>>> b = (db.from_s3('githubarchive-data', '2015-01-01-*.json.gz') | ||
...        .map(json.loads) | ||
...        .filter(lambda d: d['type'] == 'PushEvent') | ||
...        .count()) | ||
|
||
<img src="images/embarrassing.png" alt="embarrassingly parallel dask workload"> | ||
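To make the "embarrassingly parallel" shape of this workload concrete, here is a small pure-Python sketch. It assumes each partition is a list of JSON strings; the records are made up for illustration, and `count_push_events` is a hypothetical helper rather than part of `dask.bag`.

```python
import json


def count_push_events(lines):
    """Count PushEvent records in one partition of JSON lines."""
    records = map(json.loads, lines)
    return sum(1 for d in records if d["type"] == "PushEvent")


# Made-up data: two partitions that can be processed independently.
partitions = [
    ['{"type": "PushEvent"}', '{"type": "WatchEvent"}'],
    ['{"type": "PushEvent"}', '{"type": "PushEvent"}'],
]

# No partition depends on another; only the final sum combines results.
partial_counts = [count_push_events(p) for p in partitions]
total = sum(partial_counts)
```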
|
||
|
||
## What about more complex workflows? | ||
|
||
>>> import dask.array as da | ||
>>> x = da.ones((5000, 1000), chunks=(1000, 1000)) | ||
>>> u, s, v = da.linalg.svd(x) | ||
|
||
<a href="images/dask-svd.png"> | ||
<img src="images/dask-svd.png" alt="Dask SVD graph" width="30%"> | ||
</a> | ||
|
||
*Work by Mariano Tepper. "Compressed Nonnegative Matrix Factorization is Fast | ||
and Accurate" [arXiv](http://arxiv.org/abs/1505.04650)* | ||
|
||
|
||
## SVD - Dict | ||
|
||
>>> s.dask | ||
{('x', 0, 0): (np.ones, (1000, 1000)), | ||
('x', 1, 0): (np.ones, (1000, 1000)), | ||
('x', 2, 0): (np.ones, (1000, 1000)), | ||
('x', 3, 0): (np.ones, (1000, 1000)), | ||
('x', 4, 0): (np.ones, (1000, 1000)), | ||
('tsqr_2_QR_st1', 0, 0): (np.linalg.qr, ('x', 0, 0)), | ||
('tsqr_2_QR_st1', 1, 0): (np.linalg.qr, ('x', 1, 0)), | ||
('tsqr_2_QR_st1', 2, 0): (np.linalg.qr, ('x', 2, 0)), | ||
('tsqr_2_QR_st1', 3, 0): (np.linalg.qr, ('x', 3, 0)), | ||
('tsqr_2_QR_st1', 4, 0): (np.linalg.qr, ('x', 4, 0)), | ||
('tsqr_2_R', 0, 0): (operator.getitem, ('tsqr_2_QR_st2', 0, 0), 1), | ||
('tsqr_2_R_st1', 0, 0): (operator.getitem, ('tsqr_2_QR_st1', 0, 0), 1), | ||
('tsqr_2_R_st1', 1, 0): (operator.getitem, ('tsqr_2_QR_st1', 1, 0), 1), | ||
('tsqr_2_R_st1', 2, 0): (operator.getitem, ('tsqr_2_QR_st1', 2, 0), 1), | ||
('tsqr_2_R_st1', 3, 0): (operator.getitem, ('tsqr_2_QR_st1', 3, 0), 1), | ||
('tsqr_2_R_st1', 4, 0): (operator.getitem, ('tsqr_2_QR_st1', 4, 0), 1), | ||
('tsqr_2_R_st1_stacked', 0, 0): (np.vstack, | ||
[('tsqr_2_R_st1', 0, 0), | ||
('tsqr_2_R_st1', 1, 0), | ||
('tsqr_2_R_st1', 2, 0), | ||
('tsqr_2_R_st1', 3, 0), | ||
('tsqr_2_R_st1', 4, 0)]), | ||
('tsqr_2_QR_st2', 0, 0): (np.linalg.qr, ('tsqr_2_R_st1_stacked', 0, 0)), | ||
('tsqr_2_SVD_st2', 0, 0): (np.linalg.svd, ('tsqr_2_R', 0, 0)), | ||
('tsqr_2_S', 0): (operator.getitem, ('tsqr_2_SVD_st2', 0, 0), 1)} | ||
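The key names in the dictionary above suggest a tall-and-skinny QR (TSQR) pattern: QR on each row block, stack the R factors, QR the stack, then take the SVD of the small final R. The NumPy sketch below follows those same steps for the singular values only. It assumes NumPy is available and that all blocks fit in memory; `tsqr_singular_values` is a hypothetical name, not a dask function.

```python
import numpy as np


def tsqr_singular_values(blocks):
    """Singular values of the row-stacked blocks via two QR stages."""
    # Stage 1: independent QR per block, keeping only the R factor.
    rs = [np.linalg.qr(b)[1] for b in blocks]
    # Stage 2: QR of the stacked R factors yields one small R.
    r = np.linalg.qr(np.vstack(rs))[1]
    # The small R has the same singular values as the full matrix.
    return np.linalg.svd(r)[1]


# Made-up data: a (50, 4) matrix split into 5 row blocks.
rng = np.random.default_rng(0)
x = rng.standard_normal((50, 4))
blocks = np.array_split(x, 5, axis=0)

s_tsqr = tsqr_singular_values(blocks)
s_direct = np.linalg.svd(x, compute_uv=False)
```

Stage 1 is the part that parallelizes across blocks; stage 2 operates on a matrix whose size depends only on the number of blocks and columns.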
|
||
|
||
## SVD - Parallel Profile | ||
|
||
<iframe src="../svd.profile.html" | ||
marginwidth="0" | ||
marginheight="0" scrolling="no" width="800" | ||
height="300"></iframe> | ||
|
||
*Bokeh profile tool by Jim Crist* | ||
|
||
|
||
## Randomized Approximate Parallel Out-of-Core SVD | ||
|
||
>>> import dask.array as da | ||
>>> x = da.ones((5000, 1000), chunks=(1000, 1000)) | ||
>>> u, s, v = da.linalg.svd_compressed(x, k=100, n_power_iter=2) | ||
|
||
<a href="images/dask-svd-random.png"> | ||
<img src="images/dask-svd-random.png" | ||
alt="Dask graph for random SVD" | ||
width="10%" > | ||
</a> | ||
|
||
N. Halko, P. G. Martinsson, and J. A. Tropp. | ||
*Finding structure with randomness: Probabilistic algorithms for | ||
constructing approximate matrix decompositions.* | ||
|
||
*Dask implementation by Mariano Tepper* |
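As a rough illustration of the randomized approach cited above, the following in-memory NumPy sketch samples the range of the matrix with random vectors, optionally applies power iterations, and decomposes the small projected matrix. It is a sketch under stated assumptions, not the dask implementation: `randomized_svd` and its `seed` argument are hypothetical, and the test matrix is made up.

```python
import numpy as np


def randomized_svd(a, k, n_power_iter=0, seed=0):
    """Approximate rank-k SVD of `a` using a random range finder."""
    rng = np.random.default_rng(seed)
    # Sample the range of `a` with k random directions.
    y = a @ rng.standard_normal((a.shape[1], k))
    # Optional power iterations emphasize the dominant singular directions.
    for _ in range(n_power_iter):
        y = a @ (a.T @ y)
    # Orthonormal basis for the sampled range.
    q, _ = np.linalg.qr(y)
    # Project onto the basis and decompose the small (k x n) matrix.
    u_small, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    return q @ u_small, s, vt


# Made-up low-rank matrix: rank 3, shape (60, 20).
rng = np.random.default_rng(1)
a = rng.standard_normal((60, 3)) @ rng.standard_normal((3, 20))
u, s, vt = randomized_svd(a, k=5, n_power_iter=1)
reconstruction = u @ np.diag(s) @ vt
```

Because the example matrix has rank 3 and `k=5`, the sampled basis captures its full range and the reconstruction matches up to rounding; for matrices without exact low rank the result is only an approximation.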
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
## Dask | ||
|
||
*or* | ||
|
||
## Python and Parallelism | ||
|
||
*Matthew Rocklin* | ||
|
||
Continuum Analytics | ||
|
||
|
||
## Outline | ||
|
||
* . | ||
* Dask - Dynamic Task Scheduling | ||
* Dask.array - out-of-core, parallel NumPy | ||
* Dask with other workloads | ||
* . | ||
|
||
|
||
## Outline | ||
|
||
* Numeric Python and Parallelism | ||
* Dask - Dynamic Task Scheduling | ||
* Dask.array - out-of-core, parallel NumPy | ||
* Dask with other workloads | ||
* Parallelism and Machine Learning |
32 changes: 32 additions & 0 deletions
docs/source/_static/presentations/markdown/mloss/cross-validation.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
## Example: Cross Validation | ||
|
||
Afternoon sprint with Olivier Grisel | ||
|
||
for fold_id in range(n_folds): | ||
... | ||
dsk[(name, 'model', model_id)] = clone(model) | ||
|
||
for partition_id in range(data.npartitions): | ||
if partition_id % n_folds == fold_id: | ||
dsk[(name, 'validation', validation_id)] = (score, ...) | ||
else: | ||
dsk[(name, 'model', model_id)] = (_partial_fit, ...) | ||
|
||
... | ||
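The loop above sends partition `p` to validation when `p % n_folds == fold_id` and to training otherwise. The sketch below reproduces only that assignment rule in plain Python. `split_partitions` is a hypothetical helper; the model cloning, `_partial_fit`, and `score` tasks that the slide elides are deliberately not reconstructed.

```python
def split_partitions(npartitions, n_folds):
    """Return one (training_ids, validation_ids) pair per fold."""
    folds = []
    for fold_id in range(n_folds):
        # Same rule as the slide: modulo of the partition id picks the fold.
        validation = [p for p in range(npartitions) if p % n_folds == fold_id]
        training = [p for p in range(npartitions) if p % n_folds != fold_id]
        folds.append((training, validation))
    return folds


# Made-up sizes: 6 partitions, 3 folds.
folds = split_partitions(6, 3)
```

Each partition appears in exactly one validation set, so every fold trains on the remaining partitions.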
|
||
|
||
## Cross Validation | ||
|
||
<a href="../images/dask-cross-validation.pdf"> | ||
<img src="../images/dask-cross-validation.png" alt="Cross validation dask" | ||
width="40%"> | ||
</a> | ||
|
||
|
||
## Cross Validation | ||
|
||
This broke the small-memory-footprint heuristics in the dask scheduler. | ||
We are fixing this with small amounts of | ||
[static scheduling](https://github.com/ContinuumIO/dask/pull/403). | ||
|
||
[Profile](https://rawgit.com/mrocklin/8ec0443c94da553fe00c/raw/ff7d8d0754d07f35086b08c0d21865a03b3edeac/profile.html) |
109 changes: 109 additions & 0 deletions
docs/source/_static/presentations/markdown/mloss/dask-core.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,109 @@ | ||
## `dask.core` | ||
|
||
Dead simple task scheduling | ||
|
||
[dask.pydata.org](http://dask.pydata.org/en/latest/) | ||
|
||
|
||
## We've seen `dask.array` | ||
|
||
* Turns Numpy-ish code | ||
|
||
(2*x + 1) ** 3 | ||
|
||
* Into Graphs | ||
|
||
 | ||
|
||
|
||
## We've seen `dask.array` | ||
|
||
* Turns Numpy-ish code | ||
|
||
(2*x + 1) ** 3 | ||
|
||
* Then executes those graphs | ||
|
||
 | ||
|
||
|
||
### Q: What constitutes a dask graph? | ||
|
||
|
||
<img src="images/dask-simple.png" | ||
alt="A simple dask dictionary" | ||
width="18%" | ||
align="right"> | ||
|
||
# Normal Python # Dask | ||
|
||
def inc(i): | ||
return i + 1 | ||
|
||
def add(a, b): | ||
return a + b | ||
|
||
x = 1 d = {'x': 1, | ||
y = inc(x) 'y': (inc, 'x'), | ||
z = add(y, 10) 'z': (add, 'y', 10)} | ||
|
||
<hr> | ||
|
||
>>> from dask.threaded import get | ||
>>> get(d, 'z') | ||
12 | ||
|
||
* Simple representation | ||
* Use Python to generate graphs (no DSL) | ||
* Not user-friendly | ||
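To show how little machinery the dictionary representation needs, here is a minimal single-threaded evaluator for graphs like `d` above. It is a sketch, not `dask.threaded.get`: `simple_get` and `_evaluate` are hypothetical names, and the code assumes a task is a tuple whose first element is callable and that any other hashable value found among the graph's keys is a reference to that key.

```python
def inc(i):
    return i + 1


def add(a, b):
    return a + b


def _evaluate(dsk, expr, cache):
    # A task is a tuple whose first element is callable.
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        func, args = expr[0], expr[1:]
        return func(*(_evaluate(dsk, a, cache) for a in args))
    # Lists are evaluated element by element.
    if isinstance(expr, list):
        return [_evaluate(dsk, e, cache) for e in expr]
    try:
        is_key = expr in dsk
    except TypeError:  # unhashable literals cannot be keys
        is_key = False
    return simple_get(dsk, expr, cache) if is_key else expr


def simple_get(dsk, key, cache=None):
    """Compute `key` from graph `dsk`, caching intermediate results."""
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _evaluate(dsk, dsk[key], cache)
    return cache[key]


d = {'x': 1,
     'y': (inc, 'x'),
     'z': (add, 'y', 10)}
result = simple_get(d, 'z')
```

A real scheduler adds parallel execution and memory management on top of this, but the graph format itself stays the same.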
|
||
|
||
### Example - dask.array | ||
|
||
>>> x = da.arange(15, chunks=(5,)) | ||
>>> x | ||
dask.array<x, shape=(15,), chunks=((5, 5, 5),), dtype=None> | ||
|
||
>>> x.dask | ||
{("x", 0): (np.arange, 0, 5), | ||
("x", 1): (np.arange, 5, 10), | ||
("x", 2): (np.arange, 10, 15)} | ||
|
||
>>> x.sum().dask | ||
{("x", 0): (np.arange, 0, 5), | ||
("x", 1): (np.arange, 5, 10), | ||
("x", 2): (np.arange, 10, 15), | ||
("s", 0): (np.sum, ("x", 0)), | ||
("s", 1): (np.sum, ("x", 1)), | ||
("s", 2): (np.sum, ("x", 2)), | ||
("s",): (sum, [("s", 0), ("s", 1), ("s", 2)])} | ||
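A graph of this shape can also be generated by hand. The sketch below builds the same blocked-sum structure with built-in `range` and `sum` in place of `np.arange` and `np.sum`, then evaluates it with a tiny recursive helper. `blocked_sum_graph` and `evaluate` are hypothetical names used only for illustration.

```python
def blocked_sum_graph(n, chunk):
    """Build a graph that sums 0..n-1 in blocks of size `chunk`."""
    dsk = {}
    starts = range(0, n, chunk)
    for i, start in enumerate(starts):
        stop = min(start + chunk, n)
        dsk[("x", i)] = (range, start, stop)   # one block of data
        dsk[("s", i)] = (sum, ("x", i))        # per-block partial sum
    # Final task: sum of all partial sums.
    dsk[("s",)] = (sum, [("s", i) for i in range(len(starts))])
    return dsk


def evaluate(dsk, expr):
    """Recursively evaluate a task, key, list, or literal."""
    if isinstance(expr, list):
        return [evaluate(dsk, e) for e in expr]
    if isinstance(expr, tuple) and expr and callable(expr[0]):
        return expr[0](*(evaluate(dsk, a) for a in expr[1:]))
    if isinstance(expr, tuple) and expr in dsk:
        return evaluate(dsk, dsk[expr])
    return expr


graph = blocked_sum_graph(15, 5)
total = evaluate(graph, ("s",))
```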
|
||
|
||
### Dask.array is a convenient way to make dictionaries | ||
|
||
<hr> | ||
|
||
### Dask is a convenient way to make libraries like dask.array | ||
|
||
|
||
## Dask works for more than just arrays | ||
|
||
* `dask.array` = `numpy` + `threading` | ||
* `dask.bag` = `list` + `multiprocessing` | ||
* `dask.dataframe` = `pandas` + `threading` | ||
* ... | ||
|
||
|
||
* Collections build graphs | ||
* Schedulers execute graphs | ||
|
||
<img src="images/collections-schedulers.png" | ||
width="100%"> | ||
|
||
* Neither side needs the other | ||
|
||
|
||
### Question: Is there a similar class of problems in ML? | ||
|
||
<hr> | ||
|
||
### Question: How should we write them down as code? |
25 changes: 25 additions & 0 deletions
docs/source/_static/presentations/markdown/mloss/finish.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,25 @@ | ||
## Final Thoughts | ||
|
||
* Python and Parallelism | ||
* Most data is small | ||
* Storage, representation, streaming, sampling offer bigger gains | ||
* That being said, please [release the GIL](https://github.com/scikit-image/scikit-image/pull/1519) | ||
|
||
* Dask: Dynamic task scheduling yields sane parallelism | ||
* Simple library to enable parallelism | ||
* Dask.array/dataframe demonstrate ability | ||
* Rarely optimal performance (Theano is far smarter) | ||
* Scheduling necessary for composed algorithms | ||
|
||
* Questions: | ||
* Appropriate class of problems in ML? | ||
* What is the right API for algorithm builders? | ||
|
||
|
||
## Questions | ||
|
||
[http://dask.pydata.org](http://dask.pydata.org) | ||
|
||
<img src="images/jenga.png" width="60%"> | ||
|
||
<img src="images/fail-case.gif" width="60%"> |