update mloss talk
mrocklin committed Jul 9, 2015
1 parent 4ad943e commit dcbd449
Showing 6 changed files with 43 additions and 14 deletions.
6 changes: 6 additions & 0 deletions docs/source/_static/presentations/icml-mloss.html
@@ -24,6 +24,9 @@
<section data-markdown="markdown/mloss/foundations.md"
data-separator="^\n\n\n"
data-vertical="^\n\n"></section>
<section data-markdown="markdown/parallel-options.md"
data-separator="^\n\n\n"
data-vertical="^\n\n"></section>
<section data-markdown="markdown/dask-array.md"
data-separator="^\n\n\n"
data-vertical="^\n\n"></section>
@@ -33,6 +36,9 @@
<section data-markdown="markdown/mloss/dask-core.md"
data-separator="^\n\n\n"
data-vertical="^\n\n"></section>
<section data-markdown="markdown/dask-dataframe.md"
data-separator="^\n\n\n"
data-vertical="^\n\n"></section>
<section data-markdown="markdown/dask-svd.md"
data-separator="^\n\n\n"
data-vertical="^\n\n"></section>
Binary file removed docs/source/_static/presentations/images/frame.png
@@ -17,8 +17,8 @@ Afternoon sprint with Olivier Grisel

## Cross Validation

<a href="../images/dask-cross-validation.pdf">
<img src="../images/dask-cross-validation.png" alt="Cross validation dask"
<a href="images/dask-cross-validation.png">
<img src="images/dask-cross-validation.png" alt="Cross validation dask"
width="40%">
</a>
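
The slide links to a picture of the cross-validation task graph. As a rough sketch of the idea only (not the sprint code; scikit-learn's `KFold`/`LogisticRegression` and dask's threaded scheduler are assumed here), each fold's fit-and-score can be written as an independent task in a plain dask graph:

```python
# Rough sketch: one task per cross-validation fold, executed in parallel.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold
from dask.threaded import get

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def fit_and_score(train, test):
    model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
    return model.score(X[test], y[test])

# Keys name the tasks; values are (function, *args) tuples.
dsk = {('score', i): (fit_and_score, train, test)
       for i, (train, test) in enumerate(KFold(n_splits=5).split(X))}

scores = get(dsk, list(dsk))     # threaded scheduler runs the five folds
print(np.mean(scores))
```

Because the folds share nothing, the scheduler is free to run them concurrently.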

15 changes: 9 additions & 6 deletions docs/source/_static/presentations/markdown/mloss/foundations.md
@@ -10,12 +10,12 @@
<img src="images/jenga.png" width="100%">


### Shared data structures enable interactions without coordination
### Shared data structures enable interactions


### Enables a vibrant ecosystem
### Shared data structures enable a vibrant ecosystem

### but exposes us to risk of obsolescence
### but expose us to risk of obsolescence


### Python, NumPy and Pandas are old(ish)
@@ -32,6 +32,7 @@
* Poor support for variable length strings
* Poor support for missing data
* Poor support for nested/semi-structured data
* Code bases are now hard to change


### The Numeric Python ecosystem inherits these limitations
@@ -73,10 +74,12 @@

## Why do we still use Python?

* Easy to set up and use by domain scientists
* Ubiquitous
* Easy to set up and use
* C/Fortran heritage
* Hundreds of PhD theses in software stack
* Strong academic and industry communities
* Domain expertise in the software stack (scikits)
* Strong academic and industry relationship
* Other communities (web, sysops, etc.)


### PyData rests on single-threaded foundations
32 changes: 26 additions & 6 deletions docs/source/_static/presentations/markdown/parallel-options.md
@@ -1,14 +1,17 @@
## My Job: Work towards a parallel Numeric Python stack


## Python's options for Parallelism

Explicit control. Fast but hard.

* Threads/Processes
* MPI
* Concurrent.futures
* Threads/Processes/MPI
* Concurrent.futures/...
* Joblib
* .
* .
* .
* IPython parallel
* Luigi
* PySpark
* Hadoop (mrjob)
@@ -21,13 +24,13 @@ Implicit control. Restrictive/slow but easy.

Explicit control. Fast but hard.

* Threads/Processes
* MPI
* Concurrent.futures
* Threads/Processes/MPI
* Concurrent.futures/...
* Joblib
* .
* . <-- I need this
* .
* IPython parallel
* Luigi
* PySpark
* Hadoop (mrjob)
@@ -36,8 +39,25 @@ Explicit control. Fast but hard.
Implicit control. Restrictive but easy.


### My Solution: Dynamic task scheduling
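
A minimal sketch of what dynamic task scheduling means in dask (the `inc`/`add` functions are only illustrative): a graph is an ordinary dict mapping keys to `(function, *args)` task tuples, and a scheduler walks it at runtime.

```python
# Minimal sketch of a dask graph: a plain dict of tasks.
from dask.threaded import get

def inc(x):
    return x + 1

def add(x, y):
    return x + y

dsk = {'a': 1,
       'b': (inc, 'a'),         # b = inc(a)
       'c': (inc, 'a'),         # c = inc(a), independent of b
       'd': (add, 'b', 'c')}    # d = add(b, c), runs once b and c finish

print(get(dsk, 'd'))            # 4; b and c may run in parallel threads
```

Nothing executes until `get` is called, so the scheduler can order and parallelize the work as it sees fit.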


## Scale

* Single four-core laptop (Gigabyte scale)
* Single thirty-core workstation (Terabyte scale)
* Distributed thousand-core cluster (Petabyte scale)


## Scale

* Single four-core laptop (Gigabyte scale)
* **Single thirty-core workstation (Terabyte scale)**
* Distributed thousand-core cluster (Petabyte scale)


## Outline

* Dask.array - parallel array library using dask (see the sketch below)
* Dask - internals
* Dask.dataframe/other - think about whether this is useful to you
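
For the first outline item, a small illustrative dask.array example (the shapes and chunk sizes here are arbitrary choices):

```python
# Illustrative only: dask.array mirrors a subset of the numpy API and
# splits one large array into many small numpy chunks scheduled in parallel.
import numpy as np
import dask.array as da

x = da.ones((10000, 10000), chunks=(1000, 1000))  # 100 blocks of 1000x1000
y = (x + x.T).mean(axis=0)                        # builds a task graph, no work yet

result = y.compute()                              # scheduler executes the graph
print(result.shape, np.allclose(result, 2.0))     # (10000,) True
```

The expression only builds a graph of per-chunk numpy calls; `compute()` hands that graph to the scheduler described above.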
