-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathLesson-LitProg.qmd
169 lines (116 loc) · 9.29 KB
/
Lesson-LitProg.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
---
title: "Biostat 823 - Literate Programming"
author: "Hilmar Lapp"
institute: "Duke University, Department of Biostatistics & Bioinformatics"
date: "Aug 29, 2024"
format:
revealjs:
slide-number: true
editor: visual
knitr:
opts_chunk:
echo: TRUE
---
## Literate Programming
- First introduced by Donald Knuth ("The Art of Computer Programming") in 1984
> Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.
D. E. Knuth, Literate Programming, The Computer Journal, Volume 27, Issue 2, 1984, Pages 97--111, <https://doi.org/10.1093/comjnl/27.2.97>
## TANGLE and WEAVE

## Lit. Prog. and Reproducible Research
> **Literate Programming:** Enhances traditional software development by embedding code in explanatory essays and encourages treating the act of development as one of communication with future maintainers
>
> **Reproducible Research:** Embeds executable code in research reports and publications, with the aim of allowing readers to re-run the analyses described.
::: aside
Schulte, E., Davison, D., Dye, T., & Dominik, C. (2012). A Multi-Language Computing Environment for Literate Programming and Reproducible Research. Journal of Statistical Software, 46(3), 1--24. <https://doi.org/10.18637/jss.v046.i03>
:::
## Code not paper is the scholarship
> An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and complete set of instructions which generated the figures.
::: aside
Buckheit JB, Donoho DL (1995). "WaveLab and Reproducible Research." In A Antoniadis, G Oppenheim (eds.), Wavelets and Statistics, volume 103 of Lecture Notes in Statistics, pp. 55--81. Springer-Verlag, New York. doi:[10.1007/978-1-4612-2544-7_5](https://doi.org/10.1007/978-1-4612-2544-7_5)
who summarized *Claerbout, Jon (1994). Hypertext Documents about Reproducible Research*, http://sepwww.Stanford.edu.
:::
## Research Compendium
- Encapsulates the actual work, not just an abridged version;
- Allows different levels of detail in different renderings;
- Easy to re-run by anyone;
- Provides explicit computational details, enabling others to adapt and extend the reported computational methods;
- Enables programmatic construction and clear provenance of plots and tables;
::: aside
Robert Gentleman & Duncan Temple Lang (2007) Statistical Analyses and Reproducible Research, Journal of Computational and Graphical Statistics, 16:1, 1-23, DOI: [10.1198/106186007X178663](https://doi.org/10.1198/106186007X178663)
:::
------------------------------------------------------------------------

## Reproducible research lens
> Embeds executable code in research reports and publications, with the aim of allowing readers to re-run the analyses described. ([Schulte et al (2012)](https://doi.org/10.18637/jss.v046.i03))
Important concepts:
- Ties together narrative (the "why"), code that implements it, and the results from running the code.
- Can be executed ("executable paper"), in its original or modified form.
- Provenance of tables and charts is clear and verifiable.
## Sweave (Leisch 2002)
". R News. 2 (3): 28--31.](images/Leisch%202002%20-%20Fig%201.png)
------------------------------------------------------------------------

## Renewed interest for reproducible computing
- The practice of copying values for tables and plots for figures into a document *breaks the provenance chain*.
**Provenance**: Information about entities, activities, and people involved in producing data or other results, which are necessary to assess their quality, reliability or trustworthiness.[^1]
[^1]: Modified from W3C's [An Overview of the PROV Family of Documents](https://www.w3.org/TR/prov-overview/).
## "Executable Paper" concept:
- eLife "[Welcome to a new ERA of reproducible publishing](https://elifesciences.org/labs/dc5acbde/welcome-to-a-new-era-of-reproducible-publishing)"
- For example: L Michelle Lewis et al, Reproducibility Project: Cancer Biology (2018) [Replication Study: Transcriptional amplification in tumor cells with elevated c-Myc](https://doi.org/10.7554/eLife.30274). eLife 7:e30274. [Executable version](https://elifesciences.org/articles/30274/executable)
- Boettiger C et al (2016). "[RNeXML: A Package for Reading and Writing Richly Annotated Phylogenetic, Character, and Trait Data in R](https://doi.org/10.1111/2041-210X.12469)." Methods in Ecology and Evolution, 7, pp. 352-357. [Rmarkdown version and rendered PDF](https://github.com/ropensci/RNeXML/tree/master/manuscripts)
## Jupyter and iPython Notebooks
- iPython (interactive Python) created by [Fernando Pérez](https://en.wikipedia.org/wiki/Fernando_Pérez_(software_developer)) in 2001, gained Notebook interface in 2011
- [Project Jupyter](https://jupyter.org) started in 2015 as a spin-off
- Supports a multitude of kernels
- JupyterLab first released in 2018
- JupyterHub first released in 2019
- [Intro to Jupyter](https://reproducible-science-curriculum.github.io/introduction-RR-Jupyter/docs/slides/Jupyter_Intro_Background.slides.html#/)[^2]
- Itself generated from a [source Jupyter Notebook](https://github.com/Reproducible-Science-Curriculum/introduction-RR-Jupyter/blob/gh-pages/notebooks/Jupyter_Intro_Background.ipynb)
[^2]: From the [Reproducible Science Curriculum](https://github.com/Reproducible-Science-Curriculum)
## Knitr and Rmarkdown
- [Knitr](https://yihui.org/knitr/) created by Yihui Xie, first released 2012 <img src="https://db.yihui.org/imgur/yYw46aF.jpg" align="right" width="200" style="margin-left: 1em;"/>
- Designed as a general-purpose literate programming engine
- Design allows different input languages and different output formats
- [Rmarkdown](https://bookdown.org/yihui/rmarkdown/)
- First introduced in knitr in early 2012, withh the idea to embed code chunks in Markdown documents.
- rmarkdown R package created in 2014
- Rich universe of examples at [RPubs](https://rpubs.com)
## Markdown and code
In Rmarkdown, code can be inline:
``` markdown
Consider Edgar Anderson's Iris data of sepal and petal lengths measurements
of `r '\x60r nrow(iris)\x60'` flowers, (`r '\x60r sum(iris$Species=="setosa")\x60'` _I. setosa_,
`r '\x60r sum(iris$Species=="versicolor")\x60'` _I. versicolor_, and
`r '\x60r sum(iris$Species=="virginica")\x60'` _I. virginica_).
```
Consider Edgar Anderson's Iris data of sepal and petal lengths measurement of `r nrow(iris)` flowers, (`r sum(iris$Species=="setosa")` I. *setosa*, `r sum(iris$Species=="versicolor")` I. *versicolor*, and `r sum(iris$Species=="virginica")` I. *virginica*).
## Code chunks {.scrollable}
A linear regression model of petal width by sepal length can be fitted in the following way:
````markdown
`r '\x60\x60\x60{r}'`
lm1 = lm(Petal.Width ~ Sepal.Length, data=iris)
summary(lm1)
`r '\x60\x60\x60'`
````
Rendered:
```{r irisRegr}
lm1 = lm(Petal.Width ~ Sepal.Length, data=iris)
summary(lm1)
```
## Plots also get inlined {.scrollable}
Plots of the regression residuals etc can be obtained in the following way:
```{r}
#| layout-ncol: 2
plot(lm1)
```
## Ecosystem
One report document, rendered to many different output formats

## Further reading {.smaller}
- List of [Literate Programming environments on Wikipedia](https://en.wikipedia.org/wiki/Literate_programming#Literate_programming_practices)
- Xie Y, Allaire J, Grolemund G (2018). [R Markdown: The Definitive Guide](https://bookdown.org/yihui/rmarkdown). Chapman and Hall/CRC, Boca Raton, Florida. ISBN 9781138359338
- Resul Umit (2022) [Writing Reproducible Research Papers with R Markdown](https://resulumit.com/teaching/rmd_workshop.html)
- Chapters "[Jupyter Notebook for Open Science](https://reproducible-analysis-workshop.readthedocs.io/en/latest/4.Jupyter-Notebook.html)", "[R for Reproducible Scientific Analysis (Jupyter)](https://reproducible-analysis-workshop.readthedocs.io/en/latest/5.Jupyter-R.html)", and "[R for Reproducible Scientific Analysis (RMarkdown / knitr)](https://reproducible-analysis-workshop.readthedocs.io/en/latest/6.RMarkdown-knitr.html)" in [Reproducible analysis and Research Transparency Workshop (2017)](https://reproducible-analysis-workshop.readthedocs.io/en/latest/)
- J.M. Perkel (2018). Why Jupyter is data scientists' computational notebook of choice. Nature 563, 145-146. doi:<https://doi.org/10.1038/d41586-018-07196-1>
- Rule, Adam, Amanda Birmingham, Cristal Zuniga, Ilkay Altintas, Shih-Cheng Huang, Rob Knight, Niema Moshiri, et al. 2019. "Ten Simple Rules for Writing and Sharing Computational Analyses in Jupyter Notebooks." PLOS Computational Biology 15 (7): e1007007. <https://doi.org/10.1371/journal.pcbi.1007007>