-
Notifications
You must be signed in to change notification settings - Fork 1
Expand file tree
/
Copy pathbootstrap_example.Rmd
More file actions
93 lines (66 loc) · 2.43 KB
/
bootstrap_example.Rmd
File metadata and controls
93 lines (66 loc) · 2.43 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
---
title: "Example of bootstrap analysis"
author: "Clay Ford"
date: "12/10/2021"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Generate data
First generate some skewed data ~~that's also multimodal~~.
```{r}
set.seed(9)
n <- 180
need1 <- sample(1:6, size = n/2, replace = TRUE, prob = c(1, 1, 1, 3, 4, 3)/14)
need2 <- sample(1:6, size = n/2, replace = TRUE, prob = c(1, 1, 1, 4, 5, 4)/16)
grp <- rep(c("ctrl", "int"), each = n/2)
d <- data.frame(grp, need = c(need1, need2))
```
## Visualize data
The default `geom_density` settings make the data look multimodal because it's trying very hard to interpolate between discrete values. Therefore it looks quite wiggly.
```{r}
library(ggplot2)
ggplot(d) +
aes(x = need, fill = grp) +
geom_density(alpha = 0.4)
```
However we can adjust the bandwidth using the `adjust` argument to make the lines smoother. This basically says "don't try so hard to imagine what's between the discrete values." Higher values lead to smoother estimates.
```{r}
ggplot(d) +
aes(x = need, fill = grp) +
geom_density(alpha = 0.4, adjust = 2)
```
A better plot might just be a bar plot since the data is discrete. A density plot is usually better suited for continuous data (ie, data with decimals)
```{r}
ggplot(d) +
aes(x = need, fill = grp) +
geom_bar(position = "dodge")
```
These last two plots reveal the data is indeed skewed and perhaps bi-modal, but it definitely gives us a different impression that the first plot.
## Summarize data
Compare means and medians. Very similar.
```{r}
tapply(d$need, d$grp, summary)
```
## Basic analysis
T-test
```{r}
t.out <- t.test(need ~ grp, data = d)
t.out$conf.int
```
## Bootstrap analysis
For this we load the boot package that comes with R. We have to write a function that calculates a difference in means. The `d[i,]` represents resampled data. The boot function generates a vector of indices that are resampled row numbers of the data frame. Then it uses those indices to resample the data. The `boot` function applies our function 999 times.
```{r}
library(boot)
f <- function(d, i){
t <- t.test(need ~ grp, d[i,])
diff(t$estimate)
}
b.out <- boot(d, f, R = 999)
```
We can use our boot object to get a bootstrap confidence interval using the `boot.ci` function.
```{r}
boot.ci(b.out, type = "bca")
```
Notice the confidence interval is slightly different but the substance of the result is no different than the t-test.