forked from OpenIntroStat/ims
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path21-inference-paired-means.Rmd
760 lines (620 loc) · 35.9 KB
/
21-inference-paired-means.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
# Inference for comparing paired means {#inference-paired-means}
```{r, include = FALSE}
source("_common.R")
```
::: {.chapterintro data-latex=""}
In Chapter \@ref(inference-two-means) analysis was done to compare the average population value across two different groups.
Recall that one of the important conditions in doing a two-sample analysis is that the two groups are independent.
Here, independence across groups means that knowledge of the observations in one group doesn't change what we'd expect to happen in the other group.
But what happens if the groups are **dependent**?
Sometimes dependency is not something that can be addressed through a statistical method.
However, a particular dependency, **pairing**, can be modeled quite effectively using many of the same tools we have already covered in this text.
:::
Paired data represent a particular type of experimental structure where the analysis is somewhat akin to a one-sample analysis (see Chapter \@ref(inference-one-mean)) but has other features that resemble a two-sample analysis (see Chapter \@ref(inference-two-means)).
As with a two-sample analysis, quantitative measurements are made on each of two different levels of the explanatory variable.
However, because the observational unit is **paired** across the two groups, the two measurements are subtracted such that only the difference is retained.
Table \@ref(tab:pairedexamples) presents some examples of studies where paired designs were implemented.
```{r pairedexamples}
paired_study_examples <- tribble(
~variable, ~col1, ~col2, ~col3,
"Car", "Smooth Turn vs Quick Spin", "amount of tire tread after 1,000 miles", "difference in tread",
"Textbook", "UCLA vs Amazon", "price of new text", "difference in price",
"Individual person", "Pre-course vs Post-course", "exam score", "difference in score"
)
paired_study_examples %>%
kbl(linesep = "", booktabs = TRUE,
caption = "Examples of studies where a paired design is used to measure the difference in the measurement over two conditions.",
col.names = c("Observational unit", "Comparison groups", "Measurement", "Value of interest")
) %>%
kable_styling(bootstrap_options = c("striped", "condensed"),
latex_options = c("striped", "hold_position"), full_width = TRUE) %>%
column_spec(1:3, width = "8em")
```
::: {.important data-latex=""}
**Paired data.**
Two sets of observations are *paired* if each observation in one set has a special correspondence or connection with exactly one observation in the other dataset.
:::
It is worth noting that if mathematical modeling is chosen as the analysis tool, paired data inference on the difference in measurements will be identical to the one-sample mathematical techniques described in Chapter \@ref(inference-one-mean).
However, recall from Chapter \@ref(inference-one-mean) that with pure one-sample data, the computational tools for hypothesis testing are not easy to implement and were not presented (although the bootstrap was presented as a computational approach for constructing a one sample confidence interval).
With paired data, the randomization test fits nicely with the structure of the experiment and is presented here.
```{r include=FALSE}
terms_chp_21 <- c("paired data")
```
## Randomization test for the mean paired difference
Consider an experiment done to measure whether tire brand Smooth Turn or tire brand Quick Spin has longer tread wear (in cm).
That is, after 1,000 miles on a car, which brand of tires has more tread, on average?
### Observed data
The observed data represent 25 tread measurements (in cm) taken on 25 tires of Smooth Turn and 25 tires of Quick Spin.
The study used a total of 25 cars, so on each car, one tire was of Smooth Turn and one was of Quick Spin.
Figure \@ref(fig:tiredata) presents the observed data, calculations on tread remaining (in cm).
The Smooth Turn manufacturer looks at the box plot and says:
> *Clearly the tread on Smooth Turn tires is higher, on average, than the tread on Quick Spin tires after 1,000 miles of driving.*
The Quick Spin manufacturer is skeptical and retorts:
> *But with only 25 cars, it seems that the variability in road conditions (sometimes one tire hits a pothole, etc.) could be what leads to the small difference in average tread amount.*
```{r tiredata, fig.cap = "Boxplots of the tire tread data (in cm) and the brand of tire from which the original measurements came."}
set.seed(47)
brandA <- rnorm(25, 0.310, 0.003)
brandB <- rnorm(25, 0.308, 0.003)
car <- c(paste("car", 1:25))
miny <- min(brandA, brandB) - .003
maxy <- max(brandA, brandB) + .003
tires <- tibble(
tread = c(brandA, brandB),
car = rep(car, 2),
brand = c(rep("Smooth Turn", 25), rep("Quick Spin", 25))
)
orig_means <- tires %>%
group_by(brand) %>%
summarize(mean_tread = round(mean(tread), 4)) %>%
mutate(
mean_label = c("bar(x)[QS]", "bar(x)[ST]"),
mean_label = paste(mean_label, "==", mean_tread)
)
ggplot(tires, aes(x = brand, y = tread,
color = brand, shape = brand)) +
geom_boxplot(show.legend = FALSE,
outlier.shape = "triangle") +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = 25),
show.legend = FALSE, size = 4
) +
geom_line(aes(group = car), color = "grey", size = 0.25) +
geom_point(show.legend = FALSE) +
geom_text(
data = orig_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Tire brand",
y = NULL,
title = "Original data"
)
```
We'd like to be able to systematically distinguish between what the Smooth Turn manufacturer sees in the plot and what the Quick Spin manufacturer sees in the plot.
Fortunately for us, we have an excellent way to simulate the natural variability (from road conditions, etc.) that can lead to tires being worn at different rates.
### Variability of the statistic
A randomization test will identify whether the differences seen in the box plot of the original data in Figure \@ref(fig:tiredata) could have happened just by chance variability.
As before, we will simulate the variability in the study under the assumption that the null hypothesis is true.
In this study, the null hypothesis is that average tire tread wear is the same across Smooth Turn and Quick Spin tires.
- $H_0: \mu_{diff} = 0,$ the average tread wear is the same for the two tire brands.
- $H_A: \mu_{diff} \ne 0,$ the average tread wear is different across the two tire brands.
When observations are paired, the randomization process randomly assigns the tire brand to each of the observed tread values.
Note that in the randomization test for the two-sample mean setting (see Section \@ref(rand2mean)) the explanatory variable was *also* randomly assigned to the responses.
The change in the paired setting, however, is that the assignment happens *within* an observational unit (here, a car).
Remember, if the null hypothesis is true, it will not matter which brand is put on which tire because the overall tread wear will be the same across pairs.
Figures \@ref(fig:tiredata4) and \@ref(fig:tiredata5) show that the random assignment of group (tire brand) happens within a single car.
That is, every single car will still have one tire of each type.
In the first randomization, it just so happens that the 4th car's tire brands were swapped and the 5th car's tire brands were not swapped.
```{r tiredata4, fig.cap = "The 4th car: the tire brand was randomly permuted, and in the randomization calculation, the measurements (in cm) ended up in different groups."}
set.seed(47)
permdata <- tires %>%
group_by(car) %>%
mutate(random_brand = sample(brand))
origplot4 <- tires %>%
filter(car == "car 4") %>%
ggplot(aes(x = brand, y = tread,
color = brand, shape = brand)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car), color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand4 <- permdata %>%
filter(car == "car 4") %>%
ggplot(aes(x = brand, y = tread,
color = random_brand, shape = random_brand)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),color = "darkgrey",
hjust = rep(c(-0.15,1.3), each = 1),
show.legend = FALSE, size = 6) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
annotate(
"text",
x = 1.5, y = 0.315,
label = "Shuffled assignment\nof tire brand",
size = 4
) +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(color = NULL, shape = NULL)
origplot4 + shuffbrand4
```
```{r tiredata5, fig.cap = "The 5th car: the tire brand was randomly permuted to stay the same! In the randomization calculation, the measurements (in cm) ended up in the original groups."}
origplot5 <- tires %>%
filter(car == "car 5") %>%
ggplot(aes(x = brand, y = tread,
color = brand, shape = brand)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car), color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand5 <- permdata %>%
filter(car == "car 5") %>%
ggplot(aes(x = brand, y = tread,
color = random_brand, shape = random_brand)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car),color = "darkgrey",
hjust = rep(c(-0.15, 1.3), each = 1),
show.legend = FALSE, size = 6) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
annotate(
"text",
x = 1.5, y = 0.315,
label = "Shuffled assignment\nof tire brand",
size = 4
) +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(color = NULL, shape = NULL)
origplot5 + shuffbrand5
```
We can put the shuffled assignments for all the cars into one plot as seen in Figure \@ref(fig:tiredataPerm).
```{r tiredataPerm, fig.cap = "Tire tread data (in cm) with: the brand of tire from which the original measurements came (left) and shuffled brand assignment (right). As evidenced by the colors, some of the cars kept their original tire assignments and some cars swapped the tire assignments.", out.width = "100%"}
origplot <- tires %>%
ggplot(aes(x = brand, y = tread,
color = brand, shape = brand)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3, show.legend = FALSE) +
geom_text(aes(label = car), color = "grey",
hjust = rep(c(-0.15, 1.3), each = nrow(tires)/2),
show.legend = FALSE, size = 3) +
ylim(c(miny, maxy)) +
labs(
x = "Brand of tire",
y = NULL,
color = NULL, shape = NULL,
title = "Original data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand <- permdata %>%
ggplot(aes(x = brand, y = tread,
color = random_brand, shape = random_brand)) +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 3) +
geom_text(aes(label = car), color = "grey",
hjust = rep(c(-0.15, 1.3), each = nrow(tires)/2),
show.legend = FALSE, size = 3) +
ylim(c(miny, maxy)) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
theme_void() +
theme(
legend.position = c(0.6, 0.1),
legend.background = element_rect(color = "white")
) +
labs(
title = "Shuffled data",
color = NULL, shape = NULL)
origplot + shuffbrand
```
The next step in the randomization test is to sort the brands so that the assigned brand value on the x-axis aligns with the assigned group from the randomization.
See Figure \@ref(fig:tiredataPermSort) which has the same randomized groups (right image in Figure \@ref(fig:tiredataPerm) and left image in Figure \@ref(fig:tiredataPermSort)) as seen previously.
However, the right image in Figure \@ref(fig:tiredataPermSort) sorts the randomized groups so that we can measure the variability across groups as compared to the variability within groups.
```{r tiredataPermSort, fig.cap = "Tire tread from (left) randomized brand assignment, (right) sorted by randomized brand.", out.width = "100%"}
permed_means <- permdata %>%
group_by(random_brand) %>%
summarize(mean_tread = round(mean(tread), 4)) %>%
mutate(mean_label = c("bar(x)[QSsim1]", "bar(x)[STsim1]")) %>%
mutate(mean_label = paste(mean_label, "==", mean_tread))
shuffbrandfull <- permdata %>%
ggplot(aes(x = random_brand, y = tread,
color = random_brand, shape = random_brand)) +
geom_text(aes(label = car),
color = "grey",
hjust = rep(c(-0.15, 1.3), each = 25),
show.legend = FALSE, size = 3
) +
geom_line(aes(group = car), color = "grey") +
ylim(c(miny, maxy)) +
geom_point(size = 3, show.legend = FALSE) +
geom_text(data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE) +
labs(
x = "Randomized brand of tire",
y = NULL,
title = "Sorted data"
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))
shuffbrand + shuffbrandfull
```
Figure \@ref(fig:tiredatarand1) presents a second randomization of the data.
Notice how the two observations from the same car are linked by a grey line; some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).
```{r tiredatarand1, fig.cap = "A second randomization where the brand is randomly swapped (or not) across the two tread wear measurements (in cm) from the same car."}
set.seed(47)
tires1 <- tires %>%
group_by(car) %>%
mutate(random_brand = sample(brand))
permed_means <- tires1 %>%
group_by(random_brand) %>%
summarize(mean_tread = round(mean(tread), 4)) %>%
mutate(mean_label = c("bar(x)[QSsim2]", "bar(x)[STsim2]")) %>%
mutate(mean_label = paste(mean_label, "==", mean_tread))
ggplot(tires1, aes(x = random_brand, y = tread,
color = random_brand, shape = random_brand)) +
geom_boxplot(show.legend = FALSE, color = "black",
outlier.shape = "triangle") +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 2, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Brand of tires, randomly assigned (2)",
y = NULL
)
```
Figure \@ref(fig:tiredatarand2) presents yet another randomization of the data.
Again, the same observations are linked by a grey line, and some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).
```{r tiredatarand2, fig.cap = "An additional randomization where the brand is randomly swapped (or not) across the two tread wear measurements (in cm) from the same car."}
set.seed(4747)
tires2 <- tires %>%
group_by(car) %>%
mutate(random_brand = sample(brand))
permed_means <- tires2 %>%
group_by(random_brand) %>%
summarize(mean_tread = round(mean(tread),4)) %>%
mutate(mean_label = c("bar(x)[QSsim3]", "bar(x)[STsim3]")) %>%
mutate(mean_label = paste(mean_label, "==", mean_tread))
ggplot(tires2, aes(x = random_brand, y = tread,
color = random_brand, shape = random_brand)) +
geom_boxplot(show.legend = FALSE, color = "black",
outlier.shape = "triangle") +
geom_line(aes(group = car), color = "grey") +
geom_point(size = 2, show.legend = FALSE) +
geom_text(
data = permed_means,
aes(label = mean_label, y = 0.318),
parse = TRUE, show.legend = FALSE
) +
scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
labs(
x = "Brand of tires, randomly assigned (3)",
y = NULL
)
```
### Observed statistic vs. null statistics
By repeating the randomization process, we can create a distribution of the average of the differences in tire treads, as seen in Figure \@ref(fig:pairRandomiz).
As expected (because the differences were generated under the null hypothesis), the center of the histogram is zero.
A line has been drawn at the observed difference which is well outside the majority of the null differences simulated from natural variability by mixing up which the tire received Smooth Turn and which received Quick Spin.
Because the observed statistic is so far away from the natural variability of the randomized differences, we are convinced that there is a difference between Smooth Turn and Quick Spin.
Our conclusion is that the extra amount of average tire tread in Smooth Turn is due to more than just natural variability: we reject $H_0$ and conclude that $\mu_{ST} \ne \mu_{QS}.$
```{r pairrandfull, fig.cap = "process of randomizing across pairs.", warning = FALSE, out.width="75%", eval = FALSE, echo = FALSE}
include_graphics("images/pairrandfull.png")
```
```{r pairRandomiz, fig.cap = "Histogram of 1,000 mean differences with tire brand randomly assigned across the two tread measurements (in cm) per pair."}
set.seed(474756)
tires_shuffled <- tires %>%
pivot_wider(id_cols = car, names_from = brand, names_prefix = "brand_",
values_from = tread) %>%
mutate(diff_tread = `brand_Quick Spin` - `brand_Smooth Turn`) %>%
specify(response = diff_tread) %>%
hypothesize(null = "point", mu = 0) %>%
generate(1000, type = "bootstrap") %>%
calculate(stat = "mean")
ggplot(tires_shuffled, aes(x = stat)) +
geom_histogram(binwidth = 0.0002) +
labs(
x = "Mean of randomized differences of tire wear\n(Quick Spin - Smooth Turn)",
title = "1,000 means of randomized differences",
y = "Count"
) +
geom_vline(xintercept = mean(brandA) - mean(brandB),
color = IMSCOL["red", "full"])
```
## Bootstrap confidence interval for the mean paired difference
For both the bootstrap and the mathematical models applied to paired data, the analysis is virtually identical to the one-sample approach given in Chapter \@ref(inference-one-mean).
The key to working with paired data (for bootstrapping and mathematical approaches) is to consider the measurement of interest to be the difference in measured values across the pair of observations.
```{r include=FALSE}
terms_chp_21 <- c(terms_chp_21, "bootstrap CI paired difference")
```
### Observed data
In an earlier edition of this textbook, we found that Amazon prices were, on average, lower than those of the UCLA Bookstore for UCLA courses in 2010.
It's been several years, and many stores have adapted to the online market, so we wondered, how is the UCLA Bookstore doing today?
We sampled 201 UCLA courses.
Of those, 68 required books could be found on Amazon.
A portion of the dataset from these courses is shown in Figure \@ref(tab:textbooksDF), where prices are in US dollars.
::: {.data data-latex=""}
The [`ucla_textbooks_f18`](http://openintrostat.github.io/openintro/reference/ucla_textbooks_f18.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::
```{r textbooksDF}
ucla_textbooks_f18 %>%
select(subject, course_num, bookstore_new, amazon_new) %>%
mutate(price_diff = bookstore_new - amazon_new) %>%
filter(!is.na(bookstore_new) & !is.na(amazon_new)) %>%
head(4) %>%
kbl(linesep = "", booktabs = TRUE, caption = caption_helper("Four cases from the `ucla_textbooks_f18` dataset.")) %>%
kable_styling(bootstrap_options = c("striped", "condensed"),
latex_options = c("striped", "hold_position"), full_width = FALSE)
```
\index{paired data}
Each textbook has two corresponding prices in the dataset: one for the UCLA Bookstore and one for Amazon.
When two sets of observations have this special correspondence, they are said to be **paired**.
### Variability of the statistic
Following the example of bootstrapping the one-sample statistic, the observed *differences* can be bootstrapped in order to understand the variability of the average difference from sample to sample.
Remember, the differences act as a single value to bootstrap.
That is, the original dataset would include the list of 68 price differences, and each resample will also include 68 price differences (some repeated through the bootstrap resampling process).
The bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in Section \@ref(boot1mean).
In Figure \@ref(fig:pairboot), two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated.
The bootstrap percentile confidence interval is computing using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\$0.25, \$7.87).
::: {.guidedpractice data-latex=""}
Using the histogram of bootstrapped difference in means, estimate the standard error of the mean of the sample differences, $\bar{x}_{diff}.$[^inference-paired-means-1]
:::
[^inference-paired-means-1]: The bootstrapped differences in sample means vary roughly from 0.7 to 7.5, a range of \$6.80.
Although the bootstrap distribution is not symmetric, we use the empirical rule (that with bell-shaped distributions, most observations are within two standard errors of the center), the standard error of the mean differences is approximately \$1.70.
You might note that the standard error calculation given in Section \@ref(mathpaired) is $SE(\bar{x}_{diff}) = \sqrt{s^2_{diff}/n_{diff}}\\ = \sqrt{13.4^2/68} = \$1.62$ (values from Section \@ref(mathpaired)), very close to the bootstrap approximation.
The bootstrap SE interval is found by computing the SE of the bootstrapped differences $(SE_{\overline{x}_{diff}} = \$1.64)$ and the normal multiplier of $z^* = 2.58.$ The averaged difference is $\bar{x} = \$3.58.$ The 99% confidence interval is: $\$3.58 \pm 2.58 \times \$ 1.64 = (\$-0.65, \$7.81).$
The confidence intervals seem to indicate that the UCLA bookstore price is, on average, higher than the Amazon price, as the majority of the confidence interval is positive.
However, if the analysis required a strong degree of certainty (e.g., 99% confidence), and the bootstrap SE interval was most appropriate (given a second course in statistics the nuances of the methods can be investigated), the results of which book seller is higher is not well determined (because the bootstrap SE interval overlaps zero).
That is, the 99% bootstrap SE interval gives potential for UCLA to be lower, on average, than Amazon (because of the possible negative values for the true mean difference in price).
```{r pairboot, fig.cap = "(ref:pairboot-cap)"}
diff_price <- ucla_textbooks_f18 %>%
select(subject, course_num, bookstore_new, amazon_new) %>%
mutate(price_diff = bookstore_new - amazon_new) %>%
filter(!is.na(bookstore_new) & !is.na(amazon_new)) %>%
specify(response = price_diff) %>%
calculate(stat = "mean")
boot_diff <- ucla_textbooks_f18 %>%
select(subject, course_num, bookstore_new, amazon_new) %>%
mutate(price_diff = bookstore_new - amazon_new) %>%
filter(!is.na(bookstore_new) & !is.na(amazon_new)) %>%
specify(response = price_diff) %>%
generate(reps = 1000, type = "bootstrap") %>%
calculate(stat = "mean")
ci_perc_diff <- boot_diff %>%
get_confidence_interval(level = 0.99, type = "percentile")
ci_se_diff <- boot_diff %>%
get_confidence_interval(level = 0.99, type = "se",
point_estimate = diff_price)
boot_diff %>%
infer::visualize(bins = 20) +
infer::shade_confidence_interval(ci_perc_diff,
color = IMSCOL["blue", "full"],
fill = NULL, size = 1, linetype = "dashed"
) +
infer::shade_confidence_interval(ci_se_diff,
color = IMSCOL["red", "full"],
fill = NULL, size = 1, linetype = "dotted"
) +
geom_vline(xintercept = diff_price$stat, size = 1) +
geom_line(aes(y = replicate, x = stat, color = "a", linetype = "a"), alpha = 0, size = 1) + # bogus code
geom_line(aes(y = replicate, x = stat, color = "b", linetype = "b"), alpha = 0, size = 1) + # bogus code
geom_line(aes(y = replicate, x = stat, color = "c", linetype = "c"), alpha = 0) + # bogus code
ylim(c(0, 200)) +
guides(color = guide_legend(override.aes = list(alpha = 1))) +
scale_color_manual(
name = NULL,
values = c("a" = IMSCOL["blue", "full"], "b" = IMSCOL["red", "full"], "c" = IMSCOL["black", "full"]),
labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
guide = "legend"
) +
scale_linetype_manual(
name = NULL,
values = c("a" = "dashed", "b" = "dotted", "c" = "solid"),
labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
guide = "legend"
) +
labs(
x = "Mean of bootstrapped differences of price\n(UCLA - Amazon)",
title = "1,000 means of bootstrapped differences",
y = "Count"
) +
theme(
legend.position = c(0.925, 0.8),
legend.background = element_rect(color = "white")
)
```
(ref:pairboot-cap) Bootstrap distribution for the average difference in new book price at the UCLA bookstore versus Amazon. 99% confidence intervals are superimposed using blue dashed (bootstrap percentile interval) and red dotted (bootstrap SE interval) lines.
\clearpage
## Mathematical model for the mean paired difference {#mathpaired}
Thinking about the differences as a single observation on an observational unit changes the paired setting into the one-sample setting.
The mathematical model for the one-sample case is covered in Section \@ref(one-mean-math).
### Observed data
To analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations.
In the textbook data, we look at the differences in prices, which is represented as the `price_difference` variable in the dataset.
Here the differences are taken as
$$\text{UCLA Bookstore price} - \text{Amazon price}$$
It is important that we always subtract using a consistent order; here Amazon prices are always subtracted from UCLA prices.
The first difference shown in Table \@ref(tab:textbooksDF) is computed as $47.97 - 47.45 = 0.52.$ Similarly, the second difference is computed as $14.26 - 13.55 = 0.71,$ and the third is $13.50 - 12.53 = 0.97.$ A histogram of the differences is shown in Figure \@ref(fig:diffInTextbookPricesF18).
```{r diffInTextbookPricesF18, fig.cap = "Histogram of the difference in price for each book sampled."}
ucla_textbooks_f18 <- ucla_textbooks_f18 %>%
select(subject, course_num, bookstore_new, amazon_new) %>%
filter(!is.na(bookstore_new), !is.na(amazon_new)) %>%
mutate(price_diff = bookstore_new - amazon_new)
ggplot(ucla_textbooks_f18, aes(x = price_diff)) +
geom_histogram(binwidth = 10) +
scale_x_continuous(labels = label_dollar(accuracy = 1), limits = c(-20, 100), breaks = seq(-20, 100, 20)) +
labs(
x = "UCLA Bookstore Price - Amazon Price (USD)",
y = "Count"
)
```
### Variability of the statistic
To analyze a paired dataset, we simply analyze the differences.
Table \@ref(tab:textbooksSummaryStats) provides the data summaries from the textbook data.
Note that instead of reporting the prices separately for UCLA and Amazon, the summary statistics are given by the mean of the differences, the standard deviation of the differences, and the total number of pairs (i.e., differences).
The parameter of interest is also a single value, $\mu_{diff},$ so we can use the same $t$-distribution techniques we applied in Section \@ref(one-mean-math) directly onto the observed differences.
```{r textbooksSummaryStats}
ucla_textbooks_f18 %>%
summarise(
n = n(),
mean = mean(price_diff),
sd = sd(price_diff)
) %>%
kbl(linesep = "", booktabs = TRUE, caption = "Summary statistics for the 68 price differences.",
col.names = c("n", "Mean", "SD"),
align = "ccc", digits = 2) %>%
kable_styling(bootstrap_options = c("striped", "condensed"),
latex_options = c("striped", "hold_position"), full_width = FALSE) %>%
column_spec(1, width = "8em") %>%
column_spec(2, width = "8em") %>%
column_spec(3, width = "8em")
```
::: {.workedexample data-latex=""}
Set up a hypothesis test to determine whether, on average, there is a difference between Amazon's price for a book and the UCLA bookstore's price.
Also, check the conditions for whether we can move forward with the test using the $t$-distribution.
------------------------------------------------------------------------
We are considering two scenarios: there is no difference or there is some difference in average prices.
- $H_0:$ $\mu_{diff} = 0.$ There is no difference in the average textbook price.
- $H_A:$ $\mu_{diff} \neq 0.$ There is a difference in average prices.
Next, we check the independence and normality conditions.
The observations are based on a simple random sample, so assuming the textbooks are independent seems reasonable.
While there are some outliers, $n = 68$ and none of the outliers are particularly extreme, so the normality of $\bar{x}$ is satisfied.
With these conditions satisfied, we can move forward with the $t$-distribution.
:::
### Observed statistic vs. null statistics
As mentioned previously, the methods applied to a difference will be identical to the one-sample techniques.
Therefore, the full hypothesis test framework is presented as guided practices.
::: {.important data-latex=""}
**The test statistic for assessing a paired mean is a T.**
The T score is a ratio of how the sample mean difference varies from zero as compared to how the observations vary.
$$T = \frac{\bar{x}_{diff} - 0 }{s_{diff}/\sqrt{n_{diff}}}$$
When the null hypothesis is true and the conditions are met, T has a t-distribution with $df = n_{diff} - 1.$
Conditions:
- Independently sampled pairs.
- Large samples and no extreme outliers.
:::
```{r include=FALSE}
terms_chp_21 <- c(terms_chp_21, "T score paired difference")
```
::: {.workedexample data-latex=""}
Complete the hypothesis test started in the previous Example.
------------------------------------------------------------------------
To compute the test compute the standard error associated with $\bar{x}_{diff}$ using the standard deviation of the differences $(s_{diff} = 13.42)$ and the number of differences $(n_{diff} = 68):$
$$SE_{\bar{x}_{diff}} = \frac{s_{diff}}{\sqrt{n_{diff}}} = \frac{13.42}{\sqrt{68}} = 1.63$$
The test statistic is the T score of $\bar{x}_{diff}$ under the null condition that the actual mean difference is 0:
$$T = \frac{\bar{x}_{diff} - 0}{SE_{\bar{x}_{diff}}} = \frac{3.58 - 0}{1.63} = 2.20$$
To visualize the p-value, the sampling distribution of $\bar{x}_{diff}$ is drawn as though $H_0$ is true, and the p-value is represented by the two shaded tails in the figure below.
The degrees of freedom is $df = 68 - 1 = 67.$ Using statistical software, we find the one-tail area of 0.0156.
```{r textbooksF18HTTails, fig.asp = 0.5, out.width = "60%"}
m <- mean(ucla_textbooks_f18$price_diff)
s <- sd(ucla_textbooks_f18$price_diff)
se <- s / sqrt(length(ucla_textbooks_f18$price_diff))
z <- m / se
normTail(
L = -abs(m),
U = abs(m),
s = se,
df = 20,
col = IMSCOL["blue", "full"],
axes = FALSE
)
at <- c(-100, 0, m, 100)
labels <- expression(0, mu[0] * " = 0", bar(x)[diff] * " = 2.98", 0)
axis(1, at, labels, cex.axis = 0.9)
```
Doubling this area gives the p-value: 0.0312.
Because the p-value is less than 0.05, we reject the null hypothesis.
Amazon prices are, on average, lower than the UCLA Bookstore prices for UCLA courses.
:::
Recall that the margin of error is defined by the standard error.
The margin of error for $\bar{x}_{diff}$ can be directly obtained from $SE(\bar{x}_{diff}).$
::: {.important data-latex=""}
**Margin of error for** $\bar{x}_{diff}.$
The margin of error is $t^\star_{df} \times s_{diff}/\sqrt{n_{diff}}$ where $t^\star_{df}$ is calculated from a specified percentile on the t-distribution with *df* degrees of freedom.
:::
::: {.workedexample data-latex=""}
Create a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.
------------------------------------------------------------------------
Conditions have already verified and the standard error computed in a previous Example.\
To find the confidence interval, identify $t^{\star}_{67}$ using statistical software or the $t$-table $(t^{\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula:
$$
\begin{aligned}
\text{point estimate} \ &\pm \ z^{\star} \ \times \ SE \\
3.58 \ &\pm \ 2.00 \ \times \ 1.63 \\
(0.32 \ &, \ 6.84)
\end{aligned}
$$
We are 95% confident that the UCLA Bookstore is, on average, between \$0.32 and \$6.84 more expensive than Amazon for UCLA course books.
:::
::: {.guidedpractice data-latex=""}
We have convincing evidence that Amazon is, on average, less expensive.
How should this conclusion affect UCLA student buying habits?
Should UCLA students always buy their books on Amazon?[^inference-paired-means-2]
:::
[^inference-paired-means-2]: The average price difference is only mildly useful for this question.
Examine the distribution shown in Figure \@ref(fig:diffInTextbookPricesF18).
There are certainly a handful of cases where Amazon prices are far below the UCLA Bookstore's, which suggests it is worth checking Amazon (and probably other online sites) before purchasing.
However, in many cases the Amazon price is above what the UCLA Bookstore charges, and most of the time the price isn't that different.
Ultimately, if getting a book immediately from the bookstore is notably more convenient, e.g., to get started on reading or homework, it's likely a good idea to go with the UCLA Bookstore unless the price difference on a specific book happens to be quite large.
For reference, this is a very different result from what we (the authors) had seen in a similar dataset from 2010.
At that time, Amazon prices were almost uniformly lower than those of the UCLA Bookstore's and by a large margin, making the case to use Amazon over the UCLA Bookstore quite compelling at that time.
Now we frequently check multiple websites to find the best price.
\index{paired}
```{r include=FALSE}
terms_chp_21 <- c(terms_chp_21, "paired difference t-test", "paired difference CI")
```
\clearpage
## Chapter review {#chp21-review}
### Summary
Like the two independent sample procedures in Chapter \@ref(inference-two-means), the paired difference analysis can be done using a t-distribution.
The randomization test applied to the paired differences is slightly different, however.
Note that when randomizing under the paired setting, each null statistic is created by randomly assigning the group to a numerical outcome **within** the individual observational unit.
The procedure for creating a confidence interval for the paired difference is almost identical to the confidence intervals created in Chapter \@ref(inference-one-mean) for a single mean.
### Terms
We introduced the following terms in the chapter.
If you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.
We are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.
However you should be able to easily spot them as **bolded text**.
```{r}
make_terms_table(terms_chp_21)
```
\clearpage
## Exercises {#chp21-exercises}
Answers to odd numbered exercises can be found in Appendix \@ref(exercise-solutions-21).
::: {.exercises data-latex=""}
```{r exercises-21, child = "exercises/21-ex-inference-paired-means.Rmd"}
```
:::