21-inference-paired-means.Rmd

# Inference for comparing paired means {#inference-paired-means}

```{r, include = FALSE}
source("_common.R")
```

::: {.chapterintro data-latex=""}
In Chapter \@ref(inference-two-means) analysis was done to compare the average population value across two different groups.
Recall that one of the important conditions in doing a two-sample analysis is that the two groups are independent.
Here, independence across groups means that knowledge of the observations in one group doesn't change what we'd expect to happen in the other group.
But what happens if the groups are **dependent**?
Sometimes dependency is not something that can be addressed through a statistical method.
However, a particular dependency, **pairing**, can be modeled quite effectively using many of the same tools we have already covered in this text.
:::

Paired data represent a particular type of experimental structure where the analysis is somewhat akin to a one-sample analysis (see Chapter \@ref(inference-one-mean)) but has other features that resemble a two-sample analysis (see Chapter \@ref(inference-two-means)).
As with a two-sample analysis, quantitative measurements are made on each of two different levels of the explanatory variable.
However, because the observational unit is **paired** across the two groups, the two measurements are subtracted such that only the difference is retained.
Table \@ref(tab:pairedexamples) presents some examples of studies where paired designs were implemented.

```{r pairedexamples}
paired_study_examples <- tribble(
  ~variable, ~col1, ~col2, ~col3,
  "Car", "Smooth Turn vs Quick Spin", "amount of tire tread after 1,000 miles", "difference in tread",
  "Textbook", "UCLA vs Amazon", "price of new text", "difference in price",
  "Individual person", "Pre-course vs Post-course", "exam score", "difference in score"
)

paired_study_examples %>%
  kbl(linesep = "", booktabs = TRUE, 
    caption = "Examples of studies where a paired design is used to measure the difference in the measurement over two conditions.",
    col.names = c("Observational unit", "Comparison groups", "Measurement", "Value of interest")
  ) %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), 
                latex_options = c("striped", "hold_position"), full_width = TRUE) %>%
  column_spec(1:3, width = "8em")
```

::: {.important data-latex=""}
**Paired data.**

Two sets of observations are *paired* if each observation in one set has a special correspondence or connection with exactly one observation in the other dataset.
:::

It is worth noting that if mathematical modeling is chosen as the analysis tool, paired data inference on the difference in measurements will be identical to the one-sample mathematical techniques described in Chapter \@ref(inference-one-mean).
However, recall from Chapter \@ref(inference-one-mean) that with pure one-sample data, the computational tools for hypothesis testing are not easy to implement and were not presented (although the bootstrap was presented as a computational approach for constructing a one sample confidence interval).
With paired data, the randomization test fits nicely with the structure of the experiment and is presented here.

```{r include=FALSE}
terms_chp_21 <- c("paired data")
```

## Randomization test for the mean paired difference

Consider an experiment done to measure whether tire brand Smooth Turn or tire brand Quick Spin has longer tread wear (in cm).
That is, after 1,000 miles on a car, which brand of tires has more tread, on average?

### Observed data

The observed data represent 25 tread measurements (in cm) taken on 25 tires of Smooth Turn and 25 tires of Quick Spin.
The study used a total of 25 cars, so on each car, one tire was of Smooth Turn and one was of Quick Spin.
Figure \@ref(fig:tiredata) presents the observed data, calculations on tread remaining (in cm).

The Smooth Turn manufacturer looks at the box plot and says:

> *Clearly the tread on Smooth Turn tires is higher, on average, than the tread on Quick Spin tires after 1,000 miles of driving.*

The Quick Spin manufacturer is skeptical and retorts:

> *But with only 25 cars, it seems that the variability in road conditions (sometimes one tire hits a pothole, etc.) could be what leads to the small difference in average tread amount.*

```{r tiredata, fig.cap = "Boxplots of the tire tread data (in cm) and the brand of tire from which the original measurements came."}
set.seed(47)
brandA <- rnorm(25, 0.310, 0.003)
brandB <- rnorm(25, 0.308, 0.003)
car <- c(paste("car", 1:25))
miny <- min(brandA, brandB) - .003
maxy <- max(brandA, brandB) + .003

tires <- tibble(
  tread = c(brandA, brandB),
  car = rep(car, 2),
  brand = c(rep("Smooth Turn", 25), rep("Quick Spin", 25))
)

orig_means <- tires %>%
  group_by(brand) %>%
  summarize(mean_tread = round(mean(tread), 4)) %>%
  mutate(
    mean_label = c("bar(x)[QS]", "bar(x)[ST]"),
    mean_label = paste(mean_label, "==", mean_tread)
  )

ggplot(tires, aes(x = brand, y = tread, 
                  color = brand, shape = brand)) +
  geom_boxplot(show.legend = FALSE, 
               outlier.shape = "triangle") +
  geom_text(aes(label = car),
    color = "grey",
    hjust = rep(c(-0.15, 1.3), each = 25),
    show.legend = FALSE, size = 4
  ) +
  geom_line(aes(group = car), color = "grey", size = 0.25) +
  geom_point(show.legend = FALSE) +
  geom_text(
    data = orig_means, 
    aes(label = mean_label, y = 0.318), 
    parse = TRUE, show.legend = FALSE
    ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
  labs(
    x = "Tire brand",
    y = NULL,
    title = "Original data"
  )
```

We'd like to be able to systematically distinguish between what the Smooth Turn manufacturer sees in the plot and what the Quick Spin manufacturer sees in the plot.
Fortunately for us, we have an excellent way to simulate the natural variability (from road conditions, etc.) that can lead to tires being worn at different rates.

### Variability of the statistic

A randomization test will identify whether the differences seen in the box plot of the original data in Figure \@ref(fig:tiredata) could have happened just by chance variability.
As before, we will simulate the variability in the study under the assumption that the null hypothesis is true.
In this study, the null hypothesis is that average tire tread wear is the same across Smooth Turn and Quick Spin tires.

-   $H_0: \mu_{diff} = 0,$ the average tread wear is the same for the two tire brands.
-   $H_A: \mu_{diff} \ne 0,$ the average tread wear is different across the two tire brands.

When observations are paired, the randomization process randomly assigns the tire brand to each of the observed tread values.
Note that in the randomization test for the two-sample mean setting (see Section \@ref(rand2mean)) the explanatory variable was *also* randomly assigned to the responses.
The change in the paired setting, however, is that the assignment happens *within* an observational unit (here, a car).
Remember, if the null hypothesis is true, it will not matter which brand is put on which tire because the overall tread wear will be the same across pairs.

Figures \@ref(fig:tiredata4) and \@ref(fig:tiredata5) show that the random assignment of group (tire brand) happens within a single car.
That is, every single car will still have one tire of each type.
In the first randomization, it just so happens that the 4th car's tire brands were swapped and the 5th car's tire brands were not swapped.

```{r tiredata4, fig.cap = "The 4th car: the tire brand was randomly permuted, and in the randomization calculation, the measurements (in cm) ended up in different groups."}

set.seed(47)
permdata <- tires %>%
  group_by(car) %>%
  mutate(random_brand = sample(brand))

origplot4 <- tires %>%
  filter(car == "car 4") %>%
  ggplot(aes(x = brand, y = tread,
            color = brand, shape = brand)) +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 3, show.legend = FALSE) +
  geom_text(aes(label = car), color = "darkgrey", 
            hjust = rep(c(-0.15, 1.3), each = 1),
            show.legend = FALSE, size = 6) +
  ylim(c(miny, maxy)) + 
  labs(
    x = "Brand of tire",
    y = NULL,
    color = NULL, shape = NULL,
    title = "Original data"
  ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))

shuffbrand4 <- permdata %>%
  filter(car == "car 4") %>%
  ggplot(aes(x = brand, y = tread, 
             color = random_brand, shape = random_brand)) +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 3) +
  geom_text(aes(label = car),color = "darkgrey", 
            hjust = rep(c(-0.15,1.3), each = 1),
            show.legend = FALSE, size = 6) +
  ylim(c(miny, maxy)) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
  theme_void() +
  annotate(
    "text",
    x = 1.5, y = 0.315,
    label = "Shuffled assignment\nof tire brand",
    size = 4
  ) +
  theme(
    legend.position = c(0.6, 0.1),
    legend.background = element_rect(color = "white")
  ) +
  labs(color = NULL, shape = NULL)

origplot4 + shuffbrand4
```

```{r tiredata5, fig.cap = "The 5th car: the tire brand was randomly permuted to stay the same! In the randomization calculation, the measurements (in cm) ended up in the original groups."}
origplot5 <- tires %>%
  filter(car == "car 5") %>%
  ggplot(aes(x = brand, y = tread, 
             color = brand, shape = brand)) +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 3, show.legend = FALSE) +
  geom_text(aes(label = car), color = "darkgrey", 
            hjust = rep(c(-0.15, 1.3), each = 1),
            show.legend = FALSE, size = 6) +
  ylim(c(miny, maxy)) + 
  labs(
    x = "Brand of tire",
    y = NULL,
    color = NULL, shape = NULL,
    title = "Original data"
  ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))

shuffbrand5 <- permdata %>%
  filter(car == "car 5") %>%
  ggplot(aes(x = brand, y = tread, 
             color = random_brand, shape = random_brand)) +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 3) +
  geom_text(aes(label = car),color = "darkgrey", 
            hjust = rep(c(-0.15, 1.3), each = 1),
            show.legend = FALSE, size = 6) +
  ylim(c(miny, maxy)) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
  theme_void() +
  annotate(
    "text",
    x = 1.5, y = 0.315,
    label = "Shuffled assignment\nof tire brand",
    size = 4
  ) +
  theme(
    legend.position = c(0.6, 0.1),
    legend.background = element_rect(color = "white")
  ) +
  labs(color = NULL, shape = NULL)

origplot5 + shuffbrand5
```

We can put the shuffled assignments for all the cars into one plot as seen in Figure \@ref(fig:tiredataPerm).

```{r tiredataPerm, fig.cap = "Tire tread data (in cm) with: the brand of tire from which the original measurements came (left) and shuffled brand assignment (right).  As evidenced by the colors, some of the cars kept their original tire assignments and some cars swapped the tire assignments.", out.width = "100%"}

origplot <- tires %>%
  ggplot(aes(x = brand, y = tread, 
             color = brand, shape = brand)) +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 3, show.legend = FALSE) +
  geom_text(aes(label = car), color = "grey",
            hjust = rep(c(-0.15, 1.3), each = nrow(tires)/2),
            show.legend = FALSE, size = 3) +
  ylim(c(miny, maxy)) + 
  labs(
    x = "Brand of tire",
    y = NULL,
    color = NULL, shape = NULL,
    title = "Original data"
  ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))

shuffbrand <- permdata %>%
  ggplot(aes(x = brand, y = tread, 
             color = random_brand, shape = random_brand)) +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 3) +
  geom_text(aes(label = car), color = "grey",
            hjust = rep(c(-0.15, 1.3), each = nrow(tires)/2),
            show.legend = FALSE, size = 3) +
  ylim(c(miny, maxy)) + 
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
  theme_void() +
  theme(
    legend.position = c(0.6, 0.1),
    legend.background = element_rect(color = "white")
  ) +
  labs(
    title = "Shuffled data",
    color = NULL, shape = NULL)

origplot + shuffbrand
```

The next step in the randomization test is to sort the brands so that the assigned brand value on the x-axis aligns with the assigned group from the randomization.
See Figure \@ref(fig:tiredataPermSort) which has the same randomized groups (right image in Figure \@ref(fig:tiredataPerm) and left image in Figure \@ref(fig:tiredataPermSort)) as seen previously.
However, the right image in Figure \@ref(fig:tiredataPermSort) sorts the randomized groups so that we can measure the variability across groups as compared to the variability within groups.

```{r tiredataPermSort, fig.cap = "Tire tread from (left) randomized brand assignment, (right) sorted by randomized brand.", out.width = "100%"}

permed_means <- permdata %>%
  group_by(random_brand) %>%
  summarize(mean_tread = round(mean(tread), 4))  %>%
  mutate(mean_label = c("bar(x)[QSsim1]", "bar(x)[STsim1]")) %>%
  mutate(mean_label = paste(mean_label, "==", mean_tread))

shuffbrandfull <- permdata %>%
  ggplot(aes(x = random_brand, y = tread,
             color = random_brand, shape = random_brand)) +
  geom_text(aes(label = car),
    color = "grey",
    hjust = rep(c(-0.15, 1.3), each = 25),
    show.legend = FALSE, size = 3
  ) +
  geom_line(aes(group = car), color = "grey") +
  ylim(c(miny, maxy)) +
  geom_point(size = 3, show.legend = FALSE) +
  geom_text(data = permed_means,
            aes(label = mean_label, y = 0.318), 
            parse = TRUE, show.legend = FALSE) +
  labs(
    x = "Randomized brand of tire",
    y = NULL,
    title = "Sorted data"
  ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"]))

shuffbrand + shuffbrandfull
```

Figure \@ref(fig:tiredatarand1) presents a second randomization of the data.
Notice how the two observations from the same car are linked by a grey line; some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).

```{r tiredatarand1, fig.cap = "A second randomization where the brand is randomly swapped (or not) across the two tread wear measurements (in cm) from the same car."}
set.seed(47)
tires1 <- tires %>%
  group_by(car) %>%
  mutate(random_brand = sample(brand))

permed_means <- tires1 %>%
  group_by(random_brand) %>%
  summarize(mean_tread = round(mean(tread), 4)) %>%
  mutate(mean_label = c("bar(x)[QSsim2]", "bar(x)[STsim2]")) %>%
  mutate(mean_label = paste(mean_label, "==", mean_tread))

ggplot(tires1, aes(x = random_brand, y = tread, 
                   color = random_brand, shape = random_brand)) +
  geom_boxplot(show.legend = FALSE, color = "black",
               outlier.shape = "triangle") +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 2, show.legend = FALSE) +
  geom_text(
    data = permed_means,
    aes(label = mean_label, y = 0.318), 
    parse = TRUE, show.legend = FALSE
  ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
  labs(
    x = "Brand of tires, randomly assigned (2)",
    y = NULL
  )
```

Figure \@ref(fig:tiredatarand2) presents yet another randomization of the data.
Again, the same observations are linked by a grey line, and some of the tread values have been randomly assigned to the opposite tire brand than they were originally (while some are still connected to their original tire brands).

```{r tiredatarand2, fig.cap = "An additional randomization where the brand is randomly swapped (or not) across the two tread wear measurements (in cm) from the same car."}
set.seed(4747)
tires2 <- tires %>%
  group_by(car) %>%
  mutate(random_brand = sample(brand))

permed_means <- tires2 %>%
  group_by(random_brand) %>%
  summarize(mean_tread = round(mean(tread),4))  %>%
  mutate(mean_label = c("bar(x)[QSsim3]", "bar(x)[STsim3]")) %>%
  mutate(mean_label = paste(mean_label, "==", mean_tread))

ggplot(tires2, aes(x = random_brand, y = tread, 
                   color = random_brand, shape = random_brand)) +
  geom_boxplot(show.legend = FALSE, color = "black",
               outlier.shape = "triangle") +
  geom_line(aes(group = car), color = "grey") +
  geom_point(size = 2, show.legend = FALSE) +
  geom_text(
    data = permed_means,
    aes(label = mean_label, y = 0.318), 
    parse = TRUE, show.legend = FALSE
  ) +
  scale_color_manual(values = c(IMSCOL["blue", "full"], IMSCOL["red", "full"])) +
  labs(
    x = "Brand of tires, randomly assigned (3)",
    y = NULL
  )
```

### Observed statistic vs. null statistics

By repeating the randomization process, we can create a distribution of the average of the differences in tire treads, as seen in Figure \@ref(fig:pairRandomiz).
As expected (because the differences were generated under the null hypothesis), the center of the histogram is zero.
A line has been drawn at the observed difference which is well outside the majority of the null differences simulated from natural variability by mixing up which the tire received Smooth Turn and which received Quick Spin.
Because the observed statistic is so far away from the natural variability of the randomized differences, we are convinced that there is a difference between Smooth Turn and Quick Spin.
Our conclusion is that the extra amount of average tire tread in Smooth Turn is due to more than just natural variability: we reject $H_0$ and conclude that $\mu_{ST} \ne \mu_{QS}.$

```{r pairrandfull, fig.cap = "process of randomizing across pairs.", warning = FALSE,  out.width="75%", eval = FALSE, echo = FALSE}
include_graphics("images/pairrandfull.png")
```

```{r pairRandomiz, fig.cap = "Histogram of 1,000 mean differences with tire brand randomly assigned across the two tread measurements (in cm) per pair."}
set.seed(474756)
tires_shuffled <- tires %>%
  pivot_wider(id_cols = car, names_from = brand, names_prefix = "brand_",
              values_from = tread) %>%
  mutate(diff_tread = `brand_Quick Spin` - `brand_Smooth Turn`) %>%
  specify(response = diff_tread) %>%
  hypothesize(null = "point", mu = 0) %>%
  generate(1000, type = "bootstrap") %>%
  calculate(stat = "mean")

ggplot(tires_shuffled, aes(x = stat)) +
  geom_histogram(binwidth = 0.0002) +
  labs(
    x = "Mean of randomized differences of tire wear\n(Quick Spin - Smooth Turn)",
    title = "1,000 means of randomized differences",
    y = "Count"
  ) +
  geom_vline(xintercept = mean(brandA) - mean(brandB),
             color = IMSCOL["red", "full"])
```

## Bootstrap confidence interval for the mean paired difference

For both the bootstrap and the mathematical models applied to paired data, the analysis is virtually identical to the one-sample approach given in Chapter \@ref(inference-one-mean).
The key to working with paired data (for bootstrapping and mathematical approaches) is to consider the measurement of interest to be the difference in measured values across the pair of observations.

```{r include=FALSE}
terms_chp_21 <- c(terms_chp_21, "bootstrap CI paired difference")
```

### Observed data

In an earlier edition of this textbook, we found that Amazon prices were, on average, lower than those of the UCLA Bookstore for UCLA courses in 2010.
It's been several years, and many stores have adapted to the online market, so we wondered, how is the UCLA Bookstore doing today?

We sampled 201 UCLA courses.
Of those, 68 required books could be found on Amazon.
A portion of the dataset from these courses is shown in Figure \@ref(tab:textbooksDF), where prices are in US dollars.

::: {.data data-latex=""}
The [`ucla_textbooks_f18`](http://openintrostat.github.io/openintro/reference/ucla_textbooks_f18.html) data can be found in the [**openintro**](http://openintrostat.github.io/openintro) R package.
:::

```{r textbooksDF}
ucla_textbooks_f18 %>% 
  select(subject, course_num, bookstore_new, amazon_new) %>%
  mutate(price_diff = bookstore_new - amazon_new) %>%
  filter(!is.na(bookstore_new) & !is.na(amazon_new)) %>%
  head(4) %>%
  kbl(linesep = "", booktabs = TRUE, caption = caption_helper("Four cases from the `ucla_textbooks_f18` dataset.")) %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), 
                latex_options = c("striped", "hold_position"), full_width = FALSE) 
```

\index{paired data}

Each textbook has two corresponding prices in the dataset: one for the UCLA Bookstore and one for Amazon.
When two sets of observations have this special correspondence, they are said to be **paired**.

### Variability of the statistic

Following the example of bootstrapping the one-sample statistic, the observed *differences* can be bootstrapped in order to understand the variability of the average difference from sample to sample.
Remember, the differences act as a single value to bootstrap.
That is, the original dataset would include the list of 68 price differences, and each resample will also include 68 price differences (some repeated through the bootstrap resampling process).
The bootstrap procedure for paired differences is quite similar to the procedure applied to the one-sample statistic case in Section \@ref(boot1mean).

In Figure \@ref(fig:pairboot), two 99% confidence intervals for the difference in the cost of a new book at the UCLA bookstore compared with Amazon have been calculated.
The bootstrap percentile confidence interval is computing using the 0.5 percentile and 99.5 percentile bootstrapped differences and is found to be (\$0.25, \$7.87).

::: {.guidedpractice data-latex=""}
Using the histogram of bootstrapped difference in means, estimate the standard error of the mean of the sample differences, $\bar{x}_{diff}.$[^inference-paired-means-1]
:::

[^inference-paired-means-1]: The bootstrapped differences in sample means vary roughly from 0.7 to 7.5, a range of \$6.80.
    Although the bootstrap distribution is not symmetric, we use the empirical rule (that with bell-shaped distributions, most observations are within two standard errors of the center), the standard error of the mean differences is approximately \$1.70.
    You might note that the standard error calculation given in Section \@ref(mathpaired) is $SE(\bar{x}_{diff}) = \sqrt{s^2_{diff}/n_{diff}}\\ = \sqrt{13.4^2/68} = \$1.62$ (values from Section \@ref(mathpaired)), very close to the bootstrap approximation.

The bootstrap SE interval is found by computing the SE of the bootstrapped differences $(SE_{\overline{x}_{diff}} = \$1.64)$ and the normal multiplier of $z^* = 2.58.$ The averaged difference is $\bar{x} = \$3.58.$ The 99% confidence interval is: $\$3.58 \pm 2.58 \times \$ 1.64 = (\$-0.65, \$7.81).$

The confidence intervals seem to indicate that the UCLA bookstore price is, on average, higher than the Amazon price, as the majority of the confidence interval is positive.
However, if the analysis required a strong degree of certainty (e.g., 99% confidence), and the bootstrap SE interval was most appropriate (given a second course in statistics the nuances of the methods can be investigated), the results of which book seller is higher is not well determined (because the bootstrap SE interval overlaps zero).
That is, the 99% bootstrap SE interval gives potential for UCLA to be lower, on average, than Amazon (because of the possible negative values for the true mean difference in price).

```{r pairboot, fig.cap = "(ref:pairboot-cap)"}
diff_price <- ucla_textbooks_f18 %>% 
  select(subject, course_num, bookstore_new, amazon_new) %>%
  mutate(price_diff = bookstore_new - amazon_new) %>%
  filter(!is.na(bookstore_new) & !is.na(amazon_new)) %>%
  specify(response = price_diff) %>%
  calculate(stat = "mean")

boot_diff <- ucla_textbooks_f18 %>% 
  select(subject, course_num, bookstore_new, amazon_new) %>%
  mutate(price_diff = bookstore_new - amazon_new) %>%
  filter(!is.na(bookstore_new) & !is.na(amazon_new)) %>%
  specify(response = price_diff) %>%
  generate(reps = 1000, type = "bootstrap") %>%
  calculate(stat = "mean") 

ci_perc_diff <- boot_diff %>%
  get_confidence_interval(level = 0.99, type = "percentile")

ci_se_diff <- boot_diff %>%
  get_confidence_interval(level = 0.99, type = "se",
                          point_estimate = diff_price)
boot_diff %>%
  infer::visualize(bins = 20) +
  infer::shade_confidence_interval(ci_perc_diff,
    color = IMSCOL["blue", "full"],
    fill = NULL, size = 1, linetype = "dashed"
  ) +
  infer::shade_confidence_interval(ci_se_diff,
    color = IMSCOL["red", "full"],
    fill = NULL, size = 1, linetype = "dotted"
  ) +
  geom_vline(xintercept = diff_price$stat, size = 1) +
  geom_line(aes(y = replicate, x = stat, color = "a", linetype = "a"), alpha = 0, size = 1) + # bogus code
  geom_line(aes(y = replicate, x = stat, color = "b", linetype = "b"), alpha = 0, size = 1) + # bogus code
  geom_line(aes(y = replicate, x = stat, color = "c", linetype = "c"), alpha = 0) + # bogus code
  ylim(c(0, 200)) +
  guides(color = guide_legend(override.aes = list(alpha = 1))) +
  scale_color_manual(
    name = NULL,
    values = c("a" = IMSCOL["blue", "full"], "b" = IMSCOL["red", "full"], "c" = IMSCOL["black", "full"]),
    labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
    guide = "legend"
  ) +
  scale_linetype_manual(
    name = NULL,
    values = c("a" = "dashed", "b" = "dotted", "c" = "solid"),
    labels = c("Percentile\ninterval\n", "SE\ninterval\n", "Observed\nstatistic"),
    guide = "legend"    
  ) +
  labs(
    x = "Mean of bootstrapped differences of price\n(UCLA - Amazon)",
    title = "1,000 means of bootstrapped differences",
    y = "Count"
  ) +
  theme(
    legend.position = c(0.925, 0.8),
    legend.background = element_rect(color = "white")
  )
```

(ref:pairboot-cap) Bootstrap distribution for the average difference in new book price at the UCLA bookstore versus Amazon. 99% confidence intervals are superimposed using blue dashed (bootstrap percentile interval) and red dotted (bootstrap SE interval) lines.

\clearpage

## Mathematical model for the mean paired difference {#mathpaired}

Thinking about the differences as a single observation on an observational unit changes the paired setting into the one-sample setting.
The mathematical model for the one-sample case is covered in Section \@ref(one-mean-math).

### Observed data

To analyze paired data, it is often useful to look at the difference in outcomes of each pair of observations.
In the textbook data, we look at the differences in prices, which is represented as the `price_difference` variable in the dataset.
Here the differences are taken as

$$\text{UCLA Bookstore price} - \text{Amazon price}$$

It is important that we always subtract using a consistent order; here Amazon prices are always subtracted from UCLA prices.
The first difference shown in Table \@ref(tab:textbooksDF) is computed as $47.97 - 47.45 = 0.52.$ Similarly, the second difference is computed as $14.26 - 13.55 = 0.71,$ and the third is $13.50 - 12.53 = 0.97.$ A histogram of the differences is shown in Figure \@ref(fig:diffInTextbookPricesF18).

```{r diffInTextbookPricesF18, fig.cap = "Histogram of the difference in price for each book sampled."}
ucla_textbooks_f18 <- ucla_textbooks_f18 %>% 
  select(subject, course_num, bookstore_new, amazon_new) %>%
  filter(!is.na(bookstore_new), !is.na(amazon_new)) %>%
  mutate(price_diff = bookstore_new - amazon_new)

ggplot(ucla_textbooks_f18, aes(x = price_diff)) +
  geom_histogram(binwidth = 10) +
  scale_x_continuous(labels = label_dollar(accuracy = 1), limits = c(-20, 100), breaks = seq(-20, 100, 20)) +
  labs(
    x = "UCLA Bookstore Price - Amazon Price (USD)",
    y = "Count"
  )
```

### Variability of the statistic

To analyze a paired dataset, we simply analyze the differences.
Table \@ref(tab:textbooksSummaryStats) provides the data summaries from the textbook data.
Note that instead of reporting the prices separately for UCLA and Amazon, the summary statistics are given by the mean of the differences, the standard deviation of the differences, and the total number of pairs (i.e., differences).
The parameter of interest is also a single value, $\mu_{diff},$ so we can use the same $t$-distribution techniques we applied in Section \@ref(one-mean-math) directly onto the observed differences.

```{r textbooksSummaryStats}
ucla_textbooks_f18 %>%
  summarise(
    n = n(),
    mean = mean(price_diff),
    sd = sd(price_diff)
  ) %>%
  kbl(linesep = "", booktabs = TRUE, caption = "Summary statistics for the 68 price differences.",
      col.names = c("n", "Mean", "SD"),
      align = "ccc", digits = 2) %>%
  kable_styling(bootstrap_options = c("striped", "condensed"), 
                latex_options = c("striped", "hold_position"), full_width = FALSE) %>%
  column_spec(1, width = "8em") %>%
  column_spec(2, width = "8em") %>%
  column_spec(3, width = "8em")
```

::: {.workedexample data-latex=""}
Set up a hypothesis test to determine whether, on average, there is a difference between Amazon's price for a book and the UCLA bookstore's price.
Also, check the conditions for whether we can move forward with the test using the $t$-distribution.

------------------------------------------------------------------------

We are considering two scenarios: there is no difference or there is some difference in average prices.

-   $H_0:$ $\mu_{diff} = 0.$ There is no difference in the average textbook price.

-   $H_A:$ $\mu_{diff} \neq 0.$ There is a difference in average prices.

Next, we check the independence and normality conditions.
The observations are based on a simple random sample, so assuming the textbooks are independent seems reasonable.
While there are some outliers, $n = 68$ and none of the outliers are particularly extreme, so the normality of $\bar{x}$ is satisfied.
With these conditions satisfied, we can move forward with the $t$-distribution.
:::

### Observed statistic vs. null statistics

As mentioned previously, the methods applied to a difference will be identical to the one-sample techniques.
Therefore, the full hypothesis test framework is presented as guided practices.

::: {.important data-latex=""}
**The test statistic for assessing a paired mean is a T.**

The T score is a ratio of how the sample mean difference varies from zero as compared to how the observations vary.

$$T = \frac{\bar{x}_{diff} - 0 }{s_{diff}/\sqrt{n_{diff}}}$$

When the null hypothesis is true and the conditions are met, T has a t-distribution with $df = n_{diff} - 1.$

Conditions:

-   Independently sampled pairs.
-   Large samples and no extreme outliers.
:::

```{r include=FALSE}
terms_chp_21 <- c(terms_chp_21, "T score paired difference")
```

::: {.workedexample data-latex=""}
Complete the hypothesis test started in the previous Example.

------------------------------------------------------------------------

To compute the test compute the standard error associated with $\bar{x}_{diff}$ using the standard deviation of the differences $(s_{diff} = 13.42)$ and the number of differences $(n_{diff} = 68):$

$$SE_{\bar{x}_{diff}} = \frac{s_{diff}}{\sqrt{n_{diff}}} = \frac{13.42}{\sqrt{68}} = 1.63$$

The test statistic is the T score of $\bar{x}_{diff}$ under the null condition that the actual mean difference is 0:

$$T = \frac{\bar{x}_{diff} - 0}{SE_{\bar{x}_{diff}}} = \frac{3.58 - 0}{1.63} = 2.20$$

To visualize the p-value, the sampling distribution of $\bar{x}_{diff}$ is drawn as though $H_0$ is true, and the p-value is represented by the two shaded tails in the figure below.
The degrees of freedom is $df = 68 - 1 = 67.$ Using statistical software, we find the one-tail area of 0.0156.

```{r textbooksF18HTTails, fig.asp = 0.5, out.width = "60%"}
m  <- mean(ucla_textbooks_f18$price_diff)
s  <- sd(ucla_textbooks_f18$price_diff)
se <- s / sqrt(length(ucla_textbooks_f18$price_diff))
z  <- m / se

normTail(
  L = -abs(m),
  U = abs(m),
  s = se,
  df = 20,
  col = IMSCOL["blue", "full"],
  axes = FALSE
)
at <- c(-100, 0, m, 100)
labels <- expression(0, mu[0] * " = 0", bar(x)[diff] * " = 2.98", 0)
axis(1, at, labels, cex.axis = 0.9)
```

Doubling this area gives the p-value: 0.0312.
Because the p-value is less than 0.05, we reject the null hypothesis.
Amazon prices are, on average, lower than the UCLA Bookstore prices for UCLA courses.
:::

Recall that the margin of error is defined by the standard error.
The margin of error for $\bar{x}_{diff}$ can be directly obtained from $SE(\bar{x}_{diff}).$

::: {.important data-latex=""}
**Margin of error for** $\bar{x}_{diff}.$

The margin of error is $t^\star_{df} \times s_{diff}/\sqrt{n_{diff}}$ where $t^\star_{df}$ is calculated from a specified percentile on the t-distribution with *df* degrees of freedom.
:::

::: {.workedexample data-latex=""}
Create a 95% confidence interval for the average price difference between books at the UCLA bookstore and books on Amazon.

------------------------------------------------------------------------

Conditions have already verified and the standard error computed in a previous Example.\
To find the confidence interval, identify $t^{\star}_{67}$ using statistical software or the $t$-table $(t^{\star}_{67} = 2.00),$ and plug it, the point estimate, and the standard error into the confidence interval formula:

$$
\begin{aligned}
\text{point estimate} \ &\pm \ z^{\star} \ \times \ SE \\
3.58 \ &\pm \ 2.00 \ \times \ 1.63 \\
(0.32 \ &, \ 6.84)
\end{aligned}
$$

We are 95% confident that the UCLA Bookstore is, on average, between \$0.32 and \$6.84 more expensive than Amazon for UCLA course books.
:::

::: {.guidedpractice data-latex=""}
We have convincing evidence that Amazon is, on average, less expensive.
How should this conclusion affect UCLA student buying habits?
Should UCLA students always buy their books on Amazon?[^inference-paired-means-2]
:::

[^inference-paired-means-2]: The average price difference is only mildly useful for this question.
    Examine the distribution shown in Figure \@ref(fig:diffInTextbookPricesF18).
    There are certainly a handful of cases where Amazon prices are far below the UCLA Bookstore's, which suggests it is worth checking Amazon (and probably other online sites) before purchasing.
    However, in many cases the Amazon price is above what the UCLA Bookstore charges, and most of the time the price isn't that different.
    Ultimately, if getting a book immediately from the bookstore is notably more convenient, e.g., to get started on reading or homework, it's likely a good idea to go with the UCLA Bookstore unless the price difference on a specific book happens to be quite large.
    For reference, this is a very different result from what we (the authors) had seen in a similar dataset from 2010.
    At that time, Amazon prices were almost uniformly lower than those of the UCLA Bookstore's and by a large margin, making the case to use Amazon over the UCLA Bookstore quite compelling at that time.
    Now we frequently check multiple websites to find the best price.

\index{paired}

```{r include=FALSE}
terms_chp_21 <- c(terms_chp_21, "paired difference t-test", "paired difference CI")
```

\clearpage

## Chapter review {#chp21-review}

### Summary

Like the two independent sample procedures in Chapter \@ref(inference-two-means), the paired difference analysis can be done using a t-distribution.
The randomization test applied to the paired differences is slightly different, however.
Note that when randomizing under the paired setting, each null statistic is created by randomly assigning the group to a numerical outcome **within** the individual observational unit.
The procedure for creating a confidence interval for the paired difference is almost identical to the confidence intervals created in Chapter \@ref(inference-one-mean) for a single mean.

### Terms

We introduced the following terms in the chapter.
If you're not sure what some of these terms mean, we recommend you go back in the text and review their definitions.
We are purposefully presenting them in alphabetical order, instead of in order of appearance, so they will be a little more challenging to locate.
However you should be able to easily spot them as **bolded text**.

```{r}
make_terms_table(terms_chp_21)
```

\clearpage

## Exercises {#chp21-exercises}

Answers to odd numbered exercises can be found in Appendix \@ref(exercise-solutions-21).

::: {.exercises data-latex=""}
```{r exercises-21, child = "exercises/21-ex-inference-paired-means.Rmd"}
```
:::