Skip to content

Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

License

Notifications You must be signed in to change notification settings

hoxo-m/deltatest

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

53 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

deltatest: Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

R-CMD-check

1. Overview

In online A/B testing, we often face a significant practical challenge: the randomization unit differs from the analysis unit. Typically, control and treatment groups are randomly assigned at the user level, while metrics—such as click-through rate—are measured at a more granular level (e.g., per page-view). In this case, the randomization unit is user, but the analysis unit is page-view.

This discrepancy raises concerns for statistical hypothesis testing, which assumes that data points are independent and identically distributed (i.i.d.). Specifically, a single user can generate multiple page-views, and each user may have a different probability of clicking. As a result, the data may exhibit within-user correlation, thereby violating the i.i.d. assumption.

When the standard Z-test is applied to such correlated data, the resulting p-values do not follow the expected uniform distribution under the null hypothesis. As a result, smaller p-values tend to occur more frequently despite there being no true difference, increasing the risk of falsely detecting a significant difference.

To address this problem, Deng et al. (2018) proposed a modified statistical hypothesis testing method. This approach replaces the standard variance estimation formula in the Z-test with an approximate formula derived via the Delta method, which can account for within-user correlation. To simplify the application of this method, the deltatest package has been developed.

First, we prepare a data frame that includes columns of the number of clicks and page-views aggregated for each user. This data frame also contains a column indicating whether each user was assigned to the control or treatment group.

library(dplyr)

n_user <- 2000

set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |> 
  mutate(group = if_else(group == 0, "control", "treatment")) |>
  group_by(user_id, group) |> 
  summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")

data
#> # A tibble: 2,000 × 4
#>    user_id group     clicks pageviews
#>      <int> <chr>      <int>     <int>
#>  1       1 treatment      1         6
#>  2       2 treatment      2        11
#>  3       3 control        0        17
#>  4       4 control        4        12
#>  5       5 control        5        10
#>  6       6 control        1        15
#>  7       7 control        2         6
#>  8       8 treatment      2        11
#>  9       9 treatment      2        16
#> 10      10 control        0        17
#> # ℹ 1,990 more rows

To perform a statistical hypothesis test using the Delta method on this data, as follows:

library(deltatest)

deltatest(data, clicks / pageviews, by = group)
#> 
#>  Two Sample Z-test Using the Delta Method
#> 
#> data:  clicks/pageviews by group
#> Z = 0.31437, p-value = 0.7532
#> alternative hypothesis: true difference in means between control and treatment is not equal to 0
#> 95 percent confidence interval:
#>  -0.01410593  0.01949536
#> sample estimates:
#>   mean in control mean in treatment        difference 
#>       0.245959325       0.248654038       0.002694713

This version of the Z-test yields p-values that follow the expected uniform distribution under the null hypothesis, even when within-user correlation is present.

2. Installation

You can install the development version of deltatest from GitHub with:

# install.packages("remotes")
remotes::install_github("hoxo-m/deltatest")

3. Details

The deltatest package provides the deltatest function to perform the statistical hypothesis testing using the Delta method, as proposed by Deng et al. (2018).

3.1 data argument

To run deltatest, you need to prepare a data frame. It must include columns for the numerator and denominator of your metric, aggregated by each randomization unit (typically, the randomization unit is a user). For example:

  • If the metric is CTR, the numerator is the number of clicks, and the denominator is the number of page views.
  • If the metric is CVR, the numerator is the number of converted sessions, and the denominator is the number of sessions.

The generate_dummy_data function provides

library(dplyr)

n_user <- 2000

set.seed(314)
data <- deltatest::generate_dummy_data(n_user) |>
  group_by(user_id, group) |>
  summarise(clicks = sum(metric), pageviews = n(), .groups = "drop")

data
#> # A tibble: 2,000 × 4
#>    user_id group clicks pageviews
#>      <int> <int>  <int>     <int>
#>  1       1     1      1         6
#>  2       2     1      2        11
#>  3       3     0      0        17
#>  4       4     0      4        12
#>  5       5     0      5        10
#>  6       6     0      1        15
#>  7       7     0      2         6
#>  8       8     1      2        11
#>  9       9     1      2        16
#> 10      10     0      0        17
#> # ℹ 1,990 more rows

3.2 formula and by argument

  1. Standard formula
deltatest(data, clicks / pageviews ~ group)
  1. Lambda formula
deltatest(data, ~ clicks / pageviews, by = group)
  1. Expression (NSE; non-standard evaluation)
deltatest(data, clicks / pageviews, by = group)
  • With calculation (for all styles)
deltatest(data, pos / (pos + neg), by = group)

3.3 Other arguments

3.4 Return value

result <- deltatest(data, clicks / pageviews, by = group)
result
#> 
#>  Two Sample Z-test Using the Delta Method
#> 
#> data:  clicks/pageviews by group
#> Z = 0.31437, p-value = 0.7532
#> alternative hypothesis: true difference in means between control and treatment is not equal to 0
#> 95 percent confidence interval:
#>  -0.01410593  0.01949536
#> sample estimates:
#>   mean in control mean in treatment        difference 
#>       0.245959325       0.248654038       0.002694713
library(broom)

tidy(result)
#> # A tibble: 1 × 9
#>   estimate1 estimate2 estimate3 statistic p.value conf.low conf.high method     
#>       <dbl>     <dbl>     <dbl>     <dbl>   <dbl>    <dbl>     <dbl> <chr>      
#> 1     0.246     0.249   0.00269     0.314   0.753  -0.0141    0.0195 Two Sample…
#> # ℹ 1 more variable: alternative <chr>
result$info
#>   group size    x     n      mean        var          se     lower     upper
#> 1     0 1044 4124 16767 0.2459593 0.03625455 0.005892927 0.2344094 0.2575092
#> 2     1  956 3741 15045 0.2486540 0.03704608 0.006225041 0.2364532 0.2608549

ggplot(result$info, aes(group, mean)) +
  geom_pointrange(aes(ymin = lower, ymax = upper))

4. Related Work

5. References

  • Deng, A., Knoblich, U., & Lu, J. (2018). Applying the Delta Method in Metric Analytics: A Practical Guide with Novel Ideas. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. doi:10.1145/3219819.3219919
  • Deng, A., Lu, J., & Litz, J. (2017). Trustworthy Analysis of Online A/B Tests: Pitfalls, challenges and solutions. Proceedings of the Tenth ACM International Conference on Web Search and Data Mining. doi:10.1145/3018661.3018677
  • id:sz_dr (2018). Calculating the mean and variance of the ratio of random variables using the Delta method [in Japanese]. If you are human, think more now. https://www.szdrblog.info/entry/2018/11/18/154952

About

Statistical Hypothesis Testing Using the Delta Method for Online A/B Testing

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Languages