Replication files for:
Boyer, C., Li, K.Q., Shi, X., & Tchetgen Tchetgen, T. J. (2025). “Identification and estimation of vaccine effectiveness in the test-negative design under equi-confounding”. arXiv. https://doi.org/10.48550/arXiv.2504.20360
The test-negative design (TND) is widely used to evaluate vaccine effectiveness in real-world settings. In a TND study, individuals with similar symptoms who seek care are tested, and effectiveness is estimated by comparing vaccination histories of test-positive cases and test-negative controls. The TND is often justified on the grounds that it reduces confounding due to unmeasured health-seeking behavior, although this has not been formally described using potential outcomes. At the same time, concerns persist that conditioning on test receipt can introduce selection bias. We provide a formal justification of the TND under an assumption of odds ratio equi-confounding, where unmeasured confounders affect test-positive and test-negative individuals equivalently on the odds ratio scale. Health-seeking behavior is one plausible example. We also show that these results hold under the outcome-dependent sampling used in TNDs. We discuss the design implications of the equi-confounding assumption and provide alternative estimators for the marginal risk ratio among the vaccinated under equi-confounding, including outcome modeling and inverse probability weighting estimators as well as a semiparametric estimator that is doubly robust. When equi-confounding does not hold, we suggest a straightforward sensitivity analysis that parameterizes the magnitude of the deviation on the odds ratio scale. A simulation study evaluates the empirical performance of our proposed estimators under a wide range of scenarios. Finally, we discuss broader uses of test-negative outcomes to de-bias cohort studies in which testing is triggered by symptoms.
This code requires R version 4.0 or higher. The following R packages are required:
Core simulation and analysis:
- data.table - for data manipulation and simulation framework
- progressr - for progress tracking during simulations
- readr - for reading/writing simulation results
- lmtest - for linear model testing (coefci function)
- sandwich - for robust standard errors (vcovHC function)
- survival - for survival analysis functions
- numDeriv - for numerical derivatives in estimating equations
Table and figure generation:
- ggplot2 - for plotting simulation results
- patchwork - for combining plots
- tidyverse - collection of tidy data packages (includes dplyr, tidyr, stringr)
- dplyr - for data manipulation in table generation
- tidyr - for data tidying
- stringr - for string manipulation
- kableExtra - for LaTeX table formatting
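Before installing anything, it can help to see which of the packages listed above are already available. The snippet below is a minimal sketch using base R only; the variable names (`required`, `r_ok`, `missing_pkgs`) are illustrative and are not part of the repository code.

```r
# Packages named in this README
required <- c("data.table", "progressr", "readr", "lmtest", "sandwich",
              "survival", "numDeriv", "ggplot2", "patchwork", "tidyverse",
              "dplyr", "tidyr", "stringr", "kableExtra")

# TRUE if the running R session meets the stated minimum version (4.0)
r_ok <- getRversion() >= "4.0.0"

# Names of required packages that are not yet installed
missing_pkgs <- setdiff(required, rownames(installed.packages()))

if (!r_ok) {
  warning("R >= 4.0 is required; found ", as.character(getRversion()))
}
if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install only what is missing
} else {
  message("All required packages are installed.")
}
```

Installing only the missing packages avoids re-downloading ones that are already present.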
- Clone this repository:
  git clone https://github.com/boyercb/parallel-tnd.git
  cd parallel-tnd
- Install required R packages:
  install.packages(c("data.table", "ggplot2", "progressr", "survival",
                     "lmtest", "sandwich", "readr", "numDeriv",
                     "dplyr", "tidyr", "stringr", "kableExtra",
                     "patchwork", "tidyverse"))
- Create the data directory (if it doesn't exist):
  dir.create("data", showWarnings = FALSE)

- code/ - Contains all R scripts for the simulation study
  - run.R - Main script that runs all simulation scenarios
  - datagen.R - Data generation functions
  - estimators.R - Implementation of all estimators
  - sim.R - Simulation wrapper functions
  - table.R - Code to generate results tables
  - plot.R - Code to generate figures
- data/ - Data files (in this project only saved simulation results)
- manuscript/ - LaTeX source files for the manuscript
- results/ - Output files (tables and figures)
To reproduce all simulation results from the paper, run the following command in R from the project root directory:
source("code/run.R")
Note: The simulation study runs:
- Scenarios 1-7: 1,000 replications each with sample size N=15,000
- Scenario 8: 2,000 replications each with sample size N=15,000 across 4 sub-scenarios
This will take several hours to complete (estimated 6-8 hours depending on your system).
After running the simulations, you can generate the tables and figures using:
# Load simulation results
sims <- readr::read_rds("data/sims.rds")
# Generate tables (creates LaTeX files in results/ directory)
source("code/table.R")
# Generate figures (creates PDF files in results/ directory)
source("code/plot.R")
You can also run individual parts of the analysis:
# Load required functions
source("code/datagen.R")
source("code/estimators.R")
source("code/sim.R")
# Run a single scenario (much faster for testing)
# See run.R for all scenario definitions
The simulation study evaluates 8 different scenarios:
1. No unmeasured confounding - Baseline scenario where TND assumptions hold
2. Equi-confounding - Main scenario where proposed methods should work
3. Direct effect of vaccination on test-negative infection - Vaccination affects test-negative outcomes (exclusion restriction violated)
4. Equi-confounding violated - Confounding differs between test-positive and test-negative
5. Equi-selection violated - Selection bias differs between test-positive and test-negative
6. Equal effect of vaccination on testing - Equal effects of vaccination on testing behavior
7. Unequal effect of vaccination on testing - Unequal effects of vaccination on testing behavior
8. Effect heterogeneity - Scenarios with treatment effect modification
   - 8a: Both models correctly specified
   - 8b: Propensity score model misspecified
   - 8c: Outcome model misspecified
   - 8d: Both models misspecified
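To make the equi-confounding scenario more concrete, here is a toy sketch in base R. It is not the data-generating process in datagen.R: every parameter value, variable name, and the simplification that all symptomatic individuals are tested are assumptions made for illustration. An unmeasured confounder `u` raises the risk of both the test-positive and the test-negative illness by the same multiplicative factor, so its influence cancels when cases are compared with test-negative controls.

```r
set.seed(2025)
n <- 200000

# Unmeasured confounder (e.g., health-seeking behavior); hypothetical values
u <- rbinom(n, 1, 0.5)
# Vaccination uptake depends on U
v <- rbinom(n, 1, plogis(-0.5 + u))

true_rr <- 0.3  # assumed causal risk ratio for the test-positive illness
beta_u  <- 0.7  # same log-scale effect of U on both illnesses (equi-confounding)

# Mutually exclusive symptomatic outcomes:
# 2 = test-positive illness, 1 = test-negative illness, 0 = neither
p_pos <- 0.02 * true_rr^v * exp(beta_u * u)
p_neg <- 0.05 * exp(beta_u * u)  # vaccine has no effect on this outcome
draw  <- runif(n)
y <- ifelse(draw < p_pos, 2L, ifelse(draw < p_pos + p_neg, 1L, 0L))

# Naive cohort risk ratio: ignores U, so it is confounded
naive_rr <- mean(y[v == 1] == 2) / mean(y[v == 0] == 2)

# TND sample: only symptomatic (tested) individuals, cases vs. controls
tnd <- data.frame(case = as.integer(y == 2), v = v)[y > 0, ]
fit <- glm(case ~ v, family = binomial(), data = tnd)
tnd_or <- unname(exp(coef(fit)["v"]))

print(c(naive_rr = naive_rr, tnd_or = tnd_or, true_rr = true_rr))
```

In this toy setup the ratio of the two outcome risks does not depend on `u`, so the odds ratio from the logistic regression should land close to `true_rr`, while the naive cohort ratio is pulled toward the null by the confounder.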
Each scenario compares multiple estimators:
- TND estimators:
  - Logistic regression (logit_reg) - traditional TND approach
  - Risk ratio among vaccinated - outcome modeling (rrv_om)
  - Risk ratio among vaccinated - inverse probability weighting (rrv_ipw)
  - Risk ratio among vaccinated - doubly-robust (rrv_dr)
- Cohort estimators:
  - Cohort with unmeasured confounders (cohort_reg_U) - oracle estimator
  - Cohort without unmeasured confounders (cohort_reg_noU) - naive estimator
  - Difference-in-differences (did_reg) - equivalent to TND under equi-confounding
The simulation generates:
- Performance metrics: Bias, coverage probability, and confidence interval length for each estimator
- LaTeX tables: Saved to results/ directory:
  - sims.tex - Main simulation results
  - sims_dr*.tex - Additional robustness results
- Figures: PDF files showing estimator performance across scenarios (sims1.pdf, sims2.pdf)
- Raw data: Complete simulation results saved as data/sims.rds (R data format)
- Sample sizes: Actual TND sample sizes achieved for each simulation
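The performance metrics listed above can be computed from per-replication estimates and confidence limits. The helper below is a minimal sketch of those standard definitions; `summarize_sims` and its argument names are hypothetical and do not reflect the actual structure of data/sims.rds or the code in table.R.

```r
# Summarize one estimator across simulation replications.
# est, lower, upper: numeric vectors (one entry per replication)
# truth: the true value of the estimand, on the same scale as est
summarize_sims <- function(est, lower, upper, truth) {
  stopifnot(length(est) == length(lower), length(est) == length(upper))
  data.frame(
    bias      = mean(est) - truth,                       # average error
    coverage  = mean(lower <= truth & truth <= upper),   # share of CIs containing truth
    ci_length = mean(upper - lower)                      # average interval width
  )
}

# Usage with three made-up replications (illustrative numbers only)
res <- summarize_sims(
  est   = c(0.9, 1.1, 1.0),
  lower = c(0.8, 1.05, 0.9),
  upper = c(1.0, 1.2, 1.1),
  truth = 1
)
print(res)
```

For ratio-scale estimands such as risk ratios, bias is often reported on the log scale; pass log-transformed estimates, limits, and truth if that is the intended convention.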
If you use this code, please cite:
Boyer, C., Li, K.Q., Shi, X., & Tchetgen Tchetgen, T. J. (2025). "Identification and estimation of vaccine effectiveness in the test-negative design under equi-confounding". arXiv. https://doi.org/10.48550/arXiv.2504.20360