From 3249bb03d0b896ebbad6768ef6f861b6e1778d10 Mon Sep 17 00:00:00 2001 From: Timothy Willard <9395586+TimothyWillard@users.noreply.github.com> Date: Fri, 29 May 2026 14:53:25 -0400 Subject: [PATCH] Add MERS Korea 2015 vignette Added `vignette("mers-korea-2015")` to demonstrate how to use this package on a "real world" data set. The model does not translate exactly to this data set, but is close enough for illustrating how someone could use this package with their line list. Closes #115. --- DESCRIPTION | 1 + NEWS.md | 2 +- vignettes/mers-korea-2015.Rmd | 195 ++++++++++++++++++++++++++++++++++ 3 files changed, 197 insertions(+), 1 deletion(-) create mode 100644 vignettes/mers-korea-2015.Rmd diff --git a/DESCRIPTION b/DESCRIPTION index 2313727..dcd095a 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -38,6 +38,7 @@ Suggests: ggplot2, knitr, lobstr, + outbreaks, pkgdown, posterior, rmarkdown, diff --git a/NEWS.md b/NEWS.md index 6a1ae37..b00fb73 100644 --- a/NEWS.md +++ b/NEWS.md @@ -117,7 +117,7 @@ fit_result # post-warmup draws per chain=250, total post-warmup draws=500. # ... ``` -- Added a `getting-started` vignette demonstrating the default model, from synthetic data generation through model fitting and result extraction as well as a `model-explainer` vignette to do a deep dive into the model. [#78](https://github.com/ACCIDDA/SeverityEstimate/issues/78), [#107](https://github.com/ACCIDDA/SeverityEstimate/issues/107). +- Added a `getting-started` vignette demonstrating the default model, from synthetic data generation through model fitting and result extraction as well as a `model-explainer` vignette to do a deep dive into the model. Also added a `mers-korea-2015` vignette to highlight how to use this package on a "real world" data set. [#78](https://github.com/ACCIDDA/SeverityEstimate/issues/78), [#107](https://github.com/ACCIDDA/SeverityEstimate/issues/107), [#115](https://github.com/ACCIDDA/SeverityEstimate/issues/115). - Setup `pkgdown` site hosted on GitHub pages at [accidda.github.io/SeverityEstimate/](https://accidda.github.io/SeverityEstimate/). [#28](https://github.com/ACCIDDA/SeverityEstimate/issues/28). - Enhanced documentation elements, such as the `README.md`, `CONTRIBUTING.md` to add more helpful content to the package's documentation site. [#3](https://github.com/ACCIDDA/SeverityEstimate/issues/3), [#53](https://github.com/ACCIDDA/SeverityEstimate/issues/53). - Restructured package utilizing [`rstantools`](https://mc-stan.org/rstantools/index.html). [#73](https://github.com/ACCIDDA/SeverityEstimate/issues/73). diff --git a/vignettes/mers-korea-2015.Rmd b/vignettes/mers-korea-2015.Rmd new file mode 100644 index 0000000..1cbb504 --- /dev/null +++ b/vignettes/mers-korea-2015.Rmd @@ -0,0 +1,195 @@ +--- +title: "MERS Korea 2015" +output: rmarkdown::html_vignette +vignette: > + %\VignetteIndexEntry{MERS Korea 2015} + %\VignetteEngine{knitr::rmarkdown} + %\VignetteEncoding{UTF-8} +--- + +```{r, include = FALSE} +knitr::opts_chunk$set( + collapse = TRUE, + comment = "#>" +) +``` + +## Overview + +This vignette shows how a public outbreak line list can be reshaped for +`SeverityEstimate`. The example uses the MERS-CoV outbreak in South Korea in +2015 from the `outbreaks` package: +. + +The data is not a perfect match for this package's assumptions. In particular, +the line list does not contain a explicit active/passive surveillance labels and +the denominator for the outbreak was not the full population of South Korea. +Here we use the case-contact data as a transparent surveillance proxy and use +national age-specific population denominators as a simple first pass. + +```{r setup} +options(mc.cores = 1L) +library(SeverityEstimate) +``` + +## Line List + +The `mers_korea_2015` object contains a case line list and a table of probable +case-to-case transmission links. + +```{r load data} +mers <- outbreaks::mers_korea_2015 +linelist_raw <- mers$linelist +contacts <- mers$contacts + +str(linelist_raw) +head(contacts) +``` + +We use reporting week as the model time step because it is complete for all +cases. + +The `outbreaks` line list reports final outcome as `Alive` or `Dead`. For this +model we collapse those to the package's outcome labels: + +- `Alive` becomes `Symptomatic` +- `Dead` becomes `Death` + +This means the model is estimating fatality among observed symptomatic MERS +infections rather than an asymptomatic infection rate. + +```{r format line list} +linelist <- data.frame( + time = linelist_raw$week_report, + age_class = as.character(linelist_raw$age_class), + detection = ifelse( + linelist_raw$id %in% contacts$to, + "Active", + "Passive" + ), + outcome = ifelse(linelist_raw$outcome == "Dead", "Death", "Symptomatic") +) + +head(linelist) +table(linelist$detection, linelist$outcome) +``` + +The detection variable is a proxy. Cases appearing as `contacts$to` are treated +as `Active` because they are linked to a probable source case in the outbreak +investigation data. Cases without a recorded source are treated as `Passive` +because they are closer to unlinked or index presentations. This is not the same +as a validated surveillance field, so fitted estimates should be interpreted as +an exploratory analysis. + +## Population + +The MERS data already provide age classes, which makes age stratification +straightforward. As a simple denominator, we use Republic of Korea 2015 +population estimates from page 58 of the United Nations Demographic Yearbook +2015, Table 7: . + +The UN table reports standard five-year age groups. The population table below +aggregates those into the age classes used by the line list. The youngest line +list stratum is labelled `10-20`; here it is approximated by `10-14 + 15-19`. + +```{r population} +population <- data.frame( + age_class = c( + "10-20", "20-29", "30-39", "40-49", + "50-59", "60-69", "70-79", "80-89" + ), + population = c( + 2475819L + 3175320L, # 10-14 + 15-19 + 3525734L + 3279000L, # 20-24 + 25-29 + 3807021L + 3846466L, # 30-34 + 35-39 + 4239971L + 4225823L, # 40-44 + 45-49 + 4255812L + 3857441L, # 50-54 + 55-59 + 2740743L + 2121186L, # 60-64 + 65-69 + 1721079L + 1371183L, # 70-74 + 75-79 + 859318L + 387911L # 80-84 + 85-89 + ) +) + +population +``` + +These denominators are intentionally broad. A more specific analysis would use +the population at risk in exposed hospitals, health care workers, visitors, +patients, or monitored contacts rather than the national resident population. + +## Model + +We build the model explicitly so that age is treated as an ordered, smoothed +strata. With eight age classes, `degrees_of_freedom = 2L` uses a two-column +polynomial basis for the age effect. Otherwise pretty broad priors are used for +the detection probabilities. + +```{r build model} +age_levels <- population$age_class + +model <- SeverityEstimateModel(linelist, population) |> + set_active_prior(alpha = 1.0, beta = 1.0) |> + set_passive_asymptomatic_prior(alpha = 1.0, beta = 3.0) |> + set_passive_symptomatic_prior(alpha = 3.0, beta = 1.0) |> + set_timesteps("time") |> + set_strata( + "age_class", + levels = age_levels, + degrees_of_freedom = 2L + ) |> + set_detection( + "detection", + map = c("Active" = "active", "Passive" = "passive") + ) |> + set_outcome( + "outcome", + map = c("Symptomatic" = "symptomatic", "Death" = "severe") + ) +``` + +The sampler settings below are intentionally modest so that the vignette remains +reasonable to render. They are enough to exercise the workflow, but they should +be increased and checked with the usual Stan diagnostics before interpreting the +results. + +```{r fit model} +estimate <- fit( + model, + chains = 2L, + iter = 1000L, + seed = 123L, + cores = 2L, + refresh = 0L +) + +summary(estimate) +``` + +After fitting, the usual extractors can be used to inspect surveillance +parameters and age-stratified severity estimates: + +```{r summarize fit} +calculate_parameter_estimates(estimate, alpha = 0.05) + +calculate_fatality_ratio( + estimate, + median_estimate = FALSE, + naive_estimate = TRUE +) +``` + +## Interpretation + +This example is useful as a real-world formatting workflow, but the assumptions +are stronger than in an ideal `SeverityEstimate` analysis: + +- `Active` means contact-linked in the public outbreak data, not a confirmed + active surveillance process. +- `Passive` means unlinked or index-like in the public outbreak data, not + necessarily passively detected in the field. +- `Alive` cases are treated as symptomatic non-fatal cases, so the model is not + learning asymptomatic infection risk from this data set. +- The national age-specific population is used as a pragmatic denominator even + though the outbreak risk was concentrated in health care settings. + +Those limitations should be kept with any estimates produced from this example.