Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions DESCRIPTION
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ Suggests:
ggplot2,
knitr,
lobstr,
outbreaks,
pkgdown,
posterior,
rmarkdown,
Expand Down
2 changes: 1 addition & 1 deletion NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -117,7 +117,7 @@ fit_result
# post-warmup draws per chain=250, total post-warmup draws=500.
# ...
```
- Added a `getting-started` vignette demonstrating the default model, from synthetic data generation through model fitting and result extraction as well as a `model-explainer` vignette to do a deep dive into the model. [#78](https://github.com/ACCIDDA/SeverityEstimate/issues/78), [#107](https://github.com/ACCIDDA/SeverityEstimate/issues/107).
- Added a `getting-started` vignette demonstrating the default model, from synthetic data generation through model fitting and result extraction as well as a `model-explainer` vignette to do a deep dive into the model. Also added a `mers-korea-2015` vignette to highlight how to use this package on a "real world" data set. [#78](https://github.com/ACCIDDA/SeverityEstimate/issues/78), [#107](https://github.com/ACCIDDA/SeverityEstimate/issues/107), [#115](https://github.com/ACCIDDA/SeverityEstimate/issues/115).
- Setup `pkgdown` site hosted on GitHub pages at [accidda.github.io/SeverityEstimate/](https://accidda.github.io/SeverityEstimate/). [#28](https://github.com/ACCIDDA/SeverityEstimate/issues/28).
- Enhanced documentation elements, such as the `README.md`, `CONTRIBUTING.md` to add more helpful content to the package's documentation site. [#3](https://github.com/ACCIDDA/SeverityEstimate/issues/3), [#53](https://github.com/ACCIDDA/SeverityEstimate/issues/53).
- Restructured package utilizing [`rstantools`](https://mc-stan.org/rstantools/index.html). [#73](https://github.com/ACCIDDA/SeverityEstimate/issues/73).
Expand Down
195 changes: 195 additions & 0 deletions vignettes/mers-korea-2015.Rmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,195 @@
---
title: "MERS Korea 2015"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{MERS Korea 2015}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>"
)
```

## Overview

This vignette shows how a public outbreak line list can be reshaped for
`SeverityEstimate`. The example uses the MERS-CoV outbreak in South Korea in
2015 from the `outbreaks` package:
<https://www.reconverse.org/outbreaks/reference/mers_korea_2015.html>.

The data is not a perfect match for this package's assumptions. In particular,
the line list does not contain a explicit active/passive surveillance labels and
the denominator for the outbreak was not the full population of South Korea.
Here we use the case-contact data as a transparent surveillance proxy and use
national age-specific population denominators as a simple first pass.

```{r setup}
options(mc.cores = 1L)
library(SeverityEstimate)
```

## Line List

The `mers_korea_2015` object contains a case line list and a table of probable
case-to-case transmission links.

```{r load data}
mers <- outbreaks::mers_korea_2015
linelist_raw <- mers$linelist
contacts <- mers$contacts

str(linelist_raw)
head(contacts)
```

We use reporting week as the model time step because it is complete for all
cases.

The `outbreaks` line list reports final outcome as `Alive` or `Dead`. For this
model we collapse those to the package's outcome labels:

- `Alive` becomes `Symptomatic`
- `Dead` becomes `Death`

This means the model is estimating fatality among observed symptomatic MERS
infections rather than an asymptomatic infection rate.

```{r format line list}
linelist <- data.frame(
time = linelist_raw$week_report,
age_class = as.character(linelist_raw$age_class),
detection = ifelse(
linelist_raw$id %in% contacts$to,
"Active",
"Passive"
),
outcome = ifelse(linelist_raw$outcome == "Dead", "Death", "Symptomatic")
)

head(linelist)
table(linelist$detection, linelist$outcome)
```

The detection variable is a proxy. Cases appearing as `contacts$to` are treated
as `Active` because they are linked to a probable source case in the outbreak
investigation data. Cases without a recorded source are treated as `Passive`
because they are closer to unlinked or index presentations. This is not the same
as a validated surveillance field, so fitted estimates should be interpreted as
an exploratory analysis.

## Population

The MERS data already provide age classes, which makes age stratification
straightforward. As a simple denominator, we use Republic of Korea 2015
population estimates from page 58 of the United Nations Demographic Yearbook
2015, Table 7: <https://unstats.un.org/unsd/demographic-social/products/dyb/documents/dyb2015/table07.pdf>.

The UN table reports standard five-year age groups. The population table below
aggregates those into the age classes used by the line list. The youngest line
list stratum is labelled `10-20`; here it is approximated by `10-14 + 15-19`.

```{r population}
population <- data.frame(
age_class = c(
"10-20", "20-29", "30-39", "40-49",
"50-59", "60-69", "70-79", "80-89"
),
population = c(
2475819L + 3175320L, # 10-14 + 15-19
3525734L + 3279000L, # 20-24 + 25-29
3807021L + 3846466L, # 30-34 + 35-39
4239971L + 4225823L, # 40-44 + 45-49
4255812L + 3857441L, # 50-54 + 55-59
2740743L + 2121186L, # 60-64 + 65-69
1721079L + 1371183L, # 70-74 + 75-79
859318L + 387911L # 80-84 + 85-89
)
)

population
```

These denominators are intentionally broad. A more specific analysis would use
the population at risk in exposed hospitals, health care workers, visitors,
patients, or monitored contacts rather than the national resident population.

## Model

We build the model explicitly so that age is treated as an ordered, smoothed
strata. With eight age classes, `degrees_of_freedom = 2L` uses a two-column
polynomial basis for the age effect. Otherwise pretty broad priors are used for
the detection probabilities.

```{r build model}
age_levels <- population$age_class

model <- SeverityEstimateModel(linelist, population) |>
set_active_prior(alpha = 1.0, beta = 1.0) |>
set_passive_asymptomatic_prior(alpha = 1.0, beta = 3.0) |>
set_passive_symptomatic_prior(alpha = 3.0, beta = 1.0) |>
set_timesteps("time") |>
set_strata(
"age_class",
levels = age_levels,
degrees_of_freedom = 2L
) |>
set_detection(
"detection",
map = c("Active" = "active", "Passive" = "passive")
) |>
set_outcome(
"outcome",
map = c("Symptomatic" = "symptomatic", "Death" = "severe")
)
```

The sampler settings below are intentionally modest so that the vignette remains
reasonable to render. They are enough to exercise the workflow, but they should
be increased and checked with the usual Stan diagnostics before interpreting the
results.

```{r fit model}
estimate <- fit(
model,
chains = 2L,
iter = 1000L,
seed = 123L,
cores = 2L,
refresh = 0L
)

summary(estimate)
```

After fitting, the usual extractors can be used to inspect surveillance
parameters and age-stratified severity estimates:

```{r summarize fit}
calculate_parameter_estimates(estimate, alpha = 0.05)

calculate_fatality_ratio(
estimate,
median_estimate = FALSE,
naive_estimate = TRUE
)
```

## Interpretation

This example is useful as a real-world formatting workflow, but the assumptions
are stronger than in an ideal `SeverityEstimate` analysis:

- `Active` means contact-linked in the public outbreak data, not a confirmed
active surveillance process.
- `Passive` means unlinked or index-like in the public outbreak data, not
necessarily passively detected in the field.
- `Alive` cases are treated as symptomatic non-fatal cases, so the model is not
learning asymptomatic infection risk from this data set.
- The national age-specific population is used as a pragmatic denominator even
though the outbreak risk was concentrated in health care settings.

Those limitations should be kept with any estimates produced from this example.