Add vignette to compare impact functions

r-ash · r-ash · commit d56c8bedaf97 · 2021-10-11T17:32:47.000+01:00
diff --git a/.gitignore b/.gitignore
@@ -3,3 +3,5 @@ docs
 .Renviron
 .Rhistory
 .DS_Store
+vignettes_src/internal-impact.md
+vignettes_src/impact_comparisons.html
diff --git a/Makefile b/Makefile
@@ -36,4 +36,12 @@ pkgdown:
 website: pkgdown
 	./scripts/update_web.sh
 
+vignettes/internal-impact.Rmd: vignettes_src/internal-impact.Rmd
+	./scripts/build_impact_vignette
+
+vignettes: vignettes/using-vimpact.Rmd vignettes/vignette.Rmd
+	${RSCRIPT} -e 'tools::buildVignettes(dir = ".")'
+	mkdir -p inst/doc
+	cp vignettes/*.html vignettes/*.Rmd inst/doc
+
 .PHONY: all test document install vignettes
diff --git a/scripts/build_impact_vignette b/scripts/build_impact_vignette
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+# The internal-impact vignette can't be run on a machine without access
+# to the montagu DB (including CI), so we pre-compile it occasionally
+# here. Edit the version in vignettes_src, then run this script to
+# regenerate the version in vignettes
+(cd vignettes_src && Rscript -e 'knitr::knit("internal-impact.Rmd")')
+header="DO NOT EDIT THIS FILE - see vignettes_src and make changes there"
+widget=' \
+```{r} \
+htmltools::tags$iframe( \
+  src = "impact_comparisons.html", \
+  width = "100%", \
+  height = "400", \
+  scrolling = "no", \
+  seamless = "seamless", \
+  frameBorder = "0", \
+  `data-external` = "1" \
+)\
+```'
+sed 's/\r//g' vignettes_src/internal-impact.md |
+    sed "s/HEADER/$header/" |
+    sed "s/<!-- WIDGET -->/$widget/" |
+    sed 's/[[:space:]]*$//' > vignettes/internal-impact.Rmd
+cp vignettes_src/impact_comparisons.html vignettes/impact_comparisons.html
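The placeholder substitution in `scripts/build_impact_vignette` can be tried on a tiny input to see how the placeholders expand. This is an illustrative sketch only. `postprocess_md` is a hypothetical wrapper around the same kind of sed pipeline, and the header and widget text are shortened stand-ins. It also assumes GNU sed, because it relies on `\r` in the pattern. Stripping trailing whitespace as the final step means the space left before each backslash-newline in the widget text does not reach the output.

```shell
#!/usr/bin/env bash
# Illustrative sketch: a hypothetical wrapper around a placeholder-substitution
# pipeline like the one in build_impact_vignette.
# Assumes GNU sed; the header and widget text are shortened stand-ins.
set -euo pipefail

postprocess_md() {
  local header="DO NOT EDIT THIS FILE"
  # A backslash before a newline in a sed replacement inserts a literal
  # newline, so the one-line placeholder expands into several lines
  local widget=' \
line one \
line two'
  # 1. drop carriage returns, 2. fill in the header comment,
  # 3. expand the widget placeholder, 4. strip trailing whitespace last
  sed 's/\r//g' |
    sed "s/HEADER/$header/" |
    sed "s/<!-- WIDGET -->/$widget/" |
    sed 's/[[:space:]]*$//'
}

# Usage: a CRLF-terminated header placeholder, a blank line, the widget
printf '<!-- HEADER -->\r\n\n<!-- WIDGET -->\n' | postprocess_md
```

This prints `<!-- DO NOT EDIT THIS FILE -->`, followed by two empty lines and then `line one` and `line two`. The second empty line is the lone space that precedes the first backslash-newline in `widget`, after it has been stripped.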
diff --git a/vignettes/internal-impact.Rmd b/vignettes/internal-impact.Rmd
@@ -0,0 +1,274 @@
+---
+title: "Using vimpact for estimating vaccine impact - internal"
+author: "Rob Ashton"
+date: "2021-10-11"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Using vimpact for estimating vaccine impact - internal}
+  %\VignetteEngine{knitr::rmarkdown}
+  %\VignetteEncoding{UTF-8}
+---
+
+<!-- DO NOT EDIT THIS FILE - see vignettes_src and make changes there -->
+
+
+
+This vignette describes how to use vimpact to calculate impact as a member of VIMC. It requires a connection to the montagu database, so it can only be used internally. Note that this is all in development and the interface is likely to change.
+
+## Impact by calendar year & impact by birth year
+
+### Function interface
+
+
+```r
+impact <- vimpact::calculate_impact(
+  con, method = "calendar_year", touchstone = "201710gavi-5",
+  modelling_group = "CDA-Razavi", disease = "HepB",
+  focal_scenario_type = "default", focal_vaccine_delivery = list(
+    list(vaccine = "HepB_BD", activity_type = "routine"),
+    list(vaccine = "HepB", activity_type = "routine")
+  ),
+  baseline_scenario_type = "novac",
+  burden_outcomes = c("hepb_deaths_acute", "hepb_deaths_dec_cirrh",
+                      "hepb_deaths_hcc"))
+str(impact)
+#> tibble [9,292 × 4] (S3: tbl_df/tbl/data.frame)
+#>  $ country       : chr [1:9292] "AFG" "AFG" "AFG" "AFG" ...
+#>  $ burden_outcome: chr [1:9292] "deaths" "deaths" "deaths" "deaths" ...
+#>  $ year          : int [1:9292] 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
+#>  $ impact        : num [1:9292] 0 0 0.01 0 0 ...
+```
+
+For impact by birth year, use `method = "birth_year"` instead. `calculate_impact` looks up the `burden_estimate_set` ids for the baseline and focal scenarios for this touchstone, modelling group and disease, then uses them to pull the `burden_estimate` data for the specified `burden_outcomes`. The data can optionally be filtered by country via the `countries` argument, by year via `vaccination_years`, and to under-5 age groups with `is_under5 = TRUE`. The raw impact computed from the `burden_estimate` table is then passed to the relevant public-facing impact function: `impact_by_calendar_year` for `method = "calendar_year"` or `impact_by_birth_year` for `method = "birth_year"`.
+
+
+
+### Recipe interface
+
+Define a recipe either as a CSV file or using `recipe_template`:
+
+
+```r
+recipe <- data.frame(
+  touchstone = "201710gavi-5",
+  modelling_group = "CDA-Razavi", disease = "HepB",
+  focal = "default:HepB_BD-routine;HepB-routine",
+  baseline = "novac",
+  burden_outcome = "hepb_deaths_acute,hepb_deaths_dec_cirrh,hepb_deaths_hcc;hepb_cases_acute_severe,hepb_cases_dec_cirrh,hepb_cases_hcc")
+t <- tempfile(fileext = ".csv")
+write.csv(recipe, t, row.names = FALSE)
+```
+
+The recipe is a set of properties defining which data to extract from the database, e.g. `touchstone`, `modelling_group`, `disease`, the focal and baseline scenarios and `burden_outcome`. It captures the same information as the arguments to `calculate_impact` above. Next, use the recipe to build the meta data frame:
+
+
+```r
+meta <- vimpact:::get_meta_from_recipe(default_recipe = FALSE, recipe = t, con = con)
+```
+
+Then use it to calculate impact:
+
+
+```r
+old_impact <- vimpact:::get_raw_impact_details(con, meta, "deaths")
+str(old_impact)
+#> 'data.frame':	9292 obs. of  7 variables:
+#>  $ country       : int  104 104 104 104 104 104 104 104 104 104 ...
+#>  $ time          : int  2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 ...
+#>  $ baseline_value: num  12592 12763 12922 13107 13297 ...
+#>  $ focal_value   : num  12592 12763 12922 13106 13293 ...
+#>  $ value         : num  0 0 0 0.7 3.8 ...
+#>  $ index         : int  1 1 1 1 1 1 1 1 1 1 ...
+#>  $ burden_outcome: chr  "deaths" "deaths" "deaths" "deaths" ...
+```
+
+## Impact by year of vaccination
+
+### Function interface
+
+As in the calendar year and birth year examples, run the following to calculate impact by year of vaccination stratified by activity type:
+
+
+```r
+impact <- vimpact::calculate_impact(
+  con, method = "yov_activity_type", touchstone = "201710gavi-5",
+  modelling_group = "CDA-Razavi", disease = "HepB",
+  focal_scenario_type = "default", focal_vaccine_delivery = list(
+    list(vaccine = "HepB_BD", activity_type = "routine"),
+    list(vaccine = "HepB", activity_type = "routine")
+  ),
+  baseline_scenario_type = "novac",
+  burden_outcomes = "dalys")
+str(impact)
+#> tibble [4,077 × 6] (S3: tbl_df/tbl/data.frame)
+#>  $ country       : chr [1:4077] "AFG" "AFG" "AFG" "AFG" ...
+#>  $ vaccine       : chr [1:4077] "HepB" "HepB" "HepB" "HepB" ...
+#>  $ activity_type : chr [1:4077] "routine" "routine" "routine" "routine" ...
+#>  $ year          : int [1:4077] 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 ...
+#>  $ burden_outcome: chr [1:4077] "dalys" "dalys" "dalys" "dalys" ...
+#>  $ impact        : num [1:4077] 116637 119565 118335 124335 126887 ...
+```
+
+Alternatively, use `method = "yov_birth_cohort"` for impact by year of vaccination stratified by birth cohort. This follows the same logic as the other impact methods, but before calling the public-facing impact function it also extracts FVP (fully vaccinated persons) data from the database for this `touchstone` and vaccine delivery. It then calls either `impact_by_year_of_vaccination_activity_type` or `impact_by_year_of_vaccination_birth_cohort` to get impact.
+
+### Recipe interface
+
+We can reuse the recipe from above to get the meta table for this method:
+
+
+```r
+meta <- vimpact:::get_meta_from_recipe(default_recipe = FALSE, recipe = t,
+                                      method = "method2a", con = con)
+```
+
+Then use it to get the raw impact:
+
+
+```r
+old_raw_impact <- vimpact:::get_raw_impact_details(con, meta, "dalys")
+```
+
+Extract the FVP data and map the output to the required interface:
+
+
+```r
+fvps <- vimpact::extract_vaccination_history(con, "201710gavi-5", year_min = 2000,
+                                             year_max = 2030,
+                                             disease_to_extract = "HepB")
+#> User defined touchstone version is used.
+#> Converting input coverage data......
+#> Extracted interpolated population.
+#> Extracted raw coverage data...
+#> Transformed coverage data.
+fvps$fvps <- fvps$fvps_adjusted
+fvps$country <- fvps$country_nid
+```
+
+Then use `meta`, `old_raw_impact` and `fvps` to calculate impact by year of vaccination:
+
+
+```r
+old_impact <- vimpact:::impact_by_year_of_vaccination(
+  meta, old_raw_impact, fvps, vaccination_years = 2000:2030)
+str(old_impact)
+#> 'data.frame':	4256 obs. of  27 variables:
+#>  $ country             : int  4 4 4 4 4 4 4 4 4 4 ...
+#>  $ vaccine             : chr  "HepB_BD" "HepB" "HepB" "HepB_BD" ...
+#>  $ activity_type       : chr  "routine" "routine" "routine" "routine" ...
+#>  $ scenario_description: chr  "best-estimates" "best-estimates" "best-estimates" "best-estimates" ...
+#>  $ coverage_set        : int  620 619 619 620 619 620 620 620 620 620 ...
+#>  $ delivery_id         : int  3666 1138 1910 3665 1909 3652 3663 3664 3653 3661 ...
+#>  $ disease             : chr  "HepB" "HepB" "HepB" "HepB" ...
+#>  $ scenario_type       : chr  "default" "default" "default" "default" ...
+#>  $ gavi_support_level  : chr  "with" "with" "with" "with" ...
+#>  $ year                : int  2028 2009 2030 2027 2029 2014 2025 2026 2015 2023 ...
+#>  $ gavi_support        : logi  TRUE TRUE TRUE TRUE TRUE FALSE ...
+#>  $ gender              : chr  "Both" "Both" "Both" "Both" ...
+#>  $ age                 : num  0 0 0 0 0 0 0 0 0 0 ...
+#>  $ target_source       : num  1123235 1065815 1137944 1113858 1132169 ...
+#>  $ coverage_source     : num  0.644 0.63 0.779 0.614 0.769 ...
+#>  $ cohort_size         : num  1123235 1065815 1137944 1113858 1132169 ...
+#>  $ delivery_population : num  1123235 1065815 1137944 1113858 1132169 ...
+#>  $ fvps_source         : num  723476 671463 886576 684020 870755 ...
+#>  $ fvps_adjusted       : num  723476 671463 886576 684020 870755 ...
+#>  $ coverage_adjusted   : num  0.644 0.63 0.779 0.614 0.769 ...
+#>  $ country_nid         : int  4 4 4 4 4 4 4 4 4 4 ...
+#>  $ fvps                : num  723476 671463 886576 684020 870755 ...
+#>  $ time                : num  2028 2009 2030 2027 2029 ...
+#>  $ burden_outcome      : chr  "dalys" "dalys" "dalys" "dalys" ...
+#>  $ impact_ratio        : num  0.176 0.176 0.176 0.176 0.176 ...
+#>  $ impact              : num  127502 118335 156245 120548 153457 ...
+#>  $ index               : int  1 1 1 1 1 1 1 1 1 1 ...
+```
+
+## Comparison
+
+`calculate_impact` takes essentially the same inputs as a single row of an impact recipe. Its output is in a slightly different format, but the output from the recipe-based method can be transformed into the same format:
+
+
+```r
+country <- dplyr::tbl(con, "country")
+old_impact <- old_impact %>%
+  dplyr::left_join(country, by = c("country" = "nid"), copy = TRUE) %>%
+  dplyr::select(country = id, vaccine, activity_type, year = time,
+                burden_outcome, impact) %>%
+  dplyr::filter(!is.na(impact)) %>%
+  dplyr::arrange(activity_type, country, year, vaccine)
+str(old_impact)
+#> 'data.frame':	4077 obs. of  6 variables:
+#>  $ country       : chr  "AFG" "AFG" "AFG" "AFG" ...
+#>  $ vaccine       : chr  "HepB" "HepB" "HepB" "HepB" ...
+#>  $ activity_type : chr  "routine" "routine" "routine" "routine" ...
+#>  $ year          : num  2007 2008 2009 2010 2011 ...
+#>  $ burden_outcome: chr  "dalys" "dalys" "dalys" "dalys" ...
+#>  $ impact        : num  116637 119565 118335 124336 126887 ...
+```
+
+Plotting the two sets of results shows that they are very similar:
+
+
+```r
+impact_plot <- merge(old_impact, impact, all.x = TRUE, all.y = TRUE,
+                     by = c("country", "vaccine", "activity_type", "year",
+                            "burden_outcome"))
+countries <- unique(impact_plot$country)
+impact_plot <- impact_plot %>%
+  dplyr::rename(recipe_impact = impact.x, function_impact = impact.y) %>%
+  tidyr::pivot_wider(names_from = country,
+                     values_from = c(recipe_impact, function_impact))
+impact_plot <- as.data.frame(impact_plot)
+countries_filter <- lapply(unique(countries), function(country) {
+  button <- list(
+    method = "restyle",
+    args = list("y", c(
+      list(impact_plot[, paste0("recipe_impact_", country)]),
+      list(impact_plot[, paste0("function_impact_", country)]))),
+    label = country
+  )
+})
+plot <- plotly::plot_ly(data = impact_plot) %>%
+  plotly::add_trace(x = ~year, y = ~recipe_impact_AFG, name = "recipe",
+                    type = "scatter", mode = "markers", text = ~vaccine,
+                    marker = list(
+                      color = "rgb(235, 204, 42)",
+                      size = 10
+                    )) %>%
+  plotly::add_trace(x = ~year, y = ~function_impact_AFG, name = "function",
+                    type = "scatter", mode = "markers", text = ~vaccine,
+                    marker = list(
+                      color = "rgb(60, 154, 178)",
+                      symbol = "circle-open",
+                      size = 10,
+                      line = list(
+                        color = "rgb(60, 154, 178)",
+                        width = 2
+                      )
+                    )) %>%
+  plotly::layout(
+    yaxis = list(title = "impact"),
+    updatemenus = list(
+      list(
+        y = 0.7,
+        buttons = countries_filter
+      )
+    ))
+```
+
+
+
+
+```{r}
+htmltools::tags$iframe(
+  src = "impact_comparisons.html",
+  width = "100%",
+  height = "400",
+  scrolling = "no",
+  seamless = "seamless",
+  frameBorder = "0",
+  `data-external` = "1"
+)
+```
+
+The small differences in impact are due to differences in numeric precision. `calculate_impact` does much of the aggregation in the database, where `burden_estimate` values are stored as the Postgres `real` type, which uses 4 bytes of storage and has at least 6 decimal digits of precision. The recipe-based method instead pulls the values into R before aggregating, where they are held as doubles (about 15 significant digits), so the aggregated results differ slightly.
+
+At the moment, `calculate_impact` can only compute impact for a single scenario comparison. The next step is to add a wrapper that takes data like an impact recipe and calls `calculate_impact` for multiple scenario comparisons.
diff --git a/vignettes/vignette.Rmd b/vignettes/vignette.Rmd
@@ -4,7 +4,7 @@ author: "Xiang Li"
 date: "`r Sys.Date()`"
 output: rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{vignette}
+  %\VignetteIndexEntry{Using vimpact for estimating vaccine impact - VIMC members}
   %\VignetteEngine{knitr::rmarkdown}
   %\VignetteEncoding{UTF-8}
 ---
diff --git a/vignettes_src/internal-impact.Rmd b/vignettes_src/internal-impact.Rmd
@@ -168,11 +168,24 @@ countries_filter <- lapply(unique(countries), function(country) {
     label = country
   )
 })
-plotly::plot_ly(data = impact_plot) %>%
+plot <- plotly::plot_ly(data = impact_plot) %>%
   plotly::add_trace(x = ~year, y = ~recipe_impact_AFG, name = "recipe", 
-                    type = "scatter", mode = "markers", text = ~vaccine) %>%
+                    type = "scatter", mode = "markers", text = ~vaccine,
+                    marker = list(
+                      color = "rgb(235, 204, 42)",
+                      size = 10
+                    )) %>%
   plotly::add_trace(x = ~year, y = ~function_impact_AFG, name = "function",
-                    type = "scatter", mode = "markers", text = ~vaccine) %>%
+                    type = "scatter", mode = "markers", text = ~vaccine,
+                    marker = list(
+                      color = "rgb(60, 154, 178)",
+                      symbol = "circle-open",
+                      size = 10,
+                      line = list(
+                        color = "rgb(60, 154, 178)",
+                        width = 2
+                      )
+                    )) %>%
   plotly::layout(
     yaxis = list(title = "impact"),
     updatemenus = list(
@@ -183,6 +196,13 @@ plotly::plot_ly(data = impact_plot) %>%
     ))
 ```
 
+```{r include = FALSE}
+htmlwidgets::saveWidget(plotly::partial_bundle(plot), 
+                        "impact_comparisons.html")
+```
+
+<!-- WIDGET -->
+
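-There are small differences in impact because of differences in precision. `calculate_impact` tries to do large parts of the aggregation on the database where `burden_estimate` values are stored as Postgres `real` type which has 4 bytes of storage size and 6 decimal digits of precision. Whereas if we pull this before aggregation the values are stored in R as doubles which are much higher precision leading to small differences when aggregated. 
+The small differences in impact are due to differences in numeric precision. `calculate_impact` does much of the aggregation in the database, where `burden_estimate` values are stored as the Postgres `real` type, which uses 4 bytes of storage and has at least 6 decimal digits of precision. The recipe-based method instead pulls the values into R before aggregating, where they are held as doubles (about 15 significant digits), so the aggregated results differ slightly.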
 
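-`calculate_impact` at the moment won't be able to generate impact for you for more than 1 set of scenario comparisons. Next steps will be to add a wrapper which can take some data like the impact recipe and call `calculate_imapct` for multiple scenarios.
+At the moment, `calculate_impact` can only compute impact for a single scenario comparison. The next step is to add a wrapper that takes data like an impact recipe and calls `calculate_impact` for multiple scenario comparisons.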