@@ -301,8 +301,8 @@ growth_rate_recipe <- epi_recipe(
301301 step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) |>
302302 step_epi_lag(death_rate, lag = c(0, 7, 14)) |>
303303 step_epi_ahead(death_rate, ahead = 4 * 7) |>
304- step_growth_rate(death_rate) |>
305304 step_epi_naomit() |>
305+ step_growth_rate(death_rate) |>
306306 step_training_window()
307307```
308308
@@ -317,7 +317,9 @@ growth_rate_recipe |>
317317 death_rate, gr_7_rel_change_death_rate
318318 )
319319```
320+
320321And the role:
322+
321323``` {r growth_rate_roles}
322324prepped <- growth_rate_recipe |>
323325 prep(training_data)
@@ -329,7 +331,9 @@ To demonstrate the changes in the layers that come along with it, we will use
329331``` {r layer_and_fit}
330332growth_rate_layers <- frosting() |>
331333 layer_predict() |>
332- layer_quantile_distn(quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)) |>
334+ layer_quantile_distn(
335+ quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)
336+ ) |>
333337 layer_point_from_distn() |>
334338 layer_add_forecast_date() |>
335339 layer_add_target_date() |>
@@ -429,7 +433,121 @@ which are 2-3 orders of magnitude larger than the corresponding rates above.
429433while ` rate_rescaling ` gives the denominator of the rate (our fit values were
430434per 100,000).
431435
432- [ ^ 1 ] : Think of baking a cake, where adding the frosting is the last step in the process of actually baking.
436+ # Custom classifier workflow
437+
438+ As a more complicated example of the kind of pipeline that you can build using
439+ this framework, here is an example of a hotspot prediction model, which predicts
440+ whether the case rates are increasing (` up ` ), decreasing (` down ` ) or flat
441+ (` flat ` ).
442+ This comes from a paper by McDonald, Bien, Green, Hu et al[ ^ 3 ] , and roughly
443+ serves as an extension of ` arx_classifier() ` .
444+
445+ First, we need to add a factor version of the ` geo_value ` , so that it can
446+ actually be used as a feature.
447+
448+ ``` {r training_factor}
449+ training_data <-
450+ training_data %>%
451+ mutate(geo_value_factor = as.factor(geo_value))
452+ ```
453+
454+ Then we put together the recipe, using a combination of base ` {recipe} `
455+ functions such as ` add_role() ` and ` step_dummy() ` , and ` {epipreict} ` functions
456+ such as ` step_growth_rate() ` .
457+
458+ ``` {r class_recipe}
459+ classifier_recipe <- epi_recipe(training_data) %>%
460+ add_role(time_value, new_role = "predictor") %>%
461+ step_dummy(geo_value_factor) %>%
462+ step_growth_rate(case_rate, role = "none", prefix = "gr_") %>%
463+ step_epi_lag(starts_with("gr_"), lag = c(0, 7, 14)) %>%
464+ step_epi_ahead(starts_with("gr_"), ahead = 7, role = "none") %>%
465+ # note recipes::step_cut() has a bug in it, or we could use that here
466+ step_mutate(
467+ response = cut(
468+ ahead_7_gr_7_rel_change_case_rate,
469+ breaks = c(-Inf, -0.2, 0.25, Inf) / 7, # division gives weekly not daily
470+ labels = c("down", "flat", "up")
471+ ),
472+ role = "outcome"
473+ ) %>%
474+ step_rm(has_role("none"), has_role("raw")) %>%
475+ step_epi_naomit()
476+ ```
477+
478+
479+ Roughly, this adds as predictors:
480+
481+ 1 . the time value (via ` add_role() ` )
482+ 2 . the ` geo_value ` (via ` step_dummy() ` and the ` as.factor() ` above)
483+ 3 . the growth rate, both at prediction time and lagged by one and two weeks
484+
485+ The outcome is created by composing several steps together: ` step_epi_ahead() `
486+ creates a column with the growth rate one week into the future, while
487+ ` step_mutate() ` creates a factor with the 3 values:
488+
489+ $$
490+ Z_{\ell, t}=
491+ \begin{cases}
492+ \text{up}, & \text{if}\ Y^{\Delta}_{\ell, t} > 0.25 \\
493+ \text{down}, & \text{if}\ Y^{\Delta}_{\ell, t} < -0.20\\
494+ \text{flat}, & \text{otherwise}
495+ \end{cases}
496+ $$
497+
498+ where $Y^{\Delta}_ {\ell, t}$ is the growth rate at location $\ell$ and time $t$.
499+ ` up ` means that the ` case_rate ` is has increased by at least 25%, while ` down `
500+ means it has decreased by at least 20%.
501+
502+ Note that both ` step_growth_rate() ` and ` step_epi_ahead() ` assign the role
503+ ` none ` explicitly; this is because they're used as intermediate steps to create
504+ both predictors and the outcome.
505+ ` step_rm() ` drops them after they're done, along with the original ` raw ` columns
506+ ` death_rate ` and ` case_rate ` (both ` geo_value ` and ` time_value ` are retained
507+ because their roles have been reassigned).
508+
509+
510+ To fit a classification model like this, we will need to use a parsnip model
511+ with mode classification; the simplest example is ` multinomial_reg() ` .
512+ We don't actually need to apply any layers, so we can skip adding one to the ` epiworkflow() ` :
513+
514+ ``` {r, warning=FALSE}
515+ wf <- epi_workflow(
516+ classifier_recipe,
517+ multinom_reg()
518+ ) %>%
519+ fit(training_data)
520+
521+ forecast(wf) %>% filter(!is.na(.pred_class))
522+ ```
523+
524+ And comparing the result with the actual growth rates at that point:
525+ ``` {r growth_rate_results}
526+ growth_rates <- covid_case_death_rates |>
527+ filter(geo_value %in% used_locations) |>
528+ group_by(geo_value) |>
529+ mutate(
530+ # multiply by 7 to get to weekly equivalents
531+ case_gr = growth_rate(x = time_value, y = case_rate) * 7
532+ ) |>
533+ ungroup()
534+
535+ growth_rates |> filter(time_value == "2021-08-01")
536+ ```
537+
538+ So they're all increasing at significantly higher than 25% per week (36%-62%),
539+ which matches the classification.
540+
541+
542+ See the [ tooling book] ( https://cmu-delphi.github.io/delphi-tooling-book/preprocessing-and-models.html ) for a more in depth discussion of this example.
543+
544+
545+ [ ^ 1 ] : Think of baking a cake, where adding the frosting is the last step in the
546+ process of actually baking.
433547
434548[ ^ 2 ] : Note that the frosting doesn't require any information about the training
435549 data, since the output of the model only depends on the model used.
550+
551+ [ ^ 3 ] : McDonald, Bien, Green, Hu, et al. “Can auxiliary indicators improve
552+ COVID-19 forecasting and hotspot prediction?.” Proceedings of the National
553+ Academy of Sciences 118.51 (2021): e2111453118. doi:10.1073/pnas.2111453118
0 commit comments