@@ -301,8 +301,8 @@ growth_rate_recipe <- epi_recipe(
301301  step_epi_lag(case_rate, lag = c(0, 1, 2, 3, 7, 14)) |> 
302302  step_epi_lag(death_rate, lag = c(0, 7, 14)) |> 
303303  step_epi_ahead(death_rate, ahead = 4 * 7) |> 
304-   step_growth_rate(death_rate) |> 
305304  step_epi_naomit() |> 
305+   step_growth_rate(death_rate) |> 
306306  step_training_window() 
307307``` 
308308
@@ -317,7 +317,9 @@ growth_rate_recipe |>
317317    death_rate, gr_7_rel_change_death_rate 
318318  ) 
319319``` 
320+ 
320321And the role:
322+ 
321323``` {r  growth_rate_roles}
322324prepped <- growth_rate_recipe |> 
323325  prep(training_data) 
@@ -329,7 +331,9 @@ To demonstrate the changes in the layers that come along with it, we will use
329331``` {r  layer_and_fit}
330332growth_rate_layers <- frosting() |> 
331333  layer_predict() |> 
332-   layer_quantile_distn(quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9)) |> 
334+   layer_quantile_distn( 
335+     quantile_levels = c(0.1, 0.25, 0.5, 0.75, 0.9) 
336+   ) |> 
333337  layer_point_from_distn() |> 
334338  layer_add_forecast_date() |> 
335339  layer_add_target_date() |> 
@@ -429,7 +433,121 @@ which are 2-3 orders of magnitude larger than the corresponding rates above.
429433while ` rate_rescaling `  gives the denominator of the rate (our fit values were
430434per 100,000).
431435
432- [ ^ 1 ] : Think of baking a cake, where adding the frosting is the last step in the process of actually baking.
436+ # Custom classifier workflow  
437+ 
438+ As a more complicated example of the kind of pipeline that you can build using
439+ this framework, here is an example of a hotspot prediction model, which predicts
440+ whether the case rates are increasing (` up ` ), decreasing (` down ` ) or flat
441+ (` flat ` ).
442+ This comes from a paper by McDonald, Bien, Green, Hu et al[ ^ 3 ] , and roughly
443+ serves as an extension of ` arx_classifier() ` .
444+ 
445+ First, we need to add a factor version of the ` geo_value ` , so that it can
446+ actually be used as a feature.
447+ 
448+ ``` {r  training_factor}
449+ training_data <- 
450+   training_data %>% 
451+   mutate(geo_value_factor = as.factor(geo_value)) 
452+ ``` 
453+ 
454+ Then we put together the recipe, using a combination of base ` {recipe} ` 
455+ functions such as ` add_role() `  and ` step_dummy() ` , and ` {epipreict} `  functions
456+ such as ` step_growth_rate() ` .
457+ 
458+ ``` {r  class_recipe}
459+ classifier_recipe <- epi_recipe(training_data) %>% 
460+   add_role(time_value, new_role = "predictor") %>% 
461+   step_dummy(geo_value_factor) %>% 
462+   step_growth_rate(case_rate, role = "none", prefix = "gr_") %>% 
463+   step_epi_lag(starts_with("gr_"), lag = c(0, 7, 14)) %>% 
464+   step_epi_ahead(starts_with("gr_"), ahead = 7, role = "none") %>% 
465+   # note recipes::step_cut() has a bug in it, or we could use that here 
466+   step_mutate( 
467+     response = cut( 
468+       ahead_7_gr_7_rel_change_case_rate, 
469+       breaks = c(-Inf, -0.2, 0.25, Inf) / 7, # division gives weekly not daily 
470+       labels = c("down", "flat", "up") 
471+     ), 
472+     role = "outcome" 
473+   ) %>% 
474+   step_rm(has_role("none"), has_role("raw")) %>% 
475+   step_epi_naomit() 
476+ ``` 
477+ 
478+ 
479+ Roughly, this adds as predictors:
480+ 
481+ 1 .  the time value (via ` add_role() ` )
482+ 2 .  the ` geo_value `  (via ` step_dummy() `  and the ` as.factor() `  above)
483+ 3 .  the growth rate, both at prediction time and lagged by one and two weeks
484+ 
485+ The outcome is created by composing several steps together: ` step_epi_ahead() ` 
486+ creates a column with the growth rate one week into the future, while
487+ ` step_mutate() `  creates a factor with the 3 values:
488+ 
489+ $$ 
490+  Z_{\ell, t}= 
491+     \begin{cases} 
492+       \text{up}, & \text{if}\ Y^{\Delta}_{\ell, t} > 0.25 \\ 
493+       \text{down}, & \text{if}\  Y^{\Delta}_{\ell, t} < -0.20\\ 
494+       \text{flat}, & \text{otherwise} 
495+     \end{cases} 
496+ $$ 
497+ 
498+ where $Y^{\Delta}_ {\ell, t}$ is the growth rate at location $\ell$ and time $t$.
499+ ` up `  means that the ` case_rate `  is has increased by at least 25%, while ` down ` 
500+ means it has decreased by at least 20%.
501+ 
502+ Note that both ` step_growth_rate() `  and ` step_epi_ahead() `  assign the role
503+ ` none `  explicitly; this is because they're used as intermediate steps to create
504+ both predictors and the outcome.
505+ ` step_rm() `  drops them after they're done, along with the original ` raw `  columns
506+ ` death_rate `  and ` case_rate `  (both ` geo_value `  and ` time_value `  are retained
507+ because their roles have been reassigned).
508+ 
509+ 
510+ To fit a classification model like this, we will need to use a parsnip model
511+ with mode classification; the simplest example is ` multinomial_reg() ` .
512+ We don't actually need to apply any layers, so we can skip adding one to the ` epiworkflow() ` :
513+ 
514+ ``` {r,  warning=FALSE}
515+ wf <- epi_workflow( 
516+   classifier_recipe, 
517+   multinom_reg() 
518+ ) %>% 
519+   fit(training_data) 
520+ 
521+ forecast(wf) %>% filter(!is.na(.pred_class)) 
522+ ``` 
523+ 
524+ And comparing the result with the actual growth rates at that point:
525+ ``` {r  growth_rate_results}
526+ growth_rates <- covid_case_death_rates |> 
527+   filter(geo_value %in% used_locations) |> 
528+   group_by(geo_value) |> 
529+   mutate( 
530+     # multiply by 7 to get to weekly equivalents 
531+     case_gr = growth_rate(x = time_value, y = case_rate) * 7 
532+   ) |> 
533+   ungroup() 
534+ 
535+ growth_rates |> filter(time_value == "2021-08-01") 
536+ ``` 
537+ 
538+ So they're all increasing at significantly higher than 25% per week (36%-62%),
539+ which matches the classification.
540+ 
541+ 
542+ See the [ tooling book] ( https://cmu-delphi.github.io/delphi-tooling-book/preprocessing-and-models.html )  for a more in depth discussion of this example.
543+ 
544+ 
545+ [ ^ 1 ] : Think of baking a cake, where adding the frosting is the last step in the
546+     process of actually baking.
433547
434548[ ^ 2 ] : Note that the frosting doesn't require any information about the training
435549    data, since the output of the model only depends on the model used.
550+ 
551+ [ ^ 3 ] : McDonald, Bien, Green, Hu, et al. “Can auxiliary indicators improve
552+     COVID-19 forecasting and hotspot prediction?.” Proceedings of the National
553+     Academy of Sciences 118.51 (2021): e2111453118. doi:10.1073/pnas.2111453118
0 commit comments