@@ -126,9 +126,9 @@ data.
126126<details >
127127<summary > Creating the dataset using `{epidatr}` and `{epiprocess}` </summary >
128128
129- This dataset can be found in the package as <TODO DOESN'T EXIST> ; we demonstrate
130- some of the typically ubiquitous cleaning operations needed to be able to
131- forecast.
129+ This dataset can be found in the package as ` covid_case_death_rates ` ; we
130+ demonstrate some of the typically ubiquitous cleaning operations needed to be
131+ able to forecast.
132132First we pull both jhu-csse cases and deaths from
133133[ ` {epidatr} ` ] ( https://cmu-delphi.github.io/epidatr/ ) package:
134134
@@ -152,26 +152,34 @@ deaths <- pub_covidcast(
152152 geo_values = "*"
153153) |>
154154 select(geo_value, time_value, death_rate = value)
155+ ```
156+
157+ Since visualizing the results on every geography is somewhat overwhelming,
158+ we'll only train on a subset of 5.
159+ ``` {r date, warning = FALSE}
160+ used_locations <- c("ca", "ma", "ny", "tx")
155161cases_deaths <-
156162 full_join(cases, deaths, by = c("time_value", "geo_value")) |>
163+ filter(geo_value %in% used_locations) |>
157164 as_epi_df(as_of = as.Date("2022-01-01"))
158- plot_locations <- c("ca", "ma", "ny", "tx")
159165# plotting the data as it was downloaded
160166cases_deaths |>
161- filter(geo_value %in% plot_locations) |>
162- pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
163- ggplot(aes(x = time_value, y = value)) +
164- geom_line() +
165- facet_grid(source ~ geo_value, scale = "free") +
167+ autoplot(
168+ case_rate,
169+ death_rate,
170+ .color_by = "none"
171+ ) +
172+ facet_grid(.response_name ~ geo_value, scale = "free") +
166173 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
167174 theme(axis.text.x = element_text(angle = 90, hjust = 1))
168175```
169176
170177As with basically any dataset, there is some cleaning that we will need to do to
171178make it actually usable; we'll use some utilities from
172- [ ` {epiprocess} ` ] ( https://cmu-delphi.github.io/epiprocess/ ) for this. First, to
173- eliminate some of the noise coming from daily reporting, we do 7 day averaging
174- over a trailing window[ ^ 1 ] :
179+ [ ` {epiprocess} ` ] ( https://cmu-delphi.github.io/epiprocess/ ) for this.
180+
181+ First, to eliminate some of the noise coming from daily reporting, we do 7 day
182+ averaging over a trailing window[ ^ 1 ] :
175183
176184[ ^ 1 ] : This makes it so that any given day of the processed timeseries only
177185 depends on the previous week, which means that we avoid leaking future
@@ -199,10 +207,12 @@ cases_deaths <-
199207 group_by(geo_value) |>
200208 mutate(
201209 outlr_death_rate = detect_outlr_rm(
202- time_value, death_rate, detect_negatives = TRUE
210+ time_value, death_rate,
211+ detect_negatives = TRUE
203212 ),
204213 outlr_case_rate = detect_outlr_rm(
205- time_value, case_rate, detect_negatives = TRUE
214+ time_value, case_rate,
215+ detect_negatives = TRUE
206216 )
207217 ) |>
208218 unnest(cols = starts_with("outlr"), names_sep = "_") |>
@@ -212,7 +222,6 @@ cases_deaths <-
212222 case_rate = outlr_case_rate_replacement
213223 ) |>
214224 select(geo_value, time_value, case_rate, death_rate)
215- cases_deaths
216225```
217226</details >
218227
@@ -224,14 +233,13 @@ of the states, noting the actual forecast date:
224233``` {r plot_locs}
225234forecast_date_label <-
226235 tibble(
227- geo_value = rep(plot_locations , 2),
228- source = c(rep("case_rate", 4), rep("death_rate", 4)),
229- dates = rep(forecast_date - 7 * 2, 2 * length(plot_locations )),
236+ geo_value = rep(used_locations , 2),
237+ .response_name = c(rep("case_rate", 4), rep("death_rate", 4)),
238+ dates = rep(forecast_date - 7 * 2, 2 * length(used_locations )),
230239 heights = c(rep(150, 4), rep(1.0, 4))
231240 )
232241processed_data_plot <-
233242 cases_deaths |>
234- filter(geo_value %in% plot_locations) |>
235243 pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
236244 ggplot(aes(x = time_value, y = value)) +
237245 geom_line() +
@@ -292,36 +300,37 @@ data narrowed somewhat
292300narrow_data_plot <-
293301 cases_deaths |>
294302 filter(time_value > "2021-04-01") |>
295- filter(geo_value %in% plot_locations) |>
296- pivot_longer(cols = c("case_rate", "death_rate"), names_to = "source") |>
297- ggplot(aes(x = time_value, y = value)) +
298- geom_line() +
299- facet_grid(source ~ geo_value, scale = "free") +
303+ autoplot(
304+ case_rate,
305+ death_rate,
306+ .color_by = "none"
307+ ) +
308+ facet_grid(.response_name ~ geo_value, scale = "free") +
300309 geom_vline(aes(xintercept = forecast_date)) +
301310 geom_text(
302311 data = forecast_date_label,
303312 aes(x = dates, label = "forecast\ndate", y = heights),
304313 size = 3, hjust = "right"
305314 ) +
306315 scale_x_date(date_breaks = "3 months", date_labels = "%Y %b") +
307- theme(axis.text.x = element_text(angle = 90, hjust = 1))
316+ theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
317+ ylim(0, NA)
308318```
309319
310320Putting that together with a plot of the bands, and a plot of the median
311321prediction.
312322
313323``` {r plotting_forecast, warning=FALSE}
314324epiworkflow <- four_week_ahead$epi_workflow
325+
315326restricted_predictions <-
316327 four_week_ahead$predictions |>
317- filter(geo_value %in% plot_locations) |>
318328 rename(time_value = target_date, value = .pred) |>
319- mutate(source = "death_rate")
329+ mutate(.response_name = "death_rate")
320330forecast_plot <-
321331 narrow_data_plot |>
322332 epipredict:::plot_bands(
323- restricted_predictions,
324- levels = 0.9
333+ restricted_predictions
325334 ) +
326335 geom_point(
327336 data = restricted_predictions,
@@ -351,5 +360,6 @@ A couple of things to note:
351360If you encounter a bug or have a feature request, feel free to file an [ issue on
352361our github page] ( https://github.com/cmu-delphi/epipredict/issues ) .
353362For other
354- questions, feel free to contact
[ Daniel
] ( [email protected] ) ,
[ David
] ( [email protected] ) ,
[ Dmitry
] ( [email protected] ) , or
355- [ Logan
] ( [email protected] ) , either via email or on the Insightnet slack.
363+ questions, feel free to reach out to the authors, either via this [ contact
364+ form] ( https://docs.google.com/forms/d/e/1FAIpQLScqgT1fKZr5VWBfsaSp-DNaN03aV6EoZU4YljIzHJ1Wl_zmtg/viewform ) ,
365+ email or the Insightnet slack.
0 commit comments