capnrefsmmat
Contributor

This includes the following changes:

I propose we merge this and leave the further changes from #266 for a subsequent release, in combination with #275.

ryantibs and others added 9 commits November 14, 2020 16:04
Providing a `repo` block with a link pointing to
R-packages/covidcast/ lets pkgdown build the correct URLs.
Profiling revealed that latest_issue was responsible for a large portion
of the time taken in building correlation-utils.Rmd (apart from
downloading the data). Much of this time was spent in dplyr::filter.

Rather than grouping by geography and time, we can use dplyr::distinct,
knowing that each (geo_value, time_value) pair should appear only once
per issue date. After sorting by issue date, taking the first or last
row for each pair gives the desired result.

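dplyr does not document algorithmic details, so I can't easily give
big-O complexity here. Algorithmic details notwithstanding, the speedup
is dramatic: on a 203,360-row data frame, elapsed time drops from about
6.5 s to about 0.03 s, roughly 240 times faster: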

> nrow(d)
[1] 203360
> system.time(latest_issue_old(d))
   user  system elapsed
  6.395   0.037   6.465
> system.time(latest_issue(d))
   user  system elapsed
  0.025   0.003   0.027
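
As a rough illustration of the change described above, here is a minimal sketch comparing the two approaches on a toy data frame. The function names `latest_issue_grouped` and `latest_issue_distinct` are hypothetical, and the grouped version is only an assumption about what the old code looked like, based on the description (grouping by geography and time, with the time spent in dplyr::filter). It is not the package's actual implementation.

```r
library(dplyr)

# Toy data: two issues for (pa, 2020-11-01), one issue for (ny, 2020-11-01).
d <- tibble(
  geo_value  = c("pa", "pa", "ny"),
  time_value = as.Date(c("2020-11-01", "2020-11-01", "2020-11-01")),
  issue      = as.Date(c("2020-11-02", "2020-11-05", "2020-11-03")),
  value      = c(1.0, 1.5, 2.0)
)

# Assumed "old" approach: group by geography and time, then filter each
# group down to the row with the most recent issue date.
latest_issue_grouped <- function(df) {
  df %>%
    group_by(geo_value, time_value) %>%
    filter(issue == max(issue)) %>%
    ungroup()
}

# distinct() approach: sort so the newest issue comes first, then keep the
# first row seen for each (geo_value, time_value) pair. This relies on each
# pair appearing at most once per issue date.
latest_issue_distinct <- function(df) {
  df %>%
    arrange(desc(issue)) %>%
    distinct(geo_value, time_value, .keep_all = TRUE)
}

old <- latest_issue_grouped(d) %>% arrange(geo_value)
new <- latest_issue_distinct(d) %>% arrange(geo_value)
```

Sorting in ascending order instead (dropping `desc()`) would keep the earliest issue for each pair, which is the "first or last" choice the commit message mentions.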
Always run your tests before pushing...
@capnrefsmmat
Contributor Author

Ryan checked the latest_issue stuff in #266, and we're anxious to get the fixed vignettes up, so I'm using Executive Privilege and merging.
