Issue 62: write function to summarise by reference time #63
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

| | main | #63 | +/- |
|---|---|---|---|
| Coverage | 100.00% | 100.00% | |
| Files | 11 | 12 | +1 |
| Lines | 305 | 334 | +29 |
| Hits | 305 | 334 | +29 |
Overall looks good.
Approving, but I think there is now a naming collision between get_nowcast_df and get_prob_nowcast_df, which do different things.
Is there? The PR for generate_probabilistic_nowcast_df (#55) actually creates two functions:
I think the larger issue I ran into when reorganising the methods is that we want to clearly define:
If we call the quantity summarised by reference time a nowcast, we should clearly distinguish objects of that form from the reporting square objects. Would you say a "nowcast" is typically defined as the sum of the elements across delays, with the matrices described more verbosely as, say, a reporting triangle, reporting square, etc.?
Ah, it didn't make it in.
I think this is still quite a confusing conflict, as the two names sound like the same thing.
What is the difference between these two things? Without context, they sound like the same thing.
This is what the literature (e.g. Johannes' paper) means by a nowcast. The additional complexity is that there are versions of all of these that do and don't include the observed data where it is available, and there are use cases for both: most applied users want the data-fused versions most of the time, while the pure posterior prediction is mostly used for model checking. My usual call is to stick to the common Bayesian naming scheme, but that won't work here because the method is not Bayesian.
I am being a pedant here, but I think we want to use "aggregate", "summed", or something similar, as "summarise" is probably going to mislead people. I'm not sure this PR is the place to have this discussion, so maybe it calls for a meta-issue with a table so we can get an overview and have a think?
One is summarised by reference time (so indexed only by reference time; e.g. it could be a vector), while the other is indexed by reference time and report time (a matrix).
Yeah, I will make a meta-issue with a table to distinguish between these things. However, for this PR I think it's reasonable to still change the name if it's confusing (which I believe it currently is). What about
Made a new issue: #74.
## Description
This PR closes #62. It builds on #56, so I recommend reviewing that first.
Changes made:
- summarise_by_ref_time(), which ingests a long tidy dataframe of draws of reporting squares and produces probabilistic nowcasts as sums across all delays.

The vignette uses the new function and then joins the resulting dataframe of probabilistic nowcasts summarised by reference time with the observed data. The red line is the data as of the nowcast date (July 1, 2021), and the black line is the data as of three months later (October 1, 2021). The gray lines are probabilistic nowcasts, using a maximum delay of 40 and estimating the dispersion from 18 reporting triangles (the package defaults).
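To make the distinction from the review thread concrete, here is a minimal sketch of the aggregation that summarise_by_ref_time() is described as performing: each draw of a reporting square (a reference time by delay matrix) is collapsed into a vector indexed only by reference time by summing across delays, and the result is then joined with observed data as the vignette does. This is written in Python with only the standard library, not in the package's own language, and the schema (draw, reference_time, delay, count) and the function names sum_over_delays and join_observed are hypothetical, chosen for illustration rather than taken from the package's API.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Optional


class SquareCell(NamedTuple):
    """One row of a long, tidy table of reporting-square draws (hypothetical schema)."""
    draw: int
    reference_time: int
    delay: int
    count: float


class NowcastRow(NamedTuple):
    """One draw of the nowcast for a single reference time."""
    draw: int
    reference_time: int
    total_count: float


class NowcastWithObserved(NamedTuple):
    """A nowcast draw alongside the observed count, if any (hypothetical)."""
    draw: int
    reference_time: int
    total_count: float
    observed: Optional[float]  # None when nothing was observed for that reference time


def sum_over_delays(cells: Iterable[SquareCell]) -> list[NowcastRow]:
    """Collapse each draw's reporting square into a vector indexed only by
    reference time, by summing the counts across all delays."""
    totals: dict[tuple[int, int], float] = defaultdict(float)
    for cell in cells:
        totals[(cell.draw, cell.reference_time)] += cell.count
    # Sort so the output is ordered by draw, then by reference time.
    return [NowcastRow(d, ref, total) for (d, ref), total in sorted(totals.items())]


def join_observed(
    rows: Iterable[NowcastRow], observed: dict[int, float]
) -> list[NowcastWithObserved]:
    """Left-join observed counts (keyed by reference time) onto nowcast draws."""
    return [NowcastWithObserved(*row, observed.get(row.reference_time)) for row in rows]


if __name__ == "__main__":
    # Two draws of a toy reporting square: 2 reference times x 3 delays.
    toy_cells = [
        SquareCell(1, 0, 0, 5), SquareCell(1, 0, 1, 3), SquareCell(1, 0, 2, 1),
        SquareCell(1, 1, 0, 4), SquareCell(1, 1, 1, 2), SquareCell(1, 1, 2, 2),
        SquareCell(2, 0, 0, 5), SquareCell(2, 0, 1, 3), SquareCell(2, 0, 2, 2),
        SquareCell(2, 1, 0, 4), SquareCell(2, 1, 1, 3), SquareCell(2, 1, 2, 1),
    ]
    nowcast = sum_over_delays(toy_cells)
    for row in join_observed(nowcast, {0: 9.0}):
        print(row)
```

With the toy values above, draw 1 gives totals of 9 and 8 for reference times 0 and 1, and draw 2 gives 10 and 8; only reference time 0 picks up an observed value. Keeping the output long (one row per draw and reference time) rather than wide keeps it easy to join with observed data and to summarise into quantiles per reference time.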
## Checklist