Skip to content

Issue 62: write function to summarise by reference time #63

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Apr 2, 2025

Conversation

kaitejohnson
Copy link
Collaborator

@kaitejohnson kaitejohnson commented Mar 24, 2025

Description

This PR closes #62. It builds off of #56 so recommend reviewing that first.

Changes made:

  • adds a function called summarise_by_ref_time() which ingests a long tidy dataframe of draws of reporting squares and produces probabilistic nowcasts as sums across all delays
  • adds tests for this function
  • modifies the "Getting Started" vignette to use this function and produce a figure as described below:

The vignette uses the new function and then joins the resulting dataframe of probabilistic nowcasts summarised by reference time with the observed data. Red line is the data as of the nowcast date (July 01, 2021), black line is the data as of 3 months later (October 01, 2021). Gray lines are probabilistic nowcasts, using a maximum delay of 40 and estimating the dispersion from 18 reporting triangles (the package defaults).
image## Checklist

  • My PR is based on a package issue and I have explicitly linked it.
  • I have included the target issue or issues in the PR title in the for Issue(s) issue-numbers: PR title
  • I have read the contribution guidelines.
  • I have tested my changes locally.
  • I have added or updated unit tests where necessary.
  • I have updated the documentation if required.
  • My code follows the established coding standards.
  • I have added a news item linked to this PR.
  • I have reviewed CI checks for this PR and addressed them as far as I am able.

Copy link

github-actions bot commented Mar 24, 2025

Thank you for your contribution kaitejohnson 🚀! Your website is ready for download 👉 here 👈!
(The artifact expires on 2025-04-06T10:07:26Z. You can re-generate it by re-running the workflow here.)

Copy link

codecov bot commented Mar 24, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3957643) to head (c50944e).
Report is 9 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #63   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           11        12    +1     
  Lines          305       334   +29     
=========================================
+ Hits           305       334   +29     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@kaitejohnson kaitejohnson changed the base branch from main to 55-generate-nowcast_df March 24, 2025 13:23
@kaitejohnson kaitejohnson marked this pull request as ready for review March 24, 2025 16:00
seabbs
seabbs previously approved these changes Mar 28, 2025
Copy link
Collaborator

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good.

@kaitejohnson kaitejohnson requested a review from seabbs March 28, 2025 15:45
Base automatically changed from 55-generate-nowcast_df to main March 28, 2025 16:26
@seabbs seabbs dismissed their stale review March 28, 2025 16:26

The base branch was changed.

seabbs
seabbs previously approved these changes Mar 28, 2025
Copy link
Collaborator

@seabbs seabbs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving but noting that there is now a naming collision between get_nowcast_df and get_prob_nowcast_df I think which do different things.

@seabbs seabbs enabled auto-merge March 28, 2025 18:44
@seabbs seabbs added this pull request to the merge queue Mar 28, 2025
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to a conflict with the base branch Mar 28, 2025
@kaitejohnson
Copy link
Collaborator Author

Approving but noting that there is now a naming collision between get_nowcast_df and get_prob_nowcast_df I think which do different things.

Is there? So the PR to generate_probabilistic_nowcast_df #55 actually creates two functions: convert_reporting_square_to_df() and nowcast_list_to_df() (which have the functionality of generating a probabilistic nowcast where a "nowcast" is the full reporting square)

I think the larger issue that I ran into when reorganising the methods is that I think we want to clearly define:

  • reporting square with observed and imputed point estimates (I've been calling this a point nowcast)
  • reporting square with observed and expected observed estimates (I've been calling this an expected observed nowcast)
  • a vector of expected observed total counts at each reference time (this is being generated in this PR, so we are implicitly calling it a nowcast).

If we call the summarised by reference time thing a nowcast, we should clearly distinguish something of that form to the reporting square objects.

Would you say a "nowcast" is typically defined as the sum of the elements across delays, and the matrices are described in the more verbose way as say reporting triangle, reporting square, etc.?

@kaitejohnson kaitejohnson requested a review from seabbs March 31, 2025 08:23
@seabbs
Copy link
Collaborator

seabbs commented Mar 31, 2025

Ah it didn't make it in.

nowcast_list_to_df()

I think this is still quite a confusing conflict as it sounds like the same thing.

reporting square with observed and imputed point estimates (I've been calling this a point nowcast)
reporting square with observed and expected observed estimates (I've been calling this an expected observed nowcast)

What is the difference between these two things? Without context they sound like the same thing?

If we call the summarised by reference time thing a nowcast, we should clearly distinguish something of that form to the reporting square objects.

This is the thing that in literature (i.e.) Johannes paper is what is being talked about in terms of a nowcast.

The additional complexity in the above is there are versions of all of these that do and don't include the observed data where it is available and there are use cases for both (most applied users what the data fused versions most of the time with the pure posterior prediction being mostly for model checking).

My usual call is to stick to the common bayesian naming scheme but here that won't work due to it not being Bayesian.

the summarised by reference time thing a nowcast, w

I am being a pedant here but I think we want to use aggregate or summed here or something as summarise is probably going to mislead people.

I'm not sure this PR is the place to have this discussion so maybe it calls for a meta issue w/ a table or something so we can get an overview and have a think?

seabbs
seabbs previously approved these changes Mar 31, 2025
@kaitejohnson
Copy link
Collaborator Author

What is the difference between these two things? Without context they sound like the same thing?

One is summarised by reference time (so just indexed by reference time) e.g. could be a vector, the other is indexed by reference time and report time (a matrix)

Yeah I will make a meta-issue with a table to distinguish between these things. However, for this PR I think its reasonable to still change the name if its confusing (which I believe it is currently). What about aggregate_by_reference_time() instead of get_nowcast_df()?

@kaitejohnson
Copy link
Collaborator Author

Made a new issue #74

@kaitejohnson
Copy link
Collaborator Author

kaitejohnson commented Apr 1, 2025

Going to change the name to aggregate_by_ref_time()
Latest function flow:
function_flow

@kaitejohnson kaitejohnson requested a review from seabbs April 1, 2025 10:35
@seabbs seabbs added this pull request to the merge queue Apr 2, 2025
Merged via the queue into main with commit 20a97e0 Apr 2, 2025
7 checks passed
@seabbs seabbs deleted the 62-summarise-by-ref-time branch April 2, 2025 16:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Write function to summarise by reference time
2 participants