
using epidata metadata to determine staleness for nhsn pulling #2142

Open
wants to merge 6 commits into base: main

Conversation

@aysim319 (Contributor) commented Apr 7, 2025

Description

Use the Epidata metadata API to determine whether the NHSN source data is stale before pulling.
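
For context, a minimal sketch of the approach, assuming the metadata is fetched with the Epidata Python client's covidcast_meta() call and that the source appears there as "nhsn"; the nhsn_meta_df name and the epoch-seconds last_update column mirror the diff below, while the client call and filtering details are illustrative:

from datetime import datetime, timezone

import pandas as pd
from delphi_epidata import Epidata  # assumed client; covidcast_meta() returns a dict with an "epidata" list

# Fetch covidcast metadata and keep only the rows belonging to the nhsn source.
meta = Epidata.covidcast_meta()
nhsn_meta_df = pd.DataFrame(meta["epidata"])
nhsn_meta_df = nhsn_meta_df[nhsn_meta_df["data_source"] == "nhsn"]

# last_update is epoch seconds; interpret it as UTC (see the review discussion below).
last_updated = datetime.fromtimestamp(nhsn_meta_df["last_update"].max(), tz=timezone.utc)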

Associated Issue(s)

@minhkhul (Contributor) left a comment

LGTM. One nit was the utcfromtimestamp thing but everything else looks good.

@aysim319 changed the title from "moved time diff into constants and extended to 36 hours" to "using epidata metadata to determine staleness for nhsn pulling" on Apr 9, 2025
@aysim319 requested a review from melange396 on April 9, 2025 at 21:54
@aysim319 requested a review from minhkhul on April 9, 2025 at 21:58

@melange396 (Contributor) left a comment

In theory, I really like the idea of using our own metadata, but in practice it's going to have problems that make it unworkable. The biggest is that there is an indeterminate delay between data getting inserted into the database and when the metadata is updated to include that data (the delay is bounded, but long enough to throw a wrench in this). Another is that patches will affect the metadata, thereby disrupting scheduling.

Comment on lines +56 to +57
est = timezone(timedelta(hours=-5))
last_updated = datetime.fromtimestamp(nhsn_meta_df["last_update"].min(), tz=est)

This is going to have issues because of DST changes; however, this timestamp should already be in UTC.

It shouldn't make too much of a difference, because the probability of it biting us is low, but I think you'll also want a max instead of a min (in case we change signal names or discontinue signals, among other things).

Suggested change
- est = timezone(timedelta(hours=-5))
- last_updated = datetime.fromtimestamp(nhsn_meta_df["last_update"].min(), tz=est)
+ last_updated = datetime.fromtimestamp(nhsn_meta_df["last_update"].max(), tz=timezone.utc)
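
To make those two points concrete, a small illustrative sketch (the column name mirrors the diff; the sample values are made up). An aware datetime built with a fixed -05:00 offset denotes the right instant but carries the wrong wall-clock label during daylight saving time, and since the epoch value is UTC-based anyway, timezone.utc is the natural choice; meanwhile min() would get pinned to a renamed or discontinued signal that never updates again, whereas max() tracks the newest update across the source's signals.

from datetime import datetime, timedelta, timezone

import pandas as pd

epoch = 1717243200  # 2024-06-01 12:00:00 UTC, i.e. during US daylight saving time
fixed_est = timezone(timedelta(hours=-5))
print(datetime.fromtimestamp(epoch, tz=fixed_est))     # 2024-06-01 07:00:00-05:00, not Eastern local time in summer
print(datetime.fromtimestamp(epoch, tz=timezone.utc))  # 2024-06-01 12:00:00+00:00, unambiguous

# One stale row (e.g. a discontinued signal) pins min() to 2020 forever;
# max() reflects the latest successful update for the source.
nhsn_meta_df = pd.DataFrame({"last_update": [1717243200, 1577836800]})
last_updated = datetime.fromtimestamp(nhsn_meta_df["last_update"].max(), tz=timezone.utc)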

last_updated = datetime.fromtimestamp(nhsn_meta_df["last_update"].min(), tz=est)

# currently set to run twice a week, RECENTLY_UPDATED_DIFF may need adjusting based on the cadence
recently_updated_source = (updated_timestamp - last_updated) > RECENTLY_UPDATED_DIFF

I don't think this math is quite right... why wouldn't we want to proceed any time Socrata has a newer timestamp than we do? The form you have here has the potential to delay processing if updates are frequent enough or if RECENTLY_UPDATED_DIFF is too large.

Suggested change
- recently_updated_source = (updated_timestamp - last_updated) > RECENTLY_UPDATED_DIFF
+ socrata_ts = updated_timestamp
+ delphi_ts = last_updated
+ recently_updated_source = socrata_ts > delphi_ts
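
Putting the suggestion in context, a minimal sketch of the resulting gate, with made-up values standing in for the Socrata and Epidata timestamps that the pipeline would actually fetch:

from datetime import datetime, timezone

# Illustrative values only; in the pipeline these come from Socrata's dataset
# metadata and from the Epidata metadata shown above.
updated_timestamp = datetime(2025, 4, 9, 18, 0, tzinfo=timezone.utc)  # Socrata's last update
last_updated = datetime(2025, 4, 8, 6, 0, tzinfo=timezone.utc)        # newest last_update on our side

# Proceed whenever upstream is newer than what we have ingested; no
# RECENTLY_UPDATED_DIFF window that could delay or skip closely spaced updates.
socrata_ts = updated_timestamp
delphi_ts = last_updated
recently_updated_source = socrata_ts > delphi_ts  # True here, so the pull would run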
