2085 add proportions nhsn #2111
base: main
Conversation
Are you planning on adding the new RSV signals in this PR or in a separate one?
nhsn/delphi_nhsn/pull.py (outdated)
```python
updated_timestamp = datetime.utcfromtimestamp(int(response["rowsUpdatedAt"]))
now = datetime.utcnow()
recently_updated = (now - updated_timestamp) < timedelta(days=1)
```
issue: I think this "recently-updated" logic is sufficient but not robust. For example, if we failed to pull data for multiple days, then on the next run we would not pull data we had never seen before unless it was posted in the last day.

The more robust solution would be to save the last pull's updated_timestamp to a local file. We would then load that and compare the new updated_timestamp against it: if exactly equal, skip the update; if unequal, pull the data.
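The file-based approach suggested above could be sketched roughly like this; the helper name, state-file path, and ISO-format storage are illustrative assumptions, not the indicator's actual code:

```python
from datetime import datetime
from pathlib import Path


def should_pull(current_updated: datetime, state_file: Path) -> bool:
    """Hypothetical sketch: compare the source's updatedAt against the last
    saved one. Pull only when the timestamp is new (or on the first run)."""
    if state_file.exists():
        last = datetime.fromisoformat(state_file.read_text().strip())
        if last == current_updated:
            return False  # exactly equal: we already pulled this version
    # unequal or first run: record the new timestamp and signal a pull
    state_file.write_text(current_updated.isoformat())
    return True
```

Because only the single most recent timestamp is stored, the state file never grows; each run overwrites one line.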
That definitely makes sense and is something I didn't think about! The only thing I did differently was use the API instead of scanning the files, since I imagine the file list is going to grow and it doesn't make much sense to scan it every day.
Yeah, checking the API could make sense, too. The one thing I'd caution about is timezones: your previous approach explicitly used UTC on both the "old" and "now" timestamps, but I don't know what the API uses.

Second, the API only has dates, not times. Would that ever cause problems? E.g., if we want to check for updates multiple times a day.
> Yeah, checking the API could make sense, too. The one thing I'd caution is timezones -- your previous approach explicitly used UTC on both "old" and "now" timestamps, but I don't know what the API uses.

Since the data and the dates are just dates, not datetimes, I didn't take timezones into account. Hmm, I also don't know for sure which timezone; I believe it's EST, but I have to double-check.

> The API only has dates, not times. Would that ever cause problems? E.g. we want to check for updates multiple times a day.

Since this is data that generally updates weekly, I was planning on running just once a day, so I thought the timezone wouldn't be as much of an issue.
Okay, given these complications, I'm thinking reading/writing to a file is easier. We wouldn't need to keep a complete list of all update datetimes ever, just the single most recent one. So the file wouldn't keep getting bigger and bigger; we could just read a single line.

This lets us store a UTC datetime (no timezones to worry about), there's no API date-processing to deal with, and we can store a full datetime to be extra precise.
I wasn't a fan of having metadata files; it seems like overkill and introduces more complexity than I would like. So after talking things through with Nolan just now, I decided to simplify the logic: create backups daily, but still do a simple recently-updated check to decide whether to actually continue processing and create the CSV files. That way, if there are outages after the initial pulls, we can go back and do patches for them.

Nolan also mentioned that in the future we could look into creating a generic tool/script specifically to dedup things, and I like that direction since it would separate the complexity away from this codebase.
I was originally planning for a separate one. I considered adding the new signal in this PR, but when I tried to add just the new columns and ran the tests locally, I ran into issues; it looks like it might be more involved, so I think it'd be better to create a separate PR. I also kind of shoved in other issues here (retry and daily checking) and didn't want to add more on top.
Some small questions
```diff
-df = df.astype(TYPE_DICT)
+try:
+    df = df.astype(TYPE_DICT)
+except KeyError:
```
Why does this just pass?
The idea was that some of the older data in the source backup didn't have the newer signals (rsv, reporting hospitals); it would log, but resume the patching process.

Previously I tried modifying TYPE_DICT, but it being mutable caused some issues in patching runs, so this was the next solution. Is it a good one? Ehhh... I log that the signal is unavailable earlier (line 150), and I thought I shouldn't log basically the same message twice.

Since you brought it up... I should also check whether the rest of the columns actually changed data types, and maybe look into a less janky way.
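One less janky alternative might be to build a per-call subset of the type map rather than catching the KeyError or mutating the shared dict. A minimal sketch (the column name and TYPE_DICT contents here are illustrative, not the real NHSN schema):

```python
import pandas as pd

# Illustrative stand-in for the module-level TYPE_DICT in pull.py
TYPE_DICT = {"total_adm": float, "total_rsv_adm": float}


def cast_available(df: pd.DataFrame, type_dict: dict) -> pd.DataFrame:
    """Cast only the columns actually present in df.

    Building a fresh subset per call avoids both the KeyError on
    older backups and the shared-mutable-dict issue in patch runs.
    """
    available = {col: t for col, t in type_dict.items() if col in df.columns}
    return df.astype(available)
```

Missing columns are simply skipped, so the caller could still log the unavailable signals once, in one place.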
```python
    logger.info(f"{prelim_prefix}NHSN data is stale; Skipping", updated_timestamp=updated_timestamp)
# pylint: disable=W0703
except Exception as e:
    logger.info("error while processing socrata metadata; continuing processing", error=str(e))
```
It seems like this function would return true if it encounters an error?
Yeah, because it's just checking whether the data is stale or not. If we already have the data, I don't want this check to be the reason the indicator stops; it was only ever meant to limit duplicating data.
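That fail-open behavior could be isolated in a small predicate, roughly like this sketch (function name and signature are assumptions; only the return-True-on-error shape mirrors the discussion):

```python
from datetime import datetime, timedelta, timezone


def check_last_updated(rows_updated_at, now=None):
    """Return True when data should be processed.

    Any error while reading the Socrata metadata also returns True,
    so a metadata hiccup never halts the indicator; the check exists
    only to skip re-processing data we already have.
    """
    try:
        updated = datetime.fromtimestamp(int(rows_updated_at), tz=timezone.utc)
        now = now or datetime.now(timezone.utc)
        return (now - updated) < timedelta(days=1)
    except Exception:  # pylint: disable=broad-except
        return True  # fail open: process rather than silently skip
```

The trade-off is that a persistent metadata failure means the staleness check is effectively disabled, which is why logging the error (as the snippet above does) matters.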
```python
    time.sleep(2 + random.randint(0, 1000) / 1000.0)
    page = client.get(dataset_id, limit=limit, offset=offset)
else:
    raise err
```
this should probably log as well
Ah yup, I missed that.
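A retry loop with logging added at the retry point might look something like this sketch; `fetch`, `is_retryable`, and the sleep bounds are stand-ins for the Socrata client call and the 503 check in pull.py, not the actual implementation:

```python
import logging
import random
import time

logger = logging.getLogger(__name__)


def get_with_retry(fetch, max_attempts=3, is_retryable=lambda err: True):
    """Call fetch() with jittered sleeps between retryable failures,
    logging each retry so transient 503s leave a trace in the logs."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception as err:  # pylint: disable=broad-except
            if attempt == max_attempts or not is_retryable(err):
                logger.error("request failed; giving up", exc_info=err)
                raise
            logger.info("retryable error on attempt %d: %s", attempt, err)
            # short jitter for the sketch; pull.py uses ~2-3 seconds
            time.sleep(0.01 + random.randint(0, 10) / 1000.0)
```

Logging before the sleep means every retry is visible even if the process is later killed mid-backoff.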
Description
Addresses #2085
Changelog
- added a new function that checks the metadata for the last update
  - the function also checks for 503 errors
- added signals for total reporting hospitals
Associated Issue(s)