Add requests checking and retry to solar flare fetch #92
Conversation
This comes up because I just went to put together an FSDS for the new replan central page and realized the image at the bottom was broken.
Couple of ideas for discussion:
- Since the image only updates once a day, it seems like the cached image can be retained if the creation date of the file is less than 24 hours old. If the site goes down for a few hours there is no need to invalidate the cache completely.
- For the (apparently common) case that the site is not responding, we don't need the logs cluttered with full tracebacks. It is probably sufficient to print some short benign message to the logs. People will notice if the image disappears for long stretches (and anyway there is nothing we can really do to fix the site being down).
- About making this job run asynchronously, that is basically happening now with the entire
Though since the displayed image file is a separate copy not in the cache, invalidating the cache / deleting the cached image is benign and the right thing to do if we've moved on to a new day and don't have the file.
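The age-based retention idea discussed above could look roughly like the following. This is a minimal sketch, not the script's actual code: the function names, the 24-hour threshold constant, and the use of the file modification time (rather than a true creation date) are all assumptions made for illustration.

```python
import time
from pathlib import Path

ONE_DAY_SEC = 24 * 3600  # assumed threshold, taken from the "24 hours" in the discussion


def cache_is_fresh(path, max_age_sec=ONE_DAY_SEC, now=None):
    """Return True if ``path`` exists and was modified less than ``max_age_sec`` ago."""
    path = Path(path)
    if not path.exists():
        return False
    now = time.time() if now is None else now
    # Modification time is used as a stand-in for the file's creation date.
    return (now - path.stat().st_mtime) < max_age_sec


def invalidate_if_stale(path, max_age_sec=ONE_DAY_SEC, now=None):
    """Delete a cached file that is too old; return True if a file was removed."""
    path = Path(path)
    if path.exists() and not cache_is_fresh(path, max_age_sec, now):
        path.unlink()
        return True
    return False
```

With this shape, a failed fetch would only remove the cached image once it is more than a day old, which matches the point that deleting it is benign after moving on to a new day.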
Sure, so should we just include that handling in main to not spit out the traceback? And should this do the retry or not? I figured the retry would likely work for remote server hiccups.
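One way to combine the retry with a short log line instead of a traceback is sketched below. It is only an illustration under stated assumptions: `fetch_with_retry`, its parameters, and the message wording are hypothetical, and the `fetch` callable stands in for whatever the script uses to get the page or image.

```python
import time


def fetch_with_retry(fetch, n_tries=3, delay_sec=5.0, log=print, sleep=time.sleep):
    """Call ``fetch()`` up to ``n_tries`` times; return its result, or None if all tries raise."""
    for attempt in range(1, n_tries + 1):
        try:
            return fetch()
        except Exception as err:
            # Broad catch for the sketch; real code would likely narrow this
            # to the network exceptions of the HTTP library in use.
            log(f"fetch attempt {attempt}/{n_tries} did not succeed: {type(err).__name__}")
            if attempt < n_tries:
                sleep(delay_sec)
    return None
```

The caller in `main` can then treat a `None` result as "site not available right now" and print one benign line rather than letting the exception propagate.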
Good point.
Also regarding "For the (apparently common) case that the site is not responding" - I have no information that it is common - just that the code in master did not invalidate the cached image (and instead created a broken one) on a failure and would have blithely served the failure text instead of the image for the rest of the day.
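The failure mode described here, where the error text ends up stored as the cached "image", can be avoided by checking the content before touching the cache. The following is a rough sketch, not the fix in this PR: `looks_like_image` and `update_cache` are hypothetical names, and the set of accepted image formats is an assumption.

```python
import os
import tempfile
from pathlib import Path


def looks_like_image(data):
    """Crude check for PNG, GIF, or JPEG magic bytes (assumed formats)."""
    return data.startswith((b"\x89PNG\r\n\x1a\n", b"GIF87a", b"GIF89a", b"\xff\xd8\xff"))


def update_cache(cache_path, data):
    """Replace ``cache_path`` with ``data`` only if it looks like an image.

    Returns True if the cache was updated, False if it was left untouched.
    """
    if not data or not looks_like_image(data):
        return False
    cache_path = Path(cache_path)
    # Write to a temporary file in the same directory, then rename it into
    # place, so a partially written file never replaces the cached image.
    fd, tmp_name = tempfile.mkstemp(dir=cache_path.parent)
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
        os.replace(tmp_name, cache_path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return True
```

An HTML error page fails the magic-byte check, so the previous cached image stays in place until a valid image is fetched or the cache is deliberately invalidated.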
Looks OK but you need to update the functional testing. My other question is whether the "Failed" and "Error" are going to trigger jobwatch emails / spam. I don't remember where things stand with the hourly job watch, but I don't think we need to be getting hourly messages if this particular image is not available.
I at least did the mangle_alert_words in the retry (to reduce the number of lines with printed "real" warnings) and redid the functional testing. With regard to jobwatch emails/spam, if this fails all attempts, the last failure to get the page or image would be caught by the arc3 watch_cron_logs (to send an email once a day, I think), and the daily log checking in jobwatch for replan central is also going to be red (if we don't add an exception). We shouldn't get hourly messages from jobwatch because the hourly watch just checks a subset of replan central outputs and this new image is not in the list. This might be the right level of spam for now.
LGTM.
Description
Add requests checking and retry to solar flare fetch
Fixes the bug in #90 where a failed fetch creates a cached "image" file that actually contains the HTTP error response, like
Functional testing
I locally edited the script to change the value of img_url to a non-existent remote file, and confirmed that