Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add requests checking and retry to solar flare fetch #92

Merged
merged 3 commits into from
Mar 4, 2025

Conversation

jeanconn
Copy link
Contributor

@jeanconn jeanconn commented Mar 3, 2025

Description

Add requests checking and retry to solar flare fetch

Fixes issue/bug in #90 that a failed fetch will create a cached "image" file that is the http error response like

jeanconn-fido> more AR_CH_20250302.png 
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<title>404 Not Found</title>
<style type="text/css"><!--
.style3 {
  font-family: Verdana, Arial, Helvetica, sans-serif;
  color: #CC0000;
}
.s {
  font-family: Verdana, Arial, Helvetica, sans-serif;
  font-size: 11px;
  font-weight: normal;
  color: #000000;
  line-height: 18px;
  text-align: center;
  border: 1px solid #CCCCCC;
  background-color: #FFFFEC;
}
body {
  background-color: #FFFFFF;
  margin-top: 100px;
}
--></style>
</head>
<body>
<div align="center">
<h2><span class="style3">404 Not Found</span></h2>
<table border="0" cellpadding="8" cellspacing="0" width="460">
<tbody><tr><td class="s">The requested URL was not found on this server.</td></tr></tbody>
</table>
</div>
</body>
</html>

Functional testing

I locally edited the script to change the value of img_url to a non-existent remote file, and confirmed that

  1. the previously cached image was deleted/invalidated
  2. the retry tried to get the file as requested
  3. no new file was output into the cache directory
(ska3) flame:arc jean$ python get_solar_flare_png.py --image-cache-dir myretest --out-file new.png
WARN1NG: get_last_referenced_web_image(url=https://www.solen.info/solar/index.html, img_src_pattern=<img src=\"(images/AR_CH_\d{8}\.png)\", cache_dir=myretest) excepti0n: 404 Client Err0r: Not Found for url: https://www.solen.info/solar/images/AR_CH_20250303.png_broken, retrying in 5 seconds...
WARN1NG: get_last_referenced_web_image(url=https://www.solen.info/solar/index.html, img_src_pattern=<img src=\"(images/AR_CH_\d{8}\.png)\", cache_dir=myretest) excepti0n: 404 Client Err0r: Not Found for url: https://www.solen.info/solar/images/AR_CH_20250303.png_broken, retrying in 5 seconds...
Failed to get page or image: 404 Client Error: Not Found for url: https://www.solen.info/solar/images/AR_CH_20250303.png_broken

@jeanconn jeanconn requested a review from taldcroft March 3, 2025 15:25
@jeanconn
Copy link
Contributor Author

jeanconn commented Mar 3, 2025

This comes up because I just went to put together an FSDS for the new replan central page and realized the image at bottom was just broken.

@taldcroft
Copy link
Member

Couple of ideas for discussion:

Since the image only updates once a day, it seems like the cached image can be retained if the creation date of the file is less than 24 hours old. If the site goes down for a few hours there is no need to invalidate the cache completely.

For the (apparently common) case that the site is not responding, we don't need the logs cluttered with full tracebacks. It is probably sufficient to print some short benign message to the logs. People will notice if the image disappears for long stretches (and anyway there is nothing we can really do to fix the site being down).

About making this job run asynchronously, that is basically happening now with the entire main() in a try/except block.

@jeanconn
Copy link
Contributor Author

jeanconn commented Mar 3, 2025

If the site goes down for a few hours there is no need to invalidate the cache completely.

Though since the displayed image file is a separate copy not in the cache, invalidating the cache / deleting the cached image is benign and the right thing to do if we've moved on to a new day and don't have the file.

@jeanconn
Copy link
Contributor Author

jeanconn commented Mar 3, 2025

For the (apparently common) case that the site is not responding, we don't need the logs cluttered with full tracebacks.

Sure, so should we just include that handling in main to not spit out the traceback? And should this do the retry or not? I figured the retry would likely work for remote server hiccups.

@jeanconn
Copy link
Contributor Author

jeanconn commented Mar 3, 2025

About making this job run asynchronously, that is basically happening now with the entire main() in a try/except block.

Good point.

@jeanconn
Copy link
Contributor Author

jeanconn commented Mar 3, 2025

Also regarding "For the (apparently common) case that the site is not responding" - I have no information that it is common - just that the code in master did not invalidate the cached image (and instead created a broken one) on a failure and would have blithely served the failure text instead of the image for the rest of the day.

@taldcroft
Copy link
Member

Looks OK but you need to update the functional testing.

My other question is whether the "Failed" and "Error" is going to trigger jobwatch emails / spam. I don't remember where things stand with the hourly job watch, but I don't think we need to be getting hourly messages if this particular image is not available.

@jeanconn
Copy link
Contributor Author

jeanconn commented Mar 4, 2025

I at least did the mangle_alert_words in the retry (to remove the number of lines with printed "real" warnings) and redid the functional testing.

With regard to jobwatch emails/spam, if this fails all attempts the last failure to get the page or image is and would be caught by the arc3 watch_cron_logs (to send an email once a day I think) and the daily log checking in jobwatch for replan central is also going to be red (if we don't add an exception). We shouldn't get hourly messages from jobwatch because the hourly watch just checks a subset of replan central outputs and this new image is not in the list.

This might be the right level of spam for now.

Copy link
Member

@taldcroft taldcroft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@jeanconn jeanconn merged commit 0eed3e6 into master Mar 4, 2025
2 checks passed
@jeanconn jeanconn deleted the solar-flare-patch branch March 4, 2025 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants