502 Bad Gateway: Registered endpoint failed to handle the request #122

Open
mm225022 opened this issue Apr 8, 2021 · 16 comments

Comments

@mm225022

mm225022 commented Apr 8, 2021

I have been using the API for some time to access the publicly available data.

About a month or so ago, I noticed that I would sometimes get a 502 Bad Gateway response.

However, I was unable to pull data yesterday (4/7/21); I kept getting a 502 Bad Gateway: Registered endpoint failed to handle the request.

The specific URL that I am trying to hit is:
https://api.gsa.gov/analytics/dap/v1.1/domain/healthcare.gov/reports/domain/data?api_key=MY_API_KEY&after=2019-01-01&limit=10000

I have also tried it with api_key=DEMO_KEY and without the date and limit parameters.
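
For anyone trying to reproduce this, here is a minimal sketch of how the request URL above could be assembled in Python. The path pattern is inferred only from the single URL in this post, and `BASE_URL` and `build_report_url` are hypothetical names for illustration, not part of any official client.

```python
from urllib.parse import urlencode

# Assumption: base path taken from the one URL quoted in this issue.
BASE_URL = "https://api.gsa.gov/analytics/dap/v1.1"


def build_report_url(domain, report, api_key, **params):
    """Build a domain-filtered report URL (hypothetical helper).

    Optional query parameters such as after= and limit= are passed as
    keyword arguments; parameters set to None are left out.
    """
    query = {"api_key": api_key}
    query.update({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/domain/{domain}/reports/{report}/data?{urlencode(query)}"


url = build_report_url(
    "healthcare.gov", "domain", "MY_API_KEY", after="2019-01-01", limit=10000
)
```

Dropping the keyword arguments gives the variant without the date and limit parameters.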

I also tried a different government website and I got the 502 error as well.

Today (4/8/21), I tried again. I hit the domain Healthcare.gov this morning and it retrieved data without a hitch.

However, I then tried cuidadodesalud.gov and it came back with the 502 gateway error. I tried it a second time and it bounced with the 502 error again.

To test, I went back and tried to hit healthcare.gov and this time it also returned the 502 gateway error.

@tdlowden
Contributor

tdlowden commented Apr 8, 2021

@ryanwoldatwork tagging you for our discussion later today to possibly troubleshoot this.

@tdlowden
Contributor

tdlowden commented Apr 8, 2021

So far, we haven't been able to pinpoint why this is happening. The 502 definitely seems to be a result of a timeout. But we don't understand why some queries are quick to resolve and others not. Trying to round up help to examine more deeply.

@mm225022
Author

mm225022 commented Apr 9, 2021

Thanks Tim. I tried again this morning (just now) and am still getting the 502 error. When you are testing on your end, is the resolution time independent of the domain? I would assume so, since the query runs against your backend data and has no connection to the underlying domain, but I haven't tested against a broad range of domains.
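
One way to check whether resolution time depends on the domain would be to time the same request against several domains and compare. The sketch below is only an illustration: `time_domains` is a hypothetical helper, and `fetch` stands in for whatever function actually performs the request and returns the HTTP status code.

```python
import time


def time_domains(domains, fetch, clock=time.perf_counter):
    """Return {domain: (status, elapsed_seconds)} for each domain.

    `fetch` is assumed to take a domain name and return an HTTP status
    code; `clock` is injectable so the timing logic can be tested offline.
    """
    results = {}
    for domain in domains:
        start = clock()
        status = fetch(domain)
        results[domain] = (status, clock() - start)
    return results
```

Running this a few times at different hours would also help separate a domain effect from a time-of-day load effect.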

@mm225022
Author

mm225022 commented Apr 9, 2021

I access the API via R. I closed my RStudio session and restarted and so far I have been able to pull the data without a hitch. I don't know if there was something cached or it has to do with the fact that it's now Friday afternoon and traffic on the API is probably extremely low.

@ryanwoldatwork
Contributor

ryanwoldatwork commented Apr 10, 2021

Hi @jonathanmburns, thanks for the update.

Please report if this happens again.
An update to the database has been made to improve performance.
The queries are reliably resolving. I expect this to continue, but let us know next week if you experience otherwise.

@mm225022
Author

Hi @tdlowden and @ryanwoldatwork

I wanted to let you know that the issue is happening again. I had no problem last week, but I am trying to access the API today and no luck - 502 Bad Gateway issues.

@mm225022
Author

mm225022 commented May 3, 2021

Hello @tdlowden and @ryanwoldatwork

I wanted to check in with you. I was wondering if you had a chance to look into this. I am trying again to access the API and getting the 502 error.

Thanks

@tdlowden
Contributor

tdlowden commented May 3, 2021

Hi Jonathan. We're still trying to assess this with the resources we have. Apologies for the delay, but it's definitely on the radar. Trying to line up time for someone who can examine it in depth.

@mm225022
Author

mm225022 commented Jun 8, 2021

Hi @tdlowden and @ryanwoldatwork
I hope you don't mind; I wanted to check back in on this. I tried accessing the API again today and am still getting the 502 error.

Thanks

@mm225022
Author

Hello @tdlowden and @ryanwoldatwork and @nick-jones-gov:

I've noticed @nick-jones-gov has had a lot of activity related to this on posts #161 and #172. I wanted to give you some feedback on my end.
Yesterday, I ran the process. I set it up in a loop; the first time I tried, it errored out with the 502 about 25 times before it finally returned data. Today, I ran it 4 times and it returned data all 4 times.
So, @nick-jones-gov, if you did implement the changes that you proposed in #172, thank you! It looks like, at least from my end, they did the trick. The queries still run slowly, but at least now I am consistently getting data.
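
In case it is useful to others hitting the same error, the retry loop I describe could look roughly like the Python sketch below (my own code is in R). This is an illustration under assumptions, not a recommended client: `fetch_with_retry` is a hypothetical name, the attempt count and delays are arbitrary, and retrying on 504 as well as 502 is my assumption that both are transient gateway errors.

```python
import time
import urllib.error
import urllib.request

# Assumption: gateway errors are transient and worth retrying.
RETRYABLE = {502, 504}


def fetch_with_retry(url, opener=urllib.request.urlopen, max_attempts=5,
                     base_delay=2.0, sleep=time.sleep):
    """Fetch `url`, retrying gateway errors with exponential backoff.

    Waits base_delay, then 2x, 4x, ... between attempts. Any other HTTP
    error, or a gateway error on the final attempt, is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url, timeout=60) as resp:
                return resp.read()
        except urllib.error.HTTPError as err:
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

Backing off between attempts, rather than retrying immediately, avoids piling more load onto a database that may already be struggling.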

@nick-jones-gov
Contributor

Hi @jonathanmburns! Unfortunately, I can't take credit for that - we haven't changed anything in the production environment yet, so I suspect it was just random chance or a case of the database having some of the data you were requesting recently cached. We are hoping to try something out in the next few days (or early next week latest) that should hopefully both reduce the frequency at which you see 502s, and speed up all requests in general. Please let us know if you keep having issues, and if my proposed fix doesn't do it, we'll continue investigating on our end. Thank you!

@tdlowden
Contributor

Hi @jonathanmburns. We have some good news, finally! We think we have a solution to improve performance on the API and therefore eliminate the errors, thanks to what @nick-jones-gov has done in #172. The only hiccup is that when running it locally, we found certain reports no longer list the data in perfect metric descending order (because of the way datapoints are assigned ids). The endpoint is still sorted by date (recent to oldest), but within the date, there may be variation in the sort order.

From our perspective, it is more important to have a functioning API than one that sorts in the same fashion as it initially did, so we intend to make this merge a week from today (I am going to notify others via email). We just wanted to give you a heads up so that you wouldn't be caught off guard next week.

I am also adding to our backlog to create a more long-term fix that will restore sort by date, then value, but I don't have a timetable for when we'll be able to get that done, unfortunately.

@mm225022
Author

Thanks @tdlowden. That is great news!
I certainly agree with your prioritization. I'm not so concerned about the potential issue with the sort order; it's good to have that heads up, but I can obviously sort the data on my own once I've retrieved it.
I appreciate the work that you and @ryanwoldatwork and @nick-jones-gov have done on this issue!
Thanks again
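
For reference, restoring the "date, then value" order on the client side could be as simple as the Python sketch below. The field names `date` and `visits` are assumptions about the shape of the returned rows and should be adjusted to the report being used; `sort_by_date_then_value` is a hypothetical helper.

```python
def sort_by_date_then_value(rows, metric="visits"):
    """Sort rows newest date first, then by the metric, largest first.

    Assumes each row is a dict with an ISO-formatted "date" string
    (which sorts correctly as text) and a numeric or numeric-string metric.
    """
    return sorted(rows, key=lambda r: (r["date"], int(r[metric])), reverse=True)
```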

@nick-jones-gov
Contributor

Hi @jonathanmburns - we merged a change this morning that should, in general, speed up most of the API endpoints. The endpoints that filter by domain (like the one you're using for healthcare.gov) won't see as drastic a performance increase, but should still be faster than before. There are still longer-term things we hope to do to improve performance in general, but I'm not sure about the timeline there.

In the meantime, you will likely see faster response times if you use a smaller limit - I see that you specified limit=10000, I'm guessing so that you can fetch all the data you need at once. Alternatively, you could specify limit=100, and use the page parameter (explained in more detail in the docs here) to page through the results. In testing on my laptop, I see this endpoint respond in ~5 seconds: https://api.gsa.gov/analytics/dap/v1.1/domain/healthcare.gov/reports/domain/data?api_key=DEMO_KEY1&after=2019-01-01&limit=100

Hope this helps!
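
As a rough illustration of the paging approach, here is a minimal Python sketch. It assumes the page parameter starts at 1 and that a page shorter than the limit means there are no more results; both are assumptions to check against the docs. `fetch_all_pages` and the `fetch_page` callable are hypothetical names, with `fetch_page` standing in for whatever function issues the actual request and returns a list of rows.

```python
def fetch_all_pages(fetch_page, page_size=100, max_pages=1000):
    """Collect rows by requesting successive pages of `page_size` rows.

    `fetch_page(page=..., limit=...)` is assumed to return a list of rows.
    Stops at the first page that comes back short (or empty), or after
    `max_pages` requests as a safety cap.
    """
    rows = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page=page, limit=page_size)
        rows.extend(batch)
        if len(batch) < page_size:
            break
    return rows
```

Each request stays small, so a single slow or failed page can be retried without re-fetching everything.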

@mm225022
Author

mm225022 commented Aug 2, 2021

Hi @nick-jones-gov . Thanks for all your work on this.
Unfortunately, though, this is not working for me. The 502 error is gone, but now I am getting alternately a 504 error or a 400 error ("An error occurred. Please check the application logs."). I tried querying numerous times, both through the code I am using in R and directly in Firefox. I even rebooted my machine; since rebooting, I no longer get the 504 error, just the 400 errors.
I have tried setting the limit to be limit=100, but that hasn't helped.
Interestingly, I went to the DAP API page and tried a couple of the sample reports (download and domain), and those loaded lightning fast. From that page I also tried filtering by agency (HHS), and that also loaded extremely quickly.
After trying again with my API key, I went back to the DAP API page and tried the domain sample report again. This time, I got the 400 error again.
Any ideas you have would be appreciated but, unfortunately, I appear to be back at square one.

@nick-jones-gov
Contributor

Thanks @jonathanmburns - my guess is the behavior you're seeing isn't necessarily due to the API key being different, but more random chance when you were testing (i.e. the database happened to be overloaded at the time you used your API key vs the public one on the DAP API page). But obviously this is still an issue - I'll try to look into it more later today or tomorrow, sorry that our change didn't fix things for you!
