Investigate slow endpoints / performance #161
(Adding some notes from investigation I did over the past few days.) I looked into the 502s specifically, pulling logs from cloud.gov. There doesn't seem to be any obvious pattern to which endpoints return 502s, which suggests a widespread problem rather than one specific to a report type or an agency. While we don't have New Relic set up to track DB queries (because of an issue with NR and Knex, our query library; see #158), the app code for these endpoints is not super complicated, so it seems very likely that any performance issues are coming from the database. We can see what queries the app generates by running the app locally with
If we format it nicely, the query looks like:
and
(note that this endpoint and query return nothing, because my local DB has no data in it)
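The exact Knex command used above is not shown. As a loose analogy only (this is not the Knex mechanism), Python's standard `sqlite3` module can record every statement a connection executes through `Connection.set_trace_callback`, which is a cheap way to see the SQL that app code actually sends when an APM tool can't trace the query layer. The helper `capture_sql` and the table `analytics_data` are hypothetical.

```python
import sqlite3


def capture_sql(conn):
    """Hypothetical helper: record every SQL statement `conn` executes."""
    statements = []
    conn.set_trace_callback(statements.append)
    return statements


conn = sqlite3.connect(":memory:")
log = capture_sql(conn)
# Hypothetical stand-in for the table the API reads from.
conn.execute("CREATE TABLE analytics_data (id INTEGER PRIMARY KEY, date TEXT)")
conn.execute("SELECT * FROM analytics_data ORDER BY date DESC LIMIT ?", (2,)).fetchall()
for statement in log:
    print(statement)
```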
The DB's schema looks like this (the result of running the 3 migrations in the
Notably, there is a multi-column index on (
Two things stick out when looking back at the DB query from above (copied here):
My initial thought was that the
The main thing to note is that the outermost blocks deal with sorting, not with satisfying the
It turns out that the index on
The query plans/analyses for that query, and the same one (but specifying
Postgres is sorting the entire table here! That is, unsurprisingly, extremely slow and resource-intensive, so the query takes a very long time (491 seconds, or a little over 8 minutes) just to fetch the two most recent rows (as defined by the
With
This clearly helps a lot! But the query used by the API is more complicated, and just adding the
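To illustrate why a missing index turns "fetch the two most recent rows" into a full sort, here is a small sketch that uses SQLite (bundled with Python) as a stand-in for Postgres. Its planner and `EXPLAIN QUERY PLAN` output differ from Postgres's `EXPLAIN ANALYZE`, and the table and `date` column are assumptions, but the effect is the same kind: without an index matching the `ORDER BY`, every row is read and sorted before `LIMIT` applies.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical schema standing in for the real table.
conn.execute("CREATE TABLE analytics_data (id INTEGER PRIMARY KEY, date TEXT, data TEXT)")
query = "SELECT * FROM analytics_data ORDER BY date DESC LIMIT 2"

# No index on date: every row is read and fed through a sort step.
before = plan(conn, query)

conn.execute("CREATE INDEX idx_analytics_date ON analytics_data (date)")
# With the index, SQLite can walk it backwards and stop after two rows.
after = plan(conn, query)

print(before)
print(after)
```

In Postgres, the equivalent signal is a `Sort` node above a `Seq Scan` in the plan, versus an `Index Scan Backward` feeding the `Limit` once a suitable index exists.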
If the only thing we replace in the query is using
If we simplify the query a little bit by removing the
To speed this up, I think we need the
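This sketch uses SQLite as a stand-in for Postgres to show why an index on the date alone is not enough once the `ORDER BY` has secondary keys: rows that tie on the date still have to be sorted. Here `total_events` and `visits` are modeled as plain columns for simplicity; that flattening is an assumption, since in the real table they appear to live in a JSON column. An index listing every sort key, in `ORDER BY` order, removes the sort step.

```python
import sqlite3


def plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical, flattened schema: total_events and visits as plain columns.
conn.execute(
    "CREATE TABLE analytics_data ("
    "id INTEGER PRIMARY KEY, date TEXT, total_events INTEGER, visits INTEGER)"
)
query = (
    "SELECT * FROM analytics_data "
    "ORDER BY date DESC, total_events DESC, visits DESC LIMIT 2"
)

conn.execute("CREATE INDEX idx_date ON analytics_data (date)")
date_only = plan(conn, query)  # a TEMP B-TREE sort step still appears

conn.execute(
    "CREATE INDEX idx_date_events_visits ON analytics_data (date, total_events, visits)"
)
all_keys = plan(conn, query)  # index order matches ORDER BY: no sort step

print(date_only)
print(all_keys)
```

This only works when the index column order and sort directions line up with the `ORDER BY` (all ascending, or all descending so the index can be scanned backwards); a mixed ordering such as `date DESC, total_events ASC` would need an index declared with matching per-column directions.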
One (relatively simple) option would be to change the second part of the
It's important that we have some stable ordering, because users can paginate through results by specifying the
Assuming that the ordering by total_events and visits is arbitrary (I don't see anything in the API docs that suggests the ordering matters), we could instead order by (
To speed up that ordering, we'd add a new multi-column index on
@tdlowden, do you have thoughts on the impact of making the order of returned results arbitrary (beyond the date)? I can explore adding a different index that would keep the ordering the same as it works today, but I think that may get a bit messier because the
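The reason a stable ordering matters can be shown with a pure-Python simulation. SQL only guarantees the relative order of rows that differ on the `ORDER BY` keys (the Postgres docs warn that `LIMIT`/`OFFSET` pages can come back in different orders unless the ordering is fully determined), so the sketch models a database that breaks ties differently on every query. It assumes pages are computed as offsets into the sorted result and uses a unique `id` as the tiebreaker; both are assumptions, since the exact pagination parameter and proposed sort columns are not shown above. The rows are made-up sample data.

```python
from __future__ import annotations

import random

# Made-up sample rows: three share each date, so ordering by date alone has ties.
ROWS = [
    {"id": 1, "date": "2024-01-02"},
    {"id": 2, "date": "2024-01-02"},
    {"id": 3, "date": "2024-01-02"},
    {"id": 4, "date": "2024-01-01"},
    {"id": 5, "date": "2024-01-01"},
    {"id": 6, "date": "2024-01-01"},
]


def by_date(row: dict) -> tuple:
    return (row["date"],)


def by_date_then_id(row: dict) -> tuple:
    return (row["date"], row["id"])


def fetch_page(rows: list[dict], key, page: int, per_page: int, rng: random.Random) -> list[dict]:
    """Simulate one query: sort descending by key, breaking ties arbitrarily."""
    shuffled = rows[:]
    rng.shuffle(shuffled)  # arbitrary physical order for this particular query
    ordered = sorted(shuffled, key=key, reverse=True)  # stable sort keeps the shuffled tie order
    start = (page - 1) * per_page
    return ordered[start:start + per_page]


def page_through(rows: list[dict], key, per_page: int, seed: int) -> list[int]:
    """Request every page in turn (one query per page) and collect the ids returned."""
    rng = random.Random(seed)
    page_count = -(-len(rows) // per_page)  # ceiling division
    ids: list[int] = []
    for page in range(1, page_count + 1):
        ids.extend(row["id"] for row in fetch_page(rows, key, page, per_page, rng))
    return ids


unstable = [page_through(ROWS, by_date, 2, seed) for seed in range(100)]
stable = [page_through(ROWS, by_date_then_id, 2, seed) for seed in range(100)]
print(sum(len(set(ids)) < len(ROWS) for ids in unstable), "of 100 runs repeated or skipped a row")
print(stable[0])
```

With the unique tiebreaker every row has exactly one position in the sort, so consecutive pages partition the result regardless of how the database breaks ties internally.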
Fixed this for all domain queries except the downloads query by adding an index on the jsonb column. Added a new ticket for the domain download query, since it's a little different and needs a different implementation.
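The comment doesn't say which kind of index was added on the jsonb column. One possibility, sketched here purely as an assumption, is an index on an expression that extracts the sort key from the JSON document; Postgres supports indexes on expressions such as `data->>'total_events'` (note that `->>` returns text, so numeric ordering would need a cast in both the index and the query). The sketch uses SQLite as a stand-in, with its `json_extract` function (available in SQLite builds with JSON support, the default since 3.38) and hypothetical table, column, and key names. The indexed expression must match the one in the `ORDER BY` for the planner to use it.

```python
import json
import sqlite3


def plan(conn, sql):
    """Return the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


conn = sqlite3.connect(":memory:")
# Hypothetical schema: report metrics stored as a JSON document in `data`.
conn.execute("CREATE TABLE analytics_data (id INTEGER PRIMARY KEY, date TEXT, data TEXT)")
conn.executemany(
    "INSERT INTO analytics_data (date, data) VALUES (?, ?)",
    [("2024-01-01", json.dumps({"total_events": n})) for n in (5, 9, 7)],
)

query = (
    "SELECT id, json_extract(data, '$.total_events') FROM analytics_data "
    "ORDER BY date DESC, json_extract(data, '$.total_events') DESC LIMIT 2"
)
before = plan(conn, query)  # no usable index: a sort step is required

# Index the date plus the exact JSON expression used in ORDER BY.
conn.execute(
    "CREATE INDEX idx_date_total_events "
    "ON analytics_data (date, json_extract(data, '$.total_events'))"
)
after = plan(conn, query)  # ORDER BY can now be satisfied by walking the index
rows = conn.execute(query).fetchall()

print(before, after, rows, sep="\n")
```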
As part of looking into #122 and looking at metrics in New Relic, it appears that many standard queries (i.e., those run as part of handling the three core endpoints) are much slower than they should be. Seemingly appropriate indexes are in place, but requests are still far slower than expected.
The 502s reported in the issue linked above are likely just the worst-case instances of queries that are slow in general, so if we can figure out why those queries are slow, we'll likely also address the 502s.