Change order by clause, add multi-column index to speed up queries #172
Current state

There is a bunch of discussion around this here: #161

The tl;dr is that all endpoints are extremely slow right now. Based on the investigation in that issue, I feel pretty confident that the cause is extremely slow DB queries. According to New Relic, in the last week:

- `get /v1.1/domain/:domain/reports/:reportName/data` averages 112 seconds response time
- `get /v1.1/agencies/:reportAgency/reports/:reportName/data` averages 7 seconds response time
- `get /v1.1/reports/:reportName/data` averages 4 seconds response time

The diagnosis of the source of the slowness is mostly documented in the issue linked above. The ordering used in all queries now is:

The tl;dr is that this ordering isn't able to use any existing indexes, and there's no way (from what I could find) to add a multi-column index that includes specific JSON fields (i.e. the `data->>'total_events'` and `data->>'visits'` fields).

Proposed changes
- Change the `order by` clause to use `id` as a secondary sort key (after `date`)
- Add a multi-column index matching the `ORDER BY date desc NULLS LAST, id` clause

Note that changing the `order by` clause can alter the ordering of data from what happens today. In practice, it seems like the data order won't change, because data happens to be inserted into the database in decreasing order by `visits` or `total_events`.

I am open to other ideas for addressing these performance issues, but this seemed to be the most promising. In the longer term, I would suggest we consider moving data out of the generic `data` JSON column into specific columns for each field (i.e. `total_events` and `visits` should each have their own columns).

Note that this change is reversible; if for any reason we want to go back to using the previous `ORDER BY` clause, we can do that with no problem.
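
For reference, here is a rough sketch of what the two proposed changes could look like. It assumes PostgreSQL. The table name `reports_data`, the `report_name` filter column, the index name, and the `LIMIT` value are hypothetical placeholders for illustration, not the actual schema or query from this PR.

```sql
-- Assumption: PostgreSQL. "reports_data", "report_name", the index name and
-- the LIMIT value are hypothetical placeholders, not the real schema.

-- Multi-column index whose sort order matches the new ORDER BY clause.
-- NULLS LAST is spelled out because DESC defaults to NULLS FIRST in
-- PostgreSQL, and the index order has to match the query order (or be its
-- exact reverse) for the planner to read rows from the index pre-sorted.
CREATE INDEX reports_data_date_desc_id_asc
    ON reports_data (date DESC NULLS LAST, id ASC);

-- Query using "id" as the secondary sort key after "date".
SELECT *
FROM reports_data
WHERE report_name = 'site'
ORDER BY date DESC NULLS LAST, id
LIMIT 1000;
```

Running `EXPLAIN` on the query is one way to check whether the planner actually uses the index. Reverting would amount to dropping the index and restoring the previous `ORDER BY` clause.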