Minor performance improvement of async order grid processing #40271
+21
−15
The following should add a minor performance improvement to async order grid processing. We saw this recently on a high-volume store that has a large number of orders and continues to receive new orders at a high throughput.
On this system we had a bug that caused the Redis cache to be purged more frequently than it should have been, which made the symptoms more apparent in our case. In normal usage, this will only really be a problem post-deployment or if people are very eagerly flushing the full cache storage.
With a fix like this in place we can stop the query from trying to load data from the entire `sales_order` table (and other associated grid tables) more frequently than it ought to.
How async order grid processing works
magento2/app/code/Magento/Sales/Model/ResourceModel/Provider/UpdatedAtListProvider.php
Lines 12 to 17 in 182fdaa
The idea is that we avoid populating the order grid tables synchronously during the order placement process, and offload it to a background job. We should then only update the order grid tables for entities that have been recently updated.
We can see that the functionality to get the order IDs to reprocess attaches `$select->where('main_table.updated_at > ?', $lastUpdatedAt);` to the query to keep the amount of data being queried to a minimum.
magento2/app/code/Magento/Sales/Model/ResourceModel/Provider/UpdatedAtListProvider.php
Lines 50 to 63 in 182fdaa
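As a rough illustration of the filter described above (not the Magento implementation), the following Python sketch uses an in-memory SQLite table standing in for `sales_order`. The helper name `build_pending_ids_query` and the trimmed-down schema are assumptions made for the example; only the `main_table.updated_at > ?` condition comes from the code referenced above.

```python
import sqlite3


def build_pending_ids_query(last_updated_at):
    """Return (sql, params) selecting order IDs to reprocess.

    Hypothetical helper: without a known timestamp the query has no
    filter and must consider every row; with one, only newer rows match.
    """
    sql = "SELECT entity_id FROM sales_order AS main_table"
    params = []
    if last_updated_at is not None:
        sql += " WHERE main_table.updated_at > ?"
        params.append(last_updated_at)
    return sql + " ORDER BY entity_id", params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales_order (entity_id INTEGER PRIMARY KEY, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO sales_order VALUES (?, ?)",
    [
        (1, "2025-01-01 10:00:00"),
        (2, "2025-01-01 11:00:00"),
        (3, "2025-01-01 12:00:00"),
    ],
)

# No timestamp known: every order ID is a candidate.
sql, params = build_pending_ids_query(None)
all_ids = [row[0] for row in conn.execute(sql, params)]

# Timestamp known: only orders updated after it are candidates.
sql, params = build_pending_ids_query("2025-01-01 10:30:00")
recent_ids = [row[0] for row in conn.execute(sql, params)]
```

With three rows the difference is trivial, but the unfiltered branch is the one that becomes expensive on a very large table.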
The issue in our case is that `$this->lastUpdateTimeCache->get($gridTableName);` is pulled from transient storage, which means we can be running more expensive queries on very large tables more often than we actually need to. In the event of a cache miss we have to scan the whole table.
How this can be improved
Swapping the storage mechanism from a cache storage to the database makes sense to me. This is very like how we have `version_id` on the `mview_state` table, which gets updated as the mview indexers process through their backlog.
I am not certain that we need a whole new table, so I thought about placing this data into the more permanent `flag` table.
This way even if deployments occur or caches are flushed, we don't lose the pointer to the last updated timestamp.
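To make the difference concrete, here is a small Python simulation, not Magento code. A plain dict stands in for the cache storage and an in-memory SQLite table stands in for the `flag` table; the class names, the `flag_code` value, and the two-column schema are all hypothetical.

```python
import sqlite3


class CacheBackedPointer:
    """Last-updated pointer kept in transient storage (a dict stands in for the cache)."""

    def __init__(self):
        self._cache = {}

    def get(self, grid_table):
        return self._cache.get(grid_table)  # None on a cache miss

    def save(self, grid_table, timestamp):
        self._cache[grid_table] = timestamp

    def flush(self):
        self._cache.clear()  # e.g. a deployment or a full cache flush


class FlagBackedPointer:
    """Same pointer persisted in a database table (hypothetical schema)."""

    def __init__(self, conn):
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS flag "
            "(flag_code TEXT PRIMARY KEY, flag_data TEXT)"
        )

    def get(self, grid_table):
        row = self._conn.execute(
            "SELECT flag_data FROM flag WHERE flag_code = ?", (grid_table,)
        ).fetchone()
        return row[0] if row else None

    def save(self, grid_table, timestamp):
        self._conn.execute(
            "INSERT OR REPLACE INTO flag (flag_code, flag_data) VALUES (?, ?)",
            (grid_table, timestamp),
        )


cached = CacheBackedPointer()
persisted = FlagBackedPointer(sqlite3.connect(":memory:"))
for pointer in (cached, persisted):
    pointer.save("sales_order_grid", "2025-01-01 12:00:00")

cached.flush()  # flushing the cache does not touch the database

after_flush_cached = cached.get("sales_order_grid")
after_flush_persisted = persisted.get("sales_order_grid")
```

After the flush, the cache-backed pointer is gone, so the next run would fall back to the unfiltered query, while the persisted pointer still returns the stored timestamp.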
On small stores this is barely a blip, but when you start approaching millions of orders, with more constantly streaming in, every little helps.
Manual testing scenarios (*)
For regression testing of the core functionality
`bin/magento config:set --lock-env dev/grid/async_indexing 1; php bin/magento app:config:import`
`sales_grid_order_async_insert` cron
For manual testing and deep inspection of the queries produced and the process
You can enable async indexing and the db logger for easy analysis of queries.
`bin/magento config:set --lock-env dev/grid/async_indexing 1; php bin/magento app:config:import`
`bin/magento dev:query-log:enable`
You can place orders properly through the frontend and see them synced over into the grid when the cron runs, but for a quick and dirty look at the process we can spoof things like so.
A query to directly insert a dummy order into `sales_order` (on 2.4.8)
Initial data looks like
I trigger a `sales_order` insert using the above spoof query.
I trigger the cron and inspect the query produced, and see that it does not include a timestamp for filtering, as this is the first run.
I inspect the data in the database; I can see my flag persisted and the order grid populating.
I trigger a `sales_order` insert using the above spoof query.
I trigger the cron and inspect the query produced, and see that it now includes the timestamp for filtering.
I can repeat this process of insert order, run cron, inspect query and see that the timestamp moves along as expected.
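The insert, run cron, inspect loop above can be imitated in a few lines of Python against in-memory SQLite. This is only a sketch of the expected behaviour, not the Magento cron: the `run_cron` function, the simplified tables, and the rule that the pointer advances to the newest `updated_at` processed are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sales_order (entity_id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE sales_order_grid (entity_id INTEGER PRIMARY KEY, updated_at TEXT);
    CREATE TABLE flag (flag_code TEXT PRIMARY KEY, flag_data TEXT);
    """
)


def run_cron(conn):
    """Sync new rows into the grid and return the SQL that selected them."""
    row = conn.execute(
        "SELECT flag_data FROM flag WHERE flag_code = 'sales_order_grid'"
    ).fetchone()
    sql = "SELECT entity_id, updated_at FROM sales_order AS main_table"
    params = []
    if row is not None:  # a pointer exists, so filter on it
        sql += " WHERE main_table.updated_at > ?"
        params.append(row[0])
    rows = conn.execute(sql, params).fetchall()
    conn.executemany("INSERT OR REPLACE INTO sales_order_grid VALUES (?, ?)", rows)
    if rows:
        # Assumption: the pointer advances to the newest timestamp processed.
        newest = max(updated_at for _, updated_at in rows)
        conn.execute(
            "INSERT OR REPLACE INTO flag VALUES ('sales_order_grid', ?)", (newest,)
        )
    return sql


conn.execute("INSERT INTO sales_order VALUES (1, '2025-01-01 10:00:00')")
first_query = run_cron(conn)  # no pointer yet, so no timestamp filter

conn.execute("INSERT INTO sales_order VALUES (2, '2025-01-01 11:00:00')")
second_query = run_cron(conn)  # pointer exists, so the filter is applied

grid_ids = [
    r[0]
    for r in conn.execute("SELECT entity_id FROM sales_order_grid ORDER BY entity_id")
]
pointer = conn.execute("SELECT flag_data FROM flag").fetchone()[0]
```

The first run produces an unfiltered query, the second includes the `updated_at` condition, and the stored pointer moves forward with each batch.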
Contribution checklist (*)