
perf: setQueryData O(n²) performance bottleneck and optimize replaceEqualDeep for large datasets #9392


Conversation

jiji-hoon96

Optimized setQueryData performance by eliminating O(n) cache lookups and improving replaceEqualDeep algorithm for large datasets. (Fixes #6783, #1633)

QueryCache Optimization

// Before: O(n) linear search
const query = queryCache.find(options.queryKey)

// After: O(1) hash map access  
const query = queryCache.get(queryHash)
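The difference between the two lookups can be sketched with a minimal stand-in class. `SimpleQueryCache` below is invented for illustration and is not TanStack Query's actual `QueryCache`; it only shows why a `Map` keyed by the query hash is O(1) where a linear scan is O(n).

```typescript
// Minimal illustrative cache (not TanStack Query's real implementation).
type Query = { queryHash: string; state: unknown }

class SimpleQueryCache {
  // Queries keyed by their serialized query hash
  #queries = new Map<string, Query>()

  add(query: Query): void {
    this.#queries.set(query.queryHash, query)
  }

  // O(1): direct hash-map access
  get(queryHash: string): Query | undefined {
    return this.#queries.get(queryHash)
  }

  // O(n): scans every entry — the pattern the PR set out to remove
  find(queryHash: string): Query | undefined {
    return [...this.#queries.values()].find((q) => q.queryHash === queryHash)
  }
}
```

Both methods return the same query; only the lookup cost differs, which is what matters when `setQueryData` is called once per item over a large dataset.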

replaceEqualDeep Algorithm Improvements

  • Added early reference equality checks
  • Implemented chunked processing for large arrays
  • Added WeakMap memoization for circular references
  • Optimized object key iteration
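The structural-sharing idea behind these bullets can be sketched as follows. This is a simplified, illustrative `replaceEqualDeep` (plain objects and arrays only), patterned after but not identical to TanStack Query's implementation; it shows the early reference-equality check and the reuse of unchanged subtrees.

```typescript
// Illustrative sketch: returns `a` if `a` and `b` are deeply equal,
// otherwise a copy of `b` that reuses every deeply-equal child of `a`.
function replaceEqualDeep(a: any, b: any): any {
  // Early reference-equality check: identical references need no work
  if (a === b) return a

  const array = Array.isArray(a) && Array.isArray(b)
  const plainObject =
    !array &&
    typeof a === 'object' && a !== null && !Array.isArray(a) &&
    typeof b === 'object' && b !== null && !Array.isArray(b)

  if (array || plainObject) {
    const aSize = array ? a.length : Object.keys(a).length
    const bKeys = array ? b.map((_: any, i: number) => i) : Object.keys(b)
    const copy: any = array ? [] : {}
    let equalItems = 0

    for (const key of bKeys) {
      const value = replaceEqualDeep(a[key], b[key])
      copy[key] = value
      if (value === a[key]) equalItems++
    }

    // Every child matched and no keys were added or removed: reuse `a`
    return aSize === bKeys.length && equalItems === bKeys.length ? a : copy
  }
  return b
}
```

With this shape, refetching a list where only one item changed keeps the references of all unchanged items, so memoized consumers of those items do not re-render.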

Memory Management

  • Reused hash computations
  • Reduced object allocations in hot paths
  • Added cleanup for WeakMap entries

@sukvvon
Contributor

sukvvon commented Jul 8, 2025

@jiji-hoon96

https://github.com/TanStack/query/blob/main/CONTRIBUTING.md#commit-message-format

IMO, the docs above can help you with naming the PR title!

@jiji-hoon96 jiji-hoon96 changed the title Perf: optimize query cache and structural sharing perf: setQueryData O(n²) performance bottleneck and optimize replaceEqualDeep for large datasets Jul 8, 2025
@@ -250,11 +250,28 @@ export function partialMatchKey(a: any, b: any): boolean {
* If not, it will replace any deeply equal children of `b` with those of `a`.
* This can be used for structural sharing between JSON values for example.
*/
// WeakMap for memoization to avoid processing the same objects multiple times
const replaceCache = new WeakMap<object, WeakMap<object, any>>();
Contributor


This might be a misguided point due to my limited understanding, but I have some concerns about using replaceCache as you suggested:

  1. I understand that replaceEqualDeep is a utility function that returns a copy of b, but returns a if a and b are deeply equal.

  2. From what I know, replaceEqualDeep is used when replacing existing data in a query with new data. So, I think it would be rare for existing data to change to different values multiple times.
    For example, if existing data changes from A to B as a result of queryFn, in most cases the reference to the previous A would be lost, making caching meaningless.

  3. As mentioned in point 1, replaceEqualDeep is used to replace existing data in a query with new data. Even when fetching returns values with the same structure, the references of the two values being compared would usually be different. Therefore, it seems rare for get(b) to ever hit on the WeakMap returned from replaceCache.get(a).

  4. I believe utility functions should maintain purity. If multiple queryClients are running, replaceCache operations from one queryClient might affect the behavior of another queryClient. (Of course, based on the current implementation, affecting other clients doesn't seem to cause major issues.) So, for this caching, I think it would be better to create a separate replaceCache object for each queryClient or queryObserver rather than keeping one in a shared utility function.

If my understanding is incorrect or if you have different thoughts, please let me know your feedback :)
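The crux of points 2 and 3 can be shown in a few lines: two fetches of identical JSON still produce distinct object references, and a WeakMap is keyed by reference identity, so a cache keyed on the previous value almost never sees a second hit. This is a self-contained demonstration, not code from the PR.

```typescript
// Two "network responses" with identical content but distinct references
const payload = '{"id":1,"name":"query"}'
const responseA: object = JSON.parse(payload)
const responseB: object = JSON.parse(payload)

// WeakMap lookups go by reference identity, not structural equality
const cache = new WeakMap<object, object>()
cache.set(responseA, responseA)

console.log(responseA === responseB) // false: fresh object per parse
console.log(cache.has(responseB))    // false: the memoization misses
```

This is why the memoization would only pay off in the narrower case the reviewer concedes: passing the very same client-created object reference through replaceEqualDeep multiple times.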

Author


Thanks for the thoughtful feedback! You’re absolutely right about the purity concerns and global state issues.

However, I believe there’s still value here based on real usage patterns - optimistic updates, form synchronization, and incremental data changes are extremely common and would benefit from this optimization. The performance impact in issue #1633 (500ms → 50ms) demonstrates real user pain points.

Let me revise the approach: remove the global WeakMap and implement QueryClient-scoped memoization as an opt-in feature. This addresses your architectural concerns while still providing performance benefits for users who need them.

What do you think about this compromise?
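The proposed compromise could look roughly like the sketch below. Every name here (`ClientScopedSharer`, the `memoizeStructuralSharing` flag, `share`) is invented for illustration; it is not a TanStack Query API, only a shape in which the WeakMap lives on each client instance and is created solely when opted in, so clients cannot affect one another.

```typescript
// Hypothetical client-scoped, opt-in memoization (names are invented).
class ClientScopedSharer {
  // Outer key: previous value `a`; inner key: incoming value `b`
  #memo?: WeakMap<object, WeakMap<object, unknown>>

  constructor(opts: { memoizeStructuralSharing?: boolean } = {}) {
    if (opts.memoizeStructuralSharing) {
      this.#memo = new WeakMap()
    }
  }

  // Runs `replace(a, b)` once per distinct (a, b) pair when memoizing
  share<T extends object>(a: T, b: T, replace: (a: T, b: T) => T): T {
    const hit = this.#memo?.get(a)?.get(b) as T | undefined
    if (hit !== undefined) return hit

    const result = replace(a, b)
    if (this.#memo) {
      let inner = this.#memo.get(a)
      if (!inner) {
        inner = new WeakMap()
        this.#memo.set(a, inner)
      }
      inner.set(b, result)
    }
    return result
  }
}
```

Because the WeakMap is per instance, one client's cache entries can never leak into another client's structural-sharing results, which addresses the purity concern while keeping the fast path for repeated (a, b) pairs.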

Contributor


I think the issue you mentioned has already been resolved: setQueryData already uses get instead of find.

(PR that removed find from setQueryData)

If test cases were included, we could verify that the code in this PR actually achieves the claimed performance improvements.

Also, I don't think the WeakMap-based cache would work even with optimistic updates, form synchronization, and incremental data changes.

As I mentioned in my previous comment, it's hard to imagine fetching an object with the same reference value as a discarded object that has already been updated.

(Though I suppose it might be meaningful when applying the same reference-valued object created on the client to replaceEqualDeep multiple times)

@jiji-hoon96
Author

@ss0526100 You're absolutely right - the core performance issues were already resolved in PR #6918, and my proposed WeakMap optimizations would add complexity without meaningful real-world benefits given typical object reference patterns.
Closing this PR as the optimization is unnecessary. Thank you for the thorough technical review! 🙏


Successfully merging this pull request may close these issues.

setQueryData is slow with large datasets
3 participants