@prasjaiswal
Collaborator

NOTE: This probably matters much less now, but we cannot batch-read from Cosmos DB because the URLs end up in different partitions. Putting similar sites in the same partition would create hotspots. Even so, this PR makes two significant performance improvements:

  1. Change the SQL-based Cosmos DB lookups to point lookups. Point lookups are much faster because they use the key of the <key, value> store to find a value, instead of querying on a field inside the value. We achieve this by storing and looking up each item under its hashed URL (the raw URL cannot be used as the key because of special characters).
  2. Parallelize the Cosmos DB lookup calls. This significantly reduces the cost of network latency, because the network delays of the individual requests overlap.
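
For reviewers who want a feel for the two changes together, here is a minimal sketch, not the code in this PR. Everything in it is an assumption made for illustration: the helper names `url_to_id` and `lookup_urls`, the choice of SHA-256 as the hash, the use of the hashed id as the partition key, and the in-memory `FakeContainer` that stands in for a real Cosmos DB container so the sketch runs without an account.

```python
import asyncio
import hashlib


def url_to_id(url: str) -> str:
    """Map a URL to an id that is safe to use as a document key.

    SHA-256 hex is an assumption; the PR only says the URL is hashed.
    """
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


class FakeContainer:
    """Hypothetical in-memory stand-in for a Cosmos DB container."""

    def __init__(self):
        self._items = {}

    async def upsert_item(self, body):
        # Store the document under its id (the hashed URL).
        self._items[body["id"]] = body

    async def read_item(self, item, partition_key):
        # Simulate one network round trip per point lookup.
        await asyncio.sleep(0.01)
        if item not in self._items:
            raise KeyError(item)
        return self._items[item]


async def lookup_urls(container, urls):
    """Point-lookup every URL concurrently; missing URLs map to None."""

    async def lookup_one(url):
        doc_id = url_to_id(url)
        try:
            # Key-based read: no SQL query over a field of the value.
            # Assumption: the hashed id doubles as the partition key.
            return await container.read_item(item=doc_id, partition_key=doc_id)
        except KeyError:
            return None

    # gather() starts all lookups before awaiting, so their delays overlap.
    results = await asyncio.gather(*(lookup_one(u) for u in urls))
    return dict(zip(urls, results))
```

With a real container, `KeyError` would be replaced by whatever not-found error the SDK raises, and a cap on concurrency (for example an `asyncio.Semaphore`) may be worth adding so that a large batch does not run into request throttling.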

@prasjaiswal prasjaiswal requested a review from rvguha December 19, 2025 08:44
@rvguha rvguha merged commit fb466d2 into main Dec 19, 2025
2 checks passed