Skip to content

feat: enhance MongoDB index checker with detailed sharding guidelines and common patterns#25

Open
mfori wants to merge 2 commits into
mainfrom
chore/sharded-collections-prompt
Open

feat: enhance MongoDB index checker with detailed sharding guidelines and common patterns#25
mfori wants to merge 2 commits into
mainfrom
chore/sharded-collections-prompt

Conversation

@mfori
Copy link
Copy Markdown
Member

@mfori mfori commented May 28, 2026

There are false positives in evaluation of sharding rules, for example here: https://github.com/apify/apify-core/pull/28206#pullrequestreview-4372775926

This should hopefully perform better.

Note: waiting for https://github.com/apify/apify-core/pull/28238

@mfori mfori requested review from mtrunkat, mvolfik and valekjo May 28, 2026 20:09
@mfori mfori self-assigned this May 28, 2026
Copy link
Copy Markdown
Member

@mvolfik mvolfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i have no idea how to review this, nothing here seems factually wrong, don't know what else is expected from a prompt review

@mfori
Copy link
Copy Markdown
Member Author

mfori commented May 29, 2026

nothing here seems factually wrong

I think that's enough and we will see and can modify it later. You're on review mainly to be aware of this :)

Copy link
Copy Markdown
Member

@valekjo valekjo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, some tiny suggestions, and I'm not even very sure about those :D

1. **Multi-shard collections** — the same collection's data is split across multiple physical shards via a shard key. Identify these by reading `src/packages/mongo-connection/src/mongo_connection.ts`: any field typed as `ShardAwareCollection<TSchema, TShardKeys>` is multi-shard. The second generic argument lists the shard-key fields.
2. **Single-shard placement** — the collection lives on a non-default physical shard (no shard key, no chunks across shards, just placed on a different machine). In `mongo_connection.ts` these fields are typed `MovedCollection<TSchema, 'shard-N'>` (the shard tag is the second generic argument); the authoritative placement map is `SHARD_PLACEMENT` in `src/packages/mongo-connection/src/shard_placement.ts`. Collections absent from that map live on the default shard (`shard-0`).

Read those two files to determine each collection's sharding kind. The prompt deliberately doesn't list specific collections — the set evolves over time, and the source files are authoritative.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This seems extra.

Suggested change
Read those two files to determine each collection's sharding kind. The prompt deliberately doesn't list specific collections — the set evolves over time, and the source files are authoritative.
Read those two files to determine each collection's sharding kind. The set evolves over time, and the source files are authoritative.


Flag only when ALL of the following hold:
- The collection is multi-shard (per `mongo_connection.ts`).
- At least one shard-key field is *entirely absent* from the filter object — not just `undefined`, *missing*.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Isn't this already covered by the types?

Flag when these run on a multi-shard collection without `readConcern: 'available'`:
- `.skip(N)` / `$skip: N`. Suggest cursor pagination on the sort key as the better alternative; `readConcern: 'available'` is a fallback. 🟠 high.
- `.countDocuments(...)` on `.rawCollection` or on the underlying `Collection`. Note: `ShardAwareCollection.approximateCountDocuments()` already wraps `readConcern: 'available'` — **do not flag it**. 🟠 high.
- `aggregate` ending in `$count` or with wide `$group` over many keys. 🟠 high.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I'm not sure that "wide $group" is specific enough. Maybe "low cardinality"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants