perf: reduce DB bandwidth ~1.5 TB/day #572
Conversation
…and query rewrites

Phase 1: Add `by_moderation` compound index and rewrite 8 cron query functions to use `.withIndex()` instead of `.filter().collect()` full table scans (~6 GB/day saved).

Phase 2: Denormalize `isSuspicious` onto the skills table with 6 compound indexes so `listPublicPageV2` can filter at the index level instead of paginating the entire table (~1 TB/day saved). Includes a backfill mutation and write-path updates across all moderation mutations.

Phase 3: Add `latestVersionSummary` denormalization to avoid reading full ~6.4 KB `skillVersions` docs on list pages (~500 GB/day saved).

Phase 4: Split the trending leaderboard query to one day at a time to stay under the 32K doc limit. Reduce the global stats recount from hourly to daily since delta tracking handles real-time accuracy (~400 MB/day saved).

Phase 5: Add a "Convex Query & Bandwidth Rules" section to AGENTS.md.

Backfill commands (run after deploy):

bunx convex run maintenance:backfillIsSuspiciousInternal
bunx convex run maintenance:backfillLatestVersionSummaryInternal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@sethconvex is attempting to deploy a commit to the Amantus Machina Team on Vercel. A member of the Team first needs to authorize it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 53f901a40e
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you:
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
convex/skills.ts
Outdated
```ts
// Graceful backfill handling: during backfill, isSuspicious is undefined for
// un-patched docs, which won't match eq('isSuspicious', false). If the indexed
// query returns an unexpectedly empty first page, retry with the old path + JS filter.
if (useNonsuspiciousIndex && filteredPage.length === 0 && !result.isDone && !initialCursor) {
```
Remove impossible guard from non-suspicious fallback
The backfill fallback for nonSuspiciousOnly is gated by !result.isDone, but when the new eq('isSuspicious', false) index has no rows (the common pre-backfill state), Convex returns page=[] with isDone=true. That means this branch never runs and listPublicPageV2 returns an empty page even though many clean skills still exist with isSuspicious unset, which can blank first-page non-suspicious views until the backfill completes.
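The logic at issue is small enough to sketch in isolation. Below is a hedged, self-contained reconstruction (the names `buggyGuard`/`fixedGuard` and the `PageResult` shape are illustrative, not the actual `listPublicPageV2` code) showing why the `!result.isDone` conjunct can never be satisfied in the empty-index, pre-backfill state:

```typescript
// Minimal model of a Convex pagination result.
interface PageResult {
  page: unknown[]
  isDone: boolean
}

// Original guard: requires !isDone, but an empty index returns
// page=[] together with isDone=true, so this never fires pre-backfill.
function buggyGuard(result: PageResult, initialCursor: string | null): boolean {
  return result.page.length === 0 && !result.isDone && initialCursor === null
}

// Corrected guard: drops the !isDone conjunct so the fallback fires
// in the common pre-backfill state.
function fixedGuard(result: PageResult, initialCursor: string | null): boolean {
  return result.page.length === 0 && initialCursor === null
}
```

Running both against the pre-backfill state (`page: [], isDone: true`) makes the dead branch obvious: `buggyGuard` stays false while `fixedGuard` fires.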
```ts
const latestVersion = hasSummary
  ? toPublicSkillListVersionFromSummary(skill.latestVersionSummary!, skill.latestVersionId)
  : toPublicSkillListVersion(latestVersionDoc)
```
Keep version summary and latestVersionId consistent
This path now prefers skill.latestVersionSummary over loading skill.latestVersionId, but other mutations can retarget latestVersionId without rewriting the summary (for example updateTags in convex/skills.ts updates latestVersionId only). After a retag of latest, list responses can therefore return _id for one version while version/changelog still come from the previous summary.
Fixed in e352123 — updateTags now writes latestVersionSummary when the latest tag is repointed to a different version.
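The invariant behind this fix can be sketched as a small helper. This is an illustration only, with assumed names (`patchForRetag`, a `versionId` key inside the summary): any mutation that repoints `latestVersionId` must write the matching summary in the same patch, so list pages never mix fields from two versions.

```typescript
// Assumed minimal shape of the denormalized summary.
interface VersionSummary {
  versionId: string
  changelog: string
}

// Build a patch that keeps latestVersionId and latestVersionSummary in
// lockstep; refuses to retarget if no summary exists for the new version.
function patchForRetag(
  newLatestId: string,
  summaries: Map<string, VersionSummary>,
): { latestVersionId: string; latestVersionSummary: VersionSummary } {
  const summary = summaries.get(newLatestId)
  if (!summary) throw new Error('missing summary for retargeted version')
  return { latestVersionId: newLatestId, latestVersionSummary: summary }
}
```

The point of funneling every retarget through one helper is that the consistency rule lives in a single place instead of being re-implemented per mutation.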
convex/skills.ts
Outdated
```ts
.withIndex('by_moderation', (q) =>
  q.eq('moderationStatus', 'active').eq('moderationReason', 'scanner.vt.pending'),
)
.take(limit),
```
Avoid pre-limiting stale-reason candidates before VT check
Each moderation-reason bucket is truncated with .take(limit) before verifying whether the referenced version has vtAnalysis, so the query can return fewer than limit even when additional stale rows exist beyond that first slice. Because syncModerationReasons treats a short page as completion, this can cause the sync job to stop early and leave stale moderation reasons unsynced.
Fixed in e352123 — each bucket now takes limit * 2 to account for rows filtered out by the VT check.
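The over-fetch pattern is generic enough to sketch on its own. In this hedged illustration, `fetchUpTo` stands in for `.withIndex(...).take(n)` and `keep` for the vtAnalysis check; all names are assumptions, not the real code:

```typescript
// Fetch 2x the target page size, apply the JS-side predicate,
// then trim the result back down to `limit`.
function filterWithOverfetch<T>(
  fetchUpTo: (n: number) => T[], // stands in for .withIndex(...).take(n)
  keep: (row: T) => boolean, // stands in for the VT-analysis check
  limit: number,
): T[] {
  return fetchUpTo(limit * 2).filter(keep).slice(0, limit)
}
```

Note that 2x is a heuristic, not a guarantee: if more than half of a fetched slice fails the predicate, the page can still come back short, so any caller that treats a short page as completion remains best-effort.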
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Greptile Summary

This PR delivers a multi-phase bandwidth reduction (~1.5 TB/day) across the Convex backend by converting full-table scans to indexed queries and denormalizing two computed fields (`isSuspicious` and `latestVersionSummary`).
Confidence Score: 4/5
Last reviewed commit: 417012d
```ts
if (!latestVersionId) return null
return {
  _id: latestVersionId,
  _creationTime: summary.createdAt,
```
_creationTime sourced from createdAt, not the real Convex field
_creationTime is Convex's internal insertion timestamp (set automatically by the DB), while createdAt is the application-set field stored in latestVersionSummary. In practice these are nearly identical since both are set to now during the same transaction, but they are semantically distinct fields — toPublicSkillListVersion correctly reads version._creationTime, whereas this path substitutes summary.createdAt.
```suggestion
  _creationTime: summary.createdAt, // approximates _creationTime (see: version.createdAt ≈ version._creationTime)
```
No behavioral impact expected, but worth a comment to make the approximation explicit and prevent future confusion if the two ever diverge.
Fixed in e352123 — added clarifying comment noting the approximation.
convex/lib/leaderboards.ts
Outdated
```ts
for (let day = startDay; day <= endDay; day++) {
  const rows = await ctx.db
    .query('skillDailyStats')
    .withIndex('by_day', (q) => q.eq('day', day))
    .collect()
  for (const row of rows) {
    const current = totals.get(row.skillId) ?? { installs: 0, downloads: 0 }
    current.installs += row.installs
    current.downloads += row.downloads
    totals.set(row.skillId, current)
  }
}
```
Sequential daily queries could be parallelized
The 7 daily queries are issued serially with await inside the for loop, adding 7 round-trips instead of 1. Since there are no cross-day dependencies, these can be parallelized with Promise.all for better latency in the cron job:
```suggestion
const dayKeys = Array.from({ length: endDay - startDay + 1 }, (_, i) => startDay + i)
const perDayRows = await Promise.all(
  dayKeys.map((day) =>
    ctx.db
      .query('skillDailyStats')
      .withIndex('by_day', (q) => q.eq('day', day))
      .collect(),
  ),
)
for (const rows of perDayRows) {
  for (const row of rows) {
    const current = totals.get(row.skillId) ?? { installs: 0, downloads: 0 }
    current.installs += row.installs
    current.downloads += row.downloads
    totals.set(row.skillId, current)
  }
}
```
This is low-priority for a cron job, but it's an easy win.
Fixed in e352123 — parallelized with Promise.all.
convex/maintenance.ts
Outdated
```ts
let patched = 0
for (const skill of page) {
  if (skill.latestVersionSummary) continue
```
Backfill skips skills with existing latestVersionSummary
The guard if (skill.latestVersionSummary) continue means any skill that already has a summary (e.g. one published after this PR is deployed but before the backfill completes) will be silently skipped and never re-evaluated by the backfill. This is the intended behavior for an initial one-shot populate, but it means if a summary was somehow written with wrong data, re-running the backfill won't fix it. Worth noting in the JSDoc so future operators know to clear latestVersionSummary first if a re-backfill is ever needed.
Fixed in e352123 — backfill now reconciles against the current latestVersionId instead of skipping. Updated JSDoc accordingly.
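The reconcile-instead-of-skip rule reduces to a small predicate. This is a hedged sketch with assumed field names (the `versionId` key inside the summary is an illustration of however the summary is actually keyed): patch whenever the summary is missing or no longer matches the current `latestVersionId`.

```typescript
// Assumed minimal shape of a skill doc for this illustration.
interface SkillDoc {
  latestVersionId: string
  latestVersionSummary?: { versionId: string }
}

// True when the backfill should (re)write the denormalized summary:
// either it was never populated, or latestVersionId has since moved.
function needsSummaryPatch(skill: SkillDoc): boolean {
  return (
    skill.latestVersionSummary === undefined ||
    skill.latestVersionSummary.versionId !== skill.latestVersionId
  )
}
```

Compared with the original `if (skill.latestVersionSummary) continue`, this version makes re-running the backfill an idempotent repair tool rather than a one-shot populate.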
- Replace ArkType `[inferred]` type alias for ClawdisSkillMetadata with an explicit interface so TS can see envVars/dependencies/author/links
- Extract listBySkillHandler from comments.ts so tests can call it directly without accessing the private _handler property
- Rebuild packages/schema dist output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: b904504176
convex/maintenance.ts
Outdated
```ts
let patched = 0
for (const skill of page) {
  if (skill.latestVersionSummary) continue
```
Reconcile existing summaries during latest-version backfill
backfillLatestVersionSummaryInternal skips any row that already has latestVersionSummary, so it cannot repair skills where latestVersionId was later repointed (for example by tag updates) and the denormalized summary is now stale. In that scenario list responses can keep returning _id for one version and summary fields from another indefinitely because this backfill will never patch those rows; the migration should validate/rewrite summary data against the current latestVersionId instead of unconditionally continuing.
Fixed in e352123 — same fix as above, backfill always re-evaluates against current latestVersionId.
The explicit ClawdisSkillMetadata interface (needed because ArkType's `[inferred]` doesn't resolve all fields) now has a keyof-based type guard that triggers a compile error if the interface keys drift from the schema.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix P1: remove `!result.isDone` guard from listPublicPageV2 backfill fallback so it fires when the nonsuspicious index is empty (isDone=true)
- Fix updateTags to update latestVersionSummary when repointing latest tag
- Parallelize leaderboard daily queries with Promise.all
- Over-fetch stale-reason candidates (2x limit) before VT filtering
- Reconcile existing latestVersionSummary in backfill instead of skipping
- Add _creationTime approximation comment
- Rebuild schema dist to include author field on ClawdisSkillMetadata

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: e3521239b6
convex/skills.ts
Outdated
```ts
// query returns an unexpectedly empty first page, retry with the old path + JS filter.
// Note: when the index has no matching rows, Convex returns isDone=true, so we
// intentionally omit the !result.isDone guard here to ensure the fallback fires.
if (useNonsuspiciousIndex && filteredPage.length === 0 && !initialCursor) {
```
Keep pagination cursor on the same index path
When nonSuspiciousOnly hits the first-page fallback (!initialCursor), this branch switches to fallbackPaginate and returns a cursor generated from SORT_INDEXES. On the next page request, runPaginate goes back to NONSUSPICIOUS_SORT_INDEXES, so that cursor is incompatible, paginateWithStaleCursorRecovery resets to page 1 (runPaginate(null)), and users can no longer paginate past the first page during backfill windows where isSuspicious is still mostly unset.
Fixed in aefaf1c — fallback now fires on any page and reuses the client cursor via paginateWithStaleCursorRecovery, keeping cursors on the same SORT_INDEXES path across paginations.
The nonsuspicious index fallback now fires on any page (not just the first) and reuses the client's cursor via stale-cursor recovery. This prevents pagination from breaking when a SORT_INDEXES cursor is sent back to the NONSUSPICIOUS_SORT_INDEXES path on page 2+.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
💡 Codex Review
Reviewed commit: aefaf1c798
```ts
// un-patched docs, which won't match eq('isSuspicious', false). Fall back to
// SORT_INDEXES + JS filter. This fires on ANY page (not just the first) so that
// cursors stay on the same index path across client paginations.
if (useNonsuspiciousIndex && filteredPage.length === 0) {
```
Fall back when non-suspicious index page is underfilled
In listPublicPageV2, the backfill fallback only runs when filteredPage.length === 0, but during isSuspicious backfill many non-suspicious rows still have isSuspicious unset and are therefore absent from the eq('isSuspicious', false) index. If a page contains even a few indexed rows, this branch is skipped, so the query can return a partial page and even isDone: true while additional eligible rows still exist outside that index slice; clients will stop paginating and silently miss valid results until backfill completes.
Acknowledged — this is a known transient edge case during the backfill window. The partial pages contain correct data (no false results), just fewer items than expected. The backfill self-schedules and completes in minutes, after which the index is fully populated and pages are full-sized. We considered always falling back when underfilled, but that would negate the index benefit for any page that happens to have fewer results than numItems (e.g. the last page). The brief underfill during backfill is an acceptable trade-off vs. adding complexity for a one-time migration.
Summary

Production DB reads are ~2 TB/day. This PR reduces bandwidth by ~75% (~1.5 TB/day) through indexed queries, denormalization, and query restructuring.

- Add `by_moderation` compound index and rewrite 8 cron query functions from `.filter().collect()` full table scans to `.withIndex()` (~6 GB/day saved)
- Denormalize `isSuspicious` onto the skills table with 6 compound indexes so `listPublicPageV2` filters at the index level instead of paginating the entire table (~1 TB/day saved)
- Add `latestVersionSummary` denormalization to avoid reading full ~6.4 KB `skillVersions` docs on list pages (~500 GB/day saved)

Post-deploy steps

Run backfills after deploying:

```
bunx convex run maintenance:backfillIsSuspiciousInternal
bunx convex run maintenance:backfillLatestVersionSummaryInternal
```

No client changes required — all new schema fields are `v.optional()` and query return shapes are unchanged.

Test plan

- `bunx convex typecheck` — no new type errors (26 pre-existing in unrelated files)
- `bun run test` — all 604 tests pass
- No `bytesReadLimit`/`documentsReadLimit` errors in Convex insights

🤖 Generated with Claude Code
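For readers skimming the summary, the core pattern behind the savings can be illustrated with a toy model. This is not the Convex API — `docsReadFullScan` and `docsReadIndexed` are hypothetical counters — but it captures why, as the PR's premise states, `.filter().collect()` reads every document in a table while `.withIndex()` reads only the matching range:

```typescript
interface Doc {
  moderationStatus: string
}

// .filter().collect() scans the whole table, then discards non-matches,
// so documents read == table size regardless of selectivity.
function docsReadFullScan(docs: Doc[], _status: string): number {
  return docs.length
}

// .withIndex('by_moderation', q => q.eq('moderationStatus', status))
// walks only the index range for the given key, so documents read
// == number of matches.
function docsReadIndexed(docs: Doc[], status: string): number {
  return docs.filter((d) => d.moderationStatus === status).length
}
```

With a table of 100 docs where 3 are `'active'`, the full scan reads 100 documents and the indexed query reads 3, which is the same asymmetry driving the ~6 GB/day saved across the 8 rewritten cron queries.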