fix(refresh): defer pr_issue_links backfill off the request path by MkDev11 · Pull Request #168 · MkDev11/gittensor-hub

MkDev11 · 2026-05-26T22:06:10Z

Summary by CodeRabbit

Refactor
- Improved responsiveness by making PR→issue link initialization non-blocking: existing link counts are returned immediately while full backfill runs in the background.
Reliability
- Ensures background backfills are scheduled only once per repository, record completion on success, and log failures so results appear once processing finishes.

coderabbitai · 2026-05-26T22:08:49Z

📝 Walkthrough

Walkthrough

backfillPrIssueLinksIfNeeded now returns immediately on request: it reads existing link count (hot path) or, if none, schedules runPrIssueLinksBackfill(repoFullName) via setImmediate guarded by a per-repo in-flight gate and returns 0 without blocking. A migration adds repo_meta.pr_issue_links_backfilled_at to record completion.

Changes

Non-blocking PR-issue links backfill

Layer / File(s)	Summary
Add repo_meta backfilled marker `src/lib/db.ts`	Adds a one-time `repo_meta.pr_issue_links_backfilled_at` column during `getDb()` migrations to record that the PR→issue backfill has completed for a repo.
Deferred backfill with per-repo gating `src/lib/refresh.ts`	`backfillPrIssueLinksIfNeeded` hot-paths the existing `pr_issue_links` count via `getReadDb()` or returns `0` if `repo_meta.pr_issue_links_backfilled_at` is set; otherwise schedules `runPrIssueLinksBackfill(repoFullName)` via `setImmediate` with an in-memory `inFlightLinksBackfill` gate and returns immediately. Deferred errors are logged and the gate is cleared in `finally`; the completion marker is written only on success.
Blocking backfill implementation `src/lib/refresh.ts`	Introduces `runPrIssueLinksBackfill(repoFullName)` which acquires the write DB, scans PRs, extracts linked issues from title/body, inserts `pr_issue_links` with `INSERT OR IGNORE` in a transaction, and returns the count of newly inserted rows.

Sequence Diagram

sequenceDiagram
  participant Handler as RequestHandler
  participant Backfill as backfillPrIssueLinksIfNeeded
  participant DB as Database
  participant Queue as setImmediateQueue
  participant Runner as runPrIssueLinksBackfill

  Handler->>Backfill: backfillPrIssueLinksIfNeeded(repoFullName)
  alt Links exist in pr_issue_links
    Backfill->>DB: SELECT COUNT(*) FROM pr_issue_links WHERE repo=repoFullName
    DB-->>Backfill: count
    Backfill-->>Handler: return count
  else No links yet and not in-flight
    Backfill->>Queue: schedule runPrIssueLinksBackfill(repoFullName)
    Backfill-->>Handler: return 0
    Queue->>Runner: invoke runPrIssueLinksBackfill(repoFullName)
    Runner->>DB: read pulls, extract issues, INSERT OR IGNORE into pr_issue_links (transaction)
    DB-->>Runner: inserted rows count
    Runner->>DB: UPDATE repo_meta.pr_issue_links_backfilled_at
    Runner-->>Runner: clear inFlightLinksBackfill in finally
  else Already in-flight
    Backfill-->>Handler: return 0
  end

Possibly Related PRs

MkDev11/gittensor-hub#138: related to LINK_REGEX and PR issue link insertion/cleanup.

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 I nudged the heavy work to an idle slot,
set gates so duplicates do not trot,
scans sneak in while requests hop free,
and the DB keeps a quiet backfilled key. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: deferring the PR issue links backfill operation off the request path to keep request handlers non-blocking.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/defer-pr-issue-links-backfill

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

src/lib/db.ts

ESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox.

src/lib/refresh.ts

ESLint skipped: the ESLint configuration for this file references a package that is not available in the sandbox.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Root cause of the Explorer page's "stuck loading skeleton" (Cloudflare 524 timeouts at the origin): `backfillPrIssueLinksIfNeeded` is called inline by 8 API routes (issues/route, pulls/route, issues-meta, related-issues, related-prs, gt/repos/[owner]/[name], etc.). On a cold path where `pr_issue_links` is empty for the repo (e.g. after PR #137's schema-purge migration), the function synchronously: 1. SELECTs every PR + title + body for the repo 2. Runs `extractLinkedIssues` regex over each PR's body 3. Writes resulting links in a single sqlite transaction For a repo with hundreds+ of PRs this can stall the Node event loop for tens of seconds. Cloudflare gives up at 100s and returns 524 to the client; the consumer in RepoExplorer.tsx treats !r.ok as fatal and leaves the Issues / Pull Requests tabs stuck on their skeleton. Fix: split the backfill into a request-path entry point that gates + defers, and a separate `runPrIssueLinksBackfill` that does the heavy work. The entry point now: - Returns immediately when the gate count > 0 (existing fast path, unchanged behaviour for steady state). - On cold start, schedules the backfill via setImmediate and returns 0 to the caller. A per-repo in-flight Set dedupes concurrent cold requests so we don't pile up serial writer-tx queues. The first request after a purge / cold start returns empty linked- issues / linked-PRs enrichment, but the route responds in milliseconds instead of timing out. Subsequent requests (after the deferred backfill commits) get full enrichment. Companion to df1f09c + 2fa2d13 + 99df6b1 on this branch which catch *throws* — those didn't help here because the underlying handler was *hanging*, not erroring. This is the actual root-cause patch.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/refresh.ts`:
- Around line 312-331: The current cold-path check only returns early when
existingCount > 0, so repos that legitimately have zero links keep being
rescanned; add a separate persistent "backfill completed" marker and check it
alongside existingCount. Modify the logic around
getReadDb()/existingCount/inFlightLinksBackfill/runPrIssueLinksBackfill so that
before scheduling a backfill you query a new table (e.g.,
pr_issue_links_backfill(repo_full_name, completed_at)) or equivalent flag to see
if backfill completed, and return early if that marker exists (even when
existingCount === 0); after runPrIssueLinksBackfill finishes successfully,
insert or upsert the completed marker for repoFullName; apply the same change to
the duplicate block referenced (lines ~338-343) and ensure markers are only
written on success and not on errors.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2f51384a-4a2f-4029-9fff-021c70bc8aab

📥 Commits

Reviewing files that changed from the base of the PR and between ed822ca and f4bfb7c.

📒 Files selected for processing (1)

src/lib/refresh.ts

coderabbitai

♻️ Duplicate comments (1)

src/lib/refresh.ts (1)

323-345: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Persist “backfill completed” separately from row count.

A successful backfill can still leave existingCount === 0 (for example, no cached PRs yet or no same-repo links found), so this repo never graduates from the cold path. Because src/app/api/repos/[owner]/[name]/issues/route.ts:75-85 calls this on every request, those repos will keep scheduling full rescans forever once the in-flight gate clears. Please store a persistent completion marker (for example in repo_meta or a dedicated table) and check that alongside existingCount before scheduling again.

Possible shape of the fix

 export function backfillPrIssueLinksIfNeeded(repoFullName: string): number {
   const existingCount = (getReadDb()
     .prepare(`SELECT COUNT(*) AS c FROM pr_issue_links WHERE repo_full_name = ?`)
     .get(repoFullName) as { c: number }).c;
-  if (existingCount > 0) return existingCount;
+  const backfillDone = hasPrIssueLinksBackfillCompleted(repoFullName);
+  if (existingCount > 0 || backfillDone) return existingCount;

   if (!inFlightLinksBackfill.has(repoFullName)) {
     inFlightLinksBackfill.add(repoFullName);
     setImmediate(() => {
       try {
         runPrIssueLinksBackfill(repoFullName);
+        markPrIssueLinksBackfillCompleted(repoFullName);
       } catch (err) {
         const msg = err instanceof Error ? err.message : String(err);
         console.error(`[refresh] backfillPrIssueLinks(${repoFullName}) failed:`, msg);
       } finally {
         inFlightLinksBackfill.delete(repoFullName);

This needs a companion schema/migration change so the marker survives restarts and multi-instance deployments.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/refresh.ts` around lines 323 - 345, The cold-path gate in
backfillPrIssueLinksIfNeeded currently only checks pr_issue_links row count so
repos with zero rows (but already backfilled) will be re-scheduled repeatedly;
persist a completion marker in durable storage (e.g. add a column/row in
repo_meta or a dedicated backfill_marker table) when runPrIssueLinksBackfill
finishes successfully, and update backfillPrIssueLinksIfNeeded to check that
marker along with the existingCount before scheduling; keep using
inFlightLinksBackfill to prevent concurrent in-memory runs but rely on the
persistent marker to avoid re-queuing across restarts/instances and ensure the
marker is cleared or versioned if you ever need to re-run the backfill.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@src/lib/refresh.ts`:
- Around line 323-345: The cold-path gate in backfillPrIssueLinksIfNeeded
currently only checks pr_issue_links row count so repos with zero rows (but
already backfilled) will be re-scheduled repeatedly; persist a completion marker
in durable storage (e.g. add a column/row in repo_meta or a dedicated
backfill_marker table) when runPrIssueLinksBackfill finishes successfully, and
update backfillPrIssueLinksIfNeeded to check that marker along with the
existingCount before scheduling; keep using inFlightLinksBackfill to prevent
concurrent in-memory runs but rely on the persistent marker to avoid re-queuing
across restarts/instances and ensure the marker is cleared or versioned if you
ever need to re-run the backfill.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 71de4b28-4373-42cc-b8fb-be86aaf5504f

📥 Commits

Reviewing files that changed from the base of the PR and between f4bfb7c and 725340e.

📒 Files selected for processing (1)

src/lib/refresh.ts

CodeRabbit caught a real edge case in #168: my cold-path check is `if (existingCount > 0) return existingCount;`. For repos that legitimately have zero linked issues (brand-new repo, or every PR genuinely has no `Closes #N` references), `existingCount === 0` is ALWAYS true → every request to one of the 8 calling routes schedules another backfill → every request re-loads all PRs + bodies and regex-extracts them, even though the previous backfill already ran and confirmed there's nothing to write. In-flight Set only dedupes concurrent schedules, not subsequent ones. Fix: add a persistent marker column `pr_issue_links_backfilled_at` on repo_meta (idempotent migration in db.ts alongside the existing ALTER TABLE backfill columns). The cold-path check now reads the marker too — once set, the repo is considered backfilled even with zero rows. Marker is written by the deferred worker on successful completion only; failures intentionally leave it null so the next request retries. Real cost without this fix: every request for a linkless repo runs the full PR scan in the background. Defer keeps it off the request path so users don't see latency, but it's wasted CPU per-request and amplifies under traffic. Follow-up to 725340e.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/refresh.ts`:
- Around line 334-342: The early return on existingCount prevents backfill when
pr_issue_links already has opportunistic rows; change the gate so we only skip
backfill if the backfill marker pr_issue_links_backfilled_at is set. Concretely,
in the logic around existingCount and completedAt (variables repoFullName,
existingCount, completedAt and column pr_issue_links_backfilled_at) either check
completedAt first and return 0 if present, or require both completedAt and
existingCount before returning; ensure runPrIssueLinksBackfill() is invoked when
completedAt is null even if existingCount > 0 so existing opportunistic rows
don't stop the full backfill (this ties to replacePrIssueLinks() /
refreshIssueLinkedPrsIfStale()).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fc98cd54-4a24-49f0-942b-b5f3c2eceea7

📥 Commits

Reviewing files that changed from the base of the PR and between 725340e and ca0217b.

📒 Files selected for processing (2)

src/lib/db.ts
src/lib/refresh.ts

coderabbitai · 2026-05-26T22:21:17Z

+  const existingCount = (readDb
    .prepare(`SELECT COUNT(*) AS c FROM pr_issue_links WHERE repo_full_name = ?`)
    .get(repoFullName) as { c: number }).c;
  if (existingCount > 0) return existingCount;
-  const db = getDb();
+  const completedAt = (readDb
+    .prepare('SELECT pr_issue_links_backfilled_at FROM repo_meta WHERE full_name = ?')
+    .get(repoFullName) as { pr_issue_links_backfilled_at: string | null } | undefined)
+    ?.pr_issue_links_backfilled_at;
+  if (completedAt) return 0;


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't use existingCount > 0 as the completion gate.

Line 337 short-circuits before the new marker is consulted, but pr_issue_links can already be non-empty from opportunistic inserts in replacePrIssueLinks() / refreshIssueLinkedPrsIfStale() without ever scanning all cached PRs. That means upgraded repos with a few existing rows will never run runPrIssueLinksBackfill() and older cached PRs stay partially linked forever.

Suggested fix

const existingCount = (readDb .prepare(`SELECT COUNT(*) AS c FROM pr_issue_links WHERE repo_full_name = ?`) .get(repoFullName) as { c: number }).c; - if (existingCount > 0) return existingCount; const completedAt = (readDb .prepare('SELECT pr_issue_links_backfilled_at FROM repo_meta WHERE full_name = ?') .get(repoFullName) as { pr_issue_links_backfilled_at: string | null } | undefined) ?.pr_issue_links_backfilled_at; - if (completedAt) return 0; + if (completedAt) return existingCount; // Cold path: schedule the backfill OFF the request path. if (!inFlightLinksBackfill.has(repoFullName)) { inFlightLinksBackfill.add(repoFullName); setImmediate(() => { @@ } }); } - return 0; + return existingCount; }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/refresh.ts` around lines 334 - 342, The early return on existingCount prevents backfill when pr_issue_links already has opportunistic rows; change the gate so we only skip backfill if the backfill marker pr_issue_links_backfilled_at is set. Concretely, in the logic around existingCount and completedAt (variables repoFullName, existingCount, completedAt and column pr_issue_links_backfilled_at) either check completedAt first and return 0 if present, or require both completedAt and existingCount before returning; ensure runPrIssueLinksBackfill() is invoked when completedAt is null even if existingCount > 0 so existing opportunistic rows don't stop the full backfill (this ties to replacePrIssueLinks() / refreshIssueLinkedPrsIfStale()).

* fix(refresh): defer pr_issue_links backfill off the request path Root cause of the Explorer page's "stuck loading skeleton" (Cloudflare 524 timeouts at the origin): `backfillPrIssueLinksIfNeeded` is called inline by 8 API routes (issues/route, pulls/route, issues-meta, related-issues, related-prs, gt/repos/[owner]/[name], etc.). On a cold path where `pr_issue_links` is empty for the repo (e.g. after PR #137's schema-purge migration), the function synchronously: 1. SELECTs every PR + title + body for the repo 2. Runs `extractLinkedIssues` regex over each PR's body 3. Writes resulting links in a single sqlite transaction For a repo with hundreds+ of PRs this can stall the Node event loop for tens of seconds. Cloudflare gives up at 100s and returns 524 to the client; the consumer in RepoExplorer.tsx treats !r.ok as fatal and leaves the Issues / Pull Requests tabs stuck on their skeleton. Fix: split the backfill into a request-path entry point that gates + defers, and a separate `runPrIssueLinksBackfill` that does the heavy work. The entry point now: - Returns immediately when the gate count > 0 (existing fast path, unchanged behaviour for steady state). - On cold start, schedules the backfill via setImmediate and returns 0 to the caller. A per-repo in-flight Set dedupes concurrent cold requests so we don't pile up serial writer-tx queues. The first request after a purge / cold start returns empty linked- issues / linked-PRs enrichment, but the route responds in milliseconds instead of timing out. Subsequent requests (after the deferred backfill commits) get full enrichment. Companion to df1f09c + 2fa2d13 + 99df6b1 on this branch which catch *throws* — those didn't help here because the underlying handler was *hanging*, not erroring. This is the actual root-cause patch. * fix(refresh): persistent backfill marker so linkless repos don't re-scan CodeRabbit caught a real edge case in #168: my cold-path check is `if (existingCount > 0) return existingCount;`. For repos that legitimately have zero linked issues (brand-new repo, or every PR genuinely has no `Closes #N` references), `existingCount === 0` is ALWAYS true → every request to one of the 8 calling routes schedules another backfill → every request re-loads all PRs + bodies and regex-extracts them, even though the previous backfill already ran and confirmed there's nothing to write. In-flight Set only dedupes concurrent schedules, not subsequent ones. Fix: add a persistent marker column `pr_issue_links_backfilled_at` on repo_meta (idempotent migration in db.ts alongside the existing ALTER TABLE backfill columns). The cold-path check now reads the marker too — once set, the repo is considered backfilled even with zero rows. Marker is written by the deferred worker on successful completion only; failures intentionally leave it null so the next request retries. Real cost without this fix: every request for a linkless repo runs the full PR scan in the background. Defer keeps it off the request path so users don't see latency, but it's wasted CPU per-request and amplifies under traffic. Follow-up to 725340e. * fix(pulls): scope cache to live gittensor repos

MkDev11 force-pushed the fix/defer-pr-issue-links-backfill branch from f4bfb7c to 725340e Compare May 26, 2026 22:10

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

Comment thread src/lib/refresh.ts Outdated

coderabbitai Bot reviewed May 26, 2026

View reviewed changes

MkDev11 merged commit e896ce0 into main May 26, 2026
4 checks passed

MkDev11 deleted the fix/defer-pr-issue-links-backfill branch May 26, 2026 22:21

coderabbitai Bot mentioned this pull request May 27, 2026

fix: defer pr issue links backfill #171

Merged

coderabbitai Bot mentioned this pull request May 27, 2026

Fix/defer pr issue links backfill #172

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(refresh): defer pr_issue_links backfill off the request path#168

fix(refresh): defer pr_issue_links backfill off the request path#168
MkDev11 merged 2 commits into
mainfrom
fix/defer-pr-issue-links-backfill

MkDev11 commented May 26, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented May 26, 2026 •

edited

Loading

Walkthrough

Changes

Sequence Diagram

Possibly Related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 26, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

MkDev11 commented May 26, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented May 26, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram

Possibly Related PRs

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 26, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

MkDev11 commented May 26, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 26, 2026 •

edited

Loading