fix(refresh): defer pr_issue_links backfill off the request path#168

Merged
MkDev11 merged 2 commits into
mainfrom
fix/defer-pr-issue-links-backfill
May 26, 2026

Owner

@MkDev11 MkDev11 commented May 26, 2026

Summary by CodeRabbit

  • Refactor
    • Improved responsiveness by making PR→issue link initialization non-blocking: existing link counts are returned immediately while full backfill runs in the background.
  • Reliability
    • Ensures background backfills are scheduled only once per repository, records completion on success, and logs failures, so results appear once processing finishes.


coderabbitai Bot commented May 26, 2026

Walkthrough

`backfillPrIssueLinksIfNeeded` now returns immediately on the request path: it returns the existing link count when links are present (hot path); otherwise it schedules `runPrIssueLinksBackfill(repoFullName)` via `setImmediate`, guarded by a per-repo in-flight gate, and returns 0 without blocking. A migration adds `repo_meta.pr_issue_links_backfilled_at` to record completion.

Changes

Non-blocking PR-issue links backfill

| Layer / File(s) | Summary |
|---|---|
| Add repo_meta backfilled marker (`src/lib/db.ts`) | Adds a one-time `repo_meta.pr_issue_links_backfilled_at` column during `getDb()` migrations to record that the PR→issue backfill has completed for a repo. |
| Deferred backfill with per-repo gating (`src/lib/refresh.ts`) | `backfillPrIssueLinksIfNeeded` hot-paths the existing `pr_issue_links` count via `getReadDb()` or returns 0 if `repo_meta.pr_issue_links_backfilled_at` is set; otherwise schedules `runPrIssueLinksBackfill(repoFullName)` via `setImmediate` with an in-memory `inFlightLinksBackfill` gate and returns immediately. Deferred errors are logged and the gate is cleared in `finally`; the completion marker is written only on success. |
| Blocking backfill implementation (`src/lib/refresh.ts`) | Introduces `runPrIssueLinksBackfill(repoFullName)` which acquires the write DB, scans PRs, extracts linked issues from title/body, inserts `pr_issue_links` with `INSERT OR IGNORE` in a transaction, and returns the count of newly inserted rows. |
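
The "count of newly inserted rows" behaviour of `INSERT OR IGNORE` can be illustrated with a small in-memory stand-in. This is only a sketch: `LinkTable`, `Link`, and the assumed unique key (repo, PR number, issue number) are hypothetical and do not come from the PR, and it models neither SQLite nor the transaction.

```typescript
// Hypothetical row shape; the real pr_issue_links columns may differ.
type Link = { repoFullName: string; prNumber: number; issueNumber: number };

class LinkTable {
  // Assumed uniqueness key: repo + PR number + issue number.
  private readonly keys = new Set<string>();

  // Mirrors INSERT OR IGNORE: returns true only when the row is new.
  insertOrIgnore(link: Link): boolean {
    const key = `${link.repoFullName}#${link.prNumber}->${link.issueNumber}`;
    if (this.keys.has(key)) return false;
    this.keys.add(key);
    return true;
  }

  // Counts newly inserted rows; duplicates are skipped silently.
  insertMany(links: Link[]): number {
    let inserted = 0;
    for (const link of links) {
      if (this.insertOrIgnore(link)) inserted += 1;
    }
    return inserted;
  }
}
```

Under these assumptions, re-running the same batch yields 0, which is why a repeated backfill is harmless for correctness but still pays the full scan cost.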

Sequence Diagram

sequenceDiagram
  participant Handler as RequestHandler
  participant Backfill as backfillPrIssueLinksIfNeeded
  participant DB as Database
  participant Queue as setImmediateQueue
  participant Runner as runPrIssueLinksBackfill

  Handler->>Backfill: backfillPrIssueLinksIfNeeded(repoFullName)
  alt Links exist in pr_issue_links
    Backfill->>DB: SELECT COUNT(*) FROM pr_issue_links WHERE repo=repoFullName
    DB-->>Backfill: count
    Backfill-->>Handler: return count
  else No links yet and not in-flight
    Backfill->>Queue: schedule runPrIssueLinksBackfill(repoFullName)
    Backfill-->>Handler: return 0
    Queue->>Runner: invoke runPrIssueLinksBackfill(repoFullName)
    Runner->>DB: read pulls, extract issues, INSERT OR IGNORE into pr_issue_links (transaction)
    DB-->>Runner: inserted rows count
    Runner->>DB: UPDATE repo_meta.pr_issue_links_backfilled_at
    Runner-->>Runner: clear inFlightLinksBackfill in finally
  else Already in-flight
    Backfill-->>Handler: return 0
  end

🎯 3 (Moderate) | ⏱️ ~20 minutes

🐰 I nudged the heavy work to an idle slot,
set gates so duplicates do not trot,
scans sneak in while requests hop free,
and the DB keeps a quiet backfilled key. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: deferring the PR issue links backfill operation off the request path to keep request handlers non-blocking. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 ESLint

If the error stems from missing dependencies, add them to the package.json file. For unrecoverable errors (e.g., due to private dependencies), disable the tool in the CodeRabbit configuration.

src/lib/db.ts

ESLint skipped: missing config or dependency (missing-dependency). The ESLint configuration references a package that is not available in the sandbox.

src/lib/refresh.ts

ESLint skipped: the ESLint configuration for this file references a package that is not available in the sandbox.


Root cause of the Explorer page's "stuck loading skeleton" (Cloudflare
524 timeouts at the origin): `backfillPrIssueLinksIfNeeded` is called
inline by 8 API routes (issues/route, pulls/route, issues-meta,
related-issues, related-prs, gt/repos/[owner]/[name], etc.). On a
cold path where `pr_issue_links` is empty for the repo (e.g. after
PR #137's schema-purge migration), the function synchronously:

  1. SELECTs every PR + title + body for the repo
  2. Runs `extractLinkedIssues` regex over each PR's body
  3. Writes resulting links in a single sqlite transaction

For a repo with hundreds+ of PRs this can stall the Node event loop
for tens of seconds. Cloudflare gives up at 100s and returns 524 to
the client; the consumer in RepoExplorer.tsx treats !r.ok as fatal
and leaves the Issues / Pull Requests tabs stuck on their skeleton.
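
To make the regex step in the list above concrete, here is a minimal sketch of pulling closing references such as `Closes #N` out of PR text. The function name `extractClosingRefs` and its pattern are assumptions for illustration; the actual `extractLinkedIssues` in the repo is not shown in this PR and may match more forms (for example cross-repo references or full URLs).

```typescript
// Hypothetical helper, not the repo's extractLinkedIssues.
// Matches GitHub's closing keywords followed by a same-repo "#<number>".
function extractClosingRefs(text: string): number[] {
  const pattern = /\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s+#(\d+)/gi;
  const found = new Set<number>();
  let match: RegExpExecArray | null;
  // exec() with the "g" flag advances through the string one match at a time.
  while ((match = pattern.exec(text)) !== null) {
    found.add(Number(match[1]));
  }
  // De-duplicated, ascending issue numbers.
  return Array.from(found).sort((a, b) => a - b);
}
```

Each call is cheap, but running it synchronously over every cached PR body in one pass is the kind of work that holds the event loop for the duration of the scan.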

Fix: split the backfill into a request-path entry point that gates +
defers, and a separate `runPrIssueLinksBackfill` that does the heavy
work. The entry point now:
  - Returns immediately when the gate count > 0 (existing fast path,
    unchanged behaviour for steady state).
  - On cold start, schedules the backfill via setImmediate and
    returns 0 to the caller. A per-repo in-flight Set dedupes
    concurrent cold requests so we don't pile up serial writer-tx
    queues.

The first request after a purge / cold start returns empty linked-
issues / linked-PRs enrichment, but the route responds in
milliseconds instead of timing out. Subsequent requests (after the
deferred backfill commits) get full enrichment.
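
The gate-and-defer shape described above can be sketched as follows. This is not the PR's code: `deferOncePerRepo`, `Scheduler`, and the injectable `schedule` parameter are hypothetical, and `setTimeout(task, 0)` is used as a portable stand-in for the `setImmediate` call the PR uses.

```typescript
// Hypothetical sketch; names are illustrative only.
type Scheduler = (task: () => void) => void;

// Stand-in for setImmediate so the sketch also runs outside Node.
const defaultScheduler: Scheduler = (task) => {
  setTimeout(task, 0);
};

// Repos whose backfill is queued or running in this process.
const inFlight = new Set<string>();

// Returns true if work was scheduled, false if it was deduped.
function deferOncePerRepo(
  repoFullName: string,
  heavyWork: (repo: string) => void,
  schedule: Scheduler = defaultScheduler,
): boolean {
  if (inFlight.has(repoFullName)) return false;
  inFlight.add(repoFullName);
  schedule(() => {
    try {
      heavyWork(repoFullName);
    } catch (err) {
      const msg = err instanceof Error ? err.message : String(err);
      console.error(`[sketch] backfill(${repoFullName}) failed:`, msg);
    } finally {
      // Clear the gate on success or failure so a later request can retry.
      inFlight.delete(repoFullName);
    }
  });
  return true;
}
```

Making the scheduler a parameter is a design choice of this sketch only; it lets a test drive the deferred task by hand instead of waiting on a timer.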

Companion to df1f09c + 2fa2d13 + 99df6b1 on this branch which catch
*throws* — those didn't help here because the underlying handler was
*hanging*, not erroring. This is the actual root-cause patch.
@MkDev11 MkDev11 force-pushed the fix/defer-pr-issue-links-backfill branch from f4bfb7c to 725340e Compare May 26, 2026 22:10

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/refresh.ts`:
- Around line 312-331: The current cold-path check only returns early when
existingCount > 0, so repos that legitimately have zero links keep being
rescanned; add a separate persistent "backfill completed" marker and check it
alongside existingCount. Modify the logic around
getReadDb()/existingCount/inFlightLinksBackfill/runPrIssueLinksBackfill so that
before scheduling a backfill you query a new table (e.g.,
pr_issue_links_backfill(repo_full_name, completed_at)) or equivalent flag to see
if backfill completed, and return early if that marker exists (even when
existingCount === 0); after runPrIssueLinksBackfill finishes successfully,
insert or upsert the completed marker for repoFullName; apply the same change to
the duplicate block referenced (lines ~338-343) and ensure markers are only
written on success and not on errors.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 2f51384a-4a2f-4029-9fff-021c70bc8aab

📥 Commits

Reviewing files that changed from the base of the PR and between ed822ca and f4bfb7c.

📒 Files selected for processing (1)
  • src/lib/refresh.ts

Comment thread src/lib/refresh.ts Outdated

@coderabbitai coderabbitai Bot left a comment

♻️ Duplicate comments (1)
src/lib/refresh.ts (1)

323-345: ⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Persist “backfill completed” separately from row count.

A successful backfill can still leave existingCount === 0 (for example, no cached PRs yet or no same-repo links found), so this repo never graduates from the cold path. Because src/app/api/repos/[owner]/[name]/issues/route.ts:75-85 calls this on every request, those repos will keep scheduling full rescans forever once the in-flight gate clears. Please store a persistent completion marker (for example in repo_meta or a dedicated table) and check that alongside existingCount before scheduling again.

Possible shape of the fix
 export function backfillPrIssueLinksIfNeeded(repoFullName: string): number {
   const existingCount = (getReadDb()
     .prepare(`SELECT COUNT(*) AS c FROM pr_issue_links WHERE repo_full_name = ?`)
     .get(repoFullName) as { c: number }).c;
-  if (existingCount > 0) return existingCount;
+  const backfillDone = hasPrIssueLinksBackfillCompleted(repoFullName);
+  if (existingCount > 0 || backfillDone) return existingCount;

   if (!inFlightLinksBackfill.has(repoFullName)) {
     inFlightLinksBackfill.add(repoFullName);
     setImmediate(() => {
       try {
         runPrIssueLinksBackfill(repoFullName);
+        markPrIssueLinksBackfillCompleted(repoFullName);
       } catch (err) {
         const msg = err instanceof Error ? err.message : String(err);
         console.error(`[refresh] backfillPrIssueLinks(${repoFullName}) failed:`, msg);
       } finally {
         inFlightLinksBackfill.delete(repoFullName);

This needs a companion schema/migration change so the marker survives restarts and multi-instance deployments.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/lib/refresh.ts` around lines 323 - 345, The cold-path gate in
backfillPrIssueLinksIfNeeded currently only checks pr_issue_links row count so
repos with zero rows (but already backfilled) will be re-scheduled repeatedly;
persist a completion marker in durable storage (e.g. add a column/row in
repo_meta or a dedicated backfill_marker table) when runPrIssueLinksBackfill
finishes successfully, and update backfillPrIssueLinksIfNeeded to check that
marker along with the existingCount before scheduling; keep using
inFlightLinksBackfill to prevent concurrent in-memory runs but rely on the
persistent marker to avoid re-queuing across restarts/instances and ensure the
marker is cleared or versioned if you ever need to re-run the backfill.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 71de4b28-4373-42cc-b8fb-be86aaf5504f

📥 Commits

Reviewing files that changed from the base of the PR and between f4bfb7c and 725340e.

📒 Files selected for processing (1)
  • src/lib/refresh.ts

CodeRabbit caught a real edge case in #168: my cold-path check is
`if (existingCount > 0) return existingCount;`. For repos that
legitimately have zero linked issues (brand-new repo, or every PR
genuinely has no `Closes #N` references), `existingCount === 0` is
ALWAYS true → every request to one of the 8 calling routes schedules
another backfill → every request re-loads all PRs + bodies and
regex-extracts them, even though the previous backfill already ran
and confirmed there's nothing to write. In-flight Set only dedupes
concurrent schedules, not subsequent ones.

Fix: add a persistent marker column `pr_issue_links_backfilled_at` on
repo_meta (idempotent migration in db.ts alongside the existing
ALTER TABLE backfill columns). The cold-path check now reads the
marker too — once set, the repo is considered backfilled even with
zero rows. Marker is written by the deferred worker on successful
completion only; failures intentionally leave it null so the next
request retries.
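
The gate order and the success-only marker write described in this commit message can be sketched as two small functions. This is an illustration under assumptions: `decideGate`, `runAndMark`, and the in-memory `Map` standing in for the `repo_meta.pr_issue_links_backfilled_at` column are hypothetical, not the code in `src/lib/refresh.ts`.

```typescript
// Hypothetical sketch of the gate described above.
type GateDecision =
  | { action: "return"; count: number }
  | { action: "schedule"; count: 0 };

function decideGate(existingCount: number, backfilledAt: string | null): GateDecision {
  // Steady state: links already exist for the repo.
  if (existingCount > 0) return { action: "return", count: existingCount };
  // Backfilled before and found nothing: do not rescan.
  if (backfilledAt !== null) return { action: "return", count: 0 };
  // Cold path: no rows and no marker yet.
  return { action: "schedule", count: 0 };
}

// Writes the marker only when the run completes without throwing.
function runAndMark(
  repo: string,
  run: (repo: string) => number,
  markers: Map<string, string>, // stand-in for the repo_meta column
): boolean {
  try {
    run(repo);
    markers.set(repo, new Date().toISOString());
    return true;
  } catch (err) {
    // Leave the marker unset so the next request retries.
    return false;
  }
}
```

A failed run leaves the marker unset, so `decideGate` keeps answering "schedule" for a linkless repo until one run succeeds.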

Real cost without this fix: every request for a linkless repo runs
the full PR scan in the background. Defer keeps it off the request
path so users don't see latency, but it's wasted CPU per-request and
amplifies under traffic.

Follow-up to 725340e.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/lib/refresh.ts`:
- Around line 334-342: The early return on existingCount prevents backfill when
pr_issue_links already has opportunistic rows; change the gate so we only skip
backfill if the backfill marker pr_issue_links_backfilled_at is set. Concretely,
in the logic around existingCount and completedAt (variables repoFullName,
existingCount, completedAt and column pr_issue_links_backfilled_at) either check
completedAt first and return 0 if present, or require both completedAt and
existingCount before returning; ensure runPrIssueLinksBackfill() is invoked when
completedAt is null even if existingCount > 0 so existing opportunistic rows
don't stop the full backfill (this ties to replacePrIssueLinks() /
refreshIssueLinkedPrsIfStale()).

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: fc98cd54-4a24-49f0-942b-b5f3c2eceea7

📥 Commits

Reviewing files that changed from the base of the PR and between 725340e and ca0217b.

📒 Files selected for processing (2)
  • src/lib/db.ts
  • src/lib/refresh.ts

Comment thread src/lib/refresh.ts
Comment on lines +334 to +342
const existingCount = (readDb
  .prepare(`SELECT COUNT(*) AS c FROM pr_issue_links WHERE repo_full_name = ?`)
  .get(repoFullName) as { c: number }).c;
if (existingCount > 0) return existingCount;
const db = getDb();
const completedAt = (readDb
  .prepare('SELECT pr_issue_links_backfilled_at FROM repo_meta WHERE full_name = ?')
  .get(repoFullName) as { pr_issue_links_backfilled_at: string | null } | undefined)
  ?.pr_issue_links_backfilled_at;
if (completedAt) return 0;

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don't use existingCount > 0 as the completion gate.

Line 337 short-circuits before the new marker is consulted, but pr_issue_links can already be non-empty from opportunistic inserts in replacePrIssueLinks() / refreshIssueLinkedPrsIfStale() without ever scanning all cached PRs. That means upgraded repos with a few existing rows will never run runPrIssueLinksBackfill() and older cached PRs stay partially linked forever.

Suggested fix
   const existingCount = (readDb
     .prepare(`SELECT COUNT(*) AS c FROM pr_issue_links WHERE repo_full_name = ?`)
     .get(repoFullName) as { c: number }).c;
-  if (existingCount > 0) return existingCount;
   const completedAt = (readDb
     .prepare('SELECT pr_issue_links_backfilled_at FROM repo_meta WHERE full_name = ?')
     .get(repoFullName) as { pr_issue_links_backfilled_at: string | null } | undefined)
     ?.pr_issue_links_backfilled_at;
-  if (completedAt) return 0;
+  if (completedAt) return existingCount;
 
   // Cold path: schedule the backfill OFF the request path.
   if (!inFlightLinksBackfill.has(repoFullName)) {
     inFlightLinksBackfill.add(repoFullName);
     setImmediate(() => {
@@
       }
     });
   }
-  return 0;
+  return existingCount;
 }

@MkDev11 MkDev11 merged commit e896ce0 into main May 26, 2026
4 checks passed
@MkDev11 MkDev11 deleted the fix/defer-pr-issue-links-backfill branch May 26, 2026 22:21
MkDev11 added a commit that referenced this pull request May 27, 2026
* fix(refresh): defer pr_issue_links backfill off the request path

* fix(refresh): persistent backfill marker so linkless repos don't re-scan

* fix(pulls): scope cache to live gittensor repos