fix(seed-bundle-resilience): shorten Resilience-Scores interval so refresh actually runs #3126

Merged
koala73 merged 1 commit into main from fix/bundle-resilience-refresh-interval on Apr 16, 2026

koala73 (Owner) commented Apr 16, 2026

Summary

Follow-up to the Slice B work merged in #3124. Live Railway log at 2026-04-16 09:25 UTC:

[Bundle:resilience] Starting (2 sections)
  [Resilience-Scores] Skipped, last seeded 203min ago (interval: 360min)
  [Resilience-Static]  Skipped, last seeded 11241min ago (interval: 129600min)
[Bundle:resilience] Finished in 0.5s, ran:0 skipped:2

The bundle runner skips any section whose seed-meta is younger than intervalMs * 0.8. With intervalMs: 6 * HOUR the skip window is 4.8h. Every Railway cron fire inside that window bypassed the section entirely, which means refreshRankingAggregate(), the whole point of the Slice B work, never ran on those ticks. The ranking aggregate (12h TTL) could then expire silently before the next actual run.
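The skip rule described above can be sketched as a single predicate. This is a minimal illustration, not the runner's actual code: `shouldSkip`, `MINUTE`, and `HOUR` are hypothetical names, and only the `intervalMs * 0.8` factor and the 203min figure come from the description and log above.

```javascript
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

// Hypothetical helper (name assumed): a section is skipped while its
// seed-meta is younger than 80% of the configured interval.
function shouldSkip(elapsedMs, intervalMs) {
  return elapsedMs < intervalMs * 0.8;
}

const elapsed = 203 * MINUTE; // "last seeded 203min ago" in the log above

// 6h interval: threshold is 288min, so 203min is still inside the skip window.
console.log(shouldSkip(elapsed, 6 * HOUR)); // true

// 2h interval: threshold is 96min, so 203min is past it and the section runs.
console.log(shouldSkip(elapsed, 2 * HOUR)); // false
```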

Fix

Drop intervalMs on Resilience-Scores from 6h to 2h. The skip threshold becomes 96min (2h * 0.8), so hourly Railway fires run the section roughly every 2h. Warm-path runs are cheap (~5-10s): only the light refresh, intervals recompute, and verify, not the 222-country warm. That cadence is well within the 12h ranking TTL.
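To make the resulting cadence concrete, here is a small simulation of hourly fires against the 80% skip rule. It is a sketch under idealized assumptions: fires land exactly on the hour, only the section itself updates its seed-meta timestamp, and `simulateRuns` is a hypothetical helper rather than anything in the repository.

```javascript
const MINUTE = 60 * 1000;
const HOUR = 60 * MINUTE;

// Hypothetical simulation: returns the fire times (in minutes after t = 0)
// at which the section would actually run, assuming it last ran at t = 0.
function simulateRuns(intervalMs, fireCount, cronEveryMs = HOUR) {
  const runs = [];
  let lastSeededAt = 0;
  for (let i = 1; i <= fireCount; i++) {
    const now = i * cronEveryMs;
    // Same rule as the bundle runner: skip while elapsed < intervalMs * 0.8.
    if (now - lastSeededAt >= intervalMs * 0.8) {
      runs.push(now / MINUTE);
      lastSeededAt = now;
    }
  }
  return runs;
}

// With a 2h interval the 96min threshold is crossed on every second fire,
// so the section runs at minutes 120, 240, and 360.
console.log(simulateRuns(2 * HOUR, 6));
```

Real cron fires drift and other code paths may touch the seed-meta key, so actual run times will not be this regular.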

Structural test pins intervalMs <= 2 hours so this doesn't silently regress later.
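A structural pin of this kind could look roughly like the following. The excerpt in `source`, the `label` field, the `DAY` constant, and the helper `scoresIntervalHours` are all assumptions for illustration; the real test reads scripts/seed-bundle-resilience.mjs, whose exact shape and regex are not shown here.

```javascript
// Hypothetical excerpt standing in for the bundle file's section list.
// The real file may be shaped differently.
const source = `
  { label: 'Resilience-Scores', intervalMs: 2 * HOUR },
  { label: 'Resilience-Static', intervalMs: 90 * DAY },
`;

// Extracts the hour multiplier from the Resilience-Scores entry (assumed format).
function scoresIntervalHours(src) {
  const match = src.match(/Resilience-Scores[^}]*?intervalMs:\s*(\d+)\s*\*\s*HOUR/);
  if (!match) throw new Error('Resilience-Scores intervalMs not found');
  return Number(match[1]);
}

const hours = scoresIntervalHours(source);
// Fail loudly if someone raises the interval above 2 hours.
if (hours > 2) throw new Error(`intervalMs is ${hours}h, expected <= 2h`);
console.log(`Resilience-Scores intervalMs = ${hours}h`);
```

Matching on source text rather than importing the module keeps the check free of side effects, at the cost of breaking if the entry is reformatted.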

Test plan

  • Full resilience suite: 378/378 pass.
  • After deploy, next hourly Railway fire should run the section (not skip). Look for [Resilience-Scores] Refreshed resilience:ranking:v9 with 222 countries (scheduled cron refresh) in the log.
  • /api/health resilienceRanking.status should stay OK across cron ticks instead of flipping to EMPTY_ON_DEMAND.

… so refresh runs

Live log 2026-04-16 09:25 showed the bundle runner SKIPPING Resilience-Scores
(last seeded 203min ago, interval 360min -> 288min skip threshold). Every
Railway cron fire within the 4.8h skip window bypassed the section entirely,
so refreshRankingAggregate() -- the whole point of the Slice B work merged in
#3124 -- never ran. Ranking could then silently expire in the gap.

Lower intervalMs to 2h. The bundle runner skip threshold becomes 96min;
hourly Railway fires run the section about every 2h. Well within the 12h
ranking TTL, and cheap per warm-path run:

  - computeAndWriteIntervals (~100ms local CPU + one pipeline write)
  - refreshRankingAggregate -> /api/resilience/v1/get-resilience-ranking?refresh=1
    (handler recompute + 2-SET pipeline, ~2-5s)
  - STRLEN + GET-meta verify in parallel (~200ms)

Total ~5-10s per warm-scores run. The expensive 222-country warm still only
runs when scores are actually missing.

Structural test pins intervalMs <= 2 hours so this doesn't silently regress.

Full resilience suite: 378/378.

vercel bot commented Apr 16, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
worldmonitor Ignored Ignored Apr 16, 2026 9:32am

greptile-apps bot (Contributor) commented Apr 16, 2026

Greptile Summary

This PR fixes a skip-logic bug in the seed-bundle-resilience runner where the Resilience-Scores section's 6 h intervalMs produced a 4.8 h skip window — wider than the hourly Railway cron interval — causing refreshRankingAggregate() to never execute on most ticks and the 12 h ranking TTL to go unrefreshed. Dropping intervalMs to 2 h narrows the skip window to 96 min, so hourly Railway fires run the section every ~2 h; a new structural test pins the value at ≤ 2 h to prevent silent regression.

Confidence Score: 5/5

Safe to merge — the interval change is minimal, well-reasoned, and guarded by a new structural test.

The only finding is a stale test name/comment (P2 style). The core fix is correct: 2 h interval → 96 min skip threshold → hourly cron runs the section every ~2 h, well inside the 12 h ranking TTL. The new regex test correctly guards against regression to a larger interval.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| scripts/seed-bundle-resilience.mjs | Drops Resilience-Scores intervalMs from 6 h to 2 h; skip threshold becomes 96 min so hourly Railway fires run the section every ~2 h instead of bypassing it. Rationale comment is clear and accurate. |
| tests/resilience-scores-seed.test.mjs | Adds a structural test that reads seed-bundle-resilience.mjs and asserts intervalMs ≤ 2 * HOUR for the Resilience-Scores section. Regex and assertion logic are correct; one pre-existing test name/comment became stale when the interval changed (minor). |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Railway hourly cron fire]) --> B[Bundle runner reads seed-meta:resilience:intervals]
    B --> C{elapsed < intervalMs × 0.8?}

    C -- "Old: 6 h interval\n skip window = 4.8 h\n(most hourly fires hit this)" --> D[SKIP — refreshRankingAggregate never runs]
    C -- "New: 2 h interval\n skip window = 96 min\n(every other hourly fire passes)" --> E[RUN Resilience-Scores section]

    E --> F[refreshRankingAggregate via ?refresh=1]
    F --> G[Ranking key refreshed — 12 h TTL reset]

    D --> H[Ranking TTL silently expires → EMPTY_ON_DEMAND]

Comments Outside Diff (1)

  1. tests/resilience-scores-seed.test.mjs, lines 21-25

    P2 Stale "(2x cron interval)" annotation

    The test name and inner comment were written when intervalMs was 6 h (so 12 h = 2× that). Now that this PR drops the interval to 2 h, the 12 h TTL is 6× the seeder interval — not 2×. The (6h) in the comment and (2x cron interval) in the it-string are both misleading to anyone reading this alongside the new section at line 220.


koala73 merged commit 5093d82 into main on Apr 16, 2026
11 checks passed
koala73 deleted the fix/bundle-resilience-refresh-interval branch on April 16, 2026 09:41