Ram#112
Conversation
Move GitHub App webhook processing off the request path with delivery-id idempotency, background workers, and job status tracking so burst traffic is acknowledged quickly without duplicate processing. Made-with: Cursor
Expose internal memory and webhook queue metrics, and add heap-based load shedding for heavy endpoints with allowlisted core paths to reduce OOM risk under free-tier pressure. Made-with: Cursor
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
There was a problem hiding this comment.
Pull request overview
This PR introduces an in-memory async GitHub webhook processing pipeline with idempotent delivery handling, plus basic operational protections/visibility (load shedding + internal metrics).
Changes:
- Added an in-process webhook queue with delivery-id idempotency and job status tracking.
- Refactored the GitHub App webhook route to enqueue work and added a GitHub webhook worker that aggregates push payloads into a single Project snapshot update.
- Added a heap-based load shedding middleware and a new internal metrics endpoint, along with Jest tests.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| backend/services/webhookQueue.js | New in-memory queue, idempotency map, job status/metrics, and test reset hooks. |
| backend/services/githubWebhookWorker.js | New async worker to process push webhooks and update Project aggregation snapshot. |
| backend/routes/githubAppWebhook.js | Refactored webhook handler to enqueue jobs; added job status endpoint. |
| backend/routes/internalMetrics.js | New endpoint exposing memory + webhook queue metrics. |
| backend/middleware/loadShedding.js | New middleware to return 503 for “heavy” routes under heap pressure. |
| backend/index.js | Wires in load shedding and mounts /internal metrics route. |
| backend/models/Project.js | Adds fields to store last webhook aggregation snapshot + AI summary metadata. |
| backend/tests/internalMetricsAndLoadShedding.test.js | Tests internal metrics response shape and load shedding behavior. |
| backend/tests/githubAppWebhookIdempotency.test.js | Tests delivery-id idempotency for webhook enqueue/processing. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| app.use('/internal', internalMetricsRoutes); | ||
|
|
||
|
|
||
|
|
There was a problem hiding this comment.
This mounts the internal metrics router publicly at /internal with no access control. If this is intended for ops-only use, add an auth/allowlist middleware here (or mount it only behind an internal network) to prevent information disclosure in production.
| app.use('/internal', internalMetricsRoutes); | |
| const protectInternalMetrics = (req, res, next) => { | |
| const configuredToken = process.env.INTERNAL_METRICS_TOKEN; | |
| const authHeader = req.headers.authorization || ''; | |
| const expectedHeader = configuredToken ? `Bearer ${configuredToken}` : ''; | |
| if (!configuredToken) { | |
| return res.status(503).json({ error: 'Internal metrics access is not configured' }); | |
| } | |
| if (authHeader !== expectedHeader) { | |
| return res.status(403).json({ error: 'Forbidden' }); | |
| } | |
| next(); | |
| }; | |
| app.use('/internal', protectInternalMetrics, internalMetricsRoutes); |
| // GET /api/github-app/webhook/jobs/:deliveryId — lightweight internal async status endpoint | ||
| router.get('/webhook/jobs/:deliveryId', (req, res) => { | ||
| const { deliveryId } = req.params; | ||
| const job = getWebhookJobStatus(deliveryId); | ||
| if (!job) { | ||
| return res.status(404).json({ message: 'Job not found', deliveryId }); | ||
| } | ||
| return res.json({ job }); |
There was a problem hiding this comment.
The job status endpoint is described as “internal” but is exposed without any auth. It returns job.result/job.error, which may leak internal details. Consider restricting access (admin auth / secret token / IP allowlist) or removing job details from the public response.
| const scheduleDrain = () => { | ||
| setImmediate(() => { | ||
| drainQueue().catch((error) => { | ||
| console.error('[WebhookQueue] Drain failure:', error); | ||
| }); | ||
| }); | ||
| }; |
There was a problem hiding this comment.
enqueueWebhookJob schedules a drain via setImmediate every time a job is enqueued. Under high webhook volume this can create many pending immediates that mostly no-op while isDraining=true. Consider adding a separate "drainScheduled" flag (or only scheduling when queue transitions from empty->non-empty) so you schedule at most one drain at a time.
| while (queue.length > 0) { | ||
| const queuedItem = queue.shift(); | ||
| await processOneJob(queuedItem); | ||
| } |
There was a problem hiding this comment.
drainQueue uses queue.shift() in a loop, which is O(n) per item due to array reindexing. If the webhook queue can grow, this becomes quadratic. Consider using a head index (ring buffer) or a deque implementation to make pops O(1).
| const removable = jobsByDeliveryId.size - MAX_STORED_JOBS; | ||
| let removed = 0; | ||
| for (const [deliveryId, job] of jobsByDeliveryId.entries()) { | ||
| if (removed >= removable) break; | ||
| if (job.status === 'completed' || job.status === 'failed') { | ||
| jobsByDeliveryId.delete(deliveryId); | ||
| removed += 1; | ||
| } | ||
| } |
There was a problem hiding this comment.
MAX_STORED_JOBS implies a hard cap, but pruneOldJobs only removes jobs that are completed/failed. If many jobs remain queued/processing (or get stuck), jobsByDeliveryId can grow unbounded. Consider enforcing the cap more strictly (e.g., evict oldest queued jobs too, or cap by age/TTL) to avoid memory blow-ups under sustained load.
| const removable = jobsByDeliveryId.size - MAX_STORED_JOBS; | |
| let removed = 0; | |
| for (const [deliveryId, job] of jobsByDeliveryId.entries()) { | |
| if (removed >= removable) break; | |
| if (job.status === 'completed' || job.status === 'failed') { | |
| jobsByDeliveryId.delete(deliveryId); | |
| removed += 1; | |
| } | |
| } | |
| let removable = jobsByDeliveryId.size - MAX_STORED_JOBS; | |
| // Prefer evicting oldest terminal jobs first so active work is preserved | |
| // whenever possible. | |
| for (const [deliveryId, job] of jobsByDeliveryId.entries()) { | |
| if (removable <= 0) break; | |
| if (job.status === 'completed' || job.status === 'failed') { | |
| jobsByDeliveryId.delete(deliveryId); | |
| removable -= 1; | |
| } | |
| } | |
| if (removable <= 0) return; | |
| // Enforce the configured cap even if queued/processing jobs have built up. | |
| // Map iteration is in insertion order, so this evicts the oldest remaining | |
| // tracked jobs first. | |
| for (const [deliveryId] of jobsByDeliveryId.entries()) { | |
| if (removable <= 0) break; | |
| jobsByDeliveryId.delete(deliveryId); | |
| removable -= 1; | |
| } |
| router.get('/metrics', (_req, res) => { | ||
| const memoryUsage = process.memoryUsage(); | ||
| const queueMetrics = getWebhookQueueMetrics(); | ||
|
|
||
| return res.json({ | ||
| memoryMb: { | ||
| rss: bytesToMb(memoryUsage.rss), | ||
| heapUsed: bytesToMb(memoryUsage.heapUsed), | ||
| heapTotal: bytesToMb(memoryUsage.heapTotal), | ||
| }, | ||
| webhookQueue: { | ||
| depth: queueMetrics.depth, | ||
| lagMs: queueMetrics.lagMs, | ||
| processing: queueMetrics.processing, | ||
| trackedJobs: queueMetrics.trackedJobs, | ||
| }, | ||
| timestamp: new Date().toISOString(), | ||
| }); |
There was a problem hiding this comment.
The /internal/metrics endpoint exposes process memory usage and queue state without any authentication/authorization. In production this leaks operational details and can aid attackers (and it’s not rate-limited since it’s mounted outside /api). Consider protecting this route (e.g., admin auth middleware, a shared secret header, IP allowlist, or only enabling it when NODE_ENV !== 'production').
Description
Type of Change
Related Issues
Screenshots (if applicable)
Testing
Checklist