-
Notifications
You must be signed in to change notification settings - Fork 0
docs: add free-tier scalability execution requirements #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,188 @@ | ||||||
| # Free-Tier Scalability Execution Requirements (April 2026) | ||||||
|
|
||||||
| ## Purpose | ||||||
|
|
||||||
| This document converts the Free-Tier Scalability Plan into an execution contract for an implementation teammate. | ||||||
|
|
||||||
| Primary outcome: | ||||||
| - Production-safe rollout for 500+ users on free-tier constraints. | ||||||
| - Clear Definition of Done with mandatory passing tests. | ||||||
|
|
||||||
| Source plan: | ||||||
| - docs/plans/FREE_TIER_SCALABILITY_PLAN_APRIL_2026.md | ||||||
|
|
||||||
| ## Implementation Context Snapshot | ||||||
|
|
||||||
| Current state observed in codebase: | ||||||
| - Raw body parsing is already scoped to webhook routes. | ||||||
| - Webhook signature middleware does not fail closed when secret is missing. | ||||||
| - In-memory caches exist, but env values are parsed without strict clamp/validation. | ||||||
| - Webhook push handling is per-commit and emits repeated updates. | ||||||
| - Socket presence and delivery maps are process-local. | ||||||
| - Queue worker path and saturation metrics are not yet complete. | ||||||
|
|
||||||
| ## Scope | ||||||
|
|
||||||
| In scope: | ||||||
| 1. Phase 1 hardening: webhook fail-closed, config validation and clamps, pagination safety checks. | ||||||
| 2. Phase 2 throughput improvements: commit aggregation, fanout reduction, reconnect efficiency, log-level guards. | ||||||
| 3. Phase 3 async isolation: queue-backed heavy processing for webhook and AI architecture analysis. | ||||||
| 4. Phase 4 guardrails: runtime metrics, latency, queue depth, alerts, and overload controls. | ||||||
|
|
||||||
| Out of scope for this cycle: | ||||||
| 1. Full horizontal multi-instance socket state coordination. | ||||||
| 2. Major domain model rewrites. | ||||||
| 3. Vendor/platform migration. | ||||||
|
|
||||||
| ## Work Plan (Teammate Execution) | ||||||
|
|
||||||
| ## Phase A: Security and Memory Safety Gate | ||||||
|
|
||||||
| Tasks: | ||||||
| 1. Make GitHub webhook verification fail closed in production if secret is absent. | ||||||
| 2. Reject unsigned requests and invalid signatures with 401. | ||||||
| 3. Add strict env validation utilities for cache and catchup settings: | ||||||
| - Positive integer only. | ||||||
| - Hard min and max bounds. | ||||||
| - Safe fallback defaults. | ||||||
| 4. Apply validated values to: | ||||||
| - ARCHITECTURE_CACHE_MAX_ENTRIES | ||||||
| - ARCHITECTURE_CACHE_TTL_MS | ||||||
| - DELIVERY_CATCHUP_BATCH_SIZE | ||||||
| - DELIVERY_CATCHUP_MAX_BATCHES | ||||||
| 5. Confirm pagination boundary caps for any endpoint accepting page size style inputs. | ||||||
|
|
||||||
| Acceptance criteria: | ||||||
| 1. Production webhook endpoints cannot process events when secret/config is missing. | ||||||
| 2. Invalid env values cannot disable pruning/TTL behavior. | ||||||
| 3. Invalid signatures never enter heavy processing path. | ||||||
|
|
||||||
| ## Phase B: Throughput per MB | ||||||
|
|
||||||
| Tasks: | ||||||
| 1. Aggregate webhook commit effects by project and task before DB writes. | ||||||
| 2. Replace per-commit projectUpdate fanout with one consolidated project update per webhook event. | ||||||
| 3. Minimize repeated reads for project/step/task lookup during webhook bursts. | ||||||
| 4. Add log-level checks so debug logs do not flood hot paths. | ||||||
|
|
||||||
| Acceptance criteria: | ||||||
| 1. Push bursts trigger bounded DB operations instead of commit-amplified loops. | ||||||
| 2. One webhook event yields one project-level broadcast per affected project. | ||||||
| 3. p95 latency and peak memory improve under synthetic push bursts. | ||||||
|
|
||||||
| ## Phase C: Async Queue Isolation | ||||||
|
|
||||||
| Tasks: | ||||||
| 1. Move heavy webhook processing to internal queue worker path. | ||||||
| 2. Move AI architecture analysis to async jobs with status endpoint. | ||||||
| 3. Keep API handlers lightweight and immediately acknowledge accepted work. | ||||||
| 4. Implement idempotency for webhook deliveries. | ||||||
|
|
||||||
| Acceptance criteria: | ||||||
| 1. API remains responsive during webhook and AI spikes. | ||||||
| 2. Queue backlog, retry state, and failure reasons are visible. | ||||||
| 3. Duplicate webhook deliveries do not duplicate side effects. | ||||||
|
|
||||||
| ## Phase D: Operational Guardrails | ||||||
|
|
||||||
| Tasks: | ||||||
| 1. Expose metrics for rss, heapUsed, heapTotal, event-loop lag, route latency. | ||||||
| 2. Expose queue depth and queue lag metrics. | ||||||
| 3. Add pressure controls: | ||||||
| - Defer non-critical work when memory threshold is crossed. | ||||||
| - Preserve core auth/session/chat behavior. | ||||||
| 4. Add alert thresholds tied to free-tier limits. | ||||||
|
|
||||||
| Acceptance criteria: | ||||||
| 1. Saturation is detected before user-visible failure. | ||||||
| 2. Degradation mode is controlled and reversible. | ||||||
|
|
||||||
| ## Definition of Done | ||||||
|
|
||||||
| All items below are mandatory: | ||||||
| 1. Phase A complete and merged. | ||||||
| 2. Phase B complete and merged. | ||||||
| 3. Queue-based processing path merged for at least webhook heavy work. | ||||||
| 4. Metrics endpoint/logging added with memory, latency, and queue indicators. | ||||||
|
Comment on lines
+103
to
+106
|
||||||
| 5. Mandatory test gate passes. | ||||||
| 6. Verification report attached to PR. | ||||||
|
|
||||||
| ## Mandatory Test Gate (Must Pass) | ||||||
|
|
||||||
| Run from repository root: | ||||||
|
|
||||||
| 1. Unit and integration baseline | ||||||
| - npm run test:jest | ||||||
|
|
||||||
| 2. Type safety | ||||||
| - npm run typecheck | ||||||
|
|
||||||
| 3. Frontend lint baseline | ||||||
|
||||||
| 3. Frontend lint baseline | |
| 3. Repository-wide lint baseline |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The document mixes phase naming schemes: Scope refers to “Phase 1–4”, but the execution sections are labeled “Phase A–D”. This makes it harder to cross-reference with the source plan; consider using one consistent phase numbering/lettering throughout (and ideally match the source plan’s Phase 1–4).