feat: Scale & Ops Maturity — agent-first infrastructure#86
feat: Scale & Ops Maturity — agent-first infrastructure#86jeremylongshore merged 3 commits intomainfrom
Conversation
…eries) Five-document analysis covering agent ecosystem inventory, integration strategy (standalone/medium/deep), engineering risks with mitigations, Bob's Brain refactor options, and agent network vision. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Five audit-driven improvements making GWI infrastructure agent-operable: - Dependabot + SBOM (gwi-nx0): grouped dependency updates with gwi-triage labels, CycloneDX SBOM generation on release - VPC + Budget Alerts (gwi-keq): enable VPC connector, GCP billing budget with Pub/Sub alerts, agent-queryable GET /tenants/:tenantId/budget - Circuit Breaker for LLM (gwi-d1k): wrap LLM providers with existing CB + retry, provider health registry, selection policy skips open circuits, GET /health/providers endpoint - Run Steps Subcollection (gwi-o06): StepStore interface + Firestore/InMemory implementations, paginated step listing API endpoints - Distributed Rate Limiting (gwi-5a6): agent-queryable quota endpoint GET /tenants/:tenantId/quota with per-action rate limit status All endpoints return agent-actionable recommendation fields. 19 new tests pass (step-store + resilient-provider). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR introduces automated dependency management via Dependabot, SBOM generation, comprehensive strategic documentation (five new planning documents), new API endpoints for health monitoring, budget and quota tracking, run steps retrieval, LLM provider circuit breaker resilience with exponential backoff retry logic, and step-level storage abstraction with both in-memory and Firestore implementations. Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client/Agent
participant Resilient as Resilient Provider<br/>(Wrapper)
participant CircuitBreaker as Circuit Breaker
participant RetryHandler as Exponential Backoff<br/>Retry Handler
participant OriginalProvider as Original LLM<br/>Provider
participant HealthRegistry as Health Registry &<br/>Failure Tracking
Client->>Resilient: completeJson(request)
Resilient->>CircuitBreaker: execute(providerKey, retryFn)
alt Circuit OPEN
CircuitBreaker-->>Resilient: reject with circuit open
Resilient->>HealthRegistry: recordProviderFailure()
Resilient-->>Client: error
else Circuit CLOSED or HALF_OPEN
CircuitBreaker->>RetryHandler: retry(fn)
loop Retry Logic (up to maxRetries)
RetryHandler->>OriginalProvider: call provider
alt Retryable Error (429, 5xx, network)
OriginalProvider-->>RetryHandler: error
RetryHandler->>HealthRegistry: recordProviderFailure()
RetryHandler->>RetryHandler: exponential backoff + jitter
else Success
OriginalProvider-->>RetryHandler: response
RetryHandler-->>CircuitBreaker: return response
else Non-Retryable Error
OriginalProvider-->>RetryHandler: error
RetryHandler->>HealthRegistry: recordProviderFailure()
RetryHandler-->>CircuitBreaker: rethrow error
end
end
CircuitBreaker-->>Resilient: final result or error
Resilient-->>Client: response or error
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Comment |
Review Summary by QodoScale & Ops Maturity — Agent-first infrastructure with circuit breakers, step subcollections, and cost-aware APIs
WalkthroughsDescription• Circuit breaker for LLM providers with health registry and selection policy integration • Run steps subcollection with Firestore/in-memory stores and paginated API endpoints • Budget and rate limit quota endpoints for agent-aware cost and reliability decisions • Dependabot configuration with SBOM generation for automated dependency management • VPC connector enabled in production and budget alerts via Pub/Sub • 19 new tests covering step store and resilient provider implementations • Strategic documentation on Automaton integration, engineering risks, and agent network vision Diagramflowchart LR
LLM["LLM Providers"]
CB["Circuit Breaker<br/>Health Registry"]
SP["Selection Policy<br/>Skip Open Circuits"]
Runs["Run Document"]
Steps["Steps Subcollection<br/>Firestore/InMemory"]
StepAPI["GET /steps<br/>GET /steps/:id"]
Tenant["Tenant Store"]
Budget["Budget API<br/>GET /budget"]
Quota["Quota API<br/>GET /quota"]
Deps["Dependabot<br/>gwi-triage label"]
SBOM["SBOM Generation<br/>CycloneDX"]
LLM --> CB
CB --> SP
SP -.->|skip unhealthy| LLM
Runs --> Steps
Steps --> StepAPI
Tenant --> Budget
Tenant --> Quota
Deps --> SBOM
Budget -.->|recommendation| SP
Quota -.->|recommendation| SP
File Changes1. apps/api/src/index.ts
|
🤖 Gemini 2.5 Flash Code ReviewFiles Reviewed: 15 SummaryThis PR introduces significant "agent-first" infrastructure for scaling and operational maturity, including Dependabot configuration, SBOM generation, new API endpoints for LLM provider health, budget, and rate limits, and a new Critical Issues 🔴
Suggestions 🟡
Good Practices ✅
Powered by Gemini 2.5 Flash |
Summary of ChangesHello @jeremylongshore, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly advances the platform's 'agent-first' infrastructure by introducing several key features focused on scalability, operational maturity, and autonomous agent decision-making. The changes enable agents to make more intelligent, cost-aware, and reliability-aware choices by providing them with real-time data on budget usage, LLM provider health, and rate limits. Additionally, it lays foundational documentation for the broader agent ecosystem and strategic roadmap. Highlights
Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This is a substantial pull request that introduces several significant features under the umbrella of "Scale & Ops Maturity." The changes, including the addition of Dependabot for automated dependency management, GCP budget alerts, a circuit breaker for LLM providers, and the refactoring of run steps into a subcollection, are well-implemented and significantly enhance the project's robustness, observability, and operational maturity. The new API endpoints are thoughtfully designed to be agent-actionable. My review focuses on a few minor areas where maintainability can be further improved by reducing hardcoded values and configuration duplication.
| # Docker — apps/api | ||
| - package-ecosystem: docker | ||
| directory: "/apps/api" | ||
| schedule: | ||
| interval: weekly | ||
| labels: ["dependencies", "docker", "gwi-triage"] | ||
| commit-message: | ||
| prefix: "chore(deps)" | ||
|
|
||
| # Docker — apps/gateway | ||
| - package-ecosystem: docker | ||
| directory: "/apps/gateway" | ||
| schedule: | ||
| interval: weekly | ||
| labels: ["dependencies", "docker", "gwi-triage"] | ||
| commit-message: | ||
| prefix: "chore(deps)" | ||
|
|
||
| # Docker — apps/worker | ||
| - package-ecosystem: docker | ||
| directory: "/apps/worker" | ||
| schedule: | ||
| interval: weekly | ||
| labels: ["dependencies", "docker", "gwi-triage"] | ||
| commit-message: | ||
| prefix: "chore(deps)" | ||
|
|
||
| # Docker — apps/github-webhook | ||
| - package-ecosystem: docker | ||
| directory: "/apps/github-webhook" | ||
| schedule: | ||
| interval: weekly | ||
| labels: ["dependencies", "docker", "gwi-triage"] | ||
| commit-message: | ||
| prefix: "chore(deps)" | ||
|
|
||
| # Docker — apps/webhook-receiver | ||
| - package-ecosystem: docker | ||
| directory: "/apps/webhook-receiver" | ||
| schedule: | ||
| interval: weekly | ||
| labels: ["dependencies", "docker", "gwi-triage"] | ||
| commit-message: | ||
| prefix: "chore(deps)" |
There was a problem hiding this comment.
The configuration for the Docker package ecosystem is repeated for each application (apps/api, apps/gateway, etc.). This repetition makes the file harder to maintain, as any change to the schedule or labels needs to be applied in multiple places.
To improve maintainability and reduce duplication, you can use YAML anchors. Define a common configuration block once and then reuse it for each Docker entry.
Here's an example of how you could refactor it:
# Define a reusable config block at the top of your file
x-docker-dependabot-config: &docker-config
package-ecosystem: docker
schedule:
interval: weekly
labels: ["dependencies", "docker", "gwi-triage"]
commit-message:
prefix: "chore(deps)"
updates:
# ... (npm and other configs)
# Docker — apps/api
- <<: *docker-config
directory: "/apps/api"
# Docker — apps/gateway
- <<: *docker-config
directory: "/apps/gateway"
# ... and so on for other Docker services| const status = await meteringService.getPlanUsageStatus(tenantId); | ||
| runsUsed = status.plan.run_limit - status.runs_remaining; | ||
| // Rough cost estimate: $0.10 per run average | ||
| estimatedCostCents = runsUsed * 10; |
There was a problem hiding this comment.
The value 10 used for the rough cost estimate is a magic number. It appears again on lines 101 and 104. Using a named constant would improve readability and make it easier to update this value in the future.
I suggest defining a constant at the top of the file:
const ROUGH_COST_PER_RUN_CENTS = 10;And then using it here and in the other locations:
estimatedCostCents = runsUsed * ROUGH_COST_PER_RUN_CENTS;| */ | ||
| export function getProviderHealth(): ProviderHealthReport { | ||
| const cb = getLLMCircuitBreaker(); | ||
| const providerTypes: LLMProviderType[] = ['anthropic', 'google', 'openai', 'vertex', 'openai_compat']; |
There was a problem hiding this comment.
The list of providerTypes is hardcoded. This can lead to maintenance issues, as new providers added to the system won't be automatically included in health reports. To make this more robust and extensible, you should dynamically retrieve the list of providers from the providerRegistry.
You'll need to import it first:
import { providerRegistry } from './index.js';Then you can use it to get the list of providers.
| const providerTypes: LLMProviderType[] = ['anthropic', 'google', 'openai', 'vertex', 'openai_compat']; | |
| const providerTypes: LLMProviderType[] = providerRegistry.list(); |
|
| Filename | Overview |
|---|---|
| infra/budget.tf | Adds GCP budget alerts with Pub/Sub notifications for cost management |
| packages/core/src/llm/resilient-provider.ts | Implements circuit breaker and retry logic for LLM providers; circuit breaker config looks good but test coverage could verify actual circuit trips |
| packages/core/src/llm/tests/resilient-provider.test.ts | Tests for resilient provider wrapper; good coverage of retry logic and health registry, missing actual circuit trip scenarios |
| packages/core/src/storage/firestore-step.ts | Implements Firestore-backed step store with cursor-based pagination |
| apps/api/src/routes/budget.ts | Budget API route with agent recommendations; not yet integrated into main API index.ts |
| apps/api/src/routes/quota.ts | Rate limit quota API route with agent recommendations; not yet integrated into main API index.ts |
| apps/api/src/index.ts | Adds /health/providers endpoint and step subcollection routes; missing integration of budget and quota routes |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Agent/API Request] --> B{LLM Provider Call}
B --> C[ResilientProvider Wrapper]
C --> D{Circuit Breaker Check}
D -->|Circuit Open| E[Fast Fail]
D -->|Circuit Closed/Half-Open| F[Retry Handler]
F --> G{Retryable Error?}
G -->|Yes 429/5xx| H[Exponential Backoff]
H --> F
G -->|No| I[Propagate Error]
F -->|Success| J[Update Health Registry]
F -->|Max Retries| K[Trip Circuit]
K --> L[Record Provider Failure]
M[Agent Budget Check] --> N[GET /tenants/:id/budget]
N --> O[Tenant Store]
O --> P{Budget Status}
P -->|< 80%| Q[Recommendation: normal]
P -->|80-95%| R[Recommendation: conserve]
P -->|>= 95%| S[Recommendation: stop]
T[Agent Quota Check] --> U[GET /tenants/:id/quota]
U --> V[Rate Limiter]
V --> W{Quota Status}
W -->|Available| X[Recommendation: normal]
W -->|Low| Y[Recommendation: throttle]
Z[Run Step Query] --> AA[GET /runs/:id/steps]
AA --> AB[StepStore Subcollection]
AB --> AC[Firestore/InMemory]
AC --> AD[Paginated Results]
AE[Dependabot PR] --> AF[gwi-triage Label]
AF --> AG[Auto-Score Agent]
AH[Budget Alert] --> AI[GCP Billing Budget]
AI --> AJ[Pub/Sub Topic]
AJ --> AK[Agent Subscriber]
Last reviewed commit: a4a8739
| /** | ||
| * Budget API Route — Scale & Ops Maturity (gwi-keq) | ||
| * | ||
| * Agent-first: Agents query budget status to make cost-aware decisions | ||
| * (e.g., switch to cheaper models when budget is 80% consumed). | ||
| * | ||
| * GET /tenants/:tenantId/budget — Returns current spend vs limit with | ||
| * an agent-actionable recommendation field. | ||
| * | ||
| * @module @gwi/api/routes/budget | ||
| */ | ||
|
|
||
| import { Router } from 'express'; | ||
| import { | ||
| expressRequireAuth, | ||
| expressRequirePermission, | ||
| type RBACRequest, | ||
| } from '@gwi/core'; | ||
| import { createLogger } from '@gwi/core'; | ||
|
|
||
| const logger = createLogger('budget-routes'); | ||
|
|
||
| /** | ||
| * Determine agent-actionable recommendation based on budget usage | ||
| */ | ||
| function getRecommendation(percentUsed: number): 'normal' | 'conserve' | 'stop' { | ||
| if (percentUsed >= 95) return 'stop'; | ||
| if (percentUsed >= 80) return 'conserve'; | ||
| return 'normal'; | ||
| } | ||
|
|
||
| /** | ||
| * Plan tier run limits (matches tenant plan limits) | ||
| */ | ||
| const PLAN_RUN_LIMITS: Record<string, number> = { | ||
| free: 50, | ||
| team: 500, | ||
| pro: 2000, | ||
| enterprise: 10000, | ||
| }; | ||
|
|
||
| const PLAN_BUDGET_CENTS: Record<string, number> = { | ||
| free: 0, | ||
| team: 5000, // $50 | ||
| pro: 20000, // $200 | ||
| enterprise: 100000, // $1000 | ||
| }; | ||
|
|
||
| /** | ||
| * Create budget API routes | ||
| */ | ||
| export function createBudgetRoutes(): Router { | ||
| const router = Router(); | ||
|
|
||
| /** | ||
| * GET /tenants/:tenantId/budget | ||
| * | ||
| * Returns current budget status with agent-actionable recommendation. | ||
| * Agents use this to decide whether to use cheaper models or stop. | ||
| */ | ||
| router.get( | ||
| '/tenants/:tenantId/budget', | ||
| expressRequireAuth(), | ||
| expressRequirePermission('tenant:read'), | ||
| async (req: RBACRequest, res) => { | ||
| const { tenantId } = req.params; | ||
|
|
||
| try { | ||
| const { getTenantStore, getMeteringService, isMeteringEnabled } = await import('@gwi/core'); | ||
|
|
||
| const tenantStore = getTenantStore(); | ||
| const tenant = await tenantStore.getTenant(tenantId); | ||
|
|
||
| if (!tenant) { | ||
| return res.status(404).json({ error: 'Tenant not found' }); | ||
| } | ||
|
|
||
| const plan = tenant.plan || 'free'; | ||
| const runsLimit = tenant.planLimits?.runsPerMonth ?? PLAN_RUN_LIMITS[plan] ?? 50; | ||
| const budgetLimitCents = PLAN_BUDGET_CENTS[plan] ?? 0; | ||
|
|
||
| // Get current period key (YYYY-MM) | ||
| const now = new Date(); | ||
| const period = `${now.getFullYear()}-${String(now.getMonth() + 1).padStart(2, '0')}`; | ||
|
|
||
| // Get runs used this month | ||
| let runsUsed = tenant.runsThisMonth ?? 0; | ||
| let estimatedCostCents = 0; | ||
|
|
||
| // If metering is enabled, use actual usage data | ||
| if (isMeteringEnabled()) { | ||
| try { | ||
| const meteringService = getMeteringService(); | ||
| meteringService.setTenantPlan(tenantId, plan); | ||
| const status = await meteringService.getPlanUsageStatus(tenantId); | ||
| runsUsed = status.plan.run_limit - status.runs_remaining; | ||
| // Rough cost estimate: $0.10 per run average | ||
| estimatedCostCents = runsUsed * 10; | ||
| } catch { | ||
| // Fall back to tenant.runsThisMonth | ||
| estimatedCostCents = runsUsed * 10; | ||
| } | ||
| } else { | ||
| estimatedCostCents = runsUsed * 10; | ||
| } | ||
|
|
||
| const percentUsed = budgetLimitCents > 0 | ||
| ? Math.min(100, Math.round((estimatedCostCents / budgetLimitCents) * 100)) | ||
| : (runsLimit > 0 ? Math.min(100, Math.round((runsUsed / runsLimit) * 100)) : 0); | ||
|
|
||
| const recommendation = getRecommendation(percentUsed); | ||
|
|
||
| logger.info('Budget status queried', { | ||
| eventName: 'budget.query', | ||
| tenantId, | ||
| plan, | ||
| percentUsed, | ||
| recommendation, | ||
| }); | ||
|
|
||
| res.json({ | ||
| tenantId, | ||
| plan, | ||
| period, | ||
| runsUsed, | ||
| runsLimit, | ||
| estimatedCostCents, | ||
| budgetLimitCents, | ||
| percentUsed, | ||
| recommendation, | ||
| }); | ||
| } catch (error) { | ||
| logger.error('Failed to get budget status', { | ||
| eventName: 'budget.error', | ||
| tenantId, | ||
| error: error instanceof Error ? error.message : 'Unknown error', | ||
| }); | ||
| res.status(500).json({ | ||
| error: 'Failed to get budget status', | ||
| message: error instanceof Error ? error.message : 'Unknown error', | ||
| }); | ||
| } | ||
| }, | ||
| ); | ||
|
|
||
| return router; | ||
| } |
There was a problem hiding this comment.
Budget route created but not integrated into apps/api/src/index.ts. Need to import createBudgetRoutes() and mount it with app.use() to make the /tenants/:tenantId/budget endpoint accessible.
| /** | ||
| * Rate Limit Quota API Route — Scale & Ops Maturity (gwi-5a6) | ||
| * | ||
| * Agent-first: Per-agent rate limits with metering. Agents authenticate, | ||
| * get rate-limited based on their tier. Agents can query their own rate | ||
| * limit status and remaining quota via this endpoint. | ||
| * | ||
| * GET /tenants/:tenantId/quota — Returns rate limit status per action | ||
| * with agent-actionable recommendation. | ||
| * | ||
| * Distinct from /quotas (resource quotas) — this is about rate limiting. | ||
| * | ||
| * @module @gwi/api/routes/quota | ||
| */ | ||
|
|
||
| import { Router } from 'express'; | ||
| import { | ||
| expressRequireAuth, | ||
| expressRequirePermission, | ||
| type RBACRequest, | ||
| } from '@gwi/core'; | ||
| import { | ||
| getRateLimiter, | ||
| DEFAULT_RATE_LIMITS, | ||
| type RateLimitResult, | ||
| } from '@gwi/core'; | ||
| import { createLogger } from '@gwi/core'; | ||
|
|
||
| const logger = createLogger('quota-routes'); | ||
|
|
||
| /** | ||
| * Agent-queryable rate limit status for a single action | ||
| */ | ||
| interface ActionQuotaStatus { | ||
| limit: number; | ||
| remaining: number; | ||
| resetsAt: string; | ||
| } | ||
|
|
||
| /** | ||
| * Determine agent-actionable recommendation | ||
| */ | ||
| function getRecommendation( | ||
| statuses: Record<string, ActionQuotaStatus>, | ||
| ): 'normal' | 'throttle' { | ||
| for (const status of Object.values(statuses)) { | ||
| if (status.remaining <= 0) return 'throttle'; | ||
| const percentUsed = ((status.limit - status.remaining) / status.limit) * 100; | ||
| if (percentUsed >= 80) return 'throttle'; | ||
| } | ||
| return 'normal'; | ||
| } | ||
|
|
||
| /** | ||
| * Create rate limit quota API routes | ||
| */ | ||
| export function createQuotaRateLimitRoutes(): Router { | ||
| const router = Router(); | ||
|
|
||
| /** | ||
| * GET /tenants/:tenantId/quota | ||
| * | ||
| * Returns current rate limit status per action for the tenant. | ||
| * Agent-actionable: includes recommendation field. | ||
| */ | ||
| router.get( | ||
| '/tenants/:tenantId/quota', | ||
| expressRequireAuth(), | ||
| expressRequirePermission('tenant:read'), | ||
| async (req: RBACRequest, res) => { | ||
| const { tenantId } = req.params; | ||
|
|
||
| try { | ||
| const { getTenantStore } = await import('@gwi/core'); | ||
| const tenantStore = getTenantStore(); | ||
| const tenant = await tenantStore.getTenant(tenantId); | ||
|
|
||
| if (!tenant) { | ||
| return res.status(404).json({ error: 'Tenant not found' }); | ||
| } | ||
|
|
||
| const limiter = getRateLimiter(); | ||
| const plan = tenant.plan || 'free'; | ||
|
|
||
| // Check key actions | ||
| const actions = ['run:create', 'api:read', 'signal:create', 'candidate:generate', 'api:write']; | ||
| const limits: Record<string, ActionQuotaStatus> = {}; | ||
|
|
||
| for (const action of actions) { | ||
| const config = DEFAULT_RATE_LIMITS[action]; | ||
| if (!config) continue; | ||
|
|
||
| const status: RateLimitResult = await limiter.status(tenantId, action); | ||
| const resetTime = new Date(Date.now() + status.resetInMs); | ||
|
|
||
| limits[action] = { | ||
| limit: status.limit, | ||
| remaining: status.remaining, | ||
| resetsAt: resetTime.toISOString(), | ||
| }; | ||
| } | ||
|
|
||
| const recommendation = getRecommendation(limits); | ||
|
|
||
| logger.info('Rate limit quota queried', { | ||
| eventName: 'quota.ratelimit.query', | ||
| tenantId, | ||
| plan, | ||
| recommendation, | ||
| }); | ||
|
|
||
| res.json({ | ||
| limits, | ||
| plan, | ||
| recommendation, | ||
| }); | ||
| } catch (error) { | ||
| logger.error('Failed to get rate limit quota', { | ||
| eventName: 'quota.ratelimit.error', | ||
| tenantId, | ||
| error: error instanceof Error ? error.message : 'Unknown error', | ||
| }); | ||
| res.status(500).json({ | ||
| error: 'Failed to get rate limit quota', | ||
| message: error instanceof Error ? error.message : 'Unknown error', | ||
| }); | ||
| } | ||
| }, | ||
| ); | ||
|
|
||
| return router; | ||
| } |
There was a problem hiding this comment.
Quota route created but not integrated into apps/api/src/index.ts. Need to import createQuotaRateLimitRoutes() and mount it with app.use() to make the /tenants/:tenantId/quota endpoint accessible.
| describe('Resilient Provider Wrapper', () => { | ||
| beforeEach(() => { | ||
| resetLLMCircuitBreaker(); | ||
| }); | ||
|
|
||
| it('passes through successful calls', async () => { | ||
| const mock = createMockProvider(); | ||
| const resilient = wrapWithResilience(mock, { maxRetries: 1 }); | ||
|
|
||
| const result = await resilient.completeJson(dummyRequest); | ||
| expect(result.json).toEqual({ result: 'ok' }); | ||
| expect(mock.callCount).toBe(1); | ||
| }); | ||
|
|
||
| it('preserves provider type and name', () => { | ||
| const mock = createMockProvider(); | ||
| const resilient = wrapWithResilience(mock); | ||
|
|
||
| expect(resilient.type).toBe('anthropic'); | ||
| expect(resilient.name).toBe('mock-anthropic'); | ||
| expect(resilient.getModel()).toBe('claude-sonnet-4-20250514'); | ||
| expect(resilient.isAvailable()).toBe(true); | ||
| }); | ||
|
|
||
| it('retries on retryable errors', async () => { | ||
| // Fails first 2 calls, succeeds on 3rd | ||
| let attempt = 0; | ||
| const mock = createMockProvider(); | ||
| const originalCompleteJson = mock.completeJson.bind(mock); | ||
| mock.completeJson = async (req) => { | ||
| attempt++; | ||
| if (attempt <= 2) { | ||
| throw new Error('Service unavailable (status 503)'); | ||
| } | ||
| return originalCompleteJson(req); | ||
| }; | ||
|
|
||
| const resilient = wrapWithResilience(mock, { maxRetries: 3, initialDelayMs: 10 }); | ||
| const result = await resilient.completeJson(dummyRequest); | ||
| expect(result.json).toEqual({ result: 'ok' }); | ||
| }); | ||
|
|
||
| it('does not retry on non-retryable errors', async () => { | ||
| let attempts = 0; | ||
| const mock = createMockProvider(); | ||
| mock.completeJson = async () => { | ||
| attempts++; | ||
| throw new Error('Invalid API key (status 401)'); | ||
| }; | ||
|
|
||
| const resilient = wrapWithResilience(mock, { maxRetries: 3, initialDelayMs: 10 }); | ||
| await expect(resilient.completeJson(dummyRequest)).rejects.toThrow('Invalid API key'); | ||
| expect(attempts).toBe(1); | ||
| }); | ||
|
|
||
| it('works with completeText too', async () => { | ||
| const mock = createMockProvider(); | ||
| const resilient = wrapWithResilience(mock, { maxRetries: 1 }); | ||
|
|
||
| const result = await resilient.completeText(dummyTextRequest); | ||
| expect(result.text).toBe('Hello'); | ||
| }); | ||
| }); |
There was a problem hiding this comment.
Missing test for actual circuit breaker trip behavior. The current tests verify retry logic and health registry state, but don't test that after N consecutive failures (per circuit breaker config), subsequent calls fail fast without attempting the provider call.
| description = "GCP Billing Account ID (required if enable_budget_alerts = true)" | ||
| type = string | ||
| default = "" | ||
| } |
There was a problem hiding this comment.
The billing_account_id variable should be marked as sensitive since it's account infrastructure metadata.
| } | |
| sensitive = true |
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Code Review by Qodo
1. limit/cursor not Zod-validated
|
| const { tenantId, runId } = req.params; | ||
| const limit = parseInt(req.query.limit as string) || 50; | ||
| const cursor = req.query.cursor as string | undefined; | ||
|
|
There was a problem hiding this comment.
1. limit/cursor not zod-validated 📘 Rule violation ⛨ Security
The new steps listing endpoint consumes query parameters without Zod schema validation or bounded constraints. This enables unbounded limit values and malformed cursors, risking performance and reliability issues.
Agent Prompt
## Issue description
The endpoint `GET /tenants/:tenantId/runs/:runId/steps` parses `limit` and `cursor` directly from the querystring and uses them without Zod validation or bounded constraints.
## Issue Context
Compliance requires all external inputs be validated with Zod and have reasonable bounds (e.g., `limit` max 100).
## Fix Focus Areas
- apps/api/src/index.ts[1785-1803]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| if (isMeteringEnabled()) { | ||
| try { | ||
| const meteringService = getMeteringService(); | ||
| meteringService.setTenantPlan(tenantId, plan); | ||
| const status = await meteringService.getPlanUsageStatus(tenantId); | ||
| runsUsed = status.plan.run_limit - status.runs_remaining; | ||
| // Rough cost estimate: $0.10 per run average | ||
| estimatedCostCents = runsUsed * 10; | ||
| } catch { | ||
| // Fall back to tenant.runsThisMonth | ||
| estimatedCostCents = runsUsed * 10; | ||
| } |
There was a problem hiding this comment.
2. meteringservice error swallowed 📘 Rule violation ⛯ Reliability
Budget usage retrieval swallows metering exceptions without logging or contextual error details. This makes production debugging and incident forensics difficult and can hide recurring metering failures.
Agent Prompt
## Issue description
The metering lookup in the budget endpoint catches errors with a bare `catch {}` and does not log anything. This is a swallowed exception and removes necessary context for debugging.
## Issue Context
The endpoint is production-facing and the metering layer is an external dependency/failure point. Failures should be observable (structured log) and still gracefully degrade to the fallback behavior.
## Fix Focus Areas
- apps/api/src/routes/budget.ts[91-105]
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| export function createBudgetRoutes(): Router { | ||
| const router = Router(); | ||
|
|
||
| /** | ||
| * GET /tenants/:tenantId/budget | ||
| * | ||
| * Returns current budget status with agent-actionable recommendation. | ||
| * Agents use this to decide whether to use cheaper models or stop. | ||
| */ | ||
| router.get( | ||
| '/tenants/:tenantId/budget', | ||
| expressRequireAuth(), | ||
| expressRequirePermission('tenant:read'), | ||
| async (req: RBACRequest, res) => { | ||
| const { tenantId } = req.params; |
There was a problem hiding this comment.
3. Budget/quota routes unintegrated 🐞 Bug ✓ Correctness
Budget/quota endpoints are implemented as Router factories using RBAC middleware that depends on req.context, but apps/api’s entrypoint uses a different per-route auth + tenant scoping chain; as a result these endpoints are likely unreachable or will 401/403 even if mounted.
Agent Prompt
## Issue description
`createBudgetRoutes()` / `createQuotaRateLimitRoutes()` are implemented with `expressRequireAuth()` / `expressRequirePermission()`, but the API app primarily uses `authMiddleware` + `tenantAuthMiddleware` + `requirePermission()` to populate `req.context` and scope access. Without explicit mounting and consistent middleware usage, the endpoints may be unreachable or always return auth errors.
## Issue Context
- apps/api runtime entrypoint is `dist/index.js` built from `apps/api/src/index.ts`.
- Existing tenant-scoped endpoints use `authMiddleware` and `tenantAuthMiddleware`.
- RBAC middleware in `@gwi/core` extracts identity from `req.context`.
## Fix Focus Areas
- apps/api/src/routes/budget.ts[52-66]
- apps/api/src/routes/quota.ts[57-70]
- apps/api/src/index.ts[1092-1133]
- packages/core/src/security/rbac.ts[456-466]
## Suggested approach
1. In `apps/api/src/index.ts`, add route registration for budget/quota endpoints using the same middleware chain as other `/tenants/:tenantId/*` routes.
2. Alternatively, change `createBudgetRoutes` / `createQuotaRateLimitRoutes` to accept middleware dependencies (or expose a shared auth/tenant middleware) so they behave consistently.
3. Add a minimal integration test hitting `/tenants/:tenantId/budget` and `/tenants/:tenantId/quota` with an authenticated user context to ensure they return 200/403 appropriately.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const plan = tenant.plan || 'free'; | ||
| const runsLimit = tenant.planLimits?.runsPerMonth ?? PLAN_RUN_LIMITS[plan] ?? 50; | ||
| const budgetLimitCents = PLAN_BUDGET_CENTS[plan] ?? 0; | ||
|
|
||
| // Get current period key (YYYY-MM) | ||
| const now = new Date(); | ||
| const period = `${now.getFullYear()}-${String(now.getMonth() + 1).padStart(2, '0')}`; | ||
|
|
||
| // Get runs used this month | ||
| let runsUsed = tenant.runsThisMonth ?? 0; | ||
| let estimatedCostCents = 0; | ||
|
|
||
| // If metering is enabled, use actual usage data | ||
| if (isMeteringEnabled()) { | ||
| try { | ||
| const meteringService = getMeteringService(); | ||
| meteringService.setTenantPlan(tenantId, plan); | ||
| const status = await meteringService.getPlanUsageStatus(tenantId); | ||
| runsUsed = status.plan.run_limit - status.runs_remaining; | ||
| // Rough cost estimate: $0.10 per run average | ||
| estimatedCostCents = runsUsed * 10; | ||
| } catch { |
There was a problem hiding this comment.
4. Budget plan id mismatch 🐞 Bug ✓ Correctness
The budget endpoint sets the metering plan using tenant.plan values (free/team/pro/enterprise), but metering only recognizes free/starter/professional/enterprise; this can silently fall back to free and produce incorrect runsUsed/percentUsed and recommendations.
Agent Prompt
## Issue description
`apps/api/src/routes/budget.ts` passes `tenant.plan` directly to `meteringService.setTenantPlan()`, but metering plan IDs differ from tenant plan tiers (`team`/`pro` vs `starter`/`professional`). This can make metering-based budget calculations incorrect.
## Issue Context
- Tenant plan tiers: `free | team | pro | enterprise`
- Metering plan IDs: `free | starter | professional | enterprise`
- A mapping function already exists in `packages/core/src/billing/metering-bridge.ts`.
## Fix Focus Areas
- apps/api/src/routes/budget.ts[78-99]
- packages/core/src/metering/types.ts[221-269]
- packages/core/src/tenants/lifecycle.ts[81-99]
- packages/core/src/billing/metering-bridge.ts[231-243]
## Suggested approach
1. Add a small mapping in the budget route (or import a shared mapper) such as:
- free -> free
- team -> starter
- pro -> professional
- enterprise -> enterprise
2. Call `meteringService.setTenantPlan(tenantId, mappedPlanId)`.
3. Add a unit test for the mapping (team/pro) to avoid regressions.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| // Check circuit breaker health (gwi-d1k: skip providers with open circuits) | ||
| if (!isProviderHealthy(provider)) { | ||
| reasons.push(`Excluded: circuit breaker OPEN for ${provider}`); | ||
| return null; | ||
| } |
There was a problem hiding this comment.
5. Circuit breaker is a no-op 🐞 Bug ⛯ Reliability
Selection now excludes providers with OPEN circuits, but provider calls are never wrapped with the new circuit-breaker+retry wrapper; therefore circuits won’t open and /health/providers will always report default states in real usage.
Agent Prompt
## Issue description
Circuit breaker health is checked during provider selection, but provider calls are not wrapped with the new resilience layer, so circuits never transition to OPEN in real traffic.
## Issue Context
- `ProviderSelectionPolicy` now calls `isProviderHealthy()`.
- Circuit breaker state/metrics are only updated when `getLLMCircuitBreaker().execute(...)` is used (currently in `wrapWithResilience`).
- `getLLMProvider()` returns raw providers without wrapping.
## Fix Focus Areas
- packages/core/src/llm/index.ts[220-258]
- packages/core/src/llm/resilient-provider.ts[49-90]
- packages/core/src/llm/selection-policy.ts[336-340]
- packages/core/src/llm/provider-health.ts[116-143]
## Suggested approach
1. In `getLLMProvider()`, after factory creation + availability check, return `wrapWithResilience(provider)`.
2. Add/adjust tests to ensure a failing provider trips the circuit and `isProviderHealthy()` returns false.
3. (Optional) Update `getProviderHealth()` to return `recommendation: null` when all providers are OPEN, or to only recommend providers with state != 'open'.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
| const limit = opts?.limit ?? 100; | ||
| let query = this.stepsCollection(runId).orderBy('startedAt').limit(limit + 1); | ||
|
|
||
| if (opts?.cursor) { | ||
| const cursorDoc = await this.stepsCollection(runId).doc(opts.cursor).get(); | ||
| if (cursorDoc.exists) { | ||
| query = query.startAfter(cursorDoc); | ||
| } | ||
| } |
There was a problem hiding this comment.
6. Step pagination inconsistent 🐞 Bug ⛯ Reliability
StepStore pagination/order differs between Firestore (orderBy startedAt) and in-memory (insertion order), and Firestore pagination uses cursor doc snapshots while startedAt is optional—this can lead to inconsistent ordering across environments or brittle cursor behavior.
Agent Prompt
## Issue description
`FirestoreStepStore` orders by `startedAt` and uses doc-snapshot cursors, while `InMemoryStepStore` doesn’t sort at all. Since `startedAt` is optional, ordering and pagination can diverge or become brittle.
## Issue Context
- Firestore uses `orderBy('startedAt')`.
- In-memory uses Map insertion order.
- `RunStep.startedAt` and `StepDoc.startedAt` are optional.
## Fix Focus Areas
- packages/core/src/storage/firestore-step.ts[114-122]
- packages/core/src/storage/inmemory.ts[1091-1110]
- packages/core/src/storage/interfaces.ts[114-126]
- packages/core/src/storage/firestore-step.ts[31-43]
## Suggested approach
1. Decide on a required ordering field (e.g., `startedAt` becomes required at creation time, or add `createdAt` that is always set).
2. Update both Firestore and InMemory implementations to sort by the same keys.
3. Consider adding a tie-breaker (e.g., also orderBy document ID) for deterministic pagination.
4. Add tests that assert ordering + cursor pagination stability.
ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
There was a problem hiding this comment.
Actionable comments posted: 9
🧹 Nitpick comments (1)
000-docs/242-PP-RMAP-automaton-integration-strategy.md (1)
23-30: Consider adding language specifiers to fenced code blocks.The ASCII architecture diagrams lack language specifiers, which triggers markdownlint MD040 warnings. Adding
textorplaintextwould satisfy linters and improve rendering consistency.📝 Suggested fix
-``` +```text Conway Cloud └── Automaton (standalone)Apply similarly to the diagrams at lines 54 and 89.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@000-docs/242-PP-RMAP-automaton-integration-strategy.md` around lines 23 - 30, The fenced ASCII diagrams (e.g., the block showing "Conway Cloud └── Automaton (standalone) ├── SOUL.md ...") should include a language specifier to satisfy markdownlint MD040; update those triple-backtick fences around that diagram and the other similar diagrams (the ones later in the doc) to use ```text or ```plaintext so linters and renderers treat them as plain text.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@apps/api/src/index.ts`:
- Around line 1785-1810: Add Zod input validation and bounds to the run-steps
endpoints: define Zod schemas that validate and coerce the path params
(tenantId, runId) and query params (limit and cursor) with min/max constraints
(e.g. limit coerced to number with a max like 100 and a sensible min like 1,
cursor optional string) and apply them at the start of the handlers for the
'/tenants/:tenantId/runs/:runId/steps' list route (where getEngine() and
stepStore.listSteps are used) and the corresponding single-step route; replace
the current direct parseInt(req.query.limit) and unchecked req.params access
with validated/coerced values and return 400 on validation failure before
calling eng.getRun or stepStore methods.
- Around line 687-704: The /health/providers route is currently public; protect
it by applying the existing internalOnlyMiddleware to the route so only internal
callers can access provider circuit-breaker data. Update the app.get handler
that calls core.getProviderHealth() to include internalOnlyMiddleware (similar
to the /metrics route) before the async handler (or alternatively wrap the
handler with authMiddleware/internalOnlyMiddleware), ensuring the same
middleware symbol name internalOnlyMiddleware is imported/available in the file
and used on the app.get('/health/providers', ...) registration.
In `@apps/api/src/routes/budget.ts`:
- Around line 13-31: Add a bounded Zod schema to validate req.params.tenantId
before calling getTenant, setTenantPlan, or getPlanUsageStatus: import z from
'zod', define a schema (e.g., const tenantIdSchema = z.string().min(1).max(64)
or z.string().uuid() depending on expected format) and call
tenantIdSchema.safeParse(req.params.tenantId); if validation fails return
res.status(400).json({ error: 'Invalid tenantId' }); otherwise use the parsed
value (e.g., parsed.tenantId) when calling getTenant, setTenantPlan, and
getPlanUsageStatus so no raw external input is passed directly to those
functions.
In `@apps/api/src/routes/quota.ts`:
- Around line 43-49: getRecommendation currently computes percentUsed by
dividing by status.limit which can be 0/undefined and yield NaN/Infinity causing
false throttles; update getRecommendation to first guard that status.limit is a
positive number (e.g., if (!status.limit || status.limit <= 0) continue) before
computing percentUsed, so the remaining check still applies but percentage logic
only runs when limit > 0; reference the function getRecommendation and the
fields ActionQuotaStatus.limit and .remaining when making this change.
- Around line 105-110: The current logger.info call ('Rate limit quota queried')
in apps/api/src/routes/quota.ts is missing required A2A audit fields; update the
handler (the route function that calls logger.info) to emit an audit-style log
entry including timestamp (new Date().toISOString()), spiffeId (from request
context/agent auth), taskId (request or generated trace id), operation (e.g.,
'quota.query' or 'quota.ratelimit.query'), and duration (compute using a start
time captured at handler entry and end time before logging), either by extending
the existing logger.info payload with these keys or by emitting a separate audit
logger call with those fields alongside tenantId, plan, and recommendation so
the endpoint conforms to the A2A audit requirements.
- Around line 16-40: Validate req.params.tenantId with a bounded Zod schema
before any storage or rate-limiter calls: define a z.string().min(1).max(64) (or
project-bound) schema and parse tenantId at the top of the route handler,
returning HTTP 400 on parse failure; then use the validated value (not raw
req.params.tenantId) when calling getTenantStore().getTenant(tenantId) and
limiter.status(tenantId, action). Ensure both the tenant lookup and rate-limiter
usage reference the validated variable and add tests for invalid tenantId cases.
In `@infra/budget.tf`:
- Around line 9-31: There are duplicate Terraform declarations for variables
enable_budget_alerts, billing_account_id, budget_amount_usd,
budget_alert_thresholds and the resource google_pubsub_topic.budget_alerts;
remove the duplicates by consolidating the variable definitions into a single
shared variables file (e.g., variables.tf) and keep only one declaration of
google_pubsub_topic.budget_alerts (or move it to the intended monitoring/budget
module), then update any references to those symbols to point to the single
source of truth so OpenTofu can plan/apply without duplicate resource/variable
errors.
In `@packages/core/src/llm/__tests__/resilient-provider.test.ts`:
- Around line 11-19: The test's mock provider currently hard-codes model names;
update the mock used in resilient-provider.test.ts (the provider passed into
wrapWithResilience) to read its model identifier from a test configuration or
environment variable instead of using literal strings, e.g., import or reference
a shared TEST_MODEL constant or process.env.TEST_MODEL (fallback to a single
default for CI) and use that variable everywhere the mock returns or checks
model names so all occurrences (including the areas corresponding to the earlier
mock sections and later assertions) remain in sync with environment/model
rotation.
In `@packages/core/src/llm/provider-health.ts`:
- Around line 116-143: getProviderHealth currently iterates a hard-coded
providerTypes array which can include providers not registered (e.g., "vertex")
and may recommend an OPEN circuit; change it to obtain the provider list from
the provider registry (use the registry accessor used where providers are
registered) instead of the hard-coded LLMProviderType array, call
buildProviderStatus(cb, pt) for each registered provider, compute the stateScore
as before but exclude providers whose status.state === 'open' from being
considered for recommendation, and keep the rest of the scoring logic so
recommendation only points to a registered, non-open provider; ensure variables
bestProvider and providers still populate correctly and timestamp remains.
---
Nitpick comments:
In `@000-docs/242-PP-RMAP-automaton-integration-strategy.md`:
- Around line 23-30: The fenced ASCII diagrams (e.g., the block showing "Conway
Cloud └── Automaton (standalone) ├── SOUL.md ...") should include a language
specifier to satisfy markdownlint MD040; update those triple-backtick fences
around that diagram and the other similar diagrams (the ones later in the doc)
to use ```text or ```plaintext so linters and renderers treat them as plain
text.
| /** | ||
| * GET /health/providers — Circuit breaker state per LLM provider (gwi-d1k) | ||
| * | ||
| * Agent-queryable: returns which providers are healthy, degraded, or tripped. | ||
| * Agents check this before requesting runs to pick a healthy provider. | ||
| */ | ||
| app.get('/health/providers', async (_req, res) => { | ||
| try { | ||
| const core = await import('@gwi/core'); | ||
| const report = core.getProviderHealth(); | ||
| res.json(report); | ||
| } catch (error) { | ||
| res.status(500).json({ | ||
| error: 'Failed to get provider health', | ||
| message: error instanceof Error ? error.message : 'Unknown error', | ||
| }); | ||
| } | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Search for documentation about /health/providers endpoint
rg -n "health/providers" -g "*.md" -g "*.yaml" -g "*.yml" -g "*.json" -g "*.ts" -g "*.tsx"Repository: intent-solutions-io/iam-git-with-intent
Length of output: 262
🏁 Script executed:
# Check for middleware patterns and authentication in the same file
fd "index.ts" | xargs grep -l "app.get\|internalOnlyMiddleware\|authMiddleware" | head -5Repository: intent-solutions-io/iam-git-with-intent
Length of output: 184
🏁 Script executed:
# Look for openapi or swagger documentation
fd -e yaml -e yml -e json -e ts | xargs grep -l "openapi\|swagger\|paths.*health" | head -10Repository: intent-solutions-io/iam-git-with-intent
Length of output: 442
🏁 Script executed:
# Check how other health endpoints are implemented
rg -B5 -A5 "app.get.*health" apps/api/src/index.ts | head -50Repository: intent-solutions-io/iam-git-with-intent
Length of output: 507
🏁 Script executed:
# Check openapi.yaml for health endpoint documentation
cat apps/gateway/openapi.yaml | grep -A 10 -B 2 "health" | head -50Repository: intent-solutions-io/iam-git-with-intent
Length of output: 65
🏁 Script executed:
# Search for internalOnlyMiddleware definition and usage
rg -n "internalOnlyMiddleware\|authMiddleware" apps/api/src/index.ts | head -20Repository: intent-solutions-io/iam-git-with-intent
Length of output: 65
🏁 Script executed:
# Check if there are other health endpoints and how they're protected
rg -B 2 -A 8 "app\.get\('\/health" apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 179
🏁 Script executed:
# Look for SPIFFE ID checks or authentication patterns in the file
rg -n "SPIFFE\|auth\|authenticated\|internal" apps/api/src/index.ts | head -30Repository: intent-solutions-io/iam-git-with-intent
Length of output: 65
🏁 Script executed:
# Fix regex and search for health endpoints
rg -B 2 -A 8 "app.get.*health" apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 495
🏁 Script executed:
# Check if there's middleware setup at the top of the file
head -100 apps/api/src/index.ts | grep -n "middleware\|app.use\|Express"Repository: intent-solutions-io/iam-git-with-intent
Length of output: 118
🏁 Script executed:
# Search for middleware definitions anywhere in the project
fd -e ts -e tsx | xargs rg -l "internalOnlyMiddleware|authMiddleware" | head -10Repository: intent-solutions-io/iam-git-with-intent
Length of output: 135
🏁 Script executed:
# Check the broader context of the endpoint - is there any authentication setup?
sed -n '680,710p' apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 1027
🏁 Script executed:
# Look for any documentation or comments about who can access health endpoints
rg -B 5 "health" apps/api/src/index.ts | head -80Repository: intent-solutions-io/iam-git-with-intent
Length of output: 2105
🏁 Script executed:
# Check how internalOnlyMiddleware is defined and used elsewhere in the file
rg -A 10 "function internalOnlyMiddleware" apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 671
🏁 Script executed:
# Search for SPIFFE validation patterns in the codebase
rg -n "SPIFFE\|spiffe://" apps/api/src/index.ts | head -10Repository: intent-solutions-io/iam-git-with-intent
Length of output: 65
🏁 Script executed:
# Check if there are other agent-facing endpoints and how they're protected
rg -B 3 -A 3 "agent|Agent" apps/api/src/index.ts | grep -A 3 -B 3 "app\\.get\|middleware" | head -40Repository: intent-solutions-io/iam-git-with-intent
Length of output: 588
🏁 Script executed:
# Check complete internalOnlyMiddleware implementation
sed -n '730,760p' apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 1095
🏁 Script executed:
# Verify if authMiddleware or similar is used for agent endpoints
rg -B 2 "authMiddleware.*agent|Agent.*authMiddleware" apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 65
🏁 Script executed:
# Check what the `/tenants/:tenantId/runs/:runId/steps` endpoint looks like (found in results)
sed -n '1775,1843p' apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 2542
Gate /health/providers with internalOnlyMiddleware to prevent public exposure of operational health data.
This endpoint exposes circuit breaker states per LLM provider without authentication. Although intended for agents, it should be restricted to internal or authenticated access—similar to the /metrics endpoint in the same file, which uses internalOnlyMiddleware. Agent-facing endpoints elsewhere in the codebase (e.g., /tenants/:tenantId/runs/:runId/steps) are protected with authMiddleware.
-app.get('/health/providers', async (_req, res) => {
+app.get('/health/providers', internalOnlyMiddleware, async (_req, res) => {🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/index.ts` around lines 687 - 704, The /health/providers route is
currently public; protect it by applying the existing internalOnlyMiddleware to
the route so only internal callers can access provider circuit-breaker data.
Update the app.get handler that calls core.getProviderHealth() to include
internalOnlyMiddleware (similar to the /metrics route) before the async handler
(or alternatively wrap the handler with authMiddleware/internalOnlyMiddleware),
ensuring the same middleware symbol name internalOnlyMiddleware is
imported/available in the file and used on the app.get('/health/providers', ...)
registration.
| app.get('/tenants/:tenantId/runs/:runId/steps', authMiddleware, tenantAuthMiddleware, requirePermission('run:read'), async (req, res) => { | ||
| const { tenantId, runId } = req.params; | ||
| const limit = parseInt(req.query.limit as string) || 50; | ||
| const cursor = req.query.cursor as string | undefined; | ||
|
|
||
| try { | ||
| // Verify run belongs to tenant | ||
| const eng = await getEngine(); | ||
| const run = await eng.getRun(tenantId, runId); | ||
| if (!run) { | ||
| return res.status(404).json({ error: 'Run not found', runId }); | ||
| } | ||
|
|
||
| // Use step store (Firestore subcollection in prod, in-memory in dev) | ||
| const { getStepStore } = await import('@gwi/core'); | ||
| const stepStore = getStepStore(); | ||
| const result = await stepStore.listSteps(runId, { limit, cursor }); | ||
|
|
||
| res.json({ ...result, source: 'subcollection' }); | ||
| } catch (error) { | ||
| console.error('Failed to list steps:', error); | ||
| res.status(500).json({ | ||
| error: 'Failed to list steps', | ||
| message: error instanceof Error ? error.message : 'Unknown error', | ||
| }); | ||
| } |
There was a problem hiding this comment.
Add Zod validation and bounds for the run-steps endpoints.
limit/cursor (and path params) are accepted directly; without bounds this allows unbounded reads and violates the input-validation rule. Suggest Zod schemas with min/max and coercion, applied to both list and single-step routes.
✅ Proposed validation (list + single-step)
app.get('/tenants/:tenantId/runs/:runId/steps', authMiddleware, tenantAuthMiddleware, requirePermission('run:read'), async (req, res) => {
- const { tenantId, runId } = req.params;
- const limit = parseInt(req.query.limit as string) || 50;
- const cursor = req.query.cursor as string | undefined;
+ const paramsSchema = z.object({
+ tenantId: z.string().min(1),
+ runId: z.string().min(1),
+ });
+ const querySchema = z.object({
+ limit: z.coerce.number().int().min(1).max(100).default(50),
+ cursor: z.string().min(1).optional(),
+ });
+ const paramsParse = paramsSchema.safeParse(req.params);
+ const queryParse = querySchema.safeParse(req.query);
+ if (!paramsParse.success || !queryParse.success) {
+ return res.status(400).json({
+ error: 'Invalid request parameters',
+ details: [...(paramsParse.error?.errors ?? []), ...(queryParse.error?.errors ?? [])],
+ });
+ }
+ const { tenantId, runId } = paramsParse.data;
+ const { limit, cursor } = queryParse.data;
@@
app.get('/tenants/:tenantId/runs/:runId/steps/:stepId', authMiddleware, tenantAuthMiddleware, requirePermission('run:read'), async (req, res) => {
- const { tenantId, runId, stepId } = req.params;
+ const paramsSchema = z.object({
+ tenantId: z.string().min(1),
+ runId: z.string().min(1),
+ stepId: z.string().min(1),
+ });
+ const paramsParse = paramsSchema.safeParse(req.params);
+ if (!paramsParse.success) {
+ return res.status(400).json({
+ error: 'Invalid request parameters',
+ details: paramsParse.error.errors,
+ });
+ }
+ const { tenantId, runId, stepId } = paramsParse.data;As per coding guidelines: “Use Zod schemas for validation of all external data” and “Use Zod schemas with bounded constraints for input validation”.
Also applies to: 1816-1842
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/index.ts` around lines 1785 - 1810, Add Zod input validation and
bounds to the run-steps endpoints: define Zod schemas that validate and coerce
the path params (tenantId, runId) and query params (limit and cursor) with
min/max constraints (e.g. limit coerced to number with a max like 100 and a
sensible min like 1, cursor optional string) and apply them at the start of the
handlers for the '/tenants/:tenantId/runs/:runId/steps' list route (where
getEngine() and stepStore.listSteps are used) and the corresponding single-step
route; replace the current direct parseInt(req.query.limit) and unchecked
req.params access with validated/coerced values and return 400 on validation
failure before calling eng.getRun or stepStore methods.
| import { Router } from 'express'; | ||
| import { | ||
| expressRequireAuth, | ||
| expressRequirePermission, | ||
| type RBACRequest, | ||
| } from '@gwi/core'; | ||
| import { createLogger } from '@gwi/core'; | ||
|
|
||
| const logger = createLogger('budget-routes'); | ||
|
|
||
| /** | ||
| * Determine agent-actionable recommendation based on budget usage | ||
| */ | ||
| function getRecommendation(percentUsed: number): 'normal' | 'conserve' | 'stop' { | ||
| if (percentUsed >= 95) return 'stop'; | ||
| if (percentUsed >= 80) return 'conserve'; | ||
| return 'normal'; | ||
| } | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, locate and read the budget.ts file
find . -name "budget.ts" -type f | head -5Repository: intent-solutions-io/iam-git-with-intent
Length of output: 114
🏁 Script executed:
# Search for tenantId usage in the codebase
rg -n "tenantId" --type ts --type tsx -B2 -A2 | head -100Repository: intent-solutions-io/iam-git-with-intent
Length of output: 110
🏁 Script executed:
# Search for existing Zod schema patterns and validation
rg -n "z\.string\(\)|z\.object\(|safeParse|zod" --type ts --type tsx apps/api/src -C2 | head -150Repository: intent-solutions-io/iam-git-with-intent
Length of output: 110
🏁 Script executed:
# Check for existing tenantId schemas or validation helpers
rg -n "TenantId|tenant.*schema|tenant.*validation" --type ts --type tsx -i | head -50Repository: intent-solutions-io/iam-git-with-intent
Length of output: 110
🏁 Script executed:
# Read the budget.ts file to check actual code
wc -l ./apps/api/src/routes/budget.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 118
🏁 Script executed:
# Read the budget.ts file
cat -n ./apps/api/src/routes/budget.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 5422
🏁 Script executed:
# Search for tenantId usage without tsx type
rg -n "tenantId" ./apps/api/src -B2 -A2 | head -150Repository: intent-solutions-io/iam-git-with-intent
Length of output: 8460
🏁 Script executed:
# Search for existing Zod patterns without tsx restriction
rg -n "z\.string\(\)|z\.object\(|safeParse|from 'zod'|import.*zod" ./apps/api/src -B1 -A1 | head -100Repository: intent-solutions-io/iam-git-with-intent
Length of output: 6144
Validate tenantId with a bounded Zod schema before use.
req.params.tenantId is external input passed directly to getTenant(), setTenantPlan(), and getPlanUsageStatus() without validation. Add a bounded Zod schema and return 400 on invalid input.
🛡️ Suggested fix
import { Router } from 'express';
+import { z } from 'zod';
import {
expressRequireAuth,
expressRequirePermission,
type RBACRequest,
} from '@gwi/core';
import { createLogger } from '@gwi/core';
const logger = createLogger('budget-routes');
+
+const TenantIdSchema = z.string().min(1).max(128);
/**
* Determine agent-actionable recommendation based on budget usage
*/
@@ -63,6 +67,11 @@ export function createBudgetRoutes(): Router {
expressRequirePermission('tenant:read'),
async (req: RBACRequest, res) => {
- const { tenantId } = req.params;
+ const tenantIdResult = TenantIdSchema.safeParse(req.params.tenantId);
+ if (!tenantIdResult.success) {
+ return res.status(400).json({ error: 'Invalid tenantId' });
+ }
+ const tenantId = tenantIdResult.data;
try {Per coding guidelines: "Use Zod schemas for validation of all external data" and "Use Zod schemas with bounded constraints for input validation".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/budget.ts` around lines 13 - 31, Add a bounded Zod schema
to validate req.params.tenantId before calling getTenant, setTenantPlan, or
getPlanUsageStatus: import z from 'zod', define a schema (e.g., const
tenantIdSchema = z.string().min(1).max(64) or z.string().uuid() depending on
expected format) and call tenantIdSchema.safeParse(req.params.tenantId); if
validation fails return res.status(400).json({ error: 'Invalid tenantId' });
otherwise use the parsed value (e.g., parsed.tenantId) when calling getTenant,
setTenantPlan, and getPlanUsageStatus so no raw external input is passed
directly to those functions.
| import { Router } from 'express'; | ||
| import { | ||
| expressRequireAuth, | ||
| expressRequirePermission, | ||
| type RBACRequest, | ||
| } from '@gwi/core'; | ||
| import { | ||
| getRateLimiter, | ||
| DEFAULT_RATE_LIMITS, | ||
| type RateLimitResult, | ||
| } from '@gwi/core'; | ||
| import { createLogger } from '@gwi/core'; | ||
|
|
||
| const logger = createLogger('quota-routes'); | ||
|
|
||
| /** | ||
| * Agent-queryable rate limit status for a single action | ||
| */ | ||
| interface ActionQuotaStatus { | ||
| limit: number; | ||
| remaining: number; | ||
| resetsAt: string; | ||
| } | ||
|
|
||
| /** |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check the actual file and look for tenantId validation
echo "=== Reading quota.ts file ==="
wc -l apps/api/src/routes/quota.ts
head -80 apps/api/src/routes/quota.ts
echo ""
echo "=== Search for tenantId usage patterns ==="
rg "tenantId" apps/api/src/routes/quota.ts -n -B2 -A2
echo ""
echo "=== Search for existing Zod schemas in routes directory ==="
rg "z\.(string|object|number)" apps/api/src/routes -g '*.ts' -n | head -20
echo ""
echo "=== Search for validation patterns in api ==="
rg "safeParse|parse|validate" apps/api/src/routes -g '*.ts' -n | head -20Repository: intent-solutions-io/iam-git-with-intent
Length of output: 7090
🏁 Script executed:
#!/bin/bash
echo "=== Search for existing tenantId schema or validation definitions ==="
rg -n "tenantId.*Schema|TenantId.*z\." apps/ -g '*.ts' -B2 -A2
echo ""
echo "=== Search for common tenant validation patterns ==="
rg -n "z\.string\(\).*max\(" apps/ -g '*.ts' | grep -i tenant
echo ""
echo "=== Check if `@gwi/core` exports tenantId validation ==="
rg -n "export.*tenant|TenantId" apps/ -g '*.ts' | head -15Repository: intent-solutions-io/iam-git-with-intent
Length of output: 1654
🏁 Script executed:
#!/bin/bash
echo "=== Check lines 66-72 in quota.ts for the second issue ==="
sed -n '60,80p' apps/api/src/routes/quota.ts
echo ""
echo "=== Search for how other routes validate params (not body) ==="
rg -n "req\.params" apps/api/src/routes -g '*.ts' -B3 -A3 | head -50Repository: intent-solutions-io/iam-git-with-intent
Length of output: 3849
🏁 Script executed:
#!/bin/bash
echo "=== Check if there are any middleware that validate tenantId globally ==="
rg -n "middleware|Middleware" apps/api/src/routes/quota.ts -i
echo ""
echo "=== Verify that no validation happens before tenantId use ==="
sed -n '66,90p' apps/api/src/routes/quota.ts
echo ""
echo "=== Check how quotas.ts handles tenantId for comparison ==="
rg -n "req\.params\.tenantId" apps/api/src/routes/quotas.ts -B2 -A5 | head -30Repository: intent-solutions-io/iam-git-with-intent
Length of output: 1140
🏁 Script executed:
#!/bin/bash
echo "=== Read the quotas.ts file to see full validation pattern ==="
sed -n '1,150p' apps/api/src/routes/quotas.ts | head -100
echo ""
echo "=== Check if quotas.ts validates tenantId in its routes ==="
rg -n "tenantId" apps/api/src/routes/quotas.ts -B4 -A4 | head -40Repository: intent-solutions-io/iam-git-with-intent
Length of output: 3936
🏁 Script executed:
#!/bin/bash
echo "=== Check core package for tenant validation or size constraints ==="
find . -path '*/node_modules' -prune -o -name '@gwi' -type d -print 2>/dev/null | head -5
echo ""
echo "=== Look for typical tenant ID length constraints in docs or config ==="
rg -n "tenant.*size|tenant.*length|tenant.*max|tenant.*constraint" --type-list | head -10
find . -path '*/node_modules' -prune -o -name '*.md' -type f -exec grep -l "tenant.*id\|tenantId" {} \; 2>/dev/null | head -5
echo ""
echo "=== Check if there's any validation middleware in the API setup ==="
sed -n '1,100p' apps/api/src/index.tsRepository: intent-solutions-io/iam-git-with-intent
Length of output: 4161
🏁 Script executed:
#!/bin/bash
echo "=== Final verification: confirm tenantId is used in storage/limiter without prior validation ==="
sed -n '66,100p' apps/api/src/routes/quota.ts | grep -n "tenantId\|getTenant\|limiter"Repository: intent-solutions-io/iam-git-with-intent
Length of output: 562
Add Zod validation for tenantId before storage queries.
req.params.tenantId is external input used directly in getTenantStore().getTenant(tenantId) and limiter.status(tenantId, action) without validation. Validate with a bounded Zod schema and return 400 on invalid input.
🛡️ Suggested fix
import { Router } from 'express';
+import { z } from 'zod';
import {
expressRequireAuth,
expressRequirePermission,
type RBACRequest,
} from '@gwi/core';
@@
interface ActionQuotaStatus {
limit: number;
remaining: number;
resetsAt: string;
}
+
+const TenantIdSchema = z.string().min(1).max(128);
@@
- const { tenantId } = req.params;
+ const tenantIdResult = TenantIdSchema.safeParse(req.params.tenantId);
+ if (!tenantIdResult.success) {
+ return res.status(400).json({ error: 'Invalid tenantId' });
+ }
+ const tenantId = tenantIdResult.data;Aligns with "Use Zod schemas for validation of all external data" and "Use Zod schemas with bounded constraints for input validation".
Also applies to: 66-72
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/quota.ts` around lines 16 - 40, Validate
req.params.tenantId with a bounded Zod schema before any storage or rate-limiter
calls: define a z.string().min(1).max(64) (or project-bound) schema and parse
tenantId at the top of the route handler, returning HTTP 400 on parse failure;
then use the validated value (not raw req.params.tenantId) when calling
getTenantStore().getTenant(tenantId) and limiter.status(tenantId, action).
Ensure both the tenant lookup and rate-limiter usage reference the validated
variable and add tests for invalid tenantId cases.
| function getRecommendation( | ||
| statuses: Record<string, ActionQuotaStatus>, | ||
| ): 'normal' | 'throttle' { | ||
| for (const status of Object.values(statuses)) { | ||
| if (status.remaining <= 0) return 'throttle'; | ||
| const percentUsed = ((status.limit - status.remaining) / status.limit) * 100; | ||
| if (percentUsed >= 80) return 'throttle'; |
There was a problem hiding this comment.
Guard against zero/undefined limits to avoid NaN/Infinity throttle.
percentUsed divides by status.limit; if the limiter returns 0 (unlimited or misconfigured), the calculation becomes Infinity/NaN and forces throttle. Add a non‑positive limit guard before the calculation.
🛠️ Suggested fix
- for (const status of Object.values(statuses)) {
- if (status.remaining <= 0) return 'throttle';
- const percentUsed = ((status.limit - status.remaining) / status.limit) * 100;
+ for (const status of Object.values(statuses)) {
+ if (status.limit <= 0) continue;
+ if (status.remaining <= 0) return 'throttle';
+ const percentUsed = ((status.limit - status.remaining) / status.limit) * 100;
if (percentUsed >= 80) return 'throttle';
}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/quota.ts` around lines 43 - 49, getRecommendation
currently computes percentUsed by dividing by status.limit which can be
0/undefined and yield NaN/Infinity causing false throttles; update
getRecommendation to first guard that status.limit is a positive number (e.g.,
if (!status.limit || status.limit <= 0) continue) before computing percentUsed,
so the remaining check still applies but percentage logic only runs when limit >
0; reference the function getRecommendation and the fields
ActionQuotaStatus.limit and .remaining when making this change.
| logger.info('Rate limit quota queried', { | ||
| eventName: 'quota.ratelimit.query', | ||
| tenantId, | ||
| plan, | ||
| recommendation, | ||
| }); |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's run the provided search pattern to find audit logging patterns
rg -n -C2 "audit|A2A|spiffe|taskId|operationType|duration" apps/api/src packages -g '*.ts' 2>&1 | head -100Repository: intent-solutions-io/iam-git-with-intent
Length of output: 5875
🏁 Script executed:
#!/bin/bash
# Let's examine the quota.ts file to understand the context
cat -n apps/api/src/routes/quota.ts 2>/dev/null | head -150Repository: intent-solutions-io/iam-git-with-intent
Length of output: 4678
🏁 Script executed:
#!/bin/bash
# Check if there's a middleware directory or logging utilities
find apps/api/src -type f -name '*middleware*' -o -name '*audit*' -o -name '*log*' | head -20Repository: intent-solutions-io/iam-git-with-intent
Length of output: 65
🏁 Script executed:
#!/bin/bash
# Search for the operation type definition and see if there's a global logging pattern
rg -n "operationType|logger\.(info|debug|audit)" apps/api/src/routes -A 2 -B 2 | head -150Repository: intent-solutions-io/iam-git-with-intent
Length of output: 4186
Add required A2A audit log fields for agent-facing quota endpoint.
The endpoint is explicitly agent-first (per the module comment), making this an A2A operation. The log entry at lines 105-110 lacks required audit fields: timestamp (ISO 8601), SPIFFE ID, task ID, operation type, and duration. Either emit a dedicated audit log entry with these fields or confirm a global middleware adds them for this endpoint.
Per coding guidelines: "Implement audit logging for all A2A operations with timestamp (ISO 8601), SPIFFE ID, task ID, operation type, and duration".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@apps/api/src/routes/quota.ts` around lines 105 - 110, The current logger.info
call ('Rate limit quota queried') in apps/api/src/routes/quota.ts is missing
required A2A audit fields; update the handler (the route function that calls
logger.info) to emit an audit-style log entry including timestamp (new
Date().toISOString()), spiffeId (from request context/agent auth), taskId
(request or generated trace id), operation (e.g., 'quota.query' or
'quota.ratelimit.query'), and duration (compute using a start time captured at
handler entry and end time before logging), either by extending the existing
logger.info payload with these keys or by emitting a separate audit logger call
with those fields alongside tenantId, plan, and recommendation so the endpoint
conforms to the A2A audit requirements.
| variable "enable_budget_alerts" { | ||
| description = "Enable GCP billing budget alerts" | ||
| type = bool | ||
| default = false | ||
| } | ||
|
|
||
| variable "budget_amount_usd" { | ||
| description = "Monthly budget amount in USD" | ||
| type = number | ||
| default = 500 | ||
| } | ||
|
|
||
| variable "budget_alert_thresholds" { | ||
| description = "Alert threshold percentages (0.0-1.0)" | ||
| type = list(number) | ||
| default = [0.5, 0.8, 0.9, 1.0] | ||
| } | ||
|
|
||
| variable "billing_account_id" { | ||
| description = "GCP Billing Account ID (required if enable_budget_alerts = true)" | ||
| type = string | ||
| default = "" | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, check if infra/budget.tf exists and get its line count
ls -lh infra/budget.tf
wc -l infra/budget.tf
# Find duplicate variable declarations
echo "=== Searching for duplicate: enable_budget_alerts ==="
rg -n 'variable "enable_budget_alerts"' infra -g '*.tf'
echo "=== Searching for duplicate: billing_account_id ==="
rg -n 'variable "billing_account_id"' infra -g '*.tf'
echo "=== Searching for duplicate resource: google_pubsub_topic budget_alerts ==="
rg -n 'resource "google_pubsub_topic" "budget_alerts"' infra -g '*.tf'
# List all .tf files in infra to understand structure
echo "=== All Terraform files in infra/ ==="
find infra -name '*.tf' -type fRepository: intent-solutions-io/iam-git-with-intent
Length of output: 1116
🏁 Script executed:
# Read the full infra/budget.tf file to see all content
cat -n infra/budget.tfRepository: intent-solutions-io/iam-git-with-intent
Length of output: 3587
🏁 Script executed:
# Read the duplicate definitions in monitoring.tf
sed -n '831,860p' infra/monitoring.tfRepository: intent-solutions-io/iam-git-with-intent
Length of output: 879
Resolve duplicate variable and resource declarations blocking OpenTofu plan.
Duplicates for enable_budget_alerts, billing_account_id, and google_pubsub_topic.budget_alerts are declared in both infra/monitoring.tf (lines 831, 837, 849) and infra/budget.tf (lines 9, 27, 37). These will prevent OpenTofu plan and apply from succeeding. Deduplicate by consolidating definitions into a single file, or if keeping both files, move the duplicates to the shared variables.tf and remove from here.
Also applies to: 37-45, 51-78
🧰 Tools
🪛 GitHub Actions: OpenTofu Plan
[error] 9-9: OpenTofu: Duplicate variable declaration for "enable_budget_alerts". A variable named "enable_budget_alerts" is defined multiple times within the module.
[error] 27-27: OpenTofu: Duplicate variable declaration for "billing_account_id". A variable named "billing_account_id" is defined multiple times within the module.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@infra/budget.tf` around lines 9 - 31, There are duplicate Terraform
declarations for variables enable_budget_alerts, billing_account_id,
budget_amount_usd, budget_alert_thresholds and the resource
google_pubsub_topic.budget_alerts; remove the duplicates by consolidating the
variable definitions into a single shared variables file (e.g., variables.tf)
and keep only one declaration of google_pubsub_topic.budget_alerts (or move it
to the intended monitoring/budget module), then update any references to those
symbols to point to the single source of truth so OpenTofu can plan/apply
without duplicate resource/variable errors.
| import { describe, it, expect, beforeEach } from 'vitest'; | ||
| import { wrapWithResilience } from '../resilient-provider.js'; | ||
| import { | ||
| getProviderHealth, | ||
| isProviderHealthy, | ||
| tripProvider, | ||
| resetProvider, | ||
| resetLLMCircuitBreaker, | ||
| } from '../provider-health.js'; |
There was a problem hiding this comment.
Avoid hard-coded model names in the mock provider.
Use a config-driven model identifier so tests stay aligned with environment/model rotation.
🔧 Suggested update
-import { wrapWithResilience } from '../resilient-provider.js';
+import { wrapWithResilience } from '../resilient-provider.js';
+import { getDefaultModel as getDefaultModelForProvider } from '../provider-discovery.js';
+const MODEL_ID = process.env.GWI_LLM_MODEL ?? getDefaultModelForProvider('anthropic');
function createMockProvider(opts?: {
failAfter?: number;
failWith?: Error;
}): LLMProvider & { callCount: number } {
let callCount = 0;
const failAfter = opts?.failAfter ?? Infinity;
const failWith = opts?.failWith ?? new Error('Provider error (status 500)');
@@
getModel() {
- return 'claude-sonnet-4-20250514';
+ return MODEL_ID;
},
@@
return {
json: { result: 'ok' },
raw: '{"result":"ok"}',
provider: 'anthropic',
- model: 'claude-sonnet-4-20250514',
+ model: MODEL_ID,
latencyMs: 100,
};
@@
return {
text: 'Hello',
provider: 'anthropic',
- model: 'claude-sonnet-4-20250514',
+ model: MODEL_ID,
latencyMs: 100,
};
@@
- expect(resilient.getModel()).toBe('claude-sonnet-4-20250514');
+ expect(resilient.getModel()).toBe(MODEL_ID);As per coding guidelines, "Do not hard-code model names; use configuration and environment variables instead".
Also applies to: 32-79, 147-154
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/llm/__tests__/resilient-provider.test.ts` around lines 11 -
19, The test's mock provider currently hard-codes model names; update the mock
used in resilient-provider.test.ts (the provider passed into wrapWithResilience)
to read its model identifier from a test configuration or environment variable
instead of using literal strings, e.g., import or reference a shared TEST_MODEL
constant or process.env.TEST_MODEL (fallback to a single default for CI) and use
that variable everywhere the mock returns or checks model names so all
occurrences (including the areas corresponding to the earlier mock sections and
later assertions) remain in sync with environment/model rotation.
| export function getProviderHealth(): ProviderHealthReport { | ||
| const cb = getLLMCircuitBreaker(); | ||
| const providerTypes: LLMProviderType[] = ['anthropic', 'google', 'openai', 'vertex', 'openai_compat']; | ||
|
|
||
| const providers: Partial<Record<LLMProviderType, ProviderHealthStatus>> = {}; | ||
| let bestProvider: LLMProviderType | null = null; | ||
| let bestScore = -1; | ||
|
|
||
| for (const pt of providerTypes) { | ||
| const status = buildProviderStatus(cb, pt); | ||
| providers[pt] = status; | ||
|
|
||
| // Score: closed = 100, half_open = 50, open = 0, minus failure rate | ||
| const stateScore = status.state === 'closed' ? 100 : status.state === 'half_open' ? 50 : 0; | ||
| const score = stateScore - status.failureRate; | ||
|
|
||
| if (score > bestScore) { | ||
| bestScore = score; | ||
| bestProvider = pt; | ||
| } | ||
| } | ||
|
|
||
| return { | ||
| providers, | ||
| recommendation: bestProvider, | ||
| timestamp: new Date().toISOString(), | ||
| }; | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's check the LLMProviderType definition
rg -n "type LLMProviderType|enum LLMProviderType" packages/core/src/llm/ -A 5
# Check provider-capabilities.ts for the full provider list
rg -n "PROVIDER_CAPABILITIES|export.*Provider" packages/core/src/llm/provider-capabilities.ts -A 3 | head -100
# Check what providers are registered in index.ts or similar
rg -n "register|supported.*provider|Provider.*list" packages/core/src/llm/index.ts -i -A 2 | head -50Repository: intent-solutions-io/iam-git-with-intent
Length of output: 3181
🏁 Script executed:
# Check if there's a getAllProviders function or how the registry list is accessed
rg -n "getAllProviders|registry\.list|vertex" packages/core/src/llm/ -A 2 | head -80
# Check where getProviderHealth is called
rg -n "getProviderHealth" packages/core/src/ -B 2 -A 2Repository: intent-solutions-io/iam-git-with-intent
Length of output: 6313
🏁 Script executed:
# Check if vertex should be registered or is intentionally excluded
rg -n "createVertexProvider" packages/core/src/llm/ -B 2 -A 2
# Check if the provider registry initialization should register vertex
git log --oneline -n 20 -- packages/core/src/llm/index.ts 2>/dev/null | head -10Repository: intent-solutions-io/iam-git-with-intent
Length of output: 1000
Recommendation can point to an unregistered provider.
The hard-coded provider list includes 'vertex', which exists in the LLMProviderType type definition but is not registered in the provider registry (only google, anthropic, openai, and openai_compat are registered at line 181-184 in index.ts). If getProviderHealth() recommends vertex, it will fail at runtime when agents attempt to use it. Additionally, the provider list should be derived from the registry to stay in sync.
For the OPEN state concern: while circuits in the open state score 0 after subtracting failure rate (which is negative if failures exist), it's still safer to explicitly skip them for recommendation to handle edge cases.
⚙️ Suggested fix
for (const pt of providerTypes) {
const status = buildProviderStatus(cb, pt);
providers[pt] = status;
+ if (status.state === 'open') {
+ continue;
+ }
- // Score: closed = 100, half_open = 50, open = 0, minus failure rate
- const stateScore = status.state === 'closed' ? 100 : status.state === 'half_open' ? 50 : 0;
+ // Score: closed = 100, half_open = 50, minus failure rate
+ const stateScore = status.state === 'closed' ? 100 : 50;
const score = stateScore - status.failureRate;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@packages/core/src/llm/provider-health.ts` around lines 116 - 143,
getProviderHealth currently iterates a hard-coded providerTypes array which can
include providers not registered (e.g., "vertex") and may recommend an OPEN
circuit; change it to obtain the provider list from the provider registry (use
the registry accessor used where providers are registered) instead of the
hard-coded LLMProviderType array, call buildProviderStatus(cb, pt) for each
registered provider, compute the stateScore as before but exclude providers
whose status.state === 'open' from being considered for recommendation, and keep
the rest of the scoring logic so recommendation only points to a registered,
non-open provider; ensure variables bestProvider and providers still populate
correctly and timestamp remains.
Release prepared by: jeremy Approval timestamp: 2026-02-20T00:21:16Z Approval SHA: 10989d0 Changes in this release: - Scale & Ops Maturity epic: agent-first infrastructure (#86) - Harness engineering hooks + README currency + BSL 1.1 (#85) - Code Factory gap closure: incident-to-harness + browser evidence (#84) Co-Authored-By: Claude <noreply@anthropic.com>
Summary
gwi-triagelabels for agent auto-scoring; CycloneDX SBOM generation on release viaanchore/sbom-actionGET /tenants/:tenantId/budgetwith recommendation field (normal/conserve/stop)CircuitBreaker+ExponentialBackoffRetryHandler, provider health registry singleton, selection policy skips providers with open circuits,GET /health/providersendpointStepStoreinterface + Firestore/InMemory implementations, cursor-based pagination,GET /tenants/:tenantId/runs/:runId/stepsand.../steps/:stepIdAPI endpointsGET /tenants/:tenantId/quotawith per-action rate limit status andrecommendationfieldAll new API endpoints return agent-actionable
recommendationfields so autonomous agents can make cost-aware and reliability-aware decisions.Test plan
npx tsc --noEmitpasses for@gwi/core,apps/api,apps/gatewaydependabot.yml,sbom.ymlvalidtofu fmt -checkpasses onbudget.tfnpm run arvfull gate pass/health/providers,/quota,/budgetendpoints return valid JSONgwi-triagelabel after merge🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Infrastructure
Documentation