Feat/api realtime position monitoring#326
Conversation
Both migrations referenced public.invisible_wallets, which is created in a separate migration stream that is not always present on a fresh local reset. Wrapping the ALTER TABLE statements in a table-existence check makes them idempotent so supabase start succeeds on a clean checkout.
New tables for the real-time position monitoring feature (Issue Galaxy-KJ#306, Roadmap Galaxy-KJ#53). Includes per-user RLS, a partial index over active alerts that drives the worker scan, and a retry index for failed deliveries. Triggers reuse the shared update_updated_at_column() function.
…abase client Foundation layer for the monitoring feature: domain types and typed errors, Joi schemas for alert CRUD, a reusable validation middleware, and a service-role Supabase singleton scoped to the api-rest package. The webhook URL scheme is https-only in production and relaxed to http in development so local smoke tests can target 127.0.0.1.
Persistence layer for monitoring_alerts and alert_events. Enforces ownership at the data boundary when a userId is provided; worker-only queries intentionally bypass that filter so the background process can scan every active alert. Includes a row-to-domain mapper to keep snake_case isolated to the repo. Tests cover CRUD success and error paths, ownership filtering, worker scan ordering, and event delivery state transitions.
MonitoringAlertService orchestrates HTTP-facing CRUD on top of the repository, enforces protocol and channel invariants, and translates persistence errors into typed MonitoringError instances mapped to HTTP status codes. AlertEvaluator is a pure decision function (alert + observed value + cooldown → trigger?) so the same rule is testable in isolation and reusable by future endpoints such as a 'test alert' simulator.
Strategy pattern for delivering alert events. WebhookChannel POSTs the canonical payload with an HMAC-SHA256 signature over '${timestamp}.${body}' so receivers can verify authenticity, and includes the event id as an idempotency key. Distinguishes retryable failures (network, 5xx, 408, 429) from terminal ones.
ChannelDispatcher routes to the right implementation by kind and computes the next retry timestamp with full-jitter exponential backoff capped at MAX_DELIVERY_ATTEMPTS.
PositionMonitorWorker runs as a separate process (npm scripts: worker, worker:dev) so polling never duplicates when the REST API is horizontally scaled. Two interval loops: evaluation (fetch alerts → query protocol → decide → persist event → dispatch) and a retry stub reserved for next iteration. ProtocolPool caches one IDefiProtocol client per (protocol, network) tuple for the worker's lifetime so we don't rebuild heavy RPC clients on every tick. Parses Blend's '∞' health factor as +Infinity so positions with no debt never trigger.
Wires the JWT-protected REST surface: POST, GET, GET:id, PATCH, DELETE on /monitoring/alerts and GET on /monitoring/alerts/:id/events. Maps MonitoringError instances to HTTP status codes consistently and applies the shared audit middleware so every mutation is logged. Closes the wiring for Issue Galaxy-KJ#306 — the API contract called out in the acceptance criteria is now live.
scratch/smoke.sh boots the receiver, the REST API and a fake-protocol worker, provisions a Supabase auth user, creates an alert, and asserts that the worker triggers the webhook with a valid HMAC signature and a delivered alert_event row. Knobs: FAKE_HEALTH_FACTOR, THRESHOLD, EVAL_INTERVAL_MS, KEEP_RUNNING. Cleans up all background processes on any exit path.
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
📝 WalkthroughWalkthroughAdds monitoring alert persistence, REST routes, validation, delivery dispatch, worker polling, and smoke-test tooling around a new alerting flow. A Supabase migration adds the underlying tables and policies, and the API package wires routes and a worker entrypoint for runtime use. ChangesMonitoring Alerts Feature
Sequence DiagramsequenceDiagram
participant Client
participant AlertsRouter
participant MonitoringAlertService
participant MonitoringAlertRepository
participant PositionMonitorWorker
participant ProtocolPool
participant ChannelDispatcher
Client->>AlertsRouter: POST /api/v1/monitoring/alerts
AlertsRouter->>MonitoringAlertService: create(userId, input)
MonitoringAlertService->>MonitoringAlertRepository: create(userId, input)
MonitoringAlertRepository-->>MonitoringAlertService: MonitoringAlert
MonitoringAlertService-->>AlertsRouter: MonitoringAlert
AlertsRouter-->>Client: 201 Created
loop evaluationTick interval
PositionMonitorWorker->>MonitoringAlertRepository: listActiveForEvaluation(network, batchSize)
MonitoringAlertRepository-->>PositionMonitorWorker: alerts[]
PositionMonitorWorker->>ProtocolPool: get(protocol, network).getHealthFactor(account)
ProtocolPool-->>PositionMonitorWorker: health factor
PositionMonitorWorker->>MonitoringAlertRepository: markEvaluated(id, healthFactor, didTrigger)
alt shouldTrigger
PositionMonitorWorker->>MonitoringAlertRepository: createEvent(alertId, payload, healthFactor)
PositionMonitorWorker->>ChannelDispatcher: dispatch(alert, payload)
ChannelDispatcher-->>PositionMonitorWorker: ChannelDeliveryResult
PositionMonitorWorker->>MonitoringAlertRepository: updateEventDelivery(id, patch)
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 3❌ Failed checks (3 warnings)
✅ Passed checks (2 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 ESLint
packages/api/rest/src/index.tsOops! Something went wrong! :( ESLint: 9.39.2 YAMLException: Cannot read config file: /.eslintrc.js.bak 1 | module.exports = { Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 16
🧹 Nitpick comments (1)
packages/api/rest/src/worker/__tests__/position-monitor.worker.test.ts (1)
172-178: 📐 Maintainability & Code Quality | 🔵 Trivial | ⚡ Quick winThis test does not verify interval cleanup.
Lines 172-178 only call
start()/stop(), so it still passes if the timer registration/cleanup logic disappears. Spy onsetInterval/clearIntervalor use fake timers and assert the call counts.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/api/rest/src/worker/__tests__/position-monitor.worker.test.ts` around lines 172 - 178, The idempotent start/stop test in position-monitor.worker.test.ts does not actually verify interval cleanup because it only calls worker.start() and worker.stop(). Update the test around buildWorker and the worker.start/worker.stop methods to spy on setInterval and clearInterval, or use fake timers, and assert the expected call counts so the test fails if timer registration or cleanup is removed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@packages/api/rest/src/routes/monitoring/alerts.ts`:
- Around line 135-145: The GET /alerts/:id/events handler is parsing limit and
offset manually, so invalid query values can become NaN or pass unchecked into
service.listEventsForUser. Add a Joi query schema for event-history pagination,
then wire it into this route with validate(..., 'query') the same way GET
/alerts does, so the handler rejects non-numeric, negative, or oversized values
before calling listEventsForUser.
In
`@packages/api/rest/src/services/monitoring/channels/__tests__/channel-dispatcher.test.ts`:
- Around line 81-85: The ChannelDispatcher retry timestamp test is flaky because
it compares computeNextRetry(1) against a later Date.now() even though the
method can return an equal timestamp when jitter is zero. Update the assertion
in channel-dispatcher.test.ts to use a stable time window (capture before/after
around the call) or mock the clock, and keep the check focused on
ChannelDispatcher.computeNextRetry rather than relying on live time.
In `@packages/api/rest/src/services/monitoring/channels/channel-dispatcher.ts`:
- Around line 28-38: `ChannelDispatcher.dispatch` currently returns directly
from `channel.send(...)`, which allows channel exceptions to escape the
dispatcher boundary; wrap that call in error handling so any thrown exception is
converted into a failed `ChannelDeliveryResult` with `success: false`,
`retryable` set appropriately, and an error message. Keep the existing
missing-channel path intact, and make sure the fix is localized in
`ChannelDispatcher.dispatch` so `triggerAlert()` in the worker always receives a
result instead of an exception.
In `@packages/api/rest/src/services/monitoring/channels/webhook-channel.ts`:
- Around line 52-60: The webhook request headers in webhook-channel.ts are
currently allowing config.headers to override reserved system headers, which can
break signature verification and event handling. Update the header merge in the
webhook send path so caller-provided headers are applied first and the reserved
headers such as Content-Type, X-Galaxy-Event, X-Galaxy-Event-Id,
X-Galaxy-Timestamp, and X-Galaxy-Signature are written afterward in the
webhook-channel logic. Keep the fix localized to the header construction around
the payload dispatch so these signed values always win.
In `@packages/api/rest/src/services/monitoring/monitoring-alert.service.ts`:
- Around line 54-68: The update() method in MonitoringAlertService currently
patches directly through repo.update() without re-validating channelConfig
against the alert’s stored channel. Load the existing alert first, merge it with
the incoming UpdateMonitoringAlertInput, and validate the resulting state before
writing; if the alert is missing, keep the existing ALERT_NOT_FOUND behavior.
Make the check happen in update() using the MonitoringAlertService and repo
methods so a PATCH cannot leave an incompatible channelConfig for the current
channel.
In `@packages/api/rest/src/services/monitoring/protocol-pool.ts`:
- Around line 28-33: ProtocolPool is injecting one global BLEND_ADDRESSES set
into both cached clients, so testnet and mainnet can point at the wrong
deployment. Update build() in ProtocolPool to choose addresses based on the
target network and pass a network-specific address map into the client creation
path. Keep the existing BLEND_* env fallbacks, but resolve them separately for
testnet and mainnet instead of sharing a single constant across both.
In `@packages/api/rest/src/types/monitoring-types.ts`:
- Around line 19-55: The read model is exposing sensitive webhook secrets
through MonitoringAlert.channelConfig because WebhookChannelConfig.secret is
returned unchanged. Update the monitoring types around WebhookChannelConfig,
ChannelConfig, and MonitoringAlert so persisted config can stay private while
API responses use a redacted DTO that omits or masks secret. Make the
create/list/update flow map from the stored config to the safe response shape
before returning it.
In `@packages/api/rest/src/validators/monitoring-validators.ts`:
- Around line 16-20: The webhook validation in webhookConfigSchema currently
allows any header key, which can let users override reserved delivery headers
later used by the dispatcher. Update the schema or validation logic in
monitoring-validators.ts to reject reserved header names case-insensitively for
config.headers, especially X-Galaxy-* and Content-Type, so the generated webhook
signature, timestamp, event id, and content type cannot be overwritten. If you
prefer to keep validation permissive, then adjust the webhook dispatch path that
uses config.headers so the fixed headers are applied last and remain immutable.
In `@packages/api/rest/src/worker/index.ts`:
- Around line 43-46: The shutdown flow in shutdown currently exits immediately
after worker.stop(), which can interrupt in-flight createEvent() / dispatch /
updateEventDelivery() work and leave event state inconsistent. Update the
shutdown handling to stop accepting new intervals, then await any active work or
the worker’s graceful completion before calling process.exit, using the existing
shutdown() and worker.stop() logic as the entry point.
- Around line 33-39: The `main` setup in `PositionMonitorWorker` is casting
`MONITOR_NETWORK` to `StellarNetworkName`, which allows invalid values to slip
through and default to the wrong network. Replace the cast with explicit
validation/parsing before constructing `PositionMonitorWorker`, and only pass a
confirmed `'mainnet'` or `'testnet'` value (fall back to the existing default
when the env var is missing or invalid). Use the `main` function and the
`PositionMonitorWorker` constructor as the places to update.
In `@packages/api/rest/src/worker/position-monitor.worker.ts`:
- Around line 96-100: The retryTick() method in position-monitor.worker.ts is
currently a no-op, so alerts marked retrying by triggerAlert() with
deliveryStatus and nextRetryAt never get redelivered. Update retryTick() to
query due retry events from alert_events based on nextRetryAt and retrying
status, then invoke the existing resend path used for initial delivery so
transient failures are retried instead of stalling forever.
- Around line 72-78: The current setInterval-based scheduling in
position-monitor.worker.ts can overlap async runs, causing duplicate processing
before markEvaluated() completes. Replace the timer callbacks around
evaluationTick() and retryTick() with non-overlapping execution guards or
self-scheduling loops that only queue the next run after the previous Promise
settles. Use the existing evaluationTick, retryTick, evaluationTimer, and
retryTimer logic to keep one in-flight cycle per loop and preserve the existing
error logging.
In `@scratch/setup-test-user.mjs`:
- Around line 6-10: Remove the baked-in Supabase JWT fallback values from
setup-test-user.mjs so the script does not ship hardcoded service-role or anon
credentials. Update SUPABASE_URL, SERVICE_KEY, and ANON_KEY handling to require
the corresponding environment variables and fail fast with a clear error if they
are missing, using the existing setup-test-user.mjs entrypoint logic. Apply the
same env-only credential handling pattern in scratch/smoke.sh so both scratch
tools are consistent.
In `@scratch/smoke.sh`:
- Around line 223-227: The smoke test expectation is out of sync with
PositionMonitorWorker.parseHealthFactor() because it only treats the unicode
infinity symbol as non-triggering. Update the health-factor check in
scratch/smoke.sh so it also recognizes the string "infinity" case-insensitively,
matching parseHealthFactor() behavior, and keep EXPECT_TRIGGER=0 for both
representations when verifying the webhook dispatch path.
- Around line 263-264: The smoke test is hardcoding the Supabase DB container
name in the EVENT_ROW lookup, which will break on clones/forks with different
local project slugs. Update the `scratch/smoke.sh` assertion to use the same
dynamic container naming/source as the rest of the script (or derive the DB
container name from the configured project slug) so the `docker exec` call works
regardless of repository name.
In `@supabase/migrations/20260629000000_monitoring_alerts.sql`:
- Around line 34-47: Add a dedicated event_id column to alert_events and make it
unique so AlertEventPayload.eventId is stored as a first-class idempotency key
instead of only inside payload JSONB. Update the monitoring_alerts migration and
the createEvent() insert path to populate event_id from payload.eventId, and
ensure any lookup/dedup logic references the new column; use the alert_events
and createEvent symbols to locate the affected schema and write path.
---
Nitpick comments:
In `@packages/api/rest/src/worker/__tests__/position-monitor.worker.test.ts`:
- Around line 172-178: The idempotent start/stop test in
position-monitor.worker.test.ts does not actually verify interval cleanup
because it only calls worker.start() and worker.stop(). Update the test around
buildWorker and the worker.start/worker.stop methods to spy on setInterval and
clearInterval, or use fake timers, and assert the expected call counts so the
test fails if timer registration or cleanup is removed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 9773452a-12ff-4619-9267-1cf7c72269ad
📒 Files selected for processing (31)
packages/api/rest/package.jsonpackages/api/rest/src/index.tspackages/api/rest/src/middleware/validate.tspackages/api/rest/src/repositories/__tests__/monitoring-alert.repository.test.tspackages/api/rest/src/repositories/monitoring-alert.repository.tspackages/api/rest/src/routes/monitoring/__tests__/alerts.routes.test.tspackages/api/rest/src/routes/monitoring/alerts.tspackages/api/rest/src/services/monitoring/__tests__/alert-evaluator.test.tspackages/api/rest/src/services/monitoring/__tests__/monitoring-alert.service.test.tspackages/api/rest/src/services/monitoring/__tests__/protocol-pool.test.tspackages/api/rest/src/services/monitoring/alert-evaluator.tspackages/api/rest/src/services/monitoring/channels/__tests__/channel-dispatcher.test.tspackages/api/rest/src/services/monitoring/channels/__tests__/webhook-channel.test.tspackages/api/rest/src/services/monitoring/channels/channel-dispatcher.tspackages/api/rest/src/services/monitoring/channels/notification-channel.tspackages/api/rest/src/services/monitoring/channels/webhook-channel.tspackages/api/rest/src/services/monitoring/monitoring-alert.service.tspackages/api/rest/src/services/monitoring/protocol-pool.tspackages/api/rest/src/types/monitoring-types.tspackages/api/rest/src/utils/supabase.tspackages/api/rest/src/validators/monitoring-validators.tspackages/api/rest/src/worker/__tests__/position-monitor.worker.test.tspackages/api/rest/src/worker/index.tspackages/api/rest/src/worker/position-monitor.worker.tsscratch/run-worker-with-fake-protocol.tsscratch/setup-test-user.mjsscratch/smoke.shscratch/webhook-receiver.mjssupabase/migrations/20260222090135_smart_wallets.sqlsupabase/migrations/20260324235408_drop_encrypted_private_key.sqlsupabase/migrations/20260629000000_monitoring_alerts.sql
| router.get( | ||
| '/alerts/:id/events', | ||
| validate(alertIdParamSchema, 'params'), | ||
| async (req, res, next) => { | ||
| const userId = requireUser(req, res); | ||
| if (!userId) return; | ||
| try { | ||
| const limit = req.query.limit ? Number(req.query.limit) : undefined; | ||
| const offset = req.query.offset ? Number(req.query.offset) : undefined; | ||
| const events = await service.listEventsForUser(req.params.id, userId, { limit, offset }); | ||
| res.json({ events }); |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Validate event-history pagination too.
This endpoint parses limit/offset manually instead of running them through Joi like GET /alerts. Non-numeric values become NaN, and negative or oversized values go straight into service.listEventsForUser(...) instead of failing fast with a 400. Please add a query schema for this route and reuse validate(..., 'query') here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/api/rest/src/routes/monitoring/alerts.ts` around lines 135 - 145,
The GET /alerts/:id/events handler is parsing limit and offset manually, so
invalid query values can become NaN or pass unchecked into
service.listEventsForUser. Add a Joi query schema for event-history pagination,
then wire it into this route with validate(..., 'query') the same way GET
/alerts does, so the handler rejects non-numeric, negative, or oversized values
before calling listEventsForUser.
| it('returns a future Date for in-range attempt counts', () => { | ||
| const dispatcher = new ChannelDispatcher([new FakeChannel('webhook')]); | ||
| const next = dispatcher.computeNextRetry(1); | ||
| expect(next).not.toBeNull(); | ||
| expect(next!.getTime()).toBeGreaterThanOrEqual(Date.now()); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win
Stabilize this retry timestamp assertion.
computeNextRetry(1) can legally return new Date(Date.now()) when jitter is 0, so comparing it to a later Date.now() makes the test flaky. Capture a before/after window or mock time instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In
`@packages/api/rest/src/services/monitoring/channels/__tests__/channel-dispatcher.test.ts`
around lines 81 - 85, The ChannelDispatcher retry timestamp test is flaky
because it compares computeNextRetry(1) against a later Date.now() even though
the method can return an equal timestamp when jitter is zero. Update the
assertion in channel-dispatcher.test.ts to use a stable time window (capture
before/after around the call) or mock the clock, and keep the check focused on
ChannelDispatcher.computeNextRetry rather than relying on live time.
| async dispatch(alert: MonitoringAlert, payload: AlertEventPayload): Promise<ChannelDeliveryResult> { | ||
| const channel = this.channels.get(alert.channel); | ||
| if (!channel) { | ||
| return { | ||
| success: false, | ||
| durationMs: 0, | ||
| retryable: false, | ||
| error: `No implementation registered for channel "${alert.channel}"`, | ||
| }; | ||
| } | ||
| return channel.send(alert, payload); |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | ⚡ Quick win
Catch channel exceptions inside the dispatcher boundary.
dispatch() currently lets channel.send() throw. In packages/api/rest/src/worker/position-monitor.worker.ts:128-177, triggerAlert() awaits this call before updating the event row, so one unexpected channel exception can skip persistence and abort processing of later alerts in the same tick. Convert thrown errors into a failed ChannelDeliveryResult here.
Suggested fix
async dispatch(alert: MonitoringAlert, payload: AlertEventPayload): Promise<ChannelDeliveryResult> {
const channel = this.channels.get(alert.channel);
if (!channel) {
return {
success: false,
durationMs: 0,
retryable: false,
error: `No implementation registered for channel "${alert.channel}"`,
};
}
- return channel.send(alert, payload);
+ const start = Date.now();
+ try {
+ return await channel.send(alert, payload);
+ } catch (err) {
+ return {
+ success: false,
+ durationMs: Date.now() - start,
+ retryable: true,
+ error: err instanceof Error ? err.message : String(err),
+ };
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async dispatch(alert: MonitoringAlert, payload: AlertEventPayload): Promise<ChannelDeliveryResult> { | |
| const channel = this.channels.get(alert.channel); | |
| if (!channel) { | |
| return { | |
| success: false, | |
| durationMs: 0, | |
| retryable: false, | |
| error: `No implementation registered for channel "${alert.channel}"`, | |
| }; | |
| } | |
| return channel.send(alert, payload); | |
| async dispatch(alert: MonitoringAlert, payload: AlertEventPayload): Promise<ChannelDeliveryResult> { | |
| const channel = this.channels.get(alert.channel); | |
| if (!channel) { | |
| return { | |
| success: false, | |
| durationMs: 0, | |
| retryable: false, | |
| error: `No implementation registered for channel "${alert.channel}"`, | |
| }; | |
| } | |
| const start = Date.now(); | |
| try { | |
| return await channel.send(alert, payload); | |
| } catch (err) { | |
| return { | |
| success: false, | |
| durationMs: Date.now() - start, | |
| retryable: true, | |
| error: err instanceof Error ? err.message : String(err), | |
| }; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/api/rest/src/services/monitoring/channels/channel-dispatcher.ts`
around lines 28 - 38, `ChannelDispatcher.dispatch` currently returns directly
from `channel.send(...)`, which allows channel exceptions to escape the
dispatcher boundary; wrap that call in error handling so any thrown exception is
converted into a failed `ChannelDeliveryResult` with `success: false`,
`retryable` set appropriately, and an error message. Keep the existing
missing-channel path intact, and make sure the fix is localized in
`ChannelDispatcher.dispatch` so `triggerAlert()` in the worker always receives a
result instead of an exception.
| headers: { | ||
| 'Content-Type': 'application/json', | ||
| 'User-Agent': 'GalaxyDevKit-Monitoring/1.0', | ||
| 'X-Galaxy-Event': 'monitoring.alert.triggered', | ||
| 'X-Galaxy-Event-Id': payload.eventId, | ||
| 'X-Galaxy-Timestamp': timestamp, | ||
| 'X-Galaxy-Signature': `sha256=${signature}`, | ||
| ...(config.headers ?? {}), | ||
| }, |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Do not let custom headers override the signed system headers.
Because config.headers is merged last, a user-configured webhook can replace X-Galaxy-Signature, X-Galaxy-Timestamp, X-Galaxy-Event-Id, or Content-Type. That breaks signature verification and idempotency downstream. Merge caller headers first, then write the reserved headers on top.
Suggested fix
const response = await this.fetchImpl(config.url, {
method: 'POST',
headers: {
+ ...(config.headers ?? {}),
'Content-Type': 'application/json',
'User-Agent': 'GalaxyDevKit-Monitoring/1.0',
'X-Galaxy-Event': 'monitoring.alert.triggered',
'X-Galaxy-Event-Id': payload.eventId,
'X-Galaxy-Timestamp': timestamp,
'X-Galaxy-Signature': `sha256=${signature}`,
- ...(config.headers ?? {}),
},
body,
signal: controller.signal,
});📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| headers: { | |
| 'Content-Type': 'application/json', | |
| 'User-Agent': 'GalaxyDevKit-Monitoring/1.0', | |
| 'X-Galaxy-Event': 'monitoring.alert.triggered', | |
| 'X-Galaxy-Event-Id': payload.eventId, | |
| 'X-Galaxy-Timestamp': timestamp, | |
| 'X-Galaxy-Signature': `sha256=${signature}`, | |
| ...(config.headers ?? {}), | |
| }, | |
| headers: { | |
| ...(config.headers ?? {}), | |
| 'Content-Type': 'application/json', | |
| 'User-Agent': 'GalaxyDevKit-Monitoring/1.0', | |
| 'X-Galaxy-Event': 'monitoring.alert.triggered', | |
| 'X-Galaxy-Event-Id': payload.eventId, | |
| 'X-Galaxy-Timestamp': timestamp, | |
| 'X-Galaxy-Signature': `sha256=${signature}`, | |
| }, |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/api/rest/src/services/monitoring/channels/webhook-channel.ts` around
lines 52 - 60, The webhook request headers in webhook-channel.ts are currently
allowing config.headers to override reserved system headers, which can break
signature verification and event handling. Update the header merge in the
webhook send path so caller-provided headers are applied first and the reserved
headers such as Content-Type, X-Galaxy-Event, X-Galaxy-Event-Id,
X-Galaxy-Timestamp, and X-Galaxy-Signature are written afterward in the
webhook-channel logic. Keep the fix localized to the header construction around
the payload dispatch so these signed values always win.
| async update( | ||
| id: string, | ||
| userId: string, | ||
| input: UpdateMonitoringAlertInput | ||
| ): Promise<MonitoringAlert> { | ||
| const updated = await this.repo.update(id, userId, input); | ||
| if (!updated) { | ||
| throw new MonitoringError( | ||
| MonitoringErrorCode.ALERT_NOT_FOUND, | ||
| 'Monitoring alert not found', | ||
| 404 | ||
| ); | ||
| } | ||
| return updated; | ||
| } |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win
Re-validate channelConfig on PATCH.
update() writes partial changes without checking that the new channelConfig still matches the alert's stored channel. Right now a webhook alert can be patched to { to: 'a@b.com' }, and the worker later treats that row as WebhookChannelConfig when signing/sending. Please load the current alert first and validate the merged state before calling repo.update().
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/api/rest/src/services/monitoring/monitoring-alert.service.ts` around
lines 54 - 68, The update() method in MonitoringAlertService currently patches
directly through repo.update() without re-validating channelConfig against the
alert’s stored channel. Load the existing alert first, merge it with the
incoming UpdateMonitoringAlertInput, and validate the resulting state before
writing; if the alert is missing, keep the existing ALERT_NOT_FOUND behavior.
Make the check happen in update() using the MonitoringAlertService and repo
methods so a PATCH cannot leave an incompatible channelConfig for the current
channel.
| async retryTick(): Promise<void> { | ||
| // Retry queue is read directly from alert_events. Future work: pull events | ||
| // due for retry and resend; intentionally lean for the first iteration so | ||
| // we don't ship untested retry-fanout logic before we observe real traffic. | ||
| } |
There was a problem hiding this comment.
🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift
retrying events never get retried.
triggerAlert() records deliveryStatus: 'retrying' plus nextRetryAt, but Lines 96-100 are a no-op. Any transient 5xx/timeout will stay stuck forever and never be redelivered.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packages/api/rest/src/worker/position-monitor.worker.ts` around lines 96 -
100, The retryTick() method in position-monitor.worker.ts is currently a no-op,
so alerts marked retrying by triggerAlert() with deliveryStatus and nextRetryAt
never get redelivered. Update retryTick() to query due retry events from
alert_events based on nextRetryAt and retrying status, then invoke the existing
resend path used for initial delivery so transient failures are retried instead
of stalling forever.
| const SUPABASE_URL = process.env.SUPABASE_URL ?? 'http://127.0.0.1:54321'; | ||
| const SERVICE_KEY = process.env.SUPABASE_SERVICE_ROLE_KEY | ||
| ?? 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImV4cCI6MTk4MzgxMjk5Nn0.EGIM96RAZx35lJzdJsyH-qQwv8Hdp7fsn3W0YpN81IU'; | ||
| const ANON_KEY = process.env.SUPABASE_ANON_KEY | ||
| ?? 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6ImFub24iLCJleHAiOjE5ODM4MTI5OTZ9.CRXP1A7WOeoJeXxjNni43kdQwgnWNReilDMblYTn_I0'; |
There was a problem hiding this comment.
🔒 Security & Privacy | 🟠 Major | ⚡ Quick win
Remove the baked-in Supabase JWT defaults.
Lines 7-10 commit service-role and anon tokens into the repo. Even for scratch tooling, that normalizes hardcoded credentials and lets accidental runs against any instance sharing the local JWT secret use privileged auth. Fail fast when the env vars are missing instead. The same cleanup should be applied consistently in scratch/smoke.sh.
Suggested fix
const SUPABASE_URL = process.env.SUPABASE_URL ?? 'http://127.0.0.1:54321';
-const SERVICE_KEY = process.env.SUPABASE_SERVICE_ROLE_KEY
- ?? '...';
-const ANON_KEY = process.env.SUPABASE_ANON_KEY
- ?? '...';
+const SERVICE_KEY = process.env.SUPABASE_SERVICE_ROLE_KEY;
+const ANON_KEY = process.env.SUPABASE_ANON_KEY;
+
+if (!SERVICE_KEY) throw new Error('SUPABASE_SERVICE_ROLE_KEY is required');
+if (!ANON_KEY) throw new Error('SUPABASE_ANON_KEY is required');📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| const SUPABASE_URL = process.env.SUPABASE_URL ?? 'http://127.0.0.1:54321'; | |
| const SERVICE_KEY = process.env.SUPABASE_SERVICE_ROLE_KEY | |
| ?? 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6InNlcnZpY2Vfcm9sZSIsImV4cCI6MTk4MzgxMjk5Nn0.EGIM96RAZx35lJzdJsyH-qQwv8Hdp7fsn3W0YpN81IU'; | |
| const ANON_KEY = process.env.SUPABASE_ANON_KEY | |
| ?? 'eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJzdXBhYmFzZS1kZW1vIiwicm9sZSI6ImFub24iLCJleHAiOjE5ODM4MTI5OTZ9.CRXP1A7WOeoJeXxjNni43kdQwgnWNReilDMblYTn_I0'; | |
| const SUPABASE_URL = process.env.SUPABASE_URL ?? 'http://127.0.0.1:54321'; | |
| const SERVICE_KEY = process.env.SUPABASE_SERVICE_ROLE_KEY; | |
| const ANON_KEY = process.env.SUPABASE_ANON_KEY; | |
| if (!SERVICE_KEY) throw new Error('SUPABASE_SERVICE_ROLE_KEY is required'); | |
| if (!ANON_KEY) throw new Error('SUPABASE_ANON_KEY is required'); |
🧰 Tools
🪛 Betterleaks (1.6.0)
[high] 8-8: Uncovered a JSON Web Token, which may lead to unauthorized access to web applications and sensitive user data.
(jwt)
[high] 10-10: Uncovered a JSON Web Token, which may lead to unauthorized access to web applications and sensitive user data.
(jwt)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scratch/setup-test-user.mjs` around lines 6 - 10, Remove the baked-in
Supabase JWT fallback values from setup-test-user.mjs so the script does not
ship hardcoded service-role or anon credentials. Update SUPABASE_URL,
SERVICE_KEY, and ANON_KEY handling to require the corresponding environment
variables and fail fast with a clear error if they are missing, using the
existing setup-test-user.mjs entrypoint logic. Apply the same env-only
credential handling pattern in scratch/smoke.sh so both scratch tools are
consistent.
Source: Linters/SAST tools
| # A HF above threshold (or Infinity) should NOT trigger. | ||
| if awk -v hf="$FAKE_HEALTH_FACTOR" -v th="$THRESHOLD" 'BEGIN{ if (hf == "∞" || hf+0 >= th+0) exit 0; exit 1 }' 2>/dev/null; then | ||
| EXPECT_TRIGGER=0 | ||
| hint "FAKE_HEALTH_FACTOR=$FAKE_HEALTH_FACTOR ≥ threshold=$THRESHOLD → expecting NO trigger" | ||
| fi |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Keep the smoke expectation aligned with parseHealthFactor().
PositionMonitorWorker.parseHealthFactor() treats "infinity" case-insensitively as +Infinity, but Line 224 only special-cases the unicode ∞. FAKE_HEALTH_FACTOR=infinity will make the worker skip dispatch while this script still expects a webhook.
Suggested fix
-if awk -v hf="$FAKE_HEALTH_FACTOR" -v th="$THRESHOLD" 'BEGIN{ if (hf == "∞" || hf+0 >= th+0) exit 0; exit 1 }' 2>/dev/null; then
+if awk -v hf="$FAKE_HEALTH_FACTOR" -v th="$THRESHOLD" 'BEGIN{
+ if (hf == "∞" || tolower(hf) == "infinity" || hf+0 >= th+0) exit 0;
+ exit 1
+}' 2>/dev/null; then📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # A HF above threshold (or Infinity) should NOT trigger. | |
| if awk -v hf="$FAKE_HEALTH_FACTOR" -v th="$THRESHOLD" 'BEGIN{ if (hf == "∞" || hf+0 >= th+0) exit 0; exit 1 }' 2>/dev/null; then | |
| EXPECT_TRIGGER=0 | |
| hint "FAKE_HEALTH_FACTOR=$FAKE_HEALTH_FACTOR ≥ threshold=$THRESHOLD → expecting NO trigger" | |
| fi | |
| # A HF above threshold (or Infinity) should NOT trigger. | |
| if awk -v hf="$FAKE_HEALTH_FACTOR" -v th="$THRESHOLD" 'BEGIN{ | |
| if (hf == "∞" || tolower(hf) == "infinity" || hf+0 >= th+0) exit 0; | |
| exit 1 | |
| }' 2>/dev/null; then | |
| EXPECT_TRIGGER=0 | |
| hint "FAKE_HEALTH_FACTOR=$FAKE_HEALTH_FACTOR ≥ threshold=$THRESHOLD → expecting NO trigger" | |
| fi |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scratch/smoke.sh` around lines 223 - 227, The smoke test expectation is out
of sync with PositionMonitorWorker.parseHealthFactor() because it only treats
the unicode infinity symbol as non-triggering. Update the health-factor check in
scratch/smoke.sh so it also recognizes the string "infinity" case-insensitively,
matching parseHealthFactor() behavior, and keep EXPECT_TRIGGER=0 for both
representations when verifying the webhook dispatch path.
| EVENT_ROW=$(docker exec supabase_db_galaxy-devkit psql -U postgres -d postgres -tA -c \ | ||
| "SELECT delivery_status || '|' || delivery_attempts FROM alert_events WHERE alert_id = '$ALERT_ID' ORDER BY triggered_at DESC LIMIT 1;") |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Don't hardcode the Supabase DB container name.
Line 263 assumes supabase_db_galaxy-devkit, but Supabase derives container names from the local project slug. Clones/forks will fail this assertion even when the worker and API behaved correctly.
Suggested fix
+SUPABASE_DB_CONTAINER="${SUPABASE_DB_CONTAINER:-$(docker ps --format '{{.Names}}' | grep '^supabase_db_' | head -n1)}"
+[[ -n "$SUPABASE_DB_CONTAINER" ]] || { fail "could not locate Supabase DB container"; exit 1; }
+
EVENT_ROW=$(docker exec supabase_db_galaxy-devkit psql -U postgres -d postgres -tA -c \
+ EVENT_ROW=$(docker exec "$SUPABASE_DB_CONTAINER" psql -U postgres -d postgres -tA -c \
"SELECT delivery_status || '|' || delivery_attempts FROM alert_events WHERE alert_id = '$ALERT_ID' ORDER BY triggered_at DESC LIMIT 1;")🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scratch/smoke.sh` around lines 263 - 264, The smoke test is hardcoding the
Supabase DB container name in the EVENT_ROW lookup, which will break on
clones/forks with different local project slugs. Update the `scratch/smoke.sh`
assertion to use the same dynamic container naming/source as the rest of the
script (or derive the DB container name from the configured project slug) so the
`docker exec` call works regardless of repository name.
| CREATE TABLE public.alert_events ( | ||
| id UUID DEFAULT uuid_generate_v4() PRIMARY KEY, | ||
| alert_id UUID NOT NULL REFERENCES public.monitoring_alerts(id) ON DELETE CASCADE, | ||
| triggered_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(), | ||
| health_factor_value NUMERIC(20, 8), | ||
| payload JSONB NOT NULL, | ||
| delivery_status alert_event_delivery_status NOT NULL DEFAULT 'pending', | ||
| delivery_attempts INTEGER NOT NULL DEFAULT 0, | ||
| last_attempt_at TIMESTAMP WITH TIME ZONE, | ||
| next_retry_at TIMESTAMP WITH TIME ZONE, | ||
| last_error TEXT, | ||
| created_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(), | ||
| updated_at TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW() | ||
| ); |
There was a problem hiding this comment.
🗄️ Data Integrity & Integration | 🟠 Major | 🏗️ Heavy lift
Persist the webhook idempotency key as a real column.
AlertEventPayload already treats eventId as the canonical idempotency key, but alert_events only buries it inside payload JSONB. That leaves no uniqueness or indexed lookup to stop a duplicate logical event from being inserted if createEvent() is retried after an ambiguous failure, even though this table is also your retry ledger/history. Add a dedicated event_id column with a unique constraint and populate it from payload.eventId.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@supabase/migrations/20260629000000_monitoring_alerts.sql` around lines 34 -
47, Add a dedicated event_id column to alert_events and make it unique so
AlertEventPayload.eventId is stored as a first-class idempotency key instead of
only inside payload JSONB. Update the monitoring_alerts migration and the
createEvent() insert path to populate event_id from payload.eventId, and ensure
any lookup/dedup logic references the new column; use the alert_events and
createEvent symbols to locate the affected schema and write path.
|
@mariocodecr fix the conflicts |
Pull Request: Feat/api realtime position monitoring
📋 Description
Adds real-time position monitoring with liquidation alerts for the REST API (Issue #306, Roadmap #53). Users can configure alerts that watch a Stellar
account's health factor on a lending protocol (Blend in this iteration) and receive a webhook when it drops below a configured threshold.
Architecture summary:
jitter, and idempotency-key delivery
🔗 Related Issues
Closes #306
🧪 Testing
✅Results:
Smoke E2E (run with ./scratch/smoke.sh):
📚 Documentation Updates (Required)
docs/AI.mdwith new patterns/examplessignature)
Documentation Checklist by Component:
If you modified
packages/core/stellar-sdk/:packages/core/stellar-sdk/README.mddocs/examples/stellar-sdk/If you modified
packages/core/invisible-wallet/:packages/core/invisible-wallet/README.mddocs/SECURITY.mddocs/ARCHITECTURE.mdIf you modified
packages/core/automation/:packages/core/automation/README.mddocs/examples/automation/If you modified
packages/core/defi-protocols/:packages/core/defi-protocols/README.mddocs/ARCHITECTURE.mdIf you modified
packages/core/oracles/:packages/core/oracles/README.mdIf you modified
packages/contracts/:If you modified
tools/cli/:🤖 AI-Friendly Documentation
New Files Created
Key Functions/Classes Added
Patterns Used
endpoint.
For future similar features, the layering is routes (thin) → service → repo + worker → service → repo, keeping HTTP and background-job concerns
separate.
📸 Screenshots (if applicable)
N/A — backend feature.
No breaking changes
Breaking changes documented in CHANGELOG.md
Two preexisting migrations (20260222090135_smart_wallets.sql, 20260324235408_drop_encrypted_private_key.sql) referenced a public.invisible_wallets
table that isn't created by any migration in this stream, so supabase start failed on a clean checkout. Both ALTER TABLE statements are now wrapped in
an existence guard, making them idempotent and no-ops when the table is absent. This is not a behavior change for environments where the table exists —
just a fix that lets a fresh local environment boot.
🔄 Deployment Notes
- npm run worker --workspace=@galaxy-kj/api-rest (production, expects compiled dist/)
- npm run worker:dev --workspace=@galaxy-kj/api-rest (dev, via ts-node)
Required env: SUPABASE_URL, SUPABASE_SERVICE_ROLE_KEY, optional MONITOR_NETWORK (testnet/mainnet), MONITOR_EVAL_INTERVAL_MS (default 60000),
MONITOR_BATCH_SIZE (default 50).
3. Horizontal scaling — current design assumes one worker per network. Scaling to N workers will need a FOR UPDATE SKIP LOCKED lease on
monitoring_alerts (documented in the worker's JSDoc, not implemented here).
4. Webhook URL scheme — production rejects plaintext (http://) URLs; only https:// is accepted. Local dev allows http:// for smoke tests against
127.0.0.1.
✅ Final Checklist
inherited from audit logger
By submitting this PR, I confirm that:
Summary by CodeRabbit