Tech Debt: Implement CloudWatch alarms and operational monitoring #31

@pofallon

Context

The README documents operational TODOs that were deferred during site-shell implementation (spec 002):

## Monitoring & Operational TODOs

- [ ] Connect the Amplify app to CloudWatch alarms that page on sustained 5xx errors from the SSR Lambda.
- [ ] Schedule the Playwright navigation suite via GitHub Actions (daily) and surface failures in Slack.
- [ ] Add Route 53 health checks for `/` and `/blog` once the Amplify branch is publicly reachable.
- [ ] Record error budget dashboards (Core Web Vitals + uptime) once traffic is available.

Tasks

CloudWatch Alarms

  • Create CloudWatch alarm for SSR Lambda 5xx error rate
  • Configure SNS topic for paging on sustained errors
  • Set appropriate thresholds (e.g., >1% 5xx for 5 minutes)
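
The three tasks above can be sketched as one alarm definition. This is a minimal sketch, not a definitive setup: it assumes the Amplify app publishes `5xxErrors` and `Requests` to the `AWS/AmplifyHosting` namespace under an `App` dimension (verify this against the metrics your app actually emits), and the function name, alarm name, app ID, and topic ARN are hypothetical. The 1% over 5 minutes threshold is the example figure from the task list.

```python
from typing import Any


def build_5xx_rate_alarm(app_id: str, sns_topic_arn: str,
                         threshold_pct: float = 1.0,
                         minutes: int = 5) -> dict[str, Any]:
    """Return keyword arguments for CloudWatch's put_metric_alarm call."""

    def stat(query_id: str, metric_name: str) -> dict[str, Any]:
        # One-minute sums of a single Amplify metric (namespace is an assumption).
        return {
            "Id": query_id,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/AmplifyHosting",
                    "MetricName": metric_name,
                    "Dimensions": [{"Name": "App", "Value": app_id}],
                },
                "Period": 60,
                "Stat": "Sum",
            },
            "ReturnData": False,
        }

    return {
        "AlarmName": f"{app_id}-ssr-5xx-rate",  # hypothetical naming scheme
        "ComparisonOperator": "GreaterThanThreshold",
        "Threshold": threshold_pct,
        # "Sustained" here means every one-minute datapoint in the window breaches.
        "EvaluationPeriods": minutes,
        "DatapointsToAlarm": minutes,
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
        "Metrics": [
            stat("errors", "5xxErrors"),
            stat("requests", "Requests"),
            {
                # Metric math: percentage of requests that returned 5xx.
                "Id": "rate",
                "Expression": "IF(requests > 0, 100 * errors / requests, 0)",
                "Label": "5xx error rate (%)",
                "ReturnData": True,
            },
        ],
    }
```

With boto3 installed and credentials configured, the result could be applied with `boto3.client("cloudwatch").put_metric_alarm(**build_5xx_rate_alarm(app_id, topic_arn))`. Alarming on a rate rather than a raw count keeps the alarm quiet during low-traffic periods where a single error would otherwise dominate.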

Automated E2E Testing

  • Create GitHub Actions workflow for daily Playwright runs
  • Configure Slack webhook for failure notifications
  • Run tests against staging/production URLs
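
For the Slack notification step, a small script run by the workflow when the Playwright job fails might look like the sketch below. It relies on the `GITHUB_SERVER_URL`, `GITHUB_REPOSITORY`, and `GITHUB_RUN_ID` variables that GitHub Actions sets for each run, and on Slack incoming webhooks accepting a JSON body with a `text` field. The secret name `SLACK_WEBHOOK_URL`, the function names, and the message wording are assumptions.

```python
import json
import os
import urllib.request


def build_failure_message(env: dict[str, str]) -> dict[str, str]:
    """Build a Slack incoming-webhook payload that links to the failed run."""
    run_url = (f"{env['GITHUB_SERVER_URL']}/{env['GITHUB_REPOSITORY']}"
               f"/actions/runs/{env['GITHUB_RUN_ID']}")
    return {"text": f"Playwright navigation suite failed: {run_url}"}


def notify_slack(webhook_url: str, payload: dict[str, str]) -> int:
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def main() -> None:
    # SLACK_WEBHOOK_URL is a hypothetical repository secret exposed to the step.
    notify_slack(os.environ["SLACK_WEBHOOK_URL"],
                 build_failure_message(dict(os.environ)))
```

The workflow would call `main()` only from a step conditioned on failure, so successful daily runs stay silent.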

Health Checks

  • Create Route 53 health checks for / endpoint
  • Create Route 53 health checks for /blog endpoint
  • Configure alerting for health check failures
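
The two health checks differ only in their path, so they can be generated from one helper. This is a sketch under assumptions: `example.com` stands in for the real domain, and the 30-second interval and failure threshold of 3 are illustrative values, not figures from the README. The dictionary keys follow the Route 53 `CreateHealthCheck` request shape.

```python
from typing import Any


def build_health_check(domain: str, path: str) -> dict[str, Any]:
    """Return keyword arguments for Route 53's create_health_check call."""
    return {
        # CallerReference must be unique per health check; bump the suffix
        # when recreating a check with different settings.
        "CallerReference": f"{domain}{path}-v1",
        "HealthCheckConfig": {
            "Type": "HTTPS",
            "FullyQualifiedDomainName": domain,
            "Port": 443,
            "ResourcePath": path,
            "RequestInterval": 30,   # seconds between checks (assumed value)
            "FailureThreshold": 3,   # consecutive failures before unhealthy
        },
    }


# Placeholder domain; one check per endpoint named in the task list.
checks = [build_health_check("example.com", path) for path in ("/", "/blog")]
```

Each entry could then be passed to `boto3.client("route53").create_health_check(**params)`. For the alerting task, Route 53 publishes health check status to CloudWatch in `us-east-1`, so any alarm on these checks would need to be created in that region.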

Observability Dashboard

  • Set up Core Web Vitals tracking (LCP, INP, CLS; INP replaced FID as a Core Web Vital in March 2024)
  • Create uptime monitoring dashboard
  • Define error budget SLOs
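
Defining an error budget SLO comes down to simple arithmetic, sketched below. The 99.9% and 99% targets and the 30-day window are illustrative assumptions, since the issue does not specify any SLO values, and both function names are hypothetical.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of downtime allowed by an availability SLO over the window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction strictly between 0 and 1")
    return window_days * 24 * 60 * (1.0 - slo)


def budget_remaining(slo: float, total_requests: int, failed_requests: int) -> float:
    """Fraction of the request-based error budget still unspent.

    A negative result means the budget is exhausted.
    """
    allowed_failures = total_requests * (1.0 - slo)
    if allowed_failures == 0:
        return 0.0
    return 1.0 - failed_requests / allowed_failures


# A 99.9% uptime target over 30 days allows about 43.2 minutes of downtime.
print(round(error_budget_minutes(0.999), 1))
# 25 failures out of 10,000 requests at a 99% target leaves about 75% of the budget.
print(round(budget_remaining(0.99, 10_000, 25), 2))
```

A dashboard built on these numbers can show both the time-based budget (uptime) and the request-based budget (5xx responses) side by side.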

Related Specs

  • 002-nextjs-app-shell (README.md, lines 148-153)

Metadata

Assignees

No one assigned

Labels

tech debt (Technical debt to address)
