
feat(db): dedicated postgres schema + goose migrations (PR1/3)#64

Open
gavinelder wants to merge 3 commits into master from pr1-postgres-schema-migrations

Conversation

@gavinelder
Collaborator

Summary

This is PR 1 of 3 in the postgres enterprise hardening plan (see internal plan doc). It addresses the two critical findings from the postgres audit: tables landing in the public schema and schema initialization with no migration system.

  • Add STATICREG_DB_SCHEMA (default: staticreg), validated as a postgres identifier before interpolation into DDL.
  • Set search_path on every pooled connection so the existing webhook batch INSERT (and any future unqualified query) resolves to the dedicated schema with no hot-path code change.
  • Replace the embed-and-Exec bootstrap with pressly/goose v3 migrations, embedded via embed.FS. goose_db_version lives inside the dedicated schema.
  • Drop the post-DDL existence checks; goose tracks state authoritatively.
  • Surface STATICREG_DB_SCHEMA in docs/CONFIGURATION.md and in manifests/deployment.yml (optional ConfigMap key).

Why goose v3.24 (not 3.27)

The latest goose tags require Go 1.25.7. Pinned to v3.24.3 to avoid forcing a Go toolchain bump and a transitive pgx upgrade in this PR.

What's not in this PR (queued for PR 2/3)

  • TLS-by-default (sslmode=require + warn on disable).
  • Tunable connection pool (MaxConns, MinConns, lifetimes via env).
  • DB health endpoint + pool metrics.
  • Production checklist docs.

Behavior change for operators

Existing deployments will get a new staticreg schema on next startup. Tables in public.container_pull_metrics are not auto-migrated — historical data in public is left in place. Operators with existing data need to either:

  1. Set STATICREG_DB_SCHEMA=public to keep current behavior (not recommended long-term), or
  2. Migrate data with INSERT INTO staticreg.container_pull_metrics SELECT * FROM public.container_pull_metrics; after the new schema is created.

Happy to add a one-shot data migration to PR 1 if reviewers want it; left out for now to keep this PR focused.

Test plan

  • go build ./... && go vet ./... clean
  • go test ./... (unit) passes
  • go test -tags=integration ./pkg/db/... ./pkg/webhook/... passes against docker compose up postgres
  • Direct psql inspection: \dn shows the dedicated schema, \dt staticreg.* shows container_pull_metrics + goose_db_version, migration version 1 recorded
  • Webhook batch flush lands rows in the dedicated schema (verified by integration test)
  • Re-running staticreg serve is a goose no-op (idempotent)
  • Invalid schema name (Bad-Schema; DROP TABLE) is rejected before any SQL runs
  • Reviewer to verify k8s rollout against a staging deployment with STATICREG_DB_SCHEMA overridden via ConfigMap

🤖 Generated with Claude Code

StaticReg's postgres tables previously landed in the public schema, and the
schema was bootstrapped by re-executing an embedded SQL blob on every startup
with no version tracking. That blocks enterprise deployments where the database
is shared and schema evolution needs to be versioned and reversible.

- Add STATICREG_DB_SCHEMA (default: staticreg), validated as a postgres
  identifier before interpolation.
- Set search_path on every pooled connection so unqualified queries — including
  the existing webhook batch INSERT — resolve to the dedicated schema with no
  hot-path code change.
- Replace the embed-and-Exec bootstrap with pressly/goose v3 migrations
  embedded via embed.FS. goose_db_version lives inside the dedicated schema.
- Drop the post-DDL existence checks; goose tracks state authoritatively.
- Document STATICREG_DB_SCHEMA in CONFIGURATION.md and surface it as an
  optional ConfigMap key in the deployment manifest.
- Add integration tests (build tag: integration) that verify schema creation,
  search_path, migration version, idempotent re-runs, invalid-schema
  rejection, and a batch insert landing in the dedicated schema.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Copilot AI left a comment


Pull request overview

Introduces a dedicated PostgreSQL schema (configurable via STATICREG_DB_SCHEMA) and switches DB bootstrapping to embedded pressly/goose migrations, ensuring tables no longer land in public and schema state is tracked via migrations.

Changes:

  • Add STATICREG_DB_SCHEMA with strict identifier validation and set search_path on pooled connections.
  • Replace direct schema SQL execution with embedded goose migrations (pkg/sql/migrations/*.sql).
  • Add integration tests verifying schema creation, migration application, and webhook batch inserts landing in the dedicated schema.

Reviewed changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated 2 comments.

Summary per file:
  • pkg/db/postgres.go: Adds schema resolution/validation, sets search_path, and runs embedded goose migrations at startup.
  • pkg/sql/schema.go: Switches from an embedded SQL string to an embedded migrations filesystem (embed.FS).
  • pkg/sql/migrations/00001_create_container_pull_metrics.sql: Adds the goose Up/Down migration for the metrics table + indexes.
  • pkg/db/postgres_integration_test.go: Integration tests for dedicated schema creation, migrations, and invalid schema rejection.
  • pkg/webhook/batchserviceadapter_integration_test.go: Integration test asserting unqualified INSERTs resolve into the dedicated schema via search_path.
  • manifests/deployment.yml: Surfaces STATICREG_DB_SCHEMA as an optional ConfigMap-provided env var.
  • docs/CONFIGURATION.md: Documents STATICREG_DB_SCHEMA defaults and validation requirements.
  • go.mod / go.sum: Adds goose v3.24.3 and transitive dependencies.


Comment thread pkg/db/postgres.go Outdated
Comment on lines 112 to 116
@@ -77,79 +115,58 @@ func InitPool() *pgxpool.Pool {
// Attempt to create the connection pool
pool, err := pgxpool.NewWithConfig(ctx, config)

Copilot AI Apr 27, 2026


InitPool uses a single 10s timeout context for pool creation (and later passes the same ctx into initSchema/goose.UpContext). If migrations take longer than the remaining time on that ctx, startup can fail due to deadline exceeded even when the DB is healthy. Consider creating a separate (and likely longer / configurable) context for initSchema after the pool is established.

Collaborator Author


Fixed in 3eae76d: initSchema now runs under its own 1-minute context.WithTimeout derived from context.Background(). The 10s budget is now scoped to pool construction and the initial ping only. Verified locally that schema bootstrap still completes well inside the new budget.

Comment on lines +36 to +40
t.Cleanup(func() {
ctx := context.Background()
_, _ = pool.Exec(ctx, "DROP SCHEMA IF EXISTS staticreg_test CASCADE")
pool.Close()
})

Copilot AI Apr 27, 2026


The test closes pool explicitly later (pool.Close() before the second InitPool), but the t.Cleanup closure still tries to use that same pool to Exec the DROP SCHEMA. On a closed pool the DROP likely won’t run (error is ignored), leaving the schema behind between runs. Consider moving the first pool.Close() into cleanup (and using a different variable for the second pool), or have cleanup drop the schema using whichever pool is still open.

Collaborator Author


Fixed in 3eae76d — the test no longer closes the first pool before creating the second. Both pools coexist (pgxpool tolerates multiple instances against the same DB/schema), and t.Cleanup runs the DROP via the still-open first pool. Verified end-to-end: \dn staticreg_test returns 0 rows after the test exits.

gavinelder and others added 2 commits April 27, 2026 12:30
Two integration test files in this PR (pkg/db and pkg/webhook) duplicate
the same STATICREG_DB_* env-var bootstrap. PR3 will add a third caller
(pkg/server). Extract the shared setup into a dedicated package.

- pkg/db/dbtest/dbtest.go (new, build tag integration): exports
  SetEnv(t, schema) which configures the docker-compose postgres
  defaults and a per-test schema name. Compiled only when the
  integration tag is active so it does not enter normal builds.
- pkg/db/postgres_integration_test.go: drop the local setTestDBEnv
  helper, call dbtest.SetEnv instead.
- pkg/webhook/batchserviceadapter_integration_test.go: replace the
  inline env map with dbtest.SetEnv.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Address two review comments on PR #64:

- InitPool reused a single 10s context for pool construction, ping,
  and goose migrations. A migration that runs late in that budget
  could fail with "context deadline exceeded" even on a healthy
  database. Use the 10s budget only for pool create + ping; give
  initSchema a separate 1m context.
- TestInitPool_CreatesDedicatedSchemaAndAppliesMigrations closed the
  first pool before constructing the second, but t.Cleanup still
  used the (now closed) pool to DROP SCHEMA — silently failing and
  leaving staticreg_test behind between runs. Keep the first pool
  open through cleanup; pgxpool tolerates a second pool against the
  same database/schema for the idempotency check. Verified: after
  the test runs, "\dn staticreg_test" returns 0 rows.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
