feat(db): normalization + shape CHECK constraints migration, seed hardening #117
… PRs, TDD task breakdown
…no more base truncation
…tion collided distinct sections
…e normalized, 400 on malformed
… not a job-killing section
safeFilename now allows '.' in the section part so '26 00 13.10' renders as '26-00-13.10-Panelboards.docx' rather than mangling the dot to a dash. Function exported for unit testing.
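The dot-preserving behavior described above can be sketched as follows. This is a minimal illustration, not the repository's code: the parameter names, the default `docx` extension, and the exact character classes are assumptions, and the real `safeFilename` may sanitize differently.

```typescript
// Hypothetical sketch: the real safeFilename signature and rules are not shown in this PR.
// (In the real module the function is exported for unit testing.)
function safeFilename(section: string, title: string, ext: string = "docx"): string {
  // Section part: keep letters, digits, and '.'; every other run becomes a single dash.
  const sectionPart = section
    .trim()
    .replace(/[^A-Za-z0-9.]+/g, "-")
    .replace(/^-+|-+$/g, "");
  // Title part: stricter (assumed); dots are not preserved here.
  const titlePart = title
    .trim()
    .replace(/[^A-Za-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
  return `${sectionPart}-${titlePart}.${ext}`;
}

const panelboards: string = safeFilename("26 00 13.10", "Panelboards");
// panelboards === "26-00-13.10-Panelboards.docx"
```

Allowing the dot only in the section part keeps the suffix readable while the title part stays conservative for filesystem safety.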
…CX title
Regression pins, no production change. Verifies that renderMarkdown emits the section verbatim in the H1 header and that generateDocx writes it unchanged into document.xml, so future refactors cannot silently mangle dotted agency suffixes (e.g. '27 05 13.43').
…RCHITECTURE examples
…shape CHECK constraints
…xact-match refs, catalog join
📝 Walkthrough
This PR adds database schema validation for section identifiers, refactors seeding to normalize and canonicalize parsed sections while tracking inputs, and validates the complete end-to-end flow through integration tests with agency-suffixed sections.
Changes
Section Normalization and Validation
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
@coderabbitai review
✅ Action performed: Review finished.
…now 400 instead of silent drop
Summary
Stacked on the API PR — the final layer of the section-number expansion:
- … `specs.section` + `spec_sections.section_number` (NBSP→space first, so the collapse is locale-independent; unique-constraint collisions abort loudly by design); (2) CHECK constraints: `specs.section` admits the expanded shape or `'unknown'`; `spec_sections.section_number` is canonical-only. `spec_references.target_spec_section` is deliberately unconstrained (it records what the source document said; see ADR-020). Down migration drops both constraints; normalization is acknowledged lossy.
- … `26_29_23.SEC`'s `SECTION …` shape: the full 666-file corpus now seeds, 239 suffixed); values normalize before upsert; unnormalizable SCNs are skipped; an aggregate `logger.warn({scanned, kept, skipped})` fires if any file is ever skipped (no more silent drops).
- … `01 32 01.00 10`) loads → persists with section intact → exact-match ref resolution finds it → catalog join lists it `inDatabase` for division 01.

The CHECK constraints are the backstop for the two direct-persist paths (MCP `parse_document`, `pnpm load:files`) that bypass the API worker gate.

Test Plan
- `pnpm test`: 586 unit tests green
- `docker compose up -d postgres && cp -n .env.example .env; set -a && source .env && set +a && pnpm migrate && pnpm seed && pnpm test:integration`: 143 integration tests; constraint accept/reject tests; migration `up`/`down`/`up` cycles cleanly
- `pnpm load:files 'docs/references/UFGS/DIVISION_01/01_32_01.00_10.SEC'`, then verify `SELECT section FROM specs WHERE section = '01 32 01.00 10'` returns one row
- `SELECT count(*) FILTER (WHERE section_number ~ '\.') FROM spec_sections` ≈ 239

Out of Scope
This PR does NOT add family/fuzzy ref matching, structured section columns, or sort-order changes (lexicographic ordering is provably correct for this fixed-width grammar — see ADR-020). Mockup-branch SPA linkifier parity is tracked separately.
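
For orientation, the normalize-then-validate flow from the Summary (NBSP to space first, whitespace collapse, shape check, skip-and-count on failure) can be sketched as below. Everything here is illustrative: `CANONICAL_SECTION`, `normalizeSection`, `normalizeAll`, and `SeedTally` are hypothetical names, and the regular expression is an assumed reading of the expanded shape inferred from the examples in this PR ('26 00 13.10', '01 32 01.00 10'); the authoritative grammar lives in ADR-020 and the migration.

```typescript
// Assumed canonical shape: "DD DD DD", optionally ".DD", optionally " DD" after the suffix.
const CANONICAL_SECTION: RegExp = /^\d{2} \d{2} \d{2}(?:\.\d{2}(?: \d{2})?)?$/;

// Returns the canonical form, or null when the input cannot be normalized.
function normalizeSection(raw: string): string | null {
  const collapsed = raw
    .replace(/\u00a0/g, " ") // NBSP -> space first, so the collapse below sees only ASCII
    .replace(/[ \t]+/g, " ") // collapse runs of spaces and tabs to one space
    .trim();
  return CANONICAL_SECTION.test(collapsed) ? collapsed : null;
}

interface SeedTally {
  scanned: number;
  kept: number;
  skipped: number;
}

// Normalizes every raw value, skipping (and counting) the ones that do not fit the shape.
function normalizeAll(rawSections: string[]): { sections: string[]; tally: SeedTally } {
  const sections: string[] = [];
  let skipped = 0;
  for (const raw of rawSections) {
    const section = normalizeSection(raw);
    if (section === null) {
      skipped += 1;
      continue;
    }
    sections.push(section);
  }
  return { sections, tally: { scanned: rawSections.length, kept: sections.length, skipped } };
}

const result = normalizeAll(["01\u00a032  01.00 10", "26 00 13.10", "not a section"]);
// result.sections: ["01 32 01.00 10", "26 00 13.10"]
// result.tally: { scanned: 3, kept: 2, skipped: 1 }
```

In a seeding script, the tally would feed the aggregate warning only when `tally.skipped > 0`, which matches the "no more silent drops" intent without logging once per file.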