feat(db): normalization + shape CHECK constraints migration, seed hardening#117

Merged: thewrz merged 32 commits into main from feat/section-number-db on Jun 6, 2026
Conversation

@thewrz thewrz commented Jun 6, 2026

Summary

Stacked on the API PR — the final layer of the section-number expansion:

  • Migration 013: (1) whitespace-normalization UPDATE on specs.section + spec_sections.section_number (NBSP→space first, so the collapse is locale-independent; unique-constraint collisions abort loudly by design); (2) CHECK constraints — specs.section admits the expanded shape or 'unknown'; spec_sections.section_number is canonical-only. spec_references.target_spec_section is deliberately unconstrained (it records what the source document said — ADR-020). Down migration drops both constraints; normalization is acknowledged lossy.
  • Seed hardening: SCN regex is prefix-optional + leading-whitespace tolerant (captures the 2 bare-SCN files and 26_29_23.SEC's SECTION … shape — the full 666-file corpus now seeds, 239 suffixed); values normalize before upsert; unnormalizable SCNs are skipped; an aggregate logger.warn({scanned, kept, skipped}) fires if any file is ever skipped (no more silent drops).
  • E2E integration: an agency-suffixed corpus file (01 32 01.00 10) loads → persists with its section intact → exact-match ref resolution finds it → the catalog join lists it with inDatabase: true for division 01.

The CHECK constraints are the backstop for the two direct-persist paths (MCP parse_document, pnpm load:files) that bypass the API worker gate.
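
For readers who want the rules in one place, here is a minimal TypeScript sketch of the normalization order and the two shape checks described above. It is not the migration's code: the function names and the exact regex are assumptions, inferred only from the example values in this PR ('26 00 13.10', '01 32 01.00 10').

```typescript
// Hypothetical sketch: names and regex are assumptions, not the migration's definitions.

// Assumed expanded shape: "DD DD DD", optionally ".DD", optionally " DD".
const EXPANDED_SHAPE = /^\d{2} \d{2} \d{2}(\.\d{2}( \d{2})?)?$/;

// NBSP -> space first, then collapse whitespace runs and trim.
// (JavaScript's \s already matches NBSP; the explicit first step mirrors
// the order the migration uses.)
function normalizeWhitespace(raw: string): string {
  return raw
    .replace(/\u00a0/g, " ")
    .replace(/[ \t\r\n]+/g, " ")
    .trim();
}

// Mirrors the specs.section rule: expanded shape or the 'unknown' sentinel.
function specsSectionAllowed(value: string): boolean {
  return value === "unknown" || EXPANDED_SHAPE.test(value);
}

// Mirrors the spec_sections.section_number rule: canonical shape only.
function sectionNumberAllowed(value: string): boolean {
  return EXPANDED_SHAPE.test(value);
}

console.log(normalizeWhitespace("01\u00a032  01.00 10 ")); // "01 32 01.00 10"
console.log(specsSectionAllowed("unknown"), sectionNumberAllowed("unknown")); // true false
```

The sentinel is accepted by only one of the two predicates, which is the asymmetry the two constraints are meant to enforce.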

Test Plan

  • pnpm test — 586 unit tests green
  • docker compose up -d postgres && cp -n .env.example .env; set -a && source .env && set +a && pnpm migrate && pnpm seed && pnpm test:integration — 143 integration tests; constraint accept/reject tests; migration up/down/up cycles cleanly
  • Corpus smoke: pnpm load:files 'docs/references/UFGS/DIVISION_01/01_32_01.00_10.SEC' then verify SELECT section FROM specs WHERE section = '01 32 01.00 10' returns one row
  • SELECT count(*) FILTER (WHERE section_number ~ '\.') FROM spec_sections ≈ 239

Out of Scope

This PR does NOT add family/fuzzy ref matching, structured section columns, or sort-order changes (lexicographic ordering is provably correct for this fixed-width grammar — see ADR-020). Mockup-branch SPA linkifier parity is tracked separately.
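
As an illustration of the ordering claim (not a substitute for the argument in ADR-020), the sketch below sorts a handful of section strings with plain string comparison. The helper name is hypothetical, and the unsuffixed values and '26 00 14' are made-up samples. Every numeric segment is two zero-padded digits, and a base section is a strict prefix of its suffixed forms, so it sorts ahead of them.

```typescript
// Hypothetical helper; default sort compares strings by UTF-16 code units.
function sortSections(sections: readonly string[]): string[] {
  return [...sections].sort();
}

const sorted = sortSections([
  "26 00 14",
  "01 32 01.00 10",
  "26 00 13.10",
  "26 00 13",
  "01 32 01",
]);

console.log(sorted);
// [ "01 32 01", "01 32 01.00 10", "26 00 13", "26 00 13.10", "26 00 14" ]
```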

Summary by CodeRabbit

  • Bug Fixes

    • Section data now normalizes whitespace for consistent formatting
    • Added validation rules to enforce proper section numbering patterns across the database
  • Tests

    • Expanded test coverage for section normalization and validation constraints
    • Added integration tests for file loading and section resolution

thewrz added 25 commits June 5, 2026 19:00
safeFilename now allows '.' in the section part so '26 00 13.10'
renders as '26-00-13.10-Panelboards.docx' rather than mangling the
dot to a dash. Function exported for unit testing.
…CX title

Regression pins — no production change. Verifies that renderMarkdown
emits the section verbatim in the H1 header and that generateDocx
writes it unchanged into document.xml, so future refactors cannot
silently mangle dotted agency suffixes (e.g. '27 05 13.43').
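
The safeFilename commit above can be pictured with the following sketch. It is an assumption-laden stand-in, not the exported function: the two-argument signature, the helper name, and the character classes are all hypothetical. The only behavior taken from the commit message is that '.' survives in the section part.

```typescript
// Hypothetical stand-in for safeFilename; signature and rules are assumptions.
function safeFilenameSketch(section: string, title: string): string {
  // Replace each run of disallowed characters with one dash, then trim edge dashes.
  const clean = (s: string, disallowed: RegExp): string =>
    s.replace(disallowed, "-").replace(/^-+|-+$/g, "");

  const sectionPart = clean(section, /[^0-9A-Za-z.]+/g); // '.' is kept here
  const titlePart = clean(title, /[^0-9A-Za-z]+/g); // '.' is not kept here
  return `${sectionPart}-${titlePart}.docx`;
}

console.log(safeFilenameSketch("26 00 13.10", "Panelboards"));
// "26-00-13.10-Panelboards.docx"
```
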
coderabbitai Bot commented Jun 6, 2026

Walkthrough

This PR adds database schema validation for section identifiers, refactors seeding to normalize and canonicalize parsed sections while tracking inputs, and validates the complete end-to-end flow through integration tests with agency-suffixed sections.

Changes

Section Normalization and Validation

| Layer / File(s) | Summary |
|---|---|
| Database schema normalization and validation: src/db/migrations/013_section_number_normalize_and_check.ts, src/db/queries/specs.integration.test.ts | Migration 013 normalizes whitespace in specs.section and spec_sections.section_number by converting NBSP to regular spaces and collapsing runs, then enforces a CHECK constraint validating both columns against a numeric dot-segment pattern or the literal 'unknown' sentinel. Tests verify valid shapes are accepted and malformed values are rejected by constraint name. |
| Section parsing and canonical collection: src/db/seed.ts, src/db/seed.test.ts | seed.ts imports normalizeSectionNumber and introduces a CollectResult interface to return both canonical records and a total scanned count. The SCN parsing regex accepts an optional SECTION keyword and leading whitespace. extractSectionMeta normalizes parsed SCNs and returns null when normalization fails. collectFromContents filters contents into canonical records. seed logs collection counts and warns when inputs exceed kept records. Tests cover whitespace canonicalization, suffix preservation, bare SCN formats, and scanned vs. kept record behavior. |
| End-to-end file loading and resolution: src/lib/file-loader.integration.test.ts | Adds AGENCY_FIXTURE and three integration tests: (1) loads an agency-suffixed UFGS corpus file and asserts the specs.section value is preserved correctly, (2) creates a spec with a reference to an agency-suffixed target section and asserts spec_references.target_spec_id resolves correctly, (3) verifies listSpecSections('01') includes the suffixed section with inDatabase: true. All tests clean up inserted data. |
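
The seed flow summarized in the walkthrough can be sketched roughly as follows. Everything here is hypothetical: the names only echo extractSectionMeta, collectFromContents and CollectResult, the input is assumed to carry the number in an `<SCN>…</SCN>` element, and the canonical regex is a guess based on the example values in this PR.

```typescript
// Hypothetical sketch of the seed collection step; the real seed.ts may differ.
interface CollectSketch {
  records: string[]; // canonical section numbers that survived normalization
  scanned: number; // every input examined, kept or not
}

interface SkipSummary {
  scanned: number;
  kept: number;
  skipped: number;
}

// Assumed input shape: optional leading whitespace and optional "SECTION" keyword.
const SCN = /<SCN>\s*(?:SECTION\s+)?([^<]*?)\s*<\/SCN>/i;
const CANONICAL = /^\d{2} \d{2} \d{2}(\.\d{2}( \d{2})?)?$/;

function extractSection(content: string): string | null {
  const match = SCN.exec(content);
  if (!match) return null;
  const normalized = (match[1] ?? "")
    .replace(/\u00a0/g, " ")
    .replace(/[ \t\r\n]+/g, " ")
    .trim();
  // Unnormalizable values are skipped rather than stored.
  return CANONICAL.test(normalized) ? normalized : null;
}

function collect(
  contents: readonly string[],
  warn: (summary: SkipSummary) => void,
): CollectSketch {
  const records: string[] = [];
  for (const content of contents) {
    const section = extractSection(content);
    if (section !== null) records.push(section);
  }
  const scanned = contents.length;
  const skipped = scanned - records.length;
  // One aggregate warning instead of a silent drop.
  if (skipped > 0) warn({ scanned, kept: records.length, skipped });
  return { records, scanned };
}

const warnings: SkipSummary[] = [];
const result = collect(
  [
    "<SCN>SECTION 01 32 01.00 10</SCN>",
    "<SCN>  26 00 13.10 </SCN>",
    "<SCN>SECTION TABLE OF CONTENTS</SCN>",
  ],
  (summary) => warnings.push(summary),
);
console.log(result, warnings);
```

Returning the scanned count alongside the kept records is what lets the caller detect a shortfall without re-reading the inputs.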

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through seeds so fine, 🐰
Where whitespace trims and sections align,
With UNKNOWN checks and dots so neat,
Agency suffixes now complete!
Constraints guard what goes in deep. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately and concisely summarizes the main changes: database migration for section number normalization, shape CHECK constraints, and seed improvements. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


thewrz commented Jun 6, 2026

@coderabbitai review

coderabbitai Bot commented Jun 6, 2026

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Base automatically changed from feat/section-number-api to main June 6, 2026 21:05
@thewrz thewrz merged commit 0d0cc78 into main Jun 6, 2026
9 checks passed
@thewrz thewrz deleted the feat/section-number-db branch June 6, 2026 21:23