feat(db): normalization + shape CHECK constraints migration, seed hardening by thewrz · Pull Request #117 · wrzonance/SpecR

thewrz · 2026-06-06T18:48:54Z

Summary

Stacked on the API PR — the final layer of the section-number expansion:

- Migration 013: (1) a whitespace-normalization UPDATE on specs.section and spec_sections.section_number (NBSP→space first, so the collapse is locale-independent; unique-constraint collisions abort loudly by design); (2) CHECK constraints: specs.section admits the expanded shape or 'unknown', while spec_sections.section_number is canonical-only. spec_references.target_spec_section is deliberately unconstrained (it records what the source document said; see ADR-020). The down migration drops both constraints; the normalization is acknowledged as lossy.
- Seed hardening: the SCN regex is prefix-optional and tolerant of leading whitespace (it captures the 2 bare-SCN files and 26_29_23.SEC's SECTION … shape, so the full 666-file corpus now seeds, 239 of them suffixed); values normalize before upsert; unnormalizable SCNs are skipped; an aggregate logger.warn({scanned, kept, skipped}) fires if any file is ever skipped (no more silent drops).
- E2E integration: an agency-suffixed corpus file (01 32 01.00 10) loads → persists with its section intact → exact-match ref resolution finds it → the catalog join lists it with inDatabase: true for division 01.
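For readers unfamiliar with the seed change, here is a minimal sketch of what a prefix-optional, leading-whitespace-tolerant SCN pattern could look like. The `<SCN>` tag shape, the digit grammar, and the names `SCN_SKETCH` and `extractRawScn` are assumptions for illustration; the actual regex in `src/db/seed.ts` may differ.

```typescript
// Hypothetical sketch, not the pattern shipped in src/db/seed.ts.
// Assumed tag shape: <SCN>...</SCN>.
// Assumed grammar: "DD DD DD", optionally ".DD", optionally " DD" after the dot part.
const SCN_SKETCH =
  /<SCN>\s*(?:SECTION\s+)?(\d{2}\s+\d{2}\s+\d{2}(?:\.\d{2}(?:\s+\d{2})?)?)\s*<\/SCN>/i;

// Returns the captured section number exactly as written, or null when no SCN matches.
// Whitespace normalization is a separate step before upsert.
function extractRawScn(content: string): string | null {
  const match = SCN_SKETCH.exec(content);
  return match?.[1] ?? null;
}
```

Under these assumptions, `extractRawScn("<SCN>26 29 23</SCN>")` and `extractRawScn("<SCN>   SECTION 26 29 23</SCN>")` both yield `"26 29 23"`, while free text inside the tag yields `null`.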

The CHECK constraints are the backstop for the two direct-persist paths (MCP parse_document, pnpm load:files) that bypass the API worker gate.
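The two-step migration and the asymmetric constraints can be sketched as follows. This is an illustration only: the shape grammar, the constraint names, and the SQL text are assumptions, not the contents of migration 013. The TypeScript predicates mirror what each CHECK would accept.

```typescript
// Hypothetical sketch; grammar, constraint names, and SQL are assumptions.
const SHAPE_SOURCE = String.raw`^\d{2} \d{2} \d{2}(\.\d{2}( \d{2})?)?$`;
const SHAPE = new RegExp(SHAPE_SOURCE);

// Step 1: NBSP (chr(160)) becomes a plain space first, then whitespace runs collapse.
const normalizeSql = (table: string, column: string): string =>
  String.raw`UPDATE ${table} SET ${column} = btrim(regexp_replace(replace(${column}, chr(160), ' '), '\s+', ' ', 'g'))`;

// Step 2: specs.section admits the sentinel; spec_sections.section_number does not.
const checkSql = {
  specs: `ALTER TABLE specs ADD CONSTRAINT specs_section_shape_check CHECK (section = 'unknown' OR section ~ '${SHAPE_SOURCE}')`,
  specSections: `ALTER TABLE spec_sections ADD CONSTRAINT spec_sections_number_shape_check CHECK (section_number ~ '${SHAPE_SOURCE}')`,
};

// Application-side mirror of the same rules.
function normalizeSection(value: string): string {
  return value.replace(/\u00a0/g, " ").replace(/\s+/g, " ").trim();
}

const specsSectionOk = (value: string): boolean =>
  value === "unknown" || SHAPE.test(value);

const sectionNumberOk = (value: string): boolean => SHAPE.test(value);
```

Keeping the sentinel out of the `spec_sections` rule is what makes that column canonical-only in this sketch.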

Test Plan

pnpm test — 586 unit tests green
docker compose up -d postgres && cp -n .env.example .env; set -a && source .env && set +a && pnpm migrate && pnpm seed && pnpm test:integration — 143 integration tests; constraint accept/reject tests; migration up/down/up cycles cleanly
Corpus smoke: pnpm load:files 'docs/references/UFGS/DIVISION_01/01_32_01.00_10.SEC' then verify SELECT section FROM specs WHERE section = '01 32 01.00 10' returns one row
SELECT count(*) FILTER (WHERE section_number ~ '\.') FROM spec_sections ≈ 239

Out of Scope

This PR does NOT add family/fuzzy ref matching, structured section columns, or sort-order changes (lexicographic ordering is provably correct for this fixed-width grammar — see ADR-020). Mockup-branch SPA linkifier parity is tracked separately.
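The ordering claim can be illustrated with a small example. The sample values below are assumptions chosen for illustration; ADR-020 remains the reference for the actual argument.

```typescript
// Illustrative sample values only, not taken from the corpus listing.
const sections = [
  "26 00 13.10",
  "01 32 01.00 10",
  "26 00 13",
  "01 32 01",
  "03 30 00",
];

// Default sort compares strings by UTF-16 code units (plain lexicographic order).
const sorted = sections.slice().sort();

// Contrast: without fixed-width groups, lexicographic order breaks numeric order.
const variableWidth = ["9", "10"].sort();
```

Here `sorted` is `["01 32 01", "01 32 01.00 10", "03 30 00", "26 00 13", "26 00 13.10"]`: every group is two digits wide, so a base section sorts immediately before its suffixed variants. The contrast case yields `["10", "9"]`, which is the failure that zero-padded groups avoid.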

Summary by CodeRabbit

Bug Fixes
- Section data now normalizes whitespace for consistent formatting
- Added validation rules to enforce proper section numbering patterns across the database
Tests
- Expanded test coverage for section normalization and validation constraints
- Added integration tests for file loading and section resolution

…CX title Regression pins — no production change. Verifies that renderMarkdown emits the section verbatim in the H1 header and that generateDocx writes it unchanged into document.xml, so future refactors cannot silently mangle dotted agency suffixes (e.g. '27 05 13.43').

coderabbitai · 2026-06-06T18:49:00Z

📝 Walkthrough

This PR adds database schema validation for section identifiers, refactors seeding to normalize and canonicalize parsed sections while tracking inputs, and validates the complete end-to-end flow through integration tests with agency-suffixed sections.

Changes

Section Normalization and Validation

Layer / File(s)	Summary
Database schema normalization and validation `src/db/migrations/013_section_number_normalize_and_check.ts`, `src/db/queries/specs.integration.test.ts`	Migration 013 normalizes whitespace in `specs.section` and `spec_sections.section_number` by converting NBSP to regular spaces and collapsing runs, then adds CHECK constraints: `specs.section` must match a numeric dot-segment pattern or the literal `'unknown'` sentinel, and `spec_sections.section_number` must match the pattern only. Tests verify valid shapes are accepted and malformed values are rejected by constraint name.
Section parsing and canonical collection `src/db/seed.ts`, `src/db/seed.test.ts`	`seed.ts` imports `normalizeSectionNumber` and introduces `CollectResult` interface to return both canonical `records` and a total `scanned` count. SCN parsing regex accepts optional `SECTION` keyword and leading whitespace. `extractSectionMeta` normalizes parsed SCNs and returns `null` when normalization fails. `collectFromContents` filters contents into canonical records. `seed` logs collection counts and warns when inputs exceed kept records. Tests cover whitespace canonicalization, suffix preservation, bare SCN formats, and scanned vs. kept record behavior.
End-to-end file loading and resolution `src/lib/file-loader.integration.test.ts`	Adds `AGENCY_FIXTURE` and three integration tests: (1) loads agency-suffixed UFGS corpus file and asserts `specs.section` value is preserved correctly, (2) creates spec with reference to agency-suffixed target section and asserts `spec_references.target_spec_id` resolves correctly, (3) verifies `listSpecSections('01')` includes the suffixed section with `inDatabase: true`. All tests clean up inserted data.
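The scanned-versus-kept bookkeeping described in the table can be sketched as below. Only the names `CollectResult`, `records`, `scanned`, `extractSectionMeta`, and `collectFromContents` come from the walkthrough; the record fields, function bodies, `WarnLogger`, and `reportSkipped` are hypothetical simplifications.

```typescript
// Hypothetical sketch; bodies and field names are assumptions, not seed.ts itself.
interface SectionMeta {
  section: string;
}

interface CollectResult {
  records: SectionMeta[]; // canonical records only
  scanned: number; // every input considered, kept or not
}

const SHAPE = /^\d{2} \d{2} \d{2}(?:\.\d{2}(?: \d{2})?)?$/;

// Simplified stand-in: treats the whole input as the SCN value.
function extractSectionMeta(content: string): SectionMeta | null {
  const section = content.replace(/\u00a0/g, " ").replace(/\s+/g, " ").trim();
  return SHAPE.test(section) ? { section } : null;
}

function collectFromContents(contents: string[]): CollectResult {
  const records: SectionMeta[] = [];
  for (const content of contents) {
    const meta = extractSectionMeta(content);
    if (meta !== null) records.push(meta);
  }
  return { records, scanned: contents.length };
}

interface WarnLogger {
  warn(fields: Record<string, number>): void;
}

// Emits one aggregate warning, and only when something was skipped.
function reportSkipped(result: CollectResult, logger: WarnLogger): void {
  const kept = result.records.length;
  const skipped = result.scanned - kept;
  if (skipped > 0) logger.warn({ scanned: result.scanned, kept, skipped });
}
```

Returning `scanned` alongside `records` is what lets the caller detect drops without re-reading the inputs.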

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

A rabbit hops through seeds so fine, 🐰
Where whitespace trims and sections align,
With UNKNOWN checks and dots so neat,
Agency suffixes now complete!
Constraints guard what goes in deep. ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the main changes: database migration for section number normalization, shape CHECK constraints, and seed improvements.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

thewrz · 2026-06-06T19:16:00Z

@coderabbitai review

coderabbitai · 2026-06-06T19:16:05Z

✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

thewrz added 25 commits June 5, 2026 19:00

docs(specs): design — section-number expansion across all ingest formats

61650c1

docs(plans): section-number expansion implementation plan — 4 sub-MVP PRs, TDD task breakdown

4701aba

feat(lib): section-number module — expanded-shape validator + normalizer

823cc0c

test(lib): pin section-number fragment capture-group + multiline separator contracts

496e2df

docs(adr): ADR-020 expanded section-number shape as opaque normalized string

e3fa3c9

fix(parser): prose section refs capture dotted and agency suffixes — no more base truncation

86be5ec

fix(lib): section inference keeps dotted and agency suffixes — truncation collided distinct sections

33273bf

fix(lib): strip dash separator in inferred inline titles — parity with text parser

10e3544

fix(parser): .txt header extraction keeps suffixed section numbers and their titles

f89c2c4

feat(parser): SEC SCN/SRF section numbers normalize to canonical expanded shape

f156694

test(parser): pin internal SCN whitespace normalization

972d372

docs(parser): correct SCN comment — gates not yet landed

94a156b

feat(api): AST schemas accept expanded section shapes; PATCH rejects sentinel

5259fd5

feat(api): parse worker schema gates expanded shapes; section override normalized, 400 on malformed

5c6d402

fix(parser): normalize dc:subject so free-text degrades to 'unknown', not a job-killing section

c71b6f0

fix(api): friendly job error for section-gate failures; refresh stale parseDocx mock

8da2c2b

fix(api): download filename preserves section dotted suffix

fa48fab

safeFilename now allows '.' in the section part so '26 00 13.10' renders as '26-00-13.10-Panelboards.docx' rather than mangling the dot to a dash. Function exported for unit testing.
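A minimal sketch of the behaviour this commit message describes follows. The two-argument signature, the character classes, and the name `safeFilenameSketch` are assumptions; only the input and output pair is taken from the commit message.

```typescript
// Hypothetical sketch; the exported safeFilename may take different arguments.
// Replaces runs of disallowed characters with a single dash and trims stray dashes.
function slugPart(value: string, allowed: RegExp): string {
  return value.trim().replace(allowed, "-").replace(/^-+|-+$/g, "");
}

function safeFilenameSketch(section: string, title: string): string {
  // The section part keeps '.', so dotted agency suffixes survive.
  const sectionPart = slugPart(section, /[^0-9A-Za-z.]+/g);
  // The title part does not keep '.', to avoid confusing extensions.
  const titlePart = slugPart(title, /[^0-9A-Za-z]+/g);
  return `${sectionPart}-${titlePart}.docx`;
}
```

With these assumptions, `safeFilenameSketch("26 00 13.10", "Panelboards")` returns `"26-00-13.10-Panelboards.docx"`.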

test(api): PATCH accepts expanded section shapes over HTTP; refresh ARCHITECTURE examples

9607118

feat(db): seed tolerates bare SCN, normalizes section numbers before upsert

c45a227

fix(db): seed tolerates leading whitespace before SCN SECTION keyword

970c3ef

feat(db): warn on skipped section files during seed

7a0797c

feat(db): migration 013 — normalize section whitespace, add expanded-shape CHECK constraints

be6d98a

test(db): leak-proof cleanup in shape-check accept test

9c9f5bc

test(integration): agency-suffixed .SEC end-to-end — load, persist, exact-match refs, catalog join

1f899ba

thewrz added 4 commits June 6, 2026 12:15

docs(plans): fix markdownlint MD038/MD040 in plan doc

b9f6978

ci: run PR checks for all base branches — stacked sub-MVP PRs need independent CI

fb18cf3

merge: propagate lib-branch CI trigger + docs lint fixes up the stack

0d7ecd4

merge: propagate stack updates

1d2dd44

merge: propagate stack updates

24ca054

thewrz added 2 commits June 6, 2026 12:26

fix(api): Zod-validate /parse body fields — non-string section/title now 400 instead of silent drop

f188957

merge: propagate stack updates

b7e2126

Base automatically changed from feat/section-number-api to main June 6, 2026 21:05

thewrz merged commit 0d0cc78 into main Jun 6, 2026
9 checks passed

thewrz deleted the feat/section-number-db branch June 6, 2026 21:23

This was referenced Jun 6, 2026

chore(docs): README reflects expanded section-number grammar (#114–#117) #118

Merged

feat(generator): SpecsIntact .SEC output renderer #108

Open

feat(parser): PDF spec ingest — extraction, OCR fallback, hierarchy inference #65

Open

thewrz added a commit that referenced this pull request Jun 6, 2026

chore(docs): README reflects expanded section-number grammar shipped in #114–#117 (#118)

a99298e

thewrz merged 32 commits into main from feat/section-number-db