Merged · 32 commits
61650c1
docs(specs): design — section-number expansion across all ingest formats
thewrz Jun 6, 2026
4701aba
docs(plans): section-number expansion implementation plan — 4 sub-MVP…
thewrz Jun 6, 2026
823cc0c
feat(lib): section-number module — expanded-shape validator + normalizer
thewrz Jun 6, 2026
496e2df
test(lib): pin section-number fragment capture-group + multiline sepa…
thewrz Jun 6, 2026
e3fa3c9
docs(adr): ADR-020 expanded section-number shape as opaque normalized…
thewrz Jun 6, 2026
86be5ec
fix(parser): prose section refs capture dotted and agency suffixes — …
thewrz Jun 6, 2026
33273bf
fix(lib): section inference keeps dotted and agency suffixes — trunca…
thewrz Jun 6, 2026
10e3544
fix(lib): strip dash separator in inferred inline titles — parity wit…
thewrz Jun 6, 2026
f89c2c4
fix(parser): .txt header extraction keeps suffixed section numbers an…
thewrz Jun 6, 2026
f156694
feat(parser): SEC SCN/SRF section numbers normalize to canonical expa…
thewrz Jun 6, 2026
972d372
test(parser): pin internal SCN whitespace normalization
thewrz Jun 6, 2026
94a156b
docs(parser): correct SCN comment — gates not yet landed
thewrz Jun 6, 2026
5259fd5
feat(api): AST schemas accept expanded section shapes; PATCH rejects …
thewrz Jun 6, 2026
5c6d402
feat(api): parse worker schema gates expanded shapes; section overrid…
thewrz Jun 6, 2026
c71b6f0
fix(parser): normalize dc:subject so free-text degrades to 'unknown',…
thewrz Jun 6, 2026
8da2c2b
fix(api): friendly job error for section-gate failures; refresh stale…
thewrz Jun 6, 2026
fa48fab
fix(api): download filename preserves section dotted suffix
thewrz Jun 6, 2026
17f9440
test(generator): pin suffixed-section rendering in markdown H1 and DO…
thewrz Jun 6, 2026
9607118
test(api): PATCH accepts expanded section shapes over HTTP; refresh A…
thewrz Jun 6, 2026
c45a227
feat(db): seed tolerates bare SCN, normalizes section numbers before …
thewrz Jun 6, 2026
970c3ef
fix(db): seed tolerates leading whitespace before SCN SECTION keyword
thewrz Jun 6, 2026
7a0797c
feat(db): warn on skipped section files during seed
thewrz Jun 6, 2026
be6d98a
feat(db): migration 013 — normalize section whitespace, add expanded-…
thewrz Jun 6, 2026
9c9f5bc
test(db): leak-proof cleanup in shape-check accept test
thewrz Jun 6, 2026
1f899ba
test(integration): agency-suffixed .SEC end-to-end — load, persist, e…
thewrz Jun 6, 2026
b9f6978
docs(plans): fix markdownlint MD038/MD040 in plan doc
thewrz Jun 6, 2026
fb18cf3
ci: run PR checks for all base branches — stacked sub-MVP PRs need in…
thewrz Jun 6, 2026
0d7ecd4
merge: propagate lib-branch CI trigger + docs lint fixes up the stack
thewrz Jun 6, 2026
1d2dd44
merge: propagate stack updates
thewrz Jun 6, 2026
24ca054
merge: propagate stack updates
thewrz Jun 6, 2026
f188957
fix(api): Zod-validate /parse body fields — non-string section/title …
thewrz Jun 6, 2026
b7e2126
merge: propagate stack updates
thewrz Jun 6, 2026
3 changes: 2 additions & 1 deletion .github/workflows/ci.yml
```diff
@@ -3,8 +3,9 @@ name: CI
 on:
   push:
     branches: ["main"]
+  # No base-branch filter: stacked sub-MVP PRs target feature branches and
+  # must pass CI independently (CLAUDE.md PR discipline).
   pull_request:
-    branches: ["main"]
 
 permissions:
   contents: read
```
6 changes: 3 additions & 3 deletions ARCHITECTURE.md
```diff
@@ -225,7 +225,7 @@ interface CsiNode {
 
 interface CsiTree {
   id: string // spec ID
-  section: string // CSI section number, e.g. "27 21 00"
+  section: string // CSI section number, e.g. "27 21 00", "26 00 13.10", "01 32 01.00 10"
   title: string
   parts: CsiNode[] // root-level Part nodes
 }
```
```diff
@@ -262,7 +262,7 @@ interface ApiResponse<T> {
 -- Specs
 CREATE TABLE specs (
   id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
-  section VARCHAR(20), -- "27 21 00"
+  section VARCHAR(20), -- "27 21 00" | "26 00 13.10" | "01 32 01.00 10" (expanded shape, ADR-020)
   title TEXT,
   source VARCHAR(20), -- 'ufgs' | 'arcat' | 'cpi' | 'unknown'
   created_at TIMESTAMPTZ DEFAULT now(),
```
```diff
@@ -344,7 +344,7 @@ CREATE TABLE spec_references (
   source_spec_id UUID REFERENCES specs(id) ON DELETE CASCADE,
   source_paragraph_id UUID REFERENCES paragraphs(id) ON DELETE CASCADE,
   target_type VARCHAR(20) NOT NULL, -- 'section' | 'paragraph' | 'standard'
-  target_spec_section VARCHAR(20), -- "09 91 00" — for section refs
+  target_spec_section VARCHAR(20), -- "09 91 00" / "26 00 13.10" — for section refs
   target_spec_id UUID REFERENCES specs(id) ON DELETE SET NULL,
   target_paragraph_id UUID REFERENCES paragraphs(id) ON DELETE SET NULL,
   standard_code TEXT, -- "ASTM C150" — for standard refs
```
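Because section cross-references link by exact match only (ADR-020), resolving `target_spec_section` to `target_spec_id` can be sketched as a plain map lookup. The `SpecRow` and `SectionRef` shapes and both function names below are hypothetical, and whether SpecR normalizes the stored reference text before lookup is not shown in this diff:

```typescript
// Hypothetical shapes; field names loosely follow the specs and
// spec_references columns in the diff above.
interface SpecRow {
  id: string;
  section: string; // canonical expanded shape, or the 'unknown' sentinel
}

interface SectionRef {
  targetSpecSection: string; // verbatim from the source document (unconstrained)
}

// Index specs by their exact section string. In this sketch a later row
// wins if two specs share the same section.
function buildSectionIndex(specs: SpecRow[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const spec of specs) {
    if (spec.section === "unknown") continue; // sentinel is never a link target
    index.set(spec.section, spec.id);
  }
  return index;
}

// Exact match only: "26 00 13.20" never falls back to "26 00 13".
function resolveTargetSpecId(ref: SectionRef, index: Map<string, string>): string | null {
  return index.get(ref.targetSpecSection) ?? null;
}

const sectionIndex = buildSectionIndex([
  { id: "spec-a", section: "26 00 13" },
  { id: "spec-b", section: "26 00 13.10" },
]);
console.log(resolveTargetSpecId({ targetSpecSection: "26 00 13.10" }, sectionIndex)); // spec-b
console.log(resolveTargetSpecId({ targetSpecSection: "26 00 13.20" }, sectionIndex)); // null
```

An unmatched ref stays unresolved even when its base section exists, which is the "no family fallback" trade-off described in ADR-020.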
1 change: 1 addition & 0 deletions CLAUDE.md
````diff
@@ -249,6 +249,7 @@ docs/adr/
 017-project-manual-publishing.md
 018-document-concurrency-state-model.md
 019-scope-boundaries-content-neutral-platform.md
+020-section-number-expanded-shape.md
 ```
 
 **ADR format:**
````
39 changes: 39 additions & 0 deletions docs/adr/020-section-number-expanded-shape.md
@@ -0,0 +1,39 @@
# ADR-020: Expanded Section-Number Shape as Opaque Normalized String

## Status: Accepted

## Context

CSI MasterFormat Level 4 (`26 00 13.10`) and UFGS Level 5 agency suffixes
(`01 32 01.00 10`; 10 = Army Corps, 20 = NAVFAC, 30/40 = NASA/AFCEC) appear in
36% of the UFGS reference corpus and arrive through every ingest format (.SEC,
DOCX, plaintext). SpecR previously validated only `NN NN NN`, silently
truncating suffixes in prose-ref extraction and content inference — collapsing
distinct sections (e.g. `01 33 23` vs `01 33 23.33`) into one identity.
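To make the truncation concrete, here is a minimal TypeScript sketch contrasting a base-only `NN NN NN` pattern with one that also admits the optional `.NN` and agency ` NN` suffixes. Both patterns and the `extractFirst` helper are hypothetical illustrations, not code taken from SpecR:

```typescript
// Hypothetical patterns for illustration; not taken from the SpecR codebase.
// Old behavior: only the six-digit base shape is recognized.
const BASE_ONLY = /\b(\d{2} \d{2} \d{2})\b/;
// Expanded: optional Level 4 ".NN", then an optional agency " NN" after it.
const EXPANDED = /\b(\d{2} \d{2} \d{2}(?:\.\d{2}(?: \d{2})?)?)\b/;

// Return the first captured section number in `prose`, or null if none.
function extractFirst(pattern: RegExp, prose: string): string | null {
  const match = pattern.exec(prose);
  return match ? match[1] : null;
}

const sample = "Submit schedules per Section 01 33 23.33.";
console.log(extractFirst(BASE_ONLY, sample)); // 01 33 23 (suffix silently dropped)
console.log(extractFirst(EXPANDED, sample)); // 01 33 23.33
```

With the base-only pattern, both `01 33 23` and `01 33 23.33` extract to the same string, which is exactly the identity collapse described above.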

Two viable designs:

1. Opaque normalized string, grammar owned by one module.
2. Structured `SectionNumber` type with decomposed DB columns
   (division/l2/l3/suffix/agency).

## Decision

Opaque normalized string (`src/lib/section-number.ts` owns the grammar).
Canonical form: single ASCII spaces, `NN NN NN`, `NN NN NN.NN`, or
`NN NN NN.NN NN`. Cross-reference linking remains **exact match only** — a ref
to `26 00 13` never resolves to `26 00 13.10` or vice versa. DB CHECK
constraints enforce shape on `specs.section` (plus the `'unknown'` inference
sentinel) and `spec_sections.section_number`;
`spec_references.target_spec_section` stays unconstrained because it records
what the source document said.
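A minimal sketch of what owning the grammar in one module might look like, assuming the module exposes an unanchored fragment for consumers plus a whitespace-collapsing normalizer. The names `SECTION_FRAGMENT`, `isCanonicalSection`, and `normalizeSection` are hypothetical, not the actual exports of `src/lib/section-number.ts`:

```typescript
// Hypothetical names; not the real exports of src/lib/section-number.ts.

// Unanchored fragment: NN NN NN, optionally .NN, optionally a trailing " NN"
// agency suffix (only allowed after .NN). Consumers embed this in larger regexes.
const SECTION_FRAGMENT = String.raw`\d{2} \d{2} \d{2}(?:\.\d{2}(?: \d{2})?)?`;

// Anchored form: the whole value must be canonical (single ASCII spaces).
const CANONICAL_SECTION = new RegExp(`^${SECTION_FRAGMENT}$`);

function isCanonicalSection(value: string): boolean {
  return CANONICAL_SECTION.test(value);
}

// Collapse whitespace runs (tabs, NBSP, doubled spaces) to one ASCII space,
// trim, then validate. Returns null instead of truncating to the base number.
function normalizeSection(raw: string): string | null {
  const collapsed = raw.replace(/\s+/g, " ").trim();
  return isCanonicalSection(collapsed) ? collapsed : null;
}

console.log(normalizeSection("26  00\t13.10")); // 26 00 13.10
console.log(normalizeSection(" 01 32 01.00 10 ")); // 01 32 01.00 10
console.log(normalizeSection("26 00 13.1")); // null
```

Returning `null` rather than a truncated base number keeps `01 33 23.33` from silently collapsing into `01 33 23`; in this sketch the `'unknown'` sentinel is left to the caller rather than folded into the grammar.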

## Consequences

- One module to change when the grammar grows; consumers embed its exported
  regex fragment instead of redefining the pattern.
- Exact-match keeps broken refs honest (a base ref to a missing base section
is genuinely broken) at the cost of no family fallback.
- Structured queries (e.g. "all agency variants of X") require LIKE prefixes
rather than column equality — acceptable; no current feature needs them.
- Free-prose ambiguity: in `Section 26 00 13.10 20 mm`, extraction misreads
  `20` as an agency suffix. Documented as a KNOWN AMBIGUITY; tagged .SEC refs
  are immune.
- Lexicographic ORDER BY remains correct for the fixed-width grammar: every
  segment is two digits, so character order matches numeric order.
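Several of these consequences can be sketched in TypeScript, assuming a hypothetical `SECTION_FRAGMENT` that mirrors the canonical grammar: a prose extractor that embeds the fragment, the free-prose ambiguity, fixed-width ordering (JavaScript's code-unit sort stands in for ORDER BY here), and the prefix test that an "all agency variants of X" query would need. All names are illustrative, not SpecR's code:

```typescript
// Hypothetical fragment mirroring the canonical grammar (an assumption).
const SECTION_FRAGMENT = String.raw`\d{2} \d{2} \d{2}(?:\.\d{2}(?: \d{2})?)?`;
const PROSE_REF_SOURCE = String.raw`\bSection\s+(${SECTION_FRAGMENT})\b`;

// Collect every "Section <number>" reference from free prose.
function extractProseRefs(text: string): string[] {
  const re = new RegExp(PROSE_REF_SOURCE, "g"); // fresh regex: no shared lastIndex
  const refs: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(text)) !== null) {
    refs.push(m[1]);
  }
  return refs;
}

// Known ambiguity: the "20" of "20 mm" is consumed as an agency suffix.
const ambiguous = extractProseRefs("Use Section 26 00 13.10 20 mm conduit.");
// ambiguous holds ["26 00 13.10 20"]

// Every segment is two digits, so code-unit order equals numeric order.
const ordered = ["26 00 14", "26 00 13.10", "01 32 01.00 10", "26 00 13"].sort();
// ordered holds ["01 32 01.00 10", "26 00 13", "26 00 13.10", "26 00 14"]

// "All agency variants of X" is a prefix test, like SQL LIKE '01 32 01.00 %'.
function isAgencyVariantOf(level4: string, candidate: string): boolean {
  return candidate.startsWith(level4 + " ");
}
```

The prefix test only makes sense for a Level 4 base such as `01 32 01.00`, since the grammar allows an agency suffix only after `.NN`.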