
Notes-to-specifier shift CSI list numbering (1–15 renders as 5–20) + empty/junk paragraphs become numbered nodes #122

@thewrz

Summary

A numbered list renders with shifted numbers and phantom rows. In docs/references/MANUFACTURER_EXAMPLES/paring-fixes.docx, the "Related Sections" article has 15 references that Word numbers 1–15, but SpecR renders them as 5–20, with a blank "17." row and a stray "11. ]" row.

This affects both the mockup and the server-side renderMarkdown (MCP/API). The list number is computed at render time from the node's position in its parent's children array and is never stored, so the shift is a rendering bug, not a bad DB value. The blank/junk rows, however, are present in the parsed tree, so they originate during ingestion.

Reproduction

Output of renderMarkdown(parse(paring-fixes.docx)) for the "Related Sections" region:

```text
B. Related Sections:
> **[NOTE]** ****...                          (4 note lines: correct, unnumbered)
> **[NOTE]** Include the Related Section ...
> **[NOTE]** The list below ...
> **[NOTE]** ****...
   5. Section 01 30 00 Administrative Requirements   <- should be 1.
   6. Section 01 33 00 Submittal Procedures
   ...
   11. ]                                              <- stray tailoring bracket (junk)
   ...
   17.                                                <- empty paragraph (junk)
   ...
   20. [Section 09 91 26 – Painting – Building.]      <- should be 15.
```

Apart from the junk rows, the parse tree is correct: notes are <note> nodes and references are <pr2> nodes, in source order. The 4 <note> siblings occupy child indices 0–3, so the first <pr2> lands at index 4 and is labeled 5 instead of 1.

Root cause A — labeling counts non-numbered siblings (render)

src/generator/markdown.ts (renderPart/renderArticle/renderPrNode) and its mockup port public/js/tree.js pass the raw children-array index to getLabel. Note, continuation, vanish, and empty siblings each consume an index, which shifts every numbered sibling after them. This happens at every level (part→article, article→pr, …).
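
A minimal TypeScript sketch of the failure mode. SpecNode, NodeType, and labelFor are hypothetical simplifications; labelFor only stands in for getLabel, whose real signature is assumed here rather than known. Labeling by raw children-array index makes the first <pr2> after the four notes come out as 5 instead of 1.

```typescript
// Hypothetical, simplified node shape; the real SpecR tree types will differ.
type NodeType = "note" | "pr2";

interface SpecNode {
  type: NodeType;
  text: string;
}

// Stand-in for getLabel (assumed behavior: 0-based index -> "N." label for pr2).
function labelFor(index: number): string {
  return `${index + 1}.`;
}

// Buggy pattern: the raw children-array index goes straight to the label
// function, so every note before the first pr2 shifts its number by one.
function labelsByRawIndex(children: SpecNode[]): string[] {
  const labels: string[] = [];
  children.forEach((child, index) => {
    if (child.type === "note") return; // rendered unnumbered, but its index is still consumed
    labels.push(labelFor(index));
  });
  return labels;
}

const related: SpecNode[] = [
  { type: "note", text: "****" },
  { type: "note", text: "Include the Related Section ..." },
  { type: "note", text: "The list below ..." },
  { type: "note", text: "****" },
  { type: "pr2", text: "Section 01 30 00 Administrative Requirements" },
  { type: "pr2", text: "Section 01 33 00 Submittal Procedures" },
];

console.log(labelsByRawIndex(related)); // [ '5.', '6.' ]
```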

Fix: label each numbered node by its per-type ordinal among numbered, rendered siblings, skipping note, continuation, vanish, and empty nodes. getLabel and public/js/labels.js stay unchanged.
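
A minimal TypeScript sketch of per-type ordinal numbering. The SpecNode/NodeType shapes are hypothetical, and treating "empty" as whitespace-only text is an assumption. The returned ordinal, not the raw index, is what would be handed to getLabel; null marks a child that renders unnumbered.

```typescript
// Hypothetical, simplified node shape; the real SpecR types will differ.
type NodeType = "note" | "continuation" | "vanish" | "pr1" | "pr2";

interface SpecNode {
  type: NodeType;
  text: string;
}

// Node types that render without a number and must not consume an ordinal.
const UNNUMBERED = new Set<NodeType>(["note", "continuation", "vanish"]);

// Assumption: "empty" means whitespace-only text.
function isNumbered(node: SpecNode): boolean {
  return !UNNUMBERED.has(node.type) && node.text.trim() !== "";
}

// For each child, return the 0-based per-type ordinal to pass to the label
// function, or null when the child renders unnumbered.
function numberedOrdinals(children: SpecNode[]): (number | null)[] {
  const counters = new Map<NodeType, number>();
  return children.map((child) => {
    if (!isNumbered(child)) return null;
    const ordinal = counters.get(child.type) ?? 0;
    counters.set(child.type, ordinal + 1);
    return ordinal;
  });
}

const related: SpecNode[] = [
  { type: "note", text: "Include the Related Section ..." },
  { type: "note", text: "The list below ..." },
  { type: "pr2", text: "Section 01 30 00 Administrative Requirements" },
  { type: "pr2", text: "" }, // an empty paragraph like the one in the report
  { type: "pr2", text: "Section 01 33 00 Submittal Procedures" },
];

console.log(numberedOrdinals(related)); // [ null, null, 0, null, 1 ]
```

Keeping one counter per node type means siblings of different numbered types never share a sequence, and unnumbered children never advance any counter.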

Root cause B — ingestion creates junk numbered nodes

In src/parser/docx/inference.ts, the empty-paragraph drop (in appendContinuation) only applies to paragraphs classified as continuations. An empty SPECText4 paragraph is instead claimed as a numbered pr2 by Signal 2 (style→numPr) and survives as an empty pr2 (rendered as 17.). The paragraph's own explicit numId=0 (Word's "remove from numbering") is overridden by the style signal. A lone "]" tailoring bracket likewise becomes a pr2 (rendered as 11.).
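
In Word's model (ECMA-376), numbering set directly on a paragraph overrides numbering inherited from its style, and numId 0 explicitly removes numbering. A minimal TypeScript sketch of resolving the effective numbering with that precedence; ParagraphNumbering, effectiveNumId, isNumberedParagraph, and the numId values are hypothetical, not the actual inference.ts API.

```typescript
// Hypothetical view of the numbering signals on one paragraph; the real
// inference.ts structures will differ.
interface ParagraphNumbering {
  ownNumId?: number;   // w:numPr/w:numId set directly on the paragraph, if any
  styleNumId?: number; // numId inherited from the paragraph style (Signal 2)
}

// Direct paragraph numbering wins over the style. numId 0 is Word's explicit
// "remove from numbering", so it must not fall through to the style signal.
function effectiveNumId(p: ParagraphNumbering): number | undefined {
  const numId = p.ownNumId !== undefined ? p.ownNumId : p.styleNumId;
  return numId === 0 ? undefined : numId;
}

function isNumberedParagraph(p: ParagraphNumbering): boolean {
  return effectiveNumId(p) !== undefined;
}

// The empty SPECText4 case: the style implies numbering, but the paragraph's own numId is 0.
console.log(isNumberedParagraph({ ownNumId: 0, styleNumId: 7 })); // false
console.log(isNumberedParagraph({ styleNumId: 7 }));              // true
```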

Fix: drop empty paragraphs regardless of classification; let an explicit numId=0 on the paragraph itself take precedence over the style signal; optionally drop paragraphs that contain only brackets or punctuation.
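
A minimal sketch of the sanitization pass, assuming a hypothetical RawParagraph shape and a filter that ignores the paragraph's style, so the outcome does not depend on which signal would claim it. The punctuation-only rule (whitespace plus Unicode punctuation) is one possible reading of "bracket/punctuation-only" and sits behind a flag because it is optional.

```typescript
// Hypothetical paragraph shape; the real inference.ts types will differ.
interface RawParagraph {
  text: string;
  style: string; // ignored by the filter: drops happen regardless of classification
}

// Whitespace, brackets, and other Unicode punctuation only, e.g. "]" or "[ ].".
// Built with the RegExp constructor to avoid compile-target checks on \p{...}.
const PUNCTUATION_ONLY = new RegExp("^[\\s\\p{P}]+$", "u");

function isJunkParagraph(p: RawParagraph, dropPunctuationOnly: boolean): boolean {
  if (p.text.trim() === "") return true; // empty: always dropped
  return dropPunctuationOnly && PUNCTUATION_ONLY.test(p.text);
}

function sanitize(paragraphs: RawParagraph[], dropPunctuationOnly = true): RawParagraph[] {
  return paragraphs.filter((p) => !isJunkParagraph(p, dropPunctuationOnly));
}

const sample: RawParagraph[] = [
  { text: "Section 01 30 00 Administrative Requirements", style: "SPECText4" },
  { text: "]", style: "SPECText4" },
  { text: "", style: "SPECText4" },
  { text: "[Section 09 91 26 – Painting – Building.]", style: "SPECText4" },
];

// Keeps only the two real references.
console.log(sanitize(sample).map((p) => p.text));
```

A lone "]" is one half of a tailoring-bracket pair, so dropping it discards that marker; the dropPunctuationOnly flag keeps the optional rule easy to disable.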

Plan (two sub-MVP PRs)

  • PR1 (Fix A — numbering): correct CSI numbering in shared markdown.ts + mockup tree.js; regression test in markdown.test.ts.
  • PR2 (Fix B — sanitization): ingest cleanup in inference.ts; regression tests; commit paring-fixes.docx as a fixture.

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests