Follow-up from docs/issues/2026-04-12-accuracy-from-live-audit.md (R2) and docs/plans/2026-04-18-live-audit-follow-through.md.
Problem
Confidence currently exists in fragments:
- bot profiles carry confidence metadata
- some structured-data violations carry `confidence: "high"`
- SKILL guidance tells the narrator to mention confidence caveats manually
But confidence is not yet a consistent first-class concept across findings, warnings, and recommendations in the final score/report output.
Goal
Expose confidence in a machine-readable, report-wide way so narration does not have to infer certainty.
Scope
- Define a shared finding object shape that includes:
  - `id`
  - `severity`
  - `confidence`
  - `source`
  - `evidence`
  - `message`
- Emit confidence on report findings/warnings where the conclusion is:
  - directly observed in raw data (`high`)
  - inferred from documented vendor behavior (`medium`)
  - heuristic or pattern-matched (`low`)
- Update report/schema docs accordingly.
- Add tests that assert at least one direct-observation finding and one inferred finding carry the right confidence level.
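A minimal sketch of the shared finding shape and the observed/inferred/heuristic to `high`/`medium`/`low` rule. Python is used purely for illustration because the issue does not name the project's language. `Basis`, `make_finding`, and `to_report_json` are hypothetical names, not existing code. The design choice shown is to derive `confidence` from how a conclusion was reached instead of letting each check set it freely, which keeps the three levels consistent across the report.

```python
# Hypothetical sketch: names other than the six finding fields are illustrative only.
import json
from dataclasses import asdict, dataclass
from enum import Enum


class Confidence(str, Enum):
    HIGH = "high"      # directly observed in raw data
    MEDIUM = "medium"  # inferred from documented vendor behavior
    LOW = "low"        # heuristic or pattern-matched


class Basis(Enum):
    """Hypothetical label for how a check reached its conclusion."""
    OBSERVED = "observed"
    VENDOR_DOCUMENTED = "vendor_documented"
    HEURISTIC = "heuristic"


# Single place that maps evidence basis to confidence level.
CONFIDENCE_BY_BASIS = {
    Basis.OBSERVED: Confidence.HIGH,
    Basis.VENDOR_DOCUMENTED: Confidence.MEDIUM,
    Basis.HEURISTIC: Confidence.LOW,
}


@dataclass(frozen=True)
class Finding:
    id: str
    severity: str
    confidence: Confidence
    source: str
    evidence: str
    message: str


def make_finding(finding_id: str, severity: str, basis: Basis,
                 source: str, evidence: str, message: str) -> Finding:
    """Build a finding whose confidence is derived from its basis."""
    return Finding(finding_id, severity, CONFIDENCE_BY_BASIS[basis],
                   source, evidence, message)


def to_report_json(findings: list[Finding]) -> str:
    # Confidence subclasses str, so json.dumps emits "high"/"medium"/"low".
    return json.dumps([asdict(f) for f in findings], indent=2)
```

A test of the kind the scope asks for could then build one `Basis.OBSERVED` finding and one `Basis.VENDOR_DOCUMENTED` finding and assert that the serialized report carries `"high"` and `"medium"` respectively.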
Acceptance criteria
Why this matters
The 2026-04-12 audit failed because the narrative layer had to guess. Confidence metadata won't solve everything, but it sharply reduces fake certainty.