Problem
When security-auditor and reviewer evaluate the same diff, they sometimes flag different findings. This is desired behavior: it means the validators have complementary coverage rather than duplicating each other. But the current CIA report has no metric to detect or track this pattern.
Conversely, if both validators always flag the same findings, or neither ever flags a unique one, that would indicate either coverage overlap (wasted tokens) or a shared coverage gap (both missing the same blind spots).
Evidence from #972
In the #972 (WAVE 1 capstone) run:
- security-auditor: flagged PIPE_BUF concurrency as WARNING (OS-level concurrency concern for JSONL append)
- reviewer: did NOT flag PIPE_BUF (correct — reviewer focuses on code design, not OS-level write atomicity)
- reviewer: flagged several code-structure concerns that security-auditor did not flag
This is the correct division of labor. The pipeline benefits from two validators precisely because they have different focal lenses.
Proposed Measurement
Add to the CIA analysis:
Validator Diversity Score = (|unique_to_security_auditor| + |unique_to_reviewer|) / total_combined_findings
- Score close to 1.0 = fully complementary (high diversity — good)
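- Score close to 0.0 = fully overlapping (suspicious); if both validators are empty the score is undefined (0/0), which is also suspicious and should be reported as its own case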
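A minimal sketch of the score, assuming each validator's findings have already been reduced to a set of normalized topic strings (how topics are extracted is not specified here, and the function names are hypothetical). It also assumes total_combined_findings means the number of distinct findings across both validators; under that reading the score is exactly 1 minus the Jaccard similarity of the two sets.

```python
from typing import Optional, Set


def jaccard_similarity(a: Set[str], b: Set[str]) -> Optional[float]:
    """|A ∩ B| / |A ∪ B|, or None when both sets are empty (0/0)."""
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)


def diversity_score(security_auditor: Set[str], reviewer: Set[str]) -> Optional[float]:
    """(unique to security-auditor + unique to reviewer) / distinct combined findings.

    Returns None when neither validator reported anything, so the
    "both empty" case can be reported separately instead of as 0.0.
    """
    combined = security_auditor | reviewer
    if not combined:
        return None
    unique = len(security_auditor - reviewer) + len(reviewer - security_auditor)
    return unique / len(combined)


# Illustrative placeholder topics, not the actual #972 findings.
auditor_topics = {"pipe-buf-concurrency"}
reviewer_topics = {"structure-concern-1", "structure-concern-2"}
print(diversity_score(auditor_topics, reviewer_topics))    # no shared topics
print(jaccard_similarity(auditor_topics, reviewer_topics))
```

Returning None rather than 0.0 for two empty sets is a design choice in this sketch: it keeps "nothing was flagged" distinguishable from "everything was flagged by both".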
CIA should flag these conditions:
[VALIDATOR-OVERLAP]: security-auditor and reviewer flag nearly the same issues (>80% overlap), which suggests one is rubber-stamping the other
[VALIDATOR-BLIND-SPOT]: both validators are empty, or both pass, on a change that later causes a bug (retroactive detection)
Suggested Fix
- Extend the CIA's per-pipeline analysis to extract finding topics from reviewer and security-auditor outputs
- Compute a Jaccard similarity between their finding sets
- Report the similarity in the CIA report
- File a [VALIDATOR-OVERLAP] issue when Jaccard > 0.8 across 3+ consecutive runs
This is advisory (info severity) — do not block pipelines, just report the trend.
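The trend check could look roughly like the sketch below. It assumes the CIA keeps a per-run history of Jaccard values, oldest first, with None recorded for runs where both validators were empty; the function name and parameters are hypothetical. It only looks at the most recent runs, so an old streak that has since been broken does not trigger a new issue.

```python
from typing import Optional, Sequence


def should_file_overlap_issue(
    jaccard_history: Sequence[Optional[float]],
    threshold: float = 0.8,
    min_consecutive: int = 3,
) -> bool:
    """True when the latest `min_consecutive` runs all have Jaccard > threshold."""
    streak = 0
    # Walk backwards from the newest run and stop at the first run
    # that is missing (None) or at/below the threshold.
    for value in reversed(jaccard_history):
        if value is None or value <= threshold:
            break
        streak += 1
    return streak >= min_consecutive


if should_file_overlap_issue([0.4, 0.85, 0.9, 1.0]):
    # Advisory only: report the trend, never fail the pipeline.
    print("[VALIDATOR-OVERLAP] info: Jaccard > 0.8 for 3+ consecutive runs")
```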
Plugin Version: 3.50.0 (e858ea9)
Filed automatically by continuous-improvement-analyst — session c29a5b5d, #972 WAVE 1 capstone