[CI] Multi-validator finding diversity is a quality signal worth measuring — add to CIA report #991

@akaszubski

Problem

When security-auditor and reviewer evaluate the same diff, they sometimes flag DIFFERENT findings. This is actually desired behavior — it means the validators have complementary coverage rather than duplicating each other. But the current CIA report has no metric to detect or celebrate this pattern.

Conversely, if both validators always flag the same findings and never flag unique ones, that indicates either coverage overlap (wasted tokens) or a coverage gap (both validators share the same blind spots).

Evidence from #972

In the #972 (WAVE 1 capstone) run:

  • security-auditor: flagged PIPE_BUF concurrency as WARNING (OS-level concurrency concern for JSONL append)
  • reviewer: did NOT flag PIPE_BUF (correct — reviewer focuses on code design, not OS-level write atomicity)
  • reviewer: flagged several code-structure concerns that security-auditor did not flag

This is the correct division of labor. The pipeline benefits from two validators precisely because they have different focal lenses.

Proposed Measurement

Add to the CIA analysis:

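Validator Diversity Score = (|unique_to_security_auditor| + |unique_to_reviewer|) / total_combined_findings

Here total_combined_findings is taken to be the number of distinct findings across both validators. Under that reading the score equals 1 minus the Jaccard similarity used in the Suggested Fix.
  • Score close to 1.0 = fully complementary (high diversity, good)
  • Score close to 0.0 = fully overlapping (suspicious)
  • Both validators empty = score is 0/0 and therefore undefined; report this case separately (also suspicious)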
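The score above can be sketched in a few lines of Python. This is a minimal illustration, not the CIA's actual implementation: it assumes each validator's findings have already been normalized to a set of topic strings, and that total_combined_findings means the size of the union. The function name `diversity_score` and the topic labels in the usage note are hypothetical.

```python
from typing import Optional, Set


def diversity_score(security_findings: Set[str],
                    reviewer_findings: Set[str]) -> Optional[float]:
    """Share of combined findings that only one validator flagged.

    Assumption: findings are pre-normalized topic strings, so set
    membership is a meaningful test for "the same finding".
    """
    combined = security_findings | reviewer_findings
    if not combined:
        # Both validators empty: 0/0 is undefined, so signal it explicitly
        # rather than reporting a misleading 0.0.
        return None
    # Symmetric difference = unique_to_security_auditor + unique_to_reviewer.
    unique_to_one = security_findings ^ reviewer_findings
    return len(unique_to_one) / len(combined)
```

For example, `diversity_score({"pipe-buf-concurrency"}, {"structure-a", "structure-b"})` yields 1.0 (no shared findings), while two identical non-empty sets yield 0.0.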

CIA should flag these conditions:

  • [VALIDATOR-OVERLAP]: security-auditor and reviewer flag nearly the same issues (>80% overlap), which suggests one is rubber-stamping the other
  • [VALIDATOR-BLIND-SPOT]: both validators report no findings, or both pass, on a change that later causes a bug (this can only be detected retroactively)

Suggested Fix

  1. Extend the CIA's per-pipeline analysis to extract finding topics from reviewer and security-auditor outputs
  2. Compute a Jaccard similarity between their finding sets
  3. Report the similarity in the CIA report
  4. File a [VALIDATOR-OVERLAP] issue when Jaccard > 0.8 across 3+ consecutive runs
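Steps 2 and 4 could look roughly like the sketch below. The 0.8 threshold and the three-run window come from this issue; everything else is an assumption made for illustration: the function names, the representation of findings as sets of topic strings, and the choice to count only the most recent unbroken streak. Treating a run where both validators are empty as breaking the streak is also an assumption.

```python
from typing import List, Optional, Set

OVERLAP_THRESHOLD = 0.8   # "Jaccard > 0.8" from this issue
MIN_CONSECUTIVE_RUNS = 3  # "3+ consecutive runs" from this issue


def jaccard(a: Set[str], b: Set[str]) -> Optional[float]:
    """Jaccard similarity |a & b| / |a | b|, or None if both sets are empty."""
    union = a | b
    if not union:
        return None
    return len(a & b) / len(union)


def should_file_overlap_issue(similarities: List[Optional[float]]) -> bool:
    """True if the most recent runs form a long enough high-overlap streak.

    `similarities` is ordered oldest to newest, one entry per pipeline run.
    """
    streak = 0
    for value in reversed(similarities):
        # Assumption: an undefined run (None) or a run at or below the
        # threshold ends the streak.
        if value is None or value <= OVERLAP_THRESHOLD:
            break
        streak += 1
    return streak >= MIN_CONSECUTIVE_RUNS
```

Because the check is advisory, the caller would only record the result or open an info-severity issue; nothing here blocks a pipeline.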

This is advisory (info severity) — do not block pipelines, just report the trend.

Plugin Version: 3.50.0 (e858ea9)


Filed automatically by continuous-improvement-analyst — session c29a5b5d, #972 WAVE 1 capstone

Labels: auto-improvement (Continuous improvement analyst findings)
