-
Notifications
You must be signed in to change notification settings - Fork 1
fix: add finding_type to schema — notes don't penalize score #91
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
a8a538e
9597ef0
e9b9482
df24680
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,6 +22,7 @@ | |
| from grippy.schema import ( | ||
| Finding, | ||
| FindingCategory, | ||
| FindingType, | ||
| GrippyReview, | ||
| Score, | ||
| ScoreBreakdown, | ||
|
|
@@ -207,11 +208,18 @@ def _is_nogrip_suppressed(finding: Finding, nogrip_index: _NoGripIndex) -> bool: | |
|
|
||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM: Backwards compatibility and field usage for FindingType.NOTE relies on careful field handlingConfidence: 95% The logic in both _recompute_score and _derive_verdict relies on the new FindingType.NOTE being correctly set for all positive observations. Existing code that builds Finding objects (including via direct dicts or from legacy serialized JSON) must ensure this field is present and set accurately. Omitted or legacy data may still be treated as 'issue', inadvertently penalizing code unless handled at all ingest points. Suggestion: Perform a review of all code paths that build Finding objects (especially old tests, integrations, or deserialization logic) and ensure that the handling of missing 'finding_type' values always defaults to 'issue' (as your intent), and that positive observations through prior entry points actually get set to 'note'. Consider adding a migration utility or warning path for older data. — If a pipeline forgets to set the finding type, good news gets counted as bad news. Might want to yell loudly if an unexpected finding_type appears. |
||
|
|
||
| def _recompute_score(findings: list[Finding]) -> Score: | ||
| """Recompute score from the scoring set using the deduction rubric.""" | ||
| """Recompute score from the scoring set using the deduction rubric. | ||
|
|
||
| Only ``issue`` findings deduct points. ``note`` findings (positive | ||
| observations) are included in the scoring set for display but do | ||
| not affect the score or severity counts. | ||
| """ | ||
| category_deductions: dict[FindingCategory, int] = {} | ||
| severity_counts = dict.fromkeys(Severity, 0) | ||
|
|
||
| for f in findings: | ||
| if f.finding_type == FindingType.NOTE and not f.rule_id: | ||
| continue | ||
| category_deductions[f.category] = category_deductions.get(f.category, 0) + _DEDUCTION.get( | ||
| f.severity, 0 | ||
| ) | ||
|
|
@@ -257,11 +265,18 @@ def _recompute_score(findings: list[Finding]) -> Score: | |
|
|
||
|
|
||
| def _derive_verdict(score: int, findings: list[Finding], mode: str) -> Verdict: | ||
| """Derive verdict from recomputed score and mode (call-site authority).""" | ||
| """Derive verdict from recomputed score and mode (call-site authority). | ||
|
|
||
| Only ``issue`` findings contribute to severity gates. ``note`` findings | ||
| (positive observations) do not trigger FAIL regardless of their severity label. | ||
| Rule-backed findings (``rule_id`` set) are always treated as issues even if | ||
| the LLM labels them as notes — deterministic rules cannot be downgraded. | ||
| """ | ||
| threshold = _THRESHOLD.get(mode, 70) | ||
|
|
||
| has_critical = any(f.severity == Severity.CRITICAL for f in findings) | ||
| high_count = sum(1 for f in findings if f.severity == Severity.HIGH) | ||
| issues = [f for f in findings if f.finding_type != FindingType.NOTE or f.rule_id] | ||
| has_critical = any(f.severity == Severity.CRITICAL for f in issues) | ||
| high_count = sum(1 for f in issues if f.severity == Severity.HIGH) | ||
|
|
||
| if has_critical: | ||
| status, blocking = VerdictStatus.FAIL, True | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -36,6 +36,7 @@ You MUST produce a single JSON object conforming to this schema. No markdown wra | |
| "findings": [ | ||
| { | ||
| "id": "F-001", | ||
| "finding_type": "issue | note", | ||
| "severity": "CRITICAL | HIGH | MEDIUM | LOW", | ||
| "confidence": 0, | ||
| "category": "security | logic | governance | reliability | observability", | ||
|
|
@@ -109,8 +110,12 @@ You MUST produce a single JSON object conforming to this schema. No markdown wra | |
| ## Schema Rules | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔵 LOW: Documentation for finding_type is clear, but old clients may misinterpret new schemaConfidence: 92% The output-schema.md guidance is updated to distinguish 'issue' and 'note', but if an older orchestrator or LLM agent is running on a cached version without the new field, it might ignore or mishandle notes. This could cause minor confusion or inconsistent scoring downstream until all agents are updated. Suggestion: Confirm that all production and dev instances of the orchestrator, pipeline, and doc consumption logic have the new schema deployed. Consider a fallback message in review output if parsing fails to make debugging easier. — Doc is explicit, but downstream tools need to catch up. Not urgent, just an integration pebble. |
||
|
|
||
| 1. **findings** array may be empty (all-clear state) | ||
| 2. **escalations** array may be empty | ||
| 3. **confidence** is per-finding, 0-100. The filter pipeline uses this. | ||
| 2. **finding_type** distinguishes actionable problems from positive observations: | ||
| - `"issue"` — a problem that needs fixing. Deducts from score. Use this for bugs, vulnerabilities, missing error handling, etc. | ||
| - `"note"` — a positive observation or acknowledgment of good practice. Does NOT deduct from score. Use this when you want to highlight something the author did well, or acknowledge a security improvement without penalizing the PR. | ||
| - When in doubt, use `"issue"`. Do NOT use `"note"` to soften a real problem. | ||
| 3. **escalations** array may be empty | ||
| 4. **confidence** is per-finding, 0-100. The filter pipeline uses this. | ||
| 4. **grippy_note** is the personality-injected comment. Keep it under 280 characters. | ||
| 5. **evidence** should quote or reference specific code — never fabricate evidence. | ||
| 6. **line_start** and **line_end** must reference lines in the DIFF, not the full file, for inline comment placement. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,13 @@ class ComplexityTier(StrEnum): | |
| CRITICAL = "CRITICAL" | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🔵 LOW: No legacy migration for existing serialized FindingsConfidence: 90% The new FindingType field's introduction in the schema is backward compatible in code, but existing serialized Finding records (e.g., in databases or as artifacts) may lack this field, defaulting to 'issue'. This means historical positive notes may now become score deductions if ever replayed by a test or integration. Suggestion: If you have long-lived persisted findings, consider a simple migration script to inject 'finding_type: issue' in legacy records so that the transition is explicit and future refactors are robust. — Future-you will thank you for migrating old data. Or future-you will yell at schema validation. |
||
|
|
||
|
|
||
| class FindingType(StrEnum): | ||
| """Whether a finding reports a problem or a positive observation.""" | ||
|
|
||
| ISSUE = "issue" # actionable problem — deducts from score | ||
| NOTE = "note" # positive observation / praise — does NOT deduct | ||
|
|
||
|
|
||
| class FindingCategory(StrEnum): | ||
| SECURITY = "security" | ||
| LOGIC = "logic" | ||
|
|
@@ -95,6 +102,7 @@ class Finding(BaseModel): | |
| model_config = {"frozen": True} | ||
|
|
||
| id: str = Field(description="F-001 through F-999") | ||
| finding_type: FindingType | ||
| severity: Severity | ||
| confidence: int = Field(ge=0, le=100) | ||
| category: FindingCategory | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔵 LOW: Technical debt entries for 'finding_type' schema changes are clearly documented
Confidence: 95%
The added DEBT.md entries precisely record risks and known gaps related to backward compatibility of the new 'finding_type' field. Impact, fix, and rationale are transparent, improving maintainability and onboarding for future contributors.
Suggestion: Continue updating DEBT.md when introducing subtle schema or contract changes impacting persistence, validation, or agent orchestration. No changes necessary here.
— Not often you see technical debt filed this clearly. Doesn't fix the debt, but at least nobody will be surprised by it.