fix: add finding_type to schema — notes don't penalize score#91
Conversation
Add FindingType enum (issue/note) to Finding model. Notes (positive observations) do not deduct from score or trigger the high_count>=2 auto-FAIL gate. Defaults to "issue" for backward compatibility. Bug: grippy scored a security-improvement PR 60/100 FAIL because all 4 findings were praise labeled HIGH severity. The scoring formula and verdict derivation had no way to distinguish "found a bug" from "found good code" — both deducted points equally. Fix: _recompute_score() and _derive_verdict() now skip note findings. LLM-facing output-schema.md updated with finding_type field guidance. Coding-Agent: claude-code Model: claude-opus-4-6
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: a8a538e883
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| if f.finding_type == FindingType.NOTE: | ||
| continue |
There was a problem hiding this comment.
Treat rule-backed findings as issues before skipping notes
The unconditional if f.finding_type == FindingType.NOTE: continue allows findings with real rule_id evidence to be fully ignored for score and fail gates. run_review() validates rule coverage only by count/rule_id before filter_review (src/grippy/retry.py, _validate_rule_coverage), so an LLM can satisfy mandatory rule coverage by relabeling those findings as note, and this path will still produce PASS/PROVISIONAL despite unresolved rule-detected issues. Please reject note for rule-backed findings (or coerce them to issue) so rule-engine detections cannot be neutralized by type relabeling.
Useful? React with 👍 / 👎.
❌ Grippy Review — PARSE ERRORReview failed. Check the Actions log for details. |
Add FindingType enum (issue/note) to Finding model. Notes (positive observations) do not deduct from score or trigger the high_count>=2 auto-FAIL gate. Field is required (no default) to produce clean OpenAI JSON schema without $ref + default keyword conflict. Bug: grippy scored a security-improvement PR 60/100 FAIL because all 4 findings were praise labeled HIGH severity. The scoring formula and verdict derivation had no way to distinguish "found a bug" from "found good code" — both deducted points equally. Fix: _recompute_score() and _derive_verdict() now skip note findings. LLM-facing output-schema.md updated with finding_type field guidance. All test helpers updated with explicit finding_type="issue". Coding-Agent: claude-code Model: claude-opus-4-6
| @@ -207,11 +208,18 @@ def _is_nogrip_suppressed(finding: Finding, nogrip_index: _NoGripIndex) -> bool: | |||
|
|
|||
There was a problem hiding this comment.
🟡 MEDIUM: Backwards compatibility and field usage for FindingType.NOTE relies on careful field handling
Confidence: 95%
Default logic: if f.finding_type == FindingType.NOTE: continue
The logic in both _recompute_score and _derive_verdict relies on the new FindingType.NOTE being correctly set for all positive observations. Existing code that builds Finding objects (including via direct dicts or from legacy serialized JSON) must ensure this field is present and set accurately. Omitted or legacy data may still be treated as 'issue', inadvertently penalizing code unless handled at all ingest points.
Suggestion: Perform a review of all code paths that build Finding objects (especially old tests, integrations, or deserialization logic) and ensure that the handling of missing 'finding_type' values always defaults to 'issue' (as your intent), and that positive observations through prior entry points actually get set to 'note'. Consider adding a migration utility or warning path for older data.
— If a pipeline forgets to set the finding type, good news gets counted as bad news. Might want to yell loudly if an unexpected finding_type appears.
| @@ -109,8 +110,12 @@ You MUST produce a single JSON object conforming to this schema. No markdown wra | |||
| ## Schema Rules | |||
There was a problem hiding this comment.
🔵 LOW: Documentation for finding_type is clear, but old clients may misinterpret new schema
Confidence: 92%
"finding_type": "issue | note" and new schema rules in doc
The output-schema.md guidance is updated to distinguish 'issue' and 'note', but if an older orchestrator or LLM agent is running on a cached version without the new field, it might ignore or mishandle notes. This could cause minor confusion or inconsistent scoring downstream until all agents are updated.
Suggestion: Confirm that all production and dev instances of the orchestrator, pipeline, and doc consumption logic have the new schema deployed. Consider a fallback message in review output if parsing fails to make debugging easier.
— Doc is explicit, but downstream tools need to catch up. Not urgent, just an integration pebble.
| @@ -25,6 +25,13 @@ class ComplexityTier(StrEnum): | |||
| CRITICAL = "CRITICAL" | |||
There was a problem hiding this comment.
🔵 LOW: No legacy migration for existing serialized Findings
Confidence: 90%
class FindingType(StrEnum): newly added
Finding object creation now requires 'finding_type'
The new FindingType field's introduction in the schema is backward compatible in code, but existing serialized Finding records (e.g., in databases or as artifacts) may lack this field, defaulting to 'issue'. This means historical positive notes may now become score deductions if ever replayed by a test or integration.
Suggestion: If you have long-lived persisted findings, consider a simple migration script to inject 'finding_type: issue' in legacy records so that the transition is explicit and future refactors are robust.
— Future-you will thank you for migrating old data. Or future-you will yell at schema validation.
✅ Grippy Review — PASSScore: 100/100 | Findings: 1 Delta: 1 new Commit: df24680 |
… finding) Coerce findings with rule_id to issue in _recompute_score() and _derive_verdict(), regardless of finding_type. Deterministic rule engine findings must always deduct from score — the LLM cannot neutralize them by labeling as note. Track DEBT-002 (no migration for serialized Findings) and DEBT-003 (cached agents may not emit finding_type) as low-severity debt. Coding-Agent: claude-code Model: claude-opus-4-6
Grippy already produces positive observations via finding_type: note. Formalizing this as appreciative inquiry in prompts and summary format would improve review UX — balanced feedback beats pure criticism. Coding-Agent: claude-code Model: claude-opus-4-6
| @@ -17,3 +17,43 @@ GitHub's REST API returns `406 Not Acceptable` when a PR diff exceeds their size | |||
| **Fix:** Handle 406 in `fetch_pr_diff()` by falling back to per-file diffs via `GET /repos/{owner}/{repo}/pulls/{pr}/files` and fetching individual file patches, or truncating gracefully with a warning. | |||
There was a problem hiding this comment.
🔵 LOW: Technical debt entries for 'finding_type' schema changes are clearly documented
Confidence: 95%
DEBT-002 through DEBT-004 detail migration, agent sync, and appreciative inquiry implications of the 'finding_type' update.
The added DEBT.md entries precisely record risks and known gaps related to backward compatibility of the new 'finding_type' field. Impact, fix, and rationale are transparent, improving maintainability and onboarding for future contributors.
Suggestion: Continue updating DEBT.md when introducing subtle schema or contract changes impacting persistence, validation, or agent orchestration. No changes necessary here.
— Not often you see technical debt filed this clearly. Doesn't fix the debt, but at least nobody will be surprised by it.
Summary
FindingTypeenum (issue/note) toFindingmodel with defaultissue(backward compatible)_recompute_score()skipsnotefindings — no point deductions for positive observations_derive_verdict()skipsnotefindings — no auto-FAIL trigger fromhigh_count >= 2when findings are praiseoutput-schema.mdupdated withfinding_typefield and usage guidanceBug
PR #90 scored 60/100 FAIL despite all 4 findings being positive observations ("tests look solid", "no changes needed", "properly implemented"). The scoring formula had no way to distinguish "found a bug" from "found good code" — both deducted points equally, and the
high_count >= 2gate forced FAIL regardless of score.Root cause
Three-layer gap:
Findinghad no sentiment/type field_recompute_score()deducted points for all findings regardless of intent_derive_verdict()auto-FAILed onhigh_count >= 2including praiseTest plan
TestFindingTypeNote:finding_typeisissue(backward compat)filter_reviewpipeline with notes → PASS