Skip to content

fix: add finding_type to schema — notes don't penalize score#91

Merged
Navi Bot (project-navi-bot) merged 4 commits into
mainfrom
fix/finding-type-schema
Apr 4, 2026
Merged

fix: add finding_type to schema — notes don't penalize score#91
Navi Bot (project-navi-bot) merged 4 commits into
mainfrom
fix/finding-type-schema

Conversation

@Fieldnote-Echo
Copy link
Copy Markdown
Member

Summary

  • Add FindingType enum (issue / note) to Finding model with default issue (backward compatible)
  • _recompute_score() skips note findings — no point deductions for positive observations
  • _derive_verdict() skips note findings — no auto-FAIL trigger from high_count >= 2 when findings are praise
  • LLM-facing output-schema.md updated with finding_type field and usage guidance

Bug

PR #90 scored 60/100 FAIL despite all 4 findings being positive observations ("tests look solid", "no changes needed", "properly implemented"). The scoring formula had no way to distinguish "found a bug" from "found good code" — both deducted points equally, and the high_count >= 2 gate forced FAIL regardless of score.

Root cause

Three-layer gap:

  1. Schema: Finding had no sentiment/type field
  2. Scoring: _recompute_score() deducted points for all findings regardless of intent
  3. Verdict: _derive_verdict() auto-FAILed on high_count >= 2 including praise

Test plan

  • 6 new tests in TestFindingTypeNote:
    • 4 HIGH notes → score stays 100
    • 2+ HIGH notes → no auto-FAIL
    • Mixed issues + notes → only issues deduct
    • Mixed verdict → only issues count toward high_count gate
    • Default finding_type is issue (backward compat)
    • Full filter_review pipeline with notes → PASS
  • 1402 existing tests still pass
  • ruff + mypy clean

Add FindingType enum (issue/note) to Finding model. Notes (positive
observations) do not deduct from score or trigger the high_count>=2
auto-FAIL gate. Defaults to "issue" for backward compatibility.

Bug: grippy scored a security-improvement PR 60/100 FAIL because all
4 findings were praise labeled HIGH severity. The scoring formula and
verdict derivation had no way to distinguish "found a bug" from
"found good code" — both deducted points equally.

Fix: _recompute_score() and _derive_verdict() now skip note findings.
LLM-facing output-schema.md updated with finding_type field guidance.

Coding-Agent: claude-code
Model: claude-opus-4-6
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a8a538e883

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

Comment thread src/grippy/output_policy.py Outdated
Comment on lines +221 to +222
if f.finding_type == FindingType.NOTE:
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Treat rule-backed findings as issues before skipping notes

The unconditional if f.finding_type == FindingType.NOTE: continue allows findings with real rule_id evidence to be fully ignored for score and fail gates. run_review() validates rule coverage only by count/rule_id before filter_review (src/grippy/retry.py, _validate_rule_coverage), so an LLM can satisfy mandatory rule coverage by relabeling those findings as note, and this path will still produce PASS/PROVISIONAL despite unresolved rule-detected issues. Please reject note for rule-backed findings (or coerce them to issue) so rule-engine detections cannot be neutralized by type relabeling.

Useful? React with 👍 / 👎.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

❌ Grippy Review — PARSE ERROR

Review failed. Check the Actions log for details.

Add FindingType enum (issue/note) to Finding model. Notes (positive
observations) do not deduct from score or trigger the high_count>=2
auto-FAIL gate. Field is required (no default) to produce clean
OpenAI JSON schema without $ref + default keyword conflict.

Bug: grippy scored a security-improvement PR 60/100 FAIL because all
4 findings were praise labeled HIGH severity. The scoring formula and
verdict derivation had no way to distinguish "found a bug" from
"found good code" — both deducted points equally.

Fix: _recompute_score() and _derive_verdict() now skip note findings.
LLM-facing output-schema.md updated with finding_type field guidance.
All test helpers updated with explicit finding_type="issue".

Coding-Agent: claude-code
Model: claude-opus-4-6
@@ -207,11 +208,18 @@ def _is_nogrip_suppressed(finding: Finding, nogrip_index: _NoGripIndex) -> bool:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 MEDIUM: Backwards compatibility and field usage for FindingType.NOTE relies on careful field handling

Confidence: 95%

Default logic: if f.finding_type == FindingType.NOTE: continue

The logic in both _recompute_score and _derive_verdict relies on the new FindingType.NOTE being correctly set for all positive observations. Existing code that builds Finding objects (including via direct dicts or from legacy serialized JSON) must ensure this field is present and set accurately. Omitted or legacy data may still be treated as 'issue', inadvertently penalizing code unless handled at all ingest points.

Suggestion: Perform a review of all code paths that build Finding objects (especially old tests, integrations, or deserialization logic) and ensure that the handling of missing 'finding_type' values always defaults to 'issue' (as your intent), and that positive observations through prior entry points actually get set to 'note'. Consider adding a migration utility or warning path for older data.

— If a pipeline forgets to set the finding type, good news gets counted as bad news. Might want to yell loudly if an unexpected finding_type appears.

@@ -109,8 +110,12 @@ You MUST produce a single JSON object conforming to this schema. No markdown wra
## Schema Rules
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 LOW: Documentation for finding_type is clear, but old clients may misinterpret new schema

Confidence: 92%

"finding_type": "issue | note" and new schema rules in doc

The output-schema.md guidance is updated to distinguish 'issue' and 'note', but if an older orchestrator or LLM agent is running on a cached version without the new field, it might ignore or mishandle notes. This could cause minor confusion or inconsistent scoring downstream until all agents are updated.

Suggestion: Confirm that all production and dev instances of the orchestrator, pipeline, and doc consumption logic have the new schema deployed. Consider a fallback message in review output if parsing fails to make debugging easier.

— Doc is explicit, but downstream tools need to catch up. Not urgent, just an integration pebble.

Comment thread src/grippy/schema.py
@@ -25,6 +25,13 @@ class ComplexityTier(StrEnum):
CRITICAL = "CRITICAL"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 LOW: No legacy migration for existing serialized Findings

Confidence: 90%

class FindingType(StrEnum): newly added
Finding object creation now requires 'finding_type'

The new FindingType field's introduction in the schema is backward compatible in code, but existing serialized Finding records (e.g., in databases or as artifacts) may lack this field, defaulting to 'issue'. This means historical positive notes may now become score deductions if ever replayed by a test or integration.

Suggestion: If you have long-lived persisted findings, consider a simple migration script to inject 'finding_type: issue' in legacy records so that the transition is explicit and future refactors are robust.

— Future-you will thank you for migrating old data. Or future-you will yell at schema validation.

github-actions[bot]
github-actions Bot previously approved these changes Apr 4, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grippy approves — PASS (89/100)

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Apr 4, 2026

✅ Grippy Review — PASS

Score: 100/100 | Findings: 1

Delta: 1 new


Commit: df24680

… finding)

Coerce findings with rule_id to issue in _recompute_score() and
_derive_verdict(), regardless of finding_type. Deterministic rule
engine findings must always deduct from score — the LLM cannot
neutralize them by labeling as note.

Track DEBT-002 (no migration for serialized Findings) and DEBT-003
(cached agents may not emit finding_type) as low-severity debt.

Coding-Agent: claude-code
Model: claude-opus-4-6
Grippy already produces positive observations via finding_type: note.
Formalizing this as appreciative inquiry in prompts and summary format
would improve review UX — balanced feedback beats pure criticism.

Coding-Agent: claude-code
Model: claude-opus-4-6
Comment thread DEBT.md
@@ -17,3 +17,43 @@ GitHub's REST API returns `406 Not Acceptable` when a PR diff exceeds their size
**Fix:** Handle 406 in `fetch_pr_diff()` by falling back to per-file diffs via `GET /repos/{owner}/{repo}/pulls/{pr}/files` and fetching individual file patches, or truncating gracefully with a warning.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔵 LOW: Technical debt entries for 'finding_type' schema changes are clearly documented

Confidence: 95%

DEBT-002 through DEBT-004 detail migration, agent sync, and appreciative inquiry implications of the 'finding_type' update.

The added DEBT.md entries precisely record risks and known gaps related to backward compatibility of the new 'finding_type' field. Impact, fix, and rationale are transparent, improving maintainability and onboarding for future contributors.

Suggestion: Continue updating DEBT.md when introducing subtle schema or contract changes impacting persistence, validation, or agent orchestration. No changes necessary here.

— Not often you see technical debt filed this clearly. Doesn't fix the debt, but at least nobody will be surprised by it.

Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Grippy approves — PASS (100/100)

@project-navi-bot Navi Bot (project-navi-bot) merged commit da3116b into main Apr 4, 2026
19 checks passed
@project-navi-bot Navi Bot (project-navi-bot) deleted the fix/finding-type-schema branch April 4, 2026 23:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants