Problem Statement
The machine-readable output of `ocr review` currently exposes finding text, location, and suggestion information, but does not appear to expose structured `category` and `severity` fields for each finding.
This makes CI action and script integrations harder to implement cleanly:
- Findings cannot be reliably sorted, grouped, or filtered by type or importance.
- CI scripts cannot render consistent category or severity labels without re-interpreting natural-language comment text.
- Build gating based on severity (e.g., fail on critical issues) requires fragile text parsing.
- This is especially relevant for GitHub Actions and GitLab CI integrations that parse `ocr` output and publish structured PR/MR comments.
Similar structured metadata is common in code review tools and makes automated PR/MR comments significantly easier to scan and act on.
Proposed Solution
Add optional `category` and `severity` fields to each finding in JSON and agent output.
Suggested severity values:

| Value | Meaning |
| --- | --- |
| `critical` | Security vulnerability, data loss risk, system crash |
| `high` | Significant bug, functional failure, performance regression |
| `medium` | Moderate issue, edge-case problem, maintainability concern |
| `low` | Style, readability, minor best-practice suggestion |
| `info` | Informational, no action required |
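
A consumer could make this scale comparable by mapping each value to a rank. The following is a minimal sketch that assumes exactly the five values listed above; the `Severity` enum and the `parse_severity` helper are illustrative names and not part of `ocr`.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical ranking of the proposed severity values (higher = more severe)."""
    INFO = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def parse_severity(value):
    """Return the matching Severity, or None for missing or unrecognized values."""
    if not isinstance(value, str):
        return None
    return Severity.__members__.get(value.strip().upper())


# Comparisons work because IntEnum members behave like integers.
is_more_severe = parse_severity("critical") > parse_severity("medium")
```

Returning `None` for unknown strings keeps the helper usable whether the final schema is a strict enum or an extensible string.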
Suggested category values:

| Value | Meaning |
| --- | --- |
| `bug` | Correctness issue, logic error |
| `security` | Security vulnerability, unsafe pattern |
| `performance` | Performance regression, resource concern |
| `maintainability` | Readability, complexity, refactoring opportunity |
| `test` | Missing or inadequate test coverage |
| `style` | Formatting, naming, convention |
| `documentation` | Missing or incorrect documentation |
| `other` | Does not fit the above categories |
Backward compatibility:
- New fields are optional at first; existing fields remain unchanged.
- Downstream consumers that do not use `category` or `severity` are unaffected.
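
To illustrate what "optional" could mean for a consumer, here is a small sketch of tolerant reading. It assumes findings arrive as plain dictionaries and that the allowed values are the ones in the tables above; `read_classification` is a hypothetical helper, and treating unrecognized values as absent is one possible policy rather than a decided one.

```python
# Value sets copied from the tables in this proposal; not a published ocr schema.
KNOWN_SEVERITIES = {"critical", "high", "medium", "low", "info"}
KNOWN_CATEGORIES = {
    "bug", "security", "performance", "maintainability",
    "test", "style", "documentation", "other",
}


def read_classification(finding):
    """Return (category, severity); None stands for absent or unrecognized values."""
    category = finding.get("category")
    severity = finding.get("severity")
    if not isinstance(category, str) or category not in KNOWN_CATEGORIES:
        category = None
    if not isinstance(severity, str) or severity not in KNOWN_SEVERITIES:
        severity = None
    return category, severity


# A finding produced before the new fields existed still reads cleanly.
legacy = {"path": "internal/auth/handler.go", "content": "Missing input validation."}
legacy_result = read_classification(legacy)
```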
JSON output example:

    {
      "path": "internal/auth/handler.go",
      "content": "Missing input validation on user-supplied email field.",
      "existing_code": "email := r.FormValue(\"email\")",
      "suggestion_code": "email := r.FormValue(\"email\")\nif !isValidEmail(email) { ... }",
      "start_line": 42,
      "end_line": 42,
      "category": "security",
      "severity": "high"
    }
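
A script could load a finding shaped like this example into a typed record. The sketch below is illustrative only: the `Finding` dataclass is a hypothetical name, and which fields are required versus optional is an assumption, since the proposal only states that `category` and `severity` are optional.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    # Assumed to be always present in existing output.
    path: str
    content: str
    start_line: int
    end_line: int
    existing_code: Optional[str] = None
    suggestion_code: Optional[str] = None
    # Proposed optional fields.
    category: Optional[str] = None
    severity: Optional[str] = None

    @classmethod
    def from_dict(cls, data):
        # Drop keys this sketch does not know so newer output still loads.
        known = {k: v for k, v in data.items() if k in cls.__dataclass_fields__}
        return cls(**known)


# The JSON example from this proposal, kept as a raw string so escapes reach json.loads intact.
RAW = r'''{
  "path": "internal/auth/handler.go",
  "content": "Missing input validation on user-supplied email field.",
  "existing_code": "email := r.FormValue(\"email\")",
  "suggestion_code": "email := r.FormValue(\"email\")\nif !isValidEmail(email) { ... }",
  "start_line": 42,
  "end_line": 42,
  "category": "security",
  "severity": "high"
}'''

finding = Finding.from_dict(json.loads(RAW))
```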
Comment rendering example:
A CI action or script could render these fields as badges in PR/MR comments:


**Issue:** Missing input validation on user-supplied email field.
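
A renderer along these lines could produce such a comment body. This sketch uses plain inline-code labels rather than badge images, and the exact layout is an assumption; `render_comment` is a hypothetical helper that takes a finding as a dictionary.

```python
def render_comment(finding):
    """Build a Markdown comment body; labels are omitted when a field is absent."""
    labels = []
    for key in ("severity", "category"):
        value = finding.get(key)
        if value:
            labels.append(f"`{key}: {value}`")

    lines = []
    if labels:
        lines.append(" ".join(labels))
        lines.append("")  # blank line between the labels and the issue text
    lines.append(f"**Issue:** {finding['content']}")
    return "\n".join(lines)


body = render_comment({
    "content": "Missing input validation on user-supplied email field.",
    "category": "security",
    "severity": "high",
})
```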

CI integrations could also:
- Sort findings by severity (critical first, info last)
- Group findings by category in a summary table
- Fail or warn the build when `critical` or `high` findings are present
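
The three operations above can be sketched in a few lines once the fields exist. This is an illustration under assumptions: findings are plain dictionaries, the severity order follows the proposed table, and the function names and the `"unclassified"` bucket are hypothetical.

```python
from collections import defaultdict

# Most severe first, following the proposed severity table.
SEVERITY_ORDER = ["critical", "high", "medium", "low", "info"]
RANK = {name: index for index, name in enumerate(SEVERITY_ORDER)}


def sort_by_severity(findings):
    """Critical first, info last; findings without a known severity go to the end."""
    return sorted(findings, key=lambda f: RANK.get(f.get("severity"), len(SEVERITY_ORDER)))


def group_by_category(findings):
    """Map each category to its findings; missing categories land in 'unclassified'."""
    groups = defaultdict(list)
    for finding in findings:
        groups[finding.get("category") or "unclassified"].append(finding)
    return dict(groups)


def should_fail(findings, gate=("critical", "high")):
    """True when any finding carries a severity listed in the gate."""
    return any(f.get("severity") in gate for f in findings)


findings = [
    {"content": "Inconsistent naming", "category": "style", "severity": "low"},
    {"content": "SQL built from user input", "category": "security", "severity": "critical"},
    {"content": "Legacy finding without fields"},
    {"content": "Off-by-one in pagination", "category": "bug", "severity": "high"},
]
ordered = sort_by_severity(findings)
```

In a CI job, `should_fail(findings)` could decide the process exit code, while the sorted and grouped views feed the summary comment.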
Alternatives Considered
- Keep output free-text only. This avoids schema changes but forces every CI integration to re-parse comment text to extract classification, leading to inconsistent rendering across tools.
- Add only `severity` without `category`. This is simpler but loses the ability to filter by issue type (e.g., surface only security findings).
- Add `confidence` alongside `category`/`severity`. This could be useful but adds scope; it can be a separate follow-up.
- Let each CI integration classify findings independently. This pushes complexity to every consumer and risks inconsistent classification across the ecosystem.
Affected Area
- Output / Formatting
- Review Agent / LLM interaction
- CI / Integration
- Documentation
Additional Context
The maintainers suggested opening a separate issue for structured category and severity fields, and PR #11 shows the current CI integration approach.
Design questions for discussion:
- Should values be strict enums, or should the schema allow extensible strings?
- Should `critical` and `info` be included from the start, or should OCR begin with `high | medium | low` and expand later?
- Should CI integrations be responsible for rendering, filtering, and failing builds based on these fields, or should the CLI also expose filtering flags (e.g., `--severity critical,high`)?
- Should a separate `confidence` field be considered in a future iteration?
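
To make the filtering-flag question concrete, here is a sketch of how a comma-separated `--severity` value might be parsed and applied. It is written as a standalone wrapper script, not as the `ocr` CLI itself; the flag does not exist today, and `severity_list`, `build_parser`, and `filter_findings` are hypothetical names.

```python
import argparse

SEVERITIES = ("critical", "high", "medium", "low", "info")


def severity_list(text):
    """Parse 'critical,high' into ['critical', 'high'], rejecting unknown values."""
    values = [part.strip().lower() for part in text.split(",") if part.strip()]
    unknown = [v for v in values if v not in SEVERITIES]
    if unknown:
        raise argparse.ArgumentTypeError(f"unknown severity: {', '.join(unknown)}")
    return values


def build_parser():
    parser = argparse.ArgumentParser(prog="filter-findings")
    parser.add_argument(
        "--severity",
        type=severity_list,
        default=None,
        help="comma-separated severities to keep, e.g. critical,high",
    )
    return parser


def filter_findings(findings, wanted):
    """Keep findings whose severity is in `wanted`; keep everything when no filter is set."""
    if wanted is None:
        return list(findings)
    return [f for f in findings if f.get("severity") in wanted]


args = build_parser().parse_args(["--severity", "critical,high"])
kept = filter_findings(
    [{"severity": "high"}, {"severity": "low"}, {"content": "no severity"}],
    args.severity,
)
```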
Acceptance criteria:
- JSON and agent output include `category` and `severity` per finding when available
- Documentation explains allowed values and their intended semantics
- Existing consumers that do not use these fields remain unaffected
- CI integrations can render labels or badges, filter findings, and optionally gate on severity