
Feat/prompt tuning v3 data driven #44

Merged
2025chris2 merged 3 commits into main from feat/prompt-tuning-v3-data-driven on May 30, 2026


2025chris2 (Owner) commented on May 30, 2026

Title
feat: data-driven prompt tuning v3 — 5 targeted fixes from 10-run test results + IssueType enum alignment

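Description
Ten runs against a real-world test case (PR: TheAlgorithms/Java#7427, QR Decomposition via Gram-Schmidt) exposed several problems in the model's output. This PR applies five targeted prompt fixes and aligns the IssueType enum definition with the updated prompts.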

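Problems found in testing:

  1. The numerical stability issue in Gram-Schmidt orthogonalization was missed in 9/10 runs
  2. The m < n rectangular matrix case was missed in 10/10 runs
  3. Summaries were inconsistent in style; the model's output varied widely
  4. The same kind of issue received inconsistent risk ratings (LOW and MEDIUM used interchangeably)
  5. Cross-file issues were always classified as OTHER, so the classification was ineffective
  6. Architecture suggestions contradicted each other and went beyond the scope of the change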
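Problems 1 and 2 concern the kind of code under review in TheAlgorithms/Java#7427. As an illustration only (this is not that PR's code), here is a minimal sketch of a thin QR factorization via modified Gram-Schmidt that contains the guards the tuned prompt asks the reviewer to look for: an explicit m < n check and a tolerance-based rank check instead of `norm == 0.0`. Modified Gram-Schmidt is used because it loses orthogonality more slowly than the classical variant in floating point. The class name `MgsQr` and the `REL_TOL` value are assumptions.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: thin QR factorization A = QR via modified Gram-Schmidt.
 * A is m x n (row-major double[m][n]); returns {Q (m x n), R (n x n)}.
 */
final class MgsQr {
    // Illustrative tolerance: a column whose remaining norm drops below REL_TOL times
    // its original norm is treated as (numerically) linearly dependent.
    static final double REL_TOL = 1e-12;

    static double[][][] decompose(double[][] a) {
        int m = a.length;
        if (m == 0) throw new IllegalArgumentException("empty matrix");
        int n = a[0].length;
        // Dimension guard: n > m columns in R^m cannot all be linearly independent,
        // so Gram-Schmidt would eventually divide by a (near-)zero norm.
        if (m < n) {
            throw new IllegalArgumentException("thin QR needs m >= n, got " + m + "x" + n);
        }
        double[][] q = new double[m][];
        for (int i = 0; i < m; i++) {
            if (a[i].length != n) throw new IllegalArgumentException("ragged matrix");
            q[i] = Arrays.copyOf(a[i], n); // columns are orthogonalized in place on this copy
        }
        double[][] r = new double[n][n];
        for (int k = 0; k < n; k++) {
            double original = columnNorm(a, k);
            double norm = columnNorm(q, k);
            // Tolerance-based comparison instead of norm == 0.0
            if (norm <= REL_TOL * Math.max(original, Double.MIN_NORMAL)) {
                throw new ArithmeticException("column " + k + " is numerically dependent");
            }
            r[k][k] = norm;
            for (int i = 0; i < m; i++) q[i][k] /= norm;
            // Modified GS: project the new direction out of the already-updated later
            // columns right away (classical GS would project the original columns instead).
            for (int j = k + 1; j < n; j++) {
                double dot = 0.0;
                for (int i = 0; i < m; i++) dot += q[i][k] * q[i][j];
                r[k][j] = dot;
                for (int i = 0; i < m; i++) q[i][j] -= dot * q[i][k];
            }
        }
        return new double[][][] {q, r};
    }

    private static double columnNorm(double[][] x, int col) {
        double s = 0.0;
        for (double[] row : x) s += row[col] * row[col];
        return Math.sqrt(s);
    }
}
```

A 2x3 input is rejected up front with `IllegalArgumentException`, and a matrix whose second column is a multiple of the first fails with `ArithmeticException` rather than silently producing a garbage Q.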

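Approach

  1. Refine the system role (ChunkPromptBuilder): change the generic code-reviewer role to an expert in numerical computing and code review, and make three checks mandatory: numerical stability, matrix dimension bounds, and floating-point comparison.
  2. Standardize the output format (ChunkPromptBuilder): fix the summary to the template 「【变更本质】,影响【影响范围】」 ("[nature of the change], affecting [scope of impact]"); add risk-rating criteria and mandatory escalation logic so that all ratings follow one standard.
  3. Add a check item (ChunkPromptBuilder): require matrix dimension boundary checks, and flag an m < n case with no up-front guard as MEDIUM risk; the existing cross-chunk dependency hint moves down to item 6.
  4. Expand the issue enum (GlobalPromptBuilder + GlobalReviewReport): grow the IssueType list in the prompt from 6 to 12 categories, adding NUMERICAL_ACCURACY, ALGORITHM_CHOICE, PERFORMANCE, SECURITY_VULNERABILITY, and others; expand the Java enum to match so that parsing keeps working.
  5. Narrow architecture suggestions (GlobalPromptBuilder): add a rule that limits architecture suggestions to the code changed in this PR and forbids out-of-scope suggestions such as introducing third-party libraries.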
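Fixes 2 and 3 are implemented as prompt text, not as code. To make the intended escalation rule concrete, here is a hypothetical sketch of the same rule expressed deterministically: the rating the model proposes is kept unless an unguarded m < n path is present, in which case it is raised to at least MEDIUM. `RiskLevel` (including `HIGH`, which the PR description does not mention), `RiskEscalation`, and the English rendering of the summary template are assumptions for illustration.

```java
/** Hypothetical risk levels, declared from least to most severe. */
enum RiskLevel { LOW, MEDIUM, HIGH }

/** Hypothetical sketch of the escalation rule and summary template described in fixes 2 and 3. */
final class RiskEscalation {
    /** Never report an unguarded m < n path below MEDIUM; otherwise keep the proposed rating. */
    static RiskLevel apply(RiskLevel proposed, boolean unguardedWideMatrix) {
        RiskLevel floor = unguardedWideMatrix ? RiskLevel.MEDIUM : RiskLevel.LOW;
        // Enum compareTo follows declaration order, so this keeps the more severe level.
        return proposed.compareTo(floor) >= 0 ? proposed : floor;
    }

    /** English rendering of the fixed template "[nature of the change], affecting [scope of impact]". */
    static String summary(String natureOfChange, String scopeOfImpact) {
        return natureOfChange + ", affecting " + scopeOfImpact;
    }
}
```

Expressing the floor as a minimum rather than a fixed value is what keeps a HIGH rating from being downgraded while still removing the LOW/MEDIUM inconsistency listed as problem 4.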

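Testing
./mvnw test: all 296 tests pass

  • ChunkPromptBuilderTest: verifies the renumbered items and that the matrix dimension check item is generated correctly
  • GlobalPromptBuilderTest: verifies the architecture suggestion constraint and that the new IssueType values appear in the prompt
  • GlobalReviewReportTest: verifies that the IssueType enum grew from 6 to 12 values and that every new type is covered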

Summary by CodeRabbit

  • New Features
    • Expanded code review analysis with additional issue detection capabilities including numeric accuracy, algorithm efficiency, interface consistency, duplicate logic detection, and performance analysis.
    • Enhanced validation checks for mathematical operations and edge cases to identify potential stability risks and boundary errors.


2025chris2 and others added 3 commits May 30, 2026 19:35


Analyzed a real-world PR (off-by-one arithmetic bug in distributed render
chunk plan) to validate and refine prompts:

- SYSTEM_ROLE: add '算术误差(off-by-one/舍入)' (arithmetic errors: off-by-one/rounding) to the correctness focus area
- OUTPUT_FORMAT: add 'Arithmetic' to risk types
- CHUNK_CROSS_CHUNK_HINT: add item 6 — test-implementation alignment check
- GlobalPromptBuilder: add '实现与测试的对应性' (correspondence between implementation and tests) to the L3 reasoning focus;
  add rule 5 to REASONING_RULES for test coverage risk detection
- Tests: update assertions for new content

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
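This commit targets off-by-one and rounding errors. As an illustration of that bug class (not the code from the render-chunk PR it analyzed, which is not shown here), a chunk plan that splits `totalFrames` frames into half-open ranges of at most `chunkSize` frames needs a ceiling division and a clamped last chunk. `ChunkPlan`, `Range`, and both parameter names are hypothetical; the record syntax requires Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split frames [0, totalFrames) into half-open chunks of at most chunkSize. */
final class ChunkPlan {
    record Range(int start, int endExclusive) {}

    static List<Range> plan(int totalFrames, int chunkSize) {
        if (totalFrames < 0 || chunkSize <= 0) throw new IllegalArgumentException("bad arguments");
        // Buggy variant: totalFrames / chunkSize truncates and silently drops the remainder.
        // Ceiling division (written without the overflow-prone "+ chunkSize - 1") keeps it.
        int chunks = totalFrames / chunkSize + (totalFrames % chunkSize == 0 ? 0 : 1);
        List<Range> out = new ArrayList<>(chunks);
        for (int c = 0; c < chunks; c++) {
            int start = c * chunkSize;
            // Clamp the last chunk; half-open ranges avoid the inclusive "end - 1" off-by-one.
            int end = start + Math.min(chunkSize, totalFrames - start);
            out.add(new Range(start, end));
        }
        return out;
    }
}
```

For 10 frames in chunks of 4, this yields [0, 4), [4, 8), [8, 10); the truncating variant would plan only two chunks and never render frames 8 and 9.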
…t results

Based on 10-run test against TheAlgorithms/Java#7427 (QR Decomposition):

Fixes applied:
1. SYSTEM_ROLE: add 3 mandatory checks for numerical computing review
   (numerical stability, dimension bounds, floating-point comparison)
2. OUTPUT_FORMAT: add risk rating rules (must-use table) and summary
   template ('本质,影响范围', i.e. "nature of the change, scope of impact")
3. Task list: add item 5 for matrix dimension boundary (m < n) check
4. Global OUTPUT_FORMAT: expand issueType enum with NUMERICAL_ACCURACY,
   ALGORITHM_CHOICE, PERFORMANCE, SECURITY_VULNERABILITY
5. REASONING_RULES: constrain architectureSuggestions to PR scope only

Tests updated for renumbered items (5->6/7) and new enum values.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
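Fix 1 in this commit names floating-point comparison as one of the three mandatory checks. The pattern a reviewer should flag is an exact `==` between computed doubles; a common replacement is a combined absolute/relative tolerance. Below is a minimal sketch; `FloatCompare` is a hypothetical helper, not part of this repository, and suitable tolerances depend on the computation being checked.

```java
/** Hypothetical helper: compare doubles with an absolute-plus-relative tolerance instead of ==. */
final class FloatCompare {
    static boolean approxEquals(double x, double y, double absTol, double relTol) {
        if (Double.isNaN(x) || Double.isNaN(y)) return false;
        if (x == y) return true; // exact match, including equal infinities
        // Without this check, inf - 1.0 = inf would satisfy inf <= relTol * inf.
        if (Double.isInfinite(x) || Double.isInfinite(y)) return false;
        double diff = Math.abs(x - y);
        // absTol handles values near zero; relTol scales with the operands' magnitude.
        return diff <= Math.max(absTol, relTol * Math.max(Math.abs(x), Math.abs(y)));
    }
}
```

The absolute term matters for checks such as an off-diagonal entry of QᵀQ, where the expected value is exactly 0 and a purely relative tolerance would reject any rounding residue.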
…ITHM_CHOICE, PERFORMANCE etc.

Java IssueType enum now matches the expanded issueType list in prompts.
New types: INTERFACE_INCONSISTENCY, REPEAT_LOGIC, SECURITY_VULNERABILITY,
NUMERICAL_ACCURACY, ALGORITHM_CHOICE, PERFORMANCE. Old types preserved
for backward compat.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
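The reason the enum must track the prompt: if the model emits an issueType label that the Java enum does not declare, the parser has to either fail or fall back to a catch-all. The PR does not show how GlobalReviewReport parses issueType; the sketch below assumes a lenient `valueOf`-with-fallback, and `IssueTypeSketch` is a hypothetical, partial stand-in that lists only the six types added here plus OTHER (the other pre-existing values are not named in the PR description).

```java
import java.util.Locale;

/** Hypothetical, partial sketch of IssueType: only the six new values plus OTHER are listed. */
enum IssueTypeSketch {
    INTERFACE_INCONSISTENCY,
    REPEAT_LOGIC,
    SECURITY_VULNERABILITY,
    NUMERICAL_ACCURACY,
    ALGORITHM_CHOICE,
    PERFORMANCE,
    OTHER;

    /** Lenient parse: unknown or missing labels from the model map to OTHER instead of throwing. */
    static IssueTypeSketch parse(String raw) {
        if (raw == null || raw.isBlank()) return OTHER;
        try {
            return valueOf(raw.trim().toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return OTHER; // valueOf throws for names the enum does not declare
        }
    }
}
```

With this kind of fallback, a prompt/enum mismatch does not crash parsing but silently collapses the label to OTHER, which is why adding a type to the prompt and to the enum in the same change matters.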
2025chris2 merged commit 339bcbd into main on May 30, 2026
1 check was pending

coderabbitai (bot) commented on May 30, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: 313e7d0a-e885-4074-bc38-eb57bcc46271

📥 Commits

Reviewing files that changed from the base of the PR and between a8fcb0d and 38fd88a.

📒 Files selected for processing (6)
  • backend/pr/src/main/java/com/prassistant/pr/aggregation/GlobalPromptBuilder.java
  • backend/pr/src/main/java/com/prassistant/pr/aggregation/model/GlobalReviewReport.java
  • backend/pr/src/main/java/com/prassistant/pr/review/analyzer/ChunkPromptBuilder.java
  • backend/pr/src/test/java/com/prassistant/pr/aggregation/GlobalPromptBuilderTest.java
  • backend/pr/src/test/java/com/prassistant/pr/aggregation/model/GlobalReviewReportTest.java
  • backend/pr/src/test/java/com/prassistant/pr/review/analyzer/ChunkPromptBuilderTest.java

📝 Walkthrough


This PR enhances the AI code review system by expanding issue type categories from 6 to 12, adding test-coverage correspondence verification to global analysis, and introducing strict numeric stability and matrix dimension boundary validation rules at the chunk-review level.

Changes

Review Capability Enhancement

Global Issue Type Model Expansion
  • Files: backend/pr/src/main/java/com/prassistant/pr/aggregation/model/GlobalReviewReport.java, backend/pr/src/test/java/com/prassistant/pr/aggregation/model/GlobalReviewReportTest.java
  • Summary: IssueType enum expanded to 12 values, adding INTERFACE_INCONSISTENCY, REPEAT_LOGIC, SECURITY_VULNERABILITY, NUMERICAL_ACCURACY, ALGORITHM_CHOICE, and PERFORMANCE (OTHER already existed). Test assertions updated to verify all 12 members.

Global Analysis Prompt Enhancement
  • Files: backend/pr/src/main/java/com/prassistant/pr/aggregation/GlobalPromptBuilder.java, backend/pr/src/test/java/com/prassistant/pr/aggregation/GlobalPromptBuilderTest.java
  • Summary: The system role adds a code-quality assessment requirement; the reasoning rules add implementation-vs-test correspondence checks that flag missing tests and restrict architecture suggestions to the code changed in the PR; the output format recognizes the expanded issue types. Tests verify the new prompt phrases and issue type identifiers.

Numeric Stability and Matrix Dimension Validation
  • Files: backend/pr/src/main/java/com/prassistant/pr/review/analyzer/ChunkPromptBuilder.java, backend/pr/src/test/java/com/prassistant/pr/review/analyzer/ChunkPromptBuilderTest.java
  • Summary: The system role mandates numeric stability checks (QR/least-squares, epsilon tolerance for floating-point comparisons) and matrix boundary validation (m < n cases). The output format adds hard risk-rating rules. Task item 5 requires explicit matrix dimension boundary checks with escalation to MEDIUM risk. Tests verify the new task items and risk thresholds.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A rabbit hops through review rules anew,
Six types became twelve, oh what shall we do?
Tests must align with the code they protect,
Matrix dimensions? Precision? Inspect!
Better reviews with each careful tweak.
