pref(prompt-optimizer): handle escaped quotes in JSON parsing#903

Merged
keeees merged 1 commit into release/v0.3.0 from pref/prompt_optim
Apr 15, 2026

@myhMARS
Collaborator

@myhMARS myhMARS commented Apr 15, 2026

Summary by Sourcery

Bug Fixes:

  • Fix JSON parsing of streamed content when trailing backslashes cause incorrect buffer truncation.
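
As a rough illustration of the hazard this fix targets (not the service's actual code), the sketch below shows how cutting a streamed JSON fragment right after a backslash splits an escape sequence such as `\"`. The helper name `ends_mid_escape` is hypothetical and introduced only for this example.

```python
import json


def ends_mid_escape(fragment: str) -> bool:
    """Return True when the fragment ends with an odd number of backslashes.

    An odd count means the final backslash opens an escape sequence whose
    second character has not arrived yet.
    """
    trailing = len(fragment) - len(fragment.rstrip("\\"))
    return trailing % 2 == 1


# Serialized JSON in which the inner quotes are escaped as \".
full = json.dumps({"prompt": 'say "hi"'})
first_escape = full.index("\\")

split_escape = full[: first_escape + 1]  # cut lands right after the backslash
whole_escape = full[: first_escape + 2]  # cut keeps the escaped quote as well

print(ends_mid_escape(split_escape))  # True
print(ends_mid_escape(whole_escape))  # False
```

A parser that scans `split_escape` for the closing quote of the string value has no way to tell whether the next character is an escaped quote or the real terminator, which is why the cut position matters.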

@myhMARS myhMARS requested review from TimeBomb2018 and keeees April 15, 2026 05:20
@myhMARS myhMARS self-assigned this Apr 15, 2026
@myhMARS myhMARS added the enhancement New feature or request label Apr 15, 2026
@sourcery-ai
Contributor

sourcery-ai bot commented Apr 15, 2026

Reviewer's Guide

Adjusts the prompt optimizer’s JSON parsing to correctly handle trailing escaped backslashes before extracting cached content for prompt detection.

File-Level Changes

Change: Fix handling of escaped backslashes at the end of the buffer before computing the JSON cache segment used for prompt parsing.
Details:
  • Introduce a backtracking loop that trims trailing backslashes from the buffer slice used as the cache
  • Track the trimming offset with a decrementing index to progressively shorten the slice until a non-backslash character is found or the cache is empty
Files: api/app/services/prompt_optimizer_service.py

Contributor

@sourcery-ai sourcery-ai bot left a comment

Hey - I've found 1 issue, and left some high level feedback:

  • The backslash-handling loop can behave unexpectedly when many trailing backslashes are present (e.g., last_idx can reach 0 and go negative, changing the meaning of buffer[:-last_idx] and potentially causing an infinite loop); consider rewriting this logic to compute a safe cutoff index once (or use something like rstrip('\\') on the sliced buffer) instead of repeatedly reslicing with a decreasing negative index.
  • The hardcoded constants 20 and initial last_idx = 19 make the truncation behavior difficult to reason about; extracting these into clearly named variables and/or adding a brief comment explaining the protocol-specific suffix length and the intent of the backslash adjustment would improve maintainability.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The backslash-handling loop can behave unexpectedly when many trailing backslashes are present (e.g., `last_idx` can reach 0 and go negative, changing the meaning of `buffer[:-last_idx]` and potentially causing an infinite loop); consider rewriting this logic to compute a safe cutoff index once (or use something like `rstrip('\\')` on the sliced buffer) instead of repeatedly reslicing with a decreasing negative index.
- The hardcoded constants `20` and initial `last_idx = 19` make the truncation behavior difficult to reason about; extracting these into clearly named variables and/or adding a brief comment explaining the protocol-specific suffix length and the intent of the backslash adjustment would improve maintainability.

## Individual Comments

### Comment 1
<location path="api/app/services/prompt_optimizer_service.py" line_range="238-242" />
<code_context>
                 logger.error(f"Unsupported content type - {content}")
                 raise Exception("Unsupported content type")
             cache = buffer[:-20]
+            last_idx = 19
+            while cache and cache[-1] == '\\':
+                cache = buffer[:-last_idx]
+                last_idx -= 1

             # 尝试找到 "prompt": " 开始位置
</code_context>
<issue_to_address>
**issue (bug_risk):** Trailing backslash trimming loop likely has flawed slicing logic and can behave unexpectedly.

The loop intends to strip trailing backslashes from `cache`, but instead repeatedly re-slices from `buffer` with a *decreasing* `last_idx`, which expands the slice rather than trimming it. This can yield unexpected values and, when `last_idx` reaches 0, `buffer[:-0]` becomes `''`, which is likely unintended. Consider operating directly on `cache` (e.g. `cache = cache.rstrip('\\')` or `cache = cache[:-1]` in a loop) instead of re-deriving it from `buffer`.
</issue_to_address>
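
To make the slicing behavior discussed in this comment concrete, here is a minimal sketch that reproduces the loop from the diff next to the `rstrip` variant the review suggests. The function names and the `HOLD_BACK` constant are assumptions introduced for illustration; only the loop body is taken from the diff, and which behavior the service actually needs is not settled here.

```python
HOLD_BACK = 20  # assumed name for the hardcoded suffix length in the diff


def cache_from_diff(buffer: str) -> str:
    """Reproduce the loop shown in the diff so its behavior can be inspected."""
    cache = buffer[:-HOLD_BACK]
    last_idx = HOLD_BACK - 1
    while cache and cache[-1] == "\\":
        # A smaller last_idx yields a LONGER slice of buffer.
        cache = buffer[:-last_idx]
        last_idx -= 1
    return cache


def cache_rstrip(buffer: str) -> str:
    """Review suggestion: slice once, then drop trailing backslashes."""
    return buffer[:-HOLD_BACK].rstrip("\\")


one_escape = "ab\\" + "x" * 20   # the 20-character cut lands right after a backslash
all_slashes = "\\" * 25          # nothing but backslashes

print(repr(cache_from_diff(one_escape)))   # 'ab\\x'
print(repr(cache_rstrip(one_escape)))      # 'ab'
print(repr(cache_from_diff(all_slashes)))  # ''
print(repr(cache_rstrip(all_slashes)))     # ''
```

In the first case the diff's loop extends the cache by one character past the backslash, while `rstrip` shortens it. In the second case the loop keeps extending until `last_idx` is 0, at which point `buffer[:-0]` evaluates to the empty string and the `while cache` guard stops the loop.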

Comment thread api/app/services/prompt_optimizer_service.py
@myhMARS myhMARS force-pushed the pref/prompt_optim branch from 9746ffc to ed765b7 Compare April 15, 2026 06:00
@keeees keeees merged commit bfb723a into release/v0.3.0 Apr 15, 2026
1 check passed