fix(rag) #897

Merged
TimeBomb2018 merged 2 commits into release/v0.3.0 from fix/Timebomb_030 on Apr 14, 2026

TimeBomb2018 (Collaborator) commented on Apr 14, 2026

replace semicolon separators with newlines in Excel parser output

Summary by Sourcery

Bug Fixes:

  • Correct Excel parser cell concatenation to separate fields and sheet annotations with newlines for improved downstream processing.

sourcery-ai bot (Contributor) commented on Apr 14, 2026

Reviewer's Guide

Excel RAG parser now outputs multi-line text instead of semicolon-separated fields, improving readability and keeping sheet names on their own line when present.

Flow diagram for Excel parser row-to-text formatting change

flowchart TD
  A[Read Excel sheet row] --> B[Collect values into fields list]
  B --> C{Has_header_only_or_data_rows}
  C --> D[Data rows: build fields per row]
  C --> E[Header only: use header_fields]
  D --> F[Join fields with newline separator]
  E --> F
  F --> G{Sheetname_is_generic_sheet}
  G -- Yes --> H[Append nothing]
  G -- No --> I[Append newline then em_dash then sheetname]
  H --> J[Append formatted line to result list]
  I --> J
  J --> K[Return multiline text for RAG consumption]

File-Level Changes

Change: Switch the Excel parser's row/string formatting from semicolon-separated to newline-separated output.
Details:
  • Replace "; ".join(...) with "\n".join(...) when joining cell fields for normal rows.
  • For non-default sheet names, append the sheet name on its own line by prefixing it with a newline character.
  • In the header-only case, also join header fields with newlines instead of semicolons, but leave the sheet suffix formatting unchanged.
File: api/app/core/rag/deepdoc/parser/excel_parser.py
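
For reviewers who want to see the effect on a single row, here is a minimal sketch of the before/after formatting described above. It is not the code from excel_parser.py: the function names `format_row_old` and `format_row_new`, the `is_generic_sheet` helper, and the rule that a sheet name counts as generic when it contains "sheet" are all assumptions made for illustration. Only the `"; "` to `"\n"` separator change and the `——` sheet-name suffix come from this PR's description.

```python
def is_generic_sheet(sheetname: str) -> bool:
    # Assumption: names like "Sheet1" are treated as generic and get no suffix.
    return "sheet" in sheetname.lower()


def format_row_old(fields: list[str], sheetname: str) -> str:
    # Previous behaviour: fields joined with "; ", sheet name appended after a space.
    line = "; ".join(fields)
    if not is_generic_sheet(sheetname):
        line += " ——" + sheetname
    return line


def format_row_new(fields: list[str], sheetname: str) -> str:
    # Behaviour after this PR: one field per line, sheet name on its own line.
    line = "\n".join(fields)
    if not is_generic_sheet(sheetname):
        line += "\n——" + sheetname
    return line


if __name__ == "__main__":
    row = ["Name: Alice", "Age: 30"]
    print(format_row_old(row, "Employees"))
    print(format_row_new(row, "Employees"))
```

With a generic sheet name such as "Sheet1", both helpers return only the joined fields, which matches the "Append nothing" branch of the flow diagram.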

sourcery-ai bot (Contributor) left a comment

Hey - I've left some high-level feedback:

  • In the header-only branch you still append the sheet name with a leading space (" ——") rather than a newline ("\n——"), which is inconsistent with the main branch and likely not what you intended after switching to newline-separated output.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In the header-only branch you still append the sheet name with a leading space (`" ——"`) rather than a newline (`"\n——"`), which is inconsistent with the main branch and likely not what you intended after switching to newline-separated output.
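
The inconsistency flagged in this comment can be reproduced with a small sketch. The function `format_header_only` and its `suffix_sep` parameter are hypothetical, as is the "contains sheet" test for generic sheet names. The two separators being compared, `" ——"` and `"\n——"`, are taken from the review comment itself.

```python
def format_header_only(header_fields: list[str], sheetname: str, suffix_sep: str = " ") -> str:
    # Header fields are already newline-joined after this PR.
    line = "\n".join(header_fields)
    # Assumption: a name containing "sheet" (e.g. "Sheet1") is generic and gets no suffix.
    if "sheet" not in sheetname.lower():
        line += suffix_sep + "——" + sheetname
    return line


if __name__ == "__main__":
    headers = ["Name", "Age"]
    # As merged, per the review comment: the suffix shares a line with the last header.
    print(repr(format_header_only(headers, "Employees")))
    # What the reviewer suggests: the suffix moves to its own line.
    print(repr(format_header_only(headers, "Employees", suffix_sep="\n")))
```

With the space separator, the sheet name stays attached to the last header field, so a consumer that splits the output on newlines would see "Age ——Employees" as one field.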


TimeBomb2018 merged commit dde7ea9 into release/v0.3.0 on Apr 14, 2026
1 check passed