feat: Add memory diagnostics tool #1665

Closed
Ronifue wants to merge 67 commits into Mai-with-u:dev from Ronifue:dev

Ronifue (Contributor) commented May 9, 2026

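  • ✅ Accepted: bug fixes directly related to main; submit them to the dev branch
  • PRs that add new features must be discussed in an issue beforehand, otherwise they will not be merged
  • 🌐 i18n reminder: except for bootstrap or urgent fixes, please do not use GitHub as the regular editing surface for non-zh-CN target translations; regular translations flow back through Crowdin -> l10n_* PRs. See docs/i18n.md for details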

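Please fill in the following

(Remove the space inside the square brackets and replace it with a lowercase x)

    • Modifying the main branch is forbidden; please confirm that the branch for this submission is not main
    • I confirm that I have read the contribution guidelines
    • Type of this update: bug fix
    • Type of this update: new feature
    • Whether this update has been tested
    • If this change touches src/A_memorix, I confirm that I have read src/A_memorix/MODIFICATION_POLICY.md; otherwise this box does not need to be checked
  1. Describe any breaking changes (if applicable):
  2. Briefly describe the content and purpose of this update: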

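Other information

  • Related Issue: Close #

  • Screenshots/GIF

  • Additional information:

    Adds a long-running memory diagnostics tool for investigating memory growth after extended uptime.
    The diagnostics task is enabled through a [debug] config switch. It periodically collects relevant state and writes JSONL snapshot logs, which makes it easier to compare trends and locate the source of anomalies. A beginner-level usage guide, written with AI, is also included.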
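
As a rough illustration of the idea described above (sample some state, then append one JSON object per line), here is a minimal sketch. It is not the PR's implementation: the function names `sample_memory` and `append_snapshot` and all field names are assumptions made for this example. `psutil` is used only if it happens to be installed, and the tracemalloc figure is filled in only if tracing is already active.

```python
import gc
import json
import os
import time
import tracemalloc
from pathlib import Path

try:  # psutil is optional in this sketch
    import psutil
except ImportError:
    psutil = None


def sample_memory() -> dict:
    """Collect a small JSON-serializable snapshot (hypothetical field names)."""
    snapshot = {
        "ts": time.time(),
        "pid": os.getpid(),
        "gc_counts": list(gc.get_count()),
        "rss_bytes": None,
        "tracemalloc_current_bytes": None,
    }
    if psutil is not None:
        # Resident set size of the current process, in bytes.
        snapshot["rss_bytes"] = psutil.Process().memory_info().rss
    if tracemalloc.is_tracing():
        current, _peak = tracemalloc.get_traced_memory()
        snapshot["tracemalloc_current_bytes"] = current
    return snapshot


def append_snapshot(path: Path, snapshot: dict) -> None:
    """Append one snapshot as a single JSONL line, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, ensure_ascii=False) + "\n")
```

Because every line is an independent JSON object, two snapshots taken hours apart can be compared field by field without parsing the whole file.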

Summary by CodeRabbit

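  • New features

    • Adds an optional long-running memory diagnostics task: it periodically collects process/child-process memory, Python runtime and task state, session and message caches, binary media, and metrics for various internal queues; it records summaries and, based on thresholds, emits alerts and rotated JSONL diagnostic logs.
  • Documentation

    • Adds a memory diagnostics usage guide aimed at non-developers, covering when to enable it, key configuration options, output description, troubleshooting workflow, and caveats.
  • Dependency updates

    • Adds psutil to improve collection of system-level memory metrics.
  • Tests

    • Extends test coverage for sampling, estimation, recording, file rotation, and alert isolation behavior.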

coderabbitai Bot commented May 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

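This PR adds a periodic memory diagnostics subsystem: configuration and dependencies, the MemoryDiagnosticsTask implementation, multi-dimensional collectors, binary estimation and sampling, JSONL snapshot persistence and rotation, startup integration, unit tests, and an operations guide.

Changes

Complete implementation of the memory diagnostics service

| Layer / File(s) | Summary |
|---|---|
| Configuration and dependencies: src/config/config.py, src/config/official_configs.py, pyproject.toml, requirements.txt | Bumps CONFIG_VERSION; makes a minor adjustment to a_memorix access in write_config_to_file; adds several memory diagnostics fields to DebugConfig; adds the runtime dependency psutil>=6.0.0. |
| User documentation: docs/memory_diagnostics_guide.md | Adds an operations-level memory diagnostics guide covering when to enable it, recommended configuration, output location, a quick triage flow, alert configuration, fault reporting, and a field quick-reference table. |
| Startup integration: src/main.py | When initializing components, conditionally registers MemoryDiagnosticsTask with the async task manager based on the debug configuration. |
| Core service implementation: src/services/memory_diagnostics_service.py | Adds MemoryDiagnosticsTask and multiple collectors: process/child process, Python GC/tracemalloc, asyncio task distribution, Heartflow sessions and binary estimation, and Chat/WebSocket/media task/memory_automation/A_Memorix metrics. Collection builds a JSON object and writes it to JSONL, with support for threshold alerts and tracemalloc diffs. |
| Estimation and sampling utilities: src/services/memory_diagnostics_service.py | Implements binary estimation for message components, spread sampling, cyclic history sampling with extrapolation, bounded depth/cycle detection, pending-task stack traversal and task binary estimation, and scan budget planning/skip markers. |
| Persistence and rotation: src/services/memory_diagnostics_service.py | Resolves the output path, ensures the directory exists, rotates history files by total size, appends single-line JSONL records, cleans up over-limit/expired files, and prunes historical snapshots. |
| Unit tests: tests/test_memory_diagnostics_service.py | Adds tests covering config defaults, estimation and sampling, scan-plan fairness, simulated heartflow collection, snapshot building and JSONL writing, error isolation, rotation pruning, and process cmdline assertions. |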
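
The walkthrough mentions spread sampling with extrapolation but does not show it. One plausible reading, sketched below, is to measure only a handful of evenly spaced items and scale the result up to the full collection. The names `spread_sample` and `estimate_total_bytes` are hypothetical, and the PR's actual sampling strategy may differ.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def spread_sample(items: Sequence[T], max_samples: int) -> list[T]:
    """Pick up to max_samples items at evenly spaced positions."""
    n = len(items)
    if max_samples <= 0 or n == 0:
        return []
    if n <= max_samples:
        return list(items)
    if max_samples == 1:
        return [items[0]]
    # Evenly spaced indices from the first item to the last one.
    step = (n - 1) / (max_samples - 1)
    return [items[round(i * step)] for i in range(max_samples)]


def estimate_total_bytes(
    items: Sequence[T], size_of: Callable[[T], int], max_samples: int
) -> int:
    """Estimate the total size by extrapolating the mean size of a spread sample."""
    sample = spread_sample(items, max_samples)
    if not sample:
        return 0
    sampled_bytes = sum(size_of(item) for item in sample)
    return round(sampled_bytes / len(sample) * len(items))
```

The trade-off is bounded scan cost per snapshot in exchange for an estimate that can be off when item sizes vary widely, so a figure produced this way is best read as a trend indicator rather than an exact byte count.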

Sequence Diagram(s)

sequenceDiagram
  participant Scheduler as AsyncTaskManager
  participant Task as MemoryDiagnosticsTask
  participant Collector as _collect_snapshot
  participant Proc as ProcessMetrics
  participant HF as HeartflowCollector
  participant Trace as TracemallocCollector
  participant Writer as _write_snapshot
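  Scheduler->>Task: schedule periodic run
  Task->>Collector: aggregate metrics from each subsystem
  Collector->>Proc: collect process/child-process metrics (psutil)
  Collector->>HF: collect session/message binary estimates
  Collector->>Trace: optional tracemalloc diff
  Collector-->>Task: return snapshot
  Task->>Writer: persist as JSONL and trigger rotation/cleanup
  Writer-->>Task: done; log summary/alerts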
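
The flow in the diagram, combined with the error isolation mentioned in the test summary, can be sketched roughly as follows. This is an assumption-laden illustration and not the code of `MemoryDiagnosticsTask`: `collect_snapshot`, `run_periodically`, and the `errors` key are names invented for this example.

```python
import asyncio
import time
from typing import Any, Callable, Mapping


def collect_snapshot(collectors: Mapping[str, Callable[[], Any]]) -> dict:
    """Run each collector; a failing collector is recorded instead of propagated."""
    snapshot: dict = {"ts": time.time(), "errors": {}}
    for name, collect in collectors.items():
        try:
            snapshot[name] = collect()
        except Exception as exc:  # isolate failures per collector
            snapshot["errors"][name] = f"{type(exc).__name__}: {exc}"
    return snapshot


async def run_periodically(
    collectors: Mapping[str, Callable[[], Any]],
    write: Callable[[dict], None],
    interval_seconds: float,
    rounds: int,
) -> None:
    """Collect and hand off a snapshot `rounds` times, sleeping in between."""
    for _ in range(rounds):
        write(collect_snapshot(collectors))
        await asyncio.sleep(interval_seconds)
```

Wrapping each collector separately means one broken subsystem costs a single field in the snapshot and not the whole sample. In this sketch, collector names must avoid the reserved keys `ts` and `errors`.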

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • Mai-with-u/MaiBot#1624: directly related at the config-handling level through a_memorix config reads/writes and the CONFIG_VERSION change.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

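❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The PR description completes most of the required items, including branch, contribution guidelines, update type, test confirmation, and feature description, but lacks a linked issue number (only the placeholder is present). |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The PR title clearly and accurately summarizes the main change: adding a memory diagnostics tool. It matches all file changes in raw_summary (the new diagnostics guide, config fields, the diagnostics service module, and its tests) and is the core feature of this PR. |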

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/memory_diagnostics_guide.md`:
- Around line 103-123: The documentation shows an inconsistent default path: the
canonical file is "logs/memory_diagnostics/memory_diagnostics.jsonl" but the
PowerShell examples and rotation example omit the "memory_diagnostics" folder
and base filename; update all occurrences (including the PowerShell examples at
the two places called out and the rotation example) to use the full path and
filename—e.g., change "Get-Content logs\memory_diagnostics.jsonl -Tail 20" to
"Get-Content logs\memory_diagnostics\memory_diagnostics.jsonl -Tail 20" and make
the rotated example include the directory and base file name like
"logs/memory_diagnostics/memory_diagnostics.20260509-153000.jsonl"; scan the doc
for other instances in the 423-439 range and make them consistent as well.

In `@src/services/memory_diagnostics_service.py`:
- Around line 1002-1012: The current code appends full child process cmdlines
into child_items using _safe_process_cmdline, which may leak private paths or
secrets; modify _safe_process_cmdline (or wrap its use where child_items is
built) to return a sanitized value containing only the executable basename and a
short hashed/length-limited summary (or a flag like "<redacted>") instead of the
raw cmdline, and ensure the JSONL output uses that sanitized string for the
"cmdline" field so sensitive arguments are never written out.
- Around line 595-603: The config memory_diagnostics_jsonl_max_total_size_mb is
treated as a per-file threshold in _rotate_snapshot_file_if_needed (rotating
when the active file exceeds the value) but the code still keeps
DEFAULT_JSONL_ROTATED_FILE_KEEP rotated files, so directory total can reach
~(keep+1)*threshold; fix by enforcing a true "max total size" during cleanup: in
_cleanup_rotated_snapshot_files (called from _rotate_snapshot_file_if_needed)
compute total bytes across the active file, rotated files returned by
_build_rotated_snapshot_path pattern, and then delete the oldest rotated files
until total_bytes <= max_total_size_bytes; alternatively if you intend a
single-file limit, rename the config to indicate "per-file" limit—apply the
former change to implement the documented "max total size" semantics.
- Line 21: Replace the direct object import "global_config" in this module with
a module-level import and/or a live accessor so reads always reflect
hot-reloads; specifically stop using "from src.config.config import
global_config" and instead import the config module (e.g., "from src.config
import config") and change all uses of the symbol "global_config" in
memory_diagnostics_service to access the live object via the module
(config.global_config...) or implement a small helper function
get_global_config() that returns src.config.config.global_config; update every
place that reads configuration (sampling interval, output path, thresholds,
tracemalloc switch, etc.) and ensure ConfigManager.reload_config() will affect
those reads at runtime.
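
Two of the findings above, redacting child-process command lines and enforcing a true total-size limit, can be sketched as follows. This is only an illustration under stated assumptions: `sanitize_cmdline` and `enforce_total_size` are hypothetical names, not the PR's `_safe_process_cmdline` or `_cleanup_rotated_snapshot_files`, and "oldest" is taken here to mean the smallest modification time.

```python
import hashlib
from pathlib import Path


def sanitize_cmdline(cmdline: list[str]) -> str:
    """Keep only the executable basename plus a short digest of the arguments."""
    if not cmdline:
        return "<empty>"
    # Handle both POSIX and Windows separators regardless of the host OS.
    exe = cmdline[0].replace("\\", "/").rsplit("/", 1)[-1]
    args = cmdline[1:]
    if not args:
        return exe
    digest = hashlib.sha256("\0".join(args).encode("utf-8")).hexdigest()[:8]
    return f"{exe} <{len(args)} args redacted, sha256:{digest}>"


def enforce_total_size(
    active: Path, rotated: list[Path], max_total_bytes: int
) -> list[Path]:
    """Delete the oldest rotated files until active + rotated fits the budget.

    Returns the deleted paths. The active file itself is never deleted here.
    """
    existing = [p for p in rotated if p.exists()]
    total = sum(p.stat().st_size for p in existing)
    if active.exists():
        total += active.stat().st_size
    deleted: list[Path] = []
    # Oldest first, by modification time (an assumption of this sketch).
    for path in sorted(existing, key=lambda p: p.stat().st_mtime):
        if total <= max_total_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        deleted.append(path)
    return deleted
```

The digest lets two snapshots be compared for "same arguments or not" without writing the arguments themselves. A short or guessable secret could still be recovered from a hash by brute force, so a plain `<redacted>` marker is the more conservative choice.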
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c324c86f-5e03-4511-8e0b-ce005246c210

📥 Commits

Reviewing files that changed from the base of the PR and between 60b0b41 and efa238d.

⛔ Files ignored due to path filters (1)
  • uv.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • docs/memory_diagnostics_guide.md
  • pyproject.toml
  • requirements.txt
  • src/config/config.py
  • src/config/official_configs.py
  • src/main.py
  • src/services/memory_diagnostics_service.py
  • tests/test_memory_diagnostics_service.py

Comment thread docs/memory_diagnostics_guide.md
Comment thread src/services/memory_diagnostics_service.py Outdated
Comment thread src/services/memory_diagnostics_service.py Outdated
Comment thread src/services/memory_diagnostics_service.py
@Ronifue Ronifue force-pushed the dev branch 2 times, most recently from c2cc76e to 621ab6f Compare May 9, 2026 11:08
@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/memory_diagnostics_guide.md`:
- Around line 31-47: Add a short dependency note at the top of the memory
diagnostics config section in docs/memory_diagnostics_guide.md stating that the
memory diagnostics feature requires psutil>=6.0.0 (as declared in pyproject.toml
and requirements.txt), and include a one-line install instruction (e.g. pip
install "psutil>=6.0.0"); keep the existing configuration example (keys like
enable_memory_diagnostics, memory_diagnostics_interval_seconds,
memory_diagnostics_top_sessions, etc.) unchanged and clarify that psutil is
required for runtime memory metrics used by the diagnostics feature.
- Around line 125-335: The docs omit the memory_automation module exposed by
_collect_memory_automation_metrics(); add a new diagnostic step (e.g., "第十步:看
memory_automation 队列与工作器", i.e. "Step 10: check the memory_automation queues
and workers") that lists the fields memory_automation.started,
memory_automation.fact_writeback_queue,
memory_automation.fact_writeback_worker_active,
memory_automation.chat_summary_queue,
memory_automation.chat_summary_worker_active, and
memory_automation.chat_summary_states, and give concise guidance to check for
queue backlog, inactive/blocked workers, and long-running summary state entries
referencing those field names so operators know to inspect the automation queues
and workers when memory rises.
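
For the queue-backlog check suggested above, an operator could compare the first and last recorded value of a field across the JSONL snapshots. The sketch below assumes, without confirmation from the PR, that fields such as `memory_automation.fact_writeback_queue` are stored as numbers nested under a `memory_automation` object; `get_path` and `field_trend` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Optional


def get_path(record: dict, dotted: str) -> Any:
    """Look up 'a.b.c' in nested dicts; return None if any part is missing."""
    node: Any = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def field_trend(jsonl_path: Path, dotted: str) -> Optional[tuple[float, float]]:
    """Return (first, last) numeric values of a field across all snapshots."""
    values = []
    with jsonl_path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            value = get_path(json.loads(line), dotted)
            # Booleans are ints in Python, so exclude them explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                values.append(value)
    if not values:
        return None
    return values[0], values[-1]
```

A last value well above the first, sustained over many snapshots, would point at a backlog worth investigating; a single spike usually would not.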
ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3ed66af9-5129-46c7-961a-21d790ac3536

📥 Commits

Reviewing files that changed from the base of the PR and between efa238d and c2cc76e.

📒 Files selected for processing (4)
  • docs/memory_diagnostics_guide.md
  • src/config/official_configs.py
  • src/services/memory_diagnostics_service.py
  • tests/test_memory_diagnostics_service.py
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/config/official_configs.py
  • tests/test_memory_diagnostics_service.py
  • src/services/memory_diagnostics_service.py

Comment thread docs/memory_diagnostics_guide.md
Comment thread docs/memory_diagnostics_guide.md
Ronifue (Contributor, Author) commented May 9, 2026

[image] The captured logs look roughly like this; they should be usable for troubleshooting.

@Ronifue Ronifue changed the title from "feat: Add memory diagnostics tool (vibe code)" to "feat: Add memory diagnostics tool" on May 10, 2026
A-Dawn and others added 27 commits May 12, 2026 15:11
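feat(A_memorix): optimize the chat summary window and history review
The WebUI ban and update endpoints only modified the database is_banned field and did not remove the corresponding entries from the in-memory emoji_manager.emojis list, so consumers such as plugins could still select banned emojis until the service was restarted.

Also fixes emoji_manager.ban_emoji(), where relying on identity comparison (MaiEmoji does not define __eq__) caused removal to fail silently on cross-instance calls; it now filters by file_hash.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
_emoji_num was not updated after the list was filtered, so later capacity checks would use a stale value.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ved-from-memory

fix: emoji not removed from the in-memory list after being banned in the WebUI
feat(A_memorix): a further optimized persona-profile iteration mechanism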
5 participants