feat: add memory diagnostics tool by Ronifue · Pull Request #1665 · Mai-with-u/MaiBot

Ronifue · 2026-05-09T10:42:14Z

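✅ Accepted: bug fixes directly related to main: submit them to the dev branch
New-feature PRs must be discussed in an issue beforehand; otherwise they will not be merged
🌐 i18n reminder: except for bootstrap or urgent fixes, do not treat non-zh-CN target translations as a regular GitHub editing surface; regular translations flow back through Crowdin -> l10n_* PRs. See docs/i18n.md for details

Please fill in the following

(Remove the space inside the square brackets and replace it with a lowercase x)

- Modifying the main branch is forbidden; please confirm that the branch for this submission is not main
- I confirm that I have read the contribution guide
- This update is a: bug fix
- This update is a: new feature
- This update has been tested
- If this change touches src/A_memorix, I confirm that I have read src/A_memorix/MODIFICATION_POLICY.md; leave unchecked if it does not
Describe any breaking changes in detail (if any):
Briefly describe the content and purpose of this update:

Other information

Related issue: Close #
Screenshots/GIFs:
Additional information:

Adds a long-running memory diagnostics tool for investigating memory growth after extended uptime.
The diagnostics task is enabled through a [debug] config switch. It periodically collects relevant state and writes JSONL snapshot logs, which makes it easier to compare trends and locate the source of anomalies. A beginner-friendly usage guide, written with AI, is also included.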
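
To make the description above concrete, here is a minimal sketch of a periodic task that appends one JSON snapshot per line to a JSONL file. It is not the PR's `MemoryDiagnosticsTask`: the function names, the metrics chosen, and the `max_samples` parameter are all assumptions for illustration, and only the standard library is used (the PR itself collects process metrics with psutil).

```python
import asyncio
import gc
import json
import sys
import time
from pathlib import Path


def collect_snapshot() -> dict:
    """Collect a few cheap, stdlib-only runtime metrics (hypothetical field names)."""
    return {
        "timestamp": time.time(),
        "gc_counts": list(gc.get_count()),
        "gc_tracked_objects": len(gc.get_objects()),
        "loaded_modules": len(sys.modules),
    }


def append_jsonl(path: Path, snapshot: dict) -> None:
    """Append one snapshot as a single JSON line, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(snapshot, ensure_ascii=False) + "\n")


async def run_diagnostics(path: Path, interval_seconds: float, max_samples: int) -> None:
    """Collect and persist `max_samples` snapshots, sleeping between samples."""
    for index in range(max_samples):
        append_jsonl(path, collect_snapshot())
        if index + 1 < max_samples:
            await asyncio.sleep(interval_seconds)
```

A call such as `asyncio.run(run_diagnostics(Path("logs/demo.jsonl"), 60.0, 10))` would write ten lines one minute apart; a real long-running task would loop until cancelled instead of stopping at a fixed count.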

Summary by CodeRabbit

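New features
- Adds an optional long-running memory diagnostics task that periodically collects process/child-process memory, Python runtime and task state, session and message caches, binary media, and metrics for various internal queues. It logs a summary, emits threshold-based alerts, and writes rotated JSONL diagnostic logs.
Documentation
- Adds a memory diagnostics guide for non-developers, covering when to enable it, key config options, output description, troubleshooting flow, and caveats.
Dependency updates
- Adds psutil to improve system-level memory metric collection.
Tests
- Extends test coverage for sampling, estimation, logging, file rotation, and alert isolation.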

Dev

coderabbitai · 2026-05-09T10:42:25Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.


Walkthrough

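This PR adds a periodic memory diagnostics subsystem: config and dependencies, the MemoryDiagnosticsTask implementation, multi-dimensional collectors, binary estimation and sampling, JSONL snapshot persistence and rotation, startup integration, unit tests, and an operations guide.

Changes

Full implementation of the memory diagnostics service

Layer / File(s)	Summary
Config and dependencies `src/config/config.py`, `src/config/official_configs.py`, `pyproject.toml`, `requirements.txt`	Bumps CONFIG_VERSION; slightly adjusts a_memorix access in `write_config_to_file`; adds several memory diagnostics fields to `DebugConfig`; adds the runtime dependency `psutil>=6.0.0`.
User documentation `docs/memory_diagnostics_guide.md`	Adds an operations-level memory diagnostics guide covering when to enable it, recommended configuration, output location, a quick triage flow, alert configuration, fault reporting, and a field quick-reference table.
Startup integration `src/main.py`	Registers MemoryDiagnosticsTask with the async task manager during component initialization, conditional on the debug config.
Core service implementation `src/services/memory_diagnostics_service.py`	Adds MemoryDiagnosticsTask and several collectors: process/child processes, Python GC/tracemalloc, asyncio task distribution, Heartflow sessions and binary estimation, and Chat/WebSocket/media task/memory_automation/A_Memorix metrics. The collected data is assembled into a JSON object and written to JSONL, with support for threshold alerts and tracemalloc diffs.
Estimation and sampling utilities `src/services/memory_diagnostics_service.py`	Implements binary-size estimation for message components, spread sampling, cyclic sampling of history with extrapolation, bounded depth with cycle detection, traversal of pending task stacks with task binary estimation, and scan budget planning with skip markers.
Persistence and rotation `src/services/memory_diagnostics_service.py`	Resolves the output path, ensures the directory exists, rotates history files by total size, appends single-line JSONL, cleans up over-limit/expired files, and prunes historical snapshots.
Unit tests `tests/test_memory_diagnostics_service.py`	Adds tests covering config defaults, estimation and sampling, scan plan fairness, simulated heartflow collection, snapshot building and JSONL writing, error isolation, rotation pruning, and process cmdline assertions.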
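
The "spread sampling" and "extrapolation" named in the table can be sketched as follows. This is an illustration under stated assumptions, not the PR's code: `spread_indices` and `estimate_total_bytes` are hypothetical names, and the history is modeled as a plain sequence of `bytes` payloads. The idea is to measure only a few evenly spaced items from a long history and scale the sampled average up to the full length.

```python
from typing import Sequence


def spread_indices(total: int, sample_size: int) -> list[int]:
    """Pick up to `sample_size` indices spread evenly across range(total)."""
    if total <= 0 or sample_size <= 0:
        return []
    if sample_size >= total:
        return list(range(total))
    if sample_size == 1:
        return [0]
    step = (total - 1) / (sample_size - 1)
    # Rounding may map two positions to the same index, so deduplicate.
    return sorted({round(i * step) for i in range(sample_size)})


def estimate_total_bytes(payloads: Sequence[bytes], sample_size: int) -> dict:
    """Estimate the summed payload size from a spread sample of the sequence."""
    indices = spread_indices(len(payloads), sample_size)
    if not indices:
        return {"estimated_bytes": 0, "sampled": 0, "extrapolated": False}
    sampled_bytes = sum(len(payloads[i]) for i in indices)
    # Extrapolate: average sampled size times the full item count.
    estimate = round(sampled_bytes / len(indices) * len(payloads))
    return {
        "estimated_bytes": estimate,
        "sampled": len(indices),
        "extrapolated": len(indices) < len(payloads),
    }
```

Marking the result as `extrapolated` keeps an estimate distinguishable from an exact measurement when the snapshots are read later.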

Sequence Diagram(s)

sequenceDiagram
  participant Scheduler as AsyncTaskManager
  participant Task as MemoryDiagnosticsTask
  participant Collector as _collect_snapshot
  participant Proc as ProcessMetrics
  participant HF as HeartflowCollector
  participant Trace as TracemallocCollector
  participant Writer as _write_snapshot
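  Scheduler->>Task: schedule periodic run
  Task->>Collector: aggregate metrics from each subsystem
  Collector->>Proc: collect process/child-process metrics (psutil)
  Collector->>HF: collect session/message binary estimates
  Collector->>Trace: optional tracemalloc diff
  Collector-->>Task: return snapshot
  Task->>Writer: persist as JSONL and trigger rotation/cleanup
  Writer-->>Task: done; log summary/alerts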
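
The collection step in the diagram, combined with the "error isolation" behavior mentioned in the test summary, could look roughly like the sketch below. The `collect_snapshot` signature and the `errors` key are assumptions made for illustration; the point is only that one failing collector should not prevent the others from contributing to the snapshot.

```python
import time
from typing import Callable


def collect_snapshot(collectors: dict[str, Callable[[], dict]]) -> dict:
    """Run each named collector and record failures instead of propagating them."""
    snapshot: dict = {"timestamp": time.time(), "errors": {}}
    for name, collector in collectors.items():
        try:
            snapshot[name] = collector()
        except Exception as exc:
            # Isolate the failure: keep the error text, continue with the rest.
            snapshot["errors"][name] = f"{type(exc).__name__}: {exc}"
    return snapshot
```

A diagnostics task that crashes while diagnosing would hide the very problem it is meant to expose, which is why the failure is recorded in the snapshot rather than raised.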

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Mai-with-u/MaiBot#1624: directly related in code at the config-handling layer, through a_memorix config read/write and the CONFIG_VERSION change.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 9.09% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
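Description check	✅ Passed	The PR description completes most of the required items, including the branch, contribution guide, update type, test confirmation, and feature description, but it lacks a linked issue number (placeholder only).
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Title check	✅ Passed	The PR title clearly and accurately summarizes the main change: adding a memory diagnostics tool. It matches all file changes in raw_summary (the new diagnostics guide, config fields, diagnostics service module, and its tests) and is the core feature of this PR.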


coderabbitai

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/memory_diagnostics_guide.md`:
- Around line 103-123: The documentation shows an inconsistent default path: the
canonical file is "logs/memory_diagnostics/memory_diagnostics.jsonl" but the
PowerShell examples and rotation example omit the "memory_diagnostics" folder
and base filename; update all occurrences (including the PowerShell examples at
the two places called out and the rotation example) to use the full path and
filename—e.g., change "Get-Content logs\memory_diagnostics.jsonl -Tail 20" to
"Get-Content logs\memory_diagnostics\memory_diagnostics.jsonl -Tail 20" and make
the rotated example include the directory and base file name like
"logs/memory_diagnostics/memory_diagnostics.20260509-153000.jsonl"; scan the doc
for other instances in the 423-439 range and make them consistent as well.

In `@src/services/memory_diagnostics_service.py`:
- Around line 1002-1012: The current code appends full child process cmdlines
into child_items using _safe_process_cmdline, which may leak private paths or
secrets; modify _safe_process_cmdline (or wrap its use where child_items is
built) to return a sanitized value containing only the executable basename and a
short hashed/length-limited summary (or a flag like "<redacted>") instead of the
raw cmdline, and ensure the JSONL output uses that sanitized string for the
"cmdline" field so sensitive arguments are never written out.
- Around line 595-603: The config memory_diagnostics_jsonl_max_total_size_mb is
treated as a per-file threshold in _rotate_snapshot_file_if_needed (rotating
when the active file exceeds the value) but the code still keeps
DEFAULT_JSONL_ROTATED_FILE_KEEP rotated files, so directory total can reach
~(keep+1)*threshold; fix by enforcing a true "max total size" during cleanup: in
_cleanup_rotated_snapshot_files (called from _rotate_snapshot_file_if_needed)
compute total bytes across the active file, rotated files returned by
_build_rotated_snapshot_path pattern, and then delete the oldest rotated files
until total_bytes <= max_total_size_bytes; alternatively if you intend a
single-file limit, rename the config to indicate "per-file" limit—apply the
former change to implement the documented "max total size" semantics.
- Line 21: Replace the direct object import "global_config" in this module with
a module-level import and/or a live accessor so reads always reflect
hot-reloads; specifically stop using "from src.config.config import
global_config" and instead import the config module (e.g., "from src.config
import config") and change all uses of the symbol "global_config" in
memory_diagnostics_service to access the live object via the module
(config.global_config...) or implement a small helper function
get_global_config() that returns src.config.config.global_config; update every
place that reads configuration (sampling interval, output path, thresholds,
tracemalloc switch, etc.) and ensure ConfigManager.reload_config() will affect
those reads at runtime.
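
The "max total size" semantics requested in the rotation comment above can be sketched as follows. This is a hypothetical helper, not the PR's `_cleanup_rotated_snapshot_files`: it assumes rotated file names embed a sortable timestamp (as in the `memory_diagnostics.20260509-153000.jsonl` example quoted earlier), so that sorting by name yields oldest first.

```python
from pathlib import Path


def cleanup_rotated_files(active: Path, rotated: list[Path], max_total_bytes: int) -> list[Path]:
    """Delete the oldest rotated files until active + rotated fits the budget.

    Returns the deleted paths. The active file is never deleted here.
    """
    sizes = {path: path.stat().st_size for path in rotated if path.exists()}
    active_size = active.stat().st_size if active.exists() else 0
    total = active_size + sum(sizes.values())
    deleted: list[Path] = []
    # Assumption: timestamped names sort chronologically, oldest first.
    for path in sorted(sizes, key=lambda p: p.name):
        if total <= max_total_bytes:
            break
        path.unlink()
        total -= sizes[path]
        deleted.append(path)
    return deleted
```

With this approach the directory total is bounded by the configured value plus whatever the active file grows to before its next rotation, rather than by roughly (keep + 1) times the threshold.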


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c324c86f-5e03-4511-8e0b-ce005246c210

📥 Commits

Reviewing files that changed from the base of the PR and between 60b0b41 and efa238d.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (8)

docs/memory_diagnostics_guide.md
pyproject.toml
requirements.txt
src/config/config.py
src/config/official_configs.py
src/main.py
src/services/memory_diagnostics_service.py
tests/test_memory_diagnostics_service.py

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/memory_diagnostics_guide.md`:
- Around line 31-47: Add a short dependency note at the top of the memory
diagnostics config section in docs/memory_diagnostics_guide.md stating that the
memory diagnostics feature requires psutil>=6.0.0 (as declared in pyproject.toml
and requirements.txt), and include a one-line install instruction (e.g. pip
install "psutil>=6.0.0"); keep the existing configuration example (keys like
enable_memory_diagnostics, memory_diagnostics_interval_seconds,
memory_diagnostics_top_sessions, etc.) unchanged and clarify that psutil is
required for runtime memory metrics used by the diagnostics feature.
- Around line 125-335: The docs omit the memory_automation module exposed by
_collect_memory_automation_metrics(); add a new diagnostic step (e.g., "第十步：看
memory_automation 队列与工作器") that lists the fields memory_automation.started,
memory_automation.fact_writeback_queue,
memory_automation.fact_writeback_worker_active,
memory_automation.chat_summary_queue,
memory_automation.chat_summary_worker_active, and
memory_automation.chat_summary_states, and give concise guidance to check for
queue backlog, inactive/blocked workers, and long-running summary state entries
referencing those field names so operators know to inspect the automation queues
and workers when memory rises.
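
For the queue-backlog check suggested above, an operator could compare the first and last recorded value of each field across the JSONL snapshots. The sketch below is an assumption-heavy illustration: it treats each snapshot as nested JSON objects addressed by dotted field names such as `memory_automation.fact_writeback_queue`, and it assumes queue fields hold plain numbers. The actual snapshot layout is not shown in this PR page, and `first_last` is a hypothetical helper name.

```python
import json
from pathlib import Path
from typing import Any


def get_path(snapshot: dict, dotted: str) -> Any:
    """Resolve a dotted field name against nested dicts; None if missing."""
    current: Any = snapshot
    for key in dotted.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def first_last(jsonl_path: Path, fields: list[str]) -> dict[str, tuple]:
    """Return (first, last) numeric values per field across all snapshots."""
    lines = jsonl_path.read_text(encoding="utf-8").splitlines()
    snapshots = [json.loads(line) for line in lines if line.strip()]
    report: dict[str, tuple] = {}
    for field in fields:
        values = [
            value
            for value in (get_path(snap, field) for snap in snapshots)
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        ]
        if values:
            report[field] = (values[0], values[-1])
    return report
```

A field whose last value is much larger than its first is a candidate for a backlog; fields that are absent or non-numeric are simply left out of the report.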


ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 3ed66af9-5129-46c7-961a-21d790ac3536

📥 Commits

Reviewing files that changed from the base of the PR and between efa238d and c2cc76e.

📒 Files selected for processing (4)

docs/memory_diagnostics_guide.md
src/config/official_configs.py
src/services/memory_diagnostics_service.py
tests/test_memory_diagnostics_service.py

🚧 Files skipped from review as they are similar to previous changes (3)

src/config/official_configs.py
tests/test_memory_diagnostics_service.py
src/services/memory_diagnostics_service.py

Ronifue · 2026-05-09T12:19:30Z

The captured logs look roughly like this; they should be good enough for troubleshooting.

Dev

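The ModifiedBy enum values are uppercase "AI"/"USER", but the frontend getModifierBadge compares against lowercase 'ai', so every auto-reviewed expression was wrongly labeled "manual". Serialization now lowercases the value to match the frontend convention. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat(A_memorix): optimize the chat summary window and history review

The WebUI ban and update endpoints only modified the database is_banned field and did not remove the matching entry from the in-memory emoji_manager.emojis list, so consumers such as plugins could still pick banned emojis until the service restarted. This also fixes emoji_manager.ban_emoji(), which relied on identity comparison (MaiEmoji does not define __eq__) and therefore silently failed to remove entries when called across instances; it now filters by file_hash. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

_emoji_num was not updated after the list was filtered, so later capacity checks would use a stale value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…ved-from-memory fix: emoji not removed from the in-memory list after being banned in the WebUI

feat(A_memorix): further optimize the person profile iteration mechanism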

SengokuCola and others added 2 commits May 8, 2026 21:14

Merge pull request Mai-with-u#1660 from Mai-with-u/dev

b41f5d4

Dev

feat: add memory diagnostics tool

efa238d

github-project-automation Bot added this to MaiM to the GATE_ and MaiM to the GATE May 9, 2026

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

Comment thread docs/memory_diagnostics_guide.md

Comment thread src/services/memory_diagnostics_service.py Outdated

Comment thread src/services/memory_diagnostics_service.py Outdated

Comment thread src/services/memory_diagnostics_service.py

Ronifue force-pushed the dev branch 2 times, most recently from c2cc76e to 621ab6f Compare May 9, 2026 11:08

coderabbitai Bot reviewed May 9, 2026

View reviewed changes

Comment thread docs/memory_diagnostics_guide.md

Comment thread docs/memory_diagnostics_guide.md

Ronifue force-pushed the dev branch from 621ab6f to c8a72dd Compare May 9, 2026 11:20

fix: fix known issues

6a3a664

Ronifue force-pushed the dev branch from c8a72dd to 6a3a664 Compare May 9, 2026 12:01

Ronifue and others added 2 commits May 9, 2026 23:08

docs: improve the memory diagnostics field quick-reference table

d2d250d

Merge pull request Mai-with-u#1668 from Mai-with-u/dev

161fc42

Dev

Ronifue changed the title ~~feat: add memory diagnostics tool (vibe code)~~ feat: add memory diagnostics tool May 10, 2026

Ronifue and others added 14 commits May 10, 2026 12:50

Merge branch 'Mai-with-u:dev' into dev

3b849c3

fix(mcp): support stdio servers that write a banner to stdout and do not implement list_resource_templates

f49826d

rename no_reply to no_action

cbcb6a0

feat: optimize image return

1a2042d

fix: make plugin data compatible with legacy data

9b2bf0f

Merge pull request Mai-with-u#1672 from Mai-with-u/dev

3454291

Dev

Update dependencies

0b0f748

Update dependencies

1104767

Update package.json

522d975

Update package.json

1dd6950

Update dashboard_update.py

ca3d4d4

fix(A_memorix): support the MaiBot memory migration schema

644d487

fix(dashboard): validate the MaiBot migration import form

47eaf8f

A-Dawn and others added 27 commits May 12, 2026 15:11

Merge pull request Mai-with-u#1680 from A-Dawn/dev

61222a4

feat(A_memorix): optimize the chat summary window and history review

feat: speed up WebUI config page loading

7cf726e

Merge branch 'dev' of https://github.com/Mai-with-u/MaiBot into dev

7cf6602

Update config.py

dcf9028

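fix: also update the _emoji_num cached counter when banning

f8ffe67

_emoji_num was not updated after the list was filtered, so later capacity checks would use a stale value. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

feat: optimize the WebUI expression review logic, refactor the automatic expression logic, allow intranet access to the WebUI, and optimize config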

8d9724e

Merge branch 'main' into dev

47abb7e

Update dependency versions

f6c72e3

Merge branch 'dev' of https://github.com/Mai-with-u/MaiBot into dev

614b660

Merge pull request Mai-with-u#1681 from hsd221/fix/emoji-ban-not-remo…

69ca877

…ved-from-memory fix: emoji not removed from the in-memory list after being banned in the WebUI

feat: allow assistant trimming to save tokens

83015cf

Merge branch 'dev' of https://github.com/Mai-with-u/MaiBot into dev

d8110e5

feat(A_memorix): further optimize the person profile iteration mechanism

08ec1a5

Merge branch 'Mai-with-u:dev' into dev

44bb80d

Update person_profile_service.py

43093c3

Merge branch 'dev' of https://github.com/A-Dawn/MaiBot into dev

af4d769

Merge pull request Mai-with-u#1684 from A-Dawn/dev

2d6db69

feat(A_memorix): further optimize the person profile iteration mechanism

feat: experimental feature, merge timing

c5f798d

Merge branch 'dev' of https://github.com/Mai-with-u/MaiBot into dev

6063b78

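feat: log automatic expression optimization, optimize the WebUI layout, remove WebUI auto-update, and allow plugins to declare core tools

4c051af

feat: add the content tool send_image and optimize calls to unexpanded tools

7f898cf

remove: remove leftover code

c2b8520

feat: add memory diagnostics tool

6a7a9a8

fix: fix known issues

d3fe470

docs: improve the memory diagnostics field quick-reference table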

4136008

Merge branch 'dev' of https://github.com/Ronifue/MaiBot into dev

6be9850

Ronifue closed this May 13, 2026

github-project-automation Bot moved this to 已完成 (Done) in MaiM to the GATE May 13, 2026

github-project-automation Bot moved this to 已完成 (Done) in MaiM to the GATE_ May 13, 2026

Ronifue wants to merge 67 commits into Mai-with-u:dev from Ronifue:dev

5 participants
