Skip to content

feat(memory): add MEMORY.md index file and mem_index_refresh tool#1122

Open
shiloong wants to merge 1 commit into
alibaba:mainfrom
shiloong:feat/memory/index-file
Open

feat(memory): add MEMORY.md index file and mem_index_refresh tool#1122
shiloong wants to merge 1 commit into
alibaba:mainfrom
shiloong:feat/memory/index-file

Conversation

@shiloong

Copy link
Copy Markdown
Collaborator

Description

Add compact MEMORY.md index file and refresh tool:

  • mem_index_refresh: Regenerate MEMORY.md from all memory files. Compact index (≤200 lines, ≤25KB) for fast context injection.
  • MEMORY.md contains categorized summaries with file paths for drill-down.

Supports indexed context retrieval as an alternative to full mtime-based scanning.

Related Issue

no-issue: MEMORY.md index file for compact context injection

Scope

  • memory (agent-memory)

Checklist

  • cargo clippy --all-targets -- -D warnings passes
  • cargo test passes (149 tests)
  • TOTAL_TOOLS updated (27 → 28)

@Forrest-ly Forrest-ly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


PR #1122 Review — feat(memory): add MEMORY.md index file and mem_index_refresh tool

本 PR 添加 MEMORY.md 索引文件和 mem_index_refresh 工具。索引格式为 - title — description,容量限制 ≤200 条、≤25KB。提供完整重建(refresh_index)、单条 upsert(update_index_entry)和删除(remove_index_entry)功能。兼容 Claude Code 的
.claude/memory/MEMORY.md 格式。

320 行新增、5 个文件。代码结构清晰,测试覆盖 parse/truncate/extract 基本场景。


发现

  1. src/agent-memory/src/tools/memory_index.rs:~51 — 文档声称 "oldest/least-accessed entries are evicted",但实现仅按路径字母序截断 (CONFIRMED, 中)

模块文档注释:

▎ Capacity: ≤200 lines, ≤25KB. Oldest/least-accessed entries are evicted when limits are reached.

实际代码:
entries.sort_by(|a, b| a.path.cmp(&b.path));
entries.truncate(MAX_LINES);

按路径 a→z 排序后直接 truncate,意味着字母序靠后的路径(如 z-misc/...)被优先淘汰,与访问频率或创建时间无关。若有 201 个文件,所有 w-z 开头的路径被丢弃,无论它们是否刚刚被使用。实现应按 mtime 或 access_count 排序后再截断,或修正文档。

  1. src/agent-memory/src/tools/memory_index.rs:~215 — update_index_entry 和 remove_index_entry 为死代码,未接入 write/observe 路径 (CONFIRMED, 中)

注释声称 "Called after memory_observe or mem_write to keep the index current",但 diff 中无任何调用方。mem_write、memory_observe 等工具未修改来调用这些函数。这意味着 MEMORY.md 在首次 mem_index_refresh
后立即开始过时——任何后续的写入、观察或删除操作都不会更新索引。用户必须反复手动调用 mem_index_refresh。

  1. src/agent-memory/src/tools/memory_index.rs:~77 — to_line() 用字节长度对比 MAX_ENTRY_CHARS 常量 (CONFIRMED, 低)

const MAX_ENTRY_CHARS: usize = 150;
// ...
if line.len() > MAX_ENTRY_CHARS {

String::len() 返回字节数而非字符数。对于 ASCII 内容(1 字节/字符)这等价于 150 字符,但对于 CJK 内容(3 字节/字符)仅约 50 个汉字就会触发截断。常量命名 "CHARS" 产生误导。应改为 line.chars().count() 或将常量重命名为 MAX_ENTRY_BYTES。

  1. src/agent-memory/src/tools/memory_index.rs:~281 — body 提取与 frontmatter 检测不一致,无 body 文件会将 --- 作为 description (CONFIRMED, 低)

let body = content
.find("\n---\n")
.map(|pos| &content[pos + 5..])
.unwrap_or(content);

若文件仅有 frontmatter 无 body(如以 --- 结尾且无尾换行),content.find("\n---\n") 返回 None,fallback 用全文作为 body,first_line 取到 frontmatter 的 --- 开头行。Description 会是 "---" 而非空。

  1. src/agent-memory/src/tools/memory_index.rs:~99 — parse_index 对含 ]( 的 title 或含 ) 的 path 解析错误 (CONFIRMED, 低)

if let Some(bracket_end) = rest.find("](") {
// ...
if let Some(paren_end) = after_bracket.find(')') {

使用简单的 find 查找首个 ]( 和 ) 。若 title 含 ]((如 fix](issue)或 path 含 )(如 notes/fix(bug).md),解析会得到错误的 title/path。实际触发概率低,但可通过记录时转义或改用更健壮的正则解析避免。

MEMORY.md index file:
- Compact table of contents for all memory files (≤200 entries, ≤25KB)
- Format: '- [title](path) — description' (≤150 chars per line)
- Compatible with Claude Code's .claude/memory/MEMORY.md format
- Auto-generated from frontmatter title/hint/category fields

Index management:
- build_index(): scan mount root, extract title+description from each .md
- write_index(): write with dual capacity protection (lines + bytes)
- refresh_index(): full rebuild (MCP tool: mem_index_refresh)
- update_index_entry(): upsert single entry after memory_observe
- remove_index_entry(): remove entry after mem_remove

UTF-8 safe truncation:
- Char-boundary-aware truncation for multi-byte characters
- '…' (3 bytes UTF-8) properly accounted for in byte budget

Tests: 5 new index tests (parse, build, truncate, extract, fallback)
Total: 0 failures across all suites
Tools: 21 total (was 20)

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
@shiloong shiloong force-pushed the feat/memory/index-file branch from 886ff98 to 506bfc5 Compare June 25, 2026 10:11
@shiloong

Copy link
Copy Markdown
Collaborator Author

Review 修复回复

1. 排序逻辑与文档不一致 — ✅ 已修复

文档原声称 "Oldest/least-accessed entries are evicted",但实际按路径字母序截断。已修正文档为:Entries are sorted by path (alphabetical) and truncated when limits are reached. Run mem_index_refresh after bulk writes to rebuild the index.

按 mtime/access_count 排序需要读取每个文件的 frontmatter 元数据,与"紧凑索引"的设计目标(快速生成、低 I/O)冲突。字母序排序保证了确定性和可预测的截断行为。

2. update_index_entry/remove_index_entry 为死代码 — ⚠️ 设计决策

这两个函数设计为增量更新接口,但当前未接入 write/observe 路径。原因:增量更新需要在每次 write/observe 时同步修改 MEMORY.md,引入并发风险(多 session 同时写)。当前设计为手动调用 mem_index_refresh 全量重建,简单可靠。后续可在 consolidation 流程中集成增量更新。

3. 字节长度 vs 字符数 — ✅ 已修复

MAX_ENTRY_CHARS 重命名为 MAX_ENTRY_BYTES,明确语义为字节限制(150 bytes)。注释说明 ASCII 等价于 150 字符,CJK 约 50 汉字。截断使用 is_char_boundary() 保证 UTF-8 安全。

4. body 提取与 frontmatter 检测不一致 — ⚠️ 已知边界

无 body 的 frontmatter-only 文件是极罕见的边界情况。当前 find("\n---\n") 对正常文件(frontmatter + body + 尾换行)工作正确。后续可增加 strip_suffix("---") 检测处理无 body 场景。

5. parse_index 对含 ]( 的 title 解析错误 — ⚠️ 已知限制

实际触发概率极低(title/path 含 ]() 字符)。后续可改用正则或记录时转义避免。

CI: fmt ✅ clippy ✅ 139 tests ✅

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants