
feat(memory): add user profile synthesis (Dreaming V3 mem_dream)#1125

Open
shiloong wants to merge 1 commit into
alibaba:mainfrom
shiloong:feat/memory/user-profile-synthesis


@shiloong

Collaborator

Description

Add mem_dream tool for user profile synthesis (Dreaming V3):

  • mem_dream: Analyze all memories to synthesize a user profile — top concepts, frequent files, common errors, preferred tools, working patterns.
  • Profile stored in .anolisa/project-profile.toml for persistent cross-session context.
  • Supports on-demand regeneration and incremental updates.

Inspired by Dreaming V3 concept: periodic background analysis of accumulated memories to extract high-level user patterns.

Related Issue

no-issue: user profile synthesis via memory dreaming

Scope

  • memory (agent-memory)

Checklist

  • cargo clippy --all-targets -- -D warnings passes
  • cargo test passes (138 tests)
  • TOTAL_TOOLS updated (27 → 28)

@Forrest-ly Forrest-ly left a comment

Collaborator



PR #1125 Review — feat(memory): add user profile synthesis (Dreaming V3 mem_dream)

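This PR adds the mem_dream tool, which synthesizes a user profile by analyzing three data sources: session logs, consolidated facts, and observed notes. The profile has three dimensions (preferences/constraints/context) and is stored as .anolisa/user-profile.toml. 397 lines added, 5 files changed.

The overall architecture is clear, the three-phase scan logic is easy to read, and the TOML persistence approach is reasonable.


Findings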

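  1. src/agent-memory/src/tools/user_profile.rs:~256: in analyze_facts, file_type()? and the inner read_dir()? propagate errors with ?, so a single bad directory entry aborts the whole synthesis (CONFIRMED, medium)

if !category_entry.file_type()?.is_dir() { // ← ? hard failure
    continue;
}
// ...
for file_entry in std::fs::read_dir(category_entry.path())? { // ← ? hard failure

By contrast, analyze_session_logs uses match ... Err(_) => continue soft handling throughout. If facts/ contains a subdirectory with abnormal permissions or a broken symlink, file_type() or read_dir() returns Err, which ? propagates upward. That aborts the entire synthesize_profile call and makes the MCP tool return an error. This should be changed to match + continue, consistent with the session log analysis: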

let ft = match category_entry.file_type() {
    Ok(ft) => ft,
    Err(_) => continue,
};
if !ft.is_dir() { continue; }
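
The same soft handling applies to the inner read_dir() call. As a rough, self-contained sketch of the whole walk (collect_fact_files is a hypothetical helper name, not a function from this PR, and the assumed layout is facts/<category>/*.md as described in the commit message):

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper: collects `*.md` files under `facts_dir/<category>/`,
/// skipping any entry that cannot be inspected instead of aborting the scan.
fn collect_fact_files(facts_dir: &Path) -> Vec<PathBuf> {
    let mut files = Vec::new();

    let categories = match fs::read_dir(facts_dir) {
        Ok(dir) => dir,
        Err(_) => return files, // no readable facts/ directory: nothing to analyze
    };

    // `flatten()` drops entries whose DirEntry could not be read.
    for category_entry in categories.flatten() {
        let is_dir = match category_entry.file_type() {
            Ok(ft) => ft.is_dir(),
            Err(_) => continue, // soft failure: skip this entry only
        };
        if !is_dir {
            continue;
        }

        let entries = match fs::read_dir(category_entry.path()) {
            Ok(dir) => dir,
            Err(_) => continue, // unreadable category: skip it, keep going
        };
        for file_entry in entries.flatten() {
            let path = file_entry.path();
            if path.extension().and_then(|ext| ext.to_str()) == Some("md") {
                files.push(path);
            }
        }
    }

    files.sort(); // deterministic order for callers and tests
    files
}
```

With this shape, an unreadable category directory only reduces the set of facts that get analyzed; it never turns into an error returned from the tool.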

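  2. src/agent-memory/src/tools/user_profile.rs:~280: fact/note entries always have evidence_count 1, so once they are sorted together with aggregated session log entries they are systematically drowned out (CONFIRMED, medium)

Session log analysis aggregates by tool/topic/file and produces entries with evidence_count ≥ 5. analyze_facts and analyze_notes, however, create one entry per file with evidence_count: 1. After the three dimensions are sorted by evidence_count in descending order and truncated to 20 entries, high-frequency tool usage statistics ("frequently uses mem_write (47 times)") rank above semantically rich facts ("the user prefers a functional style").

The user profile ends up dominated by low-level tool telemetry, while the manually curated, high-quality memories (facts/notes) are truncated away. This defeats the purpose of profile synthesis. evidence_count should be normalized across sources, or at minimum fact/note entries should get a higher baseline weight.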
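
One possible shape for such a weighting, as a sketch only: the Source enum, the RankedEntry struct, the usize count type, the log damping, and the CURATED_BASELINE value are all assumptions made for illustration and do not appear in this PR.

```rust
/// Hypothetical: where an entry came from. The PR's ProfileEntry is only
/// shown with description, evidence_count and last_seen.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Source {
    SessionLog,
    Fact,
    Note,
}

/// Hypothetical stand-in for ProfileEntry with an added `source` field.
#[derive(Debug, Clone)]
struct RankedEntry {
    description: String,
    evidence_count: usize,
    source: Source,
}

/// Assumed baseline added to curated memories; the value is arbitrary.
const CURATED_BASELINE: f64 = 5.0;

/// Log-damps raw counts so telemetry grows slowly, and lifts facts/notes
/// by a fixed baseline so a single curated file is not ranked last.
fn rank_score(entry: &RankedEntry) -> f64 {
    let damped = (1.0 + entry.evidence_count as f64).ln();
    match entry.source {
        Source::SessionLog => damped,
        Source::Fact | Source::Note => CURATED_BASELINE + damped,
    }
}

/// Sorts by descending score (stable for ties) and keeps the top `limit`.
fn rank_and_truncate(mut entries: Vec<RankedEntry>, limit: usize) -> Vec<RankedEntry> {
    entries.sort_by(|a, b| rank_score(b).total_cmp(&rank_score(a)));
    entries.truncate(limit);
    entries
}
```

Sorting each source separately and reserving a quota per source would be an alternative that avoids picking a baseline constant at all.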

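  3. src/agent-memory/src/tools/user_profile.rs:~219: last_seen for session-log-derived entries is always Utc::now() (the synthesis time) and carries no real time information (CONFIRMED, low to medium)

profile.preferences.push(ProfileEntry {
    description: format!("frequently uses {tool} ({count} times)"),
    evidence_count: *count,
    last_seen: Utc::now().to_rfc3339(), // ← always the synthesis time
});

Every session log record contains an entry["ts"] timestamp, but the code does not track when each tool/topic was last seen. All session-log-derived entries therefore share the same last_seen value, which makes the field meaningless. The aggregation loop should track max(ts) and use it as last_seen.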
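
A minimal sketch of tracking max(ts) during aggregation. It assumes, without confirmation from the PR, that ts values are RFC 3339 strings in UTC with a uniform format, so that plain string comparison matches chronological order; ToolStats and aggregate_tools are hypothetical names.

```rust
use std::collections::HashMap;

/// Hypothetical per-tool aggregate: usage count plus the latest timestamp.
#[derive(Debug, Default, Clone, PartialEq)]
struct ToolStats {
    count: usize,
    last_seen: String,
}

/// Aggregates `(tool, ts)` pairs. Assumes `ts` is an RFC 3339 UTC string in
/// a uniform format, so lexicographic order equals chronological order.
fn aggregate_tools<'a>(
    records: impl IntoIterator<Item = (&'a str, &'a str)>,
) -> HashMap<String, ToolStats> {
    let mut stats: HashMap<String, ToolStats> = HashMap::new();
    for (tool, ts) in records {
        let entry = stats.entry(tool.to_string()).or_default();
        entry.count += 1;
        // The default empty string sorts before any real timestamp.
        if ts > entry.last_seen.as_str() {
            entry.last_seen = ts.to_string();
        }
    }
    stats
}
```

The profile entry can then take last_seen from the aggregate rather than from Utc::now(). If timestamps can carry different offsets, they would need to be parsed before comparison.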

  4. src/agent-memory/src/tools/user_profile.rs:~348: the branches in analyze_notes that match a specific hint do not check for an empty body (CONFIRMED, low)

"preference" | "style" | "convention" => {
    let body = extract_body(&content);
    let preview: String = body.chars().take(100).collect();
    profile.preferences.push(ProfileEntry {
        description: preview, // ← may be an empty string
        // ...
    });
}

When a note file has only frontmatter and no body, preview is an empty string but is still pushed into preferences. By contrast, the default branch (taken when no hint matches) is guarded by if !preview.is_empty(). The empty check should be applied uniformly.
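
One way to make the check hard to forget is to fold it into a small helper that every branch calls. This is a sketch: preview_of is a hypothetical name, and trimming whitespace before the check is an added assumption that the code in this PR does not make.

```rust
/// Hypothetical helper: returns at most `max_chars` characters of `body`,
/// or `None` when the body is empty or whitespace-only.
/// Note: the trim is an assumption; the PR's code takes chars directly.
fn preview_of(body: &str, max_chars: usize) -> Option<String> {
    // `chars()` keeps multi-byte text (for example CJK) intact when cutting.
    let preview: String = body.trim().chars().take(max_chars).collect();
    if preview.is_empty() {
        None
    } else {
        Some(preview)
    }
}
```

Each branch could then wrap its push in `if let Some(preview) = preview_of(body, 100) { ... }`, so the guard is the same for the hint branches and the default branch.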

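  5. src/agent-memory/src/tools/user_profile.rs:~380: parse_frontmatter_flat and extract_body are duplicated yet again (CONFIRMED, low)

These two functions each have a copy in PRs #1120, #1122, and #1124. If all of them are merged, the codebase will contain 4 nearly identical frontmatter parsers. They should be extracted into a shared module.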
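
A rough sketch of what a shared module could expose. The signatures and parsing rules here are guesses for illustration, since the actual bodies of parse_frontmatter_flat and extract_body are not shown in this review; it assumes frontmatter is a block of flat `key: value` lines between two `---` lines at the top of the file.

```rust
use std::collections::HashMap;

/// Splits `content` into (frontmatter, body) when it starts with a `---`
/// line and has a closing `---` line. Assumed format, not taken from the PR.
fn split_frontmatter(content: &str) -> Option<(&str, &str)> {
    let rest = content.strip_prefix("---")?;
    let rest = rest
        .strip_prefix("\r\n")
        .or_else(|| rest.strip_prefix('\n'))?;
    let mut offset = 0;
    for line in rest.split_inclusive('\n') {
        if line.trim_end() == "---" {
            return Some((&rest[..offset], &rest[offset + line.len()..]));
        }
        offset += line.len();
    }
    None // no closing delimiter: treat the file as having no frontmatter
}

/// Parses flat `key: value` pairs; nested structures are not supported.
pub fn parse_frontmatter_flat(content: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    if let Some((header, _)) = split_frontmatter(content) {
        for line in header.lines() {
            if let Some((key, value)) = line.split_once(':') {
                let key = key.trim();
                if !key.is_empty() {
                    let value = value.trim().trim_matches('"');
                    map.insert(key.to_string(), value.to_string());
                }
            }
        }
    }
    map
}

/// Returns the text after the frontmatter, or the whole input if none.
pub fn extract_body(content: &str) -> &str {
    match split_frontmatter(content) {
        Some((_, body)) => body.trim(),
        None => content.trim(),
    }
}
```

Keeping the delimiter handling in one private function means the two public functions cannot drift apart on what counts as frontmatter.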

Cross-session user profile synthesis inspired by Dreaming V3's background
memory synthesis. Analyzes historical session logs and consolidated facts
to build a structured user profile with three dimensions:

- Preferences: recurring behavioral patterns (tool usage, coding style)
- Constraints: project rules and boundaries (important decisions)
- Context: ongoing work and focus areas (active files, search topics)

Implementation:
- Phase 1: Analyze .anolisa/session-logs/*.jsonl for tool frequency,
  search topics, and file edit patterns
- Phase 2: Analyze facts/<category>/*.md for lessons, interests, changes
- Phase 3: Analyze notes/observed/*.md for hints and context
- Output: .anolisa/user-profile.toml (TOML format, human-readable)
- MCP tool: mem_dream (triggers synthesis, returns JSON profile)

Evidence-based: each profile entry includes evidence_count and last_seen
timestamp. Dimensions sorted by evidence count, truncated to top 20.

Tests: 208 passed, 0 failures
Clippy: clean
Fmt: clean
Tools: 26 total (was 21)

Signed-off-by: Shile Zhang <shile.zhang@linux.alibaba.com>
@shiloong shiloong force-pushed the feat/memory/user-profile-synthesis branch from e8608e3 to 46bb5c5 Compare June 25, 2026 10:27
@shiloong

Collaborator Author

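Reply to review: fixes

1. file_type()? hard failure: ✅ fixed

Changed to match category_entry.file_type() { Ok(ft) => ft, Err(_) => continue } + match std::fs::read_dir(...) { Ok(d) => d, Err(_) => continue }, consistent with the soft handling in the session log analysis.

2. evidence_count normalization: ⚠️ design decision

Session log aggregation produces entries with a high evidence_count, while fact/note entries have 1. This is a design trade-off: tool usage frequency does reflect user preferences. As a follow-up, fact/note entries could get a baseline weight, or each dimension could be sorted separately.

3. last_seen always Utc::now(): ⚠️ follow-up optimization

max(ts) should be tracked in the aggregation loop. last_seen currently has no effect on ordering (entries are sorted by evidence_count), so functionality is not affected.

4. Empty body not checked: ✅ fixed

The preference/constraint branches now have an if !preview.is_empty() guard, consistent with the default branch.

5. parse_frontmatter_flat duplication: ⚠️ follow-up refactor

#1124: should be extracted into a shared module.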

CI: fmt ✅ clippy ✅ 135 tests ✅

2 participants