StarTrail-org · yichuan-w · Mar 5, 2026
diff --git a/.gitignore b/.gitignore
@@ -120,3 +120,6 @@ test-code/
 localtestmcp/
 *.csv
 *.pickle
+
+# Personal dev notes (not tracked)
+docs/dev/
diff --git a/.vscode/settings.json b/.vscode/settings.json
@@ -18,5 +18,10 @@
     "**/*.egg-info/**": true,
     "**/build/**": true,
     "**/dist/**": true
-  }
+  },
+  "accessibility.signals.terminalBell": {
+    "sound": "on",
+    "announcement": "auto"
+  },
+  "cmake.sourceDirectory": "/Users/yichuan/Desktop/code/LEANN/leann/packages/leann-backend-hnsw"
 }
diff --git a/docs/CHANGELOG.md b/docs/CHANGELOG.md
@@ -0,0 +1,27 @@
+# Changelog
+
+All notable changes to LEANN are documented here. Append-only, newest entries at the bottom.
+
+Format: `## YYYY-MM-DD: <short summary>` followed by bullet points.
+
+## 2026-03-05: IVF backend incremental update support
+
+- Added `leann-backend-ivf` with FAISS IndexIVFFlat + DirectMap.Hashtable.
+- IVF supports in-place `add_vectors` and `remove_ids` without full rebuild.
+- `leann build` is now idempotent: re-running it on an existing index performs an incremental update (adds new files, removes deleted files, re-indexes modified files).
+- Fixed incremental build chunking inconsistency and shared metadata dict bug.
+- Fixed duplicate chunks in IVF incremental updates caused by a stale `passages.jsonl`.
+
+## 2026-03-05: MCP server v2 — build, status, and structured search
+
+- Added `leann_build` MCP tool: build or incrementally update indexes directly from Claude Code.
+- Added `leann_status` MCP tool: inspect index details (backend, embedding model, chunk/file count, size).
+- `leann_search` now uses `--json` output with file paths always included, formatted as markdown code blocks.
+- Fixed `float32` JSON serialization bug in `leann search --json`.
+- Cleaned up MCP tool descriptions (concise, no emoji).
+
+## 2026-03-05: Documentation — roadmap, vision, and dev guidelines
+
+- Rewrote `docs/roadmap.md` with current P0/P1 priorities from GitHub issue #237.
+- Added `docs/ultimate_goal.md` — long-term vision (personal data platform, best code retrieval MCP, multimodal, local-first).
+- Added self-contained documentation principle and dev doc maintenance rules to `CLAUDE.md`.
diff --git a/docs/issue-proposals/smart-embedding-default.md b/docs/issue-proposals/smart-embedding-default.md
@@ -0,0 +1,41 @@
+# Smart default embedding model based on platform and corpus size
+
+## Summary
+
+Propose platform- and corpus-aware default embedding model selection for `leann build` when `--embedding-model` is not explicitly specified. This would improve the out-of-the-box experience across deployment scenarios (macOS CPU, NVIDIA GPU, etc.) without changing behavior when users pass an explicit model.
+
+## Motivation
+
+- **Current default**: `facebook/contriever` (~420MB, 768 dim) — heavy for CPU-only builds on large corpora
+- **macOS users** often hit slow builds on 20K+ chunks; lighter models like `all-MiniLM-L6-v2` (~90MB) are much faster
+- **NVIDIA GPU users** can leverage stronger models; smaller corpora benefit from quality (e.g. Qwen3-Embedding-0.6B), larger ones from balanced models (e.g. bge-base-en-v1.5)
+
+## Proposed logic
+
+| Platform | Chunk count | Default model |
+|----------|-------------|---------------|
+| **macOS** | ≥ 20,000 | `sentence-transformers/all-MiniLM-L6-v2` |
+| **macOS** | < 20,000 | `intfloat/e5-small-v2` |
+| **NVIDIA GPU** | < 5,000 | `Qwen/Qwen3-Embedding-0.6B` |
+| **NVIDIA GPU** | ≥ 5,000 | `BAAI/bge-base-en-v1.5` |
+| **Other** | any | `facebook/contriever` (unchanged) |
+
+## Implementation notes
+
+1. **Platform detection**: `torch.cuda.is_available()` for NVIDIA; `sys.platform == "darwin"` for macOS
+2. **Chunk count**: Known only after loading/chunking; may need to either:
+   - Do a lightweight pre-scan (e.g. file count × rough chunks per file), or
+   - Defer default choice until after first chunking pass (and cache for incremental)
+3. **Explicit override**: If user passes `--embedding-model`, always use it; this logic applies only when the flag is omitted
+
+## Model references
+
+- `sentence-transformers/all-MiniLM-L6-v2`: ~90MB, 384 dim, fast on CPU
+- `intfloat/e5-small-v2`: ~90MB, 384 dim
+- `Qwen/Qwen3-Embedding-0.6B`: 0.6B params, 1024 dim, strong retrieval
+- `BAAI/bge-base-en-v1.5`: ~110M params, 768 dim, good MTEB scores
+
+## Open questions
+
+- Should we add a `--embedding-model auto` to explicitly opt into this logic?
+- Pre-scan vs post-chunk decision: trade-off between accuracy and implementation complexity
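
Reviewer sketch (not part of the patch): the selection table in `smart-embedding-default.md` could be expressed roughly as below. The function name `pick_default_embedding_model` and its parameters are hypothetical. The CUDA flag is passed in rather than detected so the sketch runs without PyTorch, and checking CUDA before macOS is an assumption the proposal does not spell out.

```python
import sys
from typing import Optional

# Thresholds and model names are copied from the proposal's table.
MACOS_CHUNK_THRESHOLD = 20_000
GPU_CHUNK_THRESHOLD = 5_000
FALLBACK_MODEL = "facebook/contriever"


def pick_default_embedding_model(
    chunk_count: int,
    explicit_model: Optional[str] = None,
    platform: str = sys.platform,
    has_cuda: bool = False,
) -> str:
    """Hypothetical helper: choose a model when --embedding-model is omitted."""
    # An explicit user choice always wins.
    if explicit_model is not None:
        return explicit_model

    # Assumption: a CUDA-capable machine is classified as "NVIDIA GPU" first.
    if has_cuda:
        if chunk_count < GPU_CHUNK_THRESHOLD:
            return "Qwen/Qwen3-Embedding-0.6B"
        return "BAAI/bge-base-en-v1.5"

    # macOS without CUDA: prefer lighter, CPU-friendly models.
    if platform == "darwin":
        if chunk_count >= MACOS_CHUNK_THRESHOLD:
            return "sentence-transformers/all-MiniLM-L6-v2"
        return "intfloat/e5-small-v2"

    # Every other platform keeps the current default.
    return FALLBACK_MODEL
```

In a real integration, `has_cuda` would presumably come from `torch.cuda.is_available()`, as the implementation notes suggest, and `chunk_count` from whichever of the pre-scan or post-chunk options is chosen.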
diff --git a/packages/leann-core/src/leann/cli.py b/packages/leann-core/src/leann/cli.py
@@ -2540,7 +2540,7 @@ async def search_documents(self, args):
             json_results = [
                 {
                     "id": r.id,
-                    "score": r.score,
+                    "score": float(r.score),
                     "text": r.text,
                     "metadata": r.metadata,
                 }
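
Reviewer sketch (not part of the patch): the `float(r.score)` change matters because Python's `json` module only encodes built-in numeric types. A float32 scalar such as `numpy.float32` is not a subclass of `float`, so `json.dumps` raises `TypeError` on it. The example below uses a hypothetical stand-in class, `Float32Score`, so that it runs without NumPy. `to_json_result` is also an illustrative name, not a function from `cli.py`.

```python
import json


class Float32Score:
    """Hypothetical stand-in for a float32 scalar (e.g. numpy.float32).

    Like numpy.float32, it is not a subclass of Python's float, so the
    json module does not know how to encode it.
    """

    def __init__(self, value: float) -> None:
        self._value = value

    def __float__(self) -> float:
        return float(self._value)


def to_json_result(result_id: str, score, text: str, metadata: dict) -> dict:
    """Build one JSON-safe result dict, coercing the score to a built-in float."""
    return {
        "id": result_id,
        "score": float(score),  # mirrors the fix in this hunk
        "text": text,
        "metadata": metadata,
    }


raw_score = Float32Score(0.75)

# Without the coercion, serialization fails with TypeError.
try:
    json.dumps({"score": raw_score})
    serialized_raw = True
except TypeError:
    serialized_raw = False

# With the coercion, the result serializes cleanly.
payload = json.dumps(
    to_json_result("chunk-0", raw_score, "example text", {"path": "a.py"})
)
```

An alternative would be passing a `default=` callback to `json.dumps`, but converting at the point where the dict is built keeps the fix local to the one field that needs it.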