feat: fastembed migration, daemon mode, TUI output, and associative linking #4
Merged
Conversation
Add fastembed crate for local ONNX-based embeddings (all-MiniLM-L6-v2), replacing dependency on external Ollama server. Update HTTP stack with hyper and tower for better socket and middleware support in daemon mode. Add chrono for better timestamp handling in server responses.
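The dependency changes above might look roughly like the following Cargo.toml fragment. This is an illustrative sketch; the version numbers and feature flags are assumptions, not taken from the PR diff.

```toml
[dependencies]
# Local ONNX embeddings (all-MiniLM-L6-v2), replacing the Ollama client.
fastembed = "3"            # version illustrative
# HTTP stack for daemon mode: socket handling and middleware.
hyper = { version = "1", features = ["http1", "server"] }  # features assumed
tower = "0.4"
# Timestamps in server responses.
chrono = "0.4"
```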
Rename section 16 from "Context Switching" to "Project Resolution" to reflect the architectural change. Remove references to the .active-ctx file and the context switch command. Update server documentation: the --root CLI flag is supplied at startup, and the /reload endpoint accepts an optional ?root=<path> to switch project context dynamically. Update the /health endpoint description.
Replace OllamaEmbedder with FastEmbedder for local ONNX-based embeddings. Fastembed downloads models to ~/.cache/fastembed/ on first use and provides thread-safe embeddings without external server overhead. Simplify EmbeddingConfig by removing host and model_path fields. Default model is all-MiniLM-L6-v2 (384 dims, 22MB). Supports model selection via config: all-MiniLM-L6-v2, all-MiniLM-L6-v2-q, all-MiniLM-L12-v2, BGE-small-en-v1.5, BGE-small-en-v1.5-q. Add refs field to Frontmatter for inter-layer edges (code chunk IDs or memory filenames). Add expand_refs and max_ref_expansions to RecallConfig to support ref-guided memory traversal.
Add evaluation module with three key metrics for assessing embedding distribution quality: - anisotropy: average pairwise cosine similarity (lower is better, < 0.3 target) - similarity_range: max - min off-diagonal similarity (higher is better, > 0.3 target) - discrimination_gap: intra-group vs inter-group similarity delta (higher is better, > 0.05 target) Includes helper for mean-centering and re-normalizing embeddings to reduce anisotropy.
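A minimal sketch of how these metrics could be computed. The function names mirror the eval module described above, but the bodies here are illustrative, not the PR's actual implementation:

```rust
/// Cosine similarity between two equal-length vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Anisotropy: mean pairwise cosine similarity over all distinct pairs.
/// Lower is better; an isotropic embedding space sits near zero.
fn anisotropy(embs: &[Vec<f32>]) -> f32 {
    let n = embs.len();
    let (mut sum, mut count) = (0.0f32, 0u32);
    for i in 0..n {
        for j in (i + 1)..n {
            sum += cosine(&embs[i], &embs[j]);
            count += 1;
        }
    }
    sum / count as f32
}

/// Subtract the mean embedding from every vector, then re-normalize to
/// unit length. Removing the common component reduces anisotropy.
fn mean_center(embs: &mut [Vec<f32>]) {
    let dim = embs[0].len();
    let n = embs.len() as f32;
    let mut mean = vec![0.0f32; dim];
    for e in embs.iter() {
        for (m, v) in mean.iter_mut().zip(e) { *m += v / n; }
    }
    for e in embs.iter_mut() {
        for (v, m) in e.iter_mut().zip(&mean) { *v -= m; }
        let norm: f32 = e.iter().map(|x| x * x).sum::<f32>().sqrt();
        if norm > 0.0 { for v in e.iter_mut() { *v /= norm; } }
    }
}
```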
Add fallback chunking strategy for file types without tree-sitter grammar support. Files now extract chunks gracefully using line-based chunking (MAX_CHUNK_LINES per chunk) instead of skipping silently.
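The fallback strategy amounts to fixed-size line windows. A minimal sketch, assuming a MAX_CHUNK_LINES of 64 (the actual constant lives in the index crate and may differ):

```rust
const MAX_CHUNK_LINES: usize = 64; // assumed value, for illustration

/// Split source text into fixed-size line windows for files with no
/// tree-sitter grammar. Returns (start_line, chunk_text) pairs, so no
/// file is skipped silently.
fn fallback_chunks(text: &str) -> Vec<(usize, String)> {
    let lines: Vec<&str> = text.lines().collect();
    lines
        .chunks(MAX_CHUNK_LINES)
        .enumerate()
        .map(|(i, window)| (i * MAX_CHUNK_LINES, window.join("\n")))
        .collect()
}
```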
Transform llmem-server from a simple HTTP wrapper into a full daemon with warm indices and hot-reload support. Add CLI argument parsing (clap) for --root and --addr options. Add remember endpoint (/remember?q=<query>&level=<level>&budget=<budget>) for semantic search with Hebbian reinforcement (access-count tracking). The endpoint integrates embedding stores with HNSW indices for efficient recall. Add reload endpoint (/reload?root=<path>) for hot-swapping project indices without restarting the daemon. Extend AppState to hold both project and global embedding stores alongside the indices for full semantic search capability.
Remove ctx switch/show commands (context switching moves to server --root and /reload endpoint). Add daemon integration: CLI now calls daemon_notify_reload() after memorize, note, consolidate operations to hot-reload indices. Remember command attempts daemon first (warm indices) before falling back to local file-based search. Add support for inter-layer refs in memorize command to track code chunk edges. Learn command now embeds chunks and builds HNSW indices with evaluation metrics. Consolidate command builds memory HNSW and preserves file_source as ref edges. Project root now resolved via --root CLI flag (default: current directory) instead of .active-ctx file. Update semantic search to use both .memory-index and .code-index HNSW files.
Update integration tests for the removal of the ctx command and for the config changes. Regenerate snapshots to reflect the new default embedding model (all-MiniLM-L6-v2 instead of nomic-embed-text) and the simplified EmbeddingConfig.
Update CLI README to document new command structure (memorize, note, remember, learn, consolidate, reflect, forget) replacing previous add/learn/recall/list/search API. Include new file locations for .code-index.hnsw and .memory-index.hnsw indices.
Update index module README to document eval module for embedding quality metrics (anisotropy, similarity_range, discrimination_gap, mean_center). Note plain-text fallback chunking for file types without supported tree-sitter grammars (shell scripts, markdown, TOML, etc.).
Update main README to document shift from Ollama embeddings to local fastembed (384-dim, ONNX-based, ~22MB model downloaded to ~/.cache/fastembed/). Explain three-layer HNSW architecture (code, memory, global) with inter-layer edges via refs frontmatter field. Document cross-layer recall, embedding quality metrics in learn output, and remove legacy context switching feature.
Add OutputConfig struct to support quiet mode for suppressing elapsed-time reporting. Add max_memory_tokens field to ConsolidationConfig to control memory body truncation. These options prepare the configuration layer for UI/UX improvements and memory size management.
Add comprehensive CLI reference documenting all commands, flags, and workflows. Add configuration reference listing all config options with descriptions and defaults. Update SKILL.md to document two-layer semantic search, edge expansion, associative linking, consolidation phases, and working memory mechanics. Include rules for saving memories and common gotchas.
- Add colored TUI output for remember, learn, and consolidate commands
- Add elapsed-time reporting via CommandTimer
- Add quiet flag (-q) to suppress timing output
- Interleave memory and code hits in semantic search
- Add associative linking phase during consolidation
- Expand refs during remember for edge traversal
- Truncate memory bodies to max_memory_tokens
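Truncation to max_memory_tokens might look like the sketch below. It assumes whitespace-delimited token counting; the real implementation may count tokens differently (e.g. with a model tokenizer):

```rust
/// Truncate a memory body to at most `max_tokens` tokens.
/// Simplification: tokens here are whitespace-delimited words.
fn truncate_tokens(body: &str, max_tokens: usize) -> String {
    let tokens: Vec<&str> = body.split_whitespace().collect();
    if tokens.len() <= max_tokens {
        body.to_string()
    } else {
        // Join the kept tokens and mark the cut point.
        format!("{} …", tokens[..max_tokens].join(" "))
    }
}
```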
Allow users to explicitly enable TUI mode by setting the LLMEM_TUI environment variable, in addition to the automatic detection via stderr being a terminal. This enables TUI mode in non-interactive environments and simplifies testing scenarios.
Add criterion to dev-dependencies for comprehensive benchmarking support. Add 'just bench' and 'just validate' targets to justfile for convenient testing and validation workflows.
Add benchmark tables for embedding store, inbox, memory index, and eval functions with TBD placeholders. Update README and benchmarks.md to reference 'just bench' for reproduction instead of raw 'cargo bench'.
Expand sr.yaml with detailed inline comments explaining each section. Enable hooks configuration and document the full feature set. Update version management settings and consolidate release pipeline config. Remove redundant test hook from pre-commit steps.
Implement temporal scoring that blends cosine similarity with memory recency and access patterns. Ported from training/src/models/temporal.py. Provides temporal_score function (decay by recency, boost by frequency, weight by type durability) and blend function for hybrid ranking. Export from lib.rs public API.
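The scoring idea can be sketched as follows. Constants (the 30-day half-life) and the exact decay/boost shapes are assumptions for illustration; the ported Python model may differ:

```rust
/// Exponential recency decay, boosted by access frequency and weighted by
/// a per-type durability factor, clamped to [0, 1].
fn temporal_score(age_days: f32, access_count: u32, durability: f32) -> f32 {
    let half_life_days = 30.0; // assumed decay half-life
    let recency = (-age_days * (2.0f32).ln() / half_life_days).exp();
    // Logarithmic frequency boost so heavy access saturates gracefully.
    let frequency = 1.0 + (1.0 + access_count as f32).ln();
    (recency * frequency * durability).min(1.0)
}

/// Blend cosine similarity with the temporal score for hybrid ranking.
fn temporal_blend(cosine: f32, temporal: f32, temporal_weight: f32) -> f32 {
    (1.0 - temporal_weight) * cosine + temporal_weight * temporal
}
```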
Add comprehensive benchmarks for core storage components using criterion. Measure EmbeddingStore (upsert, get, remove, save, load), Inbox (push to capacity, eviction, drain, save, load), MemoryIndex (parse, upsert, search), and content_hash performance across varying dimensions and item counts.
Add benchmarks for distance function performance (cosine similarity, dot product, L2 distance, normalization) and index operations. Parameterize by vector dimension to track performance across embeddings of different sizes.
Add regression guards for storage format fidelity. Validate that EmbeddingStore and Inbox binary/JSON roundtrips preserve data exactly. Test embedding store dimension/entry count, inbox capacity and item order, and eviction invariants to prevent silent data corruption.
Add regression guards for approximate nearest neighbor quality. Validate HNSW recall@10 >= 90%, IVF recall@10 >= 85% on synthetic data. Test embedding space metrics: anisotropy near zero for orthogonal vectors, discrimination gap > 0.05 for clustered data, mean centering reduces anisotropy.
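Recall@k, the quantity these guards threshold, is simple to state: the fraction of the exact top-k neighbors that the approximate index also returned. A sketch (the guard code in the PR may compute it differently):

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbor ids that also appear in the
/// approximate result's top k. Thresholded in regression tests
/// (e.g. HNSW recall@10 >= 0.90).
fn recall_at_k(exact: &[usize], approx: &[usize], k: usize) -> f32 {
    let truth: HashSet<_> = exact.iter().take(k).collect();
    let hits = approx.iter().take(k).filter(|id| truth.contains(id)).count();
    hits as f32 / k.min(exact.len()) as f32
}
```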
Add regression guards for vector quantization quality. Validate MSE roundtrip cosine similarity (2-bit >= 0.90, 4-bit >= 0.95). Test Prod inner product estimates bounded by error thresholds that scale with bit-width. Verify pack/unpack exact roundtrips across dimensions and bit-widths.
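For the pack/unpack roundtrip, a 2-bit packing might look like the sketch below. The bit ordering (low bits first within each byte) is an assumption; the PR's actual layout may differ:

```rust
/// Pack 2-bit codes (values 0..=3), four per byte, low bits first.
fn pack_2bit(codes: &[u8]) -> Vec<u8> {
    let mut out = vec![0u8; (codes.len() + 3) / 4];
    for (i, &c) in codes.iter().enumerate() {
        out[i / 4] |= (c & 0b11) << ((i % 4) * 2);
    }
    out
}

/// Unpack `n` 2-bit codes from packed bytes.
fn unpack_2bit(packed: &[u8], n: usize) -> Vec<u8> {
    (0..n).map(|i| (packed[i / 4] >> ((i % 4) * 2)) & 0b11).collect()
}
```

An exact roundtrip (`unpack_2bit(&pack_2bit(&codes), codes.len()) == codes`) is the invariant the regression guard asserts across dimensions and bit-widths.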
Integrate temporal scoring into memory search. Replace type-based sorting with temporal_blend that mixes cosine similarity with temporal score. Load temporal_weight from config quantization settings. Improves memory relevance by considering both semantic match and recency/frequency patterns.
Summary

- `llmem-server` with remember endpoint and state management
- `remember`, `learn`, `consolidate`, and `memorize` commands with elapsed-time reporting
- `remember` expands refs for edge traversal
- `output` and `consolidation` settings (quiet mode, max_memory_tokens, merge_threshold)

Test plan

- `llmem learn` with TUI output on a real repo
- `llmem remember` shows interleaved memory + code results
- `llmem consolidate` creates associative links between similar memories