
feat: fastembed migration, daemon mode, TUI output, and associative linking #4

Merged
urmzd merged 25 commits into main from feat/fastembed-daemon-tui
Apr 1, 2026
Conversation


urmzd (Owner) commented Mar 30, 2026

Summary

  • Embedding backend: Replace Ollama with local fastembed for zero-dependency embeddings
  • Daemon mode: Add llmem-server with remember endpoint and state management
  • TUI output: Colored stderr output for remember, learn, consolidate, and memorize commands with elapsed-time reporting
  • Associative linking: Consolidation phase connects similar memories via bidirectional refs; remember expands refs for edge traversal
  • Code indexing: Plain-text chunking fallback for unsupported file types, layered HNSW with interleaved memory/code hits
  • Embedding eval: Anisotropy and similarity range metrics for embedding quality assessment
  • Config: New output and consolidation settings (quiet mode, max_memory_tokens, merge_threshold)

Test plan

  • All 124 tests pass (core, index, quant, server, CLI integration)
  • Manual test: llmem learn with TUI output on a real repo
  • Manual test: llmem remember shows interleaved memory + code results
  • Manual test: llmem consolidate creates associative links between similar memories

urmzd added 25 commits March 30, 2026 02:38
Add fastembed crate for local ONNX-based embeddings (all-MiniLM-L6-v2),
replacing dependency on external Ollama server. Update HTTP stack with
hyper and tower for better socket and middleware support in daemon mode.
Add chrono for better timestamp handling in server responses.
Rename section 16 from "Context Switching" to "Project Resolution" to reflect
architectural change. Remove references to .active-ctx file and context
switch command.

Update server documentation: --root CLI flag at startup, /reload endpoint
accepts optional ?root=<path> to switch project context dynamically.
Update /health endpoint description.
Replace OllamaEmbedder with FastEmbedder for local ONNX-based embeddings.
Fastembed downloads models to ~/.cache/fastembed/ on first use and provides
thread-safe embeddings without external server overhead.

Simplify EmbeddingConfig by removing host and model_path fields. Default model
is all-MiniLM-L6-v2 (384 dims, 22MB). Supports model selection via config:
all-MiniLM-L6-v2, all-MiniLM-L6-v2-q, all-MiniLM-L12-v2, BGE-small-en-v1.5,
BGE-small-en-v1.5-q.
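
Model selection via config, as described above, might be written like this. This is a hypothetical TOML sketch: the `[embedding]` section and `model` key names are assumptions, and only the model identifiers come from the commit message.

```toml
[embedding]
# Default: all-MiniLM-L6-v2 (384 dims, ~22MB, downloaded to ~/.cache/fastembed/)
model = "all-MiniLM-L6-v2"
# Other identifiers from the commit: all-MiniLM-L6-v2-q, all-MiniLM-L12-v2,
# BGE-small-en-v1.5, BGE-small-en-v1.5-q
```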

Add refs field to Frontmatter for inter-layer edges (code chunk IDs or memory
filenames). Add expand_refs and max_ref_expansions to RecallConfig to support
ref-guided memory traversal.
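
A memory file carrying the new refs field might look like the following. This is an illustrative YAML frontmatter sketch: only the `refs` field name comes from the commit; the surrounding keys and the exact format of code chunk IDs are assumptions.

```yaml
---
type: note
refs:
  - src/index/eval.rs#chunk-3        # a code chunk ID (format assumed)
  - 2026-03-28-embedding-notes.md    # another memory file
---
```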
Add evaluation module with three key metrics for assessing embedding
distribution quality:

- anisotropy: average pairwise cosine similarity (lower is better, < 0.3 target)
- similarity_range: max - min off-diagonal similarity (higher is better, > 0.3 target)
- discrimination_gap: intra-group vs inter-group similarity delta (higher is better, > 0.05 target)

Includes helper for mean-centering and re-normalizing embeddings to reduce
anisotropy.
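
The two pairwise metrics can be sketched in a few lines of plain Rust. Function names mirror the commit message, but the exact signatures in the eval module are assumptions.

```rust
// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}

/// Average pairwise cosine similarity; lower is better (< 0.3 target).
fn anisotropy(embs: &[Vec<f32>]) -> f32 {
    let mut sum = 0.0;
    let mut pairs = 0;
    for i in 0..embs.len() {
        for j in (i + 1)..embs.len() {
            sum += cosine(&embs[i], &embs[j]);
            pairs += 1;
        }
    }
    sum / pairs as f32
}

/// Max minus min off-diagonal similarity; higher is better (> 0.3 target).
fn similarity_range(embs: &[Vec<f32>]) -> f32 {
    let mut lo = f32::INFINITY;
    let mut hi = f32::NEG_INFINITY;
    for i in 0..embs.len() {
        for j in (i + 1)..embs.len() {
            let s = cosine(&embs[i], &embs[j]);
            lo = lo.min(s);
            hi = hi.max(s);
        }
    }
    hi - lo
}
```

For a perfectly orthogonal set the anisotropy is 0; collapsed embeddings (everything similar to everything) push it toward 1, which is what the metric is meant to flag.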
Add fallback chunking strategy for file types without tree-sitter grammar
support. Files now extract chunks gracefully using line-based chunking
(MAX_CHUNK_LINES per chunk) instead of skipping silently.
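
The fallback strategy is straightforward to sketch. MAX_CHUNK_LINES is the knob named in the commit; the value 64 used here is an assumption, as is the function name.

```rust
// Assumed value; the commit only names the constant, not its size.
const MAX_CHUNK_LINES: usize = 64;

/// Line-based fallback chunking for files with no tree-sitter grammar:
/// split the source into fixed-size runs of lines instead of skipping it.
fn chunk_plain_text(src: &str) -> Vec<String> {
    src.lines()
        .collect::<Vec<_>>()
        .chunks(MAX_CHUNK_LINES)
        .map(|lines| lines.join("\n"))
        .collect()
}
```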
…ment

Transform llmem-server from simple HTTP wrapper into full daemon with warm
indices and hot-reload support. Add CLI argument parsing (clap) for --root
and --addr options.

Add remember endpoint (/remember?q=<query>&level=<level>&budget=<budget>)
for semantic search with Hebbian reinforcement (access count tracking).
Endpoint integrates embedding stores with HNSW indices for efficient recall.

Add reload endpoint (/reload?root=<path>) for hot-swapping project indices
without restarting daemon.

Extend AppState to hold both project and global embedding stores alongside
indices for full semantic search capability.
Remove ctx switch/show commands (context switching moves to server --root
and /reload endpoint).

Add daemon integration: CLI now calls daemon_notify_reload() after memorize,
note, consolidate operations to hot-reload indices. Remember command attempts
daemon first (warm indices) before falling back to local file-based search.

Add support for inter-layer refs in memorize command to track code chunk
edges. Learn command now embeds chunks and builds HNSW indices with evaluation
metrics. Consolidate command builds memory HNSW and preserves file_source as
ref edges.

Project root now resolved via --root CLI flag (default: current directory)
instead of .active-ctx file. Update semantic search to use both .memory-index
and .code-index HNSW files.
Update integration tests for removal of the ctx command and config changes.
Regenerate snapshots to reflect the new default embedding model
(all-MiniLM-L6-v2 instead of nomic-embed-text) and the simplified EmbeddingConfig.
Update CLI README to document new command structure (memorize, note, remember, learn, consolidate, reflect, forget) replacing previous add/learn/recall/list/search API. Include new file locations for .code-index.hnsw and .memory-index.hnsw indices.
Update index module README to document eval module for embedding quality metrics (anisotropy, similarity_range, discrimination_gap, mean_center). Note plain-text fallback chunking for file types without supported tree-sitter grammars (shell scripts, markdown, TOML, etc.).
Update main README to document shift from Ollama embeddings to local fastembed (384-dim, ONNX-based, ~22MB model downloaded to ~/.cache/fastembed/). Explain three-layer HNSW architecture (code, memory, global) with inter-layer edges via refs frontmatter field. Document cross-layer recall, embedding quality metrics in learn output, and remove legacy context switching feature.
Add OutputConfig struct to support quiet mode for suppressing elapsed-time reporting. Add max_memory_tokens field to ConsolidationConfig to control memory body truncation. These options prepare the configuration layer for UI/UX improvements and memory size management.
Add comprehensive CLI reference documenting all commands, flags, and workflows. Add configuration reference listing all config options with descriptions and defaults. Update SKILL.md to document two-layer semantic search, edge expansion, associative linking, consolidation phases, and working memory mechanics. Include rules for saving memories and common gotchas.
- Add colored TUI output for remember, learn, and consolidate commands
- Add elapsed-time reporting via CommandTimer
- Add quiet flag (-q) to suppress timing output
- Interleave memory and code hits in semantic search
- Add associative linking phase during consolidation
- Expand refs during remember for edge traversal
- Truncate memory bodies to max_memory_tokens
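
A minimal version of the elapsed-time reporting could look like this. The real CommandTimer's API is not shown in the PR, only its purpose (timing on stderr, suppressed by the quiet flag), so everything below is an illustrative sketch.

```rust
use std::time::Instant;

// Hypothetical minimal CommandTimer: start a clock, report to stderr
// on finish unless quiet mode (-q) is set.
struct CommandTimer {
    start: Instant,
    quiet: bool,
}

impl CommandTimer {
    fn start(quiet: bool) -> Self {
        Self { start: Instant::now(), quiet }
    }

    /// Print elapsed time for `command` to stderr, respecting quiet mode.
    fn finish(&self, command: &str) {
        if !self.quiet {
            eprintln!("{} finished in {:.2?}", command, self.start.elapsed());
        }
    }
}
```

Writing to stderr rather than stdout keeps timing chatter out of any piped output, which matters for commands like remember whose stdout may feed another tool.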
Allow users to explicitly enable TUI mode by setting the LLMEM_TUI
environment variable, in addition to the automatic detection via
stderr being a terminal. This enables TUI mode in non-interactive
environments and simplifies testing scenarios.
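
The detection rule described above reduces to a small predicate. The precedence shown here (env var OR terminal check) follows the commit's wording, but the exact implementation is an assumption.

```rust
use std::env;
use std::io::{self, IsTerminal};

/// TUI is enabled when the LLMEM_TUI env var is set, or when stderr
/// is attached to a terminal.
fn tui_enabled(env_flag: Option<&str>, stderr_is_tty: bool) -> bool {
    env_flag.is_some() || stderr_is_tty
}

/// How a call site might gather the two inputs (sketch).
fn detect() -> bool {
    tui_enabled(env::var("LLMEM_TUI").ok().as_deref(), io::stderr().is_terminal())
}
```

Taking the environment and terminal state as parameters keeps the predicate deterministic and easy to test, which matches the commit's stated goal of simplifying testing scenarios.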
Add criterion to dev-dependencies for comprehensive benchmarking
support. Add 'just bench' and 'just validate' targets to justfile
for convenient testing and validation workflows.
…ction

Add benchmark tables for embedding store, inbox, memory index, and
eval functions with TBD placeholders. Update README and benchmarks.md
to reference 'just bench' for reproduction instead of raw 'cargo
bench'.
Expand sr.yaml with detailed inline comments explaining each section.
Enable hooks configuration and document the full feature set. Update
version management settings and consolidate release pipeline config.
Remove redundant test hook from pre-commit steps.
Implement temporal scoring that blends cosine similarity with memory
recency and access patterns. Ported from training/src/models/temporal.py.
Provides temporal_score function (decay by recency, boost by frequency,
weight by type durability) and blend function for hybrid ranking.
Export from lib.rs public API.
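
The shape of the scoring described above can be sketched as follows. The decay constants, the frequency coefficient, and the blend formula here are assumptions for illustration, not the values ported from training/src/models/temporal.py.

```rust
/// Temporal score: exponential decay by recency, log-scaled boost by
/// access frequency, weighted by a per-type durability factor.
fn temporal_score(age_days: f32, access_count: u32, durability: f32) -> f32 {
    let recency = (-age_days / 30.0).exp();           // assumed ~30-day decay scale
    let frequency = (1.0 + access_count as f32).ln(); // diminishing returns on accesses
    durability * recency * (1.0 + 0.1 * frequency)
}

/// Hybrid ranking: mix cosine similarity with the temporal score.
/// `temporal_weight` in [0, 1] comes from config per the later commit.
fn temporal_blend(cosine: f32, temporal: f32, temporal_weight: f32) -> f32 {
    (1.0 - temporal_weight) * cosine + temporal_weight * temporal
}
```

With temporal_weight = 0 this degenerates to pure cosine ranking, so the blend is a strict generalization of the previous behavior.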
Add comprehensive benchmarks for core storage components using criterion.
Measure EmbeddingStore (upsert, get, remove, save, load), Inbox (push
to capacity, eviction, drain, save, load), MemoryIndex (parse, upsert,
search), and content_hash performance across varying dimensions and
item counts.
Add benchmarks for distance function performance (cosine similarity,
dot product, L2 distance, normalization) and index operations.
Parameterize by vector dimension to track performance across embeddings
of different sizes.
Add regression guards for storage format fidelity. Validate that
EmbeddingStore and Inbox binary/JSON roundtrips preserve data exactly.
Test embedding store dimension/entry count, inbox capacity and item
order, and eviction invariants to prevent silent data corruption.
Add regression guards for approximate nearest neighbor quality.
Validate HNSW recall@10 >= 90%, IVF recall@10 >= 85% on synthetic
data. Test embedding space metrics: anisotropy near zero for orthogonal
vectors, discrimination gap > 0.05 for clustered data, mean centering
reduces anisotropy.
Add regression guards for vector quantization quality. Validate MSE
roundtrip cosine similarity (2-bit >= 0.90, 4-bit >= 0.95). Test Prod
inner product estimates bounded by error thresholds that scale with bit-width.
Verify pack/unpack exact roundtrips across dimensions and bit-widths.
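
The kind of roundtrip guard described above can be illustrated with a simple uniform per-vector quantizer. This is not the crate's actual Prod quantizer; the scale scheme and function names are assumptions, and it is only meant to show what a cosine-fidelity check looks like.

```rust
/// Uniform 4-bit quantization: map each value to one of 16 levels
/// spanning [min, max], returning codes plus the (min, scale) pair
/// needed to reconstruct.
fn quantize_4bit(v: &[f32]) -> (Vec<u8>, f32, f32) {
    let min = v.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = v.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let scale = (max - min) / 15.0;
    let codes = v.iter()
        .map(|&x| (((x - min) / scale).round() as u8).min(15))
        .collect();
    (codes, min, scale)
}

fn dequantize_4bit(codes: &[u8], min: f32, scale: f32) -> Vec<f32> {
    codes.iter().map(|&c| min + c as f32 * scale).collect()
}

/// Cosine similarity, used here to check reconstruction fidelity
/// against the >= 0.95 threshold cited for 4-bit.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (na * nb)
}
```

A regression guard then quantizes a known vector, dequantizes it, and asserts the roundtrip cosine similarity stays above the bit-width's threshold.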
Integrate temporal scoring into memory search. Replace type-based sorting
with temporal_blend that mixes cosine similarity with temporal score.
Load temporal_weight from config quantization settings. Improves memory
relevance by considering both semantic match and recency/frequency patterns.
urmzd merged commit 3ca2c39 into main on Apr 1, 2026
4 checks passed
urmzd deleted the feat/fastembed-daemon-tui branch on April 6, 2026 at 00:19