See also: Architecture Overview · Embedding & Search
This page explains how Memsense ranks and selects memories today.
The goal is not “nearest chunk wins”. The goal is:
- relevant memory
- current-enough memory
- lower redundancy
- better final top-k
```mermaid
flowchart LR
A[user query] --> B[query embedding]
A --> C[FTS query]
B --> D[8-route candidate recall]
C --> D
D --> E[SQL RRF fusion]
E --> F{session chunks present?}
F -->|yes| G[session-first hybrid scoring]
F -->|no| H[fallback final score]
G --> I[MMR-style diversity selection]
H --> I
I --> J{fallback path?}
J -->|yes| K[neighbor expansion]
J -->|no| L[top-k session results]
K --> L
```
Memsense recalls candidates from eight parallel routes:
- vec_full: full QA chunk embedding
- vec_user: user-side embedding
- vec_asst: assistant-side embedding
- vec_next_user: next user turn embedding, backfilled onto the previous chunk
- lexical: PostgreSQL full-text search over task_tag + content
- facet_personal_info: personal-info facet embedding
- facet_preferences: preference facet embedding
- facet_events: event facet embedding
Candidate pool sizing:
- per-route candidates: max(max(top_k * 4, 32) * 2, 40)
- final candidate limit before MMR: max(top_k * 4, 32)
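The minimal Python sketch below applies the sizing rules exactly as written. The function name candidate_pool_sizes is hypothetical and does not come from src/server/service.js.

```python
def candidate_pool_sizes(top_k: int) -> tuple[int, int]:
    """Return (per_route_limit, final_limit) for a requested top_k."""
    # Final candidate limit before MMR: 4x top_k, but never fewer than 32.
    final_limit = max(top_k * 4, 32)
    # Each route over-fetches twice the final limit, with a floor of 40.
    per_route_limit = max(final_limit * 2, 40)
    return per_route_limit, final_limit


for k in (1, 5, 10, 20):
    print(k, candidate_pool_sizes(k))
```

For top_k = 10, this gives 80 candidates per route and 40 before MMR. The final limit never drops below 32, so the per-route count never drops below 64. With the current constants, the outer floor of 40 therefore never takes effect.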
This creates a broader candidate pool before reranking.
The SQL query fuses all route ranks with Reciprocal Rank Fusion (RRF), summing over every route that returned the chunk:
rrf_score = sum(1 / (15 + rank_in_route))
RRF uses route rank rather than raw similarity, so vector, lexical, and facet routes can be combined without fragile score normalization.
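The fusion step can be sketched in Python as follows. This illustrates the formula and is not the SQL that Memsense runs. rrf_fuse is a hypothetical helper, and the sketch makes two assumptions: ranks within each route are 1-based (as ROW_NUMBER() produces), and each chunk appears at most once per route.

```python
from collections import defaultdict

RRF_K = 15  # rank offset from the formula above


def rrf_fuse(route_rankings: dict[str, list[str]]) -> dict[str, float]:
    """Fuse per-route rankings into one RRF score per chunk id.

    route_rankings maps a route name to chunk ids ordered best-first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranked_ids in route_rankings.values():
        for rank, chunk_id in enumerate(ranked_ids, start=1):
            scores[chunk_id] += 1.0 / (RRF_K + rank)
    return dict(scores)


fused = rrf_fuse({
    "vec_full": ["c1", "c2", "c3"],
    "lexical": ["c2", "c4"],
    "facet_preferences": ["c2"],
})
for chunk_id, score in sorted(fused.items(), key=lambda kv: kv[1], reverse=True):
    print(chunk_id, round(score, 4))
```

In this example c2 scores highest (1/17 + 1/16 + 1/16 ≈ 0.184) because three routes agree on it, even though vec_full ranks c1 first.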
The current final score is:
final_score = rrf_score + 0.1 * memory_score
memory_score is the stored chunk quality score. Confidence and temporal decay are not part of the live scoring path.
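The sketch below computes the chunk-level score. It assumes memory_score lies in [0, 1], since the range is not stated here. fallback_final_score is a hypothetical name.

```python
MEMORY_SCORE_WEIGHT = 0.1  # weight from the formula above


def fallback_final_score(rrf_score: float, memory_score: float) -> float:
    """RRF score plus a small bonus for stored chunk quality.

    No confidence or temporal-decay term is applied, matching the live path.
    """
    return rrf_score + MEMORY_SCORE_WEIGHT * memory_score


# Two chunks with close RRF scores but different stored quality.
print(fallback_final_score(1 / 16, 0.2))
print(fallback_final_score(1 / 17, 0.6))
```

A chunk with RRF 1/17 and memory_score 0.6 (≈ 0.119) outranks one with RRF 1/16 and memory_score 0.2 (0.0825). memory_score can therefore reorder chunks whose RRF scores are close.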
For evaluation data ingested with --mode hybrid, session chunks remain the prompt-visible memory. Turn chunks do not directly enter the final top-k; they add bounded support to the matching session:
turn_support = min(0.12, 0.6 * best_turn_rrf_score_for_same_session)
hybrid_rrf_score = session_rrf_score + turn_support
If no session chunk is available, retrieval falls back to the normal chunk-level ranking path.
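The sketch below computes the bounded turn support. It assumes a session with no matching turn chunks gets zero support. hybrid_session_score is a hypothetical helper.

```python
TURN_SUPPORT_CAP = 0.12
TURN_SUPPORT_WEIGHT = 0.6


def hybrid_session_score(
    session_rrf_score: float, turn_rrf_scores: list[float]
) -> tuple[float, float]:
    """Return (hybrid_rrf_score, turn_support) for one session chunk.

    turn_rrf_scores holds RRF scores of turn chunks from the same session.
    Only the best one counts, and its contribution is capped.
    """
    best_turn = max(turn_rrf_scores, default=0.0)  # assumption: no turns -> 0
    turn_support = min(TURN_SUPPORT_CAP, TURN_SUPPORT_WEIGHT * best_turn)
    return session_rrf_score + turn_support, turn_support


print(hybrid_session_score(0.05, [0.03, 0.06]))  # support = 0.6 * 0.06
print(hybrid_session_score(0.05, [0.3]))         # support capped at 0.12
```

The cap limits a session's gain from its turns to 0.12, however strongly a single turn matches. A best-turn RRF of 0.3, for example, would give 0.18 before capping.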
After base ranking, Memsense applies diversity-aware final selection with an MMR-style (Maximal Marginal Relevance) procedure.
Redundancy between two candidates is estimated from:
- embedding cosine similarity
- tag overlap (Jaccard similarity)
Current combined redundancy score:
redundancy = max(embedding_similarity, 0.35 * tag_jaccard)
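The Python sketch below estimates redundancy. It assumes embeddings are plain float lists and tags are sets of strings. The helper names are hypothetical, and returning 0.0 for zero vectors or empty tag sets is also an assumption.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def tag_jaccard(tags_a: set[str], tags_b: set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; 0.0 when both sets are empty."""
    union = tags_a | tags_b
    if not union:
        return 0.0
    return len(tags_a & tags_b) / len(union)


TAG_WEIGHT = 0.35  # from the redundancy formula above


def redundancy(emb_a, emb_b, tags_a, tags_b) -> float:
    """The larger of embedding similarity and down-weighted tag overlap."""
    return max(cosine_similarity(emb_a, emb_b), TAG_WEIGHT * tag_jaccard(tags_a, tags_b))


# Orthogonal embeddings, identical tags: tag overlap alone sets the redundancy.
print(redundancy([1.0, 0.0], [0.0, 1.0], {"travel", "food"}, {"travel", "food"}))
```

The tag term is scaled by 0.35, so tag overlap decides redundancy only when embedding similarity is below 0.35. For semantically close chunks, the cosine term dominates.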
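At each selection step, every remaining candidate is scored with:
mmr_score = lambda * final_score - (1 - lambda) * max_redundancy
Here max_redundancy is the candidate's highest redundancy against the results already selected. The highest-scoring candidate is selected next.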
Current defaults:
- lambda = 0.78
- duplicate_threshold = 0.94
This helps prevent the final results from collapsing into many near-identical chunks.
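The greedy selection loop can be sketched as follows, using the defaults above. mmr_select is a hypothetical name, and redundancy is passed in as a function (for example, the max of cosine similarity and 0.35 * tag Jaccard). Two behaviors are assumptions because this page does not spell them out:
- A candidate whose redundancy with an already-selected result reaches duplicate_threshold is dropped outright.
- final_score is used without normalization.

```python
import math
from typing import Callable, Sequence

LAMBDA = 0.78
DUPLICATE_THRESHOLD = 0.94


def mmr_select(
    candidates: Sequence[str],
    final_score: dict[str, float],
    redundancy: Callable[[str, str], float],
    top_k: int,
    lam: float = LAMBDA,
    duplicate_threshold: float = DUPLICATE_THRESHOLD,
) -> list[str]:
    """Greedy MMR-style selection over candidate ids."""
    remaining = list(candidates)
    selected: list[str] = []
    while remaining and len(selected) < top_k:
        best_id, best_mmr = None, -math.inf
        for cand in list(remaining):  # iterate over a copy; we may remove items
            max_red = max((redundancy(cand, s) for s in selected), default=0.0)
            if max_red >= duplicate_threshold:
                # Assumed semantics: near-duplicates of a selected result are dropped.
                remaining.remove(cand)
                continue
            mmr = lam * final_score[cand] - (1 - lam) * max_red
            if mmr > best_mmr:
                best_id, best_mmr = cand, mmr
        if best_id is None:
            break
        selected.append(best_id)
        remaining.remove(best_id)
    return selected


pair_redundancy = {
    frozenset({"a", "b"}): 0.90,
    frozenset({"a", "c"}): 0.10,
    frozenset({"b", "c"}): 0.20,
}
scores = {"a": 0.9, "b": 0.8, "c": 0.7}
picked = mmr_select(
    ["a", "b", "c"], scores, lambda x, y: pair_redundancy[frozenset({x, y})], top_k=2
)
print(picked)
```

In this example b has a higher final score than c. However, its 0.9 redundancy with a costs it more than the score gap, so c is selected second.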
A naive memory system often fails in three ways:
- too literal — misses semantically related memory
- too repetitive — returns many near-duplicate chunks
- too stale — returns memory that is similar but no longer timely
The current Memsense pipeline responds with:
- multi-route recall for better coverage
- memory score for reusable high-value chunks
- diversity selection for lower redundancy
Staleness is not yet addressed directly, because temporal decay is not in the live scoring path.
In today’s code, the retrieval logic is split roughly into:
- candidate recall in src/server/service.js
- rerank and diversity logic in src/server/retrieval/rerank.js
- utility scoring logic in src/core/scoring.js
This separation makes it easier to evolve retrieval quality without rewriting the entire write path.
Current search results expose ranking detail through fields such as:
- rrf_score
- routes
- final_score
- explain
For session-first hybrid results, explain also includes session_rrf_score, turn_support, supporting_turn_count, and best_turn_routes.
This makes retrieval behavior inspectable and easier to debug.
Memsense retrieval is built around one idea:
relevance alone is not enough.
A useful memory system should return results that are:
- relevant
- timely
- confident
- low-redundancy
- shaped by the type of memory being retrieved
- Read Architecture Overview for the full system flow.
- Read Embedding & Search for a compact implementation summary.