Overview
Add configurable embedding infrastructure to Hermes Agent, supporting both local models (fastembed) and API-based embedders (OpenAI). This is a shared capability needed by multiple features: cognitive memory recall (#509), semantic codebase search (#489), and future similarity-based operations.
Parent tracking issue: #509
Also enables: #489 (Semantic Codebase Search)
What to Build
Embedding Module
New file: agent/embeddings.py
```python
from typing import Protocol

class Embedder(Protocol):
    def embed_text(self, text: str) -> list[float]: ...
    def embed_texts(self, texts: list[str]) -> list[list[float]]: ...

    @property
    def dimensions(self) -> int: ...
```
```python
class FastEmbedEmbedder:
    """Local embeddings via fastembed (all-MiniLM-L6-v2, 384 dims).

    ~100MB model, downloaded on first use.
    No API key needed, private, fast (~5ms per embed).
    """

class OpenAIEmbedder:
    """API embeddings via OpenAI (text-embedding-3-small, 1536 dims).

    Uses existing OpenAI client from config.
    Higher quality but costs $0.02/1M tokens.
    """

def get_embedder(config: dict) -> Embedder:
    """Factory: returns configured embedder based on config.yaml."""
```
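One way the two implementations and the factory could fit together is sketched below. The `fastembed.TextEmbedding` and OpenAI `embeddings.create` calls are those libraries' public APIs, but the default model identifiers, the exact fastembed model name (fastembed may expect a `sentence-transformers/` prefix), and the client wiring are assumptions, not the final implementation:

```python
class FastEmbedEmbedder:
    """Local embeddings via fastembed (all-MiniLM-L6-v2, 384 dims)."""

    def __init__(self, model: str = "all-MiniLM-L6-v2") -> None:
        self._model_name = model
        self._model = None  # lazy: loaded on first embed call, not at startup

    @property
    def dimensions(self) -> int:
        return 384

    def embed_text(self, text: str) -> list[float]:
        return self.embed_texts([text])[0]

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        if self._model is None:
            from fastembed import TextEmbedding  # optional dependency
            self._model = TextEmbedding(model_name=self._model_name)
        # fastembed yields numpy vectors; normalize to plain lists of floats
        return [list(map(float, vec)) for vec in self._model.embed(texts)]


class OpenAIEmbedder:
    """API embeddings via OpenAI (text-embedding-3-small, 1536 dims)."""

    def __init__(self, model: str = "text-embedding-3-small") -> None:
        self._model = model
        self._client = None  # lazy: client created on first embed call

    @property
    def dimensions(self) -> int:
        return 1536

    def embed_text(self, text: str) -> list[float]:
        return self.embed_texts([text])[0]

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        if self._client is None:
            from openai import OpenAI
            self._client = OpenAI()  # real code would reuse the configured client
        resp = self._client.embeddings.create(model=self._model, input=texts)
        return [item.embedding for item in resp.data]


def get_embedder(config: dict) -> "Embedder":  # Embedder is the protocol above
    """Factory: returns configured embedder based on config.yaml."""
    cfg = config.get("embeddings", {})
    provider = cfg.get("provider", "local")
    if provider == "local":
        return FastEmbedEmbedder(cfg.get("model", "all-MiniLM-L6-v2"))
    if provider == "openai":
        return OpenAIEmbedder(cfg.get("model", "text-embedding-3-small"))
    raise ValueError(f"Unknown embeddings provider: {provider!r}")
```

Because both classes defer heavy imports and model/client construction to the first embed call, `get_embedder()` stays cheap at startup and the factory can be called unconditionally.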
Utility Functions
```python
def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compute cosine similarity between two vectors."""

def cosine_similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """NxN pairwise similarity matrix for dedup."""
```
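A dependency-free sketch of these helpers (pure Python; the final version might use numpy for speed, and the zero-vector convention here is an assumption):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compute cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention: similarity with a zero vector is 0
    return dot / (norm_a * norm_b)

def cosine_similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """NxN pairwise similarity matrix for dedup."""
    return [[cosine_similarity(a, b) for b in vectors] for a in vectors]
```

The matrix is symmetric with 1.0 on the diagonal, so dedup code only needs to scan the upper triangle.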
Configuration
```yaml
# ~/.hermes/config.yaml
embeddings:
  provider: "local"                  # "local" or "openai"
  model: "all-MiniLM-L6-v2"          # for local
  # model: "text-embedding-3-small"  # for openai
```
Key Design Decisions
- Lazy initialization — Model loaded on first embed call, not at startup
- Batch support — embed_texts() for efficiency (single API call / single model forward pass)
- Cosine similarity helper — Utility function for comparing embeddings
- Dimension-aware — Embedder reports its dimension count so storage can auto-configure
- Optional dependency — fastembed only required when provider: local; graceful error otherwise
Dependencies
- fastembed (optional) — lightweight, Apache 2.0, ~5MB package + ~100MB model on first use
- openai (already a dependency) — for API embeddings
- Add fastembed as an optional extra: pip install hermes-agent[embeddings]
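The extra behind `pip install hermes-agent[embeddings]` could be declared in pyproject.toml along these lines (a sketch using the standard PEP 621 table; the version bound is an assumption):

```toml
[project.optional-dependencies]
embeddings = ["fastembed"]
```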
Files to Create/Change
- agent/embeddings.py (new) — Embedder protocol, FastEmbed + OpenAI implementations, factory
- pyproject.toml — Add fastembed as optional dependency
- tests/test_embeddings.py (new) — Unit tests with mocked embedders
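The mocked-embedder tests might be built around a deterministic fake that satisfies the Embedder protocol without loading a model or calling an API (a sketch; `FakeEmbedder` and the test name are hypothetical):

```python
class FakeEmbedder:
    """Deterministic stand-in satisfying the Embedder protocol for tests."""

    @property
    def dimensions(self) -> int:
        return 3

    def embed_text(self, text: str) -> list[float]:
        # Trivial deterministic "embedding" derived from length, for assertions only.
        n = float(len(text))
        return [n, n + 1.0, n + 2.0]

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_text(t) for t in texts]


def test_batch_matches_single():
    emb = FakeEmbedder()
    # Batch output must agree with per-text output and report the right width.
    assert emb.embed_texts(["a", "bb"]) == [emb.embed_text("a"), emb.embed_text("bb")]
    assert all(len(v) == emb.dimensions for v in emb.embed_texts(["a", "bb"]))
```

Because the protocol is structural, the fake needs no inheritance; any object with these three members can be injected wherever an Embedder is expected.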
Acceptance Criteria
- get_embedder() returns the correct embedder based on config
- Batch embedding (embed_texts) works for both providers