Feature: Configurable Embedding Infrastructure — Local (fastembed) + API (OpenAI) with Config Flag #675

@teknium1

Description

Overview

Add configurable embedding infrastructure to Hermes Agent, supporting both local models (fastembed) and API-based embedders (OpenAI). This is a shared capability needed by multiple features: cognitive memory recall (#509), semantic codebase search (#489), and future similarity-based operations.

Parent tracking issue: #509
Also enables: #489 (Semantic Codebase Search)


What to Build

Embedding Module

New file: agent/embeddings.py

from typing import Protocol

class Embedder(Protocol):
    def embed_text(self, text: str) -> list[float]: ...
    def embed_texts(self, texts: list[str]) -> list[list[float]]: ...
    @property
    def dimensions(self) -> int: ...

class FastEmbedEmbedder:
    """Local embeddings via fastembed (all-MiniLM-L6-v2, 384 dims).
    ~100MB model, downloaded on first use.
    No API key needed, private, fast (~5ms per embed).
    """

class OpenAIEmbedder:
    """API embeddings via OpenAI (text-embedding-3-small, 1536 dims).
    Uses existing OpenAI client from config.
    Higher quality but costs $0.02/1M tokens.
    """

def get_embedder(config: dict) -> Embedder:
    """Factory: returns configured embedder based on config.yaml."""
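Filled in, the skeleton above might look like the following sketch. The class and config names follow the spec; the lazy-load details and the use of fastembed's `TextEmbedding` API are assumptions, and the OpenAI branch is elided as a placeholder:

```python
from typing import Protocol


class Embedder(Protocol):
    def embed_text(self, text: str) -> list[float]: ...
    def embed_texts(self, texts: list[str]) -> list[list[float]]: ...
    @property
    def dimensions(self) -> int: ...


class FastEmbedEmbedder:
    """Local embeddings via fastembed; the model is loaded on first embed call."""

    def __init__(self, model: str = "all-MiniLM-L6-v2") -> None:
        self._model_name = model
        self._model = None  # lazy: nothing is downloaded or loaded at startup

    def _ensure_model(self):
        if self._model is None:
            try:
                from fastembed import TextEmbedding  # optional dependency
            except ImportError as exc:
                raise RuntimeError(
                    "embeddings.provider is 'local' but fastembed is not "
                    "installed; run: pip install hermes-agent[embeddings]"
                ) from exc
            # Model identifier format is an assumption about fastembed's registry.
            self._model = TextEmbedding(
                model_name=f"sentence-transformers/{self._model_name}"
            )
        return self._model

    @property
    def dimensions(self) -> int:
        return 384  # all-MiniLM-L6-v2 output size

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return [list(map(float, vec)) for vec in self._ensure_model().embed(texts)]

    def embed_text(self, text: str) -> list[float]:
        return self.embed_texts([text])[0]


def get_embedder(config: dict) -> Embedder:
    """Factory: pick an embedder from the `embeddings` section of config.yaml."""
    cfg = config.get("embeddings", {})
    provider = cfg.get("provider", "local")
    if provider == "local":
        return FastEmbedEmbedder(cfg.get("model", "all-MiniLM-L6-v2"))
    if provider == "openai":
        # OpenAIEmbedder would be built here from the shared OpenAI client;
        # omitted from this sketch.
        raise NotImplementedError("OpenAIEmbedder is omitted from this sketch")
    raise ValueError(f"unknown embeddings provider: {provider!r}")
```

Because loading is deferred, `get_embedder()` stays cheap at startup even when fastembed is not installed; the helpful `RuntimeError` only fires on first use.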

Utility Functions

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Compute cosine similarity between two vectors."""

def cosine_similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """NxN pairwise similarity matrix for dedup."""
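The two utilities have a straightforward pure-Python implementation (a NumPy version may be preferable at scale; the zero-vector convention below is an assumption):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either vector is all-zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_similarity_matrix(vectors: list[list[float]]) -> list[list[float]]:
    """NxN symmetric matrix of pairwise similarities, e.g. for dedup."""
    n = len(vectors)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        out[i][i] = 1.0 if any(vectors[i]) else 0.0
        for j in range(i + 1, n):
            s = cosine_similarity(vectors[i], vectors[j])
            out[i][j] = out[j][i] = s  # matrix is symmetric
    return out
```

The matrix helper computes each pair once and mirrors it, so dedup over N embeddings costs N(N-1)/2 similarity calls.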

Configuration

# ~/.hermes/config.yaml
embeddings:
  provider: "local"          # "local" or "openai"
  model: "all-MiniLM-L6-v2"  # for local
  # model: "text-embedding-3-small"  # for openai

Key Design Decisions

  • Lazy initialization — Model loaded on first embed call, not at startup
  • Batch support — embed_texts() for efficiency (single API call / single model forward pass)
  • Cosine similarity helper — Utility function for comparing embeddings
  • Dimension-aware — Embedder reports its dimension count so storage can auto-configure
  • Optional dependency — fastembed only required when provider: local; graceful error otherwise

Dependencies

  • fastembed (optional) — lightweight, Apache 2.0, ~5MB package + ~100MB model on first use
  • openai (already a dependency) — for API embeddings
  • Add fastembed as optional: pip install hermes-agent[embeddings]
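The optional-dependency wiring in pyproject.toml could look roughly like this (the version pin is an assumption):

```toml
[project.optional-dependencies]
embeddings = ["fastembed>=0.3"]
```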

Files to Create/Change

  • agent/embeddings.py (new) — Embedder protocol, FastEmbed + OpenAI implementations, factory
  • pyproject.toml — Add fastembed as optional dependency
  • tests/test_embeddings.py (new) — Unit tests with mocked embedders

Acceptance Criteria

  • get_embedder() returns correct embedder based on config
  • Local embedder works without API key (fastembed)
  • OpenAI embedder works with existing OpenAI config
  • Batch embedding (embed_texts) works for both providers
  • Graceful error if fastembed not installed but local provider configured
  • Cosine similarity utility function included and tested
  • No startup-time impact (lazy loading)
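The mocked-embedder tests in tests/test_embeddings.py could start from a small fake that satisfies the Embedder protocol structurally, so similarity and batch logic can be exercised without downloading a model or calling an API. All names here are illustrative:

```python
class FakeEmbedder:
    """Deterministic 3-dim embedder for tests; satisfies Embedder by shape."""

    dimensions = 3

    def embed_text(self, text: str) -> list[float]:
        # Toy deterministic features: length, vowel count, space count.
        return [
            float(len(text)),
            float(sum(c in "aeiou" for c in text.lower())),
            float(text.count(" ")),
        ]

    def embed_texts(self, texts: list[str]) -> list[list[float]]:
        return [self.embed_text(t) for t in texts]


def test_batch_matches_single():
    emb = FakeEmbedder()
    texts = ["hello world", "hermes agent"]
    assert emb.embed_texts(texts) == [emb.embed_text(t) for t in texts]
    assert all(len(v) == emb.dimensions for v in emb.embed_texts(texts))
```

Because Embedder is a Protocol, the fake needs no inheritance; anything with the right methods and a `dimensions` attribute plugs in.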

Metadata

Labels: enhancement (New feature or request)