15 changes: 15 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,15 @@
# Changelog

## [Unreleased]

### Added
- Jina AI embeddings v5 text backends (`jina-v5-nano`, `jina-v5-small`) via local ONNX inference
- Matryoshka representation learning: configurable `truncate_dim` for jina-v5 backends
- Asymmetric retrieval: `retrieval.query:` / `retrieval.passage:` instruction prefixes
- Auto re-embed on embedder dimension change (`--no-auto-reembed` to opt out)
- `embed_query` / `embed_document` distinction on the `Embedder` trait
- `icm config show` now displays active backend name and license tag
- `icm recall` now shows active model name in output header
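
The `embed_query` / `embed_document` split and the instruction prefixes listed above can be illustrated with a small sketch. This is not the trait from this PR, whose definition is not shown in the diff: the method `embed_raw`, the toy `LenEmbedder`, and the trailing space after each prefix are all assumptions made for illustration.

```rust
/// Instruction prefixes named in the changelog entry. The trailing space
/// used as a separator is an assumption of this sketch.
const QUERY_PREFIX: &str = "retrieval.query: ";
const PASSAGE_PREFIX: &str = "retrieval.passage: ";

/// Hypothetical trait shape; the real `Embedder` trait is not shown in this diff.
trait Embedder {
    /// Embed already-prepared text (model inference would happen here).
    fn embed_raw(&self, text: &str) -> Vec<f32>;

    /// Embed a search query: prepend the query instruction, then embed.
    fn embed_query(&self, text: &str) -> Vec<f32> {
        self.embed_raw(&format!("{}{}", QUERY_PREFIX, text))
    }

    /// Embed a stored passage: prepend the passage instruction, then embed.
    fn embed_document(&self, text: &str) -> Vec<f32> {
        self.embed_raw(&format!("{}{}", PASSAGE_PREFIX, text))
    }
}

/// Toy stand-in for a model: the "embedding" is just the input length in
/// bytes, which is enough to show that the two paths see different inputs.
struct LenEmbedder;

impl Embedder for LenEmbedder {
    fn embed_raw(&self, text: &str) -> Vec<f32> {
        vec![text.len() as f32]
    }
}
```

With this shape, a symmetric backend could override both methods to call `embed_raw` directly, while an asymmetric one inherits the prefixed defaults.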

### License note
Jina v5 model weights are CC BY-NC 4.0 (non-commercial). Commercial use requires a license from Jina AI.
68 changes: 67 additions & 1 deletion Cargo.lock

12 changes: 12 additions & 0 deletions Cargo.toml
@@ -24,6 +24,18 @@ zerocopy = { version = "0.8", features = ["derive"] }
# Embeddings (optional)
fastembed = "4"

# Jina v5 embedder. These are workspace version pins only; each
# consumer crate must mark them `optional = true` and gate them behind
# the `jina-v5` feature. Default builds must NOT pull these crates.
#
# By default, `ort` enables `download-binaries`, which fetches the ONNX
# Runtime at build time. We disable default features and load the system
# runtime via `load-dynamic`; `ndarray` is required for `try_extract_tensor`.
hf-hub = "0.4"
ort = { version = "2.0.0-rc.9", default-features = false, features = ["load-dynamic", "ndarray"] }
tokenizers = "0.21"
ndarray = "0.16"

# Serialization
serde = { version = "1", features = ["derive"] }
serde_json = { version = "1", features = ["preserve_order"] }
30 changes: 30 additions & 0 deletions README.md
@@ -392,6 +392,36 @@ C:\Users\<user>\AppData\Roaming\icm\icm\config\config.toml # Window

See [config/default.toml](config/default.toml) for all options.

## Embedder backends

ICM supports three local ONNX embedding backends — no external API required.

| Backend | Dims | License | Notes |
|---------|------|---------|-------|
| `fastembed` (default) | 384 / 768 / 1024 (model-dependent) | Apache-2.0 | multilingual-e5-base and others via fastembed |
| `jina-v5-nano` | 32, 64, 128, 256, 512, 768 (default: 768) | CC BY-NC 4.0 | jinaai/jina-embeddings-v5-text-nano-retrieval |
| `jina-v5-small` | 32, 64, 128, 256, 512, 768, 1024 (default: 1024) | CC BY-NC 4.0 | jinaai/jina-embeddings-v5-text-small-retrieval (Qwen3-based) |

> **IMPORTANT — Non-commercial restriction:** Jina v5 model weights are licensed under
> [CC BY-NC 4.0](https://creativecommons.org/licenses/by-nc/4.0/). **Use in commercial products
> requires a commercial Jina AI license.** See https://jina.ai/contact-sales for details.
> The default `fastembed` backend (Apache-2.0) has no such restriction.

Weights download automatically to `~/.cache/huggingface/` on first run. No account or API key needed.

```toml
# config.toml
[embeddings]
backend = "jina-v5-nano"
truncate_dim = 512 # optional Matryoshka dim (omit to use model default)
```

Matryoshka truncation keeps only the leading `truncate_dim` components of each embedding, trading some retrieval accuracy for faster search and smaller storage. Valid dims per backend:

- `jina-v5-nano`: 32, 64, 128, 256, 512, 768 (default: 768)
- `jina-v5-small`: 32, 64, 128, 256, 512, 768, 1024 (default: 1024)
- `fastembed`: no truncation; dim is fixed by the chosen model
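
As a rough illustration of what truncation involves, the sketch below keeps the first `dim` components and rescales the result to unit L2 norm, which is the usual way Matryoshka embeddings are shortened. The function name `truncate_and_normalize` is hypothetical, and whether ICM renormalizes after truncating is an assumption; the diff does not show that code.

```rust
/// Keep the first `dim` components of `embedding` and rescale to unit L2 norm.
/// Returns `None` if `dim` is zero or larger than the input length.
/// (Hypothetical helper; not taken from the ICM sources.)
fn truncate_and_normalize(embedding: &[f32], dim: usize) -> Option<Vec<f32>> {
    if dim == 0 || dim > embedding.len() {
        return None;
    }
    let head = &embedding[..dim];
    let norm = head.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        // An all-zero prefix cannot be normalized; return it unchanged.
        return Some(head.to_vec());
    }
    Some(head.iter().map(|x| x / norm).collect())
}
```

If vectors are stored as `f32`, truncating from 768 to 512 dims shrinks each one from 3072 to 2048 bytes.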

## Auto-extraction

ICM extracts memories automatically via three layers:
15 changes: 14 additions & 1 deletion config/default.toml
@@ -24,7 +24,15 @@ prune_threshold = 0.1
# Set to false to disable embeddings entirely (no model download, keyword search only)
# enabled = false

-# Embedding model (fastembed model_code). Default: multilingual-e5-small (384d, 100+ languages)
# Embedder backend: "fastembed" (default, Apache-2.0) | "jina-v5-nano" | "jina-v5-small"
#
# Jina v5 backends: CC BY-NC 4.0 (non-commercial). For commercial use, acquire a
# commercial license from Jina AI before deploying. Weights download to
# ~/.cache/huggingface/ on first run.
#
# backend = "jina-v5-nano"

# Embedding model (fastembed model_code). Default: multilingual-e5-base (768d, 100+ languages)
# Other options:
# "BAAI/bge-small-en-v1.5" — 384d, English-only (fastest)
# "Xenova/bge-small-en-v1.5" — 384d, quantized English-only
@@ -34,6 +42,11 @@ prune_threshold = 0.1
# "jinaai/jina-embeddings-v2-base-code" — 768d, code-optimized
model = "intfloat/multilingual-e5-base"

# Matryoshka truncation (jina-v5 backends only). Valid dims:
# jina-v5-nano: 32, 64, 128, 256, 512, 768 (default: 768)
# jina-v5-small: 32, 64, 128, 256, 512, 768, 1024 (default: 1024)
# truncate_dim = 512

[extraction]
# Layer 0: rule-based fact extraction (zero LLM cost)
enabled = true
1 change: 1 addition & 0 deletions crates/icm-cli/Cargo.toml
@@ -15,6 +15,7 @@ path = "src/main.rs"
[features]
default = ["embeddings", "tui"]
embeddings = ["icm-core/embeddings", "icm-mcp/embeddings"]
jina-v5 = ["icm-core/jina-v5"]
tui = ["dep:ratatui", "dep:crossterm"]
web = ["dep:axum", "dep:tokio", "dep:tower-http", "dep:rust-embed", "dep:mime_guess", "dep:getrandom"]
vendored-openssl = ["openssl/vendored"]
23 changes: 22 additions & 1 deletion crates/icm-cli/src/config.rs
@@ -46,21 +46,42 @@ pub struct MemoryConfig {
    pub auto_consolidate_threshold: usize,
}

/// Which embedder backend to use for `EmbeddingsConfig.backend`.
#[derive(Debug, Deserialize, Default, PartialEq, Eq, Clone)]
#[serde(rename_all = "kebab-case")]
pub enum EmbedderBackend {
    /// fastembed (default) — multilingual-e5-base etc., Apache-2.0 weights.
    #[default]
    Fastembed,
    /// jina-embeddings-v5-text-nano-retrieval — local ONNX, CC-BY-NC-4.0.
    JinaV5Nano,
    /// jina-embeddings-v5-text-small-retrieval (Qwen3-based) — local ONNX, CC-BY-NC-4.0.
    JinaV5Small,
}

/// Embedding model settings.
#[derive(Debug, Deserialize)]
#[serde(default)]
pub struct EmbeddingsConfig {
    /// Enable embeddings (set to false to skip model download entirely).
    pub enabled: bool,
-    /// Model identifier (fastembed model_code, e.g. "intfloat/multilingual-e5-small").
    /// Which embedder backend to use.
    pub backend: EmbedderBackend,
    /// Model identifier for the fastembed backend
    /// (e.g. "intfloat/multilingual-e5-base"). Ignored by other backends.
    pub model: String,
    /// Matryoshka truncation dimension. `None` = use the model's default
    /// output dimension. Consumed by the jina-v5-nano and jina-v5-small backends.
    pub truncate_dim: Option<usize>,
}

impl Default for EmbeddingsConfig {
    fn default() -> Self {
        Self {
            enabled: true,
            backend: EmbedderBackend::Fastembed,
            model: "intfloat/multilingual-e5-base".into(),
            truncate_dim: None,
        }
    }
}
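
To make the interaction between `backend` and `truncate_dim` concrete, here is a std-only sketch of how the two fields might be resolved into an effective output dimension. It mirrors the kebab-case names with a hand-written `FromStr` instead of serde so that it runs without dependencies. The `Backend` enum, `valid_dims`, and `resolve_dim` are hypothetical names, and rejecting an unsupported `truncate_dim` with an error is an assumption; the PR's actual validation is not shown in this diff. The dimension lists come from the README table.

```rust
use std::str::FromStr;

/// Local mirror of the backend enum (hypothetical; the PR uses serde).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Fastembed,
    JinaV5Nano,
    JinaV5Small,
}

impl FromStr for Backend {
    type Err = String;

    /// Accept the same kebab-case names that the config file uses.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "fastembed" => Ok(Backend::Fastembed),
            "jina-v5-nano" => Ok(Backend::JinaV5Nano),
            "jina-v5-small" => Ok(Backend::JinaV5Small),
            other => Err(format!("unknown embedder backend: {}", other)),
        }
    }
}

/// Valid Matryoshka dims per backend; empty means truncation is unsupported.
fn valid_dims(backend: Backend) -> &'static [usize] {
    match backend {
        Backend::Fastembed => &[],
        Backend::JinaV5Nano => &[32, 64, 128, 256, 512, 768],
        Backend::JinaV5Small => &[32, 64, 128, 256, 512, 768, 1024],
    }
}

/// Resolve the effective output dim. `Ok(None)` means the dim is fixed by
/// the model (fastembed), so `truncate_dim` is ignored.
fn resolve_dim(backend: Backend, truncate_dim: Option<usize>) -> Result<Option<usize>, String> {
    let dims = valid_dims(backend);
    // The largest listed dim is the model default.
    let default_dim = match dims.last() {
        Some(&d) => d,
        None => return Ok(None),
    };
    match truncate_dim {
        None => Ok(Some(default_dim)),
        Some(d) if dims.contains(&d) => Ok(Some(d)),
        Some(d) => Err(format!(
            "truncate_dim {} is not valid for {:?}; expected one of {:?}",
            d, backend, dims
        )),
    }
}
```

Failing fast on an unsupported dimension is one possible design. Silently falling back to the model default would also work, but it would hide typos in `config.toml`.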
2 changes: 1 addition & 1 deletion crates/icm-cli/src/learn_tests.rs
@@ -9,7 +9,7 @@ mod tests {
    fn test_store() -> (TempDir, SqliteStore) {
        let tmp = TempDir::new().expect("failed to create temp dir");
        let db_path = tmp.path().join("test.db");
-        let store = SqliteStore::with_dims(&db_path, 384).expect("failed to create store");
        let (store, _) = SqliteStore::with_dims(&db_path, 384).expect("failed to create store");
        (tmp, store)
    }
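
The change above shows `with_dims` now returning a tuple, but the diff does not say what the second element carries. One plausible reading, given the changelog's "auto re-embed on embedder dimension change", is that it reports whether the stored vector width differs from the requested one. The sketch below illustrates that idea only; `DimCheck`, `check_dims`, and `should_reembed` are hypothetical names, not ICM APIs.

```rust
/// What a store might report after opening with a requested embedding width.
/// (Hypothetical type; the real return value of `with_dims` is not shown.)
#[derive(Debug, PartialEq, Eq)]
enum DimCheck {
    /// No vectors stored yet, or the stored width matches the request.
    Unchanged,
    /// Stored vectors have a different width and would need regenerating.
    Changed { from: usize, to: usize },
}

/// Compare the width recorded in the database (if any) with the requested one.
fn check_dims(stored: Option<usize>, requested: usize) -> DimCheck {
    match stored {
        Some(old) if old != requested => DimCheck::Changed { from: old, to: requested },
        _ => DimCheck::Unchanged,
    }
}

/// Decide whether to re-embed. `auto_reembed` would be false when the user
/// passes `--no-auto-reembed`.
fn should_reembed(check: &DimCheck, auto_reembed: bool) -> bool {
    auto_reembed && matches!(check, DimCheck::Changed { .. })
}
```

Under this reading, a test helper like `test_store` can ignore the second element because a fresh temporary database has no stored vectors to migrate.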
