【WIP】feat: add local llama-cpp embedding support #1388
Mijamind719 wants to merge 3 commits into volcengine:main from
Conversation
Co-authored-by: GPT-5.4 <noreply@openai.com>
PR Reviewer Guide 🔍 — Here are some key observations to aid the review process:
PR Code Suggestions ✨ — Explore these optional code suggestions:
Force-pushed from 9c8e7ef to 6e57688
Legacy issue: investigate true llama-cpp native multi-sequence batch support for local embedding models such as bge-small-zh-v1.5-f16 (current runtime reports n_seq_max=1, so embed_batch uses sequential mode). Co-authored-by: GPT-5.4 <noreply@openai.com>
Force-pushed from 6e57688 to f6ff2a0
I found a few issues.
Description
This PR adds an initial local dense embedding path for OpenViking based on
`llama-cpp-python`, with `bge-small-zh-v1.5-f16` as the default local embedding model. The goal is to make local CPU embedding available without changing the existing remote providers, while keeping installation risk isolated behind an optional extra dependency.
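As a rough illustration of the lazy-dependency approach described above, here is a minimal sketch of what a `llama-cpp-python`-backed embedder could look like. The class name matches `LocalDenseEmbedder` from this PR, but the constructor signature, the injectable `loader` hook, and the error message are assumptions for illustration, not the actual implementation:

```python
from typing import Callable, List, Optional


class LocalDenseEmbedder:
    """Sketch of a llama-cpp-python based dense embedder.

    `llama-cpp-python` is imported lazily, so the main package does not
    take it as a hard dependency (it lives behind the hypothetical
    `openviking[local-embed]` extra).
    """

    def __init__(self, model_path: str, *, loader: Optional[Callable] = None):
        self.model_path = model_path
        # `loader` is an assumed test seam, not part of the PR's API.
        self._loader = loader
        self._llm = None

    def _load(self):
        if self._llm is None:
            if self._loader is not None:
                self._llm = self._loader(self.model_path)
            else:
                try:
                    from llama_cpp import Llama  # optional dependency
                except ImportError as exc:
                    raise RuntimeError(
                        "local embedding requires the optional extra: "
                        "pip install 'openviking[local-embed]'"
                    ) from exc
                # embedding=True puts llama.cpp into embedding mode.
                self._llm = Llama(model_path=self.model_path, embedding=True)
        return self._llm

    def embed_query(self, text: str) -> List[float]:
        return self._load().embed(text)

    def embed_document(self, text: str) -> List[float]:
        # For a symmetric model like bge-small, query and document
        # embeddings can share one code path.
        return self.embed_query(text)
```

The lazy import keeps `import openviking` working even when the extra is not installed, which matches the PR's stated goal of isolating installation risk.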
Related Issue
N/A
Type of Change
Changes Made
- `provider: "local"` embedding config support, and default implicit local embedding behavior when no embedding provider is configured.
- `LocalDenseEmbedder` based on `llama-cpp-python`, with `bge-small-zh-v1.5-f16` as the default model and configurable `model_path` and `cache_dir`.
- `ov doctor` checks to diagnose the local embedding dependency, the local model cache state, and invalid local model configuration.
- `openviking[local-embed]` extra, instead of making `llama-cpp-python` a hard dependency of the main package.
- `docs/design/local-embedding-llama-cpp-design.md` to capture the implementation scope and rollout decisions.

Testing
Targeted validation performed:
- `python3 -m py_compile` on the touched Python modules
- `PYTHONPATH=. ./.venv/bin/python -m pytest -q tests/cli/test_doctor.py --maxfail=1`
- `PYTHONPATH=. ./.venv/bin/python -m pytest -q tests/unit/test_local_embedder.py tests/storage/test_collection_schemas.py tests/misc/test_config_validation.py --maxfail=1`
- With `llama-cpp-python` and a downloaded GGUF model: `embed_query()` and `embed_document()` returned valid 512-d vectors

Known Limitations
- For `bge-small-zh-v1.5-f16`, native multi-sequence batch embedding is not available because the created llama context reports `n_seq_max = 1`.
- `embed_batch()` is therefore safe and functional, but currently falls back to sequential embedding in this runtime instead of delivering native batch throughput.

Checklist
Screenshots (if applicable)
N/A
Additional Notes
Follow-up investigation is still needed for true native multi-sequence batch support in
`llama-cpp-python` for this embedding model/runtime combination.
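To make the sequential-fallback limitation concrete, here is a minimal sketch of how an `embed_batch()` guarded by `n_seq_max` could be structured. The function name echoes the PR, but the parameters (`embed_one`, `n_seq_max` passed explicitly) are illustrative assumptions; the real code reads `n_seq_max` from the llama context:

```python
from typing import Callable, List, Sequence


def embed_batch(
    texts: Sequence[str],
    embed_one: Callable[[str], List[float]],
    n_seq_max: int = 1,
) -> List[List[float]]:
    """Embed a batch of texts, falling back to sequential mode.

    When the runtime reports n_seq_max == 1 (as it currently does for
    bge-small-zh-v1.5-f16), native multi-sequence batching is not
    available, so each text is embedded one decode at a time.
    """
    if n_seq_max <= 1:
        # Safe but slow path: one embedding call per text.
        return [embed_one(t) for t in texts]
    # Placeholder for a future native multi-sequence path: chunk the
    # inputs by n_seq_max, but still embed sequentially within a chunk
    # until true batch support lands.
    results: List[List[float]] = []
    for start in range(0, len(texts), n_seq_max):
        chunk = texts[start : start + n_seq_max]
        results.extend(embed_one(t) for t in chunk)
    return results
```

The key property is that results stay correct and ordered in both paths; only throughput differs, which matches the "safe and functional" wording in Known Limitations.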