fix(security): config trust boundary + injection hardening (#594)#597
Open
thijs-hakkenberg wants to merge 1 commit into
Open
Conversation
Addresses findings from the security review in zilliztech#594. - config: project-local .memsearch.toml can no longer set endpoint/ credential/prompt-path fields (embedding/llm/compact base_url+api_key+ provider, milvus.uri/token, reranker.model, prompts.*). These travel with a cloned repo and could redirect the ambient API key, indexed content, or conversation summaries to an attacker endpoint, or read arbitrary files via a custom prompt path. They now come only from global config or CLI flags; stripped fields emit a UserWarning. (zilliztech#1/zilliztech#3/zilliztech#6) - opencode capture-daemon: replace os.system() background re-index with a shell-free subprocess.Popen(argv) using the existing split_memsearch_cmd helper, removing command injection via project-path metacharacters. (zilliztech#2) - cli: escape chunk_hash through _escape_filter_value in `expand` to close a Milvus filter-expression injection gap. (zilliztech#8) - reranker/onnx embeddings: support a `repo@revision` suffix and thread revision= through all hf_hub_download/list_repo_files/CrossEncoder calls so remote model downloads can be pinned to a commit, mitigating supply-chain tampering. (zilliztech#7, CWE-367) Tests: new trust-boundary tests in test_config.py and revision-pin tests in test_model_revision_pin.py. Full suite: 235 passed, 7 skipped. Co-Authored-By: Claude <noreply@anthropic.com>
Collaborator
|
Thanks for putting this together and for the detailed review in #594. We have already merged the highest-priority project config trust-boundary fix in #595 and released it in v0.4.11, so this PR now overlaps with main and is not directly mergeable as-is. The remaining changes here are still useful hardening, but they cover several separate surfaces. We'll keep #594 as the tracking issue and handle the remaining items as smaller focused PRs rather than merging this combined branch. Thanks again for the thorough report and patch. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Hardens the four highest-confidence findings from the security review in #594. Each fix is test-backed and the full suite passes (235 passed, 7 skipped).
The remaining findings in #594 (JS/TS
bash -c→ argv,prompts.summarizeshell-side restriction, PID validation,curl | shbootstrap, unbounded transcript reads) are intentionally left for follow-up PRs to keep this one reviewable.Changes
🔴 Config trust boundary —
.memsearch.tomlcan no longer set endpoint/credential fields (#594 findings 1/3/6)resolve_config()now strips security-sensitive fields from the project-local.memsearch.tomlbefore merging, since that file ships inside any cloned/opened repo and is therefore untrusted. Stripped:embedding.{provider,base_url,api_key},llm.{provider,base_url,api_key,providers},compact.{llm_provider,base_url,api_key},milvus.{uri,token},reranker.model, and all ofprompts.*. AUserWarninglists what was ignored.Without this, a malicious repo could set
embedding.base_url = "https://evil/v1"and the OpenAI SDK would attach the victim's ambientOPENAI_API_KEYto requests sent there — exfiltrating the key (and all indexed content / conversation summaries) on the next index, which the Claude Code plugin triggers automatically on session start. Global config and explicit CLI flags are trusted and unaffected.🔴
os.system→subprocessin OpenCode capture-daemon (#594 finding 2)capture-daemon.pybuilt a shell string frommemory_dir(derived from the project path) and ran it viaos.systemon every poll cycle — a path component containing a shell metacharacter was command execution. Replaced withsubprocess.Popen([...argv], start_new_session=True)using thesplit_memsearch_cmdhelper already in the file. No shell.🟡 Escape
chunk_hashinexpand(#594 finding 8)cli.pywas the one caller building a Milvus filter without the_escape_filter_valuehelper used everywhere else. Now escaped.🟠 Pin-able model revisions for reranker + ONNX embeddings (#594 finding 7, CWE-367)
Model ids now accept a
repo@revisionsuffix, threaded through everyhf_hub_download/list_repo_files/CrossEncodercall. This lets users pin the exact remote weights they download and execute, mitigating a compromised/MITM'd HuggingFace repo serving a malicious ONNX model. Default behavior (unpinned) is unchanged.Testing
tests/test_config.py: new trust-boundary tests (project fields stripped + warning; global/CLI still apply).tests/test_model_revision_pin.py: newrepo@revisionparsing tests.235 passed, 7 skipped.ruff check+ruff format --checkclean.Closes parts of #594.
🤖 Generated with Claude Code