fix: close TB-10 graph store injection path (CIPHER-001/002)#90
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
✅ Grippy Review — PASSScore: 100/100 | Findings: 0 Commit: 1a5e2a3 |
| @@ -277,6 +277,7 @@ def format_pr_context( | |||
| learnings: str = "", | |||
There was a problem hiding this comment.
🟡 MEDIUM: Prompt ingress path includes explicit _escape_xml() for all untrusted context (including graph_context)
Confidence: 94%
Review of function signature and usage shows graph_context goes through _escape_xml before reaching the LLM.
The new graph_context parameter to format_pr_context() (agent.py) receives downstream context and passes it through _escape_xml() before LLM prompt assembly, satisfying the TB-10 trust boundary invariant. However, this requires all indirect invocations to always use this method for prompt construction.
Suggestion: Ensure that any future prompt construction for PR review always routes through format_pr_context() and does not manually construct prompt context. Add a static analysis check if possible.
— All prompt-boundary defense in one place. Keep it that way.
| @@ -668,3 +668,150 @@ def test_fork_403_no_dangerous_trigger_advice(self) -> None: | |||
| source = inspect.getsource(review_main) | |||
There was a problem hiding this comment.
🟡 MEDIUM: End-to-end tests prove no regression for known persistence attacks (CIPHER-001/002)
Confidence: 94%
Test test_poisoned_observation_neutralized_in_prompt follows described attack path, confirms prompt output is clean.
New tests under TestGraphStoreInjection exercise all documented attack vectors (poisoned observation, poisoned finding title, XML breakout, homoglyph evasion) and confirm expected outcomes at both pre- and post-sanitization layers. The semantic paraphrase test explicitly documents remaining risk (out of scope for code-level defense, must be handled at policy/trust boundaries).
Suggestion: No further action required; risk profile is clearly documented and testable.
— End-to-end adversarial tests. For once, coverage matches the risk.
| @@ -185,6 +185,68 @@ def test_format_context_sanitizes_bidi_override(self) -> None: | |||
| assert "\u202a" not in text | |||
There was a problem hiding this comment.
🟡 MEDIUM: Comprehensive test coverage for Unicode normalization, boundary semantics, and egress contract
Confidence: 93%
Tests: test_blast_radius_path_sanitized, test_file_history_path_sanitized, test_author_risk_severity_sanitized, boundary semantics for prompt-safety.
Tests cover all fields at risk of egress attacks (bidi, homoglyph, invisible characters, XML tag patterns) and document both what is blocked and the known limitations at the boundary.
Suggestion: Maintain high test coverage; update tests when new graph-context fields or logic are added.
— Tests for both what you block and what you can't. That's the right way to document risk.
| @@ -206,6 +206,7 @@ Disabled (`add_history_to_context = False`). Prior LLM responses may contain att | |||
| | TB-7 | Config/credentials boundary | `_resolve_transport()`, `_PROVIDERS` dict (module paths + class names) | agent | | |||
There was a problem hiding this comment.
🔵 LOW: Governance docs updated for TB-10 trust boundary
Confidence: 90%
CLAUDE.md: TB-10 explicitly described as requiring security review and adversarial test verification.
Documentation clearly includes the new TB-10 trust boundary in trust anchor tables and mandates security review/adversarial tests for changes.
Suggestion: No action required. Maintain discipline as boundaries evolve.
— You updated the documentation. I noticed.
|
Claude (@claude) audit this pr |
… prompt-safe (TB-10) Apply navi_sanitize.clean() to all 7 graph-derived fields at egress: blast_radius paths, recurring finding files, file history paths, file history observations, and author risk severity keys. Update docstring to declare format_context_for_llm() is explicitly NOT prompt-safe — callers must apply _escape_xml() for prompt ingress. Negative tests prove injection patterns and XML tags survive this function (intentional boundary semantics). Coding-Agent: claude-code Model: claude-opus-4-6
…TB-10) Add graph_context parameter to both format_pr_context() implementations: - agent.py: applies _escape_xml() (canonical live path) - input_fence.py: applies escape_xml() (Phase 3 replacement, parity) Wire review.py to pass graph context as graph_context= kwarg instead of concatenating into description with raw <graph-context> tags. The description field now contains only the PR description. Coding-Agent: claude-code Model: claude-opus-4-6
…ons (TB-10) 5 end-to-end tests proving the TB-10 invariant: - Poisoned observation neutralized in final prompt - Poisoned finding title neutralized in final prompt - XML breakout in graph context escaped - Homoglyph evasion (Cyrillic->Latin) caught end-to-end - Semantic paraphrase documented as conscious limitation Coding-Agent: claude-code Model: claude-opus-4-6
Coding-Agent: claude-code Model: claude-opus-4-6
b1cf883 to
ca31791
Compare
❌ Grippy Review — DIFF ERRORReview failed. Check the Actions log for details. |
Summary
format_context_for_llm()vianavi_sanitize.clean()— 5 previously unsanitized fields (blast_radius paths, recurring finding files, file history paths/observations, author risk severity keys)graph_contextparameter to bothformat_pr_context()implementations (agent.py+input_fence.py) with full_escape_xml()/escape_xml()prompt-ingress sanitizationdescription—review.pynow passes graph context as a first-classgraph_context=kwarg instead of concatenating raw<graph-context>tags into the description fieldDesign invariant (TB-10)
All graph-derived text entering the LLM prompt passes through
_escape_xml()informat_pr_context().format_context_for_llm()is explicitly NOT prompt-safe — it normalizes Unicode but does not neutralize injection patterns or escape XML.Audit trail
docs/plans/2026-04-04-cipher-remediation-plan.md)Test plan
format_context_for_llm()is NOT prompt-safe)