feat(taint): object-literal + return-value sinks (→ 92.0%)#529
Conversation
Harvest two bounded, FP-safe taint sink shapes that flip registry rules
currently rejected as `mode: taint (unsupported shape)`:
- ObjectLiteralValue: a sink that is an object/dict literal construction
whose value position carries a tainted value, e.g.
`{role: "system", content: $SINK}` (JS) / `{"role": "system", ...}` (Py).
Flips the openai/mistral system-prompt-injection rules (+4: 2 JS, 2 Py).
- ReturnValue: a `return $METAVAR` sink that fires when a return statement
returns a tainted value (bounded to return position, not a universal
bare-metavar sink). Flips mcp-unsanitized-return-python and
directly-returned-format-string (+2 Py).
Both are sink/sanitizer-only and matched only by the JS/Python engines;
other engines carry the variants but never query them. Bridge-level
fires + safe-near-miss tests added for each.
Registry load rate 91.7% -> 92.0% (1966 -> 1972 / 2144); unsupported-shape
taint rules 131 -> 125.
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (9)
📝 WalkthroughWalkthroughTwo new taint sink matcher variants, ChangesNew ObjectLiteralValue and ReturnValue taint sink matchers
Sequence Diagram(s)sequenceDiagram
participant SemgrepBridge as semgrep_taint (compile_pattern)
participant GenericMatcher
participant LangConverter as to_python/js/go/..._matcher
participant NodeMatcher
participant LangEngine as python/javascript_taint (AST walk)
participant TaintEngine as taint_engine (match_*_sink)
SemgrepBridge->>GenericMatcher: emit ObjectLiteralValue / ReturnValue (sink role only)
GenericMatcher->>LangConverter: convert per-language
LangConverter->>NodeMatcher: NodeMatcher::ObjectLiteralValue / ReturnValue
LangEngine->>LangEngine: dispatch dictionary / return_statement / object AST node
LangEngine->>TaintEngine: match_object_literal_sink(spec, sink_to_rules)
TaintEngine-->>LangEngine: MatchedSink
LangEngine->>LangEngine: expression_taint on pair values / returned expr
LangEngine-->>LangEngine: emit TaintFinding on whole node
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning Tools execution failed with the following error: Failed to run tools: Ping-pong health check failed Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| msg = {"role": "system", "content": user} | ||
| return msg | ||
| "#; | ||
| let tree = parse_file(src, Language::Python).expect("python fixture should parse"); |
There was a problem hiding this comment.
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
| msg = {"role": "system", "content": "you are a helpful assistant"} | ||
| return msg | ||
| "#; | ||
| let tree = parse_file(src, Language::Python).expect("python fixture should parse"); |
There was a problem hiding this comment.
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
| return msg; | ||
| } | ||
| "#; | ||
| let tree = parse_file(src, Language::JavaScript).expect("js fixture should parse"); |
There was a problem hiding this comment.
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
| return msg; | ||
| } | ||
| "#; | ||
| let tree = parse_file(src, Language::JavaScript).expect("js fixture should parse"); |
There was a problem hiding this comment.
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
| data = requests.get("http://api") | ||
| return data | ||
| "#; | ||
| let tree = parse_file(src, Language::Python).expect("python fixture should parse"); |
There was a problem hiding this comment.
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
| data = requests.get("http://api") | ||
| return "ok" | ||
| "#; | ||
| let tree = parse_file(src, Language::Python).expect("python fixture should parse"); |
There was a problem hiding this comment.
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
Taint bridge: object-literal + return-value sinks → load rate 91.7% → 92.0%
Two more FP-safe taint sink shapes, found by enumerating the 131 rejected rules and grouping by rules-flippable-per-shape (distinct rule ids whose other role already compiles) rather than pattern frequency:
{role: "system", content: $SINK}JS object / Python dict literal with a tainted value in a field. Unblocksopenai/mistralsystem-prompt-injection rules (js+python). Fires only in sink position with a literal → FP-safe.return $TAINTEDsink. Unblocksmcp-unsanitized-return-*,directly-returned-format-string. Rejects non-bare returns.Both are sink/sanitizer-only, queried only by the JS/Python engines (other 7 carry the variants inert). Each has bridge-level fire + safe-near-miss tests (7 new tests).
Results (independently re-measured)
mode: taint (unsupported shape)The residual re-scan confirmed every remaining ≥2 bucket is forbidden (universal bare/both-metavar → FP-unsafe; multiline statement-blocks → out of scope; typed-metavar → needs type resolution). This is the last FP-safe harvest from the current shape vocabulary.
Verification (re-run on the branch)
both dogfood exit 0 ·
cargo test807+ lib, 0 failures · clippy-D warningsclean ·fmt --checkclean · baseline untouched · throwaway analysis scaffolding deleted before commit.Summary by CodeRabbit
New Features
Documentation