
feat(taint): object-literal + return-value sinks (→ 92.0%)#529

Merged
peaktwilight merged 1 commit into main from feat/taint-objliteral-return on Jun 18, 2026

peaktwilight (Collaborator) commented Jun 18, 2026

Taint bridge: object-literal + return-value sinks → load rate 91.7% → 92.0%

Two more FP-safe (false-positive-safe) taint sink shapes, found by enumerating the 131 rejected rules and ranking each shape by how many distinct rule IDs it would flip (rules whose other taint role already compiles), rather than by raw pattern frequency:

  • ObjectLiteralValue (+4): `{role: "system", content: $SINK}`, a JS object / Python dict literal with a tainted value in a field. Unblocks the openai/mistral system-prompt-injection rules (JS + Python). It fires only in sink position and only on a literal, so it is FP-safe.
  • ReturnValue (+2): a `return $TAINTED` sink. Unblocks mcp-unsanitized-return-* and directly-returned-format-string. Non-bare returns (anything other than `return $METAVAR`) are rejected.
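A minimal sketch of how the two pattern shapes above might be recognized from Semgrep pattern text. The `SinkShape` enum and the `is_metavar`, `parse_return_sink`, `parse_object_literal_sink` and `classify_sink` functions are hypothetical illustrations, not the bridge's actual `compile_pattern` code, and the parsing is deliberately naive (no nested literals, no commas inside strings).

```rust
/// Hypothetical, simplified sink shapes; the real GenericMatcher in
/// src/rules/semgrep_taint.rs carries more information than this.
#[derive(Debug, PartialEq)]
enum SinkShape {
    /// Object/dict literal; `key` names the field whose value is the metavariable.
    ObjectLiteralValue { key: String },
    /// Bare `return $METAVAR`; holds the metavariable name.
    ReturnValue { metavar: String },
}

/// Simplified metavariable check: `$` followed by uppercase letters, digits or `_`.
fn is_metavar(s: &str) -> bool {
    match s.strip_prefix('$') {
        Some(name) => {
            !name.is_empty()
                && name.chars().all(|c| c.is_ascii_uppercase() || c.is_ascii_digit() || c == '_')
        }
        None => false,
    }
}

/// Accepts only a bare `return $METAVAR` (optionally `;`-terminated).
fn parse_return_sink(pattern: &str) -> Option<String> {
    let rest = pattern.trim().strip_prefix("return")?;
    // `return` must be followed by whitespace, so `returned` is not a match.
    if !rest.starts_with(char::is_whitespace) {
        return None;
    }
    let expr = rest.trim().trim_end_matches(';').trim_end();
    // Anything other than a lone metavariable (e.g. `return f($X)`) is rejected.
    is_metavar(expr).then(|| expr.to_string())
}

/// Accepts `{k: v, ...}` where exactly one field value is a metavariable.
/// Naive split on `,` and `:`: no nested literals or commas inside strings.
fn parse_object_literal_sink(pattern: &str) -> Option<String> {
    let inner = pattern.trim().strip_prefix('{')?.strip_suffix('}')?;
    let mut sink_key = None;
    for field in inner.split(',').map(str::trim) {
        if field == "..." {
            continue; // Semgrep ellipsis: "any other fields"
        }
        let (key, value) = field.split_once(':')?;
        if is_metavar(value.trim()) {
            if sink_key.is_some() {
                return None; // two metavariable values: ambiguous, reject
            }
            sink_key = Some(key.trim().trim_matches('"').to_string());
        }
    }
    sink_key
}

/// Classify a sink pattern string, or return None for an unsupported shape.
fn classify_sink(pattern: &str) -> Option<SinkShape> {
    if let Some(metavar) = parse_return_sink(pattern) {
        return Some(SinkShape::ReturnValue { metavar });
    }
    parse_object_literal_sink(pattern).map(|key| SinkShape::ObjectLiteralValue { key })
}
```

The important part is the rejection path: `return f($X)` and literals without a metavariable value come back as `None`, so such rules stay in the unsupported bucket instead of compiling into a broader, FP-prone matcher.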

Both are sink/sanitizer-only and are queried only by the JS/Python engines; the other 7 engines carry the variants but leave them inert. Each has bridge-level fire + safe-near-miss tests (7 new tests).
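The sink-only contract can be pictured as two predicates over a reduced matcher enum. This is a sketch under stated assumptions: `Matcher`, `Engine`, `source_candidate` and `queried_as_sink_by` are hypothetical names, and treating `Call`/`BinopFormat` as queried by every engine is an assumption made only to keep the example total.

```rust
/// Hypothetical, heavily reduced stand-in for the shared NodeMatcher IR.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Matcher {
    Call,
    BinopFormat,
    ObjectLiteralValue,
    ReturnValue,
}

/// Engines named in this PR (the remaining ones are elided for brevity).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Engine {
    Python,
    JavaScript,
    Go,
    Php,
    Ruby,
    CSharp,
}

impl Matcher {
    /// Sink/sanitizer-only shapes never describe a taint source.
    fn is_sink_only(self) -> bool {
        matches!(self, Matcher::BinopFormat | Matcher::ObjectLiteralValue | Matcher::ReturnValue)
    }

    /// Whether `engine` ever queries this matcher in sink position.
    /// Per the PR text, only Python/JavaScript query the two new shapes;
    /// treating the older shapes as queried everywhere is an assumption.
    fn queried_as_sink_by(self, engine: Engine) -> bool {
        match self {
            Matcher::ObjectLiteralValue | Matcher::ReturnValue => {
                matches!(engine, Engine::Python | Engine::JavaScript)
            }
            Matcher::Call | Matcher::BinopFormat => true,
        }
    }
}

/// Source-side dispatch: sink-only shapes are inert no-ops in every engine.
fn source_candidate(m: Matcher) -> bool {
    !m.is_sink_only()
}
```

Keeping the variants in every engine's enum while making them no-ops on the source side means the shared IR stays exhaustive, and adding the shapes cannot introduce new source matches anywhere.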

Results (independently re-measured)

| Metric | Before | After |
|---|---|---|
| Load rate | 91.7% (1966) | 92.0% (1972) |
| `mode: taint (unsupported shape)` rejections | 131 | 125 |

The residual re-scan confirmed that every remaining bucket of two or more flippable rules is forbidden: universal bare/both-metavar patterns are FP-unsafe, multiline statement blocks are out of scope, and typed metavariables need type resolution. This is the last FP-safe harvest available from the current shape vocabulary.
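The ranking criterion used for this harvest (distinct flippable rule IDs per shape, not raw frequency) can be sketched as follows. `Rejection` and `rank_by_flippable` are hypothetical, and the sketch assumes each rejected rule is blocked by a single shape.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One rejected rule occurrence, reduced to what the ranking needs (hypothetical type).
struct Rejection<'a> {
    rule_id: &'a str,
    /// The unsupported shape that blocked this rule.
    shape: &'a str,
    /// Whether the rule's other taint role (source vs. sink) already compiles.
    other_role_compiles: bool,
}

/// Rank shapes by how many distinct rules would flip to "loaded" if the shape
/// were supported, rather than by how often the shape occurs.
fn rank_by_flippable(rejections: &[Rejection<'_>]) -> Vec<(String, usize)> {
    let mut per_shape: BTreeMap<&str, BTreeSet<&str>> = BTreeMap::new();
    for r in rejections.iter().filter(|r| r.other_role_compiles) {
        // A set, so repeated occurrences within one rule count once.
        per_shape.entry(r.shape).or_default().insert(r.rule_id);
    }
    let mut ranked: Vec<(String, usize)> = per_shape
        .into_iter()
        .map(|(shape, rules)| (shape.to_string(), rules.len()))
        .collect();
    // Most flippable rules first; ties broken by shape name for stable output.
    ranked.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    ranked
}
```

A shape that occurs many times, but only in rules whose other role fails to compile, contributes nothing to this ranking, which is why pattern frequency alone is a poor guide to which shapes are worth supporting.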

Verification (re-run on the branch)

Both dogfood runs exit 0 · cargo test: 807+ lib tests, 0 failures · clippy -D warnings clean · fmt --check clean · baseline untouched · throwaway analysis scaffolding deleted before commit.

Summary by CodeRabbit

  • New Features

    • Added detection of tainted values in object/dictionary literals and return statements (active in the JavaScript and Python engines), extending taint analysis capabilities.
  • Documentation

    • Updated registry coverage metrics showing improved rule load rates, particularly for Python (94.8%) and JavaScript (85.6%).

Harvest two bounded, FP-safe taint sink shapes that flip registry rules
currently rejected as `mode: taint (unsupported shape)`:

- ObjectLiteralValue: a sink that is an object/dict literal construction
  whose value position carries a tainted value, e.g.
  `{role: "system", content: $SINK}` (JS) / `{"role": "system", ...}` (Py).
  Flips the openai/mistral system-prompt-injection rules (+4: 2 JS, 2 Py).
- ReturnValue: a `return $METAVAR` sink that fires when a return statement
  returns a tainted value (bounded to return position, not a universal
  bare-metavar sink). Flips mcp-unsanitized-return-python and
  directly-returned-format-string (+2 Py).

Both are sink/sanitizer-only and matched only by the JS/Python engines;
other engines carry the variants but never query them. Bridge-level
fires + safe-near-miss tests added for each.

Registry load rate 91.7% -> 92.0% (1966 -> 1972 / 2144); unsupported-shape
taint rules 131 -> 125.
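The headline numbers can be checked with a little arithmetic, assuming the registry size of 2144 applies both before and after this change. `load_rate` and `fmt_rate` are illustrative helpers, not part of foxguard.

```rust
/// Share of registry rules that load, as a percentage.
fn load_rate(loaded: u32, total: u32) -> f64 {
    100.0 * f64::from(loaded) / f64::from(total)
}

/// Format to one decimal place, as the PR reports it.
fn fmt_rate(loaded: u32, total: u32) -> String {
    format!("{:.1}%", load_rate(loaded, total))
}
```

1966 / 2144 and 1972 / 2144 round to 91.7% and 92.0%; the six extra loaded rules match the drop from 131 to 125 unsupported-shape rejections, a gain of about 0.28 percentage points.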
coderabbitai (Bot) commented Jun 18, 2026


Caution: review failed. The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 2ab1b22 and c49d596.

📒 Files selected for processing (9)
  • docs/parity/registry-coverage.md
  • src/rules/csharp_taint.rs
  • src/rules/go_taint.rs
  • src/rules/javascript_taint.rs
  • src/rules/php_taint.rs
  • src/rules/python_taint.rs
  • src/rules/ruby_taint.rs
  • src/rules/semgrep_taint.rs
  • src/rules/taint_engine.rs

📝 Walkthrough

Two new taint sink matcher variants, ObjectLiteralValue and ReturnValue, are added to the shared NodeMatcher IR and helper utilities in taint_engine.rs. The Semgrep YAML bridge gains GenericMatcher variants and pattern parsers for the new shapes, with per-language conversion wired for all nine supported languages. Python and JavaScript taint engines add active sink handlers that emit findings when tainted values appear in dict/object literals or return statements. All language engines exclude the new variants from source matching. Registry-coverage docs are updated to reflect the improved load rate.
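A toy model of the Python/JavaScript sink handlers described above: walk a function body, track tainted variables, report an object/dict literal with a tainted field value once per node, and report a `return` of a tainted value. `Expr`, `Stmt`, `Finding`, `expr_tainted` and `scan` are hypothetical; the real engines walk tree-sitter nodes and use the crate's own taint propagation. Sources are modeled simply as pre-tainted variable names.

```rust
use std::collections::HashSet;

/// Toy expression tree (hypothetical); the real engines walk tree-sitter nodes.
enum Expr {
    Var(String),
    Str(String),
    /// Dict/object literal as (key, value) pairs.
    Object(Vec<(String, Expr)>),
}

/// Toy statements: assignment and return.
enum Stmt {
    Assign(String, Expr),
    Return(Expr),
}

#[derive(Debug, PartialEq)]
enum Finding {
    /// Tainted value in the named field of an object/dict literal.
    ObjectLiteral { key: String },
    /// A return statement returning a tainted value.
    Return,
}

/// Simplified stand-in for expression taint: only variables carry taint here.
fn expr_tainted(e: &Expr, tainted: &HashSet<String>) -> bool {
    match e {
        Expr::Var(name) => tainted.contains(name),
        // The sketch does not propagate taint out of literals.
        Expr::Str(_) | Expr::Object(_) => false,
    }
}

/// One finding per literal node: reports the first tainted field value.
fn check_object_literal(fields: &[(String, Expr)], tainted: &HashSet<String>) -> Option<Finding> {
    fields
        .iter()
        .find(|(_, value)| expr_tainted(value, tainted))
        .map(|(key, _)| Finding::ObjectLiteral { key: key.clone() })
}

/// Walk a straight-line body, tracking tainted variables and checking both sinks.
fn scan(body: &[Stmt], sources: &HashSet<String>) -> Vec<Finding> {
    let mut tainted = sources.clone();
    let mut findings = Vec::new();
    for stmt in body {
        match stmt {
            Stmt::Assign(name, value) => {
                if let Expr::Object(fields) = value {
                    findings.extend(check_object_literal(fields, &tainted));
                }
                if expr_tainted(value, &tainted) {
                    tainted.insert(name.clone());
                }
            }
            Stmt::Return(value) => {
                if expr_tainted(value, &tainted) {
                    findings.push(Finding::Return);
                }
            }
        }
    }
    findings
}
```

With `user` tainted, `msg = {"role": "system", "content": user}` yields one `ObjectLiteral { key: "content" }` finding, while the same literal with a constant string yields none. Because this sketch does not propagate taint out of literals, a following `return msg` is not reported; whether the real engine does so is not stated in this PR.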

Changes

New ObjectLiteralValue and ReturnValue taint sink matchers

| Layer | File(s) | Summary |
|---|---|---|
| Core NodeMatcher IR: new variants, helpers, and fingerprinting | src/rules/taint_engine.rs | NodeMatcher gains ObjectLiteralValue and ReturnValue enum variants; `description()` and `matcher_fingerprint` are extended for them, and new `match_object_literal_sink` / `match_return_value_sink` helpers are added. |
| Semgrep bridge: GenericMatcher variants, pattern compilation, per-language wiring | src/rules/semgrep_taint.rs | GenericMatcher adds the two new sink variants; `compile_pattern` recognizes brace-literal and `return $METAVAR` sink shapes via new helper parsers; all nine `to_*_matcher` functions translate the new variants to language-specific NodeMatchers. |
| Python and JavaScript: new sink dispatch and handlers | src/rules/python_taint.rs, src/rules/javascript_taint.rs | Python dispatches `dictionary` and `return_statement` AST nodes to new sink handlers that scan children for tainted expressions and emit a single finding. JavaScript dispatches `object` AST nodes to a new `handle_object_literal_sink` that iterates pair entries. |
| Sink-only classification in all language match_source functions | src/rules/python_taint.rs, src/rules/javascript_taint.rs, src/rules/go_taint.rs, src/rules/php_taint.rs, src/rules/ruby_taint.rs, src/rules/csharp_taint.rs | Every language engine's `match_source` dispatch groups ObjectLiteralValue and ReturnValue with BinopFormat as sink-only no-ops; C#'s `matcher_matches_call` also returns false for the new variants. |
| Integration tests and registry-coverage docs | src/rules/semgrep_taint.rs, docs/parity/registry-coverage.md | End-to-end tests cover ObjectLiteralValue (Python dict, JS object) and ReturnValue (Python tainted return) firing and non-firing cases, plus unit assertions for `parse_return_metavar`. The registry-coverage doc is regenerated at a 92.0% load rate with improved Python/JavaScript stats. |

Sequence Diagram(s)

sequenceDiagram
    participant SemgrepBridge as semgrep_taint (compile_pattern)
    participant GenericMatcher
    participant LangConverter as to_python/js/go/..._matcher
    participant NodeMatcher
    participant LangEngine as python/javascript_taint (AST walk)
    participant TaintEngine as taint_engine (match_*_sink)

    SemgrepBridge->>GenericMatcher: emit ObjectLiteralValue / ReturnValue (sink role only)
    GenericMatcher->>LangConverter: convert per-language
    LangConverter->>NodeMatcher: NodeMatcher::ObjectLiteralValue / ReturnValue
    LangEngine->>LangEngine: dispatch dictionary / return_statement / object AST node
    LangEngine->>TaintEngine: match_object_literal_sink(spec, sink_to_rules)
    TaintEngine-->>LangEngine: MatchedSink
    LangEngine->>LangEngine: expression_taint on pair values / returned expr
    LangEngine-->>LangEngine: emit TaintFinding on whole node

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • 0sec-labs/foxguard#287: Modifies the same batched sink-matching and attribution plumbing in taint_engine.rs that this PR extends with match_object_literal_sink and match_return_value_sink.
  • 0sec-labs/foxguard#497: Extends the same semgrep_taint.rs sink-pattern compiler and per-language matcher conversion paths used by this PR to add the new ObjectLiteralValue/ReturnValue shapes.
  • 0sec-labs/foxguard#515: Touches csharp_taint.rs's matcher classification logic in the same classify_source_expr and matcher_matches_call functions updated by this PR.

Poem

🐇 Hopping through the AST with glee,
Found a dict value — tainted, you see!
return $X now triggers the chase,
Sink matchers wired across every language base.
The load rate climbs to ninety-two,
Six languages covered, findings brand new! ✨


Warning

Tools execution failed with the following error:

Failed to run tools: Ping-pong health check failed



peaktwilight merged commit 9f91a79 into main on Jun 18, 2026
17 of 18 checks passed
peaktwilight deleted the feat/taint-objliteral-return branch on June 18, 2026 at 15:33

foxguard review comments on the new test fixtures:
msg = {"role": "system", "content": user}
return msg
"#;
let tree = parse_file(src, Language::Python).expect("python fixture should parse");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

msg = {"role": "system", "content": "you are a helpful assistant"}
return msg
"#;
let tree = parse_file(src, Language::Python).expect("python fixture should parse");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

return msg;
}
"#;
let tree = parse_file(src, Language::JavaScript).expect("js fixture should parse");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

return msg;
}
"#;
let tree = parse_file(src, Language::JavaScript).expect("js fixture should parse");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

data = requests.get("http://api")
return data
"#;
let tree = parse_file(src, Language::Python).expect("python fixture should parse");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

data = requests.get("http://api")
return "ok"
"#;
let tree = parse_file(src, Language::Python).expect("python fixture should parse");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match
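The foxguard finding suggests `?` instead of `.expect()`. A minimal sketch of that pattern for a fixture helper, assuming a parser that returns `Option`; `Tree`, `parse_file` and `parse_fixture` here are hypothetical stand-ins, not the crate's real `parse_file(src, Language)` signature. Rust `#[test]` functions may return `Result`, so `?` is available inside them as well.

```rust
use std::error::Error;

/// Hypothetical stand-in for a parsed syntax tree.
#[derive(Debug)]
struct Tree {
    lines: usize,
}

/// Hypothetical stand-in for the crate's `parse_file`; here it "fails" on blank input.
fn parse_file(src: &str) -> Option<Tree> {
    if src.trim().is_empty() {
        None
    } else {
        Some(Tree { lines: src.lines().count() })
    }
}

/// Fixture helper that propagates a parse failure with `?` instead of panicking.
fn parse_fixture(src: &str) -> Result<Tree, Box<dyn Error>> {
    // `&str` converts into `Box<dyn Error>`, so the old expect message becomes the error.
    let tree = parse_file(src).ok_or("python fixture should parse")?;
    Ok(tree)
}

// A test that returns `Result` fails (rather than panics) when `?` short-circuits.
#[test]
fn return_fixture_parses() -> Result<(), Box<dyn Error>> {
    let tree = parse_fixture("data = requests.get(\"http://api\")\nreturn data")?;
    assert_eq!(tree.lines, 2);
    Ok(())
}
```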
