
feat(taint): parameter-as-source via focus-metavariable + pattern-inside (→ 94.6%)#533

Merged
peaktwilight merged 1 commit into main from feat/taint-param-source-focus-inside
Jun 18, 2026

@peaktwilight peaktwilight commented Jun 18, 2026

Collaborator

Taint: parameter-as-source (focus-metavariable + pattern-inside) → load rate 93.5% → 94.6%

Implements the largest remaining taint lever: the canonical Semgrep "function parameters are taint sources" shape, written as

pattern-sources:
  - patterns:
      - pattern-inside: |
          def $HANDLER(..., $ARG, ...): ...
      - focus-metavariable: $ARG

Previously the bridge dropped focus-metavariable/pattern-inside as unenforceable inside a taint block, emptying the source → rule rejected. Now a focus-metavariable over a function-signature pattern-inside/pattern compiles to an any-parameter wildcard source (ANY_PARAM_WILDCARD sentinel in taint_engine.rs), and each engine's seed_params seeds all parameters on that sentinel.

+24 rules → 94.6% (2028/2144); taint unsupported-shape 126 → 102. (The step-1 analysis put an upper bound of 47 flippable rules; 24 were realized. The rest have more than one blocker or are blocked on the sink side.)
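The headline percentages can be re-derived from the counts quoted in this PR (2004 and 2028 loaded rules out of 2144). The helper below is only an arithmetic check; `load_rate_percent` is a name made up for this sketch and is not part of foxguard.

```rust
/// Load rate as a percentage, rounded to one decimal place.
/// Hypothetical helper, used only to check the numbers quoted in this PR.
fn load_rate_percent(loaded: u32, total: u32) -> f64 {
    let pct = f64::from(loaded) / f64::from(total) * 100.0;
    (pct * 10.0).round() / 10.0
}

fn main() {
    let total = 2144; // registry rules considered
    let before = 2004; // rules loading before this PR
    let after = 2028; // rules loading after this PR

    println!("before: {:.1}%", load_rate_percent(before, total));
    println!("after:  {:.1}%", load_rate_percent(after, total));
    println!("gained: {} rules", after - before);
}
```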

Precision (this is broad taint seeding — guarded carefully)

  • The sentinel is only produced for a focus-metavariable whose binding is a parameter of a function-signature pattern-inside/pattern. focus over a non-signature pattern (e.g. get_input($X)) falls through to normal extraction — covered by non_param_focus_block_is_not_treated_as_param_source.
  • Bridge-level firing + safe near-miss tests per language: JS/Python/Java handler-param→sink fires; non-parameter constant / literal does not fire.
  • Built-in foxguard rules never use the sentinel (dogfood scans clean). C# carries the source inertly (its model matches by name at use-site; ~5 C# rules now load but won't fire — disclosed, would need method-param scope tracking).
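The firing and near-miss cases in the second bullet can be pictured with a toy intraprocedural taint pass. This is a sketch under loud assumptions: `Expr`, `Stmt`, and `sink_fires` are invented here, and the real engines work on parsed syntax trees rather than a hand-built statement list. It mirrors the Python fixtures quoted later on this page (`cmd = event` versus `cmd = "echo hello"`, each followed by `subprocess.call(cmd)`).

```rust
use std::collections::HashSet;

/// Right-hand side of an assignment in the toy model (hypothetical).
enum Expr {
    Var(&'static str),
    Literal,
}

/// The only two statement kinds the toy model knows about (hypothetical).
enum Stmt {
    Assign { target: &'static str, value: Expr },
    SinkCall { arg: &'static str },
}

/// Seeds every parameter as tainted (the wildcard behaviour), propagates
/// taint through assignments, and reports whether a sink sees tainted data.
fn sink_fires(params: &[&'static str], body: &[Stmt]) -> bool {
    let mut tainted: HashSet<&'static str> = params.iter().copied().collect();
    for stmt in body {
        match stmt {
            Stmt::Assign { target, value } => match value {
                // Copying a tainted variable taints the target.
                Expr::Var(src) if tainted.contains(src) => {
                    tainted.insert(*target);
                }
                // Anything else (literal, clean variable) overwrites it.
                _ => {
                    tainted.remove(target);
                }
            },
            Stmt::SinkCall { arg } => {
                if tainted.contains(arg) {
                    return true;
                }
            }
        }
    }
    false
}

fn main() {
    // def handler(event): cmd = event; subprocess.call(cmd)
    let firing = [
        Stmt::Assign { target: "cmd", value: Expr::Var("event") },
        Stmt::SinkCall { arg: "cmd" },
    ];
    // def handler(event): cmd = "echo hello"; subprocess.call(cmd)
    let near_miss = [
        Stmt::Assign { target: "cmd", value: Expr::Literal },
        Stmt::SinkCall { arg: "cmd" },
    ];
    println!("param -> sink fires: {}", sink_fires(&["event"], &firing));
    println!("literal -> sink fires: {}", sink_fires(&["event"], &near_miss));
}
```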

Verification (re-run on branch)

94.6% re-measured · both dogfood exit 0 · cargo test 851 lib + integration, 0 failed · clippy -D warnings clean · fmt clean · baseline + Cargo.toml untouched · COMPATIBILITY.md updated.

Summary by CodeRabbit

Release Notes

New Features

  • Added wildcard parameter source detection in taint rules, enabling rules to flag any function parameter as a taint source without explicit naming.
  • Extended support across C, Go, Java, JavaScript, Kotlin, PHP, Python, and Ruby.

Documentation

  • Updated compatibility guide with wildcard parameter-source details.
  • Regenerated registry coverage statistics (load rate: 94.6%).

…ion-signature pattern-inside)

The dominant rejected taint-source shape across the registry is "a parameter of
the enclosing handler/function is user-controlled", written as a pattern-sources
patterns: block combining a focus-metavariable (or bare pattern: $X) with a
function-signature pattern-inside/pattern that binds $X as a parameter. The
bridge previously dropped these constraints, emptying the source role and
rejecting the rule.

Recognise this shape and compile it to an any-function-parameter wildcard source
(ParamName with the ANY_PARAM_WILDCARD sentinel). Each taint engine's seed_params
honours the sentinel by seeding every function parameter as tainted; use-site
matchers compare against the literal sentinel (which no real identifier equals),
so the wildcard only broadens parameter seeding, never expression-position
matches. Wired for python/js/ts/go/java/c/kotlin/ruby/php; C# carries it inertly.
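
A minimal sketch of the sentinel idea described above, not the actual foxguard code. The names `ANY_PARAM_WILDCARD` and `param_names_are_wildcard` come from this PR, but the sentinel's spelling, the function signatures, and `use_site_matches` are assumptions made for illustration. The point it shows: seeding treats the sentinel as "every parameter", while a use-site matcher that does plain string equality can never match it, because the sentinel is not a valid identifier.

```rust
use std::collections::HashSet;

/// Assumed spelling; the real value in taint_engine.rs is not shown in this PR page.
pub const ANY_PARAM_WILDCARD: &str = "<any-param>";

/// True when a ParamName matcher's name list contains the sentinel.
pub fn param_names_are_wildcard(names: &[String]) -> bool {
    names.iter().any(|n| n == ANY_PARAM_WILDCARD)
}

/// Seeding time: pick the formal parameters that start out tainted.
/// With the sentinel present, every parameter is seeded.
pub fn seed_params(names: &[String], formals: &[&str]) -> HashSet<String> {
    let wildcard = param_names_are_wildcard(names);
    formals
        .iter()
        .copied()
        .filter(|p| wildcard || names.iter().any(|n| n.as_str() == *p))
        .map(String::from)
        .collect()
}

/// Use site (hypothetical): literal comparison only, so the sentinel
/// never widens expression-position matches.
pub fn use_site_matches(names: &[String], ident: &str) -> bool {
    names.iter().any(|n| n == ident)
}

fn main() {
    let wildcard = vec![ANY_PARAM_WILDCARD.to_string()];
    let seeded = seed_params(&wildcard, &["req", "res"]);
    println!("seeded parameters: {}", seeded.len());
    println!("use-site match on `req`: {}", use_site_matches(&wildcard, "req"));
}
```

Under this model, the C# behaviour reported in the PR follows directly: an engine that only matches by name at the use site compares identifiers against the literal sentinel and never fires.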

Recognition is bounded: the seed metavariable must appear inside the first
parameter list of a function-definition pattern in the same block, so an
unrelated focus metavariable is not treated as a parameter source.
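
The bounded recognition can be approximated with a balanced-paren scan plus a token-boundary check. The sketch below is an assumption about how such a check could look, not the bridge's implementation; `first_param_list` and `metavar_in_first_param_list` are hypothetical names. It also assumes the caller has already established that the pattern text is a function definition, so a call pattern such as `get_input($X)` must be rejected before this point.

```rust
/// Returns the text between the first '(' and its matching ')', if any.
fn first_param_list(signature: &str) -> Option<&str> {
    let open = signature.find('(')?;
    let mut depth = 0usize;
    for (i, ch) in signature[open..].char_indices() {
        match ch {
            '(' => depth += 1,
            ')' => {
                // The scan starts at '(' so depth is at least 1 here.
                depth -= 1;
                if depth == 0 {
                    return Some(&signature[open + 1..open + i]);
                }
            }
            _ => {}
        }
    }
    None // unbalanced parentheses
}

fn is_ident_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// True when `metavar` (for example "$ARG") occurs as a whole token
/// inside the first parameter list of `signature`.
fn metavar_in_first_param_list(signature: &str, metavar: &str) -> bool {
    if metavar.is_empty() {
        return false;
    }
    let Some(params) = first_param_list(signature) else {
        return false;
    };
    params.match_indices(metavar).any(|(start, m)| {
        let before_ok = params[..start]
            .chars()
            .next_back()
            .map_or(true, |c| !is_ident_char(c));
        let after_ok = params[start + m.len()..]
            .chars()
            .next()
            .map_or(true, |c| !is_ident_char(c));
        before_ok && after_ok
    })
}

fn main() {
    let sig = "def $HANDLER(..., $ARG, ...): ...";
    println!("$ARG is a parameter: {}", metavar_in_first_param_list(sig, "$ARG"));
    // "$RE" is only a prefix of "$REQ" and "$RES", so it must not match.
    let js = "function $F($REQ, $RES) { ... }";
    println!("$RE is a parameter: {}", metavar_in_first_param_list(js, "$RE"));
}
```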

registry coverage: 93.5% (2004) -> 94.6% (2028), +24 rules; taint
unsupported-shape 126 -> 102. Adds firing + clean safe-near-miss bridge tests
for JS, Python, and Java, plus an over-broad guard test.

coderabbitai Bot commented Jun 18, 2026


Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ab4be45a-354c-475b-b2b7-b568fd4399e2

📥 Commits

Reviewing files that changed from the base of the PR and between f882be8 and b4c765c.

📒 Files selected for processing (12)
  • COMPATIBILITY.md
  • docs/parity/registry-coverage.md
  • src/rules/c_taint.rs
  • src/rules/go_taint.rs
  • src/rules/java_taint.rs
  • src/rules/javascript_taint.rs
  • src/rules/kotlin_taint.rs
  • src/rules/php_taint.rs
  • src/rules/python_taint.rs
  • src/rules/ruby_taint.rs
  • src/rules/semgrep_taint.rs
  • src/rules/taint_engine.rs

📝 Walkthrough

Walkthrough

Adds an ANY_PARAM_WILDCARD sentinel and param_names_are_wildcard helper to taint_engine.rs. Extends the Semgrep YAML bridge (semgrep_taint.rs) to detect and compile the patterns:/focus-metavariable "parameter-as-source" shape into a wildcard ParamName matcher. Propagates wildcard seeding into all eight per-language taint engines. Updates COMPATIBILITY.md and regenerates registry coverage statistics.

Changes

Any-parameter wildcard taint source

| Layer / File(s) | Summary |
|---|---|
| ANY_PARAM_WILDCARD sentinel and helper: src/rules/taint_engine.rs | Adds ANY_PARAM_WILDCARD public constant and param_names_are_wildcard helper that checks whether a ParamName matcher's names list contains the sentinel; scoped to seeding-time only. |
| Semgrep bridge, parameter-as-source detection and compilation: src/rules/semgrep_taint.rs | In compile_entry, adds a specialized detection path for patterns: + MatcherRole::Source that collects seed metavariables and function-signature texts, applies balanced-paren and token-boundary heuristics to verify first-parameter-list membership, and emits a ParamName(ANY_PARAM_WILDCARD) matcher on success; falls back to existing subitem extraction otherwise. Includes new bridge-level tests for JS, Python, and Java (fire and near-miss cases). |
| Per-language wildcard seeding: src/rules/c_taint.rs, src/rules/go_taint.rs, src/rules/javascript_taint.rs, src/rules/java_taint.rs, src/rules/kotlin_taint.rs, src/rules/php_taint.rs, src/rules/python_taint.rs, src/rules/ruby_taint.rs | Extends ParamName matching in seed_param_sources / collect_param_sources in all eight language engines to also seed taint when param_names_are_wildcard returns true. Java and Kotlin additionally detect ANY_PARAM_WILDCARD inline and track it via a wildcard boolean before iterating formal parameters. |
| Docs and coverage update: COMPATIBILITY.md, docs/parity/registry-coverage.md | Documents the parameter-as-source shape in COMPATIBILITY.md; regenerates registry-coverage.md with updated headline load rate (93.5% → 94.6%), skip-reason histogram, per-language breakdown, and top skip-reason tables. |

Sequence Diagram(s)

sequenceDiagram
  participant SemgrepYAML as Semgrep YAML Rule
  participant Bridge as semgrep_taint compile_entry
  participant Detector as detect_param_as_source helpers
  participant TaintEngine as taint_engine ANY_PARAM_WILDCARD
  participant LangEngine as Per-language seed_param_sources

  SemgrepYAML->>Bridge: patterns: block with focus-metavariable + pattern-inside signature
  Bridge->>Detector: attempt parameter-as-source recognition
  Detector->>Detector: collect $SEED metavariables and signature texts
  Detector->>Detector: verify $SEED in first parameter list (balanced-paren + token-boundary)
  alt shape recognized
    Detector-->>Bridge: ParamName { names: [ANY_PARAM_WILDCARD] }
    Bridge-->>LangEngine: emit wildcard matcher, return early
  else shape not recognized
    Detector-->>Bridge: None
    Bridge-->>LangEngine: fallback subitem extraction
  end
  LangEngine->>TaintEngine: param_names_are_wildcard(names)
  TaintEngine-->>LangEngine: true (wildcard) → seed every parameter as tainted

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • 0sec-labs/foxguard#506: Both PRs modify src/rules/semgrep_taint.rs to handle Semgrep patterns: blocks inside taint pattern-* entries; the new parameter-as-source wildcard compilation is a specialized case built on top of that patterns:-flattening behavior.

Poem

🐇 Hippity-hop, a wildcard appears,
Every parameter tainted, no need for fears!
The bridge sniffs the pattern, the paren-depth too,
Then seeds all the params — $X, $EVENT, and $YOU.
From Java to Ruby, the sentinel rings clear:
ANY_PARAM_WILDCARD hops here! 🥕


@peaktwilight peaktwilight merged commit ede13a7 into main Jun 18, 2026
17 of 18 checks passed
@peaktwilight peaktwilight deleted the feat/taint-param-source-focus-inside branch June 18, 2026 17:35
exec(cmd);
}
"#;
let tree = parse_file(src, Language::JavaScript).expect("js fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

exec(cmd);
}
"#;
let tree = parse_file(src, Language::JavaScript).expect("js fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

cmd = event
subprocess.call(cmd)
"#;
let tree = parse_file(src, Language::Python).expect("python fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

cmd = "echo hello"
subprocess.call(cmd)
"#;
let tree = parse_file(src, Language::Python).expect("python fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

/// any-parameter wildcard (guards against over-broad seeding).
#[test]
fn non_param_focus_block_is_not_treated_as_param_source() {
let v: YamlValue = serde_yaml_ng::from_str(

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.unwrap() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Java).expect("java fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Java).expect("java fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match
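
One way to address the `.expect()` findings above in test code is to let the test return a `Result` and propagate with `?`, which Rust's test harness supports. The sketch below is illustrative only: `parse_fixture` and `ParseError` are hypothetical stand-ins, since the signature of the project's `parse_file` is not shown on this page. Whether the lint should apply to test fixtures at all is a separate judgment call.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical error type standing in for whatever parse_file returns.
#[derive(Debug)]
struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.0)
    }
}

impl Error for ParseError {}

/// Hypothetical stand-in for parse_file: splits a fixture into
/// non-empty trimmed lines and rejects an empty fixture.
fn parse_fixture(src: &str) -> Result<Vec<&str>, ParseError> {
    if src.trim().is_empty() {
        return Err(ParseError("empty fixture".to_string()));
    }
    Ok(src.lines().map(str::trim).filter(|l| !l.is_empty()).collect())
}

/// Same shape a #[test] function can take: return a Result and use `?`
/// so a parse failure is reported as an error instead of a panic.
fn check_js_fixture() -> Result<(), Box<dyn Error>> {
    let src = "function handler(cmd) {\n  exec(cmd);\n}";
    let tree = parse_fixture(src)?;
    assert_eq!(tree.len(), 3);
    Ok(())
}

fn main() {
    match check_js_fixture() {
        Ok(()) => println!("fixture parsed"),
        Err(e) => eprintln!("fixture failed: {e}"),
    }
}
```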
