COMPATIBILITY.md (1 addition, 0 deletions)
@@ -111,6 +111,7 @@ Taint rules (`mode: taint`):
- `mode: taint` for Python, JavaScript/TypeScript, Go, Java, C, and Kotlin; taint rules targeting other languages are skipped with a warning
- `pattern-sources`, `pattern-sinks`, `pattern-sanitizers` — each entry may be a single `pattern:` string, a `pattern-either:` list (nested `pattern-either:` is supported and flattens recursively), or a `patterns:` AND-block (see below)
- **`patterns:` AND-blocks inside source/sink/sanitizer entries** — foxguard extracts all `pattern:` and `pattern-either:` sub-items as expressible node-shape matchers. Sub-items that are constraint-only operators (`pattern-inside:`, `pattern-not-inside:`, `pattern-not:`, `pattern-not-regex:`, `focus-metavariable:`, `metavariable-regex:`, `metavariable-comparison:`, `metavariable-pattern:`, `metavariable-analysis:`, `metavariable-type:`) are **dropped with a per-key warning**. This makes the compiled matcher **slightly broader** than the original Semgrep rule (the AND narrowing is lost), so foxguard may report findings that Semgrep would suppress. This broadening is intentional — it is better to over-report than to silently drop the rule. A `patterns:` block that produces no expressible matcher (only constraint-only sub-items) is warn-skipped without aborting sibling entries. If all source or sink entries are warn-skipped and none survive, the whole rule is skipped.
+- **Parameter-as-source shape (`focus-metavariable` + a function-signature `pattern-inside`/`pattern`)** — a `pattern-sources` `patterns:` block of the form "a metavariable `$X` that is a parameter of an enclosing function" (i.e. a `focus-metavariable: $X` or bare `pattern: $X` together with a function-definition context whose parameter list contains `$X`) is recognised and compiled to an **any-function-parameter** taint source. Every parameter of every function/method in the file is seeded as tainted (matching Semgrep's any-parameter semantics for this shape). Supported for Python, JavaScript/TypeScript, Go, Java, C, Kotlin, Ruby, and PHP; C# carries the source inertly (no parameter-scope seeding). The recognition is bounded: the seed metavariable must genuinely appear inside the first parameter list of a function-definition pattern in the same block, so an unrelated focus metavariable (e.g. `focus-metavariable: $X` over `pattern: get_input($X)`) is *not* treated as a parameter source and falls through to the normal graceful-degradation extraction.
- Supported `pattern:` shapes:
  - bare identifier (`request`) — a source-only shape compiled to a parameter-name match
  - dotted attribute chain (`request.data`, `request.json`) — nested chains flatten to `leftmost root + outermost field` (matches the engine's one-level attribute propagation)
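The graceful-degradation rule for `patterns:` AND-blocks can be pictured as a small flattening pass. The sketch below is hypothetical: `SubItem`, `extract_matchers`, and `rule_survives` are illustrative names rather than foxguard's API, and it assumes the surviving matchers are simply collected into a list. It keeps `pattern:` strings, recurses into nested `pattern-either:`, emits one warning per dropped constraint-only key, and returns `None` when nothing expressible remains so the caller can warn-skip that entry alone.

```rust
/// Illustrative model of one sub-item in a `patterns:` AND-block
/// (hypothetical type, not foxguard's real representation).
pub enum SubItem {
    Pattern(String),
    PatternEither(Vec<SubItem>),
    /// A constraint-only operator such as `pattern-inside` or
    /// `focus-metavariable`, identified by its YAML key.
    Constraint(String),
}

/// Collect the expressible node-shape matchers from an AND-block.
/// Dropping constraints broadens the match; `None` means the entry has
/// no expressible matcher and should be warn-skipped on its own.
pub fn extract_matchers(block: &[SubItem], warnings: &mut Vec<String>) -> Option<Vec<String>> {
    let mut out = Vec::new();
    collect(block, &mut out, warnings);
    if out.is_empty() {
        None
    } else {
        Some(out)
    }
}

fn collect(items: &[SubItem], out: &mut Vec<String>, warnings: &mut Vec<String>) {
    for item in items {
        match item {
            SubItem::Pattern(p) => out.push(p.clone()),
            // Nested `pattern-either:` flattens recursively.
            SubItem::PatternEither(alts) => collect(alts, out, warnings),
            // One warning per dropped constraint-only key.
            SubItem::Constraint(key) => {
                warnings.push(format!("dropped `{key}`: matcher is broader than the Semgrep rule"))
            }
        }
    }
}

/// A rule survives only if at least one source entry and at least one
/// sink entry produced a matcher.
pub fn rule_survives(sources: &[Option<Vec<String>>], sinks: &[Option<Vec<String>>]) -> bool {
    sources.iter().any(Option::is_some) && sinks.iter().any(Option::is_some)
}
```

Under these assumptions, a block whose only survivor is `request.args` matches every `request.args` access, including ones the dropped `pattern-inside` would have excluded.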
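The bounded recognition check for parameter-as-source blocks can be approximated with plain text scanning. This is a hypothetical sketch (`seed_in_first_param_list` and `is_param_source` are invented names); it assumes the caller has already picked out which patterns in the block are function definitions, and it tokenises on characters that cannot appear in a metavariable name. It only inspects the first parenthesised list, so a seed that appears in the body, or only in a call such as `get_input($X)`, is not treated as a parameter.

```rust
/// True when `seed` (e.g. "$X") appears as a whole token inside the first
/// parenthesised parameter list of `fn_def_pattern`.
pub fn seed_in_first_param_list(seed: &str, fn_def_pattern: &str) -> bool {
    let Some(open) = fn_def_pattern.find('(') else {
        return false;
    };
    // Walk forward from the opening paren to its matching close, tracking nesting.
    let mut depth = 0usize;
    let mut close = None;
    for (i, c) in fn_def_pattern[open..].char_indices() {
        match c {
            '(' => depth += 1,
            ')' => {
                depth -= 1;
                if depth == 0 {
                    close = Some(open + i);
                    break;
                }
            }
            _ => {}
        }
    }
    let Some(close) = close else {
        return false;
    };
    // Split on characters that cannot be part of a metavariable or identifier.
    fn_def_pattern[open + 1..close]
        .split(|c: char| !(c == '$' || c == '_' || c.is_ascii_alphanumeric()))
        .any(|tok| tok == seed)
}

/// The block is a parameter source only if some function-definition pattern
/// in the same block lists the seed metavariable among its parameters.
pub fn is_param_source(seed: &str, fn_def_patterns: &[&str]) -> bool {
    fn_def_patterns.iter().any(|p| seed_in_first_param_list(seed, p))
}
```

Requiring a whole-token match is what keeps a `$XY` parameter from satisfying a `$X` seed.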
docs/parity/registry-coverage.md (20 additions, 20 deletions)
@@ -12,33 +12,33 @@ Measures how well foxguard's existing Semgrep-compat YAML loader (`src/rules/sem
| Rule files scanned | 2070 |
| Files with YAML parse errors | 0 |
| Total rules | 2144 |
-| Rules loaded OK | 2004 (93.5%) |
-| Rules skipped | 140 (6.5%) |
+| Rules loaded OK | 2028 (94.6%) |
+| Rules skipped | 116 (5.4%) |

-**Headline load rate: 93.5%** (2004 / 2144 rules).
+**Headline load rate: 94.6%** (2028 / 2144 rules).

## Skip-reason histogram

Sorted by frequency. The reason names the operator/key that blocks the rule today.

| Skip reason | Rules | % of skipped | % of all rules |
|---|---:|---:|---:|
-| `mode: taint (unsupported shape)` | 126 | 90.0% | 5.9% |
-| `generic mode (languages: [generic])` | 5 | 3.6% | 0.2% |
-| `taint: pattern-propagators` | 5 | 3.6% | 0.2% |
-| `mode: taint (unsupported language: apex)` | 3 | 2.1% | 0.1% |
-| `mode: taint (unsupported language: swift)` | 1 | 0.7% | 0.0% |
+| `mode: taint (unsupported shape)` | 102 | 87.9% | 4.8% |
+| `generic mode (languages: [generic])` | 5 | 4.3% | 0.2% |
+| `taint: pattern-propagators` | 5 | 4.3% | 0.2% |
+| `mode: taint (unsupported language: apex)` | 3 | 2.6% | 0.1% |
+| `mode: taint (unsupported language: swift)` | 1 | 0.9% | 0.0% |

## Priority order — operator/feature backlog

Matcher capabilities (implementable in `semgrep_compat.rs` / `semgrep_taint.rs`) ranked by how many registry rules each would unlock. These are independent of adding new language grammars. Build top-down.

| Rank | Capability to add | Rules unlocked |
|---:|---|---:|
-| 1 | `mode: taint (unsupported shape)` | 126 |
+| 1 | `mode: taint (unsupported shape)` | 102 |
| 2 | `taint: pattern-propagators` | 5 |

-Operator/feature gaps account for **131 rules** (6.1% of all rules). Closing the top of this list is the highest-leverage parity work that does not require a new parser.
+Operator/feature gaps account for **107 rules** (5.0% of all rules). Closing the top of this list is the highest-leverage parity work that does not require a new parser.
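The report's percentages use two denominators: the `% of skipped` column divides by the skipped count, while load rates and `% of all rules` divide by the rule total. A one-line helper, assuming one-decimal rounding as in the tables, reproduces the figures in this diff; `pct` and `totals_consistent` are illustrative names, not part of the coverage tooling.

```rust
/// Format `part / whole` as a percentage rounded to one decimal place.
pub fn pct(part: u32, whole: u32) -> String {
    format!("{:.1}%", f64::from(part) * 100.0 / f64::from(whole))
}

/// Loaded and skipped rules must always add up to the scanned total.
pub fn totals_consistent(total: u32, loaded: u32, skipped: u32) -> bool {
    loaded + skipped == total
}
```

For example, the 24 rules this change unlocks come entirely out of the `mode: taint (unsupported shape)` bucket (126 to 102), so the total stays at 2144 while the loaded count rises from 2004 to 2028, and the operator/feature gap total is 102 + 5 = 107.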

## Priority order — missing language grammars

@@ -58,18 +58,18 @@ Language is the rule's first declared language (js/ts/jsx/tsx collapsed to `java

| Language | Total | Loaded | Skipped | Load rate |
|---|---:|---:|---:|---:|
-| python | 423 | 401 | 22 | 94.8% |
+| python | 423 | 402 | 21 | 95.0% |
| hcl | 359 | 359 | 0 | 100.0% |
-| javascript | 243 | 208 | 35 | 85.6% |
+| javascript | 243 | 221 | 22 | 90.9% |
| regex | 237 | 237 | 0 | 100.0% |
-| java | 131 | 111 | 20 | 84.7% |
+| java | 131 | 115 | 16 | 87.8% |
| generic | 103 | 98 | 5 | 95.1% |
| yaml | 100 | 100 | 0 | 100.0% |
-| go | 97 | 84 | 13 | 86.6% |
+| go | 97 | 86 | 11 | 88.7% |
| ruby | 92 | 76 | 16 | 82.6% |
| php | 63 | 49 | 14 | 77.8% |
| solidity | 50 | 49 | 1 | 98.0% |
-| csharp | 48 | 38 | 10 | 79.2% |
+| csharp | 48 | 42 | 6 | 87.5% |
| dockerfile | 39 | 39 | 0 | 100.0% |
| ocaml | 34 | 34 | 0 | 100.0% |
| scala | 23 | 23 | 0 | 100.0% |
@@ -90,13 +90,13 @@ Language is the rule's first declared language (js/ts/jsx/tsx collapsed to `java
## Top skip reasons per language

- **apex**: `mode: taint (unsupported language: apex)` (3)
-- **csharp**: `mode: taint (unsupported shape)` (9), `taint: pattern-propagators` (1)
+- **csharp**: `mode: taint (unsupported shape)` (5), `taint: pattern-propagators` (1)
- **generic**: `generic mode (languages: [generic])` (5)
-- **go**: `mode: taint (unsupported shape)` (13)
-- **java**: `mode: taint (unsupported shape)` (17), `taint: pattern-propagators` (3)
-- **javascript**: `mode: taint (unsupported shape)` (35)
+- **go**: `mode: taint (unsupported shape)` (11)
+- **java**: `mode: taint (unsupported shape)` (13), `taint: pattern-propagators` (3)
+- **javascript**: `mode: taint (unsupported shape)` (22)
- **php**: `mode: taint (unsupported shape)` (14)
-- **python**: `mode: taint (unsupported shape)` (22)
+- **python**: `mode: taint (unsupported shape)` (21)
- **ruby**: `mode: taint (unsupported shape)` (15), `taint: pattern-propagators` (1)
- **solidity**: `mode: taint (unsupported shape)` (1)
- **swift**: `mode: taint (unsupported language: swift)` (1)
src/rules/c_taint.rs (3 additions, 1 deletion)
@@ -527,7 +527,9 @@ fn collect_param_sources(
if let Some(name) = param_name {
for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == &name) {
+if names.iter().any(|n| n == &name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
out.push(TaintSource {
var_name: Some(name.clone()),
description: description.clone(),
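The diff calls `crate::rules::taint_engine::param_names_are_wildcard` and `ANY_PARAM_WILDCARD` without showing their definitions. A minimal sketch of how they could fit together follows. The constant's value `"$PARAM"` is an assumption taken from the `$PARAM` example in the Java and Kotlin comments, the helper body is assumed, and `ParamName`/`seed_param` are simplified stand-ins for the `NodeMatcher::ParamName` loop. As in the adapters, the first matcher that either names the parameter or carries the wildcard wins, and the loop stops there.

```rust
/// ASSUMED value: the Java/Kotlin comments cite `$PARAM` as the wildcard name.
pub const ANY_PARAM_WILDCARD: &str = "$PARAM";

/// Assumed body: true when any name in a `ParamName` list is the wildcard.
pub fn param_names_are_wildcard(names: &[String]) -> bool {
    names.iter().any(|n| n == ANY_PARAM_WILDCARD)
}

/// Simplified stand-in for `NodeMatcher::ParamName { names, description }`.
pub struct ParamName {
    pub names: Vec<String>,
    pub description: String,
}

/// Mirrors the per-parameter loop in the adapters: return the description of
/// the first matcher that names `param` or carries the wildcard.
pub fn seed_param<'a>(param: &str, sources: &'a [ParamName]) -> Option<&'a str> {
    for m in sources {
        if m.names.iter().any(|n| n == param) || param_names_are_wildcard(&m.names) {
            // The adapters call `state.taint(...)` here and then `break`.
            return Some(m.description.as_str());
        }
    }
    None
}
```

Folding the wildcard into the existing `any` check keeps the change to each loop-based adapter down to one extra `||` clause.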
src/rules/go_taint.rs (3 additions, 1 deletion)
@@ -542,7 +542,9 @@ fn seed_param_sources(params: Node<'_>, source: &str, spec: &TaintSpec, state: &
let param_name = node_text(inner, source);
for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == param_name) {
+if names.iter().any(|n| n == param_name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
let line = inner.start_position().row + 1;
state.taint(param_name.to_string(), description.clone(), line);
break;
src/rules/java_taint.rs (7 additions, 1 deletion)
@@ -253,11 +253,17 @@ fn collect_param_sources(
) {
let mut annotation_names: Vec<&str> = Vec::new();
let mut bare_names: Vec<&str> = Vec::new();
+// `$`-prefixed name (`$PARAM`) is the any-parameter wildcard compiled from a
+// Semgrep `pattern-inside: function(...,$ARG,...) + focus-metavariable: $ARG`
+// source block: seed *every* parameter of the enclosing scope.
+let mut wildcard = false;
for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, .. } = matcher {
for name in names {
if let Some(rest) = name.strip_prefix('@') {
annotation_names.push(rest);
+} else if name == crate::rules::taint_engine::ANY_PARAM_WILDCARD {
+wildcard = true;
} else {
bare_names.push(name.as_str());
}
@@ -283,7 +289,7 @@ fn collect_param_sources(
hops: 0,
},
);
-} else if bare_names.contains(&name) {
+} else if bare_names.contains(&name) || wildcard {
state.taint(
name.to_string(),
TaintInfo {
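Java and Kotlin do not call the helper; they pre-split the `ParamName` names once per function. The sketch below mirrors that split as far as the visible hunks show it: `@`-prefixed entries become annotation names, the wildcard sets a flag, and the rest are bare names. The names `classify` and `describe_seed` are hypothetical, the wildcard value `"$PARAM"` is assumed, and the description strings are copied from the Kotlin hunk.

```rust
/// ASSUMED wildcard value; the real constant lives in `taint_engine`.
const ANY_PARAM_WILDCARD: &str = "$PARAM";

/// Split `ParamName` entries into annotation names, bare names and a
/// wildcard flag, the way the Java/Kotlin `collect_param_sources` loops do.
pub fn classify(names: &[String]) -> (Vec<&str>, Vec<&str>, bool) {
    let mut annotations = Vec::new();
    let mut bare = Vec::new();
    let mut wildcard = false;
    for name in names {
        if let Some(rest) = name.strip_prefix('@') {
            annotations.push(rest);
        } else if name == ANY_PARAM_WILDCARD {
            wildcard = true;
        } else {
            bare.push(name.as_str());
        }
    }
    (annotations, bare, wildcard)
}

/// Per-parameter decision: an annotation match is tried first; otherwise a
/// bare-name match or the wildcard seeds the parameter.
pub fn describe_seed(param: &str, param_annotations: &[&str], names: &[String]) -> Option<String> {
    let (annotations, bare, wildcard) = classify(names);
    if let Some(ann) = param_annotations.iter().find(|a| annotations.contains(*a)) {
        Some(format!("@{} parameter '{}'", ann, param))
    } else if bare.contains(&param) || wildcard {
        Some(format!("parameter '{}'", param))
    } else {
        None
    }
}
```

Because the annotation branch is tested first, the wildcard only supplies the generic `parameter 'name'` description when no annotation matched.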
src/rules/javascript_taint.rs (6 additions, 2 deletions)
@@ -1125,7 +1125,9 @@ impl<'a> TaintLanguageAdapter<CrossFileInfo<'a>> for JsTaintAdapter {
let line = single.start_position().row + 1;
for matcher in &ctx.spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == name) {
+if names.iter().any(|n| n == name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
state.taint(name.to_string(), description.clone(), line);
break;
}
@@ -1202,7 +1204,9 @@ fn seed_param_sources(params: Node<'_>, source: &str, spec: &TaintSpec, state: &

for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == param_name) {
+if names.iter().any(|n| n == param_name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
let line = child.start_position().row + 1;
state.taint(param_name.to_string(), description.clone(), line);
break;
src/rules/kotlin_taint.rs (7 additions, 1 deletion)
@@ -447,11 +447,17 @@ fn collect_param_sources(
// Collect annotation strings and bare names from the spec once.
let mut annotation_names: Vec<&str> = Vec::new();
let mut bare_names: Vec<&str> = Vec::new();
+// `$`-prefixed name (`$PARAM`) is the any-parameter wildcard: seed every
+// parameter of the function (compiled from a Semgrep
+// `pattern-inside: fun(...,$ARG,...) + focus-metavariable: $ARG` block).
+let mut wildcard = false;
for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, .. } = matcher {
for name in names {
if let Some(rest) = name.strip_prefix('@') {
annotation_names.push(rest);
+} else if name == crate::rules::taint_engine::ANY_PARAM_WILDCARD {
+wildcard = true;
} else {
bare_names.push(name.as_str());
}
@@ -494,7 +500,7 @@ fn collect_param_sources(
description: format!("@{} parameter '{}'", ann, name),
line: ch.start_position().row + 1,
});
-} else if bare_names.contains(&name) {
+} else if bare_names.contains(&name) || wildcard {
out.push(TaintSource {
var_name: Some(name.to_string()),
description: format!("parameter '{}'", name),
src/rules/php_taint.rs (3 additions, 1 deletion)
@@ -392,7 +392,9 @@ fn seed_param_sources(params: Node<'_>, source: &str, spec: &TaintSpec, state: &
let name = node_text(v, source);
for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == name) {
+if names.iter().any(|n| n == name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
let line = v.start_position().row + 1;
state.taint(name.to_string(), description.clone(), line);
break;
src/rules/python_taint.rs (3 additions, 1 deletion)
@@ -465,7 +465,9 @@ fn seed_param_sources(params: Node<'_>, source: &str, spec: &TaintSpec, state: &

for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == param_name) {
+if names.iter().any(|n| n == param_name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
let line = child.start_position().row + 1;
state.taint(param_name.to_string(), description.clone(), line);
break;
src/rules/ruby_taint.rs (3 additions, 1 deletion)
@@ -367,7 +367,9 @@ fn seed_param_sources(params: Node<'_>, source: &str, spec: &TaintSpec, state: &
let param_name = node_text(child, source);
for matcher in &spec.sources {
if let NodeMatcher::ParamName { names, description } = matcher {
-if names.iter().any(|n| n == param_name) {
+if names.iter().any(|n| n == param_name)
+|| crate::rules::taint_engine::param_names_are_wildcard(names)
+{
let line = child.start_position().row + 1;
state.taint(param_name.to_string(), description.clone(), line);
break;