
feat(generic): named-capture metavariable constraints (→ 96.4%) #538

Merged: peaktwilight merged 1 commit into main from feat/generic-mode-metavar-constraints on Jun 19, 2026
@peaktwilight peaktwilight commented Jun 19, 2026

Collaborator

generic-mode named-capture metavariable constraints: load rate 96.2% → 96.4%

Teaches generic mode (languages: [generic]) to do two things. (1) Read patterns: AND-block arms inside pattern-either; the generic builder previously read only plain pattern/pattern-regex arms. (2) Enforce metavariable-regex/metavariable-comparison constraints against named regex capture groups: a generic-mode metavariable ($X) maps to a (?P<X>...) named capture, and each constraint is evaluated against the captured text at match time (the regex must match; a comparison parses the captured number and evaluates the expression).

+4 rules → 96.4% (2066/2144); generic-mode skip bucket 5 → 1. Flipped: bun-/npm-missing-minimum-release-age and uv-missing-dependency-cooldown (3 rules, constraint-enforced), plus detected-private-key. For detected-private-key, the metavariable-analysis: entropy narrowing is dropped, matching the already-baseline detected-aws-account-id/detected-github-token behaviour; the positive is the literal PEM header, so false-positive risk is low. use-absolute-workdir stays skipped: it needs metavariable-pattern, which is not implemented, and its only arm now explicitly refuses to load, so there is no FP broadening.

Precision — constraints ENFORCED, regression-checked

A constraint referencing an unknown capture refuses to load (no silent broadening); a metavariable-pattern arm that can't be enforced refuses to load. A strict-deserialization regression that dropped 2 AST rules was caught via a full before/after per-rule load diff and fixed (lenient decode of the new patterns field) → net is a clean +4, zero regressions.
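
The refuse-to-load check for unknown captures can be pictured roughly as follows. This is a minimal sketch, not the PR's code: `named_captures` and `check_constraint_targets` are hypothetical helpers, and the name scan is deliberately naive (it looks only for the literal `(?P<` prefix).

```rust
use std::collections::HashSet;

/// Collect the names of `(?P<NAME>...)` groups in a regex source string.
/// Naive on purpose: scans for the literal `(?P<` prefix and ignores
/// escaping and character classes (assumption for illustration).
fn named_captures(pattern: &str) -> HashSet<String> {
    let mut names = HashSet::new();
    let mut rest = pattern;
    while let Some(start) = rest.find("(?P<") {
        let after = &rest[start + 4..];
        match after.find('>') {
            Some(end) => {
                names.insert(after[..end].to_string());
                rest = &after[end + 1..];
            }
            None => break,
        }
    }
    names
}

/// Return an error when a constraint names a metavariable that no pattern
/// captures, rather than silently dropping the constraint.
fn check_constraint_targets(patterns: &[&str], constrained: &[&str]) -> Result<(), String> {
    let known: HashSet<String> = patterns.iter().flat_map(|p| named_captures(p)).collect();
    for metavar in constrained {
        let name = metavar.trim_start_matches('$');
        if !known.contains(name) {
            return Err(format!("constraint references unknown capture ${name}"));
        }
    }
    Ok(())
}
```

With an assumed pattern such as `min-release-age\s*=\s*(?P<AGE>\d+)`, a constraint on `$AGE` passes the check, while a constraint on `$DAYS` is rejected with an error.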

Constraint-enforcement tests: min-release-age = 3 (3<7) FIRES, = 7/= 30 do NOT; exclude-newer = "3 days" FIRES, "7 days" does NOT. Plus constraint_referencing_unknown_capture_refuses_to_load, unenforceable_metavariable_pattern_arm_refuses_to_load.
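
To make the numeric expectations concrete, here is a hedged sketch of how a comparison might be evaluated against captured text. `Op`, `Comparison`, and `comparison_holds` are illustrative names, not the PR's `MvConstraint` API, and treating a missing or non-numeric capture as "constraint not satisfied" is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical comparison operators; the PR's own types are not shown.
#[derive(Clone, Copy)]
enum Op { Lt, Le, Gt, Ge, Eq, Ne }

/// Hypothetical constraint: compare a named capture against a numeric literal.
struct Comparison {
    capture: String,
    op: Op,
    literal: f64,
}

/// True only when the capture exists, parses as a number, and satisfies
/// the comparison. Missing or non-numeric captures fail (assumption).
fn comparison_holds(c: &Comparison, captures: &HashMap<String, String>) -> bool {
    let value = match captures
        .get(&c.capture)
        .and_then(|text| text.trim().parse::<f64>().ok())
    {
        Some(v) => v,
        None => return false,
    };
    match c.op {
        Op::Lt => value < c.literal,
        Op::Le => value <= c.literal,
        Op::Gt => value > c.literal,
        Op::Ge => value >= c.literal,
        Op::Eq => value == c.literal,
        Op::Ne => value != c.literal,
    }
}
```

With the constraint `AGE < 7`, a captured `3` satisfies it, while `7` and `30` do not, which mirrors the numeric test expectations listed above.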

Verification

96.4% re-measured · both dogfood exit 0 · cargo test 0 failed · clippy -D warnings clean · fmt clean · baseline + Cargo.toml untouched · COMPATIBILITY.md updated.

Summary by CodeRabbit

Release Notes

  • New Features

    • Generic mode now enforces metavariable-regex and metavariable-comparison constraints when referencing named regex capture groups, improving pattern matching precision.
    • Added support for nested AND-blocks within pattern-either clauses for more flexible rule composition.
  • Documentation

    • Updated compatibility documentation with expanded generic mode capabilities.
    • Refreshed registry coverage metrics with improved load rates.

…ures

Generic mode (`languages: [generic]`) now supports `pattern-either` arms that
are themselves `patterns:` AND-blocks, plus `metavariable-regex` /
`metavariable-comparison` constraints over named regex capture groups
(`(?P<NAME>...)` referenced as `$NAME`). The constraint is evaluated against the
captured group's text at match time and enforced (regex must match; comparison
parses the capture as a number), so a candidate fires only when every constraint
passes. `focus-metavariable: $NAME` narrows the reported span to the capture.
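
As an illustration of the span narrowing, the following sketch picks the reported byte range. `reported_span` and its parameters are hypothetical; falling back to the whole first match when the focus capture is absent is an assumption.

```rust
use std::collections::HashMap;
use std::ops::Range;

/// Choose the byte range to report for a match: the focus capture's span
/// when one is requested and present, otherwise the first regex match.
fn reported_span(
    first_match: Range<usize>,
    capture_spans: &HashMap<String, Range<usize>>,
    focus: Option<&str>,
) -> Range<usize> {
    focus
        // Accept both "$NAME" and "NAME" as the focus name.
        .and_then(|name| capture_spans.get(name.trim_start_matches('$')))
        .cloned()
        .unwrap_or(first_match)
}
```

For a match covering bytes `0..19` with an `AGE` capture at `18..19`, a focus of `$AGE` reports `18..19`, and no focus reports `0..19`.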

This flips the three package-manager minimum-release-age rules
(`bun-`, `npm-`, `uv-missing-...-age`) from rejected to loaded with their
constraints enforced (fires on too-low/invalid values, not on safe values).
`detected-private-key` also now loads via the existing pattern-either-arm path.

To preserve false-positive discipline, a `pattern-either` arm carrying an
unenforceable constraint (`metavariable-pattern` / `metavariable-analysis`)
refuses to load instead of silently dropping it; if it is the rule's only arm,
the rule stays skipped (so `use-absolute-workdir` and the entropy narrowing of
`detected-private-key` are not loaded broadened). Top-level `patterns:` blocks
keep the legacy load-broadened behaviour to avoid regressing rules that already
loaded that way. The new `PatternEntry.patterns` field is decoded leniently to
avoid breaking AST-bridge rules whose either-arms use unrelated nested shapes.
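
The strict versus legacy split described above might look roughly like this. It is a sketch under stated assumptions: `ClauseOutcome`, `ConstraintKind`, and `map_clause_sketch` are hypothetical names, with `strict = true` standing for a `pattern-either` arm and `strict = false` for a top-level `patterns:` block.

```rust
/// Hypothetical result of mapping one clause.
#[derive(Debug, PartialEq)]
enum ClauseOutcome {
    /// Every constraint can be checked at match time.
    Enforced,
    /// Legacy behaviour: load the clause without the unenforceable constraint.
    LoadedBroadened,
    /// Strict behaviour: refuse to load rather than broaden.
    Refused,
}

/// Hypothetical constraint kinds named in the commit message.
#[derive(Clone, Copy, PartialEq)]
enum ConstraintKind {
    MetavariableRegex,
    MetavariableComparison,
    MetavariablePattern,
    MetavariableAnalysis,
}

fn is_enforceable(kind: ConstraintKind) -> bool {
    matches!(
        kind,
        ConstraintKind::MetavariableRegex | ConstraintKind::MetavariableComparison
    )
}

/// Decide what happens to a clause given its constraints and strictness.
fn map_clause_sketch(kinds: &[ConstraintKind], strict: bool) -> ClauseOutcome {
    if kinds.iter().all(|k| is_enforceable(*k)) {
        ClauseOutcome::Enforced
    } else if strict {
        ClauseOutcome::Refused
    } else {
        ClauseOutcome::LoadedBroadened
    }
}
```

Under these assumptions, an either-arm carrying `metavariable-pattern` is refused, while the same constraint in a top-level block still loads broadened, so rules that already loaded that way do not regress.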

Registry load rate: 2062 -> 2066 (96.2% -> 96.4%); generic-mode skip bucket
5 -> 1. Adds enforcement + refusal tests; regenerates coverage doc; updates
COMPATIBILITY.md.

coderabbitai Bot commented Jun 19, 2026


Caution

Review failed

Pull request was closed or merged during review

📝 Walkthrough


Extends Semgrep generic mode (languages: [generic]) to enforce metavariable-regex and metavariable-comparison constraints against named regex capture groups in patterns: blocks. Introduces a RegexConstraints matcher, refactors the loader to map nested patterns: blocks inside pattern-either arms with strictness semantics, and updates compatibility documentation and registry coverage metrics.

Changes

Generic-mode named-capture constraint enforcement

| Layer | File(s) | Summary |
|---|---|---|
| Data model: `PatternEntry`, `GenericPatternsClause`, `GenericEitherEntry` | `src/rules/semgrep_compat.rs`, `src/rules/generic_mode.rs` | `PatternEntry` gains an optional nested `patterns:` YAML field and `PatternClause` gains `Clone`; `GenericPatternsClause` gains `metavariable_regex`, `metavariable_comparison`, `focus_metavariable`, and `unsupported_constraint` fields; `GenericEitherEntry` gains a `patterns: Vec<GenericPatternsClause>` field for nested AND-blocks. |
| `MvConstraint` model, comparison parsing, and constraint builders | `src/rules/generic_mode.rs` | Introduces the `MvConstraint` enum, constraint evaluation against a named-capture map, builders for `metavariable-regex` and `metavariable-comparison`, and `parse_generic_comparison` for expressions with operator detection, metavariable wrappers (`int`/`str`/`float`), and numeric literal parsing into `f64`. |
| `RegexConstraints` matcher variant and `find_regex_constraints` dispatch | `src/rules/generic_mode.rs` | Adds `GenericMatcher::RegexConstraints` carrying compiled regexes, a constraint list, and an optional focus name; wires it into the generic match dispatch; implements `find_regex_constraints` to seed captures from the first regex match, require overlapping matches from additional regexes, evaluate all constraints, and choose the reported span from the focus capture or the first regex match. |
| `build_patterns_block` rework and `semgrep_compat` loader refactor | `src/rules/generic_mode.rs`, `src/rules/semgrep_compat.rs` | `build_patterns_block` rejects blocks with `unsupported_constraint`, aggregates constraints and focus across clauses, validates capture name existence, and builds `RegexConstraints` or falls back to combined/filtered semantics. `build_generic_mode_rules` introduces `map_clause` (strict/non-strict) and `map_either_arm` (decode and map nested `patterns:` YAML in strict mode). |
| Tests, existing test fixups, and documentation | `src/rules/generic_mode.rs`, `COMPATIBILITY.md`, `docs/parity/registry-coverage.md` | Existing `GenericEitherEntry` test literals updated with `patterns: Vec::new()`. New tests verify constraint enforcement, load refusal for unenforceable arms, and unknown capture rejection; includes a unit test for `parse_generic_comparison`. Documentation updated for new behavior and improved load rates. |
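
The comparison parsing summarised in the table can be sketched as below. This is not `parse_generic_comparison` itself, whose signature and return shape are only partly visible in the review snippets: `ParsedComparison` and `parse_comparison_sketch` are hypothetical, and normalising the operator so the metavariable is always on the left is a design choice made here for illustration.

```rust
/// Hypothetical parsed form, normalised so the metavariable is on the left.
#[derive(Debug, PartialEq)]
struct ParsedComparison {
    metavar: String,
    op: &'static str,
    literal: f64,
}

/// Strip an `int(...)`, `float(...)`, or `str(...)` wrapper if present.
fn strip_wrapper(side: &str) -> &str {
    let side = side.trim();
    for wrapper in ["int", "float", "str"] {
        let inner = side
            .strip_prefix(wrapper)
            .and_then(|s| s.trim_start().strip_prefix('('))
            .and_then(|s| s.strip_suffix(')'));
        if let Some(inner) = inner {
            return inner.trim();
        }
    }
    side
}

/// Mirror an operator for swapped operands (`7 >= $X` becomes `$X <= 7`).
fn flip(op: &'static str) -> &'static str {
    match op {
        "<" => ">",
        "<=" => ">=",
        ">" => "<",
        ">=" => "<=",
        other => other,
    }
}

fn parse_comparison_sketch(expr: &str) -> Option<ParsedComparison> {
    // Two-character operators come first so "<=" is not read as "<".
    const OPS: [&str; 6] = ["<=", ">=", "==", "!=", "<", ">"];
    let (idx, op) = OPS
        .iter()
        .find_map(|op| expr.find(*op).map(|i| (i, *op)))?;
    let lhs = strip_wrapper(&expr[..idx]);
    let rhs = strip_wrapper(&expr[idx + op.len()..]);
    if let Some(name) = lhs.strip_prefix('$') {
        let literal = rhs.parse::<f64>().ok()?;
        Some(ParsedComparison { metavar: name.to_string(), op, literal })
    } else if let Some(name) = rhs.strip_prefix('$') {
        let literal = lhs.parse::<f64>().ok()?;
        Some(ParsedComparison { metavar: name.to_string(), op: flip(op), literal })
    } else {
        None
    }
}
```

In this sketch, `int($AGE) < 604800` parses to `AGE < 604800`, and the flipped form `7 >= $DAYS` is normalised to `DAYS <= 7`; an expression with no operator or no metavariable yields `None`.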

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

  • 0sec-labs/foxguard#516: Extends the same GenericPatternsClause, GenericEitherEntry, and build_generic_mode_rules structures in the same two files (generic_mode.rs, semgrep_compat.rs) that this PR also modifies.
  • 0sec-labs/foxguard#532: Modifies generic_mode.rs's regex compilation and matching paths — directly adjacent to the pattern-regex and named-capture matching code extended in this PR.
  • 0sec-labs/foxguard#505: Implements metavariable-analysis parsing and enforcement in semgrep_compat.rs, the same file where this PR adds unsupported_constraint strictness handling for metavariable-analysis in nested pattern-either arms.

Poem

🐇 Hop, hop through the capture groups I go,
Named regex patterns now put on a show!
(?P<NAME>...) binds the clue,
Comparisons checked — no broadening slips through.
The matcher enforces, the loader stays strict,
Each constraint a fence no false match can pick! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding named-capture metavariable constraint support to generic mode, with the quantitative result (96.4% load rate). |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@peaktwilight peaktwilight merged commit 8c0860b into main Jun 19, 2026
18 of 19 checks passed
@peaktwilight peaktwilight deleted the feat/generic-mode-metavar-constraints branch June 19, 2026 13:17
Comment thread src/rules/generic_mode.rs
}

if positives.len() == 1 && negatives.is_empty() {
return Ok(positives.into_iter().next().expect("len==1"));


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match
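
One way to act on this finding is to return an error instead of relying on the preceding length check. This is a sketch only: `take_single` is a hypothetical helper, and the `String` error type is a placeholder for whatever error type the surrounding function actually returns.

```rust
/// Take the only element of `items`, or return an error describing the
/// mismatch instead of panicking.
fn take_single<T>(items: Vec<T>) -> Result<T, String> {
    let len = items.len();
    let mut iter = items.into_iter();
    match (iter.next(), iter.next()) {
        (Some(only), None) => Ok(only),
        _ => Err(format!("expected exactly one matcher, found {len}")),
    }
}
```

A call site could then use `take_single(positives)?` where the error types line up, which removes the panic path without depending on the `len() == 1` guard staying in place.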

Comment thread src/rules/generic_mode.rs
}
if positives.len() == 1 {
return Ok(GenericMatcher::Filtered {
positive: Box::new(positives.into_iter().next().expect("len==1")),


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

Comment thread src/rules/generic_mode.rs
/// ignore the tree, so a Rust dummy tree is fine.
fn loader_fires(rule_yaml: &str, source: &str) -> bool {
use crate::rules::semgrep_compat::parse_semgrep_str;
let rules = parse_semgrep_str(rule_yaml, "<test>").expect("rule must load");


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.expect() can panic at runtime — use proper error handling with ? or match

Comment thread src/rules/generic_mode.rs

#[test]
fn parse_generic_comparison_handles_int_wrapper() {
let (g, op, lit, lhs) = parse_generic_comparison("int($AGE) < 604800").unwrap();


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.unwrap() can panic at runtime — use proper error handling with ? or match

Comment thread src/rules/generic_mode.rs
assert_eq!(lit, 604800.0);
assert!(!lhs);
// Flipped operands.
let (g, op, lit, lhs) = parse_generic_comparison("7 >= $DAYS").unwrap();


foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)

.unwrap() can panic at runtime — use proper error handling with ? or match
