feat(grammars): apex/html/xml/dart/clojure search grammars (→ 93.4%) #531
Wire five new search-mode tree-sitter grammars into the Semgrep-compat loader so registry rules previously skipped as `unsupported language` now load:
- apex (tree-sitter-sfapex 3.0.0, apex::LANGUAGE)
- clojure (tree-sitter-clojure-orchard 0.2.5)
- html (tree-sitter-html 0.23)
- xml (tree-sitter-xml 0.7, LANGUAGE_XML)
- dart (tree-sitter-dart 0.2)

All six anchors wired per language (Language enum + Display, parser match arm, map_language, detect_language + comment markers, registry_coverage language_supported, generic/regex ALL_LANGUAGES fan-out, gen_rules_ts exhaustive matches, Cargo dep). Also teach the AST pattern unwrapper about clojure-orchard's `source` top-level node so `pattern:` rules match Clojure forms.

Registry load rate: 92.4% (1982) -> 93.4% (2002). search-mode apex/clojure/html/xml/dart unsupported-language buckets all hit 0; remaining apex skips (3) are taint-mode only.

Per-grammar tests: parser fixture parse + a pattern/pattern-regex search rule that loads and matches.
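The `source` top-level unwrapping mentioned in the description can be pictured with a toy tree. This is only a sketch under stated assumptions: `Node`, `WRAPPER_KINDS`, and `unwrap_pattern_root` are hypothetical names, not the project's real types or functions, and the only detail taken from the PR is that clojure-orchard's root node kind is `source`. The other wrapper kinds and the `list_lit`/`sym_lit` child kinds are illustrative.

```rust
// Hypothetical stand-in for a tree-sitter node; not the project's real types.
#[derive(Debug, Clone, PartialEq)]
struct Node {
    kind: String,
    children: Vec<Node>,
}

impl Node {
    fn new(kind: &str, children: Vec<Node>) -> Self {
        Node { kind: kind.to_string(), children }
    }
}

// Root node kinds that only wrap the parsed pattern snippet.
// `source` is the kind the PR attributes to clojure-orchard;
// the other two are illustrative examples of grammar root names.
const WRAPPER_KINDS: &[&str] = &["source", "program", "source_file"];

/// Descend through wrapper nodes while they hold exactly one child,
/// so matching starts at the form the rule author actually wrote.
fn unwrap_pattern_root(mut node: &Node) -> &Node {
    while WRAPPER_KINDS.contains(&node.kind.as_str()) && node.children.len() == 1 {
        node = &node.children[0];
    }
    node
}
```

With this sketch, a pattern parsed as `source` wrapping a single list form unwraps to the list form itself, while a `source` node holding several forms is left alone. Without such a step, a `pattern:` rule would be compared against the wrapper node rather than the form inside it.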
📝 Walkthrough
Adds end-to-end support for five new languages (Apex, Clojure, HTML, XML, and Dart) by introducing the corresponding tree-sitter crate dependencies, …
Changes
Language Expansion: Apex, Clojure, HTML, XML, Dart
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/parity/registry-coverage.md`:
- Around line 49-53: The section heading in the registry-coverage.md file is
labeled as "missing language grammars" but the table entries (generic mode,
mode: taint with unsupported languages) represent mode and language support gaps
rather than strictly grammar gaps. Retitle or reword the section heading to
accurately reflect that it covers mode and language support gaps, not just
grammar gaps, so the content aligns with the heading and prevents
misrepresentation during prioritization.
⛔ Files ignored due to path filters (1)
- `Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (9)
- `Cargo.toml`
- `docs/parity/registry-coverage.md`
- `src/bin/gen_rules_ts.rs`
- `src/bin/registry_coverage.rs`
- `src/engine/parser.rs`
- `src/engine/scanner.rs`
- `src/lib.rs`
- `src/rules/generic_mode.rs`
- `src/rules/semgrep_compat.rs`
| 1 | `generic mode (languages: [generic])` | 7 |
| 2 | `mode: taint (unsupported language: apex)` | 3 |
| 3 | `mode: taint (unsupported language: swift)` | 1 |

-Missing-grammar gaps account for **31 rules** (1.4% of all rules).
+Missing-grammar gaps account for **11 rules** (0.5% of all rules).
Section labeling is now misleading for these entries.
This table is under “missing language grammars,” but the listed reasons are generic-mode and taint-mode support gaps, not strictly grammar gaps. Please retitle/reword this section so prioritization isn’t skewed.
Suggested doc tweak
-## Priority order — missing language grammars
+## Priority order — remaining language-capability gaps
-Missing-grammar gaps account for **11 rules** (0.5% of all rules).
+These language-capability gaps account for **11 rules** (0.5% of all rules).
fn parses_apex_without_errors() {
    let source =
        "public class Hello {\n public void greet() {\n String x = 'hi';\n }\n}\n";
    let tree = parse_path(source, Language::Apex, Path::new("Hello.cls"))
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
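If this finding is addressed rather than suppressed for test code, one option is a test that returns `Result` and propagates failures with `?`. The sketch below is hypothetical: `ParseError` and `parse_fixture` are stand-ins invented for illustration, not the project's `parse_path` or its error type.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical error type; the project's real parse error is not shown in this PR excerpt.
#[derive(Debug)]
struct ParseError(String);

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "parse error: {}", self.0)
    }
}

impl Error for ParseError {}

// Hypothetical stand-in for a parser call: rejects blank input,
// otherwise reports how many lines it saw.
fn parse_fixture(source: &str) -> Result<usize, ParseError> {
    if source.trim().is_empty() {
        return Err(ParseError("empty source".to_string()));
    }
    Ok(source.lines().count())
}

// A test may return `Result`; an `Err` fails the test without `.expect()`.
#[test]
fn parses_fixture_without_errors() -> Result<(), Box<dyn Error>> {
    let lines = parse_fixture("void main() {\n  print('hello');\n}\n")?;
    assert_eq!(lines, 3);
    Ok(())
}
```

The trade-off is that `?` reports the error value but drops the custom message that `.expect("...")` carries, so a test that relies on that message for diagnosis may prefer to keep it.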
#[test]
fn parses_clojure_without_errors() {
    let source = "(defn greet [name]\n (println \"hello\" name))\n";
    let tree = parse_path(source, Language::Clojure, Path::new("core.clj"))
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
#[test]
fn parses_html_without_errors() {
    let source = "<html>\n <body>\n <a href=\"/x\">link</a>\n </body>\n</html>\n";
    let tree = parse_path(source, Language::Html, Path::new("index.html"))
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
#[test]
fn parses_xml_without_errors() {
    let source = "<?xml version=\"1.0\"?>\n<root>\n <child id=\"1\">text</child>\n</root>\n";
    let tree = parse_path(source, Language::Xml, Path::new("data.xml"))
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
#[test]
fn parses_dart_without_errors() {
    let source = "void main() {\n print('hello');\n}\n";
    let tree = parse_path(source, Language::Dart, Path::new("main.dart"))
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
severity: WARNING
languages: [apex]
"#;
let rules = parse_semgrep_str(yaml, "apex.yml")
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
assert_eq!(rules.len(), 1, "expected one rule for apex language");

let source = "public class A {\n void f() {\n System.debug('x');\n }\n}\n";
let tree = parse_path(source, Language::Apex, Path::new("A.cls")).expect("Apex must parse");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
severity: WARNING
languages: [clojure]
"#;
let rules = parse_semgrep_str(yaml, "clojure.yml")
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
assert_eq!(rules.len(), 1, "expected one rule for clojure language");

let source = "(defn f [x]\n (eval x))\n";
let tree = parse_path(source, Language::Clojure, Path::new("core.clj"))
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
severity: WARNING
languages: [html]
"#;
let rules = parse_semgrep_str(yaml, "html.yml").expect("html language rule must load");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
let source =
    "<html>\n <body>\n <button onclick=\"go()\">x</button>\n </body>\n</html>\n";
let tree =
    parse_path(source, Language::Html, Path::new("index.html")).expect("HTML must parse");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
severity: WARNING
languages: [xml]
"#;
let rules = parse_semgrep_str(yaml, "xml.yml").expect("xml language rule must load");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
let source =
    "<?xml version=\"1.0\"?>\n<!DOCTYPE root>\n<root>\n <child>text</child>\n</root>\n";
let tree =
    parse_path(source, Language::Xml, Path::new("data.xml")).expect("XML must parse");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
severity: WARNING
languages: [dart]
"#;
let rules = parse_semgrep_str(yaml, "dart.yml").expect("dart language rule must load");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
let source = "void main() {\n print('hello');\n}\n";
let tree =
    parse_path(source, Language::Dart, Path::new("main.dart")).expect("Dart must parse");
foxguard · MEDIUM · rs/no-unwrap-in-lib (CWE-248)
.expect() can panic at runtime — use proper error handling with ? or match
New tree-sitter grammars: apex, html, xml, dart, clojure → load rate 92.4% → 93.4%
Adds five search-mode grammars via the established 6-anchor pattern (Language enum + Display, parser arm, semgrep_compat map_language, scanner detect_language, registry_coverage, Cargo.toml, generic_mode/gen_rules_ts exhaustive matches). All on tree-sitter 0.25 — no downgrades or shims.
- html: tree-sitter-html@0.23
- xml: tree-sitter-xml@0.7
- dart: tree-sitter-dart@0.2
- clojure: tree-sitter-clojure-orchard@0.2.5
- apex: tree-sitter-sfapex@3.0.0 (tree-sitter-apex@1.0.0 was rejected: it links tree-sitter ~0.20 and is ABI-incompatible; sfapex uses the modern `apex::LANGUAGE` const)

Results (independently re-measured)
+20 rules. The search-mode `unsupported language` buckets for all five → 0. Only `mode: taint (unsupported language: apex)` ×3 remains (no apex taint engine, out of scope).

Verification (re-run on the branch)
- `registry_coverage` → 93.4% ✓
- `cargo test` 842+ lib, 0 failed · clippy `-D warnings` clean · `fmt --check` clean · baseline untouched ✓
- Per grammar, a `pattern:`/`pattern-regex:` search rule LOADS and matches via `parse_semgrep_str`.
- The AST pattern unwrapper handles the `source` top node so Clojure `pattern:` rules descend correctly (distinctive node name, full suite green).
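The per-language anchor pattern named in the PR summary (enum variant, `Display`, `map_language`, `detect_language`) can be sketched in a trimmed form. Everything here is an assumption for illustration: the signatures, the five-variant enum, and the extension table are hypothetical, with the extensions taken from the fixture paths used in the tests (`Hello.cls`, `core.clj`, `index.html`, `data.xml`, `main.dart`). The real enum has many more variants and the real functions may differ.

```rust
use std::fmt;
use std::path::Path;

// Hypothetical, trimmed-down enum holding only the five new languages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Apex,
    Clojure,
    Html,
    Xml,
    Dart,
}

impl fmt::Display for Language {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Exhaustive match: a new variant without a name here fails to compile.
        let name = match self {
            Language::Apex => "apex",
            Language::Clojure => "clojure",
            Language::Html => "html",
            Language::Xml => "xml",
            Language::Dart => "dart",
        };
        f.write_str(name)
    }
}

/// Map a Semgrep `languages:` entry to a variant (assumed signature).
fn map_language(name: &str) -> Option<Language> {
    match name {
        "apex" => Some(Language::Apex),
        "clojure" => Some(Language::Clojure),
        "html" => Some(Language::Html),
        "xml" => Some(Language::Xml),
        "dart" => Some(Language::Dart),
        _ => None,
    }
}

/// Guess the language from the file extension (assumed signature).
fn detect_language(path: &Path) -> Option<Language> {
    match path.extension()?.to_str()? {
        "cls" => Some(Language::Apex),
        "clj" => Some(Language::Clojure),
        "html" => Some(Language::Html),
        "xml" => Some(Language::Xml),
        "dart" => Some(Language::Dart),
        _ => None,
    }
}
```

The exhaustive `match` in `Display` is the reason the PR touches several files per language: a variant added to the enum makes every exhaustive match a compile error until it is handled. Wildcard matches such as the ones in `map_language` and `detect_language` compile regardless, so they have to be updated by hand.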