feat: add P0 semgrep security rules#8
Conversation
4 rules targeting high-risk patterns that slip past review: - no-eval: eval(), new Function(), exec() (JS/TS/Python) - unsafe-yaml: yaml.load() without SafeLoader (Python) - dangerous-html: dangerouslySetInnerHTML (WARNING, JS/TS) - fallback-secrets: process.env.SECRET || "default" (JS/TS/Python) Each rule validated against annotated fixtures using semgrep's native --test framework (# ruleid: / # ok: annotations). CI job added with pinned semgrep==1.155.0. Phase 3B of rule enforcement roadmap — output governance (catches bad code regardless of producer), not behavior shaping. Coding-Agent: claude-code Model: claude-opus-4-6
|
You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard. |
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the repository's security posture by introducing a set of critical Semgrep rules. These rules are designed to automatically identify and flag common, high-risk vulnerabilities in JavaScript, TypeScript, and Python code, such as dynamic code execution, insecure handling of secrets, and unsafe data deserialization. The initiative focuses on proactive "output governance" to ensure code safety regardless of its origin, and includes a robust validation framework to maintain rule effectiveness. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Changelog
Ignored Files
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable set of Semgrep rules to detect high-risk code patterns related to dynamic code execution, unsafe YAML deserialization, dangerous HTML rendering, and hardcoded fallback secrets. The rules are well-structured and accompanied by comprehensive test fixtures. My review includes a few suggestions to enhance the robustness of the secret detection regex and improve test coverage for one of the YAML parsing rules. Overall, this is an excellent contribution to improving code security.
| - pattern: process.env.$VAR || "..." | ||
| - metavariable-regex: | ||
| metavariable: $VAR | ||
| regex: ".*(SECRET|PASSWORD|CREDENTIAL|PRIVATE|AUTH|API_KEY|TOKEN).*" |
There was a problem hiding this comment.
The regex for identifying secret-like variable names is case-sensitive. This means it would miss variables like process.env.api_key or process.env.Auth_Token. To make the rule more robust, consider making the regex case-insensitive using (?i). This change should also be applied to the regex on line 19.
regex: "(?i).*(SECRET|PASSWORD|CREDENTIAL|PRIVATE|AUTH|API_KEY|TOKEN).*"| - pattern: os.environ.get($KEY, "...") | ||
| - metavariable-regex: | ||
| metavariable: $KEY | ||
| regex: ".*(SECRET|PASSWORD|CREDENTIAL|PRIVATE|AUTH|API_KEY|TOKEN).*" |
There was a problem hiding this comment.
The regex for identifying secret-like key names is case-sensitive. This means it would miss keys like 'api_key' or 'Auth_Token'. To make the rule more robust, consider making the regex case-insensitive using (?i). This change should also be applied to the regex on line 38.
regex: "(?i).*(SECRET|PASSWORD|CREDENTIAL|PRIVATE|AUTH|API_KEY|TOKEN).*"| # ok: unsafe-yaml-load | ||
| data5 = yaml.load(raw, Loader=yaml.SafeLoader) | ||
| # ok: unsafe-yaml-load | ||
| data6 = yaml.load(raw, Loader=yaml.BaseLoader) |
There was a problem hiding this comment.
The corresponding rule in .semgrep/unsafe-yaml.yaml includes a pattern-not for yaml.load(..., Loader=yaml.CSafeLoader). To ensure complete test coverage for the rule's exclusions, it would be beneficial to add a test case for this scenario.
| data6 = yaml.load(raw, Loader=yaml.BaseLoader) | |
| data6 = yaml.load(raw, Loader=yaml.BaseLoader) | |
| # ok: unsafe-yaml-load | |
| if hasattr(yaml, "CSafeLoader"): | |
| data7 = yaml.load(raw, Loader=yaml.CSafeLoader) |
- Add semgrep scan step to CI (enforces rules on repo code, not just fixture validation) - Fix dangerous-html message: explicitly states all uses require review, sanitization alone does not suppress - Add bracket-notation fixtures for fallback-secrets - Add nosemgrep suppression fixture for dangerous-html Coding-Agent: claude-code Model: claude-opus-4-6
Summary
--testframework (# ruleid:/# ok:annotations)semgrep==1.155.0.semgrep/viasemgrep --configRules
no-eval-dynamic-execeval(),new Function(),exec()unsafe-yaml-loadyaml.load()without SafeLoaderdangerous-inner-htmldangerouslySetInnerHTML(all uses)fallback-secret-{js,python}process.env.SECRET || "default"Context
Phase 3B of the rule enforcement roadmap. Phase 3A (eval harness) confirmed the text corpus adds zero safety lift on Sonnet 4.6 — base training already catches these patterns. These semgrep rules catch violations in code output, regardless of producer (human, AI, or missed review). This is output governance, not behavior shaping.
Design decisions
dangerouslySetInnerHTML+ DOMPurify is a valid pattern; flags for review, doesn't block.nosemgrepfor approved uses.API_KEY,PASSWORD,TOKEN, etc. Known limitation: can't distinguishCACHE_TOKEN(config) fromAUTH_TOKEN(secret).nosemgrepescape hatch for edge cases.tests/semgrep-validate.shcopies rules+fixtures to temp dir becausesemgrep --testskips hidden directories (.semgrep/).Test plan
bash tests/semgrep-validate.sh— 6/6 pass locallysemgrepjob passessemgrep --config .semgrep/ . --exclude='tests/fixtures/**'— clean on repo itself