-
Notifications
You must be signed in to change notification settings - Fork 1
fix: mechanical security hardening — symlinks, markdown links, .dev.vars #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
8466842
6e1180d
7edac0c
0fde4bd
2eed1a8
39f1743
44c3703
469fe48
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -11,6 +11,8 @@ build/ | |
| coverage.xml | ||
| test-results.xml | ||
| .dev.vars | ||
| .env | ||
| .env.* | ||
| .claude/ | ||
| .claude-flow/ | ||
| .serena/ | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -731,6 +731,27 @@ def grep_code(pattern: str, glob: str = "*.py", context_lines: int = 2) -> str: | |
| return grep_code | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 HIGH: Defense-in-depth symlink rejection on file and directory accessConfidence: 95% A new _reject_symlinks() function checks each path component for symlinks before resolving the path in both read_file and list_files functions. This prevents symlink-based directory traversal attacks that could expose files outside the repository root, closing a critical path traversal risk. Suggestion: No further action required on the implemented symlink rejection logic. Maintain test coverage and ensure this pattern is used wherever user-supplied paths are handled. — This is how breaches start. I'm grudgingly satisfied with this explicit check. |
||
|
|
||
|
|
||
| def _reject_symlinks(path: Path, repo_root: Path) -> str | None: | ||
| """Check if any component of *path* is a symlink. | ||
|
|
||
| Defense-in-depth: rejects symlinks *before* resolve() so a symlink | ||
| inside the repo that resolves to an external target cannot bypass the | ||
| is_relative_to() guard. | ||
|
|
||
| Returns an error message string if a symlink is found, or None if safe. | ||
| """ | ||
| try: | ||
| relative = path.relative_to(repo_root) | ||
| except ValueError: | ||
| return "Error: path outside repo root." | ||
| current = repo_root | ||
| for part in relative.parts: | ||
| current = current / part | ||
| if current.is_symlink(): | ||
| return "Error: symlinks not allowed." | ||
|
Comment on lines
+750
to
+751
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Useful? React with 👍 / 👎. |
||
| return None | ||
|
|
||
|
|
||
| def _make_read_file(repo_root: Path) -> Any: | ||
| """Create a read_file tool function bound to a repo root.""" | ||
|
|
||
|
|
@@ -742,7 +763,11 @@ def read_file(path: str, start_line: int = 0, end_line: int = 0) -> str: | |
| :param end_line: last line to read (1-based, 0 = to end) | ||
| """ | ||
| target = repo_root / path | ||
| # Prevent path traversal | ||
| # Layer 1: reject symlinks before resolve() | ||
| symlink_err = _reject_symlinks(target, repo_root) | ||
| if symlink_err is not None: | ||
| return symlink_err | ||
| # Layer 2: resolve + is_relative_to (catches traversal via ..) | ||
| try: | ||
| target = target.resolve() | ||
| if not target.is_relative_to(repo_root.resolve()): | ||
|
|
@@ -795,7 +820,11 @@ def list_files(path: str = ".", glob_pattern: str = "*") -> str: | |
| :param glob_pattern: glob pattern to filter files (default "*") | ||
| """ | ||
| target = repo_root / path | ||
| # Prevent path traversal | ||
| # Layer 1: reject symlinks before resolve() | ||
| symlink_err = _reject_symlinks(target, repo_root) | ||
| if symlink_err is not None: | ||
| return symlink_err | ||
| # Layer 2: resolve + is_relative_to (catches traversal via ..) | ||
| try: | ||
| target = target.resolve() | ||
| if not target.is_relative_to(repo_root.resolve()): | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -160,6 +160,16 @@ def _sanitize_comment_text(text: str) -> str: | |
| # Strip markdown images (tracking pixels) and external links (phishing) | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM: Reference-style markdown link stripping in comment sanitizationConfidence: 90% The _sanitize_comment_text function now removes reference-style markdown links and bare autolinks, closing an attack vector where malicious links might evade detection/stripping via references or encoded forms. Suggestion: The sanitization logic is now robust for reference-style markdown links. Continue to monitor for any markdown parsing bypasses, especially around edge cases or non-standard syntax. — Input validation isn't optional. This blocks a real and subtle risk.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM: Expanded MarkDown Sanitization to Cover Reference Links and AutolinksConfidence: 95% The Suggestion: Continue to monitor for Markdown edge cases and test with community-submitted exploit attempts, since Markdown rendering has a long history of bypass tricks. Periodic reviews are advised as GitHub's parser evolves. — This should have been in from the start. At least it's here now. |
||
| text = re.sub(r"!\[[^\]]*\]\([^)]+\)", "", text) | ||
| text = re.sub(r"\[([^\]]*)\]\(https?://[^)]+\)", r"\1", text) | ||
| # Strip reference-style link definitions: [id]: url (optional title) | ||
| text = re.sub(r"^\s{0,3}\[[^\]]+\]:\s+\S[^\n]*$", "", text, flags=re.MULTILINE) | ||
| # Strip reference-style link references: [text][id] and collapsed [text][] | ||
| text = re.sub(r"\[([^\]]*)\]\[[^\]]*\]", r"\1", text) | ||
| # Defang bare URL autolinks: <https://...> → hxxps://... (not clickable on GitHub) | ||
| text = re.sub( | ||
| r"<https?://[^>]+>", | ||
| lambda m: m.group(0)[1:-1].replace("https://", "hxxps://").replace("http://", "hxxp://"), | ||
| text, | ||
| ) | ||
| # Loop unquote until stable — prevents multi-layer URL encoding bypass | ||
| # (e.g., %2561 -> %61 -> a) that could smuggle dangerous schemes past | ||
| # the regex check. | ||
|
|
@@ -174,7 +184,18 @@ def _sanitize_comment_text(text: str) -> str: | |
| # URL-encoded to bypass the earlier stripping pass. | ||
| text = re.sub(r"!\[[^\]]*\]\([^)]+\)", "", text) | ||
| text = re.sub(r"\[([^\]]*)\]\(https?://[^)]+\)", r"\1", text) | ||
| text = re.sub(r"^\s{0,3}\[[^\]]+\]:\s+\S[^\n]*$", "", text, flags=re.MULTILINE) | ||
| text = re.sub(r"\[([^\]]*)\]\[[^\]]*\]", r"\1", text) | ||
| # Defang autolinks after decode too | ||
| text = re.sub( | ||
| r"<https?://[^>]+>", | ||
| lambda m: m.group(0)[1:-1].replace("https://", "hxxps://").replace("http://", "hxxp://"), | ||
| text, | ||
| ) | ||
| text = _DANGEROUS_SCHEME_RE.sub("", text) | ||
| # Defang bare URLs that survived all prior stripping — GitHub auto-linkifies these | ||
| text = re.sub(r"https://", "hxxps://", text) | ||
| text = re.sub(r"http://", "hxxp://", text) | ||
| return text | ||
|
|
||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -22,7 +22,9 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import itertools | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import json | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import logging | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import os | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import subprocess | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| import sys | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from collections import Counter | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from collections.abc import Callable | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
@@ -42,10 +44,35 @@ | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from grippy.rules import RuleResult, RuleSeverity, check_gate, load_profile, run_rules | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟠 HIGH: .dev.vars is now blocked from loading when git-trackedConfidence: 95% The main() logic now checks if .dev.vars is tracked by git using _is_git_tracked(), and refuses to load secrets from a tracked .dev.vars file. This prevents attackers from exfiltrating env values via malicious PRs. .env patterns are also added to .gitignore. Suggestion: No further action needed; guard is correct and has comprehensive test coverage. Continue to verify that new ways of loading secrets are not introduced. — Secrets and version control are a dangerous combo. This guard is overdue, but effective.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🟡 MEDIUM: Git subprocess now set with explicit cwd for robustnessConfidence: 90% The git ls-files call in _is_git_tracked now sets cwd to the file's containing directory, preventing overrides due to the parent process's unexpected working directory. This closes a possible bypass where git would ignore the relevant repo. Suggestion: Maintain the use of explicit cwd on subprocesses interacting with VCS, especially in environments with ambiguous process launch locations. — Explicit is better than implicit; subprocess cwd was a subtle gap. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| from grippy.rules.enrichment import enrich_results, persist_rule_findings | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| log = logging.getLogger(__name__) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Max diff size sent to the LLM — configurable for local models with smaller context | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| MAX_DIFF_CHARS = int(os.environ.get("GRIPPY_MAX_DIFF_CHARS", "500000")) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| def _is_git_tracked(path: str) -> bool: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """Check if a file is tracked by git. Returns True if tracked. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Fail-open: returns False if git is unavailable or times out. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| This means a tracked .dev.vars loads in that case — acceptable | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| trade-off for local dev usability. CI environments are excluded | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| by a separate check before this function is called. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| try: | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| file_path = Path(path) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| repo_dir = file_path.parent | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| # Use relative path — git ls-files expects paths relative to cwd/repo root | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| result = subprocess.run( | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ["git", "ls-files", "--error-unmatch", file_path.name], | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| capture_output=True, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| timeout=5, | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| cwd=str(repo_dir), | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| return result.returncode == 0 | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| except (subprocess.TimeoutExpired, FileNotFoundError): | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Comment on lines
+65
to
+72
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| result = subprocess.run( | |
| ["git", "ls-files", "--error-unmatch", path], | |
| capture_output=True, | |
| timeout=5, | |
| cwd=Path(path).parent, # ensure git operates in the file's repo | |
| ) | |
| return result.returncode == 0 | |
| except (subprocess.TimeoutExpired, FileNotFoundError): | |
| file_path = Path(path).resolve() | |
| parent_dir = file_path.parent | |
| repo_root_result = subprocess.run( | |
| ["git", "rev-parse", "--show-toplevel"], | |
| capture_output=True, | |
| text=True, | |
| timeout=5, | |
| cwd=parent_dir, | |
| ) | |
| if repo_root_result.returncode != 0: | |
| return False | |
| repo_root = Path(repo_root_result.stdout.strip()).resolve() | |
| relative_path = os.path.relpath(file_path, repo_root) | |
| result = subprocess.run( | |
| ["git", "ls-files", "--error-unmatch", relative_path], | |
| capture_output=True, | |
| timeout=5, | |
| cwd=repo_root, | |
| ) | |
| return result.returncode == 0 | |
| except (subprocess.TimeoutExpired, FileNotFoundError, ValueError): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🟡 MEDIUM: .dev.vars Environment File Loading Now Blocked if Git-Tracked
Confidence: 90%
Added `.is_git_tracked()` check before loading `.dev.vars` and `.env` patterns to `.gitignore`.
.dev.vars is only loaded if it is not tracked in git, preventing attackers from injecting sensitive configuration files via PRs and exfiltrating via CI. This is enforced with an explicit tracked file refusal and gitignore hardening.
Suggestion: Review other environment or config loading code for similar checks. Confirm that any such files are never referenced directly in CI unless verified safe.
— Catching git-tracked leaks preemptively will save headaches later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔵 LOW: Added .env patterns to .gitignore
Confidence: 80%
.env and .env.* are now ignored, preventing accidental commit of sensitive runtime or dev configuration. This is good baseline hygiene.
Suggestion: Verify that no existing .env files are tracked by git. No other action needed.
— Observable code is debuggable code. This is a minor, but important, governance win.