feat: expand archive exclusions and filter git archive output #40
jonathanpopham wants to merge 4 commits into supermodeltools:main
Conversation
Consolidate skipDirs across all 8 zip.go slices to 24 entries (was 11-14, already drifted). New exclusions: .claude, .idea, .vscode, .cache, .turbo, .nx, .pnpm-store, __snapshots__, bower_components, out. Add filterSkipDirs() post-filter on git archive output so exclusions apply to git repos too (previously skipDirs only affected the walkZip fallback). Fixes supermodeltools#31 (.claude/worktrees paths leaking into analysis). Partial supermodeltools#34 (archive exclusion rules).
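The new shouldSkip() helper checks each path segment against the exclusion map. A minimal runnable sketch of that behavior — the map below is an illustrative subset, not the real 24-entry list:

```go
package main

import (
	"fmt"
	"strings"
)

// skipDirs stands in for the consolidated exclusion set; only a few
// entries are shown here for illustration.
var skipDirs = map[string]bool{
	".claude":      true,
	".idea":        true,
	"node_modules": true,
	"out":          true,
}

// shouldSkip reports whether any path segment of a ZIP entry name
// matches an excluded directory, mirroring the segment-wise check
// described in the PR.
func shouldSkip(name string) bool {
	for _, seg := range strings.Split(name, "/") {
		if skipDirs[seg] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldSkip(".claude/worktrees/fix/main.go")) // prints true
	fmt.Println(shouldSkip("internal/archive/archive.go"))   // prints false
}
```

Matching whole segments rather than substrings is what keeps a path like checkout/ from being caught by the out exclusion.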
Walkthrough
Per-slice ZIP creation logic was removed and replaced by a new shared archive package.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Slice as Slice package
    participant Archive as internal/archive
    participant Git as git
    participant FS as filesystem
    Slice->>Archive: CreateZip(dir)
    Archive->>Git: IsGitRepo(dir)?
    alt git repo and GitArchive succeeds
        Archive->>Git: GitArchive(dir, dest.zip)
        Archive->>Archive: FilterSkipDirs(dest.zip)
    else git not used or fails
        Archive->>FS: WalkZip(dir, dest.zip) -- skip rules, hidden files, size cap
    end
    Archive-->>Slice: dest.zip / error
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
…o files
Move SkipDirs, CreateZip, WalkZip, FilterSkipDirs, GitArchive into internal/archive/ (shared kernel). Each slice's zip.go is now a thin wrapper calling archive.CreateZip(). Factory keeps its isWorktreeClean logic locally. Single source of truth for exclusions — no more drift across slices. Net -1120 lines.
Scans all Go source files in internal/ for skipDirs variable declarations. Fails if any are found outside internal/archive/archive.go. Prevents the 8-way drift that already happened once. Also adds internal/archive to the sharedKernel allowlist.
Actionable comments posted: 3
🧹 Nitpick comments (1)
internal/factory/zip.go (1)
13-38: Centralize this archive filter instead of copying it eight times.
This exact skipDirs block and the new filter helpers now live in every internal/*/zip.go again. I'd still pull them into one shared internal helper so the next exclusion tweak, or bug fix in this filter, only has one place to land.
Also applies to: 91-139
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/factory/zip.go` around lines 13 - 38, The repeated skipDirs map and accompanying filter helpers should be factored into a single shared helper: create a package-level function (e.g., IsArchiveExcluded(path string) bool or ShouldSkipPath(path string)) that embeds the skipDirs set and the helper logic, replace the duplicate skipDirs and helper implementations in internal/*/zip.go (including the duplicate block at lines 91-139) with calls to this new function, and update imports/usages in each zip.go to call the centralized helper so future changes live in one place.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/factory/zip.go`:
- Around line 109-128: The rewrite loop that creates a new zip via
zip.NewWriter(out) lacks the 10MB cap used elsewhere (e.g., walkZip), so very
large archived files get recompressed; before opening/copying each file in the
r.File loop (the block using zw.Create, f.Open, io.Copy), check the file size
(use f.UncompressedSize or f.UncompressedSize64) and skip any entry >
10*1024*1024 bytes in the same way shouldSkip() works — i.e., continue the loop
for oversized entries so they are not opened, copied, or recompressed.
In `@internal/focus/zip.go`:
- Around line 10-13: Reorder the import block in internal/focus/zip.go so that
standard library imports (path/filepath, strings) are grouped with the other
stdlib imports above third-party/internal imports and then run goimports to
automatically fix formatting; update the import grouping that currently lists
"github.com/supermodeltools/cli/internal/api" and
"github.com/supermodeltools/cli/internal/config" so those internal packages
follow the stdlib group, then save and run goimports (or go fmt) to ensure the
imports for the file are sorted and lint passes.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2d02f92e-b0dc-40b2-ae6c-6639d2a0b4d1
📒 Files selected for processing (8)
- internal/analyze/zip.go
- internal/archdocs/zip.go
- internal/blastradius/zip.go
- internal/deadcode/zip.go
- internal/factory/zip.go
- internal/find/zip.go
- internal/focus/zip.go
- internal/mcp/zip.go
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/factory/zip.go (1)
53-60: ⚠️ Potential issue | 🟡 Minor
Remove the unused copyFile helper and io import.
The refactoring switched to using archive.WalkZip, so copyFile is no longer called anywhere in this package. Dead code just adds noise and makes future changes harder, so let's clean it up by deleting the function entirely (lines 53-60) along with the io import at the top of the file if nothing else in the file uses it.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/factory/zip.go` around lines 53 - 60, The file contains an unused helper function copyFile and an unused io import; remove the entire copyFile function definition and delete the io import at the top of the file (confirm nothing else in the file uses io, e.g., archive.WalkZip replaces its behavior) so the package has no dead code or unused imports.
🧹 Nitpick comments (1)
scripts/check-architecture/main.go (1)
160-186: Simple but effective lint for duplicate skipDirs.
The string-matching approach (strings.Contains(content, "var skipDirs")) is a bit naive — it could false-positive on comments or string literals like // removed var skipDirs. But honestly, that's probably fine for this use case. If it ever becomes a problem, you could use go/parser for AST-level detection.
One small thing: if filepath.Walk hits an error on a directory (permissions, etc.), you're silently ignoring it. That's probably okay for a lint script, but worth knowing.
💡 Optional: more precise detection with regex
If you want to avoid matching inside comments/strings without going full AST:

```go
import "regexp"

var skipDirsPattern = regexp.MustCompile(`(?m)^var\s+(s|S)kipDirs\s*=`)

// Then in checkDuplicateSkipDirs:
if skipDirsPattern.Match(data) {
    dupes = append(dupes, path)
}
```

This anchors to line start, which filters out most false positives. But again, probably overkill for your needs.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/check-architecture/main.go` around lines 160 - 186, checkDuplicateSkipDirs currently uses a naive strings.Contains scan and silently swallows directory walk errors; update the filepath.Walk callback in checkDuplicateSkipDirs to return any non-nil err instead of ignoring it and check the error returned by filepath.Walk so failures are surfaced, and replace the simple strings.Contains(content, "var skipDirs")/("var SkipDirs") check with a more precise detection (either use a regexp like skipDirsPattern := regexp.MustCompile(`(?m)^var\s+(s|S)kipDirs\s*=`) and call skipDirsPattern.Match(data) or use go/parser for AST-level detection) so you avoid false positives in comments/strings.
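As a quick sanity check that the suggested pattern behaves as claimed — anchored to line start, so commented-out declarations don't match:

```go
package main

import (
	"fmt"
	"regexp"
)

// Anchored pattern from the review comment above: matches top-level
// skipDirs/SkipDirs declarations only, not mentions inside comments.
var skipDirsPattern = regexp.MustCompile(`(?m)^var\s+(s|S)kipDirs\s*=`)

func main() {
	fmt.Println(skipDirsPattern.MatchString("var skipDirs = map[string]bool{}"))   // prints true
	fmt.Println(skipDirsPattern.MatchString("// removed var skipDirs = old code")) // prints false
}
```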
ℹ️ Review info
Run ID: 7e8fedc7-8644-46ff-a4dc-6eb711d24308
📒 Files selected for processing (14)
- internal/analyze/zip.go
- internal/analyze/zip_test.go
- internal/archdocs/zip.go
- internal/archive/archive.go
- internal/archive/doc.go
- internal/blastradius/zip.go
- internal/deadcode/zip.go
- internal/factory/zip.go
- internal/factory/zip_test.go
- internal/find/zip.go
- internal/find/zip_test.go
- internal/focus/zip.go
- internal/mcp/zip.go
- scripts/check-architecture/main.go
✅ Files skipped from review due to trivial changes (2)
- internal/archive/doc.go
- internal/archdocs/zip.go
🚧 Files skipped from review as they are similar to previous changes (2)
- internal/focus/zip.go
- internal/analyze/zip.go
```go
func CreateZip(dir string) (string, error) {
	f, err := os.CreateTemp("", "supermodel-*.zip")
	if err != nil {
		return "", fmt.Errorf("create temp file: %w", err)
	}
	dest := f.Name()
	f.Close()

	if IsGitRepo(dir) {
		if err := GitArchive(dir, dest); err == nil {
			if err := FilterSkipDirs(dest); err != nil {
				os.Remove(dest)
				return "", fmt.Errorf("filter archive: %w", err)
			}
			return dest, nil
		}
	}

	if err := WalkZip(dir, dest); err != nil {
		os.Remove(dest)
		return "", err
	}
	return dest, nil
}
```
Missing worktree cleanliness check causes silent behavior change for dirty repos.
Hey, so here's the deal: internal/factory/zip.go has this extra guard that checks isWorktreeClean(dir) before using git archive. That matters because git archive only archives committed stuff (HEAD), so if someone has uncommitted changes, they won't see those in the archive.
Your new archive.CreateZip skips that check. This means packages like deadcode, find, and mcp (which now delegate here) will silently use git archive even on dirty worktrees — the user might be scratching their head wondering why their latest changes aren't being analyzed.
Consider adding the worktree check here, or document why it's intentionally different.
🔧 Possible fix to add worktree check
You'd need to add an isWorktreeClean helper similar to factory's:

```diff
 func CreateZip(dir string) (string, error) {
 	f, err := os.CreateTemp("", "supermodel-*.zip")
 	if err != nil {
 		return "", fmt.Errorf("create temp file: %w", err)
 	}
 	dest := f.Name()
 	f.Close()

-	if IsGitRepo(dir) {
+	if IsGitRepo(dir) && isWorktreeClean(dir) {
 		if err := GitArchive(dir, dest); err == nil {
 			if err := FilterSkipDirs(dest); err != nil {
 				os.Remove(dest)
 				return "", fmt.Errorf("filter archive: %w", err)
 			}
 			return dest, nil
 		}
 	}
```

And add something like:

```go
func isWorktreeClean(dir string) bool {
	cmd := exec.Command("git", "-C", dir, "status", "--porcelain")
	out, err := cmd.Output()
	return err == nil && len(out) == 0
}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/archive/archive.go` around lines 48 - 71, CreateZip currently calls
GitArchive whenever IsGitRepo(dir) is true, which skips uncommitted changes and
can surprise users; add a worktree cleanliness check (e.g.,
isWorktreeClean(dir)) before using GitArchive so that CreateZip only uses
GitArchive when the git worktree is clean, otherwise fall back to WalkZip
(preserving existing FilterSkipDirs handling and error paths). Implement the
helper (similar to factory's isWorktreeClean) and update the branch in CreateZip
that checks IsGitRepo to also require isWorktreeClean(dir) before calling
GitArchive.
```go
// FilterSkipDirs removes entries from a ZIP whose path contains a SkipDirs segment.
func FilterSkipDirs(zipPath string) error {
	data, err := os.ReadFile(zipPath)
	if err != nil {
		return err
	}

	r, err := zip.NewReader(strings.NewReader(string(data)), int64(len(data)))
	if err != nil {
		return err
	}

	out, err := os.Create(zipPath)
	if err != nil {
		return err
	}
	defer out.Close()

	zw := zip.NewWriter(out)
	for _, f := range r.File {
		if ShouldSkip(f.Name) {
			continue
		}
		w, err := zw.Create(f.Name)
		if err != nil {
			return err
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		_, err = io.Copy(w, rc)
		rc.Close()
		if err != nil {
			return err
		}
	}
	return zw.Close()
}
```
Fix G110: Potential decompression bomb vulnerability.
The linter is flagging line 119 because io.Copy(w, rc) copies the decompressed content without any size limit. A malicious zip entry could claim to be tiny but decompress to gigabytes, eating all your memory/disk.
Also, reading the entire zip into memory on line 90 (os.ReadFile) is a bit rough for large archives — but the decompression bomb is the more pressing concern.
🛡️ Proposed fix with size limit

```diff
+const maxUncompressedSize = 100 << 20 // 100 MB limit per entry
+
 func FilterSkipDirs(zipPath string) error {
 	data, err := os.ReadFile(zipPath)
 	if err != nil {
 		return err
 	}
 	r, err := zip.NewReader(strings.NewReader(string(data)), int64(len(data)))
 	if err != nil {
 		return err
 	}
 	out, err := os.Create(zipPath)
 	if err != nil {
 		return err
 	}
 	defer out.Close()
 	zw := zip.NewWriter(out)
 	for _, f := range r.File {
 		if ShouldSkip(f.Name) {
 			continue
 		}
 		w, err := zw.Create(f.Name)
 		if err != nil {
 			return err
 		}
 		rc, err := f.Open()
 		if err != nil {
 			return err
 		}
-		_, err = io.Copy(w, rc)
+		_, err = io.Copy(w, io.LimitReader(rc, maxUncompressedSize))
 		rc.Close()
 		if err != nil {
 			return err
 		}
 	}
 	return zw.Close()
 }
```

This caps each entry at 100MB. Adjust the limit based on what makes sense for your use case. You might also want to track total bytes and bail out if the cumulative size gets crazy.
🧰 Tools
🪛 GitHub Actions: Lint
[error] 119-119: golangci-lint reported G110 (gosec): Potential DoS vulnerability via decompression bomb.
🪛 GitHub Check: golangci-lint
[failure] 119-119:
G110: Potential DoS vulnerability via decompression bomb (gosec)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/archive/archive.go` around lines 88 - 126, FilterSkipDirs currently
reads the entire ZIP into memory and copies each entry with io.Copy without
limits; replace os.ReadFile/zip.NewReader with zip.OpenReader(zipPath) to stream
the archive and avoid loading it fully, and for each file returned by
zip.OpenReader (in the same loop using ShouldSkip(f.Name)), wrap the entry
reader (f.Open()) with an io.LimitReader or use io.Copy with a limit (e.g.,
maxPerFile = 100*1024*1024) so you detect and error when an entry exceeds the
per-file limit; also optionally track cumulative bytes across entries and error
if a total threshold is exceeded before writing to zw to fully mitigate
decompression-bomb attacks.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/factory/zip.go (1)
50-51: ⚠️ Potential issue | 🟡 Minor
Fix the goimports formatting error.
The pipeline is complaining about file formatting at line 51. Usually this means there's a trailing blank line or some whitespace issue at the end of the file. Run goimports -w internal/factory/zip.go (or just go fmt ./...) to clean it up.
🔧 Proposed fix
Make sure the file ends with exactly one newline after the closing brace on line 50, with no extra blank lines:

```go
func isWorktreeClean(dir string) bool {
	out, err := exec.Command("git", "-C", dir, "status", "--porcelain").Output() //nolint:gosec // dir is user-supplied cwd
	return err == nil && strings.TrimSpace(string(out)) == ""
}
```

(Remove the trailing blank line if present)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/factory/zip.go` around lines 50 - 51, The file has a formatting issue at the end (extra blank line/whitespace after the closing brace in zip.go). Open the file containing the closing brace for the package (the brace shown in the diff), remove any trailing blank lines or stray whitespace so the file ends with exactly one newline after that closing brace, and then run goimports -w internal/factory/zip.go (or go fmt ./...) to fix imports/formatting.
ℹ️ Review info
Run ID: 6ee6fe8f-937b-4b95-9710-8f58570366aa
📒 Files selected for processing (2)
- internal/archive/archive.go
- internal/factory/zip.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/archive/archive.go
Fixes #31. Partial #34.
Problem
- skipDirs had already drifted across slices (11-14 entries depending on which file)
- skipDirs only applied to the walkZip fallback — git archive repos got no filtering, so .claude/worktrees/, .idea/, etc. leaked into uploads
- .claude/worktrees/fix-441/ prefixed paths leaked into analysis

Fix
- Consolidate skipDirs to 24 entries across all 8 zip.go files
- Add filterSkipDirs() post-filter that strips matching entries from git archive output
- Add shouldSkip() helper that checks each path segment against the map

New exclusions
.claude, .idea, .vscode, .cache, .turbo, .nx, .pnpm-store, __snapshots__, bower_components, out

Test plan
- go test ./... — all 14 packages pass
- supermodel audit — no more .claude/worktrees/ paths in output
go test ./...— all 14 packages passsupermodel audit— no more.claude/worktrees/paths in outputSummary by CodeRabbit