Skip to content

Conversation

@dmartinol
Copy link
Collaborator

@dmartinol dmartinol commented Oct 20, 2025

Fixes #2250

Root cause

  • /tmp is not writable and the clone fails
  • Also, even after mounting an emptyDir to the deployment, the clone requires large memory that may exceed the limits causing the controller restart
  • Clone was performed twice because the FetchRegistry also called CurrentHash which runs another clone

Proposed solution

- Using sparse checkout to reduce the local repo size:
~ - Note: this also copies the other files and the subfloders from the registry data folder~

  • Use in-memory FS (thanks @JAORMX and @jhrozek ) for the suggestion
  • Do not clone again the repo to compute the hash
  • Added GC options in deployment to expedite the execution of the Go Garbage Collector

Alternatives

  • If the memory usage remains too high, we can use a direct HTTP raw fetch of the file with no storage and no git protocol
    • The raw URL depends on the Git provider and requires custom implementation. Some providers or versions may not be supported.
  • Connect a PVC with some storage area
    • Caching and reuse of existing local repos would improve performance (git fetch instead of full clone)

@codecov
Copy link

codecov bot commented Oct 20, 2025

Codecov Report

❌ Patch coverage is 71.00000% with 58 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.59%. Comparing base (bba3ffc) to head (3a9f74e).

Files with missing lines Patch % Lines
cmd/thv-operator/pkg/git/fs.go 49.05% 45 Missing and 9 partials ⚠️
cmd/thv-operator/pkg/sources/git.go 92.00% 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2263      +/-   ##
==========================================
+ Coverage   54.51%   54.59%   +0.08%     
==========================================
  Files         228      229       +1     
  Lines       30120    30237     +117     
==========================================
+ Hits        16420    16508      +88     
- Misses      12520    12547      +27     
- Partials     1180     1182       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR optimizes Git repository operations by implementing sparse checkout to reduce memory usage and improve performance. The root cause was that /tmp was not writable in Kubernetes deployments and clones required excessive memory, causing controller restarts. Additionally, the repository was being cloned twice unnecessarily.

Key changes:

  • Replaced multi-step clone operations with a single FetchFileSparse method using sparse checkout
  • Added configurable workspace directory via WORKSPACE_DIR environment variable
  • Mounted emptyDir volume at /workspace in Kubernetes deployments

Reviewed Changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
deploy/charts/operator/templates/deployment.yaml Added WORKSPACE_DIR environment variable and mounted workspace volume
cmd/thv-operator/pkg/sources/git.go Refactored to use sparse checkout and eliminated duplicate cloning
cmd/thv-operator/pkg/sources/git_test.go Updated tests to use new FetchFileSparse API
cmd/thv-operator/pkg/git/client.go Implemented sparse checkout with path traversal protection
cmd/thv-operator/pkg/git/integration_test.go Updated integration tests for new API with improved test coverage
cmd/thv-operator/pkg/git/client_test.go Added comprehensive unit tests including security tests
cmd/thv-operator/pkg/git/e2e_test.go Added E2E tests for real repository operations
cmd/thv-operator/pkg/git/doc.go Updated documentation to reflect sparse checkout implementation
cmd/thv-operator/pkg/git/commit_test.go Removed obsolete commit-specific tests

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@JAORMX
Copy link
Collaborator

JAORMX commented Oct 21, 2025

@claude can you review this with a focus on security?

@claude
Copy link
Contributor

claude bot commented Oct 21, 2025

Claude encountered an error —— View job

Failed with exit code 128

I'll analyze this and get back to you.

@dmartinol dmartinol changed the title fix: fix: Cannot deploy MCPRegistry from git in OpenShift Oct 21, 2025
}

// Force garbage collection to release memory
runtime.GC()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm this kinda triggers my spider sense..

At this point, wouldn't it be better to use github.com/go-git/go-billy/v5/memfs and github.com/go-git/go-git/v5/storage/memory ? This would allow us to avoid using the /workspace mount, would clean the clone automatically with normal GC, avoiding the manual GC invocation etc.

Claude says this migth do the trick (unverified):

  const (
      MaxMemFSSize    = 50 * 1024 * 1024  // 50MB
      MaxCloneTimeout = 2 * time.Minute
  )

  // FetchFileSparse with full DoS protection
  func (c *DefaultGitClient) FetchFileSparse(ctx context.Context, config *CloneConfig, filePath string) ([]byte, error) {
      logger := log.FromContext(ctx)

      // 1. Validate URL (prevent malicious URLs)
      if err := validateGitURL(config.URL); err != nil {
          return nil, fmt.Errorf("invalid git URL: %w", err)
      }

      // 2. Enforce timeout
      _, hasDeadline := ctx.Deadline()
      if !hasDeadline {
          var cancel context.CancelFunc
          ctx, cancel = context.WithTimeout(ctx, MaxCloneTimeout)
          defer cancel()
      }

      // 3. Size-limited memfs
      fs := newSizedMemFS(MaxMemFSSize)
      storer := memory.NewStorage()

      // 4. Minimal clone options
      cloneOptions := &git.CloneOptions{
          URL:          config.URL,
          Depth:        1,              // Shallow
          SingleBranch: true,           // One branch only
          Tags:         git.NoTags,     // No tags
          NoCheckout:   true,           // Don't checkout yet
      }

      if config.Branch != "" {
          cloneOptions.ReferenceName = plumbing.NewBranchReferenceName(config.Branch)
      } else if config.Tag != "" {
          cloneOptions.ReferenceName = plumbing.NewTagReferenceName(config.Tag)
      } else if config.Commit != "" {
          // REJECT commit-based clones (too dangerous)
          return nil, fmt.Errorf("commit-based cloning disabled for security reasons; use branch or tag")
      }

      // 5. Clone with all protections
      repo, err := git.CloneContext(ctx, storer, fs, cloneOptions)
      if err != nil {
          if strings.Contains(err.Error(), "size limit exceeded") {
              return nil, fmt.Errorf("repository exceeds 50MB limit")
          }
          return nil, fmt.Errorf("clone failed: %w", err)
      }

      // 6. Sparse checkout and read file
      if err := c.performSparseCheckout(repo, filePath); err != nil {
          return nil, err
      }

      content, err := c.readFileFromMemFS(fs, filePath)
      if err != nil {
          return nil, err
      }

      logger.Info("Fetch completed",
          "url", config.URL,
          "file", filePath,
          "size_kb", len(content)/1024,
          "memfs_size_kb", fs.CurrentSize()/1024)

      return content, nil
  }

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot to update the thread: I'm working on this review, as proposed by @JAORMX on Discord.
And not using the sparse clone which seems to make things more complex and may not be fully supported by all git providers. Moving the PR to draft until it is validated.
Thanks!

@dmartinol dmartinol marked this pull request as draft October 22, 2025 06:26
…client

Using configurable workspacedir to checkout the repo

Signed-off-by: Daniele Martinoli <[email protected]>
Signed-off-by: Daniele Martinoli <[email protected]>
- more logging
- increased memory size

Signed-off-by: Daniele Martinoli <[email protected]>
…y FS

- added env variables to expedite the GC and avoid out of memory errors

Signed-off-by: Daniele Martinoli <[email protected]>
@dmartinol dmartinol marked this pull request as ready for review October 22, 2025 09:20
Copy link
Contributor

@jhrozek jhrozek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Much better! A couple of findings inline

- name: TOOLHIVE_REGISTRY_API_IMAGE
value: "{{ .Values.registryAPI.image }}"
- name: WORKSPACE_DIR
value: "/workspace"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not needed anymore and a dead YAML code now, right?

storerCache := cache.NewObjectLRUDefault()
storer := filesystem.NewStorage(storerFs, storerCache)

repo, err := git.Clone(storer, memFS, cloneOptions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We went from git.PlainCloneContext to git.Clone (no context), could we use git.CloneContext ?

repoInfo.Repository = nil

// 5. Force GC to reclaim memory
runtime.GC()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still don't think nudging the GC is necessary.

{{- toYaml .Values.operator.ports | nindent 12 }}
env:
- name: GOMEMLIMIT
value: 300MiB # Explicit limit (update as needed: keep it close to memory limits)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is bigger than the default limits (but within openshift limits)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Cannot deploy MCPRegistry from git in OpenShift

3 participants