Add directory traversal API to RepositoryProvider abstraction #6430
Merged
9 commits
4d8ada1
wip1
pditommaso cf8fbb1
wip2
pditommaso 25fe01b
Add directory traversal API to RepositoryProvider abstraction
pditommaso f602f03
Add ADR for repository directory traversal API [ci skip]
pditommaso fcd215e
Address review comments [ci fast]
pditommaso a650229
Refactor repository providers to use consistent path normalization
pditommaso 7b11158
Fix path consistency and depth logic in repository providers [ci fast]
pditommaso e7954a7
Fix depth filtering and resource management in repository providers […
pditommaso 8a5769c
Add more tests [ci fast]
pditommaso
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,84 @@ | ||
| # ADR: Repository Directory Traversal API | ||
|
|
||
| **Date**: 2025-09-29 | ||
| **Status**: Accepted | ||
| **Context**: Need for standardized directory listing across Git hosting providers | ||
|
|
||
| ## Decision | ||
|
|
||
| Add a `listDirectory(String path, int depth)` method to the `RepositoryProvider` abstraction to provide unified directory traversal across Git hosting platforms. | ||
|
|
||
| ## Context | ||
|
|
||
| Nextflow requires the ability to explore repository directory structures across multiple Git hosting providers (GitHub, GitLab, Bitbucket, Azure DevOps, Gitea) without full repository clones. Each provider has different API capabilities and constraints for directory listing operations. | ||
|
|
||
| ## Technical Implementation | ||
|
|
||
| ### Core Algorithm | ||
|
|
||
| All providers follow a consistent pattern: | ||
| 1. **Path Resolution**: Normalize path to provider API format | ||
| 2. **Strategy Selection**: Choose recursive vs iterative approach based on API capabilities | ||
| 3. **HTTP Request**: Execute provider-specific API calls | ||
| 4. **Response Processing**: Parse to standardized `RepositoryEntry` objects | ||
| 5. **Depth Filtering**: Apply client-side limits when APIs lack precise depth control | ||
|
|
||
| ### API Strategy Classification | ||
|
|
||
| **Strategy A: Native Recursive (GitHub, GitLab, Azure)** | ||
| - Single HTTP request with recursive parameters | ||
| - Server-side tree traversal | ||
| - Performance: O(1) API calls per listing when the response fits in a single page | ||
|
|
||
| **Strategy B: Iterative Traversal (Bitbucket Server, Gitea)** | ||
| - Multiple HTTP requests per directory level | ||
| - Client-side recursion management | ||
| - Performance: O(n) API calls, where n is the number of directories visited within the depth limit | ||
|
|
||
| **Strategy C: Limited Support (Bitbucket Cloud)** | ||
| - Single-level listing only | ||
| - Throws exceptions for depth > 1 | ||
|
|
||
| ### Provider Implementation Details | ||
|
|
||
| | Provider | Endpoint | Recursive Support | Performance | | ||
| |----------|----------|-------------------|-------------| | ||
| | GitHub | `/git/trees/{sha}?recursive=1` | Native | Optimal | | ||
| | GitLab | `/repository/tree?recursive=true` | Native | Optimal | | ||
| | Azure | `/items?recursionLevel=Full` | Native | Optimal | | ||
| | Bitbucket Server | `/browse/{path}` | Manual iteration | Multiple calls | | ||
| | Gitea | `/contents/{path}` | Manual iteration | Multiple calls | | ||
| | Bitbucket Cloud | `/src/{commit}/{path}` | None | Unsupported | | ||
|
|
||
| ### HTTP API Constraints | ||
|
|
||
| - **Rate Limiting**: 60-5000 requests/hour depending on provider and authentication | ||
| - **Response Size**: Controlled by `NXF_GIT_RESPONSE_MAX_LENGTH` environment variable | ||
| - **Timeouts**: 60-second connect timeout across all providers | ||
| - **Authentication**: Required for private repositories and higher rate limits | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### Positive | ||
| - **Unified Interface**: Consistent API across all Git hosting providers | ||
| - **Performance Optimization**: Uses native recursive APIs where available | ||
| - **Graceful Degradation**: Falls back to iterative traversal when needed | ||
| - **Error Resilience**: Handles partial failures and API limitations | ||
|
|
||
| ### Negative | ||
| - **Provider Inconsistency**: Performance varies significantly between providers | ||
| - **API Rate Limits**: Multiple calls required for some providers may hit limits faster | ||
| - **Memory Usage**: Large directory structures loaded entirely into memory | ||
|
|
||
| ### Neutral | ||
| - **Complexity**: Abstraction layer adds code complexity but improves maintainability | ||
| - **Testing**: Comprehensive test coverage required for each provider implementation | ||
|
|
||
| ## Implementation Notes | ||
|
|
||
| - Local Git repositories use JGit TreeWalk for optimal performance | ||
| - Client-side depth filtering ensures consistent behavior across providers | ||
| - Error handling varies by provider: some return empty lists, others throw exceptions | ||
| - Future enhancements could include caching based on commit SHA and pagination support | ||
|
|
||
| This decision enables Nextflow to efficiently explore repository structures regardless of the underlying Git hosting platform, with automatic optimization based on each provider's API capabilities. |
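The client-side depth filtering step described in the ADR can be illustrated with a small standalone sketch. It is written in plain Java so it compiles on its own. `DepthFilter`, `normalize`, `relativeDepth` and `filter` are hypothetical names, not the helpers used in this PR, and the convention that depth 1 means "immediate children only" is an assumption made for the example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the PR's shouldIncludeAtDepth. */
final class DepthFilter {

    /** Strips leading and trailing slashes so "/docs/" and "docs" compare equal. */
    static String normalize(String path) {
        if (path == null) return "";
        return path.replaceAll("^/+", "").replaceAll("/+$", "");
    }

    /**
     * Returns the number of path segments of entryPath below basePath,
     * or -1 when entryPath is not located under basePath.
     */
    static int relativeDepth(String entryPath, String basePath) {
        String entry = normalize(entryPath);
        String base = normalize(basePath);
        if (entry.isEmpty()) return -1;
        String rel;
        if (base.isEmpty()) {
            rel = entry;                                  // listing from the repository root
        } else if (entry.startsWith(base + "/")) {
            rel = entry.substring(base.length() + 1);     // drop the "base/" prefix
        } else {
            return -1;                                    // outside the requested directory
        }
        return rel.split("/").length;
    }

    /** Keeps only the paths between 1 and maxDepth levels below basePath (assumed semantics). */
    static List<String> filter(List<String> paths, String basePath, int maxDepth) {
        List<String> kept = new ArrayList<>();
        for (String p : paths) {
            int d = relativeDepth(p, basePath);
            if (d >= 1 && d <= maxDepth) kept.add(p);
        }
        return kept;
    }
}
```

A filter of this kind is what lets a provider with a fully recursive endpoint return the same result as one that walks level by level: the recursive response is fetched once and trimmed afterwards.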
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -19,11 +19,13 @@ package nextflow.scm | |
|
|
||
| import groovy.transform.CompileDynamic | ||
| import groovy.transform.CompileStatic | ||
| import groovy.util.logging.Slf4j | ||
| /** | ||
| * Implements a repository provider for Gitea service | ||
| * | ||
| * @author Akira Sekiguchi <[email protected]> | ||
| */ | ||
| @Slf4j | ||
| @CompileStatic | ||
| final class GiteaRepositoryProvider extends RepositoryProvider { | ||
|
|
||
|
|
@@ -113,4 +115,118 @@ final class GiteaRepositoryProvider extends RepositoryProvider { | |
| return invokeBytes(url) | ||
| } | ||
|
|
||
| /** {@inheritDoc} */ | ||
| @Override | ||
| List<RepositoryEntry> listDirectory(String path, int depth) { | ||
| final branch = revision ?: "master" | ||
| // Normalize path using base class helper | ||
| final dirPath = normalizePath(path) | ||
|
|
||
| // Build the contents API URL - Gitea follows GitHub-like API pattern | ||
| String url = "${config.endpoint}/repos/$project/contents" | ||
| if (dirPath) { | ||
| url += "/$dirPath" | ||
| } | ||
| url += "?ref=$branch" | ||
|
|
||
| try { | ||
| // Make the API call | ||
| def response = invoke(url) | ||
| List<Map> contents = new groovy.json.JsonSlurper().parseText(response) as List<Map> | ||
|
|
||
| if (!contents) { | ||
| return [] | ||
| } | ||
|
|
||
| List<RepositoryEntry> entries = [] | ||
|
|
||
| for (Map entry : contents) { | ||
| String entryPath = entry.get('path') as String | ||
| // Filter entries based on depth using base class helper | ||
| if (shouldIncludeAtDepth(entryPath, path, depth)) { | ||
| entries.add(createRepositoryEntry(entry)) | ||
| } | ||
| } | ||
|
|
||
| // If depth > 1, we need to recursively get subdirectory contents | ||
| if (depth > 1) { | ||
| for (Map entry : contents) { | ||
| if (entry.get('type') == 'dir') { | ||
| String entryName = entry.get('name') as String | ||
| String subPath = dirPath ? "$dirPath/$entryName" : entryName | ||
| entries.addAll(getRecursiveEntries(subPath, depth, branch, 2)) | ||
| } | ||
| } | ||
| } | ||
|
|
||
| return entries.sort { it.name } | ||
|
|
||
| } catch (Exception e) { | ||
| throw new UnsupportedOperationException("Directory listing failed for Gitea path: $path", e) | ||
| } | ||
| } | ||
|
|
||
| private List<RepositoryEntry> getRecursiveEntries(String basePath, int maxDepth, String branch, int currentDepth) { | ||
| if (currentDepth > maxDepth) { | ||
| return [] | ||
| } | ||
|
|
||
| List<RepositoryEntry> allEntries = [] | ||
|
|
||
| // Get current level entries first | ||
| final normalizedBasePath = normalizePath(basePath) | ||
| String url = "${config.endpoint}/repos/$project/contents" | ||
| if (normalizedBasePath) { | ||
| url += "/$normalizedBasePath" | ||
| } | ||
| url += "?ref=$branch" | ||
|
|
||
| try { | ||
| def response = invoke(url) | ||
| List<Map> contents = new groovy.json.JsonSlurper().parseText(response) as List<Map> | ||
|
|
||
| for (Map entry : contents) { | ||
| String entryPath = entry.get('path') as String | ||
|
|
||
| // Add entries from the current level that match the depth criteria | ||
| if (shouldIncludeAtDepth(entryPath, basePath, maxDepth)) { | ||
| allEntries.add(createRepositoryEntry(entry)) | ||
| } | ||
|
|
||
| // Recurse into subdirectories if we haven't reached max depth | ||
| if (entry.get('type') == 'dir' && currentDepth < maxDepth) { | ||
| String entryName = entry.get('name') as String | ||
| String subPath = normalizedBasePath ? "$normalizedBasePath/$entryName" : entryName | ||
| allEntries.addAll(getRecursiveEntries(subPath, maxDepth, branch, currentDepth + 1)) | ||
| } | ||
| } | ||
| } catch (Exception e) { | ||
| log.debug("Failed to process directory during recursive listing: ${e.message}") | ||
| // Continue processing other directories if one fails | ||
| } | ||
|
|
||
| return allEntries | ||
| } | ||
|
||
|
|
||
| private RepositoryEntry createRepositoryEntry(Map entry) { | ||
| String name = entry.get('name') as String | ||
| String path = entry.get('path') as String | ||
| String type = entry.get('type') as String | ||
|
|
||
| EntryType entryType = (type == 'dir') ? EntryType.DIRECTORY : EntryType.FILE | ||
| String sha = entry.get('sha') as String | ||
| Long size = entry.get('size') as Long | ||
|
|
||
| // Ensure absolute path using base class helper | ||
| String fullPath = ensureAbsolutePath(path) | ||
|
|
||
| return new RepositoryEntry( | ||
| name: name, | ||
| path: fullPath, | ||
| type: entryType, | ||
| sha: sha, | ||
| size: size | ||
| ) | ||
| } | ||
|
|
||
| } | ||
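The iterative pattern used by the Gitea provider above (one contents request per directory, with recursion bounded by the requested depth) can be sketched without any network access. The example below is plain Java and replaces the HTTP call with an in-memory map. `IterativeLister`, its `tree` map, and the trailing-slash marker for directories are all assumptions made for illustration and are not part of the PR. The `calls` counter shows how the number of simulated requests grows with the number of directories visited.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Illustrative stand-in for an iterative provider; not the PR's implementation. */
final class IterativeLister {

    /** Simulated "contents" endpoint: directory path -> child names ("/" suffix marks a directory). */
    private final Map<String, List<String>> tree;

    /** Number of simulated API calls issued so far. */
    int calls = 0;

    IterativeLister(Map<String, List<String>> tree) {
        this.tree = tree;
    }

    /** Lists entries up to maxDepth levels below dir; depth 1 means immediate children (assumed). */
    List<String> list(String dir, int maxDepth) {
        List<String> out = new ArrayList<>();
        walk(dir, 1, maxDepth, out);
        Collections.sort(out);
        return out;
    }

    private void walk(String dir, int level, int maxDepth, List<String> out) {
        if (level > maxDepth) return;      // stop before issuing a request that is not needed
        calls++;                           // one simulated request per visited directory
        List<String> children = tree.get(dir);
        if (children == null) return;      // unknown directory: nothing to add
        for (String child : children) {
            boolean isDir = child.endsWith("/");
            String name = isDir ? child.substring(0, child.length() - 1) : child;
            String path = dir.isEmpty() ? name : dir + "/" + name;
            out.add(path);
            if (isDir) walk(path, level + 1, maxDepth, out);
        }
    }
}
```

Checking the depth before each request, as `walk` does, keeps the request count at one per directory that actually needs listing, which matters for providers with strict rate limits.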