84 changes: 84 additions & 0 deletions adr/20250929-repository-directory-traversal.md
@@ -0,0 +1,84 @@
# ADR: Repository Directory Traversal API

**Date**: 2025-09-29
**Status**: Accepted
**Context**: Need for standardized directory listing across Git hosting providers

## Decision

Add a `listDirectory(String path, int depth)` method to the `RepositoryProvider` abstraction so that directory traversal works uniformly across Git hosting platforms.

## Context

Nextflow requires the ability to explore repository directory structures across multiple Git hosting providers (GitHub, GitLab, Bitbucket, Azure DevOps, Gitea) without full repository clones. Each provider has different API capabilities and constraints for directory listing operations.

## Technical Implementation

### Core Algorithm

All providers follow a consistent pattern:
1. **Path Resolution**: Normalize path to provider API format
2. **Strategy Selection**: Choose recursive vs iterative approach based on API capabilities
3. **HTTP Request**: Execute provider-specific API calls
4. **Response Processing**: Parse to standardized `RepositoryEntry` objects
5. **Depth Filtering**: Apply client-side limits when APIs lack precise depth control
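
Step 5 can be illustrated with a minimal Groovy sketch. The `withinDepth` function below is hypothetical; it is not the base-class `shouldIncludeAtDepth` helper, whose implementation is not shown in this change. It assumes that `depth = 1` means "immediate children of the base path only".

```groovy
// Hypothetical helper for illustration only.
// Assumption: depth 1 = immediate children of basePath, depth 2 adds grandchildren, and so on.
boolean withinDepth(String entryPath, String basePath, int depth) {
    // Strip leading and trailing slashes so "/docs/" and "docs" compare equal
    def strip = { String p -> (p ?: '').replaceAll('^/+|/+$', '') }
    String entry = strip(entryPath)
    String base = strip(basePath)

    if (base) {
        // The base directory itself and anything outside it never match
        if (!entry.startsWith(base + '/')) return false
        entry = entry.substring(base.length() + 1)
    }
    if (!entry) return false

    // Number of path segments below the base directory
    int level = entry.split('/').length
    return level <= depth
}

println withinDepth('/docs/a.md', 'docs', 1)          // true
println withinDepth('/docs/img/logo.png', 'docs', 1)  // false
```

Filtering on path segments like this keeps the result identical whether the provider returned a full recursive tree or a single level.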

### API Strategy Classification

**Strategy A: Native Recursive (GitHub, GitLab, Azure)**
- Single HTTP request with recursive parameters
- Server-side tree traversal
- Performance: O(1) API calls

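**Strategy B: Iterative Traversal (Gitea)**
- One HTTP request per directory visited
- Client-side recursion management
- Performance: O(n) API calls where n = number of directories visited

**Strategy C: Limited or No Support (Bitbucket Cloud, Bitbucket Server)**
- Bitbucket Cloud: single-level listing only; any API failure is rethrown as `UnsupportedOperationException`
- Bitbucket Server: `listDirectory` is not implemented and always throws `UnsupportedOperationException`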
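
The cost of iterative traversal (Strategy B) can be sketched with an in-memory stand-in for a provider API. Everything here is illustrative: `FakeProvider`, its `tree` contents, and the breadth-first queue are assumptions, not the actual provider code, and `depth = 1` is assumed to mean immediate children only.

```groovy
// Illustrative stand-in for a provider that can only list one directory per request.
class FakeProvider {
    // Directory path -> children; a trailing '/' marks a subdirectory
    Map<String, List<String>> tree = [
        ''        : ['README.md', 'docs/', 'src/'],
        'docs'    : ['index.md', 'img/'],
        'docs/img': ['logo.png'],
        'src'     : ['main.nf'],
    ]
    int apiCalls = 0

    // Simulates one HTTP request returning a single directory level
    List<String> listOneLevel(String dir) {
        apiCalls++
        return tree[dir] ?: []
    }

    List<String> listDirectory(String path, int depth) {
        List<String> result = []
        // Queue of [directory, level] pairs that still need a request
        List<List> pending = [[path, 1]]
        while (pending) {
            List item = pending.removeAt(0)
            String dir = item[0] as String
            int level = item[1] as int
            for (String child : listOneLevel(dir)) {
                boolean isDir = child.endsWith('/')
                String name = isDir ? child[0..-2] : child
                String childPath = dir ? "$dir/$name" : name
                result << childPath
                // Queue a subdirectory only while the depth limit allows another level
                if (isDir && level < depth) {
                    pending << [childPath, level + 1]
                }
            }
        }
        return result.sort()
    }
}

def provider = new FakeProvider()
def entries = provider.listDirectory('', 3)
println "${entries.size()} entries in ${provider.apiCalls} API calls"  // 7 entries in 4 API calls
```

Each visited directory costs one request, which is why deep listings on such providers consume the rate-limit budget faster than a single native recursive call.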

### Provider Implementation Details

| Provider | Endpoint | Recursive Support | Performance |
|----------|----------|-------------------|-------------|
| GitHub | `/git/trees/{sha}?recursive=1` | Native | Optimal |
| GitLab | `/repository/tree?recursive=true` | Native | Optimal |
| Azure | `/items?recursionLevel=Full` | Native | Optimal |
| Bitbucket Server | `/browse/{path}` (not called) | Not implemented | Unsupported |
| Gitea | `/contents/{path}` | Manual iteration | Multiple calls |
| Bitbucket Cloud | `/src/{commit}/{path}` | None | Single level only |
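
As an illustration of how the Azure row maps a depth to query parameters, the sketch below builds a query string with `recursionLevel`, `api-version`, `$format`, `scopePath` and `versionDescriptor.version`. The `buildItemsQuery` function is hypothetical, and percent-encoding the keys and values is an assumption of this sketch rather than something the provider code is known to do.

```groovy
// Hypothetical helper for illustration; not part of the provider code.
String buildItemsQuery(String scopePath, int depth, String revision) {
    Map<String, String> params = [
        // 'Full' asks the server to recurse; 'OneLevel' returns direct children only
        'recursionLevel': depth > 1 ? 'Full' : 'OneLevel',
        'api-version'   : '6.0',
        '$format'       : 'json',
    ]
    // The root directory needs no scopePath
    if (scopePath && scopePath != '/') params['scopePath'] = scopePath
    if (revision) params['versionDescriptor.version'] = revision

    // Assumption: percent-encode keys and values so '$', '/' and spaces survive transport
    return params.collect { k, v ->
        URLEncoder.encode(k, 'UTF-8') + '=' + URLEncoder.encode(v, 'UTF-8')
    }.join('&')
}

println buildItemsQuery('/docs/user guide', 2, 'main')
```

Because `recursionLevel` only distinguishes one level from the full tree, any intermediate depth still has to be trimmed on the client.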

### HTTP API Constraints

- **Rate Limiting**: 60-5000 requests/hour depending on provider and authentication
- **Response Size**: Controlled by `NXF_GIT_RESPONSE_MAX_LENGTH` environment variable
- **Timeouts**: 60-second connect timeout across all providers
- **Authentication**: Required for private repositories and higher rate limits

## Consequences

### Positive
- **Unified Interface**: Consistent API across all Git hosting providers
- **Performance Optimization**: Uses native recursive APIs where available
- **Graceful Degradation**: Falls back to iterative traversal when needed
- **Error Resilience**: Handles partial failures and API limitations

### Negative
- **Provider Inconsistency**: Performance varies significantly between providers
- **API Rate Limits**: Multiple calls required for some providers may hit limits faster
- **Memory Usage**: Large directory structures loaded entirely into memory

### Neutral
- **Complexity**: Abstraction layer adds code complexity but improves maintainability
- **Testing**: Comprehensive test coverage required for each provider implementation

## Implementation Notes

- Local Git repositories use JGit TreeWalk for optimal performance
- Client-side depth filtering ensures consistent behavior across providers
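- Error handling varies by provider: Azure returns an empty list on any failure, Bitbucket Cloud and Gitea rethrow failures as `UnsupportedOperationException` (Gitea logs and skips subdirectories that fail during recursion), and Bitbucket Server always throws `UnsupportedOperationException`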
- Future enhancements could include caching based on commit SHA and pagination support
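
The caching idea mentioned above could look roughly like the following sketch. `ListingCache` and its `lookup` method are hypothetical names, and the cache key assumes the revision has already been resolved to an immutable commit SHA; branch or tag names can move and would make cached listings stale.

```groovy
// Hypothetical cache for illustration; not part of the current implementation.
class ListingCache {
    private final Map<String, List<String>> cache = [:]
    int misses = 0

    // The loader closure runs only when this (sha, path, depth) triple is not cached yet
    List<String> lookup(String commitSha, String path, int depth, Closure<List<String>> loader) {
        String key = "$commitSha:$path:$depth"
        if (!cache.containsKey(key)) {
            misses++
            cache[key] = loader.call()
        }
        return cache[key]
    }
}

def cache = new ListingCache()
def first = cache.lookup('abc123', 'docs', 1) { ['docs/a.md'] }
def second = cache.lookup('abc123', 'docs', 1) { ['never loaded'] }
println "first=$first second=$second misses=${cache.misses}"
```

Keying on the depth as well as the path keeps a shallow listing from being served where a deeper one was requested.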

This decision enables Nextflow to efficiently explore repository structures regardless of the underlying Git hosting platform, with automatic optimization based on each provider's API capabilities.
@@ -214,4 +214,87 @@ final class AzureRepositoryProvider extends RepositoryProvider {
        return invokeBytes(url)
    }

    /** {@inheritDoc} */
    @Override
    List<RepositoryEntry> listDirectory(String path, int depth) {
        // Build the Items API URL
        def normalizedPath = normalizePath(path)
        // For Azure API, root directory should be represented as "/" not empty string
        if (!normalizedPath) {
            normalizedPath = "/"
        }

        def queryParams = [
            'recursionLevel': depth > 1 ? 'Full' : 'OneLevel', // Use Full for depth > 1 to get nested content
            "api-version": 6.0,
            '$format': 'json'
        ] as Map<String,Object>

        // Only add scopePath if it's not the root directory
        if (normalizedPath != "/") {
            queryParams['scopePath'] = normalizedPath
        }

        if (revision) {
            queryParams['versionDescriptor.version'] = revision
            if (COMMIT_REGEX.matcher(revision).matches()) {
                queryParams['versionDescriptor.versionType'] = 'commit'
            }
        }

        def queryString = queryParams.collect({ "$it.key=$it.value"}).join('&')
        def url = "$endpointUrl/items?$queryString"

        try {
            Map response = invokeAndParseResponse(url)
            List<Map> items = response?.value as List<Map>

            if (!items) {
                return []
            }

            List<RepositoryEntry> entries = []

            for (Map item : items) {
                // Skip the root directory itself
                String itemPath = item.get('path') as String
                if (itemPath == path || (!path && itemPath == "/")) {
                    continue
                }

                // Filter entries based on depth using base class helper
                if (shouldIncludeAtDepth(itemPath, path, depth)) {
                    entries.add(createRepositoryEntry(item, path))
                }
            }

            return entries.sort { it.name }

        } catch (Exception e) {
            // Azure Items API may have different permissions or availability than other APIs
            // Return empty list to allow graceful degradation
            return []
        }
    }

    private RepositoryEntry createRepositoryEntry(Map item, String basePath) {
        String itemPath = item.get('path') as String
        String name = itemPath?.split('/')?.last() ?: "unknown"

        // Determine type based on Azure's gitObjectType
        String gitObjectType = item.get('gitObjectType') as String
        EntryType type = (gitObjectType == 'tree') ? EntryType.DIRECTORY : EntryType.FILE

        String sha = item.get('objectId') as String
        Long size = item.get('size') as Long

        return new RepositoryEntry(
            name: name,
            path: itemPath,
            type: type,
            sha: sha,
            size: size
        )
    }

}
@@ -193,4 +193,65 @@ final class BitbucketRepositoryProvider extends RepositoryProvider {
        final url = getContentUrl(path)
        return invokeBytes(url)
    }

    /** {@inheritDoc} */
    @Override
    List<RepositoryEntry> listDirectory(String path, int depth) {
        final ref = revision ? getRefForRevision(revision) : getMainBranch()
        // Normalize path using base class helper
        final dirPath = normalizePath(path)

        // Build the src API URL - BitBucket's src endpoint returns directory listings when path is a directory
        String url = "${config.endpoint}/2.0/repositories/$project/src/$ref/$dirPath"

        try {
            // Make the API call
            Map response = invokeAndParseResponse(url)
            List<Map> values = response?.values as List<Map>

            if (!values) {
                return []
            }

            List<RepositoryEntry> entries = []

            for (Map entry : values) {
                String entryPath = entry.get('path') as String
                // Filter entries based on depth using base class helper
                if (shouldIncludeAtDepth(entryPath, path, depth)) {
                    entries.add(createRepositoryEntry(entry, path))
                }
            }

            return entries.sort { it.name }

        } catch (Exception e) {
            // If API call fails, it might be because the path is not a directory
            // or the API doesn't support directory listing
            throw new UnsupportedOperationException("Directory listing not supported by BitBucket API for path: $path", e)
        }
    }

    private RepositoryEntry createRepositoryEntry(Map entry, String basePath) {
        String entryPath = entry.get('path') as String
        String name = entryPath?.split('/')?.last() ?: entry.get('name') as String

        // Determine type based on BitBucket's response
        String type = entry.get('type') as String
        EntryType entryType = (type == 'commit_directory') ? EntryType.DIRECTORY : EntryType.FILE

        // The 'commit' value is a nested object; cast it to Map before reading the hash
        String sha = (entry.get('commit') as Map)?.get('hash') as String
        Long size = entry.get('size') as Long

        // Ensure absolute path using base class helper
        String fullPath = ensureAbsolutePath(entryPath)

        return new RepositoryEntry(
            name: name,
            path: fullPath,
            type: entryType,
            sha: sha,
            size: size
        )
    }
}
@@ -111,6 +111,12 @@ final class BitbucketServerRepositoryProvider extends RepositoryProvider {
        return invokeBytes(url)
    }

    /** {@inheritDoc} */
    @Override
    List<RepositoryEntry> listDirectory(String path, int depth) {
        throw new UnsupportedOperationException("BitbucketServerRepositoryProvider does not support 'listDirectory' operation")
    }

    @Override
    List<TagInfo> getTags() {
        final result = new ArrayList<TagInfo>()
@@ -19,11 +19,13 @@ package nextflow.scm

import groovy.transform.CompileDynamic
import groovy.transform.CompileStatic
import groovy.util.logging.Slf4j
/**
 * Implements a repository provider for Gitea service
 *
 * @author Akira Sekiguchi <[email protected]>
 */
@Slf4j
@CompileStatic
final class GiteaRepositoryProvider extends RepositoryProvider {

@@ -113,4 +115,118 @@ final class GiteaRepositoryProvider extends RepositoryProvider {
        return invokeBytes(url)
    }

    /** {@inheritDoc} */
    @Override
    List<RepositoryEntry> listDirectory(String path, int depth) {
        final branch = revision ?: "master"
        // Normalize path using base class helper
        final dirPath = normalizePath(path)

        // Build the contents API URL - Gitea follows GitHub-like API pattern
        String url = "${config.endpoint}/repos/$project/contents"
        if (dirPath) {
            url += "/$dirPath"
        }
        url += "?ref=$branch"

        try {
            // Make the API call
            def response = invoke(url)
            List<Map> contents = new groovy.json.JsonSlurper().parseText(response) as List<Map>

            if (!contents) {
                return []
            }

            List<RepositoryEntry> entries = []

            for (Map entry : contents) {
                String entryPath = entry.get('path') as String
                // Filter entries based on depth using base class helper
                if (shouldIncludeAtDepth(entryPath, path, depth)) {
                    entries.add(createRepositoryEntry(entry))
                }
            }

            // If depth > 1, we need to recursively get subdirectory contents
            if (depth > 1) {
                for (Map entry : contents) {
                    if (entry.get('type') == 'dir') {
                        String entryName = entry.get('name') as String
                        String subPath = dirPath ? "$dirPath/$entryName" : entryName
                        entries.addAll(getRecursiveEntries(subPath, depth, branch, 2))
                    }
                }
            }

            return entries.sort { it.name }

        } catch (Exception e) {
            throw new UnsupportedOperationException("Directory listing failed for Gitea path: $path", e)
        }
    }

    private List<RepositoryEntry> getRecursiveEntries(String basePath, int maxDepth, String branch, int currentDepth) {
        if (currentDepth > maxDepth) {
            return []
        }

        List<RepositoryEntry> allEntries = []

        // Get current level entries first
        final normalizedBasePath = normalizePath(basePath)
        String url = "${config.endpoint}/repos/$project/contents"
        if (normalizedBasePath) {
            url += "/$normalizedBasePath"
        }
        url += "?ref=$branch"

        try {
            def response = invoke(url)
            List<Map> contents = new groovy.json.JsonSlurper().parseText(response) as List<Map>

            for (Map entry : contents) {
                String entryPath = entry.get('path') as String

                // Add entries from the current level that match the depth criteria
                if (shouldIncludeAtDepth(entryPath, basePath, maxDepth)) {
                    allEntries.add(createRepositoryEntry(entry))
                }

                // Recurse into subdirectories if we haven't reached max depth
                if (entry.get('type') == 'dir' && currentDepth < maxDepth) {
                    String entryName = entry.get('name') as String
                    String subPath = normalizedBasePath ? "$normalizedBasePath/$entryName" : entryName
                    allEntries.addAll(getRecursiveEntries(subPath, maxDepth, branch, currentDepth + 1))
                }
            }
        } catch (Exception e) {
            // Include the failing path so the skipped directory can be identified in the logs
            log.debug("Failed to list Gitea directory '$basePath' during recursive listing: ${e.message}")
            // Continue processing other directories if one fails
        }

        return allEntries
    }

    private RepositoryEntry createRepositoryEntry(Map entry) {
        String name = entry.get('name') as String
        String path = entry.get('path') as String
        String type = entry.get('type') as String

        EntryType entryType = (type == 'dir') ? EntryType.DIRECTORY : EntryType.FILE
        String sha = entry.get('sha') as String
        Long size = entry.get('size') as Long

        // Ensure absolute path using base class helper
        String fullPath = ensureAbsolutePath(path)

        return new RepositoryEntry(
            name: name,
            path: fullPath,
            type: entryType,
            sha: sha,
            size: size
        )
    }

}