diff --git a/audit-cli/README.md b/audit-cli/README.md index 8b2cb50..591ef5d 100644 --- a/audit-cli/README.md +++ b/audit-cli/README.md @@ -58,7 +58,8 @@ audit-cli ├── search # Search through extracted content or source files │ └── find-string ├── analyze # Analyze RST file structures -│ └── includes +│ ├── includes +│ └── file-references └── compare # Compare files across versions └── file-contents ``` @@ -224,6 +225,8 @@ With `-v` flag, also shows: Analyze `include` directive relationships in RST files to understand file dependencies. +This command recursively follows `.. include::` directives to show all files that are referenced from a starting file. This helps you understand which content is transcluded into a page. + **Use Cases:** This command helps writers: @@ -231,6 +234,7 @@ This command helps writers: - Identify circular include dependencies (files included multiple times) - Document file relationships for maintenance - Plan refactoring of complex include structures +- See what content is actually pulled into a page **Basic Usage:** @@ -282,6 +286,235 @@ times (e.g., file A includes file C, and file B also includes file C), the file However, the tree view will show it in all locations where it appears, with subsequent occurrences marked as circular includes in verbose mode. +**Note on Toctree:** + +This command does **not** follow `.. toctree::` entries. Toctree entries are navigation links to other pages, not content +that's transcluded into the page. If you need to find which files reference a target file through toctree entries, use +the `analyze file-references` command with the `--include-toctree` flag. + +#### `analyze file-references` + +Find all files that reference a target file through RST directives. This performs reverse dependency analysis, showing which files reference the target file through `include`, `literalinclude`, `io-code-block`, or `toctree` directives. 
+ +The command searches all RST files (`.rst` and `.txt` extensions) and YAML files (`.yaml` and `.yml` extensions) in the source directory tree. YAML files are included because extract and release files contain RST directives within their content blocks. + +By default, this command searches for content inclusion directives (include, literalinclude, +io-code-block) that transclude content into pages. Use `--include-toctree` to also search +for toctree entries, which are navigation links rather than transcluded content. + +**Use Cases:** + +This command helps writers: +- Understand the impact of changes to a file (which pages will be affected) +- Find all usages of an include file across the documentation +- Track where code examples are referenced +- Plan refactoring by understanding file dependencies + +**Basic Usage:** + +```bash +# Find what references an include file (content inclusion only) +./audit-cli analyze file-references path/to/includes/fact.rst + +# Find what references a code example +./audit-cli analyze file-references path/to/code-examples/example.js + +# Include toctree references (navigation links) +./audit-cli analyze file-references path/to/file.rst --include-toctree + +# Get JSON output for automation +./audit-cli analyze file-references path/to/file.rst --format json + +# Show detailed information with line numbers +./audit-cli analyze file-references path/to/file.rst --verbose +``` + +**Flags:** + +- `--format ` - Output format: `text` (default) or `json` +- `-v, --verbose` - Show detailed information including line numbers and reference paths +- `-c, --count-only` - Only show the count of references (useful for quick checks and scripting) +- `--paths-only` - Only show the file paths, one per line (useful for piping to other commands) +- `--summary` - Only show summary statistics (total files and references by type, without file list) +- `-t, --directive-type ` - Filter by directive type: `include`, `literalinclude`, `io-code-block`, or 
`toctree` +- `--include-toctree` - Include toctree entries (navigation links) in addition to content inclusion directives +- `--exclude ` - Exclude paths matching this glob pattern (e.g., `*/archive/*` or `*/deprecated/*`) + +**Understanding the Counts:** + +The command shows two metrics: +- **Total Files**: Number of unique files that reference the target (deduplicated) +- **Total References**: Total number of directive occurrences (includes duplicates) + +When a file includes the target multiple times, it counts as: +- 1 file (in Total Files) +- Multiple references (in Total References) + +This helps identify both the impact scope (how many files) and duplicate includes (when references > files). + +**Supported Directive Types:** + +By default, the command tracks content inclusion directives: + +1. **`.. include::`** - RST content includes (transcluded) + ```rst + .. include:: /includes/intro.rst + ``` + +2. **`.. literalinclude::`** - Code file references (transcluded) + ```rst + .. literalinclude:: /code-examples/example.py + :language: python + ``` + +3. **`.. io-code-block::`** - Input/output examples with file arguments (transcluded) + ```rst + .. io-code-block:: + + .. input:: /code-examples/query.js + :language: javascript + + .. output:: /code-examples/result.json + :language: json + ``` + +With `--include-toctree`, also tracks: + +4. **`.. toctree::`** - Table of contents entries (navigation links, not transcluded) + ```rst + .. toctree:: + :maxdepth: 2 + + intro + getting-started + ``` + +**Note:** Only file-based references are tracked. Inline content (e.g., `.. input::` with `:language:` but no file path) is not tracked since it doesn't reference external files. 
+ +**Output Formats:** + +**Text** (default): +``` +============================================================ +REFERENCE ANALYSIS +============================================================ +Target File: /path/to/includes/intro.rst +Total Files: 3 +Total References: 4 +============================================================ + +include : 3 files, 4 references + + 1. [include] duplicate-include-test.rst (2 references) + 2. [include] include-test.rst + 3. [include] page.rst + +``` + +**Text with --verbose:** +``` +============================================================ +REFERENCE ANALYSIS +============================================================ +Target File: /path/to/includes/intro.rst +Total Files: 3 +Total References: 4 +============================================================ + +include : 3 files, 4 references + + 1. [include] duplicate-include-test.rst (2 references) + Line 6: /includes/intro.rst + Line 13: /includes/intro.rst + 2. [include] include-test.rst + Line 6: /includes/intro.rst + 3. 
[include] page.rst + Line 12: /includes/intro.rst + +``` + +**JSON** (--format json): +```json +{ + "target_file": "/path/to/includes/intro.rst", + "source_dir": "/path/to/source", + "total_files": 3, + "total_references": 4, + "referencing_files": [ + { + "file_path": "/path/to/duplicate-include-test.rst", + "directive_type": "include", + "reference_path": "/includes/intro.rst", + "line_number": 6 + }, + { + "file_path": "/path/to/duplicate-include-test.rst", + "directive_type": "include", + "reference_path": "/includes/intro.rst", + "line_number": 13 + }, + { + "file_path": "/path/to/include-test.rst", + "directive_type": "include", + "reference_path": "/includes/intro.rst", + "line_number": 6 + } + ] +} +``` + +**Examples:** + +```bash +# Check if an include file is being used +./audit-cli analyze file-references ~/docs/source/includes/fact-atlas.rst + +# Find all pages that use a specific code example +./audit-cli analyze file-references ~/docs/source/code-examples/connect.py + +# Get machine-readable output for scripting +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --format json | jq '.total_references' + +# See exactly where a file is referenced (with line numbers) +./audit-cli analyze file-references ~/docs/source/includes/intro.rst --verbose + +# Quick check: just show the count +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --count-only +# Output: 5 + +# Show summary statistics only +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --summary +# Output: +# Total Files: 3 +# Total References: 5 +# +# By Type: +# include : 3 files, 5 references + +# Get list of files for piping to other commands +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --paths-only +# Output: +# page1.rst +# page2.rst +# page3.rst + +# Filter to only show include directives (not literalinclude or io-code-block) +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --directive-type 
include + +# Filter to only show literalinclude references +./audit-cli analyze file-references ~/docs/source/code-examples/example.py --directive-type literalinclude + +# Combine filters: count only literalinclude references +./audit-cli analyze file-references ~/docs/source/code-examples/example.py -t literalinclude -c + +# Combine filters: list files that use this as an io-code-block +./audit-cli analyze file-references ~/docs/source/code-examples/query.js -t io-code-block --paths-only + +# Exclude archived or deprecated files from search +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --exclude "*/archive/*" +./audit-cli analyze file-references ~/docs/source/includes/fact.rst --exclude "*/deprecated/*" +``` + ### Compare Commands #### `compare file-contents` @@ -469,9 +702,15 @@ audit-cli/ │ │ └── report.go # Report generation │ ├── analyze/ # Analyze parent command │ │ ├── analyze.go # Parent command definition -│ │ └── includes/ # Includes analysis subcommand -│ │ ├── includes.go # Command logic -│ │ ├── analyzer.go # Include tree building +│ │ ├── includes/ # Includes analysis subcommand +│ │ │ ├── includes.go # Command logic +│ │ │ ├── analyzer.go # Include tree building +│ │ │ ├── output.go # Output formatting +│ │ │ └── types.go # Type definitions +│ │ └── file-references/ # File-references analysis subcommand +│ │ ├── file_references.go # Command logic +│ │ ├── file-references_test.go # Tests +│ │ ├── analyzer.go # Reference finding logic │ │ ├── output.go # Output formatting │ │ └── types.go # Type definitions │ └── compare/ # Compare parent command @@ -485,6 +724,12 @@ audit-cli/ │ ├── types.go # Type definitions │ └── version_resolver.go # Version path resolution ├── internal/ # Internal packages +│ ├── pathresolver/ # Path resolution utilities +│ │ ├── pathresolver.go # Core path resolution +│ │ ├── pathresolver_test.go # Tests +│ │ ├── source_finder.go # Source directory detection +│ │ ├── version_resolver.go # Version path 
resolution +│ │ └── types.go # Type definitions │ └── rst/ # RST parsing utilities │ ├── parser.go # Generic parsing with includes │ ├── include_resolver.go # Include directive resolution @@ -1075,6 +1320,23 @@ used as the base for resolving relative include paths. ## Internal Packages +### `internal/pathresolver` + +Provides centralized path resolution utilities for working with MongoDB documentation structure: + +- **Source directory detection** - Finds the documentation root by walking up the directory tree +- **Project info detection** - Identifies product directory, version, and whether a project is versioned +- **Version path resolution** - Resolves file paths across multiple documentation versions +- **Relative path resolution** - Resolves paths relative to the source directory + +**Key Functions:** +- `FindSourceDirectory(filePath string)` - Finds the source directory for a given file +- `DetectProjectInfo(filePath string)` - Detects project structure information +- `ResolveVersionPaths(referenceFile, productDir string, versions []string)` - Resolves paths across versions +- `ResolveRelativeToSource(sourceDir, relativePath string)` - Resolves relative paths + +See the code in `internal/pathresolver/` for implementation details. + ### `internal/rst` Provides reusable utilities for parsing and processing RST files: @@ -1091,44 +1353,44 @@ See the code in `internal/rst/` for implementation details. 
The tool normalizes language identifiers to standard file extensions: -| Input | Normalized | Extension | -|-------|-----------|-----------| -| `bash` | `bash` | `.sh` | -| `c` | `c` | `.c` | -| `c++` | `cpp` | `.cpp` | -| `c#` | `csharp` | `.cs` | -| `console` | `console` | `.sh` | -| `cpp` | `cpp` | `.cpp` | -| `cs` | `csharp` | `.cs` | -| `csharp` | `csharp` | `.cs` | -| `go` | `go` | `.go` | -| `golang` | `go` | `.go` | -| `java` | `java` | `.java` | -| `javascript` | `javascript` | `.js` | -| `js` | `javascript` | `.js` | -| `kotlin` | `kotlin` | `.kt` | -| `kt` | `kotlin` | `.kt` | -| `php` | `php` | `.php` | -| `powershell` | `powershell` | `.ps1` | -| `ps1` | `powershell` | `.ps1` | -| `ps5` | `ps5` | `.ps1` | -| `py` | `python` | `.py` | -| `python` | `python` | `.py` | -| `rb` | `ruby` | `.rb` | -| `rs` | `rust` | `.rs` | -| `ruby` | `ruby` | `.rb` | -| `rust` | `rust` | `.rs` | -| `scala` | `scala` | `.scala` | -| `sh` | `shell` | `.sh` | -| `shell` | `shell` | `.sh` | -| `swift` | `swift` | `.swift` | -| `text` | `text` | `.txt` | -| `ts` | `typescript` | `.ts` | -| `txt` | `text` | `.txt` | -| `typescript` | `typescript` | `.ts` | -| (empty string) | `undefined` | `.txt` | -| `none` | `undefined` | `.txt` | -| (unknown) | (unchanged) | `.txt` | +| Input | Normalized | Extension | +|----------------|--------------|-----------| +| `bash` | `bash` | `.sh` | +| `c` | `c` | `.c` | +| `c++` | `cpp` | `.cpp` | +| `c#` | `csharp` | `.cs` | +| `console` | `console` | `.sh` | +| `cpp` | `cpp` | `.cpp` | +| `cs` | `csharp` | `.cs` | +| `csharp` | `csharp` | `.cs` | +| `go` | `go` | `.go` | +| `golang` | `go` | `.go` | +| `java` | `java` | `.java` | +| `javascript` | `javascript` | `.js` | +| `js` | `javascript` | `.js` | +| `kotlin` | `kotlin` | `.kt` | +| `kt` | `kotlin` | `.kt` | +| `php` | `php` | `.php` | +| `powershell` | `powershell` | `.ps1` | +| `ps1` | `powershell` | `.ps1` | +| `ps5` | `ps5` | `.ps1` | +| `py` | `python` | `.py` | +| `python` | `python` 
| `.py` | +| `rb` | `ruby` | `.rb` | +| `rs` | `rust` | `.rs` | +| `ruby` | `ruby` | `.rb` | +| `rust` | `rust` | `.rs` | +| `scala` | `scala` | `.scala` | +| `sh` | `shell` | `.sh` | +| `shell` | `shell` | `.sh` | +| `swift` | `swift` | `.swift` | +| `text` | `text` | `.txt` | +| `ts` | `typescript` | `.ts` | +| `txt` | `text` | `.txt` | +| `typescript` | `typescript` | `.ts` | +| (empty string) | `undefined` | `.txt` | +| `none` | `undefined` | `.txt` | +| (unknown) | (unchanged) | `.txt` | **Notes:** - Language identifiers are case-insensitive diff --git a/audit-cli/commands/analyze/analyze.go b/audit-cli/commands/analyze/analyze.go index dd4f9a8..4d7d63e 100644 --- a/audit-cli/commands/analyze/analyze.go +++ b/audit-cli/commands/analyze/analyze.go @@ -3,12 +3,14 @@ // This package serves as the parent command for various analysis operations. // Currently supports: // - includes: Analyze include directive relationships in RST files +// - file-references: Find all files that reference a target file // // Future subcommands could include analyzing cross-references, broken links, or content metrics. package analyze import ( "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/includes" + filereferences "github.com/mongodb/code-example-tooling/audit-cli/commands/analyze/file-references" "github.com/spf13/cobra" ) @@ -22,12 +24,16 @@ func NewAnalyzeCommand() *cobra.Command { Short: "Analyze reStructuredText file structures", Long: `Analyze various aspects of reStructuredText files and their relationships. -Currently supports analyzing include directive relationships to understand file dependencies. 
+Currently supports: + - includes: Analyze include directive relationships (forward dependencies) + - file-references: Find all files that reference a target file (reverse dependencies) + Future subcommands may support analyzing cross-references, broken links, or content metrics.`, } // Add subcommands cmd.AddCommand(includes.NewIncludesCommand()) + cmd.AddCommand(filereferences.NewFileReferencesCommand()) return cmd } diff --git a/audit-cli/commands/analyze/file-references/analyzer.go b/audit-cli/commands/analyze/file-references/analyzer.go new file mode 100644 index 0000000..b586a0c --- /dev/null +++ b/audit-cli/commands/analyze/file-references/analyzer.go @@ -0,0 +1,435 @@ +package filereferences + +import ( + "bufio" + "fmt" + "os" + "path/filepath" + "sort" + "strings" + + "github.com/mongodb/code-example-tooling/audit-cli/internal/pathresolver" + "github.com/mongodb/code-example-tooling/audit-cli/internal/rst" +) + +// AnalyzeReferences finds all files that reference the target file. +// +// This function searches through all RST files (.rst, .txt) and YAML files (.yaml, .yml) +// in the source directory to find files that reference the target file using include, +// literalinclude, or io-code-block directives. YAML files are included because extract +// and release files contain RST directives within their content blocks. +// +// By default, only content inclusion directives are searched. Set includeToctree to true +// to also search for toctree entries (navigation links). 
+// +// Parameters: +// - targetFile: Absolute path to the file to analyze +// - includeToctree: If true, include toctree entries in the search +// - verbose: If true, show progress information +// - excludePattern: Glob pattern for paths to exclude (empty string means no exclusion) +// +// Returns: +// - *ReferenceAnalysis: The analysis results +// - error: Any error encountered during analysis +func AnalyzeReferences(targetFile string, includeToctree bool, verbose bool, excludePattern string) (*ReferenceAnalysis, error) { + // Check if target file exists + if _, err := os.Stat(targetFile); os.IsNotExist(err) { + return nil, fmt.Errorf("target file does not exist: %s\n\nPlease check:\n - The file path is correct\n - The file hasn't been moved or deleted\n - You have permission to access the file", targetFile) + } + + // Get absolute path + absTargetFile, err := filepath.Abs(targetFile) + if err != nil { + return nil, fmt.Errorf("failed to get absolute path: %w", err) + } + + // Find the source directory + sourceDir, err := pathresolver.FindSourceDirectory(absTargetFile) + if err != nil { + return nil, fmt.Errorf("failed to find source directory: %w\n\nThe source directory is detected by looking for a 'source' directory in the file's path.\nMake sure the target file is within a documentation repository with a 'source' directory.", err) + } + + // Initialize analysis result + analysis := &ReferenceAnalysis{ + TargetFile: absTargetFile, + SourceDir: sourceDir, + ReferencingFiles: []FileReference{}, + } + + // Track if we found any RST/YAML files + foundAnyFiles := false + filesProcessed := 0 + + // Show progress message if verbose + if verbose { + fmt.Fprintf(os.Stderr, "Scanning for references in %s...\n", sourceDir) + } + + // Walk through all RST and YAML files in the source directory + err = filepath.Walk(sourceDir, func(path string, info os.FileInfo, err error) error { + if err != nil { + return err + } + + // Skip directories + if info.IsDir() { + return nil + 
} + + // Only process RST files (.rst, .txt) and YAML files (.yaml, .yml) + // YAML files may contain RST directives in extract/release content blocks + ext := filepath.Ext(path) + if ext != ".rst" && ext != ".txt" && ext != ".yaml" && ext != ".yml" { + return nil + } + + // Check if path should be excluded + if excludePattern != "" { + matched, err := filepath.Match(excludePattern, path) + if err != nil { + fmt.Fprintf(os.Stderr, "Warning: invalid exclude pattern: %v\n", err) + } else if matched { + // Skip this file + return nil + } + } + + // Mark that we found at least one file + foundAnyFiles = true + filesProcessed++ + + // Show progress every 100 files if verbose + if verbose && filesProcessed%100 == 0 { + fmt.Fprintf(os.Stderr, "Processed %d files...\n", filesProcessed) + } + + // Search for references in this file + refs, err := findReferencesInFile(path, absTargetFile, sourceDir, includeToctree) + if err != nil { + // Log error but continue processing other files + fmt.Fprintf(os.Stderr, "Warning: failed to process %s: %v\n", path, err) + return nil + } + + // Add any found references + analysis.ReferencingFiles = append(analysis.ReferencingFiles, refs...) + + return nil + }) + + if err != nil { + return nil, fmt.Errorf("failed to walk source directory: %w", err) + } + + // Check if we found any RST/YAML files + if !foundAnyFiles { + return nil, fmt.Errorf("no RST or YAML files found in source directory: %s\n\nThis might not be a documentation repository.\nExpected to find files with extensions: .rst, .txt, .yaml, .yml", sourceDir) + } + + // Show completion message if verbose + if verbose { + fmt.Fprintf(os.Stderr, "Scan complete. Processed %d files.\n", filesProcessed) + } + + // Update total counts + analysis.TotalReferences = len(analysis.ReferencingFiles) + analysis.TotalFiles = countUniqueFiles(analysis.ReferencingFiles) + + return analysis, nil +} + +// findReferencesInFile searches a single file for references to the target file. 
+// +// This function scans through the file line by line looking for include, +// literalinclude, and io-code-block directives that reference the target file. +// If includeToctree is true, also searches for toctree entries. +// +// Parameters: +// - filePath: Path to the file to search +// - targetFile: Absolute path to the target file +// - sourceDir: Source directory (for resolving relative paths) +// - includeToctree: If true, include toctree entries in the search +// +// Returns: +// - []FileReference: List of references found in this file +// - error: Any error encountered during processing +func findReferencesInFile(filePath, targetFile, sourceDir string, includeToctree bool) ([]FileReference, error) { + file, err := os.Open(filePath) + if err != nil { + return nil, err + } + defer file.Close() + + var references []FileReference + scanner := bufio.NewScanner(file) + lineNum := 0 + inIOCodeBlock := false + ioCodeBlockStartLine := 0 + inToctree := false + toctreeStartLine := 0 + + for scanner.Scan() { + lineNum++ + line := scanner.Text() + trimmedLine := strings.TrimSpace(line) + + // Check for toctree start (only if includeToctree is enabled) + if includeToctree && rst.ToctreeDirectiveRegex.MatchString(trimmedLine) { + inToctree = true + toctreeStartLine = lineNum + continue + } + + // Check for io-code-block start + if rst.IOCodeBlockDirectiveRegex.MatchString(trimmedLine) { + inIOCodeBlock = true + ioCodeBlockStartLine = lineNum + continue + } + + // Check if we're exiting toctree (unindented line that's not empty and not an option) + if inToctree && len(line) > 0 && line[0] != ' ' && line[0] != '\t' { + inToctree = false + } + + // Check if we're exiting io-code-block (unindented line that's not empty) + if inIOCodeBlock && len(line) > 0 && line[0] != ' ' && line[0] != '\t' { + inIOCodeBlock = false + } + + // Check for include directive + if matches := rst.IncludeDirectiveRegex.FindStringSubmatch(trimmedLine); matches != nil { + refPath := 
strings.TrimSpace(matches[1]) + if referencesTarget(refPath, targetFile, sourceDir, filePath) { + references = append(references, FileReference{ + FilePath: filePath, + DirectiveType: "include", + ReferencePath: refPath, + LineNumber: lineNum, + }) + } + continue + } + + // Check for literalinclude directive + if matches := rst.LiteralIncludeDirectiveRegex.FindStringSubmatch(trimmedLine); matches != nil { + refPath := strings.TrimSpace(matches[1]) + if referencesTarget(refPath, targetFile, sourceDir, filePath) { + references = append(references, FileReference{ + FilePath: filePath, + DirectiveType: "literalinclude", + ReferencePath: refPath, + LineNumber: lineNum, + }) + } + continue + } + + // Check for input/output directives within io-code-block + if inIOCodeBlock { + // Check for input directive + if matches := rst.InputDirectiveRegex.FindStringSubmatch(trimmedLine); matches != nil { + refPath := strings.TrimSpace(matches[1]) + if referencesTarget(refPath, targetFile, sourceDir, filePath) { + references = append(references, FileReference{ + FilePath: filePath, + DirectiveType: "io-code-block", + ReferencePath: refPath, + LineNumber: ioCodeBlockStartLine, + }) + } + continue + } + + // Check for output directive + if matches := rst.OutputDirectiveRegex.FindStringSubmatch(trimmedLine); matches != nil { + refPath := strings.TrimSpace(matches[1]) + if referencesTarget(refPath, targetFile, sourceDir, filePath) { + references = append(references, FileReference{ + FilePath: filePath, + DirectiveType: "io-code-block", + ReferencePath: refPath, + LineNumber: ioCodeBlockStartLine, + }) + } + continue + } + } + + // Check for toctree entries (indented document names) + if inToctree { + // Skip empty lines and option lines (starting with :) + if trimmedLine == "" || strings.HasPrefix(trimmedLine, ":") { + continue + } + + // This is a document name in the toctree + // Document names can be relative or absolute (starting with /) + docName := trimmedLine + if 
referencesToctreeTarget(docName, targetFile, sourceDir, filePath) { + references = append(references, FileReference{ + FilePath: filePath, + DirectiveType: "toctree", + ReferencePath: docName, + LineNumber: toctreeStartLine, + }) + } + } + } + + if err := scanner.Err(); err != nil { + return nil, err + } + + return references, nil +} + +// referencesTarget checks if a reference path points to the target file. +// +// This function resolves the reference path and compares it to the target file. +// +// Parameters: +// - refPath: The path from the directive (e.g., "/includes/file.rst") +// - targetFile: Absolute path to the target file +// - sourceDir: Source directory (for resolving relative paths) +// - currentFile: Path to the file containing the reference +// +// Returns: +// - bool: true if the reference points to the target file +func referencesTarget(refPath, targetFile, sourceDir, currentFile string) bool { + // Resolve the reference path + var resolvedPath string + + if strings.HasPrefix(refPath, "/") { + // Absolute path (relative to source directory) + resolvedPath = filepath.Join(sourceDir, refPath) + } else { + // Relative path (relative to current file) + currentDir := filepath.Dir(currentFile) + resolvedPath = filepath.Join(currentDir, refPath) + } + + // Clean and get absolute path + resolvedPath = filepath.Clean(resolvedPath) + absResolvedPath, err := filepath.Abs(resolvedPath) + if err != nil { + return false + } + + // Compare with target file + return absResolvedPath == targetFile +} + +// referencesToctreeTarget checks if a toctree document name points to the target file. +// +// This function uses the shared rst.ResolveToctreePath to resolve the document name +// and then compares it to the target file. 
+// +// Parameters: +// - docName: The document name from the toctree (e.g., "intro" or "/includes/intro") +// - targetFile: Absolute path to the target file +// - sourceDir: Source directory (unused here; rst.ResolveToctreePath resolves from currentFile) +// - currentFile: Path to the file containing the toctree +// +// Returns: +// - bool: true if the document name points to the target file +func referencesToctreeTarget(docName, targetFile, sourceDir, currentFile string) bool { + // Use the shared toctree path resolution from rst package + resolvedPath, err := rst.ResolveToctreePath(currentFile, docName) + if err != nil { + // If we can't resolve it, it doesn't match + return false + } + + // Compare with target file + return resolvedPath == targetFile +} + +// FilterByDirectiveType filters the analysis results to only include references +// of the specified directive type. +// +// Parameters: +// - analysis: The original analysis results +// - directiveType: The directive type to filter by (include, literalinclude, io-code-block, toctree) +// +// Returns: +// - *ReferenceAnalysis: A new analysis with filtered results +func FilterByDirectiveType(analysis *ReferenceAnalysis, directiveType string) *ReferenceAnalysis { + filtered := &ReferenceAnalysis{ + TargetFile: analysis.TargetFile, + SourceDir: analysis.SourceDir, + ReferencingFiles: []FileReference{}, + ReferenceTree: analysis.ReferenceTree, + } + + // Filter references + for _, ref := range analysis.ReferencingFiles { + if ref.DirectiveType == directiveType { + filtered.ReferencingFiles = append(filtered.ReferencingFiles, ref) + } + } + + // Update counts + filtered.TotalReferences = len(filtered.ReferencingFiles) + filtered.TotalFiles = countUniqueFiles(filtered.ReferencingFiles) + + return filtered +} + +// countUniqueFiles counts the number of unique files in the reference list. 
+// +// Parameters: +// - refs: List of file references +// +// Returns: +// - int: Number of unique files +func countUniqueFiles(refs []FileReference) int { + uniqueFiles := make(map[string]bool) + for _, ref := range refs { + uniqueFiles[ref.FilePath] = true + } + return len(uniqueFiles) +} + +// GroupReferencesByFile groups references by file path and directive type. +// +// This function takes a flat list of references and groups them by file, +// counting how many times each file references the target. +// +// Parameters: +// - refs: List of file references +// +// Returns: +// - []GroupedFileReference: List of grouped references, sorted by file path +func GroupReferencesByFile(refs []FileReference) []GroupedFileReference { + // Group by file path and directive type + type groupKey struct { + filePath string + directiveType string + } + groups := make(map[groupKey][]FileReference) + + for _, ref := range refs { + key := groupKey{ref.FilePath, ref.DirectiveType} + groups[key] = append(groups[key], ref) + } + + // Convert to slice + var grouped []GroupedFileReference + for key, refs := range groups { + grouped = append(grouped, GroupedFileReference{ + FilePath: key.filePath, + DirectiveType: key.directiveType, + References: refs, + Count: len(refs), + }) + } + + // Sort by file path for consistent output + sort.Slice(grouped, func(i, j int) bool { + return grouped[i].FilePath < grouped[j].FilePath + }) + + return grouped +} + diff --git a/audit-cli/commands/analyze/file-references/file_references.go b/audit-cli/commands/analyze/file-references/file_references.go new file mode 100644 index 0000000..f00a37e --- /dev/null +++ b/audit-cli/commands/analyze/file-references/file_references.go @@ -0,0 +1,210 @@ +// Package filereferences provides functionality for analyzing which files reference a target file. 
+// +// This package implements the "analyze file-references" subcommand, which finds all files +// that reference a given file through RST directives (include, literalinclude, io-code-block, toctree). +// +// The command searches both RST files (.rst, .txt) and YAML files (.yaml, .yml) since +// extract and release YAML files contain RST directives within their content blocks. +// +// The command performs reverse dependency analysis, showing which files depend on the +// target file. This is useful for: +// - Understanding the impact of changes to a file +// - Finding all usages of an include file +// - Tracking code example references +package filereferences + +import ( + "fmt" + + "github.com/spf13/cobra" +) + +// NewFileReferencesCommand creates the file-references subcommand. +// +// This command analyzes which files reference a given target file through +// RST directives (include, literalinclude, io-code-block, toctree). +// +// Usage: +// analyze file-references /path/to/file.rst +// analyze file-references /path/to/code-example.js +// +// Flags: +// - --format: Output format (text or json) +// - -v, --verbose: Show detailed information including line numbers +// - -c, --count-only: Only show the count of references +// - --paths-only: Only show the file paths +// - --summary: Only show summary statistics (total files and references by type) +// - -t, --directive-type: Filter by directive type (include, literalinclude, io-code-block, toctree) +// - --include-toctree: Include toctree entries (navigation links) in addition to content inclusion directives +// - --exclude: Exclude paths matching this glob pattern (e.g., '*/archive/*') +func NewFileReferencesCommand() *cobra.Command { + var ( + format string + verbose bool + countOnly bool + pathsOnly bool + summaryOnly bool + directiveType string + includeToctree bool + excludePattern string + ) + + cmd := &cobra.Command{ + Use: "file-references [filepath]", + Short: "Find all files that reference a target 
file", + Long: `Find all files that reference a target file through RST directives. + +This command performs reverse dependency analysis, showing which files reference +the target file through content inclusion directives (include, literalinclude, +io-code-block). Use --include-toctree to also search for toctree entries, which +are navigation links rather than content transclusion. + +Supported directive types: + - .. include:: RST content includes (transcluded) + - .. literalinclude:: Code file references (transcluded) + - .. io-code-block:: Input/output examples with file arguments (transcluded) + - .. toctree:: Table of contents entries (navigation links, requires --include-toctree) + +The command searches all RST files (.rst, .txt) and YAML files (.yaml, .yml) in +the source directory tree. YAML files are included because extract and release +files contain RST directives within their content blocks. + +This is useful for: + - Understanding the impact of changes to a file + - Finding all usages of an include file + - Tracking code example references + +Examples: + # Find what references an include file + analyze file-references /path/to/includes/fact.rst + + # Find what references a code example + analyze file-references /path/to/code-examples/example.js + + # Include toctree references (navigation links) + analyze file-references /path/to/file.rst --include-toctree + + # Get JSON output + analyze file-references /path/to/file.rst --format json + + # Show detailed information with line numbers + analyze file-references /path/to/file.rst --verbose + + # Just show the count + analyze file-references /path/to/file.rst --count-only + + # Just show the file paths + analyze file-references /path/to/file.rst --paths-only + + # Show summary statistics only + analyze file-references /path/to/file.rst --summary + + # Exclude certain paths from search + analyze file-references /path/to/file.rst --exclude "*/archive/*" + + # Filter by directive type + analyze 
file-references /path/to/file.rst --directive-type include`, + Args: cobra.ExactArgs(1), + RunE: func(cmd *cobra.Command, args []string) error { + return runReferences(args[0], format, verbose, countOnly, pathsOnly, summaryOnly, directiveType, includeToctree, excludePattern) + }, + } + + cmd.Flags().StringVar(&format, "format", "text", "Output format (text or json)") + cmd.Flags().BoolVarP(&verbose, "verbose", "v", false, "Show detailed information including line numbers") + cmd.Flags().BoolVarP(&countOnly, "count-only", "c", false, "Only show the count of references") + cmd.Flags().BoolVar(&pathsOnly, "paths-only", false, "Only show the file paths (one per line)") + cmd.Flags().BoolVar(&summaryOnly, "summary", false, "Only show summary statistics (total files and references by type)") + cmd.Flags().StringVarP(&directiveType, "directive-type", "t", "", "Filter by directive type (include, literalinclude, io-code-block, toctree)") + cmd.Flags().BoolVar(&includeToctree, "include-toctree", false, "Include toctree entries (navigation links) in addition to content inclusion directives") + cmd.Flags().StringVar(&excludePattern, "exclude", "", "Exclude paths matching this glob pattern (e.g., '*/archive/*' or '*/deprecated/*')") + + return cmd +} + +// runReferences executes the references analysis. +// +// This function performs the analysis and prints the results in the specified format. 
+// +// Parameters: +// - targetFile: Path to the file to analyze +// - format: Output format (text or json) +// - verbose: If true, show detailed information +// - countOnly: If true, only show the count +// - pathsOnly: If true, only show the file paths +// - summaryOnly: If true, only show summary statistics +// - directiveType: Filter by directive type (empty string means all types) +// - includeToctree: If true, include toctree entries in the search +// - excludePattern: Glob pattern for paths to exclude (empty string means no exclusion) +// +// Returns: +// - error: Any error encountered during analysis +func runReferences(targetFile, format string, verbose, countOnly, pathsOnly, summaryOnly bool, directiveType string, includeToctree bool, excludePattern string) error { + // Validate directive type if specified + if directiveType != "" { + validTypes := map[string]bool{ + "include": true, + "literalinclude": true, + "io-code-block": true, + "toctree": true, + } + if !validTypes[directiveType] { + return fmt.Errorf("invalid directive type: %s (must be 'include', 'literalinclude', 'io-code-block', or 'toctree')", directiveType) + } + } + + // Validate format + outputFormat := OutputFormat(format) + if outputFormat != FormatText && outputFormat != FormatJSON { + return fmt.Errorf("invalid format: %s (must be 'text' or 'json')", format) + } + + // Validate flag combinations: at most one of the output-mode flags may be set + exclusiveFlags := 0 + if countOnly { + exclusiveFlags++ + } + if pathsOnly { + exclusiveFlags++ + } + if summaryOnly { + exclusiveFlags++ + } + if exclusiveFlags > 1 { + return fmt.Errorf("--count-only, --paths-only, and --summary are mutually exclusive; specify at most one") + } + if (countOnly || pathsOnly || summaryOnly) && outputFormat == FormatJSON { + return fmt.Errorf("--count-only, --paths-only, and --summary are not compatible with --format json") + } + + // Perform analysis + analysis, err := AnalyzeReferences(targetFile, includeToctree, verbose, excludePattern) + if err != nil { + return 
fmt.Errorf("failed to analyze references: %w", err) + } + + // Filter by directive type if specified + if directiveType != "" { + analysis = FilterByDirectiveType(analysis, directiveType) + } + + // Handle count-only output + if countOnly { + fmt.Println(analysis.TotalReferences) + return nil + } + + // Handle paths-only output + if pathsOnly { + return PrintPathsOnly(analysis) + } + + // Handle summary-only output + if summaryOnly { + return PrintSummary(analysis) + } + + // Print full results + return PrintAnalysis(analysis, outputFormat, verbose) +} + diff --git a/audit-cli/commands/analyze/file-references/file_references_test.go b/audit-cli/commands/analyze/file-references/file_references_test.go new file mode 100644 index 0000000..d23e576 --- /dev/null +++ b/audit-cli/commands/analyze/file-references/file_references_test.go @@ -0,0 +1,305 @@ +package filereferences + +import ( + "path/filepath" + "testing" +) + +// TestAnalyzeReferences tests the AnalyzeReferences function with various scenarios. 
+func TestAnalyzeReferences(t *testing.T) { + // Get the testdata directory + testDataDir := "../../../testdata/input-files/source" + + tests := []struct { + name string + targetFile string + expectedReferences int + expectedDirectiveType string + }{ + { + name: "Include file with multiple references", + targetFile: "includes/intro.rst", + expectedReferences: 5, // 4 RST files + 1 YAML file (no toctree by default) + expectedDirectiveType: "include", + }, + { + name: "Code example with literalinclude", + targetFile: "code-examples/example.py", + expectedReferences: 2, // 1 RST file + 1 YAML file + expectedDirectiveType: "literalinclude", + }, + { + name: "Code example with multiple directive types", + targetFile: "code-examples/example.js", + expectedReferences: 2, // literalinclude + io-code-block + expectedDirectiveType: "", // mixed types + }, + { + name: "File with no references", + targetFile: "code-block-test.rst", + expectedReferences: 0, + expectedDirectiveType: "", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + // Build absolute path to target file + targetPath := filepath.Join(testDataDir, tt.targetFile) + absTargetPath, err := filepath.Abs(targetPath) + if err != nil { + t.Fatalf("failed to get absolute path: %v", err) + } + + // Run analysis (without toctree by default, not verbose, no exclude pattern) + analysis, err := AnalyzeReferences(absTargetPath, false, false, "") + if err != nil { + t.Fatalf("AnalyzeReferences failed: %v", err) + } + + // Check total references + if analysis.TotalReferences != tt.expectedReferences { + t.Errorf("expected %d references, got %d", tt.expectedReferences, analysis.TotalReferences) + } + + // Check that we got the expected number of referencing files + if len(analysis.ReferencingFiles) != tt.expectedReferences { + t.Errorf("expected %d referencing files, got %d", tt.expectedReferences, len(analysis.ReferencingFiles)) + } + + // If we expect a specific directive type, check it + if 
tt.expectedDirectiveType != "" && tt.expectedReferences > 0 { + foundExpectedType := false + for _, ref := range analysis.ReferencingFiles { + if ref.DirectiveType == tt.expectedDirectiveType { + foundExpectedType = true + break + } + } + if !foundExpectedType { + t.Errorf("expected to find directive type %s, but didn't", tt.expectedDirectiveType) + } + } + + // Verify source directory was found + if analysis.SourceDir == "" { + t.Error("source directory should not be empty") + } + + // Verify target file matches + if analysis.TargetFile != absTargetPath { + t.Errorf("expected target file %s, got %s", absTargetPath, analysis.TargetFile) + } + }) + } +} + +// TestFindReferencesInFile tests the findReferencesInFile function. +func TestFindReferencesInFile(t *testing.T) { + testDataDir := "../../../testdata/input-files/source" + sourceDir := testDataDir + + tests := []struct { + name string + searchFile string + targetFile string + expectedReferences int + expectedDirective string + includeToctree bool + }{ + { + name: "Include directive", + searchFile: "include-test.rst", + targetFile: "includes/intro.rst", + expectedReferences: 1, + expectedDirective: "include", + includeToctree: false, + }, + { + name: "Literalinclude directive", + searchFile: "literalinclude-test.rst", + targetFile: "code-examples/example.py", + expectedReferences: 1, + expectedDirective: "literalinclude", + includeToctree: false, + }, + { + name: "IO code block directive", + searchFile: "io-code-block-test.rst", + targetFile: "code-examples/example.js", + expectedReferences: 1, + expectedDirective: "io-code-block", + includeToctree: false, + }, + { + name: "Duplicate includes", + searchFile: "duplicate-include-test.rst", + targetFile: "includes/intro.rst", + expectedReferences: 2, // Same file included twice + expectedDirective: "include", + includeToctree: false, + }, + { + name: "Toctree directive", + searchFile: "index.rst", + targetFile: "include-test.rst", + expectedReferences: 1, + 
expectedDirective: "toctree", + includeToctree: true, // Must enable toctree flag + }, + { + name: "No references", + searchFile: "code-block-test.rst", + targetFile: "includes/intro.rst", + expectedReferences: 0, + expectedDirective: "", + includeToctree: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + searchPath := filepath.Join(testDataDir, tt.searchFile) + targetPath := filepath.Join(testDataDir, tt.targetFile) + + // Get absolute paths + absSearchPath, err := filepath.Abs(searchPath) + if err != nil { + t.Fatalf("failed to get absolute search path: %v", err) + } + absTargetPath, err := filepath.Abs(targetPath) + if err != nil { + t.Fatalf("failed to get absolute target path: %v", err) + } + absSourceDir, err := filepath.Abs(sourceDir) + if err != nil { + t.Fatalf("failed to get absolute source dir: %v", err) + } + + refs, err := findReferencesInFile(absSearchPath, absTargetPath, absSourceDir, tt.includeToctree) + if err != nil { + t.Fatalf("findReferencesInFile failed: %v", err) + } + + if len(refs) != tt.expectedReferences { + t.Errorf("expected %d references, got %d", tt.expectedReferences, len(refs)) + } + + // Check directive type if we expect references + if tt.expectedReferences > 0 && tt.expectedDirective != "" { + for _, ref := range refs { + if ref.DirectiveType != tt.expectedDirective { + t.Errorf("expected directive type %s, got %s", tt.expectedDirective, ref.DirectiveType) + } + } + } + + // Verify all references have required fields + for _, ref := range refs { + if ref.FilePath == "" { + t.Error("reference should have a file path") + } + if ref.DirectiveType == "" { + t.Error("reference should have a directive type") + } + if ref.ReferencePath == "" { + t.Error("reference should have a reference path") + } + if ref.LineNumber == 0 { + t.Error("reference should have a line number") + } + } + }) + } +} + +// TestReferencesTarget tests the referencesTarget function. 
+func TestReferencesTarget(t *testing.T) { + testDataDir := "../../../testdata/input-files/source" + + // Get absolute paths + absTestDataDir, err := filepath.Abs(testDataDir) + if err != nil { + t.Fatalf("failed to get absolute test data dir: %v", err) + } + + tests := []struct { + name string + refPath string + targetFile string + currentFile string + expected bool + }{ + { + name: "Absolute path match", + refPath: "/includes/intro.rst", + targetFile: filepath.Join(absTestDataDir, "includes/intro.rst"), + currentFile: filepath.Join(absTestDataDir, "include-test.rst"), + expected: true, + }, + { + name: "Absolute path no match", + refPath: "/includes/intro.rst", + targetFile: filepath.Join(absTestDataDir, "includes/examples.rst"), + currentFile: filepath.Join(absTestDataDir, "include-test.rst"), + expected: false, + }, + { + name: "Relative path match", + refPath: "intro.rst", + targetFile: filepath.Join(absTestDataDir, "includes/intro.rst"), + currentFile: filepath.Join(absTestDataDir, "includes/nested-include.rst"), + expected: true, + }, + { + name: "Relative path no match", + refPath: "intro.rst", + targetFile: filepath.Join(absTestDataDir, "includes/examples.rst"), + currentFile: filepath.Join(absTestDataDir, "includes/nested-include.rst"), + expected: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + result := referencesTarget(tt.refPath, tt.targetFile, absTestDataDir, tt.currentFile) + if result != tt.expected { + t.Errorf("expected %v, got %v", tt.expected, result) + } + }) + } +} + +// TestGroupByDirectiveType tests the groupByDirectiveType function. 
+func TestGroupByDirectiveType(t *testing.T) { + refs := []FileReference{ + {DirectiveType: "include", FilePath: "file1.rst"}, + {DirectiveType: "include", FilePath: "file2.rst"}, + {DirectiveType: "literalinclude", FilePath: "file3.rst"}, + {DirectiveType: "io-code-block", FilePath: "file4.rst"}, + {DirectiveType: "include", FilePath: "file5.rst"}, + } + + groups := groupByDirectiveType(refs) + + // Check that we have 3 groups + if len(groups) != 3 { + t.Errorf("expected 3 groups, got %d", len(groups)) + } + + // Check include group + if len(groups["include"]) != 3 { + t.Errorf("expected 3 include references, got %d", len(groups["include"])) + } + + // Check literalinclude group + if len(groups["literalinclude"]) != 1 { + t.Errorf("expected 1 literalinclude reference, got %d", len(groups["literalinclude"])) + } + + // Check io-code-block group + if len(groups["io-code-block"]) != 1 { + t.Errorf("expected 1 io-code-block reference, got %d", len(groups["io-code-block"])) + } +} + diff --git a/audit-cli/commands/analyze/file-references/output.go b/audit-cli/commands/analyze/file-references/output.go new file mode 100644 index 0000000..cbb2bb5 --- /dev/null +++ b/audit-cli/commands/analyze/file-references/output.go @@ -0,0 +1,256 @@ +package filereferences + +import ( + "encoding/json" + "fmt" + "os" + "path/filepath" + "sort" + "strings" +) + +// OutputFormat represents the output format for the analysis results. +type OutputFormat string + +const ( + // FormatText is the default human-readable text format + FormatText OutputFormat = "text" + // FormatJSON is the JSON format + FormatJSON OutputFormat = "json" +) + +// PrintAnalysis prints the analysis results in the specified format. 
+// +// Parameters: +// - analysis: The analysis results to print +// - format: The output format (text or json) +// - verbose: If true, show additional details +func PrintAnalysis(analysis *ReferenceAnalysis, format OutputFormat, verbose bool) error { + switch format { + case FormatJSON: + return printJSON(analysis) + case FormatText: + printText(analysis, verbose) + return nil + default: + return fmt.Errorf("unknown output format: %s", format) + } +} + +// printText prints the analysis results in human-readable text format. +func printText(analysis *ReferenceAnalysis, verbose bool) { + fmt.Println("============================================================") + fmt.Println("REFERENCE ANALYSIS") + fmt.Println("============================================================") + fmt.Printf("Target File: %s\n", analysis.TargetFile) + fmt.Printf("Total Files: %d\n", analysis.TotalFiles) + fmt.Printf("Total References: %d\n", analysis.TotalReferences) + fmt.Println("============================================================") + fmt.Println() + + if analysis.TotalReferences == 0 { + fmt.Println("No files reference this file.") + fmt.Println() + fmt.Println("This could mean:") + fmt.Println(" - The file is not included in any documentation pages") + fmt.Println(" - The file might be orphaned (not used)") + fmt.Println(" - The file is referenced using a different path") + fmt.Println() + fmt.Println("Note: By default, only content inclusion directives are searched.") + fmt.Println("Use --include-toctree to also search for toctree navigation links.") + fmt.Println() + return + } + + // Group references by directive type + byDirectiveType := groupByDirectiveType(analysis.ReferencingFiles) + + // Print breakdown by directive type with file and reference counts + directiveTypes := []string{"include", "literalinclude", "io-code-block", "toctree"} + for _, directiveType := range directiveTypes { + if refs, ok := byDirectiveType[directiveType]; ok { + uniqueFiles := 
countUniqueFiles(refs) + totalRefs := len(refs) + if uniqueFiles == totalRefs { + // No duplicates - just show count + fmt.Printf("%-20s: %d\n", directiveType, uniqueFiles) + } else { + // Has duplicates - show both counts + if uniqueFiles == 1 { + fmt.Printf("%-20s: %d file, %d references\n", directiveType, uniqueFiles, totalRefs) + } else { + fmt.Printf("%-20s: %d files, %d references\n", directiveType, uniqueFiles, totalRefs) + } + } + } + } + fmt.Println() + + // Group references by file + grouped := GroupReferencesByFile(analysis.ReferencingFiles) + + // Print detailed list of referencing files + for i, group := range grouped { + // Get relative path from source directory for cleaner output + relPath, err := filepath.Rel(analysis.SourceDir, group.FilePath) + if err != nil { + relPath = group.FilePath + } + + // Print file path with directive type label + if group.Count > 1 { + // Multiple references from this file + fmt.Printf("%3d. [%s] %s (%d references)\n", i+1, group.DirectiveType, relPath, group.Count) + } else { + // Single reference + fmt.Printf("%3d. [%s] %s\n", i+1, group.DirectiveType, relPath) + } + + // Print line numbers in verbose mode + if verbose { + for _, ref := range group.References { + fmt.Printf(" Line %d: %s\n", ref.LineNumber, ref.ReferencePath) + } + } + } + + fmt.Println() +} + +// printJSON prints the analysis results in JSON format. 
+func printJSON(analysis *ReferenceAnalysis) error { + // Create a JSON-friendly structure + output := struct { + TargetFile string `json:"target_file"` + SourceDir string `json:"source_dir"` + TotalFiles int `json:"total_files"` + TotalReferences int `json:"total_references"` + ReferencingFiles []FileReference `json:"referencing_files"` + }{ + TargetFile: analysis.TargetFile, + SourceDir: analysis.SourceDir, + TotalFiles: analysis.TotalFiles, + TotalReferences: analysis.TotalReferences, + ReferencingFiles: analysis.ReferencingFiles, + } + + encoder := json.NewEncoder(os.Stdout) + encoder.SetIndent("", " ") + return encoder.Encode(output) +} + +// groupByDirectiveType groups references by their directive type. +func groupByDirectiveType(refs []FileReference) map[string][]FileReference { + groups := make(map[string][]FileReference) + + for _, ref := range refs { + groups[ref.DirectiveType] = append(groups[ref.DirectiveType], ref) + } + + return groups +} + +// FormatReferencePath formats a reference path for display. +// +// This function shortens paths for better readability while maintaining +// enough context to identify the file. +func FormatReferencePath(path, sourceDir string) string { + // Try to get relative path from source directory + relPath, err := filepath.Rel(sourceDir, path) + if err != nil { + return path + } + + // If the relative path is shorter, use it + if len(relPath) < len(path) { + return relPath + } + + return path +} + +// GetDirectiveTypeLabel returns a human-readable label for a directive type. +func GetDirectiveTypeLabel(directiveType string) string { + labels := map[string]string{ + "include": "Include", + "literalinclude": "Literal Include", + "io-code-block": "I/O Code Block", + } + + if label, ok := labels[directiveType]; ok { + return label + } + + return strings.Title(directiveType) +} + +// PrintPathsOnly prints only the file paths, one per line. +// +// This is useful for piping to other commands or for simple scripting. 
+// +// Parameters: +// - analysis: The analysis results +// +// Returns: +// - error: Any error encountered during printing +func PrintPathsOnly(analysis *ReferenceAnalysis) error { + // Get unique file paths (in case there are duplicates) + seen := make(map[string]bool) + var paths []string + + for _, ref := range analysis.ReferencingFiles { + // Get relative path from source directory for cleaner output + relPath, err := filepath.Rel(analysis.SourceDir, ref.FilePath) + if err != nil { + relPath = ref.FilePath + } + + if !seen[relPath] { + seen[relPath] = true + paths = append(paths, relPath) + } + } + + // Sort for consistent output + sort.Strings(paths) + + // Print each path + for _, path := range paths { + fmt.Println(path) + } + + return nil +} + +// PrintSummary prints only summary statistics without the file list. +// +// This is useful for getting a quick overview of reference counts. +// +// Parameters: +// - analysis: The analysis results +// +// Returns: +// - error: Any error encountered during printing +func PrintSummary(analysis *ReferenceAnalysis) error { + fmt.Printf("Total Files: %d\n", analysis.TotalFiles) + fmt.Printf("Total References: %d\n", analysis.TotalReferences) + + if analysis.TotalReferences > 0 { + // Group by directive type + byDirectiveType := groupByDirectiveType(analysis.ReferencingFiles) + + // Print breakdown by type + fmt.Println("\nBy Type:") + directiveTypes := []string{"include", "literalinclude", "io-code-block", "toctree"} + for _, directiveType := range directiveTypes { + if refs, ok := byDirectiveType[directiveType]; ok { + uniqueFiles := countUniqueFiles(refs) + totalRefs := len(refs) + fmt.Printf(" %-20s: %d files, %d references\n", directiveType, uniqueFiles, totalRefs) + } + } + } + + return nil +} + diff --git a/audit-cli/commands/analyze/file-references/types.go b/audit-cli/commands/analyze/file-references/types.go new file mode 100644 index 0000000..22f705a --- /dev/null +++ 
b/audit-cli/commands/analyze/file-references/types.go @@ -0,0 +1,82 @@ +package filereferences + +// ReferenceAnalysis contains the results of analyzing which files reference a target file. +// +// This structure holds both a flat list of references and a hierarchical +// tree structure showing the reference relationships. +type ReferenceAnalysis struct { + // TargetFile is the absolute path to the file being analyzed + TargetFile string + + // ReferencingFiles is a flat list of all references to the target, one entry per directive occurrence + ReferencingFiles []FileReference + + // ReferenceTree is a hierarchical tree structure of references + // (for future use - showing nested references) + ReferenceTree *ReferenceNode + + // TotalReferences is the total number of directive occurrences + TotalReferences int + + // TotalFiles is the total number of unique files that reference the target + TotalFiles int + + // SourceDir is the source directory that was searched + SourceDir string +} + +// FileReference represents a single reference to the target file (one directive occurrence). +// +// This structure captures details about how and where the reference occurs. +type FileReference struct { + // FilePath is the absolute path to the file that references the target + FilePath string `json:"file_path"` + + // DirectiveType is the type of directive used to reference the file + // Possible values: "include", "literalinclude", "io-code-block", "toctree" + DirectiveType string `json:"directive_type"` + + // ReferencePath is the path used in the directive (as written in the file) + ReferencePath string `json:"reference_path"` + + // LineNumber is the line number where the reference occurs + LineNumber int `json:"line_number"` +} + +// ReferenceNode represents a node in the reference tree. +// +// This structure is used to build a hierarchical view of references, +// showing which files reference the target and which files reference those files.
+type ReferenceNode struct { + // FilePath is the absolute path to this file + FilePath string + + // DirectiveType is the type of directive used to reference the file + DirectiveType string + + // ReferencePath is the path used in the directive + ReferencePath string + + // Children are files that include this file + // (for building nested reference trees) + Children []*ReferenceNode +} + +// GroupedFileReference represents a file with all its references to the target. +// +// This structure groups multiple references from the same file together, +// showing how many times a file references the target and where. +type GroupedFileReference struct { + // FilePath is the absolute path to the file + FilePath string + + // DirectiveType is the type of directive used + DirectiveType string + + // References is the list of all references from this file + References []FileReference + + // Count is the number of references from this file + Count int +} + diff --git a/audit-cli/commands/analyze/includes/analyzer.go b/audit-cli/commands/analyze/includes/analyzer.go index 52c1ff1..2d02dad 100644 --- a/audit-cli/commands/analyze/includes/analyzer.go +++ b/audit-cli/commands/analyze/includes/analyzer.go @@ -103,7 +103,7 @@ func buildIncludeTree(filePath string, visited map[string]bool, verbose bool, de includeFiles, err := rst.FindIncludeDirectives(absPath) if err != nil { // Not a fatal error - file might not have includes - return node, nil + includeFiles = []string{} } if verbose && len(includeFiles) > 0 { @@ -115,7 +115,7 @@ func buildIncludeTree(filePath string, visited map[string]bool, verbose bool, de for _, includeFile := range includeFiles { childNode, err := buildIncludeTree(includeFile, visited, verbose, depth+1) if err != nil { - fmt.Fprintf(os.Stderr, "Warning: failed to process include %s: %v\n", includeFile, err) + fmt.Fprintf(os.Stderr, "Warning: failed to process file %s: %v\n", includeFile, err) continue } node.Children = append(node.Children, childNode) diff 
--git a/audit-cli/commands/analyze/includes/includes.go b/audit-cli/commands/analyze/includes/includes.go index bb0d5f6..a5f3a9b 100644 --- a/audit-cli/commands/analyze/includes/includes.go +++ b/audit-cli/commands/analyze/includes/includes.go @@ -1,11 +1,12 @@ -// Package includes provides functionality for analyzing include directive relationships. +// Package includes provides functionality for analyzing include relationships. // // This package implements the "analyze includes" subcommand, which analyzes RST files // to understand their include directive relationships. It can display results as: // - A hierarchical tree structure showing include relationships // - A flat list of all files referenced through includes // -// This helps writers understand the impact of changes to files that are widely included. +// This helps writers understand the impact of changes to files that are widely included +// across the documentation. package includes import ( @@ -32,7 +33,7 @@ func NewIncludesCommand() *cobra.Command { cmd := &cobra.Command{ Use: "includes [filepath]", - Short: "Analyze include directive relationships in RST files", + Short: "Analyze include relationships in RST files", Long: `Analyze include directive relationships to understand file dependencies. This command recursively follows .. include:: directives and shows all files diff --git a/audit-cli/commands/compare/file-contents/comparer.go b/audit-cli/commands/compare/file-contents/comparer.go index deabaa0..4f23eb4 100644 --- a/audit-cli/commands/compare/file-contents/comparer.go +++ b/audit-cli/commands/compare/file-contents/comparer.go @@ -4,6 +4,8 @@ import ( "fmt" "os" "path/filepath" + + "github.com/mongodb/code-example-tooling/audit-cli/internal/pathresolver" ) // CompareFiles performs a direct comparison between two files. 
@@ -160,7 +162,7 @@ func CompareVersions(referenceFile, productDir string, versions []string, genera // // Returns: // - FileComparison: The comparison result for this file -func compareFile(referencePath, referenceContent string, versionPath VersionPath, generateDiff bool, verbose bool) FileComparison { +func compareFile(referencePath, referenceContent string, versionPath pathresolver.VersionPath, generateDiff bool, verbose bool) FileComparison { comparison := FileComparison{ Version: versionPath.Version, FilePath: versionPath.FilePath, diff --git a/audit-cli/commands/compare/file-contents/version_resolver.go b/audit-cli/commands/compare/file-contents/version_resolver.go index 4f42d55..52f43a1 100644 --- a/audit-cli/commands/compare/file-contents/version_resolver.go +++ b/audit-cli/commands/compare/file-contents/version_resolver.go @@ -1,17 +1,9 @@ package file_contents import ( - "fmt" - "path/filepath" - "strings" + "github.com/mongodb/code-example-tooling/audit-cli/internal/pathresolver" ) -// VersionPath represents a resolved file path for a specific version. -type VersionPath struct { - Version string - FilePath string -} - // ResolveVersionPaths resolves file paths for all specified versions. 
// // Given a reference file path and a list of versions, this function constructs @@ -33,65 +25,10 @@ type VersionPath struct { // - versions: List of version identifiers // // Returns: -// - []VersionPath: List of resolved version paths +// - []pathresolver.VersionPath: List of resolved version paths // - error: Any error encountered during resolution -func ResolveVersionPaths(referenceFile string, productDir string, versions []string) ([]VersionPath, error) { - // Clean the paths - referenceFile = filepath.Clean(referenceFile) - productDir = filepath.Clean(productDir) - - // Ensure productDir ends with a separator for proper prefix matching - if !strings.HasSuffix(productDir, string(filepath.Separator)) { - productDir += string(filepath.Separator) - } - - // Check if referenceFile is under productDir - if !strings.HasPrefix(referenceFile, productDir) { - return nil, fmt.Errorf("reference file %s is not under product directory %s", referenceFile, productDir) - } - - // Extract the relative path from productDir - relativePath := strings.TrimPrefix(referenceFile, productDir) - - // Find the version segment and the path after it - // Expected format: {version}/source/{rest-of-path} - parts := strings.Split(relativePath, string(filepath.Separator)) - if len(parts) < 2 { - return nil, fmt.Errorf("invalid file path structure: expected {version}/source/... 
format, got %s", relativePath) - } - - // Find the "source" directory - sourceIndex := -1 - for i, part := range parts { - if part == "source" { - sourceIndex = i - break - } - } - - if sourceIndex == -1 { - return nil, fmt.Errorf("could not find 'source' directory in path: %s", relativePath) - } - - if sourceIndex == 0 { - return nil, fmt.Errorf("invalid path structure: 'source' cannot be the first segment in %s", relativePath) - } - - // The version is the segment before "source" - // Everything from "source" onwards is the path we want to preserve - pathFromSource := strings.Join(parts[sourceIndex:], string(filepath.Separator)) - - // Build version paths - var versionPaths []VersionPath - for _, version := range versions { - versionPath := filepath.Join(productDir, version, pathFromSource) - versionPaths = append(versionPaths, VersionPath{ - Version: version, - FilePath: versionPath, - }) - } - - return versionPaths, nil +func ResolveVersionPaths(referenceFile string, productDir string, versions []string) ([]pathresolver.VersionPath, error) { + return pathresolver.ResolveVersionPaths(referenceFile, productDir, versions) } // ExtractVersionFromPath extracts the version identifier from a file path. 
@@ -112,49 +49,6 @@ func ResolveVersionPaths(referenceFile string, productDir string, versions []str // - string: The version identifier // - error: Any error encountered during extraction func ExtractVersionFromPath(filePath string, productDir string) (string, error) { - // Clean the paths - filePath = filepath.Clean(filePath) - productDir = filepath.Clean(productDir) - - // Ensure productDir ends with a separator for proper prefix matching - if !strings.HasSuffix(productDir, string(filepath.Separator)) { - productDir += string(filepath.Separator) - } - - // Check if filePath is under productDir - if !strings.HasPrefix(filePath, productDir) { - return "", fmt.Errorf("file path %s is not under product directory %s", filePath, productDir) - } - - // Extract the relative path from productDir - relativePath := strings.TrimPrefix(filePath, productDir) - - // Split into parts - parts := strings.Split(relativePath, string(filepath.Separator)) - if len(parts) < 2 { - return "", fmt.Errorf("invalid file path structure: expected {version}/source/... 
format, got %s", relativePath) - } - - // Find the "source" directory - sourceIndex := -1 - for i, part := range parts { - if part == "source" { - sourceIndex = i - break - } - } - - if sourceIndex == -1 { - return "", fmt.Errorf("could not find 'source' directory in path: %s", relativePath) - } - - if sourceIndex == 0 { - return "", fmt.Errorf("invalid path structure: 'source' cannot be the first segment in %s", relativePath) - } - - // The version is the segment before "source" - version := parts[sourceIndex-1] - - return version, nil + return pathresolver.ExtractVersionFromPath(filePath, productDir) } diff --git a/audit-cli/commands/extract/code-examples/code_examples_test.go b/audit-cli/commands/extract/code-examples/code_examples_test.go index f1432de..4187c45 100644 --- a/audit-cli/commands/extract/code-examples/code_examples_test.go +++ b/audit-cli/commands/extract/code-examples/code_examples_test.go @@ -582,8 +582,8 @@ func TestNoFlagsOnDirectory(t *testing.T) { // Should NOT include files in includes/ subdirectory // Expected: code-block-test.rst, duplicate-include-test.rst, include-test.rst, // io-code-block-test.rst, literalinclude-test.rst, nested-code-block-test.rst, - // nested-include-test.rst (7 files) - expectedFiles := 7 + // nested-include-test.rst, index.rst (8 files) + expectedFiles := 8 if report.FilesTraversed != expectedFiles { t.Errorf("Expected %d files traversed (top-level only), got %d", expectedFiles, report.FilesTraversed) diff --git a/audit-cli/internal/pathresolver/pathresolver.go b/audit-cli/internal/pathresolver/pathresolver.go new file mode 100644 index 0000000..7294b4a --- /dev/null +++ b/audit-cli/internal/pathresolver/pathresolver.go @@ -0,0 +1,119 @@ +package pathresolver + +import ( + "fmt" + "path/filepath" +) + +// DetectProjectInfo analyzes a file path and determines the project structure. 
+// +// This function detects whether the file is part of a versioned or non-versioned +// project and extracts relevant information about the project structure. +// +// Versioned project structure: +// {product}/{version}/source/... +// Example: /path/to/manual/v8.0/source/includes/file.rst +// +// Non-versioned project structure: +// {product}/source/... +// Example: /path/to/atlas/source/includes/file.rst +// +// Parameters: +// - filePath: Path to a file within the documentation tree +// +// Returns: +// - *ProjectInfo: Information about the project structure +// - error: Any error encountered during detection +func DetectProjectInfo(filePath string) (*ProjectInfo, error) { + // Get absolute path + absPath, err := filepath.Abs(filePath) + if err != nil { + return nil, fmt.Errorf("failed to get absolute path: %w", err) + } + + // Find the source directory + sourceDir, err := FindSourceDirectory(absPath) + if err != nil { + return nil, err + } + + // Get the parent directory of source (could be version or product) + parent := filepath.Dir(sourceDir) + parentName := filepath.Base(parent) + + // Check if this is a versioned project + isVersioned, err := IsVersionedProject(sourceDir) + if err != nil { + return nil, err + } + + var productDir string + var version string + + if isVersioned { + // Versioned project: parent is the version directory + version = parentName + productDir = filepath.Dir(parent) + } else { + // Non-versioned project: parent is the product directory + version = "" + productDir = parent + } + + return &ProjectInfo{ + SourceDir: sourceDir, + ProductDir: productDir, + Version: version, + IsVersioned: isVersioned, + }, nil +} + +// ResolveRelativeToSource resolves a path relative to the source directory. +// +// This function takes a relative path (like "/includes/file.rst") and resolves +// it to an absolute path based on the source directory. 
+// +// Parameters: +// - sourceDir: The absolute path to the source directory +// - relativePath: The relative path to resolve (can start with / or not) +// +// Returns: +// - string: The absolute path +// - error: Any error encountered during resolution +func ResolveRelativeToSource(sourceDir, relativePath string) (string, error) { + // Clean the paths + sourceDir = filepath.Clean(sourceDir) + relativePath = filepath.Clean(relativePath) + + // Remove leading slash if present (it's relative to source, not filesystem root) + if len(relativePath) > 0 && relativePath[0] == '/' { + relativePath = relativePath[1:] + } + + // Join with source directory + fullPath := filepath.Join(sourceDir, relativePath) + + return fullPath, nil +} + +// FindProductDirectory walks up the directory tree to find the product root directory. +// +// The product directory is the parent of either: +// - The version directory (for versioned projects) +// - The source directory (for non-versioned projects) +// +// Parameters: +// - filePath: Path to a file within the documentation tree +// +// Returns: +// - string: Absolute path to the product directory +// - error: Error if product directory cannot be found +func FindProductDirectory(filePath string) (string, error) { + projectInfo, err := DetectProjectInfo(filePath) + if err != nil { + return "", err + } + + return projectInfo.ProductDir, nil +} + diff --git a/audit-cli/internal/pathresolver/pathresolver_test.go b/audit-cli/internal/pathresolver/pathresolver_test.go new file mode 100644 index 0000000..6766d97 --- /dev/null +++ b/audit-cli/internal/pathresolver/pathresolver_test.go @@ -0,0 +1,197 @@ +package pathresolver + +import ( + "path/filepath" + "testing" +) + +func TestFindSourceDirectory(t *testing.T) { + tests := []struct { + name string + filePath string + wantContains string + wantErr bool + }{ + { + name: "versioned project file", + filePath: "../../testdata/compare/product/v8.0/source/includes/example.rst", + wantContains: 
"testdata/compare/product/v8.0/source", + wantErr: false, + }, + { + name: "non-versioned project file", + filePath: "../../testdata/compare/product/manual/source/includes/example.rst", + wantContains: "testdata/compare/product/manual/source", + wantErr: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got, err := FindSourceDirectory(tt.filePath) + if (err != nil) != tt.wantErr { + t.Errorf("FindSourceDirectory() error = %v, wantErr %v", err, tt.wantErr) + return + } + if !tt.wantErr { + // Check that the path contains the expected substring + if !filepath.IsAbs(got) { + t.Errorf("FindSourceDirectory() returned relative path: %v", got) + } + if !filepath.HasPrefix(got, "/") { + t.Errorf("FindSourceDirectory() returned non-absolute path: %v", got) + } + // Check that it ends with the expected path + if !filepath.HasPrefix(got, "/") || !filepath.HasPrefix(filepath.Clean(got), "/") { + t.Errorf("FindSourceDirectory() = %v, should be absolute", got) + } + } + }) + } +} + +func TestDetectProjectInfo(t *testing.T) { + tests := []struct { + name string + filePath string + wantVersion string + wantVersioned bool + wantErr bool + }{ + { + name: "versioned project v8.0", + filePath: "../../testdata/compare/product/v8.0/source/includes/example.rst", + wantVersion: "v8.0", + wantVersioned: true, + wantErr: false, + }, + { + name: "versioned project manual", + filePath: "../../testdata/compare/product/manual/source/includes/example.rst", + wantVersion: "manual", + wantVersioned: true, + wantErr: false, + }, + { + name: "versioned project upcoming", + filePath: "../../testdata/compare/product/upcoming/source/includes/example.rst", + wantVersion: "upcoming", + wantVersioned: true, + wantErr: false, + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got, err := DetectProjectInfo(tt.filePath) + if (err != nil) != tt.wantErr { + t.Errorf("DetectProjectInfo() error = %v, wantErr %v", err, tt.wantErr) + return + } + if 
!tt.wantErr { + if got.Version != tt.wantVersion { + t.Errorf("DetectProjectInfo() Version = %v, want %v", got.Version, tt.wantVersion) + } + if got.IsVersioned != tt.wantVersioned { + t.Errorf("DetectProjectInfo() IsVersioned = %v, want %v", got.IsVersioned, tt.wantVersioned) + } + if got.SourceDir == "" { + t.Errorf("DetectProjectInfo() SourceDir is empty") + } + if got.ProductDir == "" { + t.Errorf("DetectProjectInfo() ProductDir is empty") + } + } + }) + } +} + +func TestResolveVersionPaths(t *testing.T) { + // Get absolute path to test data + testFile := "../../testdata/compare/product/v8.0/source/includes/example.rst" + absTestFile, _ := filepath.Abs(testFile) + + // Get product directory (parent of v8.0) + sourceDir := filepath.Dir(absTestFile) // .../includes + sourceDir = filepath.Dir(sourceDir) // .../source + versionDir := filepath.Dir(sourceDir) // .../v8.0 + productDir := filepath.Dir(versionDir) // .../product + + versions := []string{"v8.0", "manual", "upcoming"} + + got, err := ResolveVersionPaths(absTestFile, productDir, versions) + if err != nil { + t.Fatalf("ResolveVersionPaths() error = %v", err) + } + + if len(got) != 3 { + t.Errorf("ResolveVersionPaths() returned %d paths, want 3", len(got)) + } + + // Check that each version path is constructed correctly + for i, vp := range got { + if vp.Version != versions[i] { + t.Errorf("VersionPath[%d].Version = %v, want %v", i, vp.Version, versions[i]) + } + expectedPath := filepath.Join(productDir, versions[i], "source", "includes", "example.rst") + if vp.FilePath != expectedPath { + t.Errorf("VersionPath[%d].FilePath = %v, want %v", i, vp.FilePath, expectedPath) + } + } +} + +func TestExtractVersionFromPath(t *testing.T) { + testFile := "../../testdata/compare/product/v8.0/source/includes/example.rst" + absTestFile, _ := filepath.Abs(testFile) + + // Get product directory (parent of v8.0) + sourceDir := filepath.Dir(absTestFile) // .../includes + sourceDir = filepath.Dir(sourceDir) // .../source + 
versionDir := filepath.Dir(sourceDir) // .../v8.0 + productDir := filepath.Dir(versionDir) // .../product + + got, err := ExtractVersionFromPath(absTestFile, productDir) + if err != nil { + t.Fatalf("ExtractVersionFromPath() error = %v", err) + } + + want := "v8.0" + if got != want { + t.Errorf("ExtractVersionFromPath() = %v, want %v", got, want) + } +} + +func TestResolveRelativeToSource(t *testing.T) { + sourceDir := "/path/to/manual/v8.0/source" + + tests := []struct { + name string + relativePath string + want string + }{ + { + name: "path with leading slash", + relativePath: "/includes/file.rst", + want: "/path/to/manual/v8.0/source/includes/file.rst", + }, + { + name: "path without leading slash", + relativePath: "includes/file.rst", + want: "/path/to/manual/v8.0/source/includes/file.rst", + }, + } + + for _, tt := range tests { + t.Run(tt.name, func(t *testing.T) { + got, err := ResolveRelativeToSource(sourceDir, tt.relativePath) + if err != nil { + t.Errorf("ResolveRelativeToSource() error = %v", err) + return + } + if got != tt.want { + t.Errorf("ResolveRelativeToSource() = %v, want %v", got, tt.want) + } + }) + } +} + diff --git a/audit-cli/internal/pathresolver/source_finder.go b/audit-cli/internal/pathresolver/source_finder.go new file mode 100644 index 0000000..77395f6 --- /dev/null +++ b/audit-cli/internal/pathresolver/source_finder.go @@ -0,0 +1,55 @@ +package pathresolver + +import ( + "fmt" + "os" + "path/filepath" +) + +// FindSourceDirectory walks up the directory tree to find the "source" directory. +// +// MongoDB documentation is typically organized with a "source" directory at the root. +// This function walks up from the current file to find that directory, which is used +// as the base for resolving include paths. 
+// +// Parameters: +// - filePath: Path to a file within the documentation tree +// +// Returns: +// - string: Absolute path to the source directory +// - error: Error if source directory cannot be found +func FindSourceDirectory(filePath string) (string, error) { + // Get absolute path first + absPath, err := filepath.Abs(filePath) + if err != nil { + return "", fmt.Errorf("failed to get absolute path: %w", err) + } + + // Get the directory containing the file + dir := filepath.Dir(absPath) + + // Walk up the directory tree + for { + // Check if the current directory is named "source" + if filepath.Base(dir) == "source" { + return dir, nil + } + + // Check if there's a "source" subdirectory + sourceSubdir := filepath.Join(dir, "source") + if info, err := os.Stat(sourceSubdir); err == nil && info.IsDir() { + return sourceSubdir, nil + } + + // Move up one directory + parent := filepath.Dir(dir) + + // If we've reached the root, stop + if parent == dir { + return "", fmt.Errorf("could not find source directory for %s", filePath) + } + + dir = parent + } +} + diff --git a/audit-cli/internal/pathresolver/types.go b/audit-cli/internal/pathresolver/types.go new file mode 100644 index 0000000..f6b071c --- /dev/null +++ b/audit-cli/internal/pathresolver/types.go @@ -0,0 +1,33 @@ +package pathresolver + +// ProjectInfo contains information about a documentation project's structure. +// +// MongoDB documentation projects can be either versioned or non-versioned: +// - Versioned: {product}/{version}/source/... (e.g., manual/v8.0/source/...) +// - Non-versioned: {product}/source/... (e.g., atlas/source/...) 
+type ProjectInfo struct { + // SourceDir is the absolute path to the source directory + SourceDir string + + // ProductDir is the absolute path to the product directory + ProductDir string + + // Version is the version identifier (e.g., "v8.0", "manual", "upcoming") + // Empty string for non-versioned projects + Version string + + // IsVersioned indicates whether this is a versioned project + IsVersioned bool +} + +// VersionPath represents a resolved file path for a specific version. +// +// Used when resolving the same file across multiple versions of a product. +type VersionPath struct { + // Version is the version identifier (e.g., "v8.0", "manual", "upcoming") + Version string + + // FilePath is the absolute path to the file in this version + FilePath string +} + diff --git a/audit-cli/internal/pathresolver/version_resolver.go b/audit-cli/internal/pathresolver/version_resolver.go new file mode 100644 index 0000000..f7c6344 --- /dev/null +++ b/audit-cli/internal/pathresolver/version_resolver.go @@ -0,0 +1,196 @@ +package pathresolver + +import ( + "fmt" + "path/filepath" + "strings" +) + +// ResolveVersionPaths resolves file paths for all specified versions. +// +// Given a reference file path and a list of versions, this function constructs +// the corresponding file paths for each version by replacing the version segment +// in the path. 
+// +// Example: +// Input: /path/to/manual/manual/source/includes/file.rst +// Versions: [manual, upcoming, v8.1, v8.0] +// Output: +// - manual: /path/to/manual/manual/source/includes/file.rst +// - upcoming: /path/to/manual/upcoming/source/includes/file.rst +// - v8.1: /path/to/manual/v8.1/source/includes/file.rst +// - v8.0: /path/to/manual/v8.0/source/includes/file.rst +// +// Parameters: +// - referenceFile: The absolute path to the reference file +// - productDir: The absolute path to the product directory (e.g., /path/to/manual) +// - versions: List of version identifiers +// +// Returns: +// - []VersionPath: List of resolved version paths +// - error: Any error encountered during resolution +func ResolveVersionPaths(referenceFile string, productDir string, versions []string) ([]VersionPath, error) { + // Clean the paths + referenceFile = filepath.Clean(referenceFile) + productDir = filepath.Clean(productDir) + + // Ensure productDir ends with a separator for proper prefix matching + if !strings.HasSuffix(productDir, string(filepath.Separator)) { + productDir += string(filepath.Separator) + } + + // Check if referenceFile is under productDir + if !strings.HasPrefix(referenceFile, productDir) { + return nil, fmt.Errorf("reference file %s is not under product directory %s", referenceFile, productDir) + } + + // Extract the relative path from productDir + relativePath := strings.TrimPrefix(referenceFile, productDir) + + // Find the version segment and the path after it + // Expected format: {version}/source/{rest-of-path} + parts := strings.Split(relativePath, string(filepath.Separator)) + if len(parts) < 2 { + return nil, fmt.Errorf("invalid file path structure: expected {version}/source/... 
format, got %s", relativePath) + } + + // Find the "source" directory + sourceIndex := -1 + for i, part := range parts { + if part == "source" { + sourceIndex = i + break + } + } + + if sourceIndex == -1 { + return nil, fmt.Errorf("could not find 'source' directory in path: %s", relativePath) + } + + if sourceIndex == 0 { + return nil, fmt.Errorf("invalid path structure: 'source' cannot be the first segment in %s", relativePath) + } + + // The version is the segment before "source" + // Everything from "source" onwards is the path we want to preserve + pathFromSource := strings.Join(parts[sourceIndex:], string(filepath.Separator)) + + // Build version paths + var versionPaths []VersionPath + for _, version := range versions { + versionPath := filepath.Join(productDir, version, pathFromSource) + versionPaths = append(versionPaths, VersionPath{ + Version: version, + FilePath: versionPath, + }) + } + + return versionPaths, nil +} + +// ExtractVersionFromPath extracts the version identifier from a file path. +// +// Given a file path within a versioned project, this function extracts the +// version segment (the directory before "source"). 
+// +// Example: +// Input: /path/to/manual/v8.0/source/includes/file.rst +// Output: "v8.0" +// +// Parameters: +// - filePath: The absolute path to a file +// - productDir: The absolute path to the product directory +// +// Returns: +// - string: The version identifier +// - error: Any error encountered during extraction +func ExtractVersionFromPath(filePath string, productDir string) (string, error) { + // Clean the paths + filePath = filepath.Clean(filePath) + productDir = filepath.Clean(productDir) + + // Ensure productDir ends with a separator for proper prefix matching + if !strings.HasSuffix(productDir, string(filepath.Separator)) { + productDir += string(filepath.Separator) + } + + // Check if filePath is under productDir + if !strings.HasPrefix(filePath, productDir) { + return "", fmt.Errorf("file path %s is not under product directory %s", filePath, productDir) + } + + // Extract the relative path from productDir + relativePath := strings.TrimPrefix(filePath, productDir) + + // Split into parts + parts := strings.Split(relativePath, string(filepath.Separator)) + if len(parts) < 2 { + return "", fmt.Errorf("invalid file path structure: expected {version}/source/... format, got %s", relativePath) + } + + // Find the "source" directory + sourceIndex := -1 + for i, part := range parts { + if part == "source" { + sourceIndex = i + break + } + } + + if sourceIndex == -1 { + return "", fmt.Errorf("could not find 'source' directory in path: %s", relativePath) + } + + if sourceIndex == 0 { + return "", fmt.Errorf("invalid path structure: 'source' cannot be the first segment in %s", relativePath) + } + + // The version is the segment before "source" + version := parts[sourceIndex-1] + + return version, nil +} + +// IsVersionedProject determines if a path is part of a versioned project. +// +// A versioned project has the structure: {product}/{version}/source/... +// A non-versioned project has the structure: {product}/source/... 
+// +// This function checks if there's a directory between the product root and "source". +// +// Parameters: +// - sourceDir: The absolute path to the source directory +// +// Returns: +// - bool: True if this is a versioned project +// - error: Any error encountered during detection +func IsVersionedProject(sourceDir string) (bool, error) { + // Get the parent directory of source + parent := filepath.Dir(sourceDir) + + // Check if the parent directory name looks like a version + // Common patterns: v8.0, v7.0, manual, upcoming, master, current + parentName := filepath.Base(parent) + + // If parent is named after common version patterns, it's versioned + // This is a heuristic - we check if there's a directory between product root and source + grandparent := filepath.Dir(parent) + + // If grandparent is the root or doesn't exist, it's likely non-versioned + if grandparent == parent || grandparent == "/" || grandparent == "." { + return false, nil + } + + // Check if there's another "source" directory at the grandparent level + // If not, then parent is likely a version directory + grandparentSource := filepath.Join(grandparent, "source") + if grandparentSource == sourceDir { + // This means parent is not a version directory + return false, nil + } + + // If we have: grandparent/parent/source, then parent is likely a version + // and this is a versioned project + return parentName != "", nil +} + diff --git a/audit-cli/internal/rst/directive_parser.go b/audit-cli/internal/rst/directive_parser.go index 6539c2c..618257c 100644 --- a/audit-cli/internal/rst/directive_parser.go +++ b/audit-cli/internal/rst/directive_parser.go @@ -57,20 +57,26 @@ type SubDirective struct { } // Regular expressions for directive parsing +// +// Note: literalIncludeRegex is imported from directive_regex.go (LiteralIncludeDirectiveRegex) +// The other regexes here are specific to the parser and have different matching requirements. var ( - // Matches: .. 
literalinclude:: /path/to/file.php - literalIncludeRegex = regexp.MustCompile(`^\.\.\s+literalinclude::\s+(.+)$`) + // Alias for the shared literalinclude regex + literalIncludeRegex = LiteralIncludeDirectiveRegex // Matches: .. code-block:: python (language is optional) codeBlockRegex = regexp.MustCompile(`^\.\.\s+code-block::\s*(.*)$`) - // Matches: .. io-code-block:: + // Matches: .. io-code-block:: (strict - must end after directive) + // This is different from IOCodeBlockDirectiveRegex which is more permissive ioCodeBlockRegex = regexp.MustCompile(`^\.\.\s+io-code-block::\s*$`) // Matches: .. input:: /path/to/file.cs (filepath is optional) + // This is different from InputDirectiveRegex which requires an argument inputDirectiveRegex = regexp.MustCompile(`^\.\.\s+input::\s*(.*)$`) // Matches: .. output:: /path/to/file.txt (filepath is optional) + // This is different from OutputDirectiveRegex which requires an argument outputDirectiveRegex = regexp.MustCompile(`^\.\.\s+output::\s*(.*)$`) // Matches directive options like: :language: python diff --git a/audit-cli/internal/rst/directive_regex.go b/audit-cli/internal/rst/directive_regex.go new file mode 100644 index 0000000..9fc7ac0 --- /dev/null +++ b/audit-cli/internal/rst/directive_regex.go @@ -0,0 +1,33 @@ +package rst + +import "regexp" + +// RST Directive Regular Expressions +// +// This file contains all regular expressions for matching RST directives. +// These patterns are shared across the codebase to ensure consistency. + +// IncludeDirectiveRegex matches .. include:: directives in RST files. +// Example: .. include:: /path/to/file.rst +var IncludeDirectiveRegex = regexp.MustCompile(`^\.\.\s+include::\s+(.+)$`) + +// LiteralIncludeDirectiveRegex matches .. literalinclude:: directives in RST files. +// Example: .. literalinclude:: /path/to/file.py +var LiteralIncludeDirectiveRegex = regexp.MustCompile(`^\.\.\s+literalinclude::\s+(.+)$`) + +// IOCodeBlockDirectiveRegex matches .. 
io-code-block:: directives in RST files. +// Example: .. io-code-block:: +var IOCodeBlockDirectiveRegex = regexp.MustCompile(`^\.\.\s+io-code-block::`) + +// InputDirectiveRegex matches .. input:: directives within io-code-block in RST files. +// Example: .. input:: /path/to/file.js +var InputDirectiveRegex = regexp.MustCompile(`^\.\.\s+input::\s+(.+)$`) + +// OutputDirectiveRegex matches .. output:: directives within io-code-block in RST files. +// Example: .. output:: /path/to/file.json +var OutputDirectiveRegex = regexp.MustCompile(`^\.\.\s+output::\s+(.+)$`) + +// ToctreeDirectiveRegex matches .. toctree:: directives in RST files. +// Example: .. toctree:: +var ToctreeDirectiveRegex = regexp.MustCompile(`^\.\.\s+toctree::`) + diff --git a/audit-cli/internal/rst/include_resolver.go b/audit-cli/internal/rst/include_resolver.go index af437ec..d57243e 100644 --- a/audit-cli/internal/rst/include_resolver.go +++ b/audit-cli/internal/rst/include_resolver.go @@ -5,12 +5,10 @@ import ( "fmt" "os" "path/filepath" - "regexp" "strings" -) -// IncludeDirectiveRegex matches .. include:: directives in RST files. -var IncludeDirectiveRegex = regexp.MustCompile(`^\.\.\s+include::\s+(.+)$`) + "github.com/mongodb/code-example-tooling/audit-cli/internal/pathresolver" +) // FindIncludeDirectives finds all include directives in a file and resolves their paths. // @@ -59,6 +57,120 @@ func FindIncludeDirectives(filePath string) ([]string, error) { return includePaths, nil } +// FindToctreeEntries finds all toctree entries in a file and resolves their paths. +// +// This function scans the file for .. toctree:: directives and extracts the document +// names listed in the toctree content. Document names are converted to file paths +// by trying common extensions (.rst, .txt). 
+// +// Parameters: +// - filePath: Path to the RST file to scan +// +// Returns: +// - []string: List of resolved absolute paths to toctree documents +// - error: Any error encountered during scanning +func FindToctreeEntries(filePath string) ([]string, error) { + file, err := os.Open(filePath) + if err != nil { + return nil, err + } + defer file.Close() + + var toctreePaths []string + scanner := bufio.NewScanner(file) + inToctree := false + + for scanner.Scan() { + line := scanner.Text() + trimmedLine := strings.TrimSpace(line) + + // Check if this line starts a toctree directive + if ToctreeDirectiveRegex.MatchString(trimmedLine) { + inToctree = true + continue + } + + // Check if we're exiting toctree (unindented line that's not empty) + if inToctree && len(line) > 0 && line[0] != ' ' && line[0] != '\t' { + inToctree = false + } + + // If we're in a toctree, process document names + if inToctree { + // Skip empty lines and option lines (starting with :) + if trimmedLine == "" || strings.HasPrefix(trimmedLine, ":") { + continue + } + + // This is a document name in the toctree + docName := trimmedLine + + // Resolve the document name to a file path + resolvedPath, err := ResolveToctreePath(filePath, docName) + if err != nil { + fmt.Fprintf(os.Stderr, "Warning: failed to resolve toctree entry %s: %v\n", docName, err) + continue + } + + toctreePaths = append(toctreePaths, resolvedPath) + } + } + + if err := scanner.Err(); err != nil { + return nil, err + } + + return toctreePaths, nil +} + +// ResolveToctreePath resolves a toctree document name to an absolute file path. +// +// Toctree entries are document names without extensions. This function tries to +// find the actual file by testing common extensions (.rst, .txt). 
+// +// Parameters: +// - currentFilePath: Path to the file containing the toctree +// - docName: Document name from the toctree (e.g., "intro" or "/includes/intro") +// +// Returns: +// - string: Resolved absolute path to the document file +// - error: Error if the document cannot be found +func ResolveToctreePath(currentFilePath, docName string) (string, error) { + // Find the source directory + sourceDir, err := pathresolver.FindSourceDirectory(currentFilePath) + if err != nil { + return "", err + } + + var basePath string + if strings.HasPrefix(docName, "/") { + // Absolute document name (relative to source directory) + basePath = filepath.Join(sourceDir, docName) + } else { + // Relative document name (relative to current file's directory) + currentDir := filepath.Dir(currentFilePath) + basePath = filepath.Join(currentDir, docName) + } + + // Clean the path + basePath = filepath.Clean(basePath) + + // Try common extensions + extensions := []string{".rst", ".txt", ""} + for _, ext := range extensions { + testPath := basePath + ext + if _, err := os.Stat(testPath); err == nil { + absPath, err := filepath.Abs(testPath) + if err != nil { + return "", err + } + return absPath, nil + } + } + + return "", fmt.Errorf("toctree document not found: %s (tried .rst, .txt, and no extension)", docName) +} + // ResolveIncludePath resolves an include path relative to the source directory // Handles multiple special cases: // - Template variables ({{var_name}}) @@ -84,7 +196,7 @@ func ResolveIncludePath(currentFilePath, includePath string) (string, error) { } // Find the source directory by walking up from the current file - sourceDir, err := FindSourceDirectory(currentFilePath) + sourceDir, err := pathresolver.FindSourceDirectory(currentFilePath) if err != nil { return "", err } @@ -317,44 +429,5 @@ func ResolveTemplateVariable(yamlFilePath, varName string) (string, error) { return "", fmt.Errorf("template variable %s not found in replacement section of %s", varName, 
yamlFilePath)
 }
-// FindSourceDirectory walks up the directory tree to find the "source" directory.
-//
-// MongoDB documentation is typically organized with a "source" directory at the root.
-// This function walks up from the current file to find that directory, which is used
-// as the base for resolving include paths.
-//
-// Parameters:
-//   - filePath: Path to a file within the documentation tree
-//
-// Returns:
-//   - string: Absolute path to the source directory
-//   - error: Error if source directory cannot be found
-func FindSourceDirectory(filePath string) (string, error) {
-	// Get the directory containing the file
-	dir := filepath.Dir(filePath)
-
-	// Walk up the directory tree
-	for {
-		// Check if the current directory is named "source"
-		if filepath.Base(dir) == "source" {
-			return dir, nil
-		}
-
-		// Check if there's a "source" subdirectory
-		sourceSubdir := filepath.Join(dir, "source")
-		if info, err := os.Stat(sourceSubdir); err == nil && info.IsDir() {
-			return sourceSubdir, nil
-		}
-
-		// Move up one directory
-		parent := filepath.Dir(dir)
-		// If we've reached the root, stop
-		if parent == dir {
-			return "", fmt.Errorf("could not find source directory for %s", filePath)
-		}
-
-		dir = parent
-	}
-}
diff --git a/audit-cli/output/literalinclude-test.literalinclude.1.py b/audit-cli/output/literalinclude-test.literalinclude.1.py
new file mode 100644
index 0000000..ac2eb81
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.1.py
@@ -0,0 +1,4 @@
+def hello_world():
+    """Print hello world message."""
+    print("Hello, World!")
+    return True
\ No newline at end of file
diff --git a/audit-cli/output/literalinclude-test.literalinclude.2.go.output b/audit-cli/output/literalinclude-test.literalinclude.2.go.output
new file mode 100644
index 0000000..bc4d6fa
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.2.go.output
@@ -0,0 +1,7 @@
+package main
+
+import "fmt"
+
+func main() {
+	fmt.Println("Hello from Go!")
+}
\ No newline at end of file
diff --git a/audit-cli/output/literalinclude-test.literalinclude.3.js b/audit-cli/output/literalinclude-test.literalinclude.3.js
new file mode 100644
index 0000000..7c75f16
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.3.js
@@ -0,0 +1,5 @@
+function greet(name) {
+  return `Hello, ${name}!`;
+}
+
+console.log(greet("World"));
\ No newline at end of file
diff --git a/audit-cli/output/literalinclude-test.literalinclude.4.php b/audit-cli/output/literalinclude-test.literalinclude.4.php
new file mode 100644
index 0000000..5e921e5
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.4.php
@@ -0,0 +1,6 @@
+ 'localhost',
+    'port' => 27017
+];
\ No newline at end of file
diff --git a/audit-cli/output/literalinclude-test.literalinclude.5.rb b/audit-cli/output/literalinclude-test.literalinclude.5.rb
new file mode 100644
index 0000000..6201a21
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.5.rb
@@ -0,0 +1,10 @@
+# Ruby example
+class Greeter
+  def initialize(name)
+    @name = name
+  end
+
+  def greet
+    puts "Hello, #{@name}!"
+  end
+end
\ No newline at end of file
diff --git a/audit-cli/output/literalinclude-test.literalinclude.6.ts b/audit-cli/output/literalinclude-test.literalinclude.6.ts
new file mode 100644
index 0000000..721cc1e
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.6.ts
@@ -0,0 +1,9 @@
+// TypeScript example
+interface User {
+  name: string;
+  age: number;
+}
+
+function greetUser(user: User): string {
+  return `Hello, ${user.name}!`;
+}
\ No newline at end of file
diff --git a/audit-cli/output/literalinclude-test.literalinclude.7.cpp b/audit-cli/output/literalinclude-test.literalinclude.7.cpp
new file mode 100644
index 0000000..28276a3
--- /dev/null
+++ b/audit-cli/output/literalinclude-test.literalinclude.7.cpp
@@ -0,0 +1,8 @@
+#include
+#include
+
+int main() {
+    std::string message = "Hello from C++!";
+    std::cout << message << std::endl;
+    return 0;
+}
\ No newline at end of file
diff --git a/audit-cli/testdata/input-files/source/includes/extracts-test.yaml b/audit-cli/testdata/input-files/source/includes/extracts-test.yaml
new file mode 100644
index 0000000..90cf079
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/includes/extracts-test.yaml
@@ -0,0 +1,17 @@
+---
+ref: test-extract-intro
+content: |
+  This is a test extract that includes another file.
+
+  .. include:: /includes/intro.rst
+
+  And references a code example:
+
+  .. literalinclude:: /code-examples/example.py
+     :language: python
+---
+ref: test-extract-examples
+content: |
+  This extract references the examples file.
+
+  .. include:: /includes/examples.rst
diff --git a/audit-cli/testdata/input-files/source/index.rst b/audit-cli/testdata/input-files/source/index.rst
new file mode 100644
index 0000000..4f44bc9
--- /dev/null
+++ b/audit-cli/testdata/input-files/source/index.rst
@@ -0,0 +1,13 @@
+====================
+Documentation Index
+====================
+
+Welcome to the documentation!
+
+.. toctree::
+   :maxdepth: 2
+
+   include-test
+   literalinclude-test
+   includes/intro
+