This guide provides step-by-step instructions for refactoring large validation files into smaller, focused validators. It uses the bundler_validation.go split as a reference implementation.
- Target size: 100-200 lines per validator
- Hard limit: 300 lines (refactor if exceeded)
- Single responsibility: Each file should validate one domain
Refactor a validation file when:
- File exceeds 300 lines
- File contains 2+ unrelated validation domains
- Complex cross-dependencies require separate testing
- Error messages span multiple concern areas
- Adding new validation would push file over 300 lines
┌─────────────────────────────────────┐
│    Validation File > 300 lines?     │
└──────────────┬──────────────────────┘
               │
               ▼
        ┌──────────────┐
        │ Does it mix  │
        │ 2+ distinct  │   YES
        │   domains?   ├──────────► Should split
        └──────┬───────┘
               │ NO
               ▼
        ┌──────────────┐
        │  Is it hard  │
        │ to maintain  │   YES
        │   or test?   ├──────────► Should split
        └──────┬───────┘
               │ NO
               ▼
           Keep as-is
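The decision tree can be read as a small predicate. The following is a minimal sketch that follows the diagram literally; the `fileStats` type and its field names are hypothetical and do not come from the codebase. Note that the criteria list above is broader than the diagram (it also flags files that mix domains regardless of size), so treat this as an illustration rather than a rule.

```go
package main

import "fmt"

// fileStats holds the inputs to the split decision.
// The type and field names are illustrative assumptions.
type fileStats struct {
	Lines          int  // total lines in the validation file
	Domains        int  // number of distinct validation domains it mixes
	HardToMaintain bool // judgment call: hard to maintain or test
}

// shouldSplit mirrors the diagram: only files over 300 lines enter the tree,
// and they are split if they mix 2+ domains or are hard to maintain or test.
func shouldSplit(s fileStats) bool {
	if s.Lines <= 300 {
		return false
	}
	return s.Domains >= 2 || s.HardToMaintain
}

func main() {
	fmt.Println(shouldSplit(fileStats{Lines: 450, Domains: 3})) // true
	fmt.Println(shouldSplit(fileStats{Lines: 320, Domains: 1})) // false
}
```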
- Identify function groups: List all functions and group by domain
- Count lines per group: Calculate approximate lines for each domain
- Map dependencies: Identify which functions call each other
- Review tests: Understand test coverage and organization
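The first two steps can also be done with Go's standard `go/parser` package, which reports where each function ends as well as where it starts. This is a minimal sketch; `funcSpan` and `listFuncs` are hypothetical names, and the sample source in `main` is made up for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// funcSpan records where a top-level function starts and ends.
type funcSpan struct {
	Name       string
	Start, End int // 1-based line numbers, inclusive
}

// listFuncs parses Go source and returns one span per top-level function.
// Start is the line of the func keyword, so doc comments are not counted.
func listFuncs(filename, src string) ([]funcSpan, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var spans []funcSpan
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue // skip imports, vars, types
		}
		spans = append(spans, funcSpan{
			Name:  fn.Name.Name,
			Start: fset.Position(fn.Pos()).Line,
			End:   fset.Position(fn.End()).Line,
		})
	}
	return spans, nil
}

func main() {
	// Made-up source used only to demonstrate the output format.
	src := "package demo\n\nfunc a() error {\n\treturn nil\n}\n\nfunc b() {}\n"
	spans, err := listFuncs("demo.go", src)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, s := range spans {
		fmt.Printf("%d-%d: %s (%d lines)\n", s.Start, s.End, s.Name, s.End-s.Start+1)
	}
}
```

To run it against a real file, read the file contents (for example with `os.ReadFile`) and pass them as `src`.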
Example from bundler_validation.go:
# List functions with line numbers
awk '/^func / {print NR": "$0}' bundler_validation.go
# Output:
# 65: func validateNoLocalRequires(bundledContent string) error {
# 99: func validateNoModuleReferences(bundledContent string) error {
# 145: func ValidateEmbeddedResourceRequires(sources map[string]string) error {
# 221: func validateNoExecSync(scriptName string, content string, mode RuntimeMode) error {
# 263: func validateNoGitHubScriptGlobals(scriptName string, content string, mode RuntimeMode) error {
# 320: func validateNoRuntimeMixing(mainScript string, sources map[string]string, targetMode RuntimeMode) error {
# 331: func validateRuntimeModeRecursive(content string, currentPath string, sources map[string]string, targetMode RuntimeMode, checked map[string]bool) error {
# 399: func detectRuntimeMode(content string) RuntimeMode {
# 439: func normalizePath(path string) string {
Organize functions into logical domains based on:
- What they validate (safety, content, runtime)
- When they run (compile-time, registration, bundling)
- Their error semantics (hard errors vs warnings)
Example grouping:
| Domain | Functions | Line Range | Purpose |
|---|---|---|---|
| Safety | validateNoLocalRequires, validateNoModuleReferences, ValidateEmbeddedResourceRequires, normalizePath | 65-216 | Bundle safety checks |
| Script Content | validateNoExecSync, validateNoGitHubScriptGlobals | 221-315 | Script API usage |
| Runtime Mode | validateNoRuntimeMixing, validateRuntimeModeRecursive, detectRuntimeMode | 320-436 | Runtime compatibility |
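Once the functions are assigned to domains, the per-domain line totals show whether each new file will land in the target size range. The sketch below assumes the function-to-domain mapping from the table; the `funcSpan` type, the `linesPerDomain` helper, the `unassigned` bucket, and the spans in `main` are hypothetical and are not the real line numbers.

```go
package main

import (
	"fmt"
	"sort"
)

// funcSpan records a function and its inclusive line range.
type funcSpan struct {
	Name       string
	Start, End int
}

// domainOf maps each function to its domain, following the grouping table.
var domainOf = map[string]string{
	"validateNoLocalRequires":          "safety",
	"validateNoModuleReferences":       "safety",
	"ValidateEmbeddedResourceRequires": "safety",
	"normalizePath":                    "safety",
	"validateNoExecSync":               "script",
	"validateNoGitHubScriptGlobals":    "script",
	"validateNoRuntimeMixing":          "runtime",
	"validateRuntimeModeRecursive":     "runtime",
	"detectRuntimeMode":                "runtime",
}

// linesPerDomain sums function line counts per domain. Functions that are
// missing from the mapping are collected under "unassigned".
func linesPerDomain(spans []funcSpan, domains map[string]string) map[string]int {
	totals := make(map[string]int)
	for _, s := range spans {
		d, ok := domains[s.Name]
		if !ok {
			d = "unassigned"
		}
		totals[d] += s.End - s.Start + 1
	}
	return totals
}

func main() {
	// Hypothetical spans for illustration only.
	spans := []funcSpan{
		{Name: "validateNoLocalRequires", Start: 10, End: 29},
		{Name: "normalizePath", Start: 30, End: 39},
		{Name: "validateNoExecSync", Start: 40, End: 64},
		{Name: "detectRuntimeMode", Start: 65, End: 79},
		{Name: "someNewHelper", Start: 80, End: 84},
	}
	totals := linesPerDomain(spans, domainOf)

	// Sort domain names so the output order is stable.
	names := make([]string, 0, len(totals))
	for name := range totals {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %d lines\n", name, totals[name])
	}
}
```

An `unassigned` entry in the output is a hint that a function still needs a home before the split starts.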
For each domain, create a new file that follows the naming convention and file structure below:
Naming convention: {domain}_{subdomain}_validation.go
File structure:
// Package workflow provides {domain} validation for agentic workflows.
//
// # {Domain} Validation
//
// This file validates {what it validates} to ensure {goal}.
//
// # Validation Functions
//
// - Function1() - Description
// - Function2() - Description
//
// # When to Add Validation Here
//
// Add validation to this file when:
// - Criteria 1
// - Criteria 2
//
// For related validation, see {related_files}.
// For general validation, see validation.go.
// For detailed documentation, see scratchpad/validation-architecture.md
package workflow
import (
	"fmt"

	"github.com/github/gh-aw/pkg/logger"
)

var {domain}Log = logger.New("workflow:{domain}_validation")

// Validation functions here
- Copy functions to new file
- Preserve signatures: Don't change function names or parameters
- Update logger: Create domain-specific logger instance
- Move helpers: Include helper functions used only by this domain
- Shared helpers: Keep shared utilities in original file or create common file
Example moves:
// bundler_safety_validation.go
var bundlerSafetyLog = logger.New("workflow:bundler_safety_validation")

func validateNoLocalRequires(bundledContent string) error {
	bundlerSafetyLog.Printf("Validating bundled JavaScript: %d bytes", len(bundledContent))
	// ... rest of function unchanged
}

// bundler_script_validation.go
var bundlerScriptLog = logger.New("workflow:bundler_script_validation")

func validateNoExecSync(scriptName string, content string, mode RuntimeMode) error {
	bundlerScriptLog.Printf("Validating no execSync in GitHub Script: %s", scriptName)
	// ... rest of function unchanged
}
Split test files to match new structure:
- Create test files: One per new validation file
- Move test functions: Group by domain
- Preserve test logic: Don't change test behavior
- Update imports: Ensure tests can access validation functions
Example test split:
// bundler_script_validation_test.go
package workflow

import (
	"testing"

	"github.com/stretchr/testify/assert"
)

func TestValidateNoExecSync_GitHubScriptMode(t *testing.T) {
	// Tests for validateNoExecSync
}

func TestValidateNoGitHubScriptGlobals_NodeJSMode(t *testing.T) {
	// Tests for validateNoGitHubScriptGlobals
}
Update these files to reference new validators:
- validation.go: Add new files to package documentation
- validation-architecture.md: Add new validators to architecture section
- AGENTS.md: Update complexity guidelines (if adding new patterns)
Example documentation update:
// pkg/workflow/validation.go
// - bundler_safety_validation.go: JavaScript bundle safety (require/module checks)
// - bundler_script_validation.go: JavaScript script content (execSync, GitHub globals)
// - bundler_runtime_validation.go: JavaScript runtime mode compatibility
- Build: make build
- Run tests: go test ./pkg/workflow
- Check imports: Verify all callers still work
- Test manually: Run CLI commands that use validation
# Build and test
make build
go test -v ./pkg/workflow
# Verify specific tests
go test -v ./pkg/workflow -run "TestValidateNo"
Problem: Multiple domains need the same helper function
Solutions:
- Keep in one domain: If primarily used by one domain, keep it there
- Create common file: If used equally, create {domain}_helpers.go
- Move to parent: If used across packages, move to pkg/utils/
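Example: normalizePath() stayed in bundler_safety_validation.go. Its only callers are ValidateEmbeddedResourceRequires() and validateRuntimeModeRecursive(), and the latter needs path normalization for safety checks, so the helper belongs with the safety domain. Because all of the split files are in package workflow, the runtime validator can still call it without an import.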
Problem: Regexes compiled at package init need to move
Solution: Keep regex compilation in new file where it's used
// Before (bundler_validation.go)
var (
	moduleExportsRegex = regexp.MustCompile(`\bmodule\.exports\b`)
	exportsRegex       = regexp.MustCompile(`\bexports\.\w+`)
)

// After (bundler_safety_validation.go)
var (
	moduleExportsRegex = regexp.MustCompile(`\bmodule\.exports\b`)
	exportsRegex       = regexp.MustCompile(`\bexports\.\w+`)
)
Problem: Validation spans multiple domains
Solution: Keep orchestrator in one domain, call helpers from others
// bundler.go (orchestrator)
func BundleJavaScriptWithMode(...) (string, error) {
	// Runtime validation (bundler_runtime_validation.go)
	if err := validateNoRuntimeMixing(mainContent, sources, mode); err != nil {
		return "", err
	}

	// Bundle code...

	// Safety validation (bundler_safety_validation.go)
	if err := validateNoLocalRequires(bundled); err != nil {
		return "", err
	}
	if err := validateNoModuleReferences(bundled); err != nil {
		return "", err
	}
	return bundled, nil
}
- 100-200 lines: Ideal size for focused validator
- 200-300 lines: Acceptable but consider splitting
- 300+ lines: Should be refactored
- Minimum 30% comment coverage: At least 30% of lines should be comments/docs
- File header: Describe domain, functions, and when to add validation
- Function comments: Explain what, when, and error conditions
- Files: {domain}_{subdomain}_validation.go
- Loggers: {domain}Log = logger.New("workflow:{domain}_validation")
- Functions: validate{WhatIsValidated}()
- Tests: Test{FunctionName}_{Scenario}
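The size, documentation, and file-naming guidelines above are mechanical enough to check with a short script. The following is a rough sketch, not an existing tool in the repository: `sizeVerdict`, `commentRatio`, and the `validatorName` pattern are hypothetical. The comment ratio counts only lines that start with `//` and measures them against non-blank lines, which is an assumption about how "30% of lines" is meant.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// validatorName matches names of the form {domain}_{subdomain}_validation.go,
// such as bundler_safety_validation.go. The exact pattern is an assumption.
var validatorName = regexp.MustCompile(`^[a-z0-9]+(_[a-z0-9]+)+_validation\.go$`)

// sizeVerdict classifies a file by line count using the size guidelines.
func sizeVerdict(lines int) string {
	switch {
	case lines > 300:
		return "refactor"
	case lines > 200:
		return "consider splitting"
	default:
		return "ok"
	}
}

// commentRatio returns the share of non-blank lines that begin with "//".
// Block comments and trailing comments are ignored in this simple heuristic.
func commentRatio(src string) float64 {
	var comments, total int
	for _, line := range strings.Split(src, "\n") {
		trimmed := strings.TrimSpace(line)
		if trimmed == "" {
			continue
		}
		total++
		if strings.HasPrefix(trimmed, "//") {
			comments++
		}
	}
	if total == 0 {
		return 0
	}
	return float64(comments) / float64(total)
}

func main() {
	// Made-up source used only to exercise the helpers.
	src := "// Package demo is an example.\npackage demo\n\n// f does nothing.\nfunc f() {}\n"
	fmt.Println(validatorName.MatchString("bundler_safety_validation.go"))
	fmt.Println(sizeVerdict(230))
	fmt.Printf("%.2f\n", commentRatio(src))
}
```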
Single file with 3 domains:
- Bundle safety (152 lines)
- Script content (95 lines)
- Runtime mode (141 lines)
- bundler_safety_validation.go (230 lines)
  - validateNoLocalRequires()
  - validateNoModuleReferences()
  - ValidateEmbeddedResourceRequires()
  - normalizePath()
- bundler_script_validation.go (160 lines)
  - validateNoExecSync()
  - validateNoGitHubScriptGlobals()
- bundler_runtime_validation.go (190 lines)
  - validateNoRuntimeMixing()
  - validateRuntimeModeRecursive()
  - detectRuntimeMode()
- Each file under 250 lines (target: 100-200)
- Clear single responsibility per file
- Easier to test each domain independently
- Simpler to add new validation (clear where it belongs)
- Better documentation organization
When refactoring a large validation file:
- Analyze current structure (functions, line counts, dependencies)
- Group functions by domain
- Create new files with proper headers
- Move functions preserving signatures
- Update logger instances
- Split test files to match new structure
- Update validation.go documentation
- Update validation-architecture.md
- Build project successfully
- Run all tests successfully
- Verify no functional changes
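The "preserving signatures" item in the checklist can be partially automated by comparing function signatures before and after the split. This is a sketch under assumptions: `signatures` and `diffSignatures` are hypothetical helpers, methods are keyed by name only, and matching signatures do not prove that behavior is unchanged, so the build and test steps are still required.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"sort"
)

// signatures maps each top-level function name in src to its printed
// signature, for example "func(s string) error".
func signatures(src string) (map[string]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	sigs := make(map[string]string)
	for _, decl := range file.Decls {
		fn, ok := decl.(*ast.FuncDecl)
		if !ok {
			continue
		}
		var buf bytes.Buffer
		if err := printer.Fprint(&buf, fset, fn.Type); err != nil {
			return nil, err
		}
		sigs[fn.Name.Name] = buf.String()
	}
	return sigs, nil
}

// diffSignatures lists functions from the original file that are missing
// from, or have a different signature in, the split files.
func diffSignatures(before string, after []string) ([]string, error) {
	old, err := signatures(before)
	if err != nil {
		return nil, err
	}
	merged := make(map[string]string)
	for _, src := range after {
		sigs, err := signatures(src)
		if err != nil {
			return nil, err
		}
		for name, sig := range sigs {
			merged[name] = sig
		}
	}
	var problems []string
	for name, sig := range old {
		got, ok := merged[name]
		switch {
		case !ok:
			problems = append(problems, name+": missing")
		case got != sig:
			problems = append(problems, name+": signature changed")
		}
	}
	sort.Strings(problems)
	return problems, nil
}

func main() {
	// Made-up sources: b gains a parameter during the "split".
	before := "package p\n\nfunc a(s string) error { return nil }\n\nfunc b() {}\n"
	after := []string{
		"package p\n\nfunc a(s string) error { return nil }\n",
		"package p\n\nfunc b(x int) {}\n",
	}
	problems, err := diffSignatures(before, after)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for _, p := range problems {
		fmt.Println(p)
	}
}
```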
- Implementation: pkg/workflow/bundler_*_validation.go
- Tests: pkg/workflow/bundler_*_validation_test.go
- Architecture: scratchpad/validation-architecture.md
- Guidelines: AGENTS.md (Validation Architecture section)
Last Updated: 2025-01-02
Reference Issue: #8635