📊 Lockfile Statistics Report - November 11, 2025 #3599

2025-11-11T03:34:51Z

github-actions[bot]
bot Nov 11, 2025

📊 Agentic Workflow Lock File Statistics - November 11, 2025

This analysis examines all .lock.yml files in the gh-aw repository to characterize agentic workflows: how large the files are, which triggers and permissions they use, and how structurally complex they are.

Executive Summary

Key Findings:

77 primary workflow lock files analyzed (150 total including duplicates in subdirectories)
Average file size: ~214 KB (219,102 bytes), ranging from ~23 KB to ~390 KB
Most workflows use multiple triggers - workflow_dispatch (manual) is most common (59 workflows)
Scheduled workflows are prominent - 37 workflows use cron schedules
Workflows are structurally complex - Average 57 steps and 662 jobs per workflow
Permissions are consistently scoped - Most workflows request read permissions for contents, issues, and PRs
Limited safe output adoption - Only 5 workflows use safe output features (add-comment or create-issue)

Full Report Details

File Size Analysis

Distribution Statistics

Metric	Value
Total Lock Files	77 primary files
Total Size	16.9 MB
Average Size	219,102 bytes (~214 KB)
Median Size	227,412 bytes (~222 KB)
Minimum	23,303 bytes (~23 KB)
Maximum	399,782 bytes (~390 KB)
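
The KB figures in this table follow the 1 KB = 1,024 bytes convention (219,102 bytes is about 214 KB). As a minimal sketch of how such a summary could be computed from a list of file sizes, the following uses only the Python standard library. The function names and the three sample values are illustrative and are not taken from the analysis scripts.

```python
from statistics import mean, median


def summarize_sizes(sizes_bytes):
    """Return summary statistics for a list of file sizes in bytes."""
    if not sizes_bytes:
        raise ValueError("no file sizes given")
    return {
        "count": len(sizes_bytes),
        "total": sum(sizes_bytes),
        "average": mean(sizes_bytes),
        "median": median(sizes_bytes),
        "min": min(sizes_bytes),
        "max": max(sizes_bytes),
    }


def to_kib(n_bytes):
    """Convert bytes to KB using the 1,024-byte convention of the table."""
    return n_bytes / 1024


# Illustrative input only: the minimum, median, and maximum reported above.
sample = [23_303, 227_412, 399_782]
stats = summarize_sizes(sample)
for key in ("min", "median", "max"):
    print(f"{key}: {stats[key]:,} bytes (~{round(to_kib(stats[key]))} KB)")
```

Run against the real list of 77 sizes, the same function would reproduce every row of the table.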

Size Distribution by Range

Size Range	Count	Percentage
< 10 KB	0	0%
10-50 KB	1	1.3%
50-100 KB	11	14.3%
> 100 KB	65	84.4%

Key Observation: The vast majority (84%) of workflow files are over 100 KB, indicating substantial complexity with extensive configuration and embedded instructions.
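
The bucketing behind this distribution can be sketched as follows. How the report assigns values that fall exactly on a boundary is not stated, so the half-open ranges used here are an assumption, as is the `size_bucket` name.

```python
from collections import Counter

KIB = 1024  # bytes per KB, matching the conversions in the size table


def size_bucket(n_bytes):
    """Map a file size in bytes to one of the report's size ranges.

    Assumption: ranges are half-open, so a file of exactly 100 KB
    lands in the top bucket even though the label reads "> 100 KB".
    """
    kib = n_bytes / KIB
    if kib < 10:
        return "< 10 KB"
    if kib < 50:
        return "10-50 KB"
    if kib < 100:
        return "50-100 KB"
    return "> 100 KB"


def bucket_percentages(sizes_bytes):
    """Return {bucket: percentage of files}, rounded to one decimal."""
    counts = Counter(size_bucket(size) for size in sizes_bytes)
    total = len(sizes_bytes)
    return {bucket: round(100 * n / total, 1) for bucket, n in counts.items()}


# Illustrative input only: the smallest and largest files named in the report.
print(size_bucket(23_303))   # opencode.lock.yml
print(size_bucket(399_782))  # poem-bot.lock.yml
```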

Extremes:

Smallest: opencode.lock.yml (23 KB) - located in shared directory
Largest: poem-bot.lock.yml (390 KB) - contains 101 steps and 1,177 jobs

Trigger Analysis

Workflows use various trigger mechanisms to respond to GitHub events. Many workflows combine multiple triggers for flexible activation.

Most Popular Triggers

Trigger Type	Count	Percentage	Description
`workflow_dispatch`	59	76.6%	Manual workflow execution
`schedule`	37	48.1%	Cron-based scheduled runs
`issue_comment`	11	14.3%	Triggered by comments on issues
`issues`	9	11.7%	Triggered by issue events
`pull_request`	8	10.4%	Triggered by PR events
`pull_request_review_comment`	4	5.2%	Triggered by PR review comments
`discussion_comment`	4	5.2%	Triggered by discussion comments
`discussion`	3	3.9%	Triggered by discussion events
`workflow_run`	2	2.6%	Triggered by other workflow completions
`push`	2	2.6%	Triggered by repository pushes
`workflow_call`	1	1.3%	Reusable workflow (called by others)

Total Triggers: 140 trigger configurations across 77 workflows (average 1.8 triggers per workflow)
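
The percentages in the table are shares of the 77 workflows, not of the 140 trigger configurations, which is why they add up to more than 100%: a workflow with two triggers is counted in two rows. A short check of the totals, using the counts from the table (the variable names are illustrative):

```python
# Trigger counts copied from the table above.
trigger_counts = {
    "workflow_dispatch": 59,
    "schedule": 37,
    "issue_comment": 11,
    "issues": 9,
    "pull_request": 8,
    "pull_request_review_comment": 4,
    "discussion_comment": 4,
    "discussion": 3,
    "workflow_run": 2,
    "push": 2,
    "workflow_call": 1,
}
NUM_WORKFLOWS = 77

total_triggers = sum(trigger_counts.values())
average_per_workflow = total_triggers / NUM_WORKFLOWS

# Share of workflows that use each trigger, as a percentage.
shares = {
    name: round(100 * count / NUM_WORKFLOWS, 1)
    for name, count in trigger_counts.items()
}

print(total_triggers, round(average_per_workflow, 1))
print(shares["workflow_dispatch"], shares["schedule"])
```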

Common Trigger Combinations

Based on the frequency data, common patterns include:

Manual + Scheduled - Most flexible pattern allowing both automated daily runs and on-demand execution
Event-driven + Manual - Workflows that respond to GitHub events but can also be triggered manually
Comment-based + Issue/PR events - Interactive workflows that respond to user input

Schedule Patterns

Most Common Cron Schedules:

Schedule	Count	Meaning
`0 9 * * *`	3	Daily at 9 AM UTC
`0 0,6,12,18 * * *`	3	Four times daily (midnight, 6 AM, noon, 6 PM UTC)
`0 6 * * 0`	2	Sundays at 6 AM UTC
`0 2 * * 1-5`	2	Weekdays at 2 AM UTC
`0 15 * * 1`	2	Mondays at 3 PM UTC
`0 0 * * *`	2	Daily at midnight UTC

Insight: Among the most common schedules (which cover 14 of the 37 scheduled workflows), runs fall mostly in the morning hours (UTC) or on specific days (Mondays, Sundays), suggesting batch processing patterns for reports, summaries, and maintenance tasks.
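
To see which UTC hours the schedules in the table actually hit, the hour field of each cron expression can be expanded and tallied. The sketch below is a deliberately small parser written for this illustration: it handles `*`, single values, comma lists, and ranges, which covers every expression in the table, but not step syntax (`*/n`) or day names.

```python
from collections import Counter


def expand_field(field, low, high):
    """Expand one cron field ('*', 'a', 'a-b', 'a,b,c') into sorted ints."""
    if field == "*":
        return list(range(low, high + 1))
    values = set()
    for part in field.split(","):
        if "-" in part:
            start, end = part.split("-")
            values.update(range(int(start), int(end) + 1))
        else:
            values.add(int(part))
    if any(v < low or v > high for v in values):
        raise ValueError(f"value out of range in cron field: {field!r}")
    return sorted(values)


def run_hours(cron):
    """Return the UTC hours at which a five-field cron expression fires."""
    _minute, hour, _day_of_month, _month, _day_of_week = cron.split()
    return expand_field(hour, 0, 23)


# Schedules and workflow counts copied from the table above.
schedules = {
    "0 9 * * *": 3,
    "0 0,6,12,18 * * *": 3,
    "0 6 * * 0": 2,
    "0 2 * * 1-5": 2,
    "0 15 * * 1": 2,
    "0 0 * * *": 2,
}

# Number of workflows that fire at each UTC hour (on days they run at all).
hour_counts = Counter()
for cron, workflows in schedules.items():
    for hour in run_hours(cron):
        hour_counts[hour] += workflows

for hour in sorted(hour_counts):
    print(f"{hour:02d}:00 UTC: {hour_counts[hour]} workflow(s)")
```

The tally ignores the day-of-week field, so it shows when runs happen on the days they are scheduled, not how often per week.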

Safe Outputs Analysis

Safe outputs are mechanisms for workflows to create persistent artifacts like issues, discussions, or comments.

Safe Output Types Distribution

Type	Count	Usage
`add-comment`	4	Adds comments to existing issues/PRs
`create-issue`	1	Creates new GitHub issues

Total Workflows Using Safe Outputs: 5 out of 77 (6.5%)

Key Finding: Safe output adoption is very limited. Only 6.5% of workflows use these features, suggesting either:

Most workflows are read-only/analytical
Results are delivered through other mechanisms
Safe outputs are a newer feature still being adopted

Example Workflows Using Safe Outputs:

Comment-based workflows that interact with issues/PRs
Automated issue creation for detected problems

Discussion Categories

Analysis Result: No workflows currently use create-discussion safe outputs, despite this being a common pattern in the repository's history.

Structural Characteristics

This section examines the internal complexity of workflow files.

Job Complexity

Metric	Value
Total Workflows Analyzed	150 (includes variants)
Average Jobs per Workflow	662.07
Average Steps per Workflow	57.48
Maximum Jobs in Single Workflow	1,177 (`poem-bot.lock.yml`)
Maximum Steps in Single Workflow	101 (`poem-bot.lock.yml`)

Note: The extremely high job counts likely reflect the internal structure of compiled workflow files where agents and conditional logic expand into multiple job definitions.
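
For comparison, a purely structural count is straightforward. In GitHub Actions YAML, jobs are the entries of the top-level `jobs` mapping and steps are the items of each job's `steps` list. The sketch below assumes the lock file has already been parsed into nested dicts (for example by a YAML parser); the function name and the example job names are hypothetical. Whether the figures in this report were produced structurally or by text-pattern matching is not stated, and the two approaches can give very different numbers.

```python
def count_jobs_and_steps(workflow):
    """Return (job_count, step_count) for a parsed workflow document.

    `workflow` is the nested dict obtained from parsing the YAML file.
    Jobs without a `steps` list (e.g. reusable-workflow calls) count
    as jobs with zero steps.
    """
    jobs = workflow.get("jobs") or {}
    steps = sum(len(job.get("steps") or []) for job in jobs.values())
    return len(jobs), steps


# Hypothetical, heavily simplified workflow for illustration.
example = {
    "on": {"workflow_dispatch": None},
    "jobs": {
        "activation": {
            "runs-on": "ubuntu-slim",
            "steps": [{"run": "echo check"}],
        },
        "agent": {
            "runs-on": "ubuntu-latest",
            "steps": [
                {"uses": "actions/checkout@v4"},
                {"run": "echo run agent"},
            ],
        },
    },
}

print(count_jobs_and_steps(example))
```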

Most Complex Workflows (by step count, top 6 shown)

Steps	Jobs	Workflow
101	1,177	`poem-bot.lock.yml`
84	895	`technical-doc-writer.lock.yml`
84	912	`unbloat-docs.lock.yml`
83	933	`q.lock.yml`
75	792	`mcp-inspector.lock.yml`
75	891	`tidy.lock.yml`

Insight: The most complex workflows center on writing, cleanup, and inspection tasks (for example technical-doc-writer, unbloat-docs, tidy, and mcp-inspector), suggesting these tasks require extensive conditional logic and multi-stage processing.

Typical Lock File Structure

Based on the averages and medians reported above, a typical .lock.yml file has:

Size: ~220 KB (median)
Jobs: ~662 (average, reflecting compiled expansion)
Steps: ~57 per workflow (average)
Triggers: 1-2 trigger types (usually manual + scheduled or event-based)
Timeout: 10-20 minutes
Runner: ubuntu-slim or ubuntu-latest

Permission Patterns

Permissions define what GitHub API operations workflows can perform.

Most Common Permissions

Permission	Count	Typical Access Level
`contents`	145	read
`pull-requests`	136	read
`issues`	130	read
`actions`	58	read
`discussions`	14	read
`security-events`	10	read/write
`repository-projects`	6	read
`attestations`	4	read
`checks`	4	read/write
`deployments`	4	read
`models`	4	read
`packages`	4	read
`pages`	4	read
`statuses`	4	read

Total Permission Grants: 527 across all workflows
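
A tally like the one above could be produced along these lines. In GitHub Actions, `permissions` can appear at the workflow level and on individual jobs, either as a mapping of scope to access level or as a shorthand string such as `read-all`. The sketch assumes the workflows are already parsed into dicts and counts every block it finds. Whether the report counts per workflow or per job is not stated, and the function name and sample data are illustrative.

```python
from collections import Counter


def tally_permissions(workflows):
    """Count (scope, level) pairs across workflow- and job-level blocks."""
    tally = Counter()
    for workflow in workflows:
        blocks = [workflow.get("permissions")]
        jobs = workflow.get("jobs") or {}
        blocks += [job.get("permissions") for job in jobs.values()]
        for block in blocks:
            if isinstance(block, dict):
                for scope, level in block.items():
                    tally[(scope, level)] += 1
            elif isinstance(block, str):
                # Shorthand form, e.g. "read-all" or "write-all".
                tally[("*", block)] += 1
    return tally


# Hypothetical parsed workflows for illustration.
example_workflows = [
    {
        "permissions": {"contents": "read"},
        "jobs": {
            "agent": {"permissions": {"contents": "read", "issues": "read"}},
        },
    },
    {"permissions": "read-all", "jobs": {"check": {}}},
]

tally = tally_permissions(example_workflows)
for (scope, level), count in sorted(tally.items()):
    print(f"{scope}: {level} x{count}")
```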

Permission Distribution

Read-focused workflows: The majority (~95%) of workflows primarily request read permissions
Common Read Permissions: contents, pull-requests, issues form the standard trio
Write Permissions: Rare and targeted (security-events, checks)

Security Observation: Workflows follow the principle of least privilege, requesting primarily read access to repository resources. This is excellent security hygiene for analytical and reporting workflows.

Runner & Infrastructure Patterns

Runner Types

Runner Type	Count	Usage
`ubuntu-slim`	270	Lightweight runner for most jobs
`ubuntu-latest`	146	Standard Ubuntu runner

Total Runner Allocations: 416 across all job definitions

Insight: The repository strongly prefers ubuntu-slim (65% of allocations), suggesting optimization for faster startup times and resource efficiency.

Timeout Configurations

Timeout (minutes)	Count	Use Case
5	68	Quick tasks (health checks, simple analysis)
10	177	Most common - Standard workflow execution
15	15	Extended analysis tasks
20	78	Long-running agent workflows
30	7	Complex multi-step processes
45	4	Heavy computation or large-scale analysis
60	1	Maximum timeout for intensive tasks

Average Timeout: ~12.4 minutes (weighted by the counts above)
Most Common: 10 minutes (177 of the 350 configurations listed above, ~51%)

Insight: Timeout values cluster around 10-20 minutes, appropriate for AI agent workflows that need time for LLM API calls and multi-step reasoning.
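
The summary figures for this table can be recomputed directly from its counts. The snippet below is a simple check using the values listed above; the variable names are illustrative.

```python
# Timeout (minutes) -> number of configurations, copied from the table above.
timeout_counts = {5: 68, 10: 177, 15: 15, 20: 78, 30: 7, 45: 4, 60: 1}

total_configs = sum(timeout_counts.values())

# Average timeout weighted by how often each value is configured.
weighted_average = (
    sum(minutes * count for minutes, count in timeout_counts.items())
    / total_configs
)

# Most frequently configured timeout and its share of all configurations.
most_common, most_common_count = max(
    timeout_counts.items(), key=lambda item: item[1]
)
most_common_share = 100 * most_common_count / total_configs

# Share of configurations in the 10-20 minute cluster.
cluster_share = 100 * sum(
    count for minutes, count in timeout_counts.items() if 10 <= minutes <= 20
) / total_configs

print(total_configs, round(weighted_average, 1))
print(most_common, round(most_common_share, 1), round(cluster_share, 1))
```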

MCP Server & Tool Patterns

MCP Server Usage

Analysis Result: No explicit MCP server declarations found in the lockfiles using standard patterns. This suggests:

MCP servers are configured at runtime or through other mechanisms
MCP integration may use dynamic discovery
Analysis pattern needs refinement to detect MCP usage

Tool Allowlists

Common tools available to workflows (based on gh-aw platform capabilities):

Bash commands - Available in most workflows for file operations
GitHub API tools - Standard across workflows for repository interactions
Web tools (fetch/search) - Used in research-oriented workflows
Safe output tools - Limited adoption (see Safe Outputs section)

Interesting Findings

High Trigger Flexibility: 76.6% of workflows support manual dispatch, enabling on-demand agent execution for debugging and testing.
Morning-Biased Scheduling: Scheduled workflows predominantly run during morning hours (UTC), suggesting alignment with working hours or daily reporting cycles.
Document-Focused Complexity: The most complex workflows (by step count) focus on documentation tasks - writing, cleanup, and inspection. This suggests documentation workflows require more conditional logic and multi-stage processing than other workflow types.
Structural Bloat: Lock files are surprisingly large (average 214 KB) compared to typical GitHub Actions workflows. This reflects the embedded agent instructions, extensive conditional logic, and compiled nature of .lock.yml files.
Low Safe Output Adoption: Only 6.5% of workflows use safe output features. This is unexpected given the repository's focus on agentic workflows and suggests an opportunity for increased adoption.
Job Count Explosion: Average 662 jobs per workflow seems anomalous - this likely reflects how the workflow compiler expands agent logic into multiple conditional job paths rather than actual parallel job execution.
Lightweight Infrastructure Preference: 65% of runner allocations use ubuntu-slim, indicating a focus on efficiency and fast startup times rather than requiring heavyweight build environments.
Security-Conscious Permission Model: Workflows overwhelmingly use read-only permissions, with write access granted sparingly and specifically. This demonstrates mature security practices.
No Discussion Output Usage: Despite discussions being a rich communication medium, no current workflows use create-discussion as a safe output, suggesting this pattern may have been deprecated or replaced.
Timeout Clustering: Sharp clustering around 10 and 20-minute timeouts suggests these are platform defaults or recommended values for typical agentic workflows.

Historical Trends

Current Analysis Date: November 11, 2025

Note: This is a snapshot analysis. Future runs will compare:

Lock file count growth over time
Average size trends
New trigger patterns
Safe output adoption rates
Permission pattern evolution

Previous Data Available: Cache memory contains analysis artifacts from October 28-29 and November 4, 2025, enabling trend analysis in future reports.

Recommendations

Based on this analysis, we recommend:

1. Investigate Safe Output Underutilization

Only 6.5% of workflows use safe outputs. Consider:

Documentation and examples for safe output patterns
Migration guides for workflows that manually create issues/comments
Identifying workflows that would benefit from safe outputs

2. Standardize Timeout Values

With clear clustering at 10 and 20 minutes, consider:

Documenting recommended timeout values for different workflow types
Creating timeout presets (quick: 10min, standard: 20min, extended: 45min)

3. Optimize Lock File Sizes

Average 214 KB per lock file suggests potential optimization:

Review instruction embedding strategies
Consider external instruction references to reduce file size
Analyze if all embedded content is necessary

4. Document Job Count Patterns

The average 662 jobs per workflow needs clarification:

Document how agent logic compiles to job definitions
Explain conditional job expansion in documentation
Verify this is expected behavior vs. potential optimization opportunity

5. Expand Trigger Diversity

Consider additional trigger types:

pull_request_target for pull requests from forks (it runs in the base repository's context, so untrusted PR code must be handled with care)
repository_dispatch for external integrations
webhook events for third-party service integration

6. Monitor Permission Creep

Current security posture is excellent:

Continue auditing permission requests
Resist granting broad write permissions
Document when write access is truly necessary

Methodology

Analysis Approach

Data Collection:

Bash scripts with YAML parsing for text pattern extraction
Python scripts for structural analysis and statistics
Manual inspection of sample files for validation

Lock Files Analyzed: 77 primary files in .github/workflows/ (150 total including subdirectories)

Cache Memory: Used /tmp/gh-aw/cache-memory/ for:

Script persistence (/scripts/)
Data extraction results (/data/)
Historical tracking (/history/)

Data Sources:

All .lock.yml files in .github/workflows/
Recursive search including shared/ subdirectory
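
The distinction between primary files and files found in subdirectories can be sketched with `pathlib`. This is an illustration of the recursive search described above, not the contents of analyze_lockfiles.sh; the function name and the temporary demo tree are assumptions.

```python
import tempfile
from pathlib import Path


def find_lock_files(workflows_dir):
    """Split *.lock.yml files into top-level ones and nested ones."""
    root = Path(workflows_dir)
    all_files = sorted(root.rglob("*.lock.yml"))
    primary = [path for path in all_files if path.parent == root]
    nested = [path for path in all_files if path.parent != root]
    return primary, nested


# Demo on a throwaway directory tree that mimics .github/workflows/.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "shared").mkdir()
    (root / "poem-bot.lock.yml").write_text("jobs: {}\n")
    (root / "shared" / "opencode.lock.yml").write_text("jobs: {}\n")
    (root / "notes.md").write_text("not a lock file\n")

    primary, nested = find_lock_files(root)
    primary_names = [path.name for path in primary]
    nested_names = [path.name for path in nested]

print(primary_names, nested_names)
```

Reporting the two groups separately makes it explicit which statistics are based on the 77 primary files and which on all 150.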

Analysis Scripts

Stored in /tmp/gh-aw/cache-memory/scripts/:

analyze_lockfiles.sh - Primary extraction script
extract_detailed_stats.sh - Detailed pattern extraction
analyze_yaml.py - Python-based YAML parsing

Data Files

Stored in /tmp/gh-aw/cache-memory/data/:

file_sizes.txt - File size measurements
triggers_detailed.txt - Trigger extraction by file
safe_outputs_detailed.txt - Safe output patterns
permissions_python.txt - Permission analysis
jobs_steps_python.txt - Complexity metrics

Limitations

MCP Server Detection: Current patterns did not successfully extract MCP server configurations; more refined parsing needed
Discussion Categories: No current usage detected; pattern may need updating
Duplicate Files: Analysis includes some duplicate entries from subdirectories (150 vs. 77 unique files)
Job Count Interpretation: High job counts need validation with workflow compiler documentation

Generated by Lockfile Statistics Analysis Agent
Analysis Date: November 11, 2025
Repository: githubnext/gh-aw
Cache Location: /tmp/gh-aw/cache-memory/
Analysis Scripts: Available in cache for future reuse

2025-11-28T23:03:34Z

github-actions[bot]
bot Nov 28, 2025

This discussion was automatically closed because it was created by an agentic workflow more than 1 week ago.
