[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1934

2026-04-12T12:54:13Z

github-actions[bot] · Apr 12, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and layered CI/CD pipeline combining traditional GitHub Actions workflows with agentic AI-driven checks. All agentic workflow files are compiled (.lock.yml) and the pipeline covers linting, building, type-checking, unit tests, integration tests, security scanning, and multi-ecosystem validation.

Total workflow files: 44 (17 .lock.yml agentic + 27 standard .yml)
Agentic workflows tracked by gh-aw: 27
Workflows with PR triggers: ~17 distinct checks run on pull requests

✅ Existing Quality Gates

Static Analysis & Build

| Check | Workflow | Trigger |
|---|---|---|
| ESLint (TypeScript) | `lint.yml` | push + PR |
| Markdownlint | `lint.yml` | push + PR |
| TypeScript Type Check (`tsc --noEmit`) | `test-integration.yml` | push + PR |
| Build Verification (Node 20 + 22 matrix) | `build.yml` | push + PR |
| Semantic PR Title enforcement | `pr-title.yml` | PR only |
| Documentation link checking | `link-check.yml` | PR (md changes) + weekly |
| Documentation preview build | `docs-preview.yml` | PR (doc changes) |

Testing

| Check | Workflow | Trigger |
|---|---|---|
| Unit tests + coverage (with PR diff comment) | `test-coverage.yml` | push + PR |
| Integration: domain, network, protocol, container, API proxy | `test-integration-suite.yml` | push + PR |
| Chroot: languages, package managers, /proc, edge cases | `test-chroot.yml` | push + PR |
| Example scripts validation | `test-examples.yml` | push + PR |
| Multi-ecosystem build tests (8 ecosystems, agentic) | `build-test.lock.yml` | PR |

Security

| Check | Workflow | Trigger |
|---|---|---|
| CodeQL (JS/TS + Actions) | `codeql.yml` | push + PR + weekly |
| npm audit (fails on high/critical, SARIF to Security tab) | `dependency-audit.yml` | push + PR + weekly |
| AI security review of PR diff | `security-guard.lock.yml` (Claude) | PR |
| Dependency security monitor | `dependency-security-monitor.lock.yml` | daily |
| Daily security review + threat modeling | `security-review.lock.yml` | daily |

Smoke / End-to-End Testing

| Check | Workflow | Trigger |
|---|---|---|
| Real Claude agent smoke test | `smoke-claude.lock.yml` | PR + every 12h |
| Real Codex agent smoke test | `smoke-codex.lock.yml` | PR + every 12h |
| Real Copilot agent smoke test | `smoke-copilot.lock.yml` | PR + every 12h |
| Chroot smoke test | `smoke-chroot.lock.yml` | PR (path-filtered) + manual rocket reaction |
| Services smoke test | `smoke-services.lock.yml` | PR + every 12h |

Automated Maintenance

Dependabot for npm (root + docs), Docker (agent + squid), GitHub Actions — weekly grouped PRs
Token usage analyzers and optimizers (Claude, Copilot) — daily
CLI flag consistency checker — weekly
Performance benchmarks with regression issue creation — daily

🔍 Identified Gaps

🔴 High Priority

1. Dangerously Low Unit Test Coverage Thresholds

Current coverage: statements 38.39%, branches 31.78%, functions 37.03%
Two critical files are largely untested: cli.ts at 0% and docker-manager.ts at 18%
Coverage thresholds in jest.config.js sit just below the current baseline (statements 38%, branches 30%, functions 35%, lines 38%), so they only block a PR that drags coverage below today's already-low level; they never require new or changed code to be tested
The test-coverage-improver workflow targets weekly improvements, but there's no enforcement that PRs must maintain a meaningful threshold

2. No Container Image Vulnerability Scanning on PRs

Docker images (containers/squid/, containers/agent/, containers/api-proxy/, containers/cli-proxy/) are built during integration tests but never scanned for OS-level CVEs
dependency-audit.yml only audits npm packages, not the Ubuntu base image or installed apt packages
No Trivy, Grype, or equivalent container scanner in any workflow
High impact: these containers are distributed via GHCR and run user workloads

3. Performance Benchmarks Are Not PR-Gated

performance-monitor.yml runs on schedule (daily at 06:00 UTC) and manual dispatch only — not on PRs
Performance regressions in container startup time, proxy latency, etc. can be merged without detection
Regressions are only caught the day after merging, potentially after multiple PRs have landed

4. No Required Status Checks Configuration in Repository

No required-status-check configuration (branch protection rule or repository ruleset) is documented or committed to the repository
It's unclear which of the many workflows are required to pass before merge
With 17+ PR checks, some may be informational-only, leading to inconsistent merge standards

🟡 Medium Priority

5. Integration Tests Have Significant Duplication Risk

test-integration-suite.yml has 5 parallel jobs, each manually duplicating the same ~30-step pattern (checkout → setup node → npm ci → build → docker build × 2 containers → pre-cleanup → test → post-cleanup → collect logs)
Divergence between jobs is already visible: test-domain uses JEST_TIMEOUT: 180000 while test-chroot-package-managers uses 300000, with no shared variable
Risk: maintenance updates applied to one job may miss others

6. No Formatting Check (Prettier)

ESLint runs but no code formatter is enforced
TypeScript/JS formatting consistency relies entirely on developer discipline
Mixed formatting can accumulate as a long-term maintenance burden

7. link-check.yml Skips Non-Markdown Changes

Link checking only triggers on **/*.md file changes
URL references in TypeScript source code (e.g., comments with documentation links) are never validated
Broken links in code comments or JSDoc can persist indefinitely
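One way to close this gap is to pull URLs out of source comments and hand them to the existing link checker. The sketch below is a line-based heuristic, not a TypeScript parser: it only scans lines that start with `//`, `/*`, or `*`, so it skips URLs in trailing comments after code. `commentUrls` is a hypothetical helper and is not part of `link-check.yml`.

```typescript
// Hypothetical helper: collect URLs from comment lines in TypeScript source.
// Heuristic only: a line counts as a comment if, after trimming, it starts
// with "//", "/*", or "*" (JSDoc continuation lines).
const URL_PATTERN = /https?:\/\/[^\s)>"'`]+/g;

function commentUrls(source: string): string[] {
  const urls = new Set<string>();
  for (const line of source.split(/\r?\n/)) {
    const text = line.trim();
    const isComment =
      text.startsWith("//") || text.startsWith("/*") || text.startsWith("*");
    if (!isComment) continue;
    for (const match of text.match(URL_PATTERN) ?? []) {
      // Drop sentence punctuation the pattern swallowed, e.g. a final "."
      urls.add(match.replace(/[.,;:]+$/, ""));
    }
  }
  return Array.from(urls);
}

const sample = [
  "/**",
  " * Squid ACL reference: https://www.squid-cache.org/Doc/config/acl/.",
  " */",
  "// See https://docs.github.com/en/actions for workflow syntax",
  'const endpoint = "https://example.com/api"; // code line, not collected',
].join("\n");

console.log(commentUrls(sample));
```

The collected URLs could then be passed to the same checker `link-check.yml` already runs. Adding `src/**/*.ts` to that workflow's path filter alongside `**/*.md` would make sure the check actually fires on source changes.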

8. Smoke Tests for Chroot Require Manual Reaction

smoke-chroot.lock.yml triggers on `reaction: rocket` in addition to `pull_request`
The pull_request trigger is path-filtered to src/** and containers/**, so it fires automatically only for source or container changes; any other PR gets a chroot smoke test only if someone adds the rocket reaction by hand
Documentation-only or workflow-only changes that affect chroot behavior therefore go unvalidated unless someone remembers to trigger the run

9. No License Compliance Scanning

Numerous dependencies and container base images are used (Ubuntu, Squid, Node.js) without automated license validation
No FOSSA, LicenseCheck, or equivalent tool verifies that dependency licenses are compatible
Relevant for a project distributed on GHCR and via npm as @github/awf

10. No SBOM Generation in Release or PR Pipeline

No Software Bill of Materials (SBOM) is generated for releases
GitHub's Dependency Graph can export an SPDX SBOM on demand, but no workflow generates one or attaches it to a release

🟢 Low Priority

11. No Artifact/Bundle Size Monitoring on PRs

dist/ directory size is verified to exist (build.yml) but no size budget is enforced
Growing bundle size could affect installation time and user experience
A simple file size check or bundlesize integration would provide a lightweight gate

12. No Changelog Enforcement

PR titles are semantically validated, but there's no check that CHANGELOG.md or release notes documentation is updated for user-facing changes
The update-release-notes workflow runs on release publication, providing post-hoc automation
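A lightweight gate could reuse the semantic PR title to decide when a changelog entry is expected. The sketch assumes Conventional Commits titles (which the semantic title check suggests), treats `feat`, `fix`, and any `!` breaking-change marker as user-facing, and assumes the notes live in `CHANGELOG.md`. `checkChangelog` and its parameters are hypothetical.

```typescript
// Hypothetical changelog gate driven by a Conventional Commits PR title.
// Captures: 1 = type, 2 = optional "(scope)", 3 = optional "!" breaking marker.
const TITLE_PATTERN = /^(\w+)(\([^)]*\))?(!)?:\s/;
const USER_FACING_TYPES = new Set(["feat", "fix"]);

function changelogRequired(title: string): boolean {
  const m = TITLE_PATTERN.exec(title);
  if (!m) return false; // malformed titles are the PR title check's job
  return USER_FACING_TYPES.has(m[1]) || m[3] === "!";
}

// Returns an error message, or null when the PR passes.
function checkChangelog(
  title: string,
  changedFiles: string[],
  notesPaths: string[] = ["CHANGELOG.md"],
): string | null {
  if (!changelogRequired(title)) return null;
  const touched = changedFiles.some((f) => notesPaths.includes(f));
  return touched
    ? null
    : `"${title}" looks user-facing but none of ${notesPaths.join(", ")} changed`;
}

console.log(checkChangelog("feat(cli): add new option", ["src/cli.ts"]));
console.log(checkChangelog("chore: bump dev dependencies", ["package-lock.json"]));
```

In a workflow, the title and changed-file list could come from the `pull_request` event payload and the PR files API. An opt-out label for internal-only `feat` changes may also be worth supporting so the gate does not push people toward mislabeling PRs.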

13. Documentation Build Does Not Deploy Live Preview

docs-preview.yml builds the docs and uploads an artifact but does not deploy a live per-PR preview (e.g., to a GitHub Pages subpath or Cloudflare Pages)
Reviewers must download and serve the artifact locally to verify documentation changes

14. No Mutation Testing

The coverage metrics (38%) don't validate test quality, only that lines were executed
Mutation testing (e.g., Stryker) would validate that tests actually catch bugs, not just execute code paths
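The difference between executing code and checking it is easiest to see on a toy example. The sketch below hand-writes a single mutant (flipping `||` to `&&`), the kind of change a tool like Stryker generates automatically; it illustrates the idea only and does not use Stryker's API. `matchesDomain` is a hypothetical stand-in for a domain-allowlist check.

```typescript
// Toy mutation-testing illustration. `matchesDomain` is hypothetical.
type DomainMatcher = (domain: string, allowlist: string[]) => boolean;

// Original: exact match, or a subdomain of an allowlisted domain.
const matchesDomain: DomainMatcher = (domain, allowlist) =>
  allowlist.some((d) => domain === d || domain.endsWith("." + d));

// Mutant: the logical operator is flipped (|| -> &&), as a mutation tool might do.
const mutant: DomainMatcher = (domain, allowlist) =>
  allowlist.some((d) => domain === d && domain.endsWith("." + d));

// Weak test: executes every line of matchesDomain but never checks the result.
function weakTest(impl: DomainMatcher): boolean {
  impl("api.github.com", ["github.com"]);
  return true;
}

// Strong test: asserts outcomes, including a look-alike domain.
function strongTest(impl: DomainMatcher): boolean {
  return (
    impl("github.com", ["github.com"]) &&
    impl("api.github.com", ["github.com"]) &&
    !impl("evilgithub.com", ["github.com"])
  );
}

// A test "kills" the mutant if it passes on the original but fails on the mutant.
const kills = (test: (impl: DomainMatcher) => boolean): boolean =>
  test(matchesDomain) && !test(mutant);

console.log("weak test kills mutant:", kills(weakTest)); // false
console.log("strong test kills mutant:", kills(strongTest)); // true
```

Both tests produce the same line coverage, but only the strong one detects the injected bug. Stryker automates this loop across many mutation operators and reports the share of mutants killed as a mutation score, which could be tracked alongside the coverage thresholds.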

15. Performance Benchmark History in Git Branch

Benchmark history is stored in an orphan benchmark-data branch in the same repository
This mixes data storage with code history, and the branch will grow without bound as daily results accumulate

📋 Actionable Recommendations

1. Raise Coverage Thresholds Incrementally

Issue: Thresholds set at the current baseline (38% statements) provide no quality gate
Solution: Establish a coverage improvement roadmap that raises thresholds by 5 percentage points per sprint milestone:

Short term: 50% statements, 45% branches (achievable by covering docker-manager.ts partial paths)
Medium term: 70% overall by focusing on cli.ts and docker-manager.ts
Complexity: Low (change 4 numbers in jest.config.js, add tests incrementally)
Impact: High — prevents coverage regression on every PR
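The roadmap can be planned mechanically. The sketch below assumes today's four jest.config.js thresholds map to statements/branches/functions/lines as 38/30/35/38, raises each metric by 5 percentage points per milestone, and caps at the 70% medium-term goal. `planCoverageRamp` is a hypothetical helper; its output would be copied into Jest's `coverageThreshold.global` by hand at each milestone.

```typescript
// Hypothetical helper for planning a coverage-threshold ramp in jest.config.js.
type Metric = "statements" | "branches" | "functions" | "lines";
type Thresholds = Record<Metric, number>;

const METRICS: Metric[] = ["statements", "branches", "functions", "lines"];

// Each milestone raises every metric below `target` by `step` percentage
// points (capped at `target`) until all four metrics have reached it.
function planCoverageRamp(baseline: Thresholds, step: number, target: number): Thresholds[] {
  if (!(step > 0)) throw new Error("step must be positive");
  const milestones: Thresholds[] = [];
  let current: Thresholds = { ...baseline };
  while (METRICS.some((m) => current[m] < target)) {
    const next: Thresholds = { ...current };
    for (const m of METRICS) {
      // Never lower a metric that is already at or above the target.
      next[m] = current[m] < target ? Math.min(target, current[m] + step) : current[m];
    }
    milestones.push(next);
    current = next;
  }
  return milestones;
}

// ASSUMED mapping of today's thresholds (38/30/35/38) to the four metrics.
const baseline: Thresholds = { statements: 38, branches: 30, functions: 35, lines: 38 };
planCoverageRamp(baseline, 5, 70).forEach((t, i) =>
  console.log(`milestone ${i + 1}: ${JSON.stringify(t)}`),
);
```

With these inputs the ramp has eight milestones. Branches start lowest and therefore set the pace, and the third milestone (53% statements, 45% branches) already meets the short-term 50%/45% goal.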

2. Add Container Image Scanning

Issue: No CVE scanning for Docker images
Solution: Add Trivy (or an equivalent scanner such as Grype) right after the docker build steps, in build.yml or in the integration-test jobs that already build the images:

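```yaml
# The job needs `permissions: security-events: write` to upload SARIF.
- name: Scan agent container image
  uses: aquasecurity/trivy-action@<sha>
  with:
    # :latest is the image already published to GHCR; to catch CVEs before
    # publishing, point this at the tag built earlier in the same job.
    image-ref: ghcr.io/github/gh-aw-firewall/agent:latest
    format: sarif
    output: trivy-agent.sarif
    severity: HIGH,CRITICAL
- name: Upload Trivy results
  if: always()
  uses: github/codeql-action/upload-sarif@<sha>
  with:
    sarif_file: trivy-agent.sarif
    category: trivy-agent
```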

Complexity: Low (well-supported GitHub Action)
Impact: High — surfaces container CVEs before they're pushed to GHCR

3. Add Performance Check to PR Pipeline

Issue: Performance regressions not detected until day after merge
Solution: Add a lightweight performance smoke test to build.yml that measures container startup time (1 iteration, not 30) and reports a warning by default, failing only on a severe regression:

Warn on a >1.5x slowdown relative to the baseline (target threshold); fail only on a >3x slowdown (critical threshold)
Complexity: Medium (requires sudo/Docker, similar to integration tests)
Impact: High — catches startup time regressions immediately
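The two-tier rule reduces to a ratio check against a baseline. The sketch assumes the baseline is a startup time taken from the existing daily benchmark history and that one PR measurement is compared against it; `classifyStartup` and `annotation` are hypothetical names. The comparisons are strict, so exactly 1.5x still passes and exactly 3x only warns.

```typescript
// Hypothetical PR-time startup gate: warn above 1.5x the baseline, fail above 3x.
type Verdict = "ok" | "warn" | "fail";

interface GateResult {
  verdict: Verdict;
  ratio: number; // measured / baseline
}

function classifyStartup(
  measuredMs: number,
  baselineMs: number,
  warnRatio = 1.5,
  failRatio = 3,
): GateResult {
  if (!(baselineMs > 0)) throw new Error("baseline must be a positive duration");
  const ratio = measuredMs / baselineMs;
  if (ratio > failRatio) return { verdict: "fail", ratio };
  if (ratio > warnRatio) return { verdict: "warn", ratio };
  return { verdict: "ok", ratio };
}

// Format a GitHub Actions workflow command (::error:: or ::warning::) for the log.
// The calling step still has to exit non-zero on "fail" to actually block the PR.
function annotation(result: GateResult): string {
  const msg = `Container startup took ${result.ratio.toFixed(2)}x the baseline`;
  switch (result.verdict) {
    case "fail":
      return `::error::${msg}`;
    case "warn":
      return `::warning::${msg}`;
    default:
      return msg;
  }
}

console.log(annotation(classifyStartup(4200, 2000)));
// ::warning::Container startup took 2.10x the baseline
```

Because a single iteration is noisy, the wide gap between the warning and failure bands is what keeps spurious failures rare; the scheduled benchmark remains the authoritative measurement.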

4. Define Required Status Checks

Issue: No clear merge requirements documented
Solution: Configure branch protection rules or a repository ruleset (in the repository settings or via the REST API) requiring:

Build Verification (build.yml)
Test Coverage Report (test-coverage.yml)
TypeScript Type Check (test-integration.yml)
ESLint + Markdown Lint (lint.yml)
Dependency Vulnerability Audit (dependency-audit.yml)
PR Title Check (pr-title.yml)
Complexity: Low (GitHub UI or API configuration)
Impact: High — ensures all quality gates are enforced
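If the rules are managed through the REST API rather than the settings UI, the request body for the `PUT /repos/{owner}/{repo}/branches/{branch}/protection` endpoint might be built as in the sketch below. The check names are assumptions: required checks match the check-run (job) names shown on a PR, which can differ from the labels above, and matrix jobs such as the Node 20/22 build report one check per leg. This endpoint replaces the whole protection configuration, so fields sent as `null` switch those protections off.

```typescript
// Sketch of a request body for GitHub's "Update branch protection" endpoint.
// Check names are ASSUMED and must match the check-run names shown on a PR.
interface BranchProtectionBody {
  required_status_checks: { strict: boolean; checks: { context: string }[] };
  enforce_admins: boolean;
  required_pull_request_reviews: null; // null disables required reviews on PUT
  restrictions: null; // null means no push restrictions
}

function buildProtectionBody(checkNames: string[]): BranchProtectionBody {
  const unique = Array.from(new Set(checkNames)); // drop accidental duplicates
  return {
    required_status_checks: {
      strict: true, // PR branch must be up to date with the base branch
      checks: unique.map((context) => ({ context })),
    },
    enforce_admins: true,
    required_pull_request_reviews: null,
    restrictions: null,
  };
}

const body = buildProtectionBody([
  "Build Verification",
  "Test Coverage Report",
  "TypeScript Type Check",
  "ESLint",
  "Markdown Lint",
  "Dependency Vulnerability Audit",
  "PR Title Check",
]);
console.log(JSON.stringify(body, null, 2));
```

Saved as `body.json`, it could be applied with `gh api --method PUT repos/OWNER/REPO/branches/main/protection --input body.json`, where OWNER, REPO, and the branch name are placeholders. In practice, copy the repository's existing review settings into `required_pull_request_reviews` rather than sending `null`.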

5. Extract Integration Test Job Template

Issue: 5 parallel integration test jobs duplicate ~30-step setup pattern
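Solution: Extract the shared steps into a reusable workflow (.github/workflows/integration-test-runner.yml) triggered by workflow_call, with inputs for the Jest test path pattern and JEST_TIMEOUT (named pattern and timeout below). Each job then calls it with its own values:

```yaml
jobs:
  test-domain:
    uses: ./.github/workflows/integration-test-runner.yml
    with:
      pattern: "blocked-domains|dns-servers|empty-domains|wildcard-patterns"
      timeout: 180000
```

The called workflow declares these under on.workflow_call.inputs, with pattern as a string and timeout as a number.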

Complexity: Medium (workflow refactoring)
Impact: Medium — reduces maintenance burden, prevents config drift

6. Add Prettier Formatting Check

Issue: No code formatting enforcement
Solution: Add prettier --check to lint.yml, add a .prettierrc aligned with the existing code style, and install Prettier as a dev dependency so CI runs a pinned version:

```yaml
- name: Check formatting
  run: npx prettier --check "src/**/*.ts"
```

Complexity: Low (but may require one-time formatting PR)
Impact: Medium — prevents formatting debates in code review

7. Add License Compliance Check

Issue: No license validation for dependencies
Solution: Add license-checker or licensee to dependency-audit.yml:

```yaml
- name: Check dependency licenses
  # --production skips devDependencies, which npm consumers never install
  run: npx license-checker --production --onlyAllow 'MIT;Apache-2.0;BSD-2-Clause;BSD-3-Clause;ISC;CC0-1.0'
```

Complexity: Low
Impact: Medium — ensures license compatibility as dependencies change
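For finer control than a single `--onlyAllow` string, for example to treat SPDX `OR` and `AND` expressions explicitly, the JSON report can be checked in code. The sketch assumes the shape of `license-checker --json` output (keys are `name@version`, `licenses` is a string or an array) and handles only flat `OR`/`AND` expressions, not the nested SPDX grammar. `findViolations` is a hypothetical helper.

```typescript
// Hypothetical license allowlist check over `license-checker --json` output.
// ASSUMED input shape: { "name@version": { licenses?: string | string[] } }.
type LicenseReport = Record<string, { licenses?: string | string[] }>;

const ALLOWED = new Set([
  "MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "CC0-1.0",
]);

// Flat SPDX-style expressions only: "A OR B" passes if any side passes,
// "A AND B" passes only if every part is allowed. Parentheses are ignored.
function isAllowed(expression: string): boolean {
  const expr = expression.replace(/[()]/g, "").trim();
  return expr
    .split(/\s+OR\s+/)
    .some((alt) => alt.split(/\s+AND\s+/).every((id) => ALLOWED.has(id.trim())));
}

// Returns "name@version" keys whose licenses are not allowed. Guessed licenses
// (license-checker appends "*", e.g. "MIT*") and missing licenses are flagged,
// and a package listing several licenses must pass for each of them.
function findViolations(report: LicenseReport): string[] {
  return Object.entries(report)
    .filter(([, info]) => {
      const raw = info.licenses ?? "UNKNOWN";
      const list = Array.isArray(raw) ? raw : [raw];
      return !list.every(isAllowed);
    })
    .map(([pkg]) => pkg);
}

const sample: LicenseReport = {
  "pkg-a@1.0.0": { licenses: "MIT" },
  "pkg-b@2.0.0": { licenses: "(MIT OR GPL-3.0)" },
  "pkg-c@0.1.0": { licenses: "GPL-3.0" },
  "pkg-d@1.2.3": { licenses: "MIT*" },
  "pkg-e@3.0.0": {},
};
console.log(findViolations(sample).join(", ")); // pkg-c@0.1.0, pkg-d@1.2.3, pkg-e@3.0.0
```

The input could come from `npx license-checker --production --json > licenses.json`. Container base images (Ubuntu, Squid) are outside what an npm-level check can see and would need a separate tool.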

8. Generate SBOM on Release

Issue: No SBOM attached to releases
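Solution: Add SBOM generation to release.yml with anchore/sbom-action (GitHub's Dependency Graph can also export an SPDX SBOM through its REST API):

```yaml
- name: Generate SBOM
  uses: anchore/sbom-action@<sha>
  with:
    format: spdx-json
    artifact-name: sbom.spdx.json
    # Attach the SBOM to the release when triggered by a release event
    upload-release-assets: true
```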

Complexity: Low
Impact: Medium — improves supply chain transparency

9. Add Bundle Size Check

Issue: No dist size monitoring
Solution: Add a simple check in build.yml after npm run build:

```bash
# du -b is GNU-specific, which is fine on ubuntu-* runners
DIST_SIZE=$(du -sb dist/ | awk '{print $1}')
echo "Bundle size: ${DIST_SIZE} bytes"
if [ "$DIST_SIZE" -gt 5242880 ]; then  # 5 MiB threshold
  echo "::warning::Bundle size exceeds 5 MiB threshold: ${DIST_SIZE} bytes"
fi
```

Complexity: Low
Impact: Low — provides visibility into bundle growth

10. Consider Smoke Chroot Path Filter

Issue: smoke-chroot.lock.yml path filter may miss non-source changes that affect chroot
Solution: Evaluate whether the path filter src/**, containers/** is sufficient or if it should be broadened to include tests/** or removed in favor of always running on all PRs
Complexity: Low (config change)
Impact: Low-Medium — reduces risk of chroot regression on non-source PRs

📈 Metrics Summary

| Metric | Value |
|---|---|
| Total workflow files | 44 |
| PR-triggered workflows (standard) | ~12 |
| PR-triggered workflows (agentic) | ~5 |
| Scheduled workflows | ~8 |
| Unit test coverage (statements) | 38.39% |
| Unit test coverage (branches) | 31.78% |
| Unit test coverage (functions) | 37.03% |
| Coverage threshold (statements) | 38% |
| Critical files with <20% coverage | 2 (`cli.ts` 0%, `docker-manager.ts` 18%) |
| Integration test jobs (parallelized) | 9 |
| Chroot test jobs (parallelized) | 4 |
| Build ecosystems tested (agentic) | 8 (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) |
| Dependabot targets | 5 across 3 ecosystems (npm: root + docs; Docker: agent + squid; GitHub Actions) |
| Performance benchmarks | Run daily on a schedule, not on PRs |

Key Strengths

✅ Strong agentic security review layer (Claude-powered security-guard on every PR)
✅ Broad multi-ecosystem smoke testing with real AI agents
✅ Comprehensive integration tests covering domain filtering, network security, chroot, and protocol handling
✅ Coverage comparison comments on PRs showing regressions
✅ Daily performance benchmarking with automatic regression issue creation
✅ Dependabot covering npm, Docker (agent and squid images), and GitHub Actions

Key Weaknesses

❌ Unit test coverage too low with insufficient quality gate thresholds
❌ No container image CVE scanning
❌ Performance benchmarks not run on PRs — regressions detected too late
❌ Required status checks not explicitly configured/documented

Generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 19, 2026, 12:54 PM UTC
