[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1819

2026-04-08T23:11:56Z

github-actions[bot]
bot Apr 8, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, multi-layered CI/CD pipeline that spans build verification, testing, security scanning, and AI-powered agentic workflows. All compiled workflows are in a healthy state.

Pipeline Summary

| Category | Workflows | Trigger |
| --- | --- | --- |
| Build & Lint | `build.yml`, `lint.yml` | PR + push to main |
| Type Safety | `test-integration.yml` (TypeScript type check) | PR + push to main |
| Unit Tests + Coverage | `test-coverage.yml` | PR + push to main |
| Integration Tests | `test-integration-suite.yml` (5 jobs), `test-chroot.yml` (4 jobs) | PR + push to main |
| Examples | `test-examples.yml` | PR + push to main |
| Security | `codeql.yml`, `dependency-audit.yml`, `security-guard.md` (AI) | PR + push + scheduled |
| Documentation | `link-check.yml`, `deploy-docs.yml`, `docs-preview.yml` | PR (MD files) + push |
| Performance | `performance-monitor.yml` | Weekly + dispatch |
| PR Hygiene | `pr-title.yml` | PR opened/edited |
| Agentic Smoke Tests | `smoke-claude.md`, `smoke-copilot.md`, `smoke-codex.md`, `smoke-chroot.md`, `smoke-services.md` | PR + scheduled 12h |
| Build-Test Workflows | `build-test.md` | PR + dispatch |

Recent run health: Build Verification, Integration Tests, and Test Coverage show success conclusions across the completed runs among the last 10 observed (April 8, 2026); one Integration Tests run was still in progress when observed.

✅ Existing Quality Gates

On Every PR

Build Verification (Node 20 + 22 matrix): ESLint, TypeScript compilation, dist output validation, api-proxy and cli-proxy unit tests
Lint (separate job): ESLint + Markdownlint
TypeScript Type Check: tsc --noEmit strict type checking
Test Coverage: Jest unit tests with coverage comparison against base branch; PR comment with delta; fails on regression
Integration Tests (5 parallel jobs): domain filtering, network security, protocol security, credential hiding, container/ops, API proxy — end-to-end Docker-based
Chroot Integration Tests (4 parallel jobs): language runtimes (Python/Node/Go/Java/.NET), package managers (pip/npm/cargo/mvn/gem), /proc filesystem, edge cases
Examples Test: 4 example shell scripts exercised against locally-built containers
CodeQL: javascript-typescript + actions language analysis with security-extended,security-and-quality queries
Dependency Vulnerability Audit: npm audit --audit-level=high for main + docs-site packages with SARIF upload to Security tab
Security Guard (AI agent - Claude): reviews PR diff for security boundary weakening (iptables, Squid config, container hardening, domain patterns)
PR Title Check: Enforces conventional commit format with type allowlist and lowercase subject pattern
Link Check (MD changes only): Lychee link validator
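
The PR Title Check in the list above can be pictured with a small validator. This is only a sketch: the type allowlist, the optional scope syntax, and the function name are assumptions, since the assessment does not reproduce the actual configuration of pr-title.yml.

```typescript
// Assumed allowlist; the real list in pr-title.yml is not shown in this assessment.
const ALLOWED_TYPES = ["feat", "fix", "docs", "chore", "refactor", "test", "ci"];

// Hypothetical validator: "type(scope)?: subject", where the subject starts lowercase.
function isValidPrTitle(title: string): boolean {
  const pattern = new RegExp(
    "^(" + ALLOWED_TYPES.join("|") + ")(\\([a-z0-9-]+\\))?!?: [a-z].*$"
  );
  return pattern.test(title);
}

const acceptedTitle = isValidPrTitle("fix(squid): handle empty allowlist");
const rejectedCapitalized = isValidPrTitle("Fix: Handle empty allowlist");
const rejectedUnknownType = isValidPrTitle("feature: add new flag");
```

Under these assumptions the first title passes, while the capitalized title and the title with a type outside the allowlist are rejected.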

Scheduled / Async Quality

Performance Monitor (weekly): startup latency benchmarks with regression issue creation
Daily Security Review + Dependency Security Monitor: ongoing vulnerability monitoring
Secret Diggers (Claude/Copilot/Codex, hourly): AI-powered secret scanning
Build-Test Workflow: Builds real projects (Go, Rust, Java, Node, etc.) through the AWF proxy
Smoke Tests (every 12h + PR): End-to-end agentic runs of Claude, Copilot, Codex through the firewall

🔍 Identified Gaps

🔴 High Priority

1. Low Unit Test Coverage with Low Thresholds

The critical business logic files have very low unit test coverage:

docker-manager.ts: 18% statements, 4% functions (250 statements, 25 functions mostly uncovered)
cli.ts: 0% coverage (entry point entirely untested at unit level)
Coverage thresholds are set very low: statements 38%, branches 30%, functions 35%

These are the most complex files orchestrating the entire container lifecycle, yet they're almost entirely untested at the unit level. Integration tests provide some coverage but don't count toward these metrics.

Risk: A regression in docker-manager.ts (e.g., wrong Docker Compose config generation, incorrect network topology) can pass all unit tests undetected.

2. No Container Security Scanning (Image Vulnerabilities)

There is no workflow that scans the built Docker images (squid, agent, api-proxy) for OS-level CVEs or package vulnerabilities (e.g., via Trivy, Grype, or Docker Scout). The dependency-audit.yml only covers npm packages.

Risk: Base images (ubuntu/squid:latest, ubuntu:22.04, Node.js) may contain critical OS CVEs that go undetected before the images built on them are pushed to GHCR.

3. Performance Benchmarks Not Run on PRs

The performance-monitor.yml is weekly-only and never runs on pull requests. Container startup latency regressions introduced by a PR won't be caught until the weekly run.

Risk: A change to Docker Compose generation, healthcheck configuration, or network setup could silently degrade startup performance by 2-3x before being detected.

4. Integration Test Suite Missing CI Workflow Coverage

Per docs/INTEGRATION-TESTS.md, the following integration test categories appear to be only partially exercised by CI workflows on PRs:

Domain/Network tests (6 files, ~50 tests): run only by test-chroot.yml, and only in some cases
Protocol/Security tests (8 files, ~100 tests): partially covered by test-integration-suite.yml
Container/Ops tests (7 files, ~45 tests)

Mapping from the docs shows some tests may fall through gaps between the --testPathPatterns filters in test-integration-suite.yml.
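
One way to confirm whether tests fall through the filters is to list every test file that no job's pattern selects. The sketch below assumes each --testPathPatterns value behaves as a case-insensitive regular expression tested against the file path; the helper name, job names, patterns, and file paths are all hypothetical.

```typescript
// Hypothetical helper: return test files that none of the CI jobs' patterns match.
function findUnmatchedTests(
  testFiles: string[],
  patternsByJob: Record<string, string[]>
): string[] {
  const regexes: RegExp[] = [];
  for (const job of Object.keys(patternsByJob)) {
    for (const pattern of patternsByJob[job]) {
      // Assumption: patterns are regexes matched case-insensitively against paths.
      regexes.push(new RegExp(pattern, "i"));
    }
  }
  return testFiles.filter((file) => !regexes.some((re) => re.test(file)));
}

// Illustrative inputs only; the real file names and filters live in the repository.
const exampleTestFiles = [
  "tests/integration/domain-filtering.test.ts",
  "tests/integration/network-security.test.ts",
  "tests/integration/container-ops.test.ts",
];
const examplePatternsByJob: Record<string, string[]> = {
  "domain-network": ["domain-", "network-"],
  "protocol-security": ["protocol-"],
};

const unmatchedTests = findUnmatchedTests(exampleTestFiles, examplePatternsByJob);
```

With these inputs, only tests/integration/container-ops.test.ts is reported, because no job pattern selects it. Running the same comparison against the real file list and workflow filters would turn the suspected gap into a concrete list.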

🟡 Medium Priority

5. No Required Status Checks Policy Documented

The workflows exist but there's no documented or enforced branch protection configuration requiring specific checks to pass before merge. If branch protection is not configured with required checks, a failing build.yml could still allow a merge.

6. Coverage Thresholds Are Below Meaningful Levels

Current thresholds (38%/30%/35%/38%) are set to match the current baseline rather than aspirational targets. The test-coverage-improver agentic workflow runs weekly to improve this, but there is no roadmap or enforced timeline for reaching a minimum acceptable coverage (typically 60-80% for production code).

7. No Mutation Testing

The test suite doesn't include mutation testing (e.g., Stryker Mutator for TypeScript). This means tests may achieve coverage percentages while not actually asserting on important logic variations.

8. Smoke Tests Are Not Required for PR Merge (Reaction-Triggered)

smoke-claude.md, smoke-copilot.md, and smoke-codex.md run on PR events, but are also reaction-gated (reaction: heart/eyes/hooray). If reactions are the primary trigger, these expensive end-to-end tests may not always run on every PR, leaving a gap for regressions in agent-specific behavior.

9. No Docker Compose Schema Validation

src/docker-manager.ts generates Docker Compose YAML dynamically. There's no step that validates the generated docker-compose.yml against the Docker Compose schema (via docker compose config or JSON Schema). A malformed config would only be caught at runtime.

10. No Formatting Check (Prettier)

build.yml and lint.yml run ESLint but there's no Prettier formatting check. Code style inconsistencies accumulate over time and create noisy diffs.

🟢 Low Priority

11. No SBOM Generation

No Software Bill of Materials (SBOM) is generated or attached to releases. For a security-critical tool like AWF, an SBOM would aid supply chain transparency.

12. Workflow Action Pinning Inconsistency

performance-monitor.yml uses actions/checkout@v4 (floating tag), while all other workflows use pinned SHAs. This creates a minor supply chain risk for that specific workflow.

13. No Changelog Validation on PRs

While pr-title.yml enforces conventional commit format, there's no check that CHANGELOG.md or release notes are updated for feature/fix PRs.

14. Link Check Only Triggered on MD Changes

link-check.yml only runs when **/*.md files change. A refactoring that renames source files could silently break documentation links without triggering a link check.

15. No License Header / REUSE Compliance Check

There's no automated check that new source files include proper license headers, which can be important for an open-source project.

📋 Actionable Recommendations

HIGH: Unit Test Coverage for Core Logic

Issue: docker-manager.ts and cli.ts have near-zero unit test coverage.
Solution: Add unit tests using Jest mocks for execa (already a dependency) to test generateDockerCompose(), startContainers(), stopContainers(), and signal handling without needing Docker. The test-coverage-improver agentic workflow can be pointed specifically at these files.
Complexity: Medium (mocking execa and filesystem operations requires careful setup)
Impact: High — would catch config generation regressions, wrong network addresses, missing env vars
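
The idea behind mocking the process runner can be sketched without Jest or Docker: the orchestration function receives its runner as a parameter, and the test passes in a recording fake. Everything below is hypothetical, including the function name, the synchronous runner signature, and the compose file path; the real startContainers() signature is not shown in this assessment. In the repository's Jest suite, jest.mock('execa') would play the role of the fake.

```typescript
// Hypothetical runner signature standing in for the process-spawning dependency.
type CommandRunner = (file: string, args: string[]) => { exitCode: number };

// Hypothetical stand-in for startContainers(); kept synchronous to stay short.
function startContainersSketch(run: CommandRunner, composeFile: string): void {
  const result = run("docker", ["compose", "-f", composeFile, "up", "-d"]);
  if (result.exitCode !== 0) {
    throw new Error(`docker compose up failed with exit code ${result.exitCode}`);
  }
}

// Recording fake: remembers every invocation and returns a canned exit code.
function makeRecordingRunner(exitCode: number) {
  const calls: { file: string; args: string[] }[] = [];
  const run: CommandRunner = (file, args) => {
    calls.push({ file, args });
    return { exitCode };
  };
  return { run, calls };
}

// Success path: exactly one docker invocation with the expected arguments.
const okRunner = makeRecordingRunner(0);
startContainersSketch(okRunner.run, "/tmp/awf/docker-compose.yml");

// Failure path: a non-zero exit code surfaces as an error.
const failingRunner = makeRecordingRunner(1);
let failureMessage = "";
try {
  startContainersSketch(failingRunner.run, "/tmp/awf/docker-compose.yml");
} catch (err) {
  failureMessage = err instanceof Error ? err.message : String(err);
}
```

Assertions on okRunner.calls and failureMessage then check the command line and the error handling without starting a container.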

HIGH: Container Image CVE Scanning

Issue: No vulnerability scanning of built Docker images.
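Solution: Add a container-security-scan.yml workflow that runs trivy image or docker scout cves against the images built from containers/squid/, containers/agent/, and containers/api-proxy/, and uploads SARIF to the Security tab. It can run on PRs and on a weekly schedule. The example step scans the published latest tag; on PRs the scan would need to target the locally built image for CVEs to surface before publish.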

- name: Scan agent image
  uses: aquasecurity/trivy-action@<sha>
  with:
    image-ref: 'ghcr.io/github/gh-aw-firewall/agent:latest'
    format: 'sarif'
    output: 'trivy-agent.sarif'

Complexity: Low
Impact: High — catches OS-level CVEs before GHCR publish

HIGH: Add PRs to Performance Monitor Trigger

Issue: Performance regressions only caught weekly.
Solution: Add pull_request trigger to performance-monitor.yml with a reduced iteration count (e.g., 2 vs 5), posting benchmark delta as a PR comment. Use continue-on-error: true to avoid blocking merges on transient performance variability.
Complexity: Low
Impact: High — immediate feedback on latency regressions
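
The benchmark delta for the PR comment could be computed along these lines. This is a sketch under stated assumptions: the function names, the use of the median, the 25% tolerance, and the sample values are illustrative and not taken from performance-monitor.yml.

```typescript
interface BenchmarkDelta {
  baselineMs: number;
  candidateMs: number;
  deltaPct: number;
  regressed: boolean;
}

// Median is less sensitive to a single slow run than the mean.
function median(samples: number[]): number {
  if (samples.length === 0) throw new Error("median of empty sample set");
  const sorted = samples.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Compare PR samples against baseline samples; tolerancePct is an assumed noise margin.
function compareStartup(
  baselineSamplesMs: number[],
  prSamplesMs: number[],
  tolerancePct: number
): BenchmarkDelta {
  const baselineMs = median(baselineSamplesMs);
  const candidateMs = median(prSamplesMs);
  const deltaPct = ((candidateMs - baselineMs) / baselineMs) * 100;
  return { baselineMs, candidateMs, deltaPct, regressed: deltaPct > tolerancePct };
}

// Render a one-line summary suitable for a PR comment.
function formatDeltaComment(d: BenchmarkDelta): string {
  const sign = d.deltaPct >= 0 ? "+" : "";
  const flag = d.regressed ? " [possible regression]" : "";
  return `Startup latency: ${d.baselineMs} ms -> ${d.candidateMs} ms (${sign}${d.deltaPct.toFixed(1)}%)${flag}`;
}

// Illustrative numbers: 5 baseline iterations, 2 PR iterations.
const exampleDelta = compareStartup([1000, 1100, 900, 1050, 950], [1300, 1500], 25);
const exampleComment = formatDeltaComment(exampleDelta);
```

With only two PR iterations the result is noisy, which is why the sketch flags a possible regression rather than failing the job, in line with the continue-on-error suggestion.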

MEDIUM: Raise Coverage Thresholds Incrementally

Issue: 38%/30% thresholds are too low to catch real regressions.
Solution: Increase thresholds by 5% per quarter via the test-coverage-improver workflow. Set a roadmap target of 60% statements / 50% branches as a 6-month goal, enforced in jest.config.js.
Complexity: Low (config change + test writing)
Impact: Medium — raises confidence that new tests actually cover production paths
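
The ratchet can be laid out as a schedule before committing to a timeline. The sketch below reads "5% per quarter" as five percentage points (an assumption) and starts from the 38% statement and 30% branch thresholds quoted above; the helper name is hypothetical. The resulting object mirrors the shape of Jest's coverageThreshold option.

```typescript
// Hypothetical helper: thresholds to set at each quarterly step until the target is reached.
function ratchetSchedule(current: number, target: number, stepPoints: number): number[] {
  if (stepPoints <= 0) throw new Error("stepPoints must be positive");
  const schedule: number[] = [];
  let value = current;
  while (value < target) {
    value = Math.min(value + stepPoints, target);
    schedule.push(value);
  }
  return schedule;
}

const statementsSchedule = ratchetSchedule(38, 60, 5);
const branchesSchedule = ratchetSchedule(30, 50, 5);

// Shape of the next quarter's Jest setting (coverageThreshold.global).
const nextCoverageThreshold = {
  global: {
    statements: statementsSchedule[0],
    branches: branchesSchedule[0],
  },
};
```

Under that reading, statements need five quarterly steps to reach 60% and branches need four, so meeting the targets within six months would require steps of roughly 10 to 11 points per quarter rather than 5.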

MEDIUM: Docker Compose Config Validation

Issue: Generated docker-compose.yml is never validated before container start.
Solution: Add a test in the unit test suite that calls generateDockerCompose() and pipes the output through docker compose config --quiet to validate schema. This can run in the existing test-coverage.yml job.
Complexity: Low
Impact: Medium — catches malformed YAML and invalid service configurations early
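
Where Docker is not available in the unit test job, a lighter structural pre-check can catch some of the same mistakes. This is a swapped-in technique, not a replacement for docker compose config: it only checks that each service has an image or build entry and that list-form depends_on and networks entries refer to something defined. The types, the helper name, and the service and network names are hypothetical.

```typescript
interface ComposeService {
  image?: string;
  build?: string;
  depends_on?: string[]; // list form only; the map form is not handled in this sketch
  networks?: string[];   // list form only
}

interface ComposeFile {
  services: Record<string, ComposeService>;
  networks?: Record<string, object>;
}

// Hypothetical pre-check over an already-parsed compose object.
function findComposeProblems(compose: ComposeFile): string[] {
  const problems: string[] = [];
  const serviceNames = Object.keys(compose.services);
  const networkNames = Object.keys(compose.networks || {});
  for (const name of serviceNames) {
    const svc = compose.services[name];
    if (!svc.image && !svc.build) {
      problems.push(`${name}: needs image or build`);
    }
    for (const dep of svc.depends_on || []) {
      if (serviceNames.indexOf(dep) === -1) {
        problems.push(`${name}: depends_on unknown service "${dep}"`);
      }
    }
    for (const net of svc.networks || []) {
      if (networkNames.indexOf(net) === -1) {
        problems.push(`${name}: unknown network "${net}"`);
      }
    }
  }
  return problems;
}

// Illustrative config with a deliberate typo in the agent's network name.
const exampleCompose: ComposeFile = {
  services: {
    squid: { image: "example/squid", networks: ["awf-net"] },
    agent: { build: "./containers/agent", depends_on: ["squid"], networks: ["awf-nett"] },
  },
  networks: { "awf-net": {} },
};

const composeProblems = findComposeProblems(exampleCompose);
```

For this input the check reports a single problem, the agent's reference to the undefined network. The full docker compose config validation remains the stronger gate wherever Docker is installed.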

MEDIUM: Pin Actions in performance-monitor.yml

Issue: Floating tag actions/checkout@v4 creates supply chain risk.
Solution: Pin to SHA: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd (same as all other workflows).
Complexity: Very Low
Impact: Low-Medium — consistency in security posture
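
To keep pinning consistent after the one-off fix, a small check could flag any uses: reference that is not a full 40-character commit SHA. The sketch below is a line-based scan with a hypothetical function name; it does not parse YAML and skips local actions and docker:// references.

```typescript
// Hypothetical check: list action references not pinned to a full commit SHA.
function findUnpinnedActions(workflowText: string): string[] {
  const unpinned: string[] = [];
  for (const line of workflowText.split("\n")) {
    const match = /^\s*(?:-\s*)?uses:\s*["']?([^\s"'#]+)/.exec(line);
    if (!match) continue;
    const ref = match[1];
    // Local actions and Docker image references have no commit SHA to pin.
    if (ref.indexOf("./") === 0 || ref.indexOf("docker://") === 0) continue;
    const at = ref.lastIndexOf("@");
    const version = at === -1 ? "" : ref.slice(at + 1);
    if (!/^[0-9a-f]{40}$/.test(version)) unpinned.push(ref);
  }
  return unpinned;
}

// Illustrative workflow fragment mixing a floating tag, a pinned SHA, and a local action.
const sampleWorkflow = [
  "steps:",
  "  - uses: actions/checkout@v4",
  "  - name: Pinned checkout",
  "    uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd",
  "  - uses: ./.github/actions/local",
].join("\n");

const unpinnedActions = findUnpinnedActions(sampleWorkflow);
```

Here only actions/checkout@v4 is reported. Run over every file in .github/workflows/, the same scan would surface any other floating tags.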

LOW: Add SBOM to Release Workflow

Issue: No SBOM attached to releases.
Solution: Add anchore/sbom-action to release.yml to generate and attach an SBOM artifact to each GitHub Release.
Complexity: Low
Impact: Low (supply chain transparency)

LOW: Broader Link Check Trigger

Issue: Link check skips non-MD PRs that may rename referenced files.
Solution: Change link-check.yml to also trigger on push to main (post-merge check) so broken links are always caught.
Complexity: Very Low
Impact: Low

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files | 45 (18 `.yml` + 27 `.md` agentic) |
| Workflows running on PRs | 13 required + 5 reaction-gated smoke tests |
| Agentic workflows on PRs | 2 (`security-guard`, `build-test`) |
| Scheduled workflows | 8 |
| Recent Build Verification success rate | 10/10 (100%) over the last 10 runs |
| Recent Integration Test success rate | ~9/10 (in-progress run observed) |
| Unit test coverage, statements | 38.39% (threshold: 38%) |
| Unit test coverage, branches | 31.78% (threshold: 30%) |
| Unit test coverage, functions | 37.03% (threshold: 35%) |
| Unit test coverage, lines | 38.31% (threshold: 38%) |
| Integration test files | ~26 files, ~265 tests |
| Unit test files | 19 files, ~200 tests |
| Critical files with 0% coverage | 1 (`cli.ts`) |
| Critical files with <20% coverage | 1 (`docker-manager.ts` at 18%) |

Assessment generated by AI agent on 2026-04-08. Based on analysis of .github/workflows/ configuration files, COVERAGE_SUMMARY.md, docs/INTEGRATION-TESTS.md, and recent GitHub Actions workflow run history.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 15, 2026, 11:11 PM UTC

2026-04-16T01:09:01Z

github-actions[bot]
bot Apr 16, 2026
Author

This discussion was automatically closed because it expired on 2026-04-15T23:11:56.184Z.

Closed by Workflow

2026-04-16T01:12:42Z

github-actions[bot]
bot Apr 16, 2026
Author

🔮 The ancient spirits stir, and the smoke-test agent has walked this threshold.
The runes of build and browser answered true.
Yet two sigils remain veiled by absent tools.
So speaks the oracle: the watch continues.

🔮 The oracle has spoken through Smoke Codex
