[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1961

2026-04-13T12:57:58Z

github-actions[bot]
bot Apr 13, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature and well-structured CI/CD pipeline with 14+ workflows running on pull requests. Recent runs show consistent success across build verification and integration tests, with only occasional action_required states. The pipeline covers TypeScript compilation, linting, type-checking, unit tests, integration tests, security scanning, and AI-based code review.

Key observation: The repo is high-velocity (multiple PRs merged per day, e.g., 10 PRs on 2026-04-12/13 alone), making robust CI/CD quality gates especially important.

✅ Existing Quality Gates

Automated on every PR (`pull_request` trigger):

| Workflow | What It Checks |
| --- | --- |
| `build.yml` | TypeScript build, ESLint, dist verification, API/CLI proxy unit tests (Node 20 + 22 matrix) |
| `lint.yml` | ESLint + Markdown lint |
| `test-integration.yml` (TypeScript Type Check) | `tsc --noEmit` strict type check |
| `test-coverage.yml` | Jest unit tests + coverage diff vs base branch, fails on regression |
| `codeql.yml` | CodeQL security analysis (JS/TS + Actions, security-extended queries) |
| `dependency-audit.yml` | `npm audit` with SARIF upload for main + docs-site, fails on high/critical |
| `test-integration-suite.yml` | 5 parallel Docker integration test jobs: domain filtering, network security, protocol security, container ops, API proxy |
| `test-chroot.yml` | 4 parallel chroot integration test jobs: languages (Python/Go/Java/.NET), package managers (Rust/Ruby), procfs, edge cases |
| `test-action.yml` | Tests `action.yml` setup for latest/pinned/invalid versions |
| `test-examples.yml` | Runs example bash scripts end-to-end |
| `pr-title.yml` | Enforces semantic PR titles (`feat/fix/docs/ci/...`) |
| `security-guard.lock.yml` | Claude-based AI security review |
| `build-test.lock.yml` | Copilot-based AI build test suite |
| `link-check.yml` | Documentation link validation (only on `.md` path changes) |

Opt-in on PRs (require emoji reactions from maintainers):

Smoke tests for Claude, Copilot, Codex, Chroot, and Services agent execution

Scheduled (NOT on PRs):

- Performance Monitor: daily benchmarks with regression tracking
- Security Review: daily AI-powered threat modeling
- Dependency Security Monitor: daily vulnerability scanning

🔍 Identified Gaps

🔴 High Priority

1. Critically low unit test coverage thresholds

Current thresholds (38% statements, 30% branches, 35% functions) are low for a security-critical firewall tool. The two most critical files are largely untested:

- `cli.ts`: 0% coverage (the main entry point: argument parsing, signal handling, container orchestration)
- `docker-manager.ts`: 18% coverage (container lifecycle, config generation, cleanup; only 45 of 250 statements covered)

The global threshold masks these file-level gaps because smaller, fully-covered files (logger.ts, squid-config.ts, cli-workflow.ts) pull the average up.
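
The masking effect can be illustrated with a small worked example. This is a sketch under stated assumptions: only the `docker-manager.ts` figures (45 of 250 statements) come from this assessment, while the statement counts for `cli.ts` and the "other modules" entry are hypothetical values chosen for illustration.

```typescript
// Per-file statement counts. Only the docker-manager.ts figures come from the
// assessment; every other number here is a hypothetical placeholder.
interface FileCoverage {
  file: string;
  statements: number;
  covered: number;
}

const files: FileCoverage[] = [
  { file: "cli.ts", statements: 200, covered: 0 },          // hypothetical size
  { file: "docker-manager.ts", statements: 250, covered: 45 },
  { file: "other modules", statements: 250, covered: 250 }, // hypothetical, fully covered
];

// Percentage of covered statements for a single file.
function pct(f: FileCoverage): number {
  return f.statements === 0 ? 100 : (100 * f.covered) / f.statements;
}

// Global coverage is the statement-weighted aggregate across all files.
function globalPct(all: FileCoverage[]): number {
  const total = all.reduce((sum, f) => sum + f.statements, 0);
  const covered = all.reduce((sum, f) => sum + f.covered, 0);
  return total === 0 ? 100 : (100 * covered) / total;
}

// Files that would fail a per-file floor even though the global gate passes.
function belowFloor(all: FileCoverage[], floor: number): string[] {
  return all.filter((f) => pct(f) < floor).map((f) => f.file);
}

const GLOBAL_THRESHOLD = 38; // statements threshold quoted in this assessment
console.log(globalPct(files).toFixed(2));          // aggregate percentage
console.log(globalPct(files) >= GLOBAL_THRESHOLD); // result of the global gate
console.log(belowFloor(files, GLOBAL_THRESHOLD));  // files hidden by the average
```

With these assumed numbers the aggregate is 295 of 700 statements, so the global gate passes while both critical files sit far below the same floor.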

2. No container image vulnerability scanning on PRs

Container images (`ubuntu/squid:latest`, `ubuntu:22.04`) are built during integration tests on every PR but never scanned for CVEs. No Trivy or Grype scan runs in PR CI, and CodeQL analyzes source code and workflow files rather than image contents. A base image with a critical CVE could therefore pass all current checks and ship in a release. Container signing with cosign happens only during releases, not during PR validation.

3. No performance regression gating on PRs

The performance benchmark runs daily (scheduled) and creates issues when regressions are detected, but PRs are not blocked by performance regressions. A PR that adds 500 ms of startup latency would pass all checks and merge before the next scheduled run detects it. Given that AWF wraps time-sensitive AI agents, startup latency is a user-facing metric.

🟡 Medium Priority

4. Smoke tests are reaction-gated, not automated for all PRs

Real-world agent smoke tests (Claude, Copilot, Codex) run only when a maintainer adds a specific emoji reaction (❤️, 👀, 🎉). As a result, most PRs from automated agents (Copilot SWE) are merged without smoke test validation. The smoke-chroot and smoke-services tests similarly require a 🚀 reaction.

5. No per-file coverage gates for critical modules

While global coverage thresholds exist, there are no file-specific thresholds, so `cli.ts` could remain at 0% indefinitely as long as the global numbers stay above them. Jest supports per-file and per-directory thresholds by using paths or glob patterns as keys in `coverageThreshold`.

6. No dist/bundle size monitoring

There is no tracking of the compiled dist/ size. A PR accidentally including a large dev dependency in production output, or a new --build-bundle artifact growing significantly, would go undetected until a user notices.

7. No license compliance scanning

No automated check validates that new dependencies have compatible open-source licenses. This is increasingly important for enterprise tooling like AWF.

8. Link check does not run on non-markdown PRs

link-check.yml only triggers when *.md files change. A code change that removes a documented CLI flag or changes a URL structure would not trigger link validation. Broken links in documentation would only be caught on the weekly schedule or the next markdown-only PR.

🟢 Low Priority

9. No commit-level message linting

Only PR titles are semantically validated. Individual commit messages within a PR are unchecked. For repositories using conventional commits for changelog generation, this can produce inconsistent histories.
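
A commit-level check could apply the same kind of pattern that is used for PR titles to each commit subject line. The sketch below is illustrative only: the type list beyond `feat`, `fix`, `docs`, and `ci` is an assumption based on common Conventional Commits usage, not the repository's actual configuration, and `lintCommitSubjects` is a hypothetical helper name.

```typescript
// Assumed type list: only feat/fix/docs/ci are named in this assessment; the
// rest are common Conventional Commits types added for illustration.
const COMMIT_SUBJECT =
  /^(feat|fix|docs|ci|chore|refactor|test|perf|build|style|revert)(\([a-z0-9-]+\))?!?: .+/;

// Return the subject lines that do not follow the conventional format.
function lintCommitSubjects(messages: string[]): string[] {
  return messages
    .map((message) => message.split("\n")[0]) // only the first line is checked
    .filter((subject) => !COMMIT_SUBJECT.test(subject));
}

// Hypothetical commit messages from a single PR.
const invalidSubjects = lintCommitSubjects([
  "fix: handle SIGTERM during cleanup",
  "update stuff",
  "docs(readme): clarify setup\n\nLonger body text is ignored.",
  "Feat: wrong case",
]);
console.log(invalidSubjects); // subjects a CI step would reject
```

In a workflow, the messages would come from the PR's commit list, and a non-empty result would fail the job.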

10. Build matrix limited to Linux

The Node 20/22 matrix covers version compatibility but only on ubuntu-latest. There is no Windows or macOS build verification, which could be relevant for users running AWF on those platforms (particularly the npm install path).

11. No static analysis beyond CodeQL for shell scripts

The containers/agent/ directory contains complex shell scripts (setup-iptables.sh, entrypoint.sh) that implement critical security logic. No shellcheck or shfmt linting is configured for these scripts.

12. Container builds not cached between CI jobs

Each integration test job rebuilds container images from scratch (separate docker build calls per job). This adds ~2-3 minutes per job. With 9 parallel integration test jobs, Docker layer caching (cache-from/cache-to) could significantly reduce CI time.

📋 Actionable Recommendations

1. Raise coverage thresholds and add per-file gates

Solution: Update jest.config.js to enforce meaningful thresholds for critical files:

coverageThreshold: {
  global: { branches: 50, functions: 60, lines: 60, statements: 60 },
  './src/cli.ts': { branches: 50, functions: 50, lines: 50, statements: 50 },
  './src/docker-manager.ts': { branches: 40, functions: 40, lines: 40, statements: 40 },
}

Set incremental targets and ratchet them up over quarters.
Complexity: Low | Impact: High — directly improves regression detection for the two most critical files

2. Add container image vulnerability scanning to PR CI

Solution: Add a Trivy scan step in build.yml or a new container-security.yml:

- uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'ghcr.io/github/gh-aw-firewall/agent:latest'
    format: 'sarif'
    output: 'trivy-results.sarif'
    severity: 'CRITICAL,HIGH'
- uses: github/codeql-action/upload-sarif@v4
  with:
    sarif_file: 'trivy-results.sarif'

Complexity: Low | Impact: High — catches base image CVEs before release
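
Uploading SARIF surfaces findings but does not by itself fail the PR. One way to gate on findings, sketched here as an assumption rather than as the repository's approach, is to have Trivy emit its JSON report and count findings at blocking severities. The field names follow Trivy's JSON output (`Results`, `Vulnerabilities`, `Severity`); the sample report and the `blockingFindings` helper are hypothetical. The trivy-action also documents an `exit-code` input that can fail the step directly.

```typescript
// Subset of the JSON report produced by `trivy image --format json`.
interface TrivyVulnerability {
  VulnerabilityID: string;
  PkgName: string;
  Severity: string;
}

interface TrivyReport {
  Results?: { Target: string; Vulnerabilities?: TrivyVulnerability[] | null }[];
}

// Return every finding whose severity is in the blocking set.
function blockingFindings(
  report: TrivyReport,
  blocking: string[] = ["CRITICAL", "HIGH"],
): TrivyVulnerability[] {
  const found: TrivyVulnerability[] = [];
  for (const result of report.Results ?? []) {
    // Targets without findings may carry a missing or null list.
    for (const vuln of result.Vulnerabilities ?? []) {
      if (blocking.indexOf(vuln.Severity) !== -1) found.push(vuln);
    }
  }
  return found;
}

// Hypothetical report: IDs and package names are placeholders, not real findings.
const sampleReport: TrivyReport = {
  Results: [
    {
      Target: "ubuntu:22.04",
      Vulnerabilities: [
        { VulnerabilityID: "CVE-0000-0001", PkgName: "example-lib", Severity: "CRITICAL" },
        { VulnerabilityID: "CVE-0000-0002", PkgName: "example-tool", Severity: "LOW" },
      ],
    },
    { Target: "package-lock.json", Vulnerabilities: null },
  ],
};

const blockers = blockingFindings(sampleReport);
console.log(blockers.map((v) => v.VulnerabilityID)); // IDs that would block the PR
```

A CI wrapper would exit non-zero when the returned list is non-empty.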

3. Add performance regression check to PR CI

Solution: Run a lightweight version of the benchmark (fewer iterations) on PRs and compare the result to the last N values stored in the benchmark-data branch. Fail or warn if the median startup time increases by more than 20%.
Complexity: Medium | Impact: Medium — prevents silent latency regressions
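
The comparison described above can be sketched as a median check. This is a minimal illustration, assuming startup times are available as arrays of milliseconds; the sample values and the `checkStartupRegression` helper are hypothetical and do not reflect the existing benchmark's data format.

```typescript
// Median of a non-empty numeric sample.
function median(values: number[]): number {
  if (values.length === 0) throw new Error("median of empty sample");
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

interface RegressionResult {
  baselineMs: number;
  currentMs: number;
  changePct: number;
  regressed: boolean;
}

// Compare the PR's median startup time against the stored baseline median.
function checkStartupRegression(
  baseline: number[],
  current: number[],
  maxIncreasePct: number = 20, // threshold suggested in the recommendation
): RegressionResult {
  const baselineMs = median(baseline);
  const currentMs = median(current);
  const changePct = ((currentMs - baselineMs) / baselineMs) * 100;
  return { baselineMs, currentMs, changePct, regressed: changePct > maxIncreasePct };
}

// Hypothetical samples: the last N stored runs vs. a short PR run.
const storedRuns = [1000, 1100, 1050, 980, 1020];
const slowPr = checkStartupRegression(storedRuns, [1500, 1520, 1490]);
const normalPr = checkStartupRegression(storedRuns, [1100, 1120, 1090]);
console.log(slowPr.regressed, slowPr.changePct.toFixed(1));
console.log(normalPr.regressed, normalPr.changePct.toFixed(1));
```

Using the median rather than the mean keeps a single noisy CI run from triggering a false alarm, which matters when the PR run uses fewer iterations.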

4. Auto-trigger smoke tests on all non-draft PRs targeting main

Solution: Remove reaction gate from smoke tests or add a separate, lighter "smoke-quick" test that runs automatically. Alternatively, auto-add the trigger reaction via a bot when PRs are opened by trusted actors.
Complexity: Medium | Impact: High — ensures real agent execution is validated before merge
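
One possible trigger policy combining these options is sketched below. It is an assumption for illustration, not the repository's workflow logic: the `PullRequestInfo` shape, the `decideSmoke` helper, and the trusted-author list are all hypothetical.

```typescript
// Minimal view of a pull request; field names are hypothetical.
interface PullRequestInfo {
  draft: boolean;
  baseRef: string;
  author: string;
}

type SmokeDecision = "full" | "quick" | "skip";

// Drafts and PRs not targeting main are skipped; trusted authors get the full
// suite automatically; everyone else gets the lighter "smoke-quick" run.
function decideSmoke(pr: PullRequestInfo, trustedAuthors: string[]): SmokeDecision {
  if (pr.draft || pr.baseRef !== "main") return "skip";
  return trustedAuthors.indexOf(pr.author) !== -1 ? "full" : "quick";
}

const trusted = ["maintainer-a"]; // placeholder account name
console.log(decideSmoke({ draft: false, baseRef: "main", author: "maintainer-a" }, trusted));
console.log(decideSmoke({ draft: false, baseRef: "main", author: "external-user" }, trusted));
console.log(decideSmoke({ draft: true, baseRef: "main", author: "maintainer-a" }, trusted));
```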

5. Add shellcheck/shfmt to lint workflow

Solution: Add shellcheck and shfmt steps to lint.yml for all .sh files under containers/:

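- name: Lint shell scripts
  run: |
    # Static analysis for common shell bugs
    find containers/ -name '*.sh' -exec shellcheck {} +
    # Formatting check (assumes shfmt is installed); -d prints a diff and fails on mismatch
    find containers/ -name '*.sh' -exec shfmt -d {} +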

Complexity: Low | Impact: Medium — improves quality of security-critical shell scripts

6. Add dist size monitoring

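Solution: Record the size of dist/ in CI (for example with `du -sk dist/`, whose plain kilobyte count is easier to compare than the human-readable output of `du -sh`) and compare it to a baseline stored as an artifact. Alert if the size increases by more than 10%.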
Complexity: Low | Impact: Low-Medium — prevents accidental production dependency bloat
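
The size comparison can be sketched as follows. This is a minimal illustration that assumes the sizes are captured with `du -sk dist/` (kilobytes) instead of the human-readable `du -sh`; the helper names and sample sizes are hypothetical.

```typescript
// Parse the size column of `du -sk dist/` output, e.g. "2048\tdist/".
function parseDuKilobytes(output: string): number {
  const match = /^\s*(\d+)\s+\S/.exec(output);
  if (match === null) throw new Error("unrecognized du output: " + output);
  return parseInt(match[1], 10);
}

// Relative growth of the current size over the baseline, in percent.
function sizeIncreasePct(baselineKb: number, currentKb: number): number {
  return ((currentKb - baselineKb) / baselineKb) * 100;
}

// True when growth exceeds the budget (10% in the recommendation above).
function exceedsBudget(baselineKb: number, currentKb: number, maxIncreasePct: number = 10): boolean {
  return sizeIncreasePct(baselineKb, currentKb) > maxIncreasePct;
}

// Hypothetical sizes: a stored baseline and two candidate PR builds.
const baselineKb = parseDuKilobytes("2048\tdist/\n");
console.log(exceedsBudget(baselineKb, parseDuKilobytes("2300\tdist/\n")));
console.log(exceedsBudget(baselineKb, parseDuKilobytes("2100\tdist/\n")));
```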

7. Enable link check for all PRs

Solution: Remove the `paths:` filter from `link-check.yml` so it runs on every PR, or add link checking as a step in the always-running `build.yml`.
Complexity: Low | Impact: Low — prevents broken documentation links

8. Add license compliance check

Solution: Add license-checker or fossa to the dependency audit workflow.
Complexity: Low | Impact: Medium — important for enterprise distribution
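
A license gate typically reduces to an allowlist check over a dependency report. The sketch below assumes a report shaped like the output of `license-checker --json` (an object keyed by `name@version` with a `licenses` field that is a string or an array); the allowlist, package names, and the `disallowedPackages` helper are hypothetical.

```typescript
// Assumed report shape, modeled on `license-checker --json` output.
type LicenseReport = { [pkg: string]: { licenses?: string | string[] } };

// Packages with no declared license, or with any license outside the allowlist.
// Note: SPDX expressions such as "(MIT OR Apache-2.0)" are compared as literal
// strings here, so they would be flagged unless listed explicitly.
function disallowedPackages(report: LicenseReport, allowed: string[]): string[] {
  return Object.keys(report)
    .filter((pkg) => {
      const raw = report[pkg].licenses;
      const list = raw === undefined ? [] : Array.isArray(raw) ? raw : [raw];
      return list.length === 0 || list.some((l) => allowed.indexOf(l) === -1);
    })
    .sort();
}

// Hypothetical allowlist and dependency report.
const allowedLicenses = ["MIT", "Apache-2.0", "BSD-3-Clause", "ISC"];
const sampleLicenses: LicenseReport = {
  "example-a@1.0.0": { licenses: "MIT" },
  "example-b@3.2.1": { licenses: ["MIT", "Apache-2.0"] },
  "example-c@2.0.0": { licenses: "GPL-3.0" },
  "example-d@0.1.0": {},
};

const violations = disallowedPackages(sampleLicenses, allowedLicenses);
console.log(violations); // packages that would fail the compliance check
```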

📈 Metrics Summary

| Metric | Value |
| --- | --- |
| Total workflow files (`.yml`) | 46 |
| Agentic workflow files (`.md` compiled) | 27 |
| PR-triggered automated workflows | 14 |
| Opt-in (reaction-gated) smoke tests | 5 |
| Scheduled-only workflows | 7 |
| Unit test coverage (statements) | 38.39% |
| Unit test coverage (branches) | 31.78% |
| Unit test coverage (functions) | 37.03% |
| `cli.ts` coverage | 0% ⚠️ |
| `docker-manager.ts` coverage | 18% ⚠️ |
| Integration test jobs on PRs | 9 parallel |
| Build workflow success rate (recent 5 runs) | 5/5 ✅ |
| Integration test success rate (recent 5 runs) | 5/5 ✅ |
| Total build workflow runs (lifetime) | 1,717 |
| Total integration test runs (lifetime) | 825 |

Assessment generated on 2026-04-13 from workflow file analysis and recent run history.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment

expires on Apr 20, 2026, 12:57 PM UTC
