[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #2001

2026-04-15T23:11:04Z

github-actions[bot]
bot Apr 15, 2026

📊 Current CI/CD Pipeline Status

The repository has a mature, multi-layered CI/CD pipeline with 40+ workflows covering static analysis, unit testing, integration testing, security scanning, documentation, and AI-powered smoke tests. All agentic workflows are compiled (28/28 compiled per agenticworkflows status).

Workflows triggered on pull requests (some run only on matching path changes or, for smoke tests, after an authorized reaction):

| Workflow | Purpose | Matrix / Runner |
|---|---|---|
| `build.yml` (Build Verification) | Lint + TypeScript build + api-proxy/cli-proxy unit tests | Node 20 & 22 |
| `lint.yml` (Lint) | ESLint + Markdownlint | Node 20 |
| `test-integration.yml` (TypeScript Type Check) | `tsc --noEmit` strict type checking | Node 22 |
| `test-coverage.yml` (Test Coverage) | Unit tests + coverage comparison vs base branch | Node 20 |
| `test-action.yml` (Test Setup Action) | Verifies `action.yml` installation | ubuntu-latest |
| `test-integration-suite.yml` (Integration Tests) | 5 parallel jobs: domain, network, protocol/security, container ops, API proxy | ubuntu-latest |
| `test-chroot.yml` (Chroot Integration Tests) | 4 parallel jobs: languages, package managers, procfs, edge cases | ubuntu-latest |
| `test-examples.yml` (Examples Test) | Executes shell example scripts end-to-end | ubuntu-latest |
| `codeql.yml` (CodeQL) | Static SAST analysis (JS/TS + Actions) | ubuntu-latest |
| `dependency-audit.yml` (Dependency Vulnerability Audit) | `npm audit` for main + docs-site, SARIF upload | ubuntu-latest |
| `pr-title.yml` (PR Title Check) | Conventional commits validation | ubuntu-latest |
| `docs-preview.yml` (Documentation Preview) | Builds and archives doc site (docs changes only) | ubuntu-latest |
| `link-check.yml` (Link Check) | Checks Markdown links (md-file changes only) | ubuntu-latest |
| `security-guard.md` (Security Guard, AI) | Claude reviews security-critical file changes | AWF sandbox |
| `build-test.md` (Build Test Suite, AI) | Tests 8 language ecosystems through the firewall | AWF sandbox |
| `smoke-*.md` (Smoke Tests) | Full agent smoke tests (Claude, Codex, Copilot, OpenCode, Chroot, Services) | AWF sandbox |

Scheduled/maintenance workflows: performance-monitor.yml (daily benchmarks), dependency-security-monitor.md (daily vulnerability scan), security-review.md (daily security review), claude-token-usage-analyzer.md, copilot-token-usage-analyzer.md, plus 10+ agentic maintenance workflows.

✅ Existing Quality Gates

Build & Compilation — TypeScript compilation verified on Node 20 and 22 on every PR
Static Analysis — ESLint (with custom no-unsafe-execa rule), Markdownlint, TypeScript strict type check
SAST Security — CodeQL (JS/TS + Actions), weekly schedule + every PR
Dependency Security — npm audit failing on high/critical CVEs, SARIF uploaded to GitHub Security tab; covers main and docs-site packages
Unit Test Coverage — Jest coverage with regression detection (fails PR if coverage drops); reports comparison vs base branch in PR comment
Integration Testing — 26 integration test files (~265 tests) covering domain filtering, network security, chroot isolation, credential hiding, protocol support, API proxy, and container operations
Chroot Integration Tests — Dedicated 4-job parallel workflow verifying multi-language runtime support in chroot sandbox
End-to-end Examples — Real shell scripts executed against locally-built containers
PR Title Validation — Conventional commits enforcement (feat/fix/docs/refactor/etc.) with scope validation
AI Security Review — Claude reviews all diffs touching security-critical files (iptables, squid config, Dockerfile, entrypoint scripts)
Multi-Ecosystem Build Tests — 8 language ecosystems (Bun, C++, Deno, .NET, Go, Java, Node.js, Rust) tested through AWF firewall
Live Smoke Tests — Real AI agent sessions (Claude, Copilot, Codex, OpenCode) running through the full AWF pipeline
Performance Tracking — Daily benchmarks with regression detection and automated issue creation; history stored on benchmark-data branch

🔍 Identified Gaps

🔴 High Priority

1. Coverage Thresholds Are Critically Low for a Security Product

The two most important source files have near-zero coverage:

| File | Statements | Functions | Lines |
|---|---|---|---|
| `docker-manager.ts` (core orchestration) | 18% (45/250) | 4% (1/25) | 17% |
| `cli.ts` (entry point & argument handling) | 0% (0/69) | 0% (0/10) | 0% |

The configured global thresholds (statements: 38%, branches: 30%, functions: 35%) are low enough that the suite passes while these critical files remain almost untested. The thresholds apply to project-wide totals, and the base-branch comparison most likely does too. As a result, a small regression confined to docker-manager.ts or cli.ts can be offset by gains elsewhere and go uncaught.

Root cause: the overall 38% figure is pulled up by fully covered smaller modules (logger.ts, squid-config.ts), which masks the gaps in critical files.
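The following TypeScript sketch shows how a pooled percentage can hide per-file gaps. The two real entries are the statement counts from the table above. The "other modules" entry is a hypothetical bucket invented for illustration, and `overallPct` and `filesBelow` are illustrative names, not functions from the repository.

```typescript
// Statement counts per file, in the shape a coverage summary provides.
interface FileCoverage {
  file: string;
  covered: number; // statements executed by tests
  total: number;   // statements in the file
}

// Project-wide percentage: pooled covered statements over pooled totals.
function overallPct(files: FileCoverage[]): number {
  const covered = files.reduce((sum, f) => sum + f.covered, 0);
  const total = files.reduce((sum, f) => sum + f.total, 0);
  return total === 0 ? 100 : (covered / total) * 100;
}

// Files whose own percentage falls below the threshold.
function filesBelow(files: FileCoverage[], thresholdPct: number): string[] {
  return files
    .filter((f) => f.total > 0 && (f.covered / f.total) * 100 < thresholdPct)
    .map((f) => f.file);
}

const report: FileCoverage[] = [
  { file: "src/docker-manager.ts", covered: 45, total: 250 }, // from the table above
  { file: "src/cli.ts", covered: 0, total: 69 },              // from the table above
  { file: "other modules (hypothetical)", covered: 600, total: 1000 },
];

console.log(overallPct(report).toFixed(1)); // "48.9": clears a 38% global gate
console.log(filesBelow(report, 38));        // both critical files are flagged
```

A per-path threshold is the configuration equivalent of `filesBelow`: it evaluates each critical file on its own rather than inside the pooled total.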

2. Multiple Integration Test Categories Have No CI Coverage

Per the coverage heat map in docs/INTEGRATION-TESTS.md, the following features have no integration tests running in any CI workflow. In each case, either no integration test has been written or the existing tests are skipped:

| Feature | Integration Test Exists | CI Workflow |
|---|---|---|
| `--block-domains` deny-list | ❌ Not written | ❌ |
| `--env-all` passthrough | ❌ Not written | ❌ |
| SSL Bump | Unit tests only | ❌ |
| Docker warning stub | Tests exist but skipped | ❌ |

The heat map is also stale in the opposite direction. It lists Domain/Network (6 files, ~50 tests) and Protocol/Security (8 files, ~100 tests) with no CI workflow, yet test-integration-suite.yml now runs both groups. The documentation and heat map should be updated to reflect actual CI coverage.

3. Performance Regression Tests Not Running on PRs

performance-monitor.yml only triggers on a daily schedule and workflow_dispatch. PRs that introduce performance regressions (startup time, container orchestration latency) will not be caught until the next daily run — potentially after merge.

Benchmarks tracked include: awf-startup, container-startup, squid-startup, total-execution. The p95/p99 thresholds and regression detection logic already exist; they just need a PR trigger.
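The PR-side check could work with only a handful of iterations, as sketched below in TypeScript. This is not the logic in performance-monitor.yml. `percentile`, `isRegression`, the 10% tolerance, and the sample timings are all hypothetical.

```typescript
// Nearest-rank percentile: the smallest sample such that at least p% of
// the samples are less than or equal to it.
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("percentile of empty sample set");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(rank, 1) - 1];
}

// A PR run counts as a regression when its p95 exceeds the stored baseline
// p95 by more than tolerancePct. The 10% default is an illustrative value.
function isRegression(baselineP95Ms: number, prSamplesMs: number[], tolerancePct = 10): boolean {
  return percentile(prSamplesMs, 95) > baselineP95Ms * (1 + tolerancePct / 100);
}

// Five hypothetical awf-startup timings (ms) from a reduced PR run.
const prRun = [120, 95, 130, 110, 101];
console.log(percentile(prRun, 95));    // 130 (with 5 samples, p95 is the max)
console.log(isRegression(115, prRun)); // true: 130 > 115 * 1.1
```

With five samples, nearest-rank p95 and p99 both collapse to the slowest run, so one noisy runner can trip the check. A looser tolerance on PR runs than on the daily run seems prudent.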

4. Smoke Tests Require Manual Emoji Reactions to Run on PRs

While smoke tests do trigger on pull_request events, they require specific emoji reactions from authorized users:

smoke-claude: reaction: heart
smoke-copilot: reaction: eyes
smoke-codex: reaction: hooray
smoke-opencode: reaction: rocket

This means smoke tests are opt-in on PRs rather than automatic. PRs that break the agent execution layer can be merged without any live agent validation, unless a maintainer manually adds a reaction.

🟡 Medium Priority

5. No Container Image Vulnerability Scanning

Docker images for squid, agent, and api-proxy containers are built and used in every integration test run but are never scanned for OS-level CVEs. The dependency-audit.yml only scans npm packages. Tools like Trivy or Grype could scan container images and upload SARIF results to the GitHub Security tab.

This matters in particular because containers/squid/ builds on the unpinned ubuntu/squid:latest base. The system packages it ships can change between builds and may include unpatched CVEs.

6. Documentation Build Failures Don't Block PRs

In docs-preview.yml, the documentation build step uses continue-on-error: true:

```yaml
- name: Build documentation
  id: build
  continue-on-error: true   # ← build failures do not fail the job
  run: |
    cd docs-site
    npm run build
```

A broken documentation build posts a failure comment but does not block the PR. Documentation is part of the product surface, so a failed build should fail a required status check.

7. No Dist Bundle Size Tracking on PRs

The build.yml verifies that dist/cli.js exists but does not track its size. Accidental inclusion of large dependencies or tree-shaking regressions would go unnoticed. The performance-monitor.yml already has infrastructure for appending metrics to a history branch.

8. Integration Test Gaps: Three Untested Feature Areas

As noted in the heat map, these features have no working integration tests:

--block-domains / deny-list functionality (not written)
--env-all flag behavior and its security implications, such as blocking proxy env vars (not written)
Docker daemon warning stub (tests exist but are skipped)

For a security product, the --block-domains and --env-all gaps are especially concerning. The first governs which destinations the agent can reach. The second governs which host environment variables, including proxy settings, reach the agent.
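The core property behind the first gap is that a deny-list entry overrides the allow-list. It is small enough to state as code. The TypeScript below is a hypothetical model for deciding what a test should assert. The subdomain-matching rule is an assumption, not a description of how AWF or Squid actually match domains.

```typescript
// Hypothetical model, not AWF's matching code: an entry matches a host when
// it names the host itself or one of its parent domains.
function matchesDomain(host: string, entry: string): boolean {
  const h = host.toLowerCase().replace(/\.$/, "");  // drop a trailing root dot
  const e = entry.toLowerCase().replace(/^\./, ""); // treat ".github.com" as "github.com"
  return h === e || h.endsWith("." + e);
}

// The deny-list is checked first, so a blocked host stays blocked even when
// a broader allow-list entry would otherwise admit it.
function isAllowed(host: string, allow: string[], block: string[]): boolean {
  if (block.some((entry) => matchesDomain(host, entry))) return false;
  return allow.some((entry) => matchesDomain(host, entry));
}

const allow = ["github.com"];
const block = ["gist.github.com"];
console.log(isAllowed("api.github.com", allow, block));  // true
console.log(isAllowed("gist.github.com", allow, block)); // false: block wins
console.log(isAllowed("notgithub.com", allow, block));   // false: not a subdomain
```

An integration test would make real requests through the firewall for each of these hosts and compare the outcomes with this table of expectations.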

🟢 Low Priority

9. `link-check.yml` Doesn't Catch Link Rot from Non-Markdown Changes

The link check workflow only triggers when `**/*.md` files are modified. If a PR renames a source file that's referenced in documentation, the broken link won't be detected until the weekly scheduled run.
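One way to close this gap is to run a local-link check on every PR, regardless of which files changed. The TypeScript function below is a hypothetical sketch of that check and is not taken from link-check.yml. It is limited to inline links and does not handle reference-style links, URL-encoded paths, or heading anchors.

```typescript
// Returns relative link targets in a Markdown document that don't resolve.
// `exists` is injected to keep the sketch pure; a real script might wrap
// fs.existsSync and resolve targets against the Markdown file's directory.
function brokenRelativeLinks(markdown: string, exists: (target: string) => boolean): string[] {
  // Inline links and images: [text](target) or [text](target "title").
  const linkPattern = /\[[^\]]*\]\(([^)\s]+)(?:\s+"[^"]*")?\)/g;
  const broken: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = linkPattern.exec(markdown)) !== null) {
    const target = match[1];
    // Skip absolute URLs (https:, mailto:, ...) and in-page anchors.
    if (/^[a-z][a-z0-9+.-]*:/i.test(target) || target.startsWith("#")) continue;
    const path = target.split("#")[0]; // drop any #fragment
    if (path !== "" && !exists(path)) broken.push(path);
  }
  return broken;
}

const doc = "See [guide](docs/guide.md), [site](https://example.com), [top](#top), and [entry](src/cli.ts#L10).";
const present = new Set(["docs/guide.md"]);
console.log(brokenRelativeLinks(doc, (p) => present.has(p))); // reports src/cli.ts only
```

Because the check reads only Markdown files but validates paths anywhere in the tree, running it on every PR would catch a source-file rename that breaks a documentation link.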

10. No SBOM Generation or Provenance Attestation

There's no Software Bill of Materials (SBOM) generated for the Docker images or npm package as part of the release pipeline. With SLSA/supply-chain concerns growing, this is increasingly expected for security-critical tooling.

11. No Commit Signing Enforcement

The PR title semantic check and branch protection are present, but there's no enforcement of signed commits (GPG/SSH), which is a supply-chain security best practice for a security product.

12. AI-Powered PR Workflows Are Not Required Status Checks

The security-guard.md and build-test.md agentic workflows run on PRs, but their outcomes may not be configured as required status checks in branch protection. These runs can conclude with `action_required` (the pending state for agentic workflows), which makes a failing or unfinished review easy to overlook.

📋 Actionable Recommendations

1. Raise Coverage Thresholds Per File (High Priority, Medium Complexity)

Add per-file coverage thresholds in jest.config.js using coverageThreshold with per-path overrides:

```js
// In jest.config.js, inside the exported config object:
coverageThreshold: {
  global: { statements: 38, branches: 30, functions: 35, lines: 38 },
  './src/docker-manager.ts': { statements: 30 },  // raise incrementally
  './src/cli.ts': { statements: 20 },              // start low, raise over time
},
```

Both per-file targets sit above today's coverage (18% and 0%), so each must land in the same PR as tests that reach it.

Point the existing test-coverage-improver agentic workflow (which already runs weekly) at docker-manager.ts and cli.ts, so that the PRs it files target those files first.

Impact: Ensures regressions in the most critical code paths are caught automatically.

2. Add Performance Benchmark to PR Workflow (High Priority, Low Complexity)

Add a pull_request trigger to performance-monitor.yml with a reduced iteration count (e.g., 5 instead of 30):

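```yaml
on:
  pull_request:
    branches: [main]
    paths:
      - 'src/**'
      - 'containers/**'
  schedule:
    - cron: "0 6 * * *"
  workflow_dispatch:  # keep the existing manual trigger

# In the benchmark step, use a smaller iteration count when
# github.event_name == 'pull_request' (e.g., 5 instead of 30).
```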

Impact: Catches startup time and container orchestration regressions before merge.

3. Add Container Image Scanning (High Priority, Low Complexity)

Add a container-scan job to build.yml or create a dedicated container-security.yml using Trivy:

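```yaml
# The SARIF upload needs `security-events: write` in the job's permissions.
- name: Scan agent container
  uses: aquasecurity/trivy-action@<sha>
  with:
    # On PRs, point this at the image tag built earlier in the job
    # rather than the published :latest image.
    image-ref: 'ghcr.io/github/gh-aw-firewall/agent:latest'
    format: 'sarif'
    output: 'trivy-agent.sarif'
    severity: 'CRITICAL,HIGH'

- name: Upload Trivy SARIF
  uses: github/codeql-action/upload-sarif@<sha>
  with:
    sarif_file: 'trivy-agent.sarif'
```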

Impact: Detects OS-level CVEs in the containers used to sandbox AI agents.
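With the scan step as written, findings are uploaded, but nothing fails the job unless the action is configured to do so. If a hard gate is wanted, a short post-processing step over the SARIF file is one option. The TypeScript below is a sketch that relies only on the standard SARIF 2.1.0 `runs[].results[].level` fields. The gate policy, function names, and sample data are hypothetical. How Trivy maps CRITICAL/HIGH onto SARIF levels should be checked before relying on this.

```typescript
// Minimal subset of the SARIF 2.1.0 log structure used here.
interface SarifResult {
  ruleId?: string;
  level?: "none" | "note" | "warning" | "error";
}
interface SarifLog {
  runs: { results?: SarifResult[] }[];
}

// Counts results per level across all runs. A result without a `level`
// is counted as "warning", SARIF's default absent other configuration.
function countByLevel(log: SarifLog): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const run of log.runs) {
    for (const result of run.results ?? []) {
      const level = result.level ?? "warning";
      counts[level] = (counts[level] ?? 0) + 1;
    }
  }
  return counts;
}

// Hypothetical gate: fail when any result is reported at "error" level.
function shouldFail(log: SarifLog): boolean {
  return (countByLevel(log)["error"] ?? 0) > 0;
}

const sample: SarifLog = {
  runs: [{ results: [{ ruleId: "example-1", level: "error" }, { ruleId: "example-2" }] }],
};
console.log(countByLevel(sample)); // one "error", one "warning"
console.log(shouldFail(sample));   // true
```

In a workflow step, the log could be loaded with `JSON.parse(fs.readFileSync("trivy-agent.sarif", "utf8"))` before calling `shouldFail`.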

4. Write `--block-domains` and `--env-all` Integration Tests (High Priority, High Complexity)

Create tests/integration/block-domains.test.ts and tests/integration/env-all.test.ts covering:

Deny-list prevents access to explicitly blocked domains, even if they are also allow-listed
--env-all passes arbitrary environment variables through, but not HTTP_PROXY/HTTPS_PROXY/SQUID_PROXY_* (see PROXY_ENV_VARS in src/upstream-proxy.ts)

Impact: Eliminates blind spots in two security-relevant features.
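The `--env-all` assertion can be framed as a pure function whose output an integration test compares against the environment the agent container actually sees. The TypeScript below is a hypothetical model of the expected behavior, not AWF's implementation. `passthroughEnv` and the exclusion constants are invented for illustration. A real test would use PROXY_ENV_VARS from src/upstream-proxy.ts instead, assuming it is exported.

```typescript
// Hypothetical stand-in for the exclusion rules. The real list is
// PROXY_ENV_VARS in src/upstream-proxy.ts; its exact contents are not
// reproduced here.
const EXCLUDED_NAMES = new Set(["HTTP_PROXY", "HTTPS_PROXY"]);
const EXCLUDED_PREFIXES = ["SQUID_PROXY_"];

// Models the expected --env-all behavior: copy every defined variable
// except those the firewall must control itself.
function passthroughEnv(env: Record<string, string | undefined>): Record<string, string> {
  const out: Record<string, string> = {};
  for (const [name, value] of Object.entries(env)) {
    if (value === undefined) continue;
    if (EXCLUDED_NAMES.has(name)) continue;
    if (EXCLUDED_PREFIXES.some((prefix) => name.startsWith(prefix))) continue;
    out[name] = value;
  }
  return out;
}

const hostEnv = {
  BUILD_MODE: "ci",
  HTTP_PROXY: "http://attacker.example:8080",
  SQUID_PROXY_PORT: "3128",
};
console.log(passthroughEnv(hostEnv)); // only BUILD_MODE survives
```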

5. Make the Documentation Build a Required Status Check (Medium Priority, Low Complexity)

Remove `continue-on-error: true` from the build step in docs-preview.yml. If the preview workflow must stay non-blocking, at minimum add a dedicated docs-build job that fails hard and mark it as required:

```yaml
- name: Build documentation
  run: cd docs-site && npm run build  # no continue-on-error: a failed build fails the job
```

Impact: Prevents broken documentation from shipping with releases.

6. Add Dist Bundle Size Check (Medium Priority, Low Complexity)

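Add a step in build.yml that fails when dist/cli.js exceeds a size budget. The simplest version is a fixed ceiling. A stricter version compares against a baseline, such as the base branch's build, and fails if the bundle grows by more than a threshold (e.g., 20%). A fixed-ceiling step:

```bash
# Example budget; tune it to the current bundle size.
MAX_BYTES=500000
DIST_SIZE=$(wc -c < dist/cli.js)
echo "dist/cli.js size: ${DIST_SIZE} bytes"
if [ "$DIST_SIZE" -gt "$MAX_BYTES" ]; then
  echo "::error::dist/cli.js exceeds size limit (${DIST_SIZE} > ${MAX_BYTES} bytes)"
  exit 1
fi
```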

Impact: Catches accidental dependency bloat.
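The baseline-relative variant described above can be sketched as a small TypeScript helper. `checkBundleGrowth` and the sample sizes are hypothetical. The baseline could come from building the base branch in the same job or from a stored value, for example on a history branch like the one performance-monitor.yml already maintains.

```typescript
interface SizeCheck {
  growthPct: number; // relative change vs. baseline, in percent
  ok: boolean;       // true when growth is within the allowed budget
}

// Baseline-relative budget: compare the PR's bundle against the base
// branch's and reject growth above maxGrowthPct (20% mirrors the example above).
function checkBundleGrowth(baselineBytes: number, currentBytes: number, maxGrowthPct = 20): SizeCheck {
  if (baselineBytes <= 0) throw new Error("baseline size must be positive");
  const growthPct = ((currentBytes - baselineBytes) / baselineBytes) * 100;
  return { growthPct, ok: growthPct <= maxGrowthPct };
}

// Hypothetical sizes in bytes.
console.log(checkBundleGrowth(400000, 470000)); // growth of about 17.5%, ok
console.log(checkBundleGrowth(400000, 490000)); // growth of about 22.5%, not ok
```

In CI, both sizes could be read with `fs.statSync("dist/cli.js").size` on each build. A relative budget tolerates steady, intentional growth better than a fixed ceiling, at the cost of needing a baseline.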

7. Update INTEGRATION-TESTS.md Heat Map (Low Priority, Low Complexity)

The heat map was last updated February 2026. test-integration-suite.yml now covers domain, network, protocol/security, container ops, and API proxy integration tests. The "CI" column should be updated for these categories from ❌ to ✅.

Impact: Accurate documentation for contributors.

8. Generate SBOM on Release (Low Priority, Medium Complexity)

Add SBOM generation to release.yml using anchore/sbom-action:

```yaml
- name: Generate SBOM
  uses: anchore/sbom-action@<sha>
  with:
    image: ghcr.io/github/gh-aw-firewall/agent:${{ github.ref_name }}
    format: spdx-json
    artifact-name: sbom-agent.spdx.json
```

Impact: Supply-chain transparency and compliance readiness.

📈 Metrics Summary

| Metric | Value |
|---|---|
| Total CI/CD workflows | 47 (19 agentic, 28 standard) |
| Agentic workflows compiled | 28/28 ✅ |
| Workflows running on every PR | ~16 workflows |
| Unit test coverage (statements) | 38.4% overall; `docker-manager.ts`: 18%, `cli.ts`: 0% |
| Integration test files | 26 files, ~265 tests |
| Languages tested via build-test | 8 (Bun, C++, Deno, .NET, Go, Java, Node, Rust) |
| Container images scanned for CVEs | 0 of 3 ❌ |
| Security SAST tools active | CodeQL (JS/TS + Actions), ESLint custom rules |
| Dependency vulnerability coverage | Main package + docs-site ✅ |
| Performance regression on PR | ❌ (daily only) |
| Smoke tests auto-run on PR | ❌ (reaction-gated) |

Key Risk Areas

docker-manager.ts at 18% coverage — The core container lifecycle manager has 25 functions of which 24 have zero test coverage. This is the highest-risk gap for a security product.
Performance benchmarks not on PRs — Regressions can slip in undetected between daily runs.
No container image OS-level scanning — Agent containers run on ubuntu:22.04 base which can accumulate CVEs between releases.

Generated by CI/CD Pipelines and Integration Tests Gap Assessment · ● 1.5M

expires on Apr 22, 2026, 11:11 PM UTC

2026-04-16T00:37:00Z

github-actions[bot]
bot Apr 16, 2026
Author

🔮 The ancient spirits stir, and the firewall runes glow.
The smoke-test agent has walked this circle and marked it true.
By moonlit logs and sealed ports, this chamber was visited.

🔮 The oracle has spoken through Smoke Codex


2026-04-16T00:57:53Z

github-actions[bot]
bot Apr 16, 2026
Author

🔮 The ancient spirits stir in the firewall winds.
The oracle marks this chamber: the smoke-test agent has walked here.
May guarded paths remain true, and failing omens be swiftly mended.

🔮 The oracle has spoken through Smoke Codex


2026-04-16T01:24:20Z

github-actions[bot]
bot Apr 16, 2026
Author

🔮 The ancient spirits stir above the firewall’s wards.
This oracle marks that the smoke-test agent has walked these halls.
May guarded pathways remain true, and forbidden roads stay sealed.
The sigil is set.

🔮 The oracle has spoken through Smoke Codex

