[CI/CD Assessment] CI/CD Pipelines and Integration Tests Gap Assessment #1698
This discussion was automatically closed because it expired on 2026-04-13T12:56:11.181Z.
📊 Current CI/CD Pipeline Status
The repository has a mature and layered CI/CD setup with 72 total workflow files (45 standard YAML + 27 Markdown-based agentic workflows). PR quality gates are generally healthy, with one notable broken gate.
Overall health: 🟡 Good — strong coverage breadth, but meaningful gaps in depth.
✅ Existing Quality Gates
The following checks run automatically on every PR targeting `main`:
- `lint.yml`
- `test-integration.yml`
- `build.yml`
- `test-coverage.yml`
- `test-integration-suite.yml`
- `test-chroot.yml`
- `test-examples.yml`
- `test-action.yml`
- `pr-title.yml`
- `codeql.yml`
- `dependency-audit.yml`
- `link-check.yml` (`*.md` only)
- `security-guard.lock.yml`
- `build-test.lock.yml`
- `smoke-*.lock.yml`
A type check (`tsc --noEmit`) is among these gates.
Scheduled / ongoing:
- `performance-monitor.yml`
🔍 Identified Gaps
🔴 High Priority
1. 8 Integration Tests Not Included in Any CI Workflow
The following test files exist in `tests/integration/` but are not referenced by any workflow job's `--testPathPatterns`:
- `api-proxy-observability.test.ts`
- `api-proxy-rate-limit.test.ts`
- `api-target-allowlist.test.ts`
- `chroot-capsh-chain.test.ts`
- `chroot-copilot-home.test.ts`
- `gh-host-injection.test.ts`
- `ghes-auto-populate.test.ts`
- `host-tcp-services.test.ts`
- `workdir-tmpfs-hiding.test.ts`
These are never executed in CI, meaning regressions in API proxy observability, token rate limiting, GitHub host injection prevention, GHES support, host TCP service access, tmpfs workdir hiding, and capsh privilege-drop chain can be merged undetected.
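A gap like this can be caught mechanically. The sketch below assumes the test file names and the patterns passed to `--testPathPatterns` have already been collected into arrays (extracting them from the workflow YAML is left out), and it treats each pattern as a regular expression, which is how Jest interprets test path patterns. `findUncoveredTests` is a hypothetical helper, not something that exists in the repository.

```typescript
// Hypothetical helper: report test files that no CI pattern would select.
// Each pattern is treated as a regular expression matched against the file path.
function findUncoveredTests(testFiles: string[], ciPatterns: string[]): string[] {
  const matchers = ciPatterns.map((pattern) => new RegExp(pattern));
  return testFiles.filter((file) => !matchers.some((re) => re.test(file)));
}

// Example input: two files from the list above and one illustrative pattern.
const uncovered = findUncoveredTests(
  [
    "tests/integration/api-proxy-rate-limit.test.ts",
    "tests/integration/chroot-capsh-chain.test.ts",
  ],
  ["chroot-"], // assumed pattern, for illustration only
);
console.log(uncovered);
```

With the illustrative `chroot-` pattern, only the rate-limit test is reported as uncovered. Running a check of this kind in CI would turn a silently skipped test file into a visible failure.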
2. Dependency Vulnerability Audit Consistently Failing on PRs
Recent PR workflow data shows `Dependency Vulnerability Audit` with a 0% pass rate (2 failures out of 2 recent PR runs). This security gate is broken, meaning npm vulnerabilities could merge undetected. This is a critical regression.
3. Coverage Thresholds Are Critically Low for a Security Tool
Current enforced thresholds: 38% statements, 30% branches, 35% functions. For a security-critical infrastructure tool (network firewall), these are far below industry standards. Crucially:
- `cli.ts` (main entry point): 0% coverage
- `docker-manager.ts` (orchestration): 18% coverage, 4% function coverage
Any refactoring or logic change in these files gets little or no test signal.
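To make the per-file risk concrete, the sketch below compares a single file's coverage against the global thresholds quoted above. The `Coverage` type and the `failingMetrics` helper are illustrative names, and reading "18% coverage" for `docker-manager.ts` as statement coverage is an assumption; the assessment does not say which metric that figure refers to.

```typescript
type Coverage = { statements: number; branches: number; functions: number };

// Global thresholds as reported in this assessment (percent).
const globalThresholds: Coverage = { statements: 38, branches: 30, functions: 35 };

// Hypothetical helper: list the metrics where a file falls below the thresholds.
// Metrics missing from `actual` are skipped rather than guessed.
function failingMetrics(actual: Partial<Coverage>, required: Coverage): (keyof Coverage)[] {
  const metrics = Object.keys(required) as (keyof Coverage)[];
  return metrics.filter((metric) => {
    const value = actual[metric];
    return value !== undefined && value < required[metric];
  });
}

// docker-manager.ts figures from above; "18%" is assumed to mean statements.
console.log(failingMetrics({ statements: 18, functions: 4 }, globalThresholds));
```

Jest applies `global` thresholds to the aggregate, so a file like this can fall far below every threshold while the gate still passes. Per-path entries in Jest's `coverageThreshold` option are one way to close that hole.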
🟡 Medium Priority
4. No Shell Script Static Analysis (ShellCheck)
The repository contains 20+ shell scripts in `containers/agent/`, `containers/squid/`, `scripts/ci/`, and `examples/`. None of these are linted by any CI check. Shell scripts are a high-risk surface area (`entrypoint.sh`, `setup-iptables.sh`, `cleanup.sh`), and bugs here have security implications.
5. No Dockerfile Linting (Hadolint)
Three Dockerfiles exist (`containers/agent/`, `containers/squid/`, `containers/api-proxy/`). No static analysis step checks them for best practices, layer ordering, or known antipatterns (e.g., `apt-get` without `--no-install-recommends`, missing `HEALTHCHECK`, unpinned base images).
6. Performance Benchmarks Not PR-Gated
`performance-monitor.yml` runs weekly but not on PRs. A PR that significantly increases container startup time, domain resolution latency, or proxy overhead will not be caught before merge. The benchmark infrastructure already exists; it just isn't triggered on PRs.
7. Smoke Tests Require Manual Triggers for 3 of 5 Agents
- `smoke-claude`: requires ❤️ reaction
- `smoke-codex`: requires 🎉 reaction
- `smoke-copilot`: requires 👀 reaction
- `smoke-chroot`: path-based (runs automatically)
- `smoke-services`: automatic (runs on all PRs)
Real agent smoke tests for the three major engines don't run automatically on every PR. If a PR breaks agent invocation or credential injection, it may be merged before anyone adds a reaction.
8. Documentation Link Check Not Triggered on Code-Only PRs
`link-check.yml` only runs when `*.md` files change. A PR that renames a file, deletes a section, or changes an anchor can silently break doc links if no markdown is modified.
🟢 Low Priority
9. Secret Digger Workflows Failing Consistently
The hourly `secret-digger-claude` and `secret-digger-copilot` workflows have a 100% failure rate in recent runs. While this is not a PR gate, these runs are intended to proactively detect leaked credentials. They are currently providing no value.
10. No SBOM (Software Bill of Materials) Generation
No workflow generates a CycloneDX or SPDX SBOM for the published container images or npm package. This is increasingly required for supply chain security compliance (SLSA, NTIA guidelines).
11. No License Compliance Check
With 100+ transitive npm dependencies, no workflow validates that all dependency licenses are compatible with the project's license. Such a check is common in open-source security tools.
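The core of such a check is small. The sketch below assumes dependency license metadata has already been collected (for example from each package's `package.json` `license` field) and that the project keeps an allowlist of SPDX identifiers. The allowlist and the dependency names shown are illustrative, not the project's actual policy or dependencies.

```typescript
type Dependency = { name: string; license: string };

// Illustrative allowlist of SPDX identifiers; the real policy is an assumption.
const allowedLicenses = new Set(["MIT", "Apache-2.0", "BSD-3-Clause", "ISC"]);

// Hypothetical helper: return dependencies whose license is not on the allowlist.
function findLicenseViolations(deps: Dependency[], allowed: Set<string>): Dependency[] {
  return deps.filter((dep) => !allowed.has(dep.license));
}

// Made-up dependency data, for illustration only.
const violations = findLicenseViolations(
  [
    { name: "example-a", license: "MIT" },
    { name: "example-b", license: "GPL-3.0-only" },
  ],
  allowedLicenses,
);
console.log(violations.map((dep) => dep.name));
```

A CI job built on this idea would fail when the list of violations is non-empty.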
12. No PR Size Gate
Very large PRs (1000+ line changes) are not flagged or blocked. This makes review quality harder to maintain for a security-critical project.
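A size gate reduces to a threshold on changed lines. The sketch below uses the 1000-line figure mentioned above; the function name, the input shape, and the choice to sum additions and deletions are all assumptions.

```typescript
type PrStats = { additions: number; deletions: number };

// Hypothetical gate: flag a PR when total changed lines reach the limit.
function isOversizedPr(stats: PrStats, maxChangedLines = 1000): boolean {
  return stats.additions + stats.deletions >= maxChangedLines;
}

// Made-up PR statistics, for illustration only.
console.log(isOversizedPr({ additions: 850, deletions: 200 })); // 1050 changed lines
console.log(isOversizedPr({ additions: 120, deletions: 30 })); // 150 changed lines
```

Whether an oversized PR is blocked or only labeled is a policy choice; a label is the gentler starting point.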
📋 Actionable Recommendations
1. Add Missing Integration Tests to CI (High | Low complexity)
Add the 8 uncovered test patterns to the appropriate workflow jobs in `test-integration-suite.yml` and `test-chroot.yml`.
2. Fix or Isolate the Failing Dependency Audit (High | Low complexity)
Investigate whether the audit failures are high/critical CVEs in prod dependencies or false positives in dev dependencies. Options:
- `--production` flag to scope the audit to runtime dependencies only (newer npm releases spell this `--omit=dev`), or
- `--ignore-scripts` + specific advisory exceptions via `.nsprc`
3. Raise Coverage Thresholds Incrementally (High | Medium complexity)
Increase thresholds by 5 percentage points per quarter and prioritize `cli.ts` and `docker-manager.ts`. Immediate target: 50% statements, 45% branches. Long-term target for a security tool: ≥70% statements. Add the raised thresholds to `jest.config.js`.
4. Add ShellCheck for Shell Scripts (Medium | Low complexity)
Add to `lint.yml` alongside ESLint.
5. Add Hadolint for Dockerfile Linting (Medium | Low complexity)
Add to `build.yml` or a new `lint-containers.yml`.
6. Add PR-Triggered Performance Regression Check (Medium | Medium complexity)
Add a lightweight subset of benchmarks to PRs (e.g., container startup time only). Use a threshold of +20% regression as a warning, not a hard failure, to avoid flakiness.
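The warn-only threshold could look like the sketch below. It assumes a baseline measurement is available from an earlier run on `main`; the function name, the verdict type, and the millisecond inputs are illustrative.

```typescript
type BenchmarkVerdict = "ok" | "warn";

// Hypothetical check: warn when the current timing exceeds the baseline
// by more than the allowed fraction (0.2 = +20%). It never fails the build.
function checkRegression(baselineMs: number, currentMs: number, maxIncrease = 0.2): BenchmarkVerdict {
  if (baselineMs <= 0) {
    throw new Error("baselineMs must be positive");
  }
  const increase = (currentMs - baselineMs) / baselineMs;
  return increase > maxIncrease ? "warn" : "ok";
}

// Made-up container startup timings, for illustration only.
console.log(checkRegression(1000, 1150)); // +15% over baseline
console.log(checkRegression(1000, 1300)); // +30% over baseline
```

Returning a verdict instead of throwing keeps the decision about how to surface a warning (PR comment, annotation, label) separate from the measurement itself.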
7. Make Smoke Tests Automatic on PR (Medium | Low complexity)
Remove the reaction requirements from `smoke-claude.md`, `smoke-codex.md`, and `smoke-copilot.md`, or change them to run automatically on PRs from maintainers (using `roles: maintainer`). The current reaction gate adds human latency to catching agent regressions.
8. Investigate and Fix Secret Digger Failures (Low | Low complexity)
Run `agenticworkflows-audit` on a recent Secret Digger run to identify whether failures are credential, model, or logic issues. These scans provide real security value when working.
📈 Metrics Summary
Assessment generated by CI/CD Pipelines and Integration Tests Gap Assessment workflow — run ID 24032352554.