AGENTS.md

Guide for AI agents and LLMs working in this repository. Read this before exploring the codebase or proposing changes.

What this repo is

docker/docker-agent-action — a GitHub Action (and a family of sub-actions) that runs Docker Agent AI agents inside GitHub Actions workflows. It is published to the GitHub Marketplace and consumed by other repos as uses: docker/docker-agent-action@vX.Y.Z.

The repo ships three things:

Root composite action (action.yml) — downloads the docker-agent binary, optionally installs mcp-gateway, validates inputs, runs the agent securely (auth checks, prompt injection detection, secret-leak scanning), and exposes outputs.
review-pr/ — a higher-level composite action and reusable workflow (.github/workflows/review-pr.yml) that orchestrates a multi-agent PR review pipeline (drafter → verifier → poster) with a learning loop driven by reviewer feedback.
TypeScript helpers in src/ — bundled to dist/*.js and invoked by internal sub-actions (e.g., setup-credentials, security primitives, signed commits via the GitHub API).

Anything else here (workflows under .github/workflows/, scripts, tests) exists to develop, test, release, or self-test these three artifacts.

Repo layout

.
├── action.yml                       # ← Root action ("Docker Agent Runner"). Composite. Source of truth for inputs/outputs.
├── DOCKER_AGENT_VERSION             # Pinned docker-agent version (currently v1.54.0). Read at runtime by action.yml.
├── package.json                     # pnpm workspace root. Scripts: build, test, lint, format, actionlint.
├── tsup.config.ts                   # Bundles src/<name>/index.ts → dist/<name>.js (ESM, Node 24, fully bundled).
├── tsconfig.json                    # TS config. rootDir=src, target ES2024, strict.
├── vitest.config.ts                 # Two projects: "unit" and "integration".
├── biome.json                       # Formatter + linter (Biome). 100 char width, 2 spaces, single quotes, semicolons.
│
├── src/
│   ├── add-reaction/                # Adds emoji reactions to issue/PR comments.
│   │   ├── index.ts                 # Entry → bundled to dist/add-reaction.js
│   │   └── __tests__/
│   ├── check-org-membership/        # Authorizes a review: auto-run on PR-author membership, review_requested on the (trusted, timeline-derived) requester. Resolves PR author via pulls.get.
│   │   ├── index.ts                 # Entry → bundled to dist/check-org-membership.js (standalone CLI + library).
│   │   └── __tests__/
│   ├── credentials/                 # Fetches AWS secrets via OIDC, exports PAT and AI keys.
│   │   ├── index.ts                 # Entry → bundled to dist/credentials.js
│   │   ├── ai-keys.ts
│   │   ├── aws-credentials.ts
│   │   ├── github-app.ts            # Reads docker-agent-action/github-app from Secrets Manager; exports GITHUB_APP_TOKEN (a PAT) + ORG_MEMBERSHIP_TOKEN.
│   │   └── __tests__/
│   ├── filter-diff/                 # Strips excluded-path sections from a unified diff.
│   │   ├── index.ts                 # CLI entry → bundled to dist/filter-diff.js
│   │   ├── filter-diff.ts           # Core filterDiff() pure function + applyFilter() I/O wrapper.
│   │   └── __tests__/
│   ├── score-confidence/            # Per-finding confidence scoring for the PR review pipeline.
│   │   ├── index.ts                 # CLI entry → bundled to dist/score-confidence.js
│   │   ├── score-confidence.ts      # Core scoreFinding()/scoreFindings() pure functions + posting policy.
│   │   │                            #   Source of truth for the model mirrored in pr-review.yaml.
│   │   └── __tests__/
│   ├── score-risk/                  # Per-file risk scoring for the PR review pipeline.
│   │   ├── index.ts                 # CLI entry → bundled to dist/score-risk.js
│   │   ├── score-risk.ts            # Core scoreFiles() pure function.
│   │   └── __tests__/
│   ├── get-pr-meta/                 # Fetches PR metadata (title, body, author, base branch) used by review-pr.
│   │   ├── index.ts                 # Entry → bundled to dist/get-pr-meta.js
│   │   └── __tests__/
│   ├── mention-reply/               # Handles @docker-agent mention events: parses context, verifies org membership, builds prompt.
│   │   ├── index.ts                 # Entry → bundled to dist/mention-reply.js
│   │   └── __tests__/
│   ├── post-comment/                # Posts comments to PRs/issues.
│   │   ├── index.ts                 # Entry → bundled to dist/post-comment.js
│   │   └── __tests__/
│   ├── security/                    # Security primitives consumed by action.yml.
│   │   ├── index.ts                 # CLI dispatcher → bundled to dist/security.js.
│   │   │                            #   Subcommands: check-auth <association> <allowed-roles-json>
│   │   │                            #                sanitize-input <inputPath> <outputPath>
│   │   │                            #                sanitize-output <filePath>
│   │   ├── check-auth.ts            # author_association-based authorization.
│   │   ├── sanitize-input.ts        # Detects prompt injection patterns. Sets risk-level output.
│   │   ├── sanitize-output.ts       # Scans agent output for leaked API keys / tokens.
│   │   ├── patterns.ts              # Single source of truth for SECRET_PATTERNS, SECRET_PREFIXES, CRITICAL_PATTERNS.
│   │   └── __tests__/security.test.ts  # Vitest unit tests (replaces former test-security.sh / test-exploits.sh).
│   └── signed-commit/               # CLI tool that creates verified commits via GitHub's GraphQL API.
│       ├── index.ts                 # Entry → bundled to dist/signed-commit.js
│       ├── signed-commit.ts
│       └── __tests__/
│
├── review-pr/                       # PR-review action + agents.
│   ├── action.yml                   # Composite: orchestrates diff fetching, chunking, risk scoring, review, learning.
│   ├── README.md                    # User-facing docs for the PR review feature.
│   ├── reply/action.yml             # Sub-action: replies to feedback on review comments.
│   └── agents/
│       ├── pr-review.yaml           # Root reviewer agent (docker-agent YAML).
│       ├── pr-review-feedback.yaml  # Processes captured feedback into memory.
│       ├── pr-review-mention-reply.yaml  # Handles @docker-agent mention-reply responses.
│       ├── pr-review-reply.yaml     # Replies in-thread to reviewer comments.
│       ├── refs/                    # Reference docs passed to agents (posting format, code-review style).
│       └── evals/                   # docker-agent eval JSON files (success-*, security-*, marlin-*, etc.).
│
├── setup-credentials/               # Composite action: fetches AWS creds via OIDC, exports GITHUB_APP_TOKEN +
│   └── action.yml                   #   ORG_MEMBERSHIP_TOKEN. At root so consumers can use
│                                    #   docker/docker-agent-action/setup-credentials@VERSION directly.
│                                    #   Also exports DOCKER_AGENT_ACTION_ROOT (repo root of the downloaded action copy)
│                                    #   for subsequent run: steps that need to invoke dist/ bundles.
│
├── .github/
│   ├── actions/
│   │   └── mention-reply/           # Internal-only JS action (node24). main = dist/mention-reply.js.
│   │       └── action.yml           #   Only used by review-pr.yml; not intended for external consumers.
│   ├── workflows/                   # CI + self-test + release workflows (see "Workflows" below).
│   └── CODEOWNERS
│
├── scripts/
│   ├── act-local.sh                 # Helper for running workflows locally with `act`.
│   └── debug-permissions.ts
│
├── .agents/
│   └── skills/
│       └── add-pr-reviewer-to-repo/
│           └── SKILL.md             # Skill: set up or upgrade a repo to use the PR reviewer reusable workflow.
│
└── tests/                           # Shell-based integration tests for action.yml bash logic.
    ├── test-job-summary.sh
    ├── test-output-extraction.sh
    ├── out.diff                      # Fixture used by test-output-extraction.sh
    └── test.diff                    # Fixture used by test-output-extraction.sh

Critical conventions

Versioning & releases

This action is consumed via uses: docker/docker-agent-action@vX.Y.Z. The committed dist/ directory is the runtime artifact that consumers download — it must be checked in for tagged releases.
DOCKER_AGENT_VERSION is the single source of truth for the docker-agent binary version. action.yml reads it with cat. Update via .github/workflows/update-docker-agent-version.yml.
Internal uses: references to this action (e.g. review-pr/action.yml → docker/docker-agent-action@<sha>) are pinned to commit SHAs with version comments, not tags. Bumping requires updating both the SHA and the comment.

TypeScript / `src` rules

Only src/<name>/index.ts files listed in the explicit entry map in tsup.config.ts are bundled to dist/<name>.js. To add a new action entrypoint, create src/<name>/index.ts and add it to the entry map in tsup.config.ts. Pure library modules that are only imported by other actions (e.g. add-reaction, check-org-membership, get-pr-meta, post-comment) should not be added to the entry map — they get bundled into their consumer automatically.
New logic in composite actions must be implemented as TypeScript in src/ with Vitest unit tests — not as inline bash, awk, or other scripting languages embedded in YAML files. Shell steps in action YAML files should only orchestrate calls to dist/*.js tools (e.g. node "$ACTION_PATH/dist/filter-diff.js" pr.diff "$EXCLUDE_PATHS"). This keeps business logic testable, type-safe, and auditable outside the YAML layer.
tsup runs with noExternal: [/.*/] — all npm dependencies are bundled in. Do not assume node_modules exists at runtime.
Target is node24, ESM only, Node platform (so AWS SDK uses the Node export, not browser).
Sourcemaps are intentionally disabled (consumers clone dist/; sourcemaps would bloat every checkout).
Use .js extension in relative imports (import { x } from './foo.js') — required by Node16 module resolution even though the source is .ts.
A createRequire banner is injected by tsup.config.ts so CJS dependencies bundled into ESM (e.g. tunnel via @actions/http-client) can require('net') etc. at runtime. The banner uses import.meta.url and is ESM-only — if format is ever extended to include 'cjs', move the banner to a format-specific entry to avoid a parse error.

Linting / formatting

Biome (biome.json) handles both formatting and linting. Run pnpm format to fix, pnpm lint to check.
pnpm lint runs three things in CI parity: biome ci ., tsc --noEmit, actionlint.
actionlint validates all *.yml workflow files. It runs after pnpm build because the build emits dist/ files referenced by some actions. If you change a workflow, run pnpm actionlint locally.
Biome config: 100-col line width, 2-space indent, single quotes, semicolons always, trailing commas everywhere.

Tests

pnpm test — Vitest "unit" project (src/**/__tests__/**/*.test.ts).
pnpm test:integration — Vitest "integration" project (*.integration.test.ts).
tests/*.sh are integration tests for the shell logic inside action.yml (output extraction, job summary, etc.). Run them when changing the bash blocks of action.yml.
Security unit tests live in src/security/__tests__/security.test.ts (Vitest) and run as part of pnpm test. Run them when changing anything under src/security/.
The PR review agent has a separate eval suite under review-pr/agents/evals/. Run with docker agent eval review-pr/agents/pr-review.yaml review-pr/agents/evals/.

Security-first design (do not regress)

The action runs untrusted input (PR titles, bodies, comments, diffs) through an LLM with credentials. Several mitigations are non-negotiable:

No eval in any bash block. Argument arrays + quoted expansion only. If you find yourself wanting eval "$EXTRA_ARGS", stop and use read -ra.
All API keys are explicit inputs. action.yml's "Validate inputs" step rejects runs with no provider key. Do not add a fallback to env vars.
All secret values are masked with ::add-mask:: before any other step can log them.
Authorization for comment-triggered events is enforced in four tiers: skip-auth (caller already verified) → trusted-bot PAT bypass (resolves the github-token input to its GitHub login via gh api /user; if it matches the comment author's login, auto-authorize — handles machine-user PAT bots whose account type may be "User", not "Bot") → org-membership-token (preferred, queries /orgs/:org/members/:user) → author_association (legacy fallback, unreliable for pull_request_review_comment). Don't remove tiers; add new ones above the fallback.
Output sanitization (node "$ACTION_PATH/dist/security.js" sanitize-output) runs on every agent invocation — if it detects a leaked secret it opens a security incident issue and fails the run. Keep this on the if: always() path.
Prompt sanitization writes to /tmp/prompt-clean.txt; the runner prefers this file over the raw $PROMPT_INPUT. Don't bypass it.
The full threat model commentary lives in this file (the security/ shell scripts it was previously co-located with no longer exist; the logic has moved to src/security/).

`review-pr` action specifics

Uses a best-effort cache lock (pr-review-lock-<repo>-<pr>-* cache key) to avoid concurrent reviews on the same PR. Lock TTL is 600s; the agent execution timeout is 1800s (30 min) — these are intentionally decoupled. Reviews are idempotent so the small race window is acceptable.
Memory persistence uses actions/cache keyed by pr-review-memory-<repo>-<job>-<run_id> with prefix-based restore. The DB lives at ${{ github.workspace }}/.cache/pr-review-memory.db.
Feedback loop: the reply-to-feedback job in .github/workflows/review-pr.yml (which runs the pr-review-reply.yaml agent) uploads a pr-review-feedback artifact on every reply via its "Upload feedback artifact" step. The next review run downloads all such artifacts, runs pr-review-feedback.yaml to call add_memory(...) for each, then deletes the artifacts.
Bot reply detection uses HTML markers:  on review comments,  on agent replies (including mention-reply responses). Don't change these strings — workflows in consumer repos grep for them.
Copilot-style triggers: in addition to the original pull_request_review / issue_comment /review paths, review-pr.yml now also fires on:
- pull_request action review_requested when github.event.requested_reviewer.login == 'docker-agent'
- @docker-agent mentions on PR/issue comments — these run the .github/actions/mention-reply handler (sets should-reply and builds the context prompt) and then the review-pr/mention-reply sub-action (referenced from a pinned SHA, not present as a local path on every commit). The pr-review-mention-reply.yaml agent handles the actual reply.
Diffs over 1500 lines are chunked at file boundaries in review-pr/action.yml (see "Split diff into chunks"). Per-file risk scoring (security paths, line counts, error-handling patterns) prioritizes verifier attention.
Per-finding confidence scoring assigns each verified finding a precise 0–100 score (band: strong/moderate/weak/negligible) from the verifier's verdict, evidence_strength, and context_completeness, plus drafter↔verifier severity concordance and scope. src/score-confidence/score-confidence.ts is the single source of truth for the model (weights, bands, threshold, posting policy); the "Confidence Scoring" section of review-pr/agents/pr-review.yaml mirrors it as a strict lookup table so the orchestrator can apply it inline (the gitignored dist/ is not available at agent runtime). Change one, change both — the unit tests pin every value. Security and high-severity CONFIRMED/LIKELY findings are always posted regardless of score; below-threshold findings are surfaced in a summary rather than silently dropped. The inline-posting cutoff is configurable via the confidence-threshold action input (a band name or a number clamped to 30–100, default moderate = 55); the action resolves it by invoking the bundled dist/score-confidence.js resolve-threshold CLI (so the resolution logic stays in TypeScript, not bash) and injects it into the agent prompt, and scoreFinding/scoreFindings accept a matching postThreshold option.
Stale review threads on lines no longer in the diff are auto-resolved via GraphQL resolveReviewThread. Threads with no  marker are never touched.

Workflows (`.github/workflows/`)

Workflow	Purpose
`test.yml`	Unit + integration tests on push/PR.
`test-e2e.yml`	End-to-end action invocation against a real agent.
`release.yml`	Publishes tagged releases (must include a built `dist/`).
`review-pr.yml`	Reusable workflow consumers call as `docker/docker-agent-action/.github/workflows/review-pr.yml@v…`.
`self-review-pr.yml` + `-trigger.yml`	Dogfooding: the repo reviews its own PRs.
`reply-to-feedback.yml`	Handles replies to bot review comments.
`pr-describe.yml`	Generates PR descriptions from diffs.
`security-scan.yml`	Periodic security scanning.
`update-docker-agent-version.yml`	Bumps `DOCKER_AGENT_VERSION` automatically.
`update-consumers.yml`	Pushes version updates to downstream consumer repos.
`migrate-consumers.yml`	Consumer migration to the new repo: opens PRs across consumer repos rewriting `docker/cagent-action` refs to `docker/docker-agent-action` (incremental, no deadline — old repo stays live; dry-run by default, `repos` allowlist for pilots).
`manual-test-pirate-agent.yml`	Manual smoke test with a toy agent.

Common tasks (cheat sheet)

# Install (uses pnpm via Corepack, see packageManager in package.json)
pnpm install --frozen-lockfile

# Build TypeScript bundles → dist/
pnpm build

# Type-check only
pnpm typecheck

# Unit tests (includes src/security/__tests__)
pnpm test

# Integration tests (Vitest)
pnpm test:integration

# Shell-based integration tests for action.yml bash logic
bash tests/test-job-summary.sh
bash tests/test-output-extraction.sh

# Format + lint (write fixes)
pnpm format

# Strict CI check (Biome + tsc + actionlint). Run before every commit.
pnpm lint

# Run an eval suite for the PR-review agent
docker agent eval review-pr/agents/pr-review.yaml review-pr/agents/evals/ \
  -e GITHUB_TOKEN -e GH_TOKEN

Editing checklist

When you change something, verify:

Did you change action.yml inputs/outputs? Update README.md's input table and (if relevant) review-pr/action.yml consumers.
Did you add/remove a src/<name>/index.ts? dist/ will change after pnpm build. Commit it for tagged releases (CI does this on release.yml; for PRs, build is verified but dist/ may be ignored — check .gitignore).
Did you change a bash block in any action.yml? Run pnpm actionlint and the relevant tests/*.sh.
Did you change anything under src/security/? Re-run pnpm test (covers src/security/__tests__/security.test.ts) and confirm the threat model above is still covered.
Did you bump a pinned uses: SHA? Update the trailing version comment too.
Did you change a  marker, an output name, or an env var name? Search the repo (and consumer documentation) for references first — these are public contracts.

Things to avoid

Don't add eval to any shell snippet. Use bash arrays.
Don't depend on node_modules being present at action runtime. Add new packages to package.json and let tsup bundle them.
Don't introduce env-var fallbacks for API keys — explicit inputs only.
Don't remove if: always() from sanitize-output / upload-artifact / summary steps.
Don't commit changes to review-pr/agents/.cache/*.db* files (they're local memory artifacts).
Don't rename markers (, ) without a versioned migration plan.
Don't loosen authorization checks — comment-triggered events are the primary abuse vector for this action.

Agent skills

Reusable, task-specific how-to guides for AI agents are kept in .agents/skills/. Each skill is a single SKILL.md file with a YAML frontmatter block (name + description) followed by step-by-step instructions.

Skill	Description
`add-pr-reviewer-to-repo`	Set up or upgrade a consuming repo to use `docker/docker-agent-action/.github/workflows/review-pr.yml`. Covers 1-workflow vs 2-workflow (fork) patterns, trigger mode selection, VERSION pinning, upgrade checklist, and common troubleshooting.

When asked to onboard a new repo (or upgrade an existing one) to the PR reviewer, load the add-pr-reviewer-to-repo skill before starting.

Where to look for more context

User-facing docs: README.md (root action), review-pr/README.md (PR review feature).
Contributing rules: CONTRIBUTING.md.
Code of conduct: CODE_OF_CONDUCT.md.
License: Apache 2.0 (LICENSE).

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

AGENTS.md

What this repo is

Repo layout

Critical conventions

Versioning & releases

TypeScript / `src` rules

Linting / formatting

Tests

Security-first design (do not regress)

`review-pr` action specifics

Workflows (`.github/workflows/`)

Common tasks (cheat sheet)

Editing checklist

Things to avoid

Agent skills

Where to look for more context

Uh oh!

FilesExpand file tree

AGENTS.md

Latest commit

History

AGENTS.md

File metadata and controls

AGENTS.md

What this repo is

Repo layout

Critical conventions

Versioning & releases

TypeScript / src rules

Linting / formatting

Tests

Security-first design (do not regress)

review-pr action specifics

Workflows (.github/workflows/)

Common tasks (cheat sheet)

Editing checklist

Things to avoid

Agent skills

Where to look for more context

TypeScript / `src` rules

`review-pr` action specifics

Workflows (`.github/workflows/`)