getsentry · dcramer · May 7, 2026
diff --git a/README.md b/README.md
@@ -20,8 +20,9 @@ Your code is under new management. Agents that review your code - locally or on
 # Initialize warden in your repository
 npx @sentry/warden init
 
-# Add the built-in baseline security check
+# Add the built-in baseline reviews
 npx @sentry/warden add security-review
+npx @sentry/warden add code-review
 
 # Run a pre-review on current branch changes
 # Uses Claude Code subscription if logged in, or set WARDEN_ANTHROPIC_API_KEY

diff --git a/packages/docs/public/llms.txt b/packages/docs/public/llms.txt
@@ -4,7 +4,7 @@
 
 Warden watches over your code by running **skills** against your changes. Skills are prompts that define what to look for: security vulnerabilities, API design issues, performance problems, or anything else you want consistent coverage on.
 
-Skills follow the [agentskills.io](https://agentskills.io) specification. They're markdown files with a prompt that tells the AI what to look for. Warden includes a baseline `security-review` skill by default. Treat it as a first pass, not a complete security audit, and add community or custom skills when you need deeper coverage.
+Skills follow the [agentskills.io](https://agentskills.io) specification. They're markdown files with a prompt that tells the AI what to look for. Warden includes `security-review` for baseline AppSec coverage and `code-review` for correctness bugs. Treat them as first passes, and add more skills when you need deeper coverage.
 
 - Docs: https://warden.sentry.dev
 - GitHub: https://github.com/getsentry/warden
@@ -51,6 +51,7 @@ Creates `warden.toml` (configuration) and `.github/workflows/warden.yml` (GitHub
 
 ```bash
 warden add security-review
+warden add code-review
 ```
 
 ### Run Locally
@@ -116,6 +117,7 @@ Add a skill trigger to your configuration.
 ```bash
 warden add                     # Interactive mode
 warden add security-review     # Add baseline security review
+warden add code-review         # Add correctness bug review
 warden add --list              # List available skills
 warden add --remote your-org/warden-skills --skill api-review
 warden add --remote your-org/warden-skills@abc123 --skill api-review  # Pinned to commit
@@ -176,6 +178,13 @@ version = 1
 [[skills]]
 name = "security-review"
 
+[[skills.triggers]]
+type = "pull_request"
+actions = ["opened", "synchronize"]
+
+[[skills]]
+name = "code-review"
+
 [[skills.triggers]]
 type = "pull_request"
 actions = ["opened", "synchronize"]
@@ -331,6 +340,9 @@ Skills can be referenced in multiple ways:
 [[skills]]
 name = "security-review"
 
+[[skills]]
+name = "code-review"
+
 # By relative path
 [[skills]]
 name = "./custom-skills/my-review"
@@ -351,7 +363,7 @@ Resolution order:
 1. Remote repository (if `remote` field is specified)
 2. Direct path (if skill contains `/`, `\`, or starts with `.`)
 3. Conventional directories: `.warden/skills/`, `.agents/skills/`, `.claude/skills/`
-4. Built-in skills, including `security-review`
+4. Built-in skills, including `security-review` and `code-review`
 
 ### Environment Variables
 

diff --git a/packages/docs/src/components/HeroFlow.astro b/packages/docs/src/components/HeroFlow.astro
@@ -1,6 +1,7 @@
 ---
 const skills = [
   "security-review",
+  "code-review",
   "api-design-review",
   "architecture-review",
   "dependency-review",
@@ -18,6 +19,7 @@ const desktopSkillLayout = [
   { side: "left", inset: "1.45rem" },
   { side: "left", inset: "2.25rem" },
   { side: "right", inset: "0.45rem" },
+  { side: "left", inset: "0.95rem" },
 ];
 const connectorCount = skills.length + 2;
 ---

diff --git a/packages/docs/src/pages/cli.astro b/packages/docs/src/pages/cli.astro
@@ -74,6 +74,7 @@ warden init --force    # Overwrite existing files`}
       <Code
         code={`warden add                     # Interactive mode
 warden add security-review     # Add baseline security review
+warden add code-review         # Add correctness bug review
 warden add --list              # List available skills
 
 # Remote skills from GitHub repositories
@@ -154,6 +155,7 @@ warden setup-app --org my-org  # For an organization`}
     <Code
       code={`# By name (repo-local first, then built-in skills)
 warden --skill security-review
+warden --skill code-review
 
 # Relative path
 warden --skill ./skills/custom-review
@@ -173,7 +175,7 @@ warden --skill ~/my-skills/security`}
     <li>Remote repository (if <code>remote</code> is specified in trigger config)</li>
     <li>Direct path (if contains <code>/</code>, <code>\</code>, or starts with <code>.</code>)</li>
     <li>Conventional directories: <code>.warden/skills/</code>, <code>.agents/skills/</code>, <code>.claude/skills/</code></li>
-    <li>Built-in skills, including <code>security-review</code></li>
+    <li>Built-in skills, including <code>security-review</code> and <code>code-review</code></li>
   </ol>
 
   <h2 id="environment-variables">Environment Variables</h2>

diff --git a/packages/docs/src/pages/config.astro b/packages/docs/src/pages/config.astro
@@ -36,6 +36,13 @@ const tocItems = [
 [[skills]]
 name = "security-review"
 
+[[skills.triggers]]
+type = "pull_request"
+actions = ["opened", "synchronize"]
+
+[[skills]]
+name = "code-review"
+
 [[skills.triggers]]
 type = "pull_request"
 actions = ["opened", "synchronize"]`}
@@ -322,6 +329,9 @@ createFixPR = true`}
 [[skills]]
 name = "security-review"
 
+[[skills]]
+name = "code-review"
+
 # By relative path
 [[skills]]
 name = "./custom-skills/my-review"
@@ -346,7 +356,7 @@ remote = "your-org/warden-skills@abc123def"`}
     <li>Remote repository (if <code>remote</code> field is specified)</li>
     <li>Direct path (if skill contains <code>/</code>, <code>\</code>, or starts with <code>.</code>)</li>
     <li>Conventional directories: <code>.warden/skills/</code>, <code>.agents/skills/</code>, <code>.claude/skills/</code></li>
-    <li>Built-in skills, including <code>security-review</code></li>
+    <li>Built-in skills, including <code>security-review</code> and <code>code-review</code></li>
   </ol>
 
   <h2 id="skill-files">Skill Files</h2>
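
The resolution order documented in the hunks above (remote, direct path, conventional directories, built-ins) can be sketched as a small TypeScript function. This is only an illustration of the documented order, not Warden's resolver: `SkillRef`, `resolveSkillSource`, the `existsLocally` callback, and the `dir + name` lookup are hypothetical, and the built-in set lists only the two skills named in this diff.

```typescript
// Hypothetical sketch of the documented resolution order; Warden's real
// resolver is not part of this diff.
type SkillSource = "remote" | "path" | "conventional" | "builtin";

interface SkillRef {
  name: string;
  remote?: string; // e.g. "your-org/warden-skills@abc123"
}

// Directories as listed in the docs.
const CONVENTIONAL_DIRS = [".warden/skills/", ".agents/skills/", ".claude/skills/"];

// Only the built-ins named in this diff; the real list may be longer.
const BUILTIN_SKILLS = new Set(["security-review", "code-review"]);

function resolveSkillSource(
  ref: SkillRef,
  existsLocally: (path: string) => boolean, // assumed filesystem probe
): SkillSource | undefined {
  // 1. Remote repository, if the `remote` field is specified.
  if (ref.remote) return "remote";
  // 2. Direct path: contains "/" or "\", or starts with ".".
  if (ref.name.includes("/") || ref.name.includes("\\") || ref.name.startsWith(".")) {
    return "path";
  }
  // 3. Conventional directories, checked in the listed order.
  if (CONVENTIONAL_DIRS.some((dir) => existsLocally(dir + ref.name))) {
    return "conventional";
  }
  // 4. Built-in skills.
  if (BUILTIN_SKILLS.has(ref.name)) return "builtin";
  return undefined;
}
```

Under this reading, a repo-local `.warden/skills/code-review` would shadow the built-in `code-review`, which matches the "repo-local first, then built-in skills" comment in `cli.astro`.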

diff --git a/packages/docs/src/pages/guide.astro b/packages/docs/src/pages/guide.astro
@@ -40,7 +40,7 @@ const tocItems = [
       <li>Reports findings with severity, location, and optional fixes</li>
     </ol>
 
-    <p>Skills follow the <a href="https://agentskills.io">agentskills.io</a> specification -they're markdown files with a prompt that tells the AI what to look for. Warden includes a baseline <code>security-review</code> skill by default. Treat it as a first pass, not a complete security audit, and add community or custom skills when you need deeper coverage.</p>
+    <p>Skills follow the <a href="https://agentskills.io">agentskills.io</a> specification. They're markdown files with a prompt that tells the AI what to look for. Warden includes <code>security-review</code> for baseline AppSec coverage and <code>code-review</code> for correctness bugs. Treat them as first passes, and add more skills when you need deeper coverage.</p>
 
     <p>Warden works in two contexts:</p>
     <ul>
@@ -135,7 +135,8 @@ export WARDEN_ANTHROPIC_API_KEY=sk-ant-...`}
 
     <Terminal showCopy={true}>
       <Code
-        code={`warden --skill security-review`}
+        code={`warden --skill security-review
+warden --skill code-review`}
         lang="bash"
         theme="vitesse-black"
       />
@@ -242,15 +243,16 @@ Focus on issues in the changed code. For each issue found, report:
 
     <h2 id="adding-skills">Adding Skills</h2>
 
-    <p>Use built-in skills by name. Add local or remote skills when your codebase needs more specialized checks.</p>
+    <p>Use built-in skills by name. Add more skills when your codebase needs specialized checks.</p>
 
-    <h3>Add the Baseline Security Review</h3>
+    <h3>Add Built-in Reviews</h3>
 
-    <p><code>security-review</code> ships with Warden as a baseline first pass, so no local skill file or remote repository is required:</p>
+    <p><code>security-review</code> and <code>code-review</code> ship with Warden as baseline first passes. Add them by name:</p>
 
     <Terminal showCopy={true}>
       <Code
-        code={`warden add security-review`}
+        code={`warden add security-review
+warden add code-review`}
         lang="bash"
         theme="vitesse-black"
       />

diff --git a/packages/docs/src/pages/index.astro b/packages/docs/src/pages/index.astro
@@ -95,15 +95,15 @@ vulnerabilities in code changes for Warden's baseline security skill.
 
   <section class="section skill-showcase" id="whats-a-skill">
     <h2>Its Just Skills</h2>
-    <p>The PR feedback above comes from skills. Warden ships with a baseline <code>security-review</code> skill. It is a first pass, not a complete security audit, and it is still just a SKILL.md file telling Warden what to look for.</p>
+    <p>The PR feedback above comes from skills. Warden ships with <code>security-review</code> for baseline AppSec coverage and <code>code-review</code> for correctness bugs. They are first passes, and they are still just SKILL.md files telling Warden what to look for.</p>
     <Terminal title="built-in security-review/SKILL.md">
       <Code
         code={skillExample}
         lang="markdown"
         theme="vitesse-black"
       />
     </Terminal>
-    <p class="skill-showcase-note">Use it by name. No local skill file, build step, schema, or SDK required.</p>
+    <p class="skill-showcase-note">Use built-ins by name. No local skill file, build step, schema, or SDK required.</p>
     <p class="skill-showcase-detail">Real skills can include detailed reference material, code examples, style guides, architectural constraints, or anything else you'd put in a design doc. The prompt is the skill.</p>
   </section>
 
@@ -128,7 +128,8 @@ vulnerabilities in code changes for Warden's baseline security skill.
 <span class="cli-green">Created</span> .github/workflows/warden.yml
 
 <span class="cli-bold">Next steps:</span>
-  1. Add a skill: <span class="cli-cyan">warden add security-review</span>
+  1. Add built-in reviews: <span class="cli-cyan">warden add security-review</span>
+     <span class="cli-cyan">warden add code-review</span>
   2. <span class="cli-cyan">export WARDEN_ANTHROPIC_API_KEY=sk-ant-...</span>
   3. Add <span class="cli-cyan">WARDEN_ANTHROPIC_API_KEY</span> to repository secrets
      <span class="cli-dim">https://github.com/your-org/your-repo/settings/secrets/actions</span>
@@ -138,11 +139,12 @@ vulnerabilities in code changes for Warden's baseline security skill.
 
     <div class="step">
       <h3>Load Skills</h3>
-      <p>Start with the baseline security check. Add custom or remote skills when your codebase needs deeper coverage.</p>
-      <Terminal showCopy={true} copyText="warden add security-review">
-        <pre class="cli-output"><span class="cli-dim">$</span> warden add security-review</pre>
+      <p>Start with baseline security and correctness reviews. Add more skills when your codebase needs deeper coverage.</p>
+      <Terminal showCopy={true} copyText={`warden add security-review\nwarden add code-review`}>
+        <pre class="cli-output"><span class="cli-dim">$</span> warden add security-review
+<span class="cli-dim">$</span> warden add code-review</pre>
       </Terminal>
-      <p>Create your own skills or find ones driven by the community at <a href="https://skills.sh">skills.sh</a>.</p>
+      <p>Keep adding skills for areas where your codebase needs specific coverage.</p>
     </div>
 
     <div class="step">

diff --git a/src/builtin-skills/code-review/SKILL.md b/src/builtin-skills/code-review/SKILL.md
@@ -0,0 +1,85 @@
+---
+name: code-review
+description: Finds real correctness bugs in code changes. Use for adversarial code review, bug hunts, regression review, PR correctness review, logic errors, data loss, race conditions, state bugs, interface contract breaks, error handling bugs, edge cases, broken builds, or broken workflows. Excludes style, readability, architecture, AppSec, and best-practice-only feedback unless the issue causes a demonstrable bug.
+allowed-tools: Read Grep Glob
+---
+
+You are an extremely adversarial production code reviewer finding only real bugs in code changes.
+Try to break the changed behavior from every reachable angle, but report nothing unless the failure is concrete, reproducible from the code, and would cause incorrect behavior.
+
+## References
+
+Load only matching references:
+
+| Reference | Read When |
+|-----------|-----------|
+| `references/javascript-typescript.md` | Reviewing JavaScript, TypeScript, Node, React, Next.js, or browser code |
+| `references/python.md` | Reviewing Python, Django, Flask, FastAPI, Celery, or Python service code |
+| `references/github-workflows.md` | Reviewing GitHub Actions workflows, local actions, reusable workflows, or scripts and config loaded by workflows |
+
+## Bugs Only Rule
+
+Report a finding only when you can prove all of these:
+
+- The changed code is reachable in production, a user entry point, a published interface, a shipped workflow, or a test that can mask a real regression.
+- A specific input, state, ordering, configuration, dependency result, or retry path triggers the failure.
+- The surrounding code, tests, schema, docs, or public contract shows what should happen.
+- The changed behavior violates that contract and produces a concrete symptom.
+- The impact is observable: wrong result, crash, data loss, corrupted state, missed side effect, duplicate side effect, broken build, failed deploy, or false success.
+
+No proof, no finding. Suspicion is not a result.
+
+## Investigation Process
+
+1. Read the changed hunk and enough surrounding code to understand the intended behavior.
+2. Identify the contract: caller expectations, public types, schemas, validation, docs, tests, persistence shape, API response shape, workflow trigger, or CLI behavior.
+3. Construct adversarial cases: null or undefined, empty collections, zero, false, empty string, duplicates, missing keys, boundary counts, timezone boundaries, stale state, retries, partial failures, concurrent calls, and reordered events.
+4. Trace data and state across imports, wrappers, validators, serializers, database writes, caches, queues, and dependent call sites.
+5. Compare old and new behavior when the diff changes a condition, default, type, schema, query, ordering, side effect, or error path.
+6. Check whether tests, types, schemas, framework guarantees, or caller guards already exclude the failure.
+7. Report only defects that survive this verification.
+
+## What To Report
+
+| Category | Report When |
+|----------|-------------|
+| Logic and conditions | Branches are inverted, unreachable, too broad, too narrow, or collapse distinct cases such as `0`, `false`, `""`, `null`, and missing values. |
+| Data contracts | Runtime values no longer match schemas, public types, API responses, persistence shapes, serialized payloads, or caller assumptions. |
+| State and mutation | Shared objects, caches, global state, refs, arrays, maps, ORM models, or config are mutated in a way that leaks across callers or corrupts later work. |
+| Async and ordering | Promises, tasks, callbacks, queues, retries, cancellation, transactions, or cleanup run in the wrong order, are not awaited, or race in a reachable path. |
+| Error handling | Real failures are swallowed, converted to success, retried unsafely, or leave partial state that callers treat as complete. |
+| Boundaries and edge cases | Empty, first, last, duplicate, pagination, sorting, timezone, locale, precision, overflow, migration, or compatibility cases produce wrong behavior. |
+| Persistence and migrations | Writes are non-atomic, migrations lose data, backfills skip rows, query filters update the wrong records, or rollback paths leave inconsistent state. |
+| API and dependency behavior | Published interfaces, CLI flags, config options, webhooks, service calls, or third-party dependency changes break documented or existing caller behavior. |
+| UI correctness | The UI displays stale, wrong, duplicate, missing, or unsaved data because of the changed code, not because of style or preference. |
+| Build, test, and workflow breakage | Changed code, packaging, imports, exports, generated artifacts, CI, or release workflows fail deterministically or report false success. |
+
+## Severity
+
+| Level | Use For |
+|-------|---------|
+| high | Data loss or corruption, critical-path crashes, broken production deploy or release, incorrect billing or permissions state, published interface breakage for normal callers, deadlock or hang in core flow, or false success after a failed destructive operation. |
+| medium | Reproducible wrong results, recoverable crashes, duplicate or missed side effects, broken non-critical workflow, meaningful edge case in a shipped path, or compatibility break with a clear affected caller. |
+| low | Narrow but real bug with limited blast radius, confusing state that can cause user-visible mistakes, or a test/tooling bug that masks the intended behavior. |
+
+- Use the lower severity when impact depends on unproven preconditions.
+- Do not inflate severity for cleverness. The bug earns its level through impact.
+
+## What Not To Report
+
+- AppSec findings. Use the `security-review` skill for exploitability issues.
+- Style, naming, formatting, comments, readability, or maintainability concerns.
+- Architecture, design layering, type hygiene, or refactor advice without a proven incorrect behavior.
+- Performance concerns unless the changed code causes a reachable timeout, hang, memory blowup, quota exhaustion, or missed deadline.
+- Missing tests, weak tests, or low coverage unless the changed test now asserts the wrong behavior or hides a real regression.
+- Existing bugs untouched by the change unless the change makes them reachable or materially worse.
+- Generated, vendored, fixture, example, migration-only, or test-only code unless it is shipped, executed, or masks a shipped bug.
+- Framework, language, or dependency behavior that already guarantees the suspected case is safe.
+- Hypothetical failures that require unrealistic inputs, impossible call order, or assumptions not supported by the code.
+
+## Finding Format
+
+- Title: name the exact bug and trigger.
+- Description: include the changed behavior, trigger conditions, expected behavior, actual behavior, and concrete impact.
+- `verification`: list checked files, functions, callers, guards, tests, schemas, or framework guarantees.
+- `suggestedFix`: include only when the fix is complete for the analyzed path.
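
To make the "Logic and conditions" category and the finding format concrete, here is a hypothetical TypeScript example of a change that would clear the bugs-only bar, followed by a finding written in the listed fields. The `RetryConfig` code, the file paths, and the `Finding` type are invented for illustration. Only `verification` and `suggestedFix` are field names taken from the skill text; the other keys and the overall shape are assumptions, not Warden's schema.

```typescript
interface RetryConfig {
  maxRetries?: number;
}

// Buggy change: `||` collapses an explicit 0 into the "missing" case,
// so a caller who disables retries still gets 3 of them.
function maxRetriesBuggy(config: RetryConfig): number {
  return config.maxRetries || 3;
}

// Fix: `??` falls back only when the value is null or undefined.
function maxRetriesFixed(config: RetryConfig): number {
  return config.maxRetries ?? 3;
}

// Hypothetical finding shape mirroring the "Finding Format" bullets.
type Severity = "high" | "medium" | "low";

interface Finding {
  title: string;
  description: string;
  severity: Severity;
  verification: string[];
  suggestedFix?: string;
}

const exampleFinding: Finding = {
  title: "maxRetries: 0 is ignored and retries still run",
  description:
    "The default uses `||`, so `{ maxRetries: 0 }` resolves to 3. " +
    "Expected: 0 retries. Actual: 3 retries, which duplicates side effects on failure.",
  // "Reproducible wrong results" and "duplicate side effects" map to medium.
  severity: "medium",
  verification: [
    "retry.ts: maxRetriesBuggy (hypothetical file)",
    "callers pass 0 to disable retries (hypothetical call site)",
  ],
  suggestedFix: "Replace `||` with `??` in the default expression.",
};
```

The trigger is a specific input (`maxRetries: 0`) and the symptom is a duplicate side effect, which is what separates this from a style comment about operator choice.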