Detect and block low-quality AI-generated pull requests before they waste maintainer time.
Maintainers everywhere are drowning in AI slop PRs — drive-by pull requests written by an LLM in 30 seconds, with hallucinated APIs, placeholder tests, and 800 lines of comments explaining trivial code. SlopStop is a free, open-source GitHub Action and CLI that scores incoming PRs and helps you stop slop at the door.
```
SlopStop verdict: ❌ Likely slop
3 signal(s): 2 high, 1 warning.

[HIGH] overcommented-code — Disproportionate comment-to-code ratio
src/util.ts
18/22 added lines are comment-only (82%), with 4 AI cliché phrase(s).

[HIGH] placeholder-tests — Tests appear to assert nothing meaningful
src/util.test.ts:8
3 placeholder assertion(s) in test file: expect(true).toBe(true) (L8), assert True (L12).

[WARNING] diff-vs-description-mismatch — Large change with thin description
PR adds 412 lines across 6 file(s) but description is only 23 chars.
```
In 2026 the volume of AI-generated PRs has exploded. Most are useful. A growing minority are not — they are pattern-matched plausibility, not engineering.
Open-source maintainers have started publicly refusing LLM-generated submissions (see curl, Jeff Geerling, and coverage in Axios) because triaging slop costs more than the slop saves. SlopStop tries to fix the asymmetry: surface obvious tells in seconds so maintainers can ignore the worst PRs without reading them line by line.
Using AI to write code isn't the problem. The problem is signal loss in the review process.
Traditional PR review rests on an implicit assumption: writing 500 lines of code costs the author something, so they probably thought it through. AI breaks that assumption — the marginal cost of generating code approaches zero, which creates "submit first, think later" behavior. Reviewer time stays constant; the volume of things needing review explodes.
SlopStop's real job is to restore that signal, not to block AI use. The detectors flag symptoms of low-effort submissions regardless of their origin:
- `placeholder-tests` doesn't flag "AI-written tests"; it flags tests that verify nothing. That's a bug no matter who wrote them.
- `overcommented-code` doesn't flag lots of comments; it flags a severe comment-to-code imbalance, which usually means the author didn't deeply understand what the code was doing.
- `diff-vs-description-mismatch` flags a disconnect between changes and their stated intent, leaving reviewers unable to judge the purpose of a change.
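As an illustration of the third point, here is a minimal TypeScript sketch of the kind of rule `diff-vs-description-mismatch` might apply. It assumes the default thresholds from the `.slopstop.yml` example (100 additions, 80 characters, 8 files) and a plain keyword test for "refactor"; `checkDescriptionMismatch` and its types are hypothetical, not SlopStop's API.

```typescript
// Hypothetical sketch of a diff-vs-description-mismatch check.
// Threshold names mirror the .slopstop.yml keys; the exact rules are assumptions.
interface DiffStats {
  additions: number; // total added lines in the PR
  files: number; // number of files touched
}

interface MismatchThresholds {
  minAdditions: number; // additions needed before a thin description matters
  thinDescChars: number; // descriptions shorter than this count as "thin"
  largeFileCount: number; // file count that counts as a wide-touching change
}

const DEFAULT_MISMATCH: MismatchThresholds = {
  minAdditions: 100,
  thinDescChars: 80,
  largeFileCount: 8,
};

// Returns human-readable warnings; an empty array means "say nothing".
function checkDescriptionMismatch(
  stats: DiffStats,
  description: string,
  t: MismatchThresholds = DEFAULT_MISMATCH,
): string[] {
  const warnings: string[] = [];
  const descLen = description.trim().length;

  // Rule 1: a large change explained in only a few characters.
  if (stats.additions >= t.minAdditions && descLen < t.thinDescChars) {
    warnings.push(
      `PR adds ${stats.additions} lines across ${stats.files} file(s) ` +
        `but description is only ${descLen} chars.`,
    );
  }

  // Rule 2: a wide-touching change that is not framed as a refactor.
  if (stats.files >= t.largeFileCount && !/refactor/i.test(description)) {
    warnings.push(
      `PR touches ${stats.files} files but the description does not frame it as a refactor.`,
    );
  }
  return warnings;
}

// Usage: the numbers from the sample report.
console.log(checkDescriptionMismatch({ additions: 412, files: 6 }, "x".repeat(23)));
```

Fed the numbers from the sample report (412 additions, 6 files, a 23-character description), only the thin-description rule fires, since 6 files is below the wide-change threshold.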
The intended audience is maintainers, not authors. When PR volume is 10× but reviewer time is fixed, this tool helps answer: which PRs deserve careful attention, and which can be closed immediately?
1. Open-source: screening external contributors
Contributors generate PRs at near-zero cost. SlopStop runs on `pull_request` events (`opened`, `synchronize`), scores each PR automatically, and fails CI on high-severity signals, so maintainers can skip the worst submissions without reading them.
2. Internal teams: quality gate before review
Add SlopStop to CI with `fail-on: warning`. It acts as a pre-review check that catches structurally empty tests or change/description mismatches before a human reviewer ever opens the diff.
3. Individual developers: pre-push self-check
The Claude Code plugin runs SlopStop automatically before push, and its Layer 2 checks reuse your existing Claude credentials, so no separate API key is needed. Catch your own slop before anyone else sees it.
Layer 1 is deterministic: zero LLM cost, and it finishes in milliseconds.

| Detector | What it flags |
|---|---|
| `overcommented-code` | Files where a large share of added lines are comments, especially LLM-cliché phrases like "Here's the implementation" or "In a real-world scenario..." |
| `placeholder-tests` | Test files containing `expect(true).toBe(true)`, `assert True`, `t.Skip()`, and similar empty assertions. |
| `diff-vs-description-mismatch` | PRs with hundreds of additions but a one-line description, or wide-touching changes not framed as refactors. |
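A minimal sketch of how a `placeholder-tests` scan could work: match the tautologies named in the table against added lines, but only in files that look like tests. The regexes, the `isTestFile` heuristic, and `findPlaceholders` are assumptions for illustration, not SlopStop's rules.

```typescript
// Hypothetical sketch of a placeholder-tests scan; the pattern list and the
// test-file heuristic are assumptions, not SlopStop's actual rules.
const PLACEHOLDER_PATTERNS: RegExp[] = [
  /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/, // Jest/Vitest tautology
  /\bassert\s+True\b/, // Python tautology
  /\bt\.Skip\(\s*\)/, // Go test skipped outright
];

// Rough guess at "is this a test file?" across a few ecosystems.
function isTestFile(path: string): boolean {
  return /(\.test\.|\.spec\.|_test\.go$|(^|\/)test_[^/]*\.py$)/.test(path);
}

interface PlaceholderHit {
  line: number; // 1-based index within the added lines passed in
  text: string; // the matched placeholder snippet
}

function findPlaceholders(path: string, addedLines: string[]): PlaceholderHit[] {
  if (!isTestFile(path)) return [];
  const hits: PlaceholderHit[] = [];
  addedLines.forEach((text, i) => {
    for (const pattern of PLACEHOLDER_PATTERNS) {
      const m = pattern.exec(text);
      if (m) {
        hits.push({ line: i + 1, text: m[0] });
        break; // one hit per line is enough
      }
    }
  });
  return hits;
}

const hits = findPlaceholders("src/util.test.ts", [
  'import { add } from "./util";',
  'it("works", () => {',
  "  expect(true).toBe(true);",
  "});",
]);
console.log(hits.map((h) => `${h.text} (L${h.line})`).join(", "));
```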
Layer 2 is an optional LLM judge that adds semantic understanding. It runs only after Layer 1 fires, so it costs nothing on clean PRs.
| Detector | What it flags |
|---|---|
| `test-doesnt-verify-change` | LLM verdict that the tests in the PR do not actually exercise the source changes, even if they don't use literal `expect(true)` patterns. |
| `description-doesnt-match-changes` | LLM verdict that the PR description omits significant changes or fabricates claims not present in the diff. |
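The hand-off between layers can be sketched as two small rules: call the judge only when it is enabled and Layer 1 already produced signals, then discard verdicts below a confidence threshold. The 0.7 default comes from `test-doesnt-verify-change.confidence` in the configuration example; applying one threshold to every LLM verdict, and the function names, are simplifying assumptions.

```typescript
// Hypothetical sketch of the Layer 1 -> Layer 2 hand-off. Names are illustrative.
type Severity = "warning" | "high";

interface Signal {
  detector: string;
  severity: Severity;
  confidence?: number; // only LLM (Layer 2) verdicts carry a confidence
}

// The judge is consulted only when it is enabled AND Layer 1 already fired,
// so clean PRs never trigger an LLM call.
function shouldRunJudge(layer1: Signal[], judgeEnabled: boolean): boolean {
  return judgeEnabled && layer1.length > 0;
}

// Drop LLM verdicts the model itself was not confident about.
function keepConfident(verdicts: Signal[], minConfidence = 0.7): Signal[] {
  return verdicts.filter((v) => (v.confidence ?? 0) >= minConfidence);
}

const layer1: Signal[] = [{ detector: "placeholder-tests", severity: "high" }];
const verdicts: Signal[] = [
  { detector: "test-doesnt-verify-change", severity: "high", confidence: 0.9 },
  { detector: "description-doesnt-match-changes", severity: "warning", confidence: 0.4 },
];

if (shouldRunJudge(layer1, true)) {
  console.log(keepConfident(verdicts).map((v) => v.detector));
}
```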
Supported providers (BYO key):
- Anthropic (`claude-haiku-4-5` default): set `ANTHROPIC_API_KEY`
- OpenAI (`gpt-4o-mini` default): set `OPENAI_API_KEY`
- OpenAI-compatible (vLLM, LiteLLM, Together, etc.): set `SLOPSTOP_LLM_API_KEY` and `SLOPSTOP_LLM_BASE_URL`
- Ollama (self-hosted, no key): defaults to `http://localhost:11434`
- Claude Code session: reuses your existing `claude login` credentials when running inside the plugin
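`--judge auto` picks a provider from the environment. The README does not document the precedence, so the order below is an assumption, as is treating Ollama as opt-in only (it needs no key, so there is nothing to detect). `detectProvider` is a hypothetical name.

```typescript
// Hypothetical `--judge auto` resolution. The precedence order is an assumption.
type Provider = "anthropic" | "openai" | "openai-compat" | "off";
type Env = Record<string, string | undefined>;

// Default models documented for the two hosted providers.
const DEFAULT_MODEL: Partial<Record<Provider, string>> = {
  anthropic: "claude-haiku-4-5",
  openai: "gpt-4o-mini",
};

function detectProvider(env: Env): Provider {
  if (env["ANTHROPIC_API_KEY"]) return "anthropic";
  if (env["OPENAI_API_KEY"]) return "openai";
  // An OpenAI-compatible endpoint needs both a key and a base URL.
  if (env["SLOPSTOP_LLM_API_KEY"] && env["SLOPSTOP_LLM_BASE_URL"]) return "openai-compat";
  // Nothing configured: stay on Layer 1 only. Ollama is assumed to require
  // an explicit `--judge ollama`, since there is no key to detect.
  return "off";
}

const provider = detectProvider({ OPENAI_API_KEY: "sk-test" });
console.log(provider, DEFAULT_MODEL[provider]);
```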
Cost: typical PR run is <$0.01. Hard caps: 50 calls and 100,000 tokens per run by default.
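The hard caps amount to a running budget checked before each LLM call. A minimal sketch with the default limits (50 calls, 100,000 tokens); `JudgeBudget` is hypothetical, and it assumes each call's token cost can be estimated up front.

```typescript
// Hypothetical per-run budget enforcing the default hard caps.
class JudgeBudget {
  private calls = 0;
  private tokens = 0;
  private readonly maxCalls: number;
  private readonly maxTokens: number;

  constructor(maxCalls = 50, maxTokens = 100_000) {
    this.maxCalls = maxCalls;
    this.maxTokens = maxTokens;
  }

  // Check before making a call whose size is estimated up front.
  canSpend(estimatedTokens: number): boolean {
    return this.calls + 1 <= this.maxCalls && this.tokens + estimatedTokens <= this.maxTokens;
  }

  // Record what a completed call actually used.
  record(usedTokens: number): void {
    this.calls += 1;
    this.tokens += usedTokens;
  }

  remainingTokens(): number {
    return Math.max(0, this.maxTokens - this.tokens);
  }
}

const budget = new JudgeBudget();
let made = 0;
// Pretend every call costs about 3,000 tokens.
while (budget.canSpend(3_000)) {
  budget.record(3_000);
  made += 1;
}
console.log(made, budget.remainingTokens());
```

With roughly 3,000 tokens per call, the token cap stops the run after 33 calls, well before the call cap. Checking before each call keeps the run inside the limit, though a call that uses more than its estimate can still push the total slightly past it.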
Three ways to use SlopStop. Same engine, different surfaces.
GitHub Action. Add `.github/workflows/slopstop.yml`:

```yaml
name: SlopStop

on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: slopstop/slopstop@v0.2
        with:
          fail-on: high
          judge: anthropic               # off | anthropic | openai | openai-compat | ollama | auto
          judge-model: claude-haiku-4-5  # optional, provider default if omitted
          judge-budget-tokens: 100000    # hard cap on total tokens per run
          judge-budget-calls: 50         # hard cap on LLM calls per run
          # judge-base-url: http://...   # for openai-compat or ollama
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
```

Layer 2 only kicks in if you set the API key as a repo secret. Without it, the action runs Layer 1 only.
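`fail-on: high` fails the job only when a high-severity signal fires; `fail-on: warning` fails on warnings too. A minimal sketch of that comparison and of the summary line from the sample report, assuming `warning` and `high` are the only severities; `shouldFail` and `summarize` are hypothetical names.

```typescript
// Sketch of the fail-on comparison, assuming "warning" < "high".
type Severity = "warning" | "high";
const RANK: Record<Severity, number> = { warning: 1, high: 2 };

// Fail when any signal is at or above the configured threshold.
function shouldFail(signals: Severity[], failOn: Severity): boolean {
  return signals.some((s) => RANK[s] >= RANK[failOn]);
}

// Reproduces the summary line format from the sample report.
function summarize(signals: Severity[]): string {
  const high = signals.filter((s) => s === "high").length;
  const warning = signals.filter((s) => s === "warning").length;
  return `${signals.length} signal(s): ${high} high, ${warning} warning.`;
}

const signals: Severity[] = ["high", "high", "warning"];
console.log(summarize(signals));
console.log(shouldFail(signals, "high") ? "exit 1" : "exit 0");
```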
CLI.

```bash
# install
npm install -g slopstop

# Layer 1 only
slopstop check --from main --to HEAD --body "$(git log -1 --pretty=%B)"

# Layer 1 + Layer 2 (auto-detects provider from env)
ANTHROPIC_API_KEY=sk-... slopstop check --judge auto --from main --to HEAD --body "..."

# Custom model
ANTHROPIC_API_KEY=sk-... slopstop check --judge anthropic --judge-model claude-opus-4-7 --from main --to HEAD --body "..."

# OpenAI-compatible endpoint (vLLM, LiteLLM, Together, etc.)
SLOPSTOP_LLM_API_KEY=sk-... slopstop check \
  --judge openai-compat \
  --judge-model your-model-name \
  --judge-base-url https://your-endpoint/v1 \
  --from main --to HEAD --body "..."

# Ollama (self-hosted, no key)
slopstop check --judge ollama --judge-model llama3.2 --judge-base-url http://localhost:11434 \
  --from main --to HEAD --body "..."

# Health check: what's wired up?
slopstop diagnose

# From stdin
git diff main..HEAD | slopstop check --diff - --body "fix"
```

Run `slopstop --help` for the full option list.
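`--diff -` reads a unified diff from stdin. Any diff-based detector first needs per-file addition counts; here is a minimal sketch of that step, assuming git's diff format (each file section starts with `diff --git`). `countAdditions` is hypothetical, not SlopStop's parser.

```typescript
// Hypothetical sketch: count added lines per file in git's unified diff format.
// Renames, binary files, and quoted paths are ignored for brevity.
function countAdditions(diff: string): Map<string, number> {
  const perFile = new Map<string, number>();
  let current: string | null = null; // file whose hunks we are inside
  let inHunk = false;

  for (const line of diff.split("\n")) {
    if (line.startsWith("diff --git ")) {
      // New file section: reset state until its "+++" header is seen.
      current = null;
      inHunk = false;
    } else if (!inHunk && line.startsWith("+++ ")) {
      // "+++ b/src/util.ts" -> "src/util.ts"; "+++ /dev/null" is a deletion.
      const target = line.slice(4).trim();
      current = target === "/dev/null" ? null : target.replace(/^b\//, "");
      if (current !== null) perFile.set(current, 0);
    } else if (line.startsWith("@@")) {
      inHunk = true;
    } else if (inHunk && current !== null && line.startsWith("+")) {
      perFile.set(current, (perFile.get(current) ?? 0) + 1);
    }
  }
  return perFile;
}

const sample = [
  "diff --git a/src/util.ts b/src/util.ts",
  "index 1111111..2222222 100644",
  "--- a/src/util.ts",
  "+++ b/src/util.ts",
  "@@ -1,2 +1,3 @@",
  " export const a = 1;",
  "+// Here's the implementation",
  "+export const b = 2;",
  "-export const c = 3;",
  "diff --git a/old.ts b/old.ts",
  "deleted file mode 100644",
  "--- a/old.ts",
  "+++ /dev/null",
  "@@ -1 +0,0 @@",
  "-gone",
].join("\n");

const counts = countAdditions(sample);
console.log(counts.get("src/util.ts"));
```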
Claude Code plugin. Install it, then reload:

```
/plugin install slopstop
/reload-plugins
```
The plugin adds four things:

- `/slopstop:slop-check`: runs Layer 1 from the CLI plus Layer 2 from your Claude session (no API key needed)
- `/slopstop:slop-explain`: explains a signal in plain language and proposes concrete fixes
- `slopstop` skill: auto-loaded guidance that teaches Claude to avoid slop patterns when generating code, tests, or PR descriptions
- Pre-push hook: runs Layer 1 automatically before `git push` or `gh pr create`
By default the hook runs Layer 1 only and fails on warnings. To enable Layer 2 in the hook, set `hook_judge` in the plugin config.
To check a specific range or PR:

```
/slopstop:slop-check main..HEAD
```

Or pipe a PR diff directly into the CLI:

```bash
gh pr diff 123 | slopstop check --diff - --body "$(gh pr view 123 --json body -q .body)"
```
Create `.slopstop.yml` at the repo root (optional):

```yaml
# disable specific detectors
disabled:
  - diff-vs-description-mismatch

# adjust thresholds
thresholds:
  overcommented-code.ratio: 0.5                   # comment/code ratio to trigger (default 0.4)
  overcommented-code.minAdditions: 30             # minimum added lines before ratio is checked (default 30)
  overcommented-code.clicheBypassHits: 3          # AI-cliché hits that bypass the size gate (default 3)
  overcommented-code.clicheBypassMinSize: 10      # minimum non-blank lines for cliché bypass (default 10)
  diff-vs-description-mismatch.minAdditions: 200  # additions needed to trigger (default 100)
  diff-vs-description-mismatch.thinDescChars: 80  # description length considered "thin" (default 80)
  diff-vs-description-mismatch.largeFileCount: 8  # file count to trigger wide-change check (default 8)
  test-doesnt-verify-change.confidence: 0.7       # minimum LLM confidence to emit a signal (default 0.7)

# fail the check at or above this severity
failOn: high
```

Add a `slopstop:disable` comment (for example `# slopstop:disable` or `// slopstop:disable`) to suppress all detectors on that line, or name specific ones:

```ts
expect(true).toBe(true) // slopstop:disable placeholder-tests
expect(true).toBe(true) // slopstop:disable placeholder-tests, overcommented-code
expect(true).toBe(true) // slopstop:disable
```

The directive can also appear on the line immediately above the flagged line.
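Suppression can be modeled as a lookup on the flagged line and the line above it. A minimal sketch, assuming a directive on the line above only counts when that line is a comment by itself, and recognizing only `//` and `#` comments; `parseDirective` and `isSuppressed` are hypothetical names.

```typescript
// Hypothetical sketch of honoring `slopstop:disable` directives.
const DIRECTIVE = /slopstop:disable\b([^\n]*)/;
const COMMENT_ONLY = /^\s*(\/\/|#)/;

// Returns "all", a set of detector names, or null when no directive is present.
function parseDirective(line: string): "all" | Set<string> | null {
  const m = DIRECTIVE.exec(line);
  if (!m) return null;
  const names = m[1]
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return names.length === 0 ? "all" : new Set(names);
}

function isSuppressed(lines: string[], index: number, detector: string): boolean {
  const matches = (line: string | undefined): boolean => {
    if (line === undefined) return false;
    const d = parseDirective(line);
    return d === "all" || (d !== null && d.has(detector));
  };
  // Directive on the flagged line itself.
  if (matches(lines[index])) return true;
  // Directive on a comment-only line directly above it.
  const above = index > 0 ? lines[index - 1] : undefined;
  return above !== undefined && COMMENT_ONLY.test(above) && matches(above);
}

const file = [
  "// slopstop:disable placeholder-tests",
  "expect(true).toBe(true);",
  "expect(true).toBe(true); // slopstop:disable",
  "expect(true).toBe(true);",
];
console.log(isSuppressed(file, 1, "placeholder-tests"));
```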
- Deterministic first. Layer 1 has zero LLM cost and finishes in milliseconds. LLM judging is opt-in only.
- Conservative defaults. False positives are worse than false negatives — a noisy bot gets disabled. Default thresholds err on the side of "say nothing".
- Maintainer-controlled. Every detector can be disabled. Lines can opt out with `# slopstop:disable`. PR authors get clear, actionable feedback when something fires.
- Unopinionated about AI use. SlopStop does not label a PR "AI-generated"; it flags patterns common in low-effort PRs, regardless of source. Real humans write slop too.
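To show how the conservative defaults play out, here is a minimal sketch of the size gate and cliché bypass behind `overcommented-code`, using the default thresholds from the `.slopstop.yml` example. Reading "ratio" as comment-only lines over all added lines is inferred from the sample report, the comment and cliché matching is simplified, and `checkOvercommented` is a hypothetical name.

```typescript
// Hypothetical sketch of overcommented-code gating with default thresholds.
interface OvercommentThresholds {
  ratio: number; // share of comment-only lines that triggers a signal
  minAdditions: number; // size gate: added lines needed before the ratio counts
  clicheBypassHits: number; // cliché hits that let a smaller change bypass the gate
  clicheBypassMinSize: number; // non-blank lines still required for that bypass
}

const DEFAULTS: OvercommentThresholds = {
  ratio: 0.4,
  minAdditions: 30,
  clicheBypassHits: 3,
  clicheBypassMinSize: 10,
};

const CLICHES: RegExp[] = [/here's the implementation/i, /in a real-world scenario/i];
const COMMENT_LINE = /^\s*(\/\/|#|\/\*|\*)/;

function checkOvercommented(added: string[], t: OvercommentThresholds = DEFAULTS): string | null {
  if (added.length === 0) return null;
  const nonBlank = added.filter((l) => l.trim().length > 0).length;
  const comments = added.filter((l) => COMMENT_LINE.test(l));
  const cliches = comments.filter((l) => CLICHES.some((re) => re.test(l))).length;

  const passesSizeGate = added.length >= t.minAdditions;
  const clicheBypass = cliches >= t.clicheBypassHits && nonBlank >= t.clicheBypassMinSize;
  if (!passesSizeGate && !clicheBypass) return null; // too small to judge: say nothing

  const share = comments.length / added.length;
  if (share < t.ratio) return null;
  return (
    `${comments.length}/${added.length} added lines are comment-only ` +
    `(${Math.round(share * 100)}%), with ${cliches} AI cliché phrase(s).`
  );
}

const sample = [
  ...new Array<string>(4).fill("// Here's the implementation of the helper."),
  ...new Array<string>(14).fill("// Increment the counter by one."),
  ...new Array<string>(4).fill("count += 1;"),
];
console.log(checkOvercommented(sample));
```

With 22 added lines, a change like the one in the sample report is below `minAdditions` and would be ignored on size alone; the four cliché hits open the bypass, which is how it can still produce the 82% signal. Without the clichés, the same change stays silent.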
Slop comes in many shapes. If you encounter a slop PR pattern this project misses, please open an issue with a redacted example diff and a description of the tell. New detectors are easy to add; see `docs/detectors.md`.
This is currently a one-maintainer project. Issue triage is best-effort, but I aim to respond within 24 hours.
MIT. See LICENSE.