
zhuxiangyi/slopstop


SlopStop

Detect and block low-quality AI-generated pull requests before they waste maintainer time.


Maintainers everywhere are drowning in AI slop PRs — drive-by pull requests written by an LLM in 30 seconds, with hallucinated APIs, placeholder tests, and 800 lines of comments explaining trivial code. SlopStop is a free, open-source GitHub Action and CLI that scores incoming PRs and helps you stop slop at the door.

SlopStop verdict: ❌ Likely slop
3 signal(s): 2 high, 1 warning.

  [HIGH] overcommented-code — Disproportionate comment-to-code ratio
    src/util.ts
    18/22 added lines are comment-only (82%), with 4 AI cliché phrase(s).

  [HIGH] placeholder-tests — Tests appear to assert nothing meaningful
    src/util.test.ts:8
    3 placeholder assertion(s) in test file: expect(true).toBe(true) (L8), assert True (L12).

  [WARNING] diff-vs-description-mismatch — Large change with thin description
    PR adds 412 lines across 6 file(s) but description is only 23 chars.
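The last signal above comes from plain arithmetic on PR metadata. The TypeScript below is a minimal sketch of a diff-vs-description-mismatch check, assuming the default thresholds listed under Configuration (minAdditions 100, thinDescChars 80, largeFileCount 8). The names checkMismatch and PrStats, the choice of inclusive versus strict comparisons, and the keyword test standing in for "framed as a refactor" are illustrative assumptions, not SlopStop's source.

```typescript
// Hypothetical sketch of diff-vs-description-mismatch; not SlopStop's source.
interface MismatchThresholds {
  minAdditions: number;   // additions needed to trigger (default 100)
  thinDescChars: number;  // description length considered "thin" (default 80)
  largeFileCount: number; // file count for the wide-change check (default 8)
}

interface PrStats {
  additions: number;
  files: number;
  title: string;
  body: string;
}

const MISMATCH_DEFAULTS: MismatchThresholds = {
  minAdditions: 100,
  thinDescChars: 80,
  largeFileCount: 8,
};

// Returns a signal message, or null when the PR looks consistent.
function checkMismatch(pr: PrStats, t: MismatchThresholds = MISMATCH_DEFAULTS): string | null {
  const desc = pr.body.trim();

  // Large change paired with a thin description.
  if (pr.additions >= t.minAdditions && desc.length < t.thinDescChars) {
    return `PR adds ${pr.additions} lines across ${pr.files} file(s) but description is only ${desc.length} chars.`;
  }

  // Wide-touching change; "framed as a refactor" is approximated by a keyword match (assumption).
  const framedAsRefactor = /\brefactor/i.test(`${pr.title} ${pr.body}`);
  if (pr.files >= t.largeFileCount && !framedAsRefactor) {
    return `PR touches ${pr.files} files without describing the change as a refactor.`;
  }

  return null;
}

// Reproduces the third signal from the sample output.
const msg = checkMismatch({ additions: 412, files: 6, title: "Update util", body: "Fix stuff in util files" });
console.log(msg);
// PR adds 412 lines across 6 file(s) but description is only 23 chars.
```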

Why this exists

In 2026 the volume of AI-generated PRs has exploded. Most are useful. A growing minority are not — they are pattern-matched plausibility, not engineering.

Open-source maintainers have started publicly refusing to accept LLM-generated submissions (Curl, Jeff Geerling, Axios coverage) because triaging slop costs more than the slop saves. SlopStop tries to fix the asymmetry: surface obvious tells in seconds so maintainers can ignore the worst PRs without reading them line by line.

Using AI to write code isn't the problem. The problem is signal loss in the review process.

Traditional PR review rests on an implicit assumption: writing 500 lines of code costs the author something, so they probably thought it through. AI breaks that assumption — the marginal cost of generating code approaches zero, which creates "submit first, think later" behavior. Reviewer time stays constant; the volume of things needing review explodes.

SlopStop's real job is to restore that signal, not to block AI use. The detectors flag symptoms of low-effort submissions regardless of their origin:

  • placeholder-tests doesn't flag "AI-written tests" — it flags tests that verify nothing. That's a bug no matter who wrote them.
  • overcommented-code doesn't flag lots of comments — it flags a severe comment-to-code imbalance, which usually means the author didn't deeply understand what the code was doing.
  • diff-vs-description-mismatch flags a disconnect between changes and their stated intent, leaving reviewers unable to judge the purpose of a change.

The intended audience is maintainers, not authors. When PR volume is 10× but reviewer time is fixed, this tool helps answer: which PRs deserve careful attention, and which can be closed immediately?

Typical scenarios

1. Open-source: screening external contributors

Contributors generate PRs at near-zero cost. SlopStop runs on pull_request opened, synchronize, and reopened events, scores automatically, and fails CI on high-severity signals, so maintainers can skip the worst submissions without reading them.

2. Internal teams: quality gate before review

Add SlopStop to CI with fail-on: warning. It acts as a pre-review check that catches structurally empty tests or change/description mismatches before a human reviewer ever opens the diff.
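Choosing fail-on: warning instead of high only moves the cut-off on a severity scale. A minimal TypeScript sketch of that gate follows, assuming just the two severities this README mentions, ordered warning below high; the Signal shape and the shouldFail helper are hypothetical.

```typescript
// Hypothetical fail-on gate. Only the two severities named in this README are modelled.
type Severity = "warning" | "high";

// Higher rank means more severe.
const RANK: Record<Severity, number> = { warning: 1, high: 2 };

interface Signal {
  id: string;
  severity: Severity;
}

// CI fails if any signal is at or above the configured threshold.
function shouldFail(signals: Signal[], failOn: Severity): boolean {
  return signals.some((s) => RANK[s.severity] >= RANK[failOn]);
}

const signals: Signal[] = [{ id: "diff-vs-description-mismatch", severity: "warning" }];
console.log(shouldFail(signals, "high"));    // false: a warning alone passes a high gate
console.log(shouldFail(signals, "warning")); // true: the internal-team setting blocks it
```

With fail-on: high, warnings still appear in the report but do not block the merge.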

3. Individual developers: pre-push self-check

The Claude Code plugin runs SlopStop automatically before push. Layer 2 checks in the plugin reuse your existing Claude credentials, so no separate API key is needed. Catch your own slop before anyone else sees it.


What it detects

Layer 1 — deterministic, free, runs always

  • overcommented-code: files where comments make up a disproportionate share of the added lines, especially comments containing LLM-cliché phrases like "Here's the implementation" or "In a real-world scenario..."
  • placeholder-tests: test files containing expect(true).toBe(true), assert True, t.Skip(), and similar assertions or skips that verify nothing.
  • diff-vs-description-mismatch: PRs with hundreds of additions but a one-line description, or wide-touching changes not framed as refactors.
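Layer 1 checks like placeholder-tests are essentially pattern matching over test files. The sketch below, in TypeScript, assumes a test file is recognised by its name and that each of the three tautologies listed above can be caught by a single regular expression; PLACEHOLDER_PATTERNS, TEST_FILE, and findPlaceholders are illustrative names, and SlopStop's real rules are likely broader. For simplicity it scans the whole post-change file rather than only the added hunk lines, so the reported numbers are file line numbers.

```typescript
// Hypothetical placeholder-tests scanner; patterns cover only the README's examples.
const PLACEHOLDER_PATTERNS: RegExp[] = [
  /expect\(\s*true\s*\)\.toBe\(\s*true\s*\)/, // Jest/Vitest tautology
  /^\s*assert\s+True\b/,                      // Python tautology
  /\bt\.Skip\(/,                              // Go test skipped outright
];

// Assumption: test files are identified by naming convention alone.
const TEST_FILE = /(\.(test|spec)\.[jt]sx?|_test\.go|(^|\/)test_[^/]*\.py)$/;

interface Hit {
  line: number; // 1-based line number in the file
  text: string;
}

function findPlaceholders(path: string, lines: string[]): Hit[] {
  if (!TEST_FILE.test(path)) return [];
  const hits: Hit[] = [];
  lines.forEach((text, i) => {
    if (PLACEHOLDER_PATTERNS.some((re) => re.test(text))) {
      hits.push({ line: i + 1, text: text.trim() });
    }
  });
  return hits;
}

const testFile = [
  'import { add } from "./util";',
  "",
  'test("adds", () => {',
  "  expect(true).toBe(true);",
  "});",
];
console.log(findPlaceholders("src/util.test.ts", testFile));
```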

Layer 2 — LLM judge, opt-in via your own API key

Brings semantic understanding — runs ONLY after Layer 1 fires, so it costs nothing on clean PRs.

  • test-doesnt-verify-change: LLM verdict that the tests in the PR do not actually exercise the source changes, even if they don't use literal expect(true) patterns.
  • description-doesnt-match-changes: LLM verdict that the PR description omits significant changes or fabricates claims not present in the diff.
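A judge verdict only becomes a signal once it clears the confidence threshold set by test-doesnt-verify-change.confidence (default 0.7, see Configuration). The TypeScript sketch below assumes the model is asked to reply with a small JSON object; that JSON shape, the verdictToSignal name, and the warning severity are hypothetical, since this README does not document SlopStop's prompt or response schema. Unparseable replies produce no signal, in line with the conservative-defaults principle.

```typescript
// Hypothetical judge reply schema; SlopStop's real prompt and schema are not documented here.
interface JudgeVerdict {
  verifiesChange: boolean; // does the test code exercise the source change?
  confidence: number;      // 0..1, self-reported by the model
  reason: string;
}

interface JudgeSignal {
  id: "test-doesnt-verify-change";
  severity: "warning";
  message: string;
}

// Turns a raw model reply into a signal, or null when nothing should be reported.
function verdictToSignal(raw: string, minConfidence = 0.7): JudgeSignal | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // unparseable reply: stay silent rather than guess
  }
  if (typeof parsed !== "object" || parsed === null) return null;

  const v = parsed as Partial<JudgeVerdict>;
  if (typeof v.verifiesChange !== "boolean" || typeof v.confidence !== "number") return null;

  // Report only a negative verdict that clears the confidence threshold.
  if (v.verifiesChange || v.confidence < minConfidence) return null;

  return {
    id: "test-doesnt-verify-change",
    severity: "warning", // assumption: the README does not state this detector's severity
    message: `${v.reason ?? "no reason given"} (confidence ${v.confidence.toFixed(2)})`,
  };
}

const reply = '{"verifiesChange": false, "confidence": 0.9, "reason": "tests only assert on mocks"}';
console.log(verdictToSignal(reply)?.message);
// tests only assert on mocks (confidence 0.90)
```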

Supported providers (BYO key):

  • Anthropic (claude-haiku-4-5 default) — set ANTHROPIC_API_KEY
  • OpenAI (gpt-4o-mini default) — set OPENAI_API_KEY
  • OpenAI-compatible (vLLM, LiteLLM, Together, etc.) — set SLOPSTOP_LLM_API_KEY and SLOPSTOP_LLM_BASE_URL
  • Ollama (self-hosted, no key) — defaults to http://localhost:11434
  • Claude Code session — reuses your existing claude login credentials when running inside the plugin

Cost: typical PR run is <$0.01. Hard caps: 50 calls and 100,000 tokens per run by default.
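The two hard caps can be enforced with a small counter consulted before every judge call. Below is a TypeScript sketch, assuming each call is reserved up front from a token estimate (the real tool may instead count actual usage after each response); JudgeBudget and shouldRunJudge are hypothetical names, and the defaults mirror the judge-budget-calls and judge-budget-tokens inputs. It also encodes the rule that Layer 2 runs only after Layer 1 fires.

```typescript
// Hypothetical per-run guard for judge-budget-calls / judge-budget-tokens.
class JudgeBudget {
  private calls = 0;
  private tokens = 0;

  constructor(
    private readonly maxCalls: number = 50,       // judge-budget-calls default
    private readonly maxTokens: number = 100_000, // judge-budget-tokens default
  ) {}

  // Reserves one call if its estimated tokens still fit under both caps.
  tryReserve(estimatedTokens: number): boolean {
    if (this.calls + 1 > this.maxCalls) return false;
    if (this.tokens + estimatedTokens > this.maxTokens) return false;
    this.calls += 1;
    this.tokens += estimatedTokens;
    return true;
  }

  usage(): { calls: number; tokens: number } {
    return { calls: this.calls, tokens: this.tokens };
  }
}

// Layer 2 runs only after Layer 1 fires, and only while budget remains.
function shouldRunJudge(layer1Signals: number, budget: JudgeBudget, estimatedTokens: number): boolean {
  return layer1Signals > 0 && budget.tryReserve(estimatedTokens);
}

const budget = new JudgeBudget();
console.log(shouldRunJudge(0, budget, 2_000)); // false: clean PR, no LLM call made
console.log(shouldRunJudge(2, budget, 2_000)); // true: Layer 1 fired and budget remains
console.log(budget.usage());
```

Because the Layer 1 check short-circuits first, a clean PR never touches the budget at all.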


Quick start

Three ways to use SlopStop. Same engine, different surfaces.

1. GitHub Action (catch PRs from contributors)

.github/workflows/slopstop.yml:

name: SlopStop
on:
  pull_request:
    types: [opened, synchronize, reopened]

permissions:
  contents: read
  pull-requests: write

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with: { fetch-depth: 0 }
      - uses: slopstop/slopstop@v0.2
        with:
          fail-on: high
          judge: anthropic        # off | anthropic | openai | openai-compat | ollama | auto
          judge-model: claude-haiku-4-5   # optional, provider default if omitted
          judge-budget-tokens: 100000     # hard cap on total tokens per run
          judge-budget-calls: 50          # hard cap on LLM calls per run
          # judge-base-url: http://...    # for openai-compat or ollama
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}

Layer 2 only kicks in if you set the API key as a repo secret. Without it, the action runs Layer 1 only.

2. CLI (local self-check or any CI)

# install
npm install -g slopstop

# Layer 1 only
slopstop check --from main --to HEAD --body "$(git log -1 --pretty=%B)"

# Layer 1 + Layer 2 (auto-detects provider from env)
ANTHROPIC_API_KEY=sk-... slopstop check --judge auto --from main --to HEAD --body "..."

# Custom model
ANTHROPIC_API_KEY=sk-... slopstop check --judge anthropic --judge-model claude-opus-4-7 --from main --to HEAD --body "..."

# OpenAI-compatible endpoint (vLLM, LiteLLM, Together, etc.)
SLOPSTOP_LLM_API_KEY=sk-... slopstop check \
  --judge openai-compat \
  --judge-model your-model-name \
  --judge-base-url https://your-endpoint/v1 \
  --from main --to HEAD --body "..."

# Ollama (self-hosted, no key)
slopstop check --judge ollama --judge-model llama3.2 --judge-base-url http://localhost:11434 \
  --from main --to HEAD --body "..."

# Health check: what's wired up?
slopstop diagnose

# From stdin
git diff main..HEAD | slopstop check --diff - --body "fix"

Run slopstop --help for the full option list.

3. Claude Code plugin (block your own slop before push)

/plugin install slopstop

Then reload:

/reload-plugins

The plugin adds four things:

  • /slopstop:slop-check — run Layer 1 from the CLI + Layer 2 from your Claude session (no API key needed)
  • /slopstop:slop-explain — explain a signal in plain language and propose concrete fixes
  • slopstop skill — auto-loaded discipline that teaches Claude to avoid slop patterns when generating code, tests, or PR descriptions
  • Pre-push hook — runs Layer 1 automatically before git push or gh pr create

By default the hook uses Layer 1 only and fails on warning. To enable Layer 2 in the hook, set hook_judge in the plugin config.

To check a specific range or PR:

/slopstop:slop-check main..HEAD

# or pipe a PR diff directly
gh pr diff 123 | slopstop check --diff - --body "$(gh pr view 123 --json body -q .body)"

Configuration

Create .slopstop.yml at the repo root (optional):

# disable specific detectors
disabled:
  - diff-vs-description-mismatch

# adjust thresholds
thresholds:
  overcommented-code.ratio: 0.5              # comment/code ratio to trigger (default 0.4)
  overcommented-code.minAdditions: 30        # minimum added lines before ratio is checked (default 30)
  overcommented-code.clicheBypassHits: 3     # AI-cliché hits that bypass the size gate (default 3)
  overcommented-code.clicheBypassMinSize: 10 # minimum non-blank lines for cliché bypass (default 10)
  diff-vs-description-mismatch.minAdditions: 200   # additions needed to trigger (default 100)
  diff-vs-description-mismatch.thinDescChars: 80   # description length considered "thin" (default 80)
  diff-vs-description-mismatch.largeFileCount: 8   # file count to trigger wide-change check (default 8)
  test-doesnt-verify-change.confidence: 0.7        # minimum LLM confidence to emit a signal (default 0.7)

# fail the check at or above this severity
failOn: high
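The four overcommented-code thresholds interact: a file must be large enough (minAdditions) before its comment ratio counts, unless enough AI-cliché phrases appear in a file that still meets clicheBypassMinSize. The TypeScript sketch below shows one plausible reading of those rules with their defaults; the comment-line regex, the two-phrase cliché list, the isOvercommented name, and measuring the ratio against non-blank added lines are all assumptions, not SlopStop's implementation. The usage mirrors the first signal in the sample output, where 22 added lines fall under minAdditions but 4 cliché hits trigger the bypass.

```typescript
// Hypothetical reading of the overcommented-code thresholds; not SlopStop's source.
interface OvercommentThresholds {
  ratio: number;               // comment share of added lines that triggers (default 0.4)
  minAdditions: number;        // added lines required before the ratio is checked (default 30)
  clicheBypassHits: number;    // cliché hits that bypass the size gate (default 3)
  clicheBypassMinSize: number; // non-blank lines required for the bypass (default 10)
}

const OVERCOMMENT_DEFAULTS: OvercommentThresholds = {
  ratio: 0.4,
  minAdditions: 30,
  clicheBypassHits: 3,
  clicheBypassMinSize: 10,
};

// Illustrative phrases only; the real cliché list is not documented in this README.
const CLICHES: RegExp[] = [/here'?s the implementation/i, /in a real-world scenario/i];

// Simplification: a line is comment-only if it starts with //, #, /* or *.
const COMMENT_LINE = /^\s*(\/\/|#|\/\*|\*)/;

function isOvercommented(added: string[], t: OvercommentThresholds = OVERCOMMENT_DEFAULTS): boolean {
  const nonBlank = added.filter((l) => l.trim() !== "");
  if (nonBlank.length === 0) return false;
  const comments = nonBlank.filter((l) => COMMENT_LINE.test(l));
  const clicheHits = comments.filter((l) => CLICHES.some((re) => re.test(l))).length;

  // Small files are skipped unless enough cliché phrases appear.
  const bigEnough = added.length >= t.minAdditions;
  const bypass = clicheHits >= t.clicheBypassHits && nonBlank.length >= t.clicheBypassMinSize;
  if (!bigEnough && !bypass) return false;

  return comments.length / nonBlank.length >= t.ratio;
}

// 18 comment lines (4 with clichés) plus 4 code lines: 22 added lines, as in the sample.
const added: string[] = [
  "// Here's the implementation of the helper.",
  "// In a real-world scenario you would validate input.",
  "// Here's the implementation, step by step.",
  "// In a real-world scenario this would be configurable.",
  ...Array.from({ length: 14 }, (_, i) => `// Step ${i + 1}: restate the obvious.`),
  "export function add(a: number, b: number): number {",
  "  return a + b;",
  "}",
  "export const ZERO = 0;",
];
console.log(isOvercommented(added)); // true (cliché bypass, 18/22 of lines are comments)
```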

Opting out per line

Add # slopstop:disable (or // slopstop:disable, depending on the file's comment syntax) to suppress all detectors on that line, or name specific ones:

expect(true).toBe(true) // slopstop:disable placeholder-tests
expect(true).toBe(true) // slopstop:disable placeholder-tests, overcommented-code
expect(true).toBe(true) // slopstop:disable

The directive can also appear on the line immediately above the flagged line.
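The suppression rules above (a bare directive silences everything, named detectors silence only those, and the directive may sit on the line above) fit in a few lines of TypeScript. This is a sketch under assumptions: the regex accepts only the # and // prefixes shown in the examples, parseDirective and isSuppressed are hypothetical helpers, and a directive on the previous line is honoured even when that line also holds code, which the real tool may treat more strictly.

```typescript
// Hypothetical directive parser; the real grammar is not documented beyond the examples.
const DIRECTIVE = /(?:#|\/\/)\s*slopstop:disable\b(.*)$/;

// null: no directive; "all": bare directive; otherwise the named detector ids.
function parseDirective(line: string): "all" | Set<string> | null {
  const m = DIRECTIVE.exec(line);
  if (!m) return null;
  const names = m[1]
    .split(",")
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
  return names.length === 0 ? "all" : new Set(names);
}

// A signal on line `index` (0-based) is suppressed by a directive on that line or the one above.
function isSuppressed(lines: string[], index: number, detector: string): boolean {
  for (const i of [index, index - 1]) {
    if (i < 0 || i >= lines.length) continue;
    const d = parseDirective(lines[i]);
    if (d === "all" || (d instanceof Set && d.has(detector))) return true;
  }
  return false;
}

const src = [
  "expect(true).toBe(true) // slopstop:disable placeholder-tests",
  "expect(true).toBe(true) // slopstop:disable",
  "# slopstop:disable overcommented-code",
  "assert True",
  "expect(true).toBe(true)",
];
console.log(isSuppressed(src, 0, "placeholder-tests"));  // true: named on the same line
console.log(isSuppressed(src, 0, "overcommented-code")); // false: only placeholder-tests is named
console.log(isSuppressed(src, 3, "overcommented-code")); // true: named on the line above
```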


Design principles

  1. Deterministic first. Layer 1 has zero LLM cost and finishes in milliseconds. LLM judging is opt-in only.
  2. Conservative defaults. False positives are worse than false negatives — a noisy bot gets disabled. Default thresholds err on the side of "say nothing".
  3. Maintainer-controlled. Every detector can be disabled. Lines can opt out with # slopstop:disable. PR authors get clear, actionable feedback when something fires.
  4. Unopinionated about AI use. SlopStop does not claim "AI-generated" — it flags patterns common in low-effort PRs, regardless of source. Real humans write slop too.

Contributing

Slop comes in many shapes. If you encounter a slop PR pattern this project misses, please open an issue with a redacted example diff and a description of the tell. New detectors are easy to add — see docs/detectors.md.

This is currently a one-maintainer project. Issue triage is best-effort, but I aim to respond within 24 hours.


License

MIT. See LICENSE.

About

GitHub Action, CLI, and Claude Code plugin to detect AI slop PRs — placeholder tests, overcommented code, and hallucinated changes. Free Layer 1 + optional LLM judge.
