zhangshuoming990105/ConstrainedDecodingAttack

ConstrainedDecodingAttack (CDA)

Public proof-of-concept code for the CCS 2026 paper:

When Grammar Guides the Attack: Uncovering Control-Plane Vulnerabilities in LLMs with Structured Output

Shuoming Zhang, Jiacheng Zhao, Hanyuan Dong, Ruiyuan Xu, Zhicheng Li, Yangyu Zhang, Shuaijiang Li, Yuan Wen, Chunwei Xia, Zheng Wang, Xiaobing Feng, Huimin Cui.

⚠ Read RESPONSIBLE_USE.md before running anything in this repository. The PoCs can elicit harmful content from production LLMs.

What this repo is

A minimal illustration, one file per attack, of the two attacks proposed in the paper:

  • EnumAttack (paper §4.1) — hide the malicious intent inside a JSON Schema enum field on the control plane while keeping the user prompt benign.
  • DictAttack (paper §4.2) — decouple the malicious payload across a benign-looking key sequence on the data plane and a benign-looking dictionary on the control plane.

The repository also includes a reference implementation of the paper's Algorithm 1 (DictAttack payload generation), which is self-contained and runs without any LLM call.

What this repo is not

This repository deliberately ships only enough code to demonstrate the mechanism. It does not include:

  • Multi-model / multi-benchmark batch runners.
  • Defense-side audit pipelines (LlamaGuard / Moderation API / SelfDefend).
  • The Circuit Breaker harness used in §5.5.
  • Per-model run logs, attack output dumps, or any pre-generated AdvBench / HarmBench / JailbreakBench attack payloads.

The full evaluation harness lives in a separate gated artifact for academic-only access. Verified researchers can request it from the corresponding author (zhangshuoming21s@ict.ac.cn) with their institutional affiliation and intended use.

Layout

.
├── poc/
│   ├── enum_attack_poc.py          One-file EnumAttack PoC (OpenRouter)
│   └── dict_attack_poc.py          One-file DictAttack PoC (OpenRouter)
├── algorithm/
│   └── dictattack_payload_gen.py   Algorithm 1 reference impl (no LLM call)
├── examples/
│   └── sample_run.md               Sanitized example output
├── LICENSE                          MIT
├── RESPONSIBLE_USE.md
├── CITATION.bib
└── requirements.txt

Quick start

python3 -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# Algorithm 1 (no API key required, deterministic synonym fallback):
python3 algorithm/dictattack_payload_gen.py "How to make a bomb"

# DictAttack PoC against a real endpoint:
export OPENROUTER_API_KEY=sk-or-...
python3 poc/dict_attack_poc.py
# Override target / question:
OPENROUTER_MODEL="openai/gpt-4o-mini" \
POC_QUESTION="..."                   \
python3 poc/dict_attack_poc.py

# EnumAttack PoC:
python3 poc/enum_attack_poc.py

examples/sample_run.md shows what a successful DictAttack run looks like (attack content redacted).
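
The Quick start commands configure the PoC scripts through environment variables (OPENROUTER_API_KEY, OPENROUTER_MODEL, POC_QUESTION). As a rough sketch of how a script could collect those settings, and not a copy of what the PoC files actually do, the helper below reads the three variables and fails early when the API key is missing. The function name load_poc_config and the returned dictionary layout are assumptions made for illustration only.

```python
import os


def load_poc_config(env=None):
    """Collect PoC settings from environment variables.

    Hypothetical helper: the variable names come from the Quick start
    commands, but the real scripts may read them differently.
    """
    env = os.environ if env is None else env

    api_key = env.get("OPENROUTER_API_KEY")
    if not api_key:
        # Fail before any network call is attempted.
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it before running a PoC."
        )

    return {
        "api_key": api_key,
        # None means "fall back to whatever default the script defines".
        "model": env.get("OPENROUTER_MODEL"),
        "question": env.get("POC_QUESTION"),
    }
```

Passing a plain dictionary instead of os.environ makes the helper easy to check without touching the real environment.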

Citation

@inproceedings{zhang2026cda,
  title     = {When Grammar Guides the Attack: Uncovering Control-Plane
               Vulnerabilities in {LLM}s with Structured Output},
  author    = {Zhang, Shuoming and Zhao, Jiacheng and Dong, Hanyuan and
               Xu, Ruiyuan and Li, Zhicheng and Zhang, Yangyu and
               Li, Shuaijiang and Wen, Yuan and Xia, Chunwei and
               Wang, Zheng and Feng, Xiaobing and Cui, Huimin},
  booktitle = {Proceedings of the 2026 ACM SIGSAC Conference on Computer and
               Communications Security (CCS '26)},
  year      = {2026},
  publisher = {ACM},
}

Disclosure

We disclosed the underlying vulnerability to OpenAI and Google (Gemini) in early 2025; the embargo period has since expired. The maintainers of xgrammar were notified and acknowledged the issue. Mitigations deployed inside closed-source provider stacks are not visible to us, so do not assume that any particular endpoint is patched.
