This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
This is the public repo of PIArena — do not leak private info. PIArena is a platform for prompt injection (PI) attack and defense evaluation, providing a plug-and-play toolbox and systematic benchmark for LLM systems.
# Setup
```bash
conda create -n piarena python=3.10 -y && conda activate piarena
pip install -r requirements.txt
pip install -e .  # Install piarena as an editable package (required for batch scripts)
# torch/vllm are commented out in requirements.txt — install separately for your CUDA version
huggingface-cli login
```
```bash
# Run a single evaluation (requires a GPU)
python main.py --dataset open_prompt_injection --attack combined --defense pisanitizer
python main.py --config configs/experiments/my_experiment.yaml

# Search-based attacks: PAIR, TAP, strategy_search (need backend + attacker LLMs)
python main_search.py --attack pair --backend_llm <model> --attacker_llm <model> --defense pisanitizer
python main_search.py --attack tap --backend_llm <model> --attacker_llm <model> --defense pisanitizer
python main_search.py --attack strategy_search --backend_llm <model> --attacker_llm <model> --defense pisanitizer --batch_size 8

# Run batch experiments (edit the scripts to configure jobs, GPUs, and datasets)
python scripts/run.py         # Standard attacks
python scripts/run_search.py  # Search-based attacks

# Agent benchmarks
git submodule update --init --recursive
cd agents/agentdojo && pip install -e . && cd ../..
python main_injecagent.py --model meta-llama/Llama-3.1-8B-Instruct --defense none
python main_agentdojo.py --model gpt-5-mini --attack none --suite workspace
python main_agentdojo.py --model gpt-4o-2024-08-06 --attack important_instructions --defense datafilter --suite shopping
```
```bash
# Website (React 18 + Vite)
cd website && npm install && npm run dev
```

If the editable install fails in a fresh environment, upgrade the packaging toolchain first:

```bash
pip install -U pip setuptools wheel
```

The repository uses `setuptools.build_meta` in `pyproject.toml` for editable installs.
Core pipeline: **Attack → Defense → LLM → Evaluation**.
Each dataset item has: `target_inst`, `context`, `injected_task`, `target_task_answer`, `injected_task_answer`.
- Attack transforms `context` into `injected_context` (embedding the injected task)
- Defense processes `(target_inst, injected_context)` and queries the LLM
- Evaluation checks both utility (did the LLM answer the original task?) and ASR (did the injection succeed?)
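The per-sample flow above can be sketched as a minimal driver loop. This is illustrative only: `run_sample` and the `evaluate` callback are made-up stand-ins, not the real pipeline code in `piarena/`.

```python
def run_sample(attack, defense, evaluate, item):
    """Run one dataset item through the Attack -> Defense -> LLM -> Evaluation flow."""
    # Attack: embed the injected task into the clean context
    injected_context = attack.execute(item["context"], item["injected_task"])
    # Defense: process the (instruction, context) pair and query the LLM
    result = defense.execute(item["target_inst"], injected_context)
    # Evaluation: utility on the target task, ASR on the injected task
    return {
        "utility": evaluate(result["response"], item["target_task_answer"]),
        "asr": evaluate(result["response"], item["injected_task_answer"]),
    }
```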
Global `ATTACK_REGISTRY` and `DEFENSE_REGISTRY` use `@registry.register` decorators. Modules auto-register on import via `__init__.py`. Factory functions: `get_attack(name, config)`, `get_defense(name, config)`.
The `Model` class selects a backend by substring match on the model name:

- Contains `azure` → Azure OpenAI (config from `configs/azure_configs/`)
- Contains `google` → Google GenAI (config from `configs/google_configs/`)
- Contains `anthropic` → Anthropic SDK (config from `configs/anthropic_configs/`)
- Everything else → HuggingFace Transformers (loaded via `AutoModelForCausalLM`)
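The dispatch rule above can be sketched as a tiny helper. `select_backend` and its return labels are made-up stand-ins for illustration, not the real `Model` internals:

```python
def select_backend(model_name: str) -> str:
    """Pick a backend label from a substring of the model name (sketch)."""
    name = model_name.lower()
    if "azure" in name:
        return "azure_openai"     # config from configs/azure_configs/
    if "google" in name:
        return "google_genai"     # config from configs/google_configs/
    if "anthropic" in name:
        return "anthropic_sdk"    # config from configs/anthropic_configs/
    return "huggingface"          # loaded via AutoModelForCausalLM
```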
Query interface: `model.query(messages, max_new_tokens=1024, temperature=0.01)`, where `messages` is a list of `{"role": str, "content": str}` dicts. Batch interface: `model.batch_query(messages_list, ...)`.
The evaluator is chosen by dataset name pattern:

- `open_prompt_injection` → `llm_judge` + `open_prompt_injection_utility`
- `sep` → `llm_judge` + `llm_judge`
- `knowledge_corruption` → `substring_match` + `substring_match`
- `*_long` → `llm_judge` + LongBench metrics (`qa_f1`, `rouge`, `retrieval`, `code_sim`)
- Default → `llm_judge` + `llm_judge`
The `llm_judge` uses `Qwen/Qwen3-4B-Instruct-2507` by default (globally cached).
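The name-pattern dispatch above can be sketched as follows. `select_evaluators` is a made-up helper, the exact/suffix matching is a simplification, and the pair order follows the "X + Y" listing in the table:

```python
def select_evaluators(dataset):
    """Return an evaluator pair for a dataset name (illustrative sketch)."""
    if dataset == "open_prompt_injection":
        return ("llm_judge", "open_prompt_injection_utility")
    if dataset == "sep":
        return ("llm_judge", "llm_judge")
    if dataset == "knowledge_corruption":
        return ("substring_match", "substring_match")
    if dataset.endswith("_long"):
        # LongBench metrics: qa_f1, rouge, retrieval, code_sim
        return ("llm_judge", "longbench")
    return ("llm_judge", "llm_judge")  # default
```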
- Create a new file in `piarena/attacks/` or `piarena/defenses/`
- Subclass `BaseAttack` or `BaseDefense` and set a `name` class attribute
- Decorate with `@ATTACK_REGISTRY.register` or `@DEFENSE_REGISTRY.register`
- Import the module in the package `__init__.py` to trigger registration
`BaseAttack`: implement `execute(context, injected_task, **kwargs) -> str`, returning the injected context.
`BaseDefense`: implement `execute(target_inst, context) -> dict`; optionally override `get_response(target_inst, context, llm) -> dict`.
Config merging: `DEFAULT_CONFIG` (class-level) is merged with the `config` dict passed to the constructor.
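The registration and config-merging pattern can be sketched in miniature. The `Registry` stub, `BaseAttack` stub, and the `"naive"` attack here are all simplified stand-ins for the real piarena classes:

```python
class Registry:
    """Minimal stand-in for piarena's registry (sketch)."""
    def __init__(self):
        self._items = {}
    def register(self, cls):
        # Used as a decorator; keys entries by the class's `name` attribute
        self._items[cls.name] = cls
        return cls
    def get(self, name, config=None):
        return self._items[name](config or {})

ATTACK_REGISTRY = Registry()

class BaseAttack:
    DEFAULT_CONFIG = {}
    def __init__(self, config):
        # Class-level defaults merged with the per-run config dict
        self.config = {**self.DEFAULT_CONFIG, **config}

@ATTACK_REGISTRY.register
class NaiveAttack(BaseAttack):
    name = "naive"  # hypothetical attack name
    DEFAULT_CONFIG = {"separator": "\n"}
    def execute(self, context, injected_task, **kwargs):
        # Append the injected task to the clean context
        return context + self.config["separator"] + injected_task
```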
Batch support now lives on the defense classes themselves via `BaseDefense.execute_batch()` and `BaseDefense.get_response_batch()`. Defenses may override these methods for true batching or rely on the default loop-based fallback. `strategy_search` uses the defense object directly rather than a separate batch registry.
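The default loop-based fallback amounts to iterating the single-item method, roughly as below (an illustrative stub, not the real `BaseDefense`):

```python
class BaseDefense:
    """Stub illustrating the loop-based batch fallback (sketch)."""
    def execute(self, target_inst, context):
        raise NotImplementedError
    def execute_batch(self, target_insts, contexts):
        # Default fallback: sequential loop over pairs.
        # Subclasses may override this with true batched inference.
        return [self.execute(t, c) for t, c in zip(target_insts, contexts)]
```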
- Results are saved to `results/evaluation_results/{name}/{dataset}-{llm}-{attack}-{defense}-{seed}.json`
- Attack results are cached separately in `tmp_attack_results/` for reuse across defense runs (use `--attack_path` to load pre-computed attacks)
- Re-running skips already-computed sample indices
CLI args > YAML config (`configs/experiments/`) > hardcoded defaults. YAML supports `attack_config` and `defense_config` sub-dicts for component-specific settings.
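This precedence can be expressed with `collections.ChainMap`, where earlier maps win. A sketch with made-up keys, not the repo's actual merge code:

```python
from collections import ChainMap

# Precedence sketch: CLI args override YAML, which overrides hardcoded defaults
defaults = {"seed": 0, "attack": "naive", "defense": "none"}
yaml_cfg = {"attack": "combined", "defense": "pisanitizer"}
cli_args = {"seed": 42}

merged = dict(ChainMap(cli_args, yaml_cfg, defaults))
# → {"seed": 42, "attack": "combined", "defense": "pisanitizer"}
```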
- `main.py`: Main evaluation pipeline (GPU required)
- `main_search.py`: Search-based attacks: PAIR, TAP, strategy_search
  - `pair`/`tap` still use an eager attacker model object
  - `strategy_search` accepts an attacker model path and lazily loads `attacker_llm` only if a non-vLLM fallback is needed
- `main_injecagent.py` / `main_agentdojo.py`: Agent benchmarks
  - `main_agentdojo.py` now covers both classic AgentDojo suites and merged AgentDyn suites in the vendored `agents/agentdojo` tree
- `scripts/run.py`: Batch runner for standard attacks
- `scripts/run_search.py`: Batch runner for search-based attacks
- `scripts/run_injecagent.py` / `scripts/run_agentdojo.py`: Batch runners for agent benchmarks
- All batch scripts use `GPUScheduler` from `piarena/gpu_utils.py` for least-loaded GPU scheduling (auto-detects local vs Slurm)
- `CHANGELOG.md`: running record of notable repository changes
- `docs/`: user manuals hosted on the project page and consumed by the website
- When planning implementation work, write the plan as a markdown file under `plans/`.
- When code changes affect behavior, APIs, scripts, workflows, or the website, update the related markdown files and docs in the same change.
- Record notable repository changes in `CHANGELOG.md` as part of the same task.
Datasets load from local JSON in `datasets/` first, falling back to HuggingFace (`sleeepeer/PIArena`). The 17 datasets cover QA, summarization, extraction, long-context, and knowledge-corruption tasks.
`inject(clean_data, injected_prompt, inject_position, inject_times)` inserts the injected prompt at position `"end"`, `"start"`, or `"random"` (random sentence insertion). Used by heuristic attacks.
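The behavior of `inject` can be sketched roughly as below. This is a hedged reconstruction from the signature and position names alone; the real implementation in piarena may split sentences and join segments differently:

```python
import random

def inject(clean_data, injected_prompt, inject_position="end", inject_times=1):
    """Sketch of the heuristic injection helper (not the real implementation)."""
    repeated = " ".join([injected_prompt] * inject_times)
    if inject_position == "end":
        return clean_data + " " + repeated
    if inject_position == "start":
        return repeated + " " + clean_data
    # "random": insert the prompt at random sentence boundaries
    sentences = clean_data.split(". ")
    for _ in range(inject_times):
        idx = random.randrange(len(sentences) + 1)
        sentences.insert(idx, injected_prompt)
    return ". ".join(sentences)
```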