PassNet is an AI system for compiler optimization. It uses LLM-driven agents to generate high-performance GPU kernels automatically and applies them to computation graphs through compiler passes. PassNet includes a complete optimization toolchain, the PassBench evaluation benchmark, and the PassAgent agent evaluation framework.
- Paper: arXiv:2605.29357
- Dataset: PassNet on HuggingFace
- Leaderboard: PassBench Leaderboard
- Project Structure
- Architecture Overview
- Core Components
- DataSet
- Quick Start
- PassBench Evaluation Pipeline
- PassAgent Evaluation
- License

Project Structure

    PassNet/
    ├── pass_bench/        # PassBench compiler evaluation framework: kernel compilation, correctness verification, performance benchmarking
    ├── pass_agent/        # PassAgent evaluation framework
    ├── samples/           # PassBench sample data
    ├── sample_lists/      # PassBench sample list files (eval/train splits)
    ├── entry_scripts/     # Evaluation entry scripts
    ├── graphs/            # Subgraph data
    ├── graph_lists/       # Subgraph lists and grouping info
    ├── test/              # Unit tests
    ├── Dockerfile.nvidia  # Docker image definition
    └── requirements.txt   # Python dependencies

Architecture Overview

PassAgent (LLM-driven Pass Generation) sits at the top of the architecture: Multi-step Iterative Solving · k-attempts · R2E-Gym Framework. It reads data from DataSet, sends each generated pass to PassBench, and receives feedback from PassBench.

- DataSet
  - `graphs/`: sole_op (5,939), fusible (22,870), typical (25,151)
  - `samples/`: sole_op (1,029), fusible (4,676), typical (4,278)
  - `sample_lists/`: `train/`, `eval/`
- PassBench
  1. Execution & Eval: Eager Execution, `pass_mgr` Execution
  2. Result Checking: Correctness & Speedup
  3. Score Aggregation: ES(t) & AS Met

PassBench: Compiler Evaluation Framework
PassBench provides kernel compilation, correctness verification, and performance benchmarking. It serves both as a standalone evaluation tool and as the backend evaluation framework invoked by PassAgent:
- Kernel Compilation: Executes pass matching and replacement via the `pass_mgr` compiler method
- Correctness Verification: Validates numerical correctness of optimized kernels against dtype-specific tolerance thresholds (float32 / float16 / bfloat16)
- Performance Benchmarking: Measures speedup over 100 trials and outputs `aggregated_score.json`
- Score Aggregation: `aggregate_es_scores.py` computes ES(t) scores across all graphs in a sample
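
The verification and benchmarking steps above can be pictured with a small, framework-free sketch. It is not PassBench's implementation: the `TOLERANCES` values and the helper names `outputs_match` and `measure_speedup` are assumptions made for illustration. Only the three dtypes and the 100-trial default come from the description above.

```python
import math
import statistics
import time

# Hypothetical (rtol, atol) pairs per dtype; the real thresholds are defined in pass_bench.
TOLERANCES = {
    "float32": (1e-5, 1e-6),
    "float16": (1e-3, 1e-3),
    "bfloat16": (1e-2, 1e-2),
}


def outputs_match(expected, actual, dtype):
    """Return True if every element pair is within the tolerance chosen for dtype."""
    rtol, atol = TOLERANCES[dtype]
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=rtol, abs_tol=atol)
        for e, a in zip(expected, actual)
    )


def measure_speedup(baseline, optimized, trials=100):
    """Time both callables `trials` times; return median(baseline) / median(optimized)."""

    def median_time(fn):
        samples = []
        for _ in range(trials):
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        return statistics.median(samples)

    return median_time(baseline) / median_time(optimized)
```

A value above 1.0 means the optimized callable was faster. Timing real GPU kernels additionally requires synchronizing the device before reading the clock, which this CPU-only sketch leaves out.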
PassAgent: R2E-Gym Agent Evaluation Framework
PassAgent evaluates agent capabilities for compiler optimization using the R2E-Gym framework. See `pass_agent/README.md` for details.
`graphs/` stores raw computation subgraphs extracted from deep learning models, which serve as the source for PassBench samples:
- `fusible_subgraphs/`: A small set of example fusible subgraphs (1,456), containing computation graphs with multi-operator fusion opportunities
- `hf_subgraphs/` (Legacy): Subgraph data from the previous version, containing sole op (1,410), fusible (4,167), and typical (6,157) categories
- `hf_subgraphs_v2/`: HuggingFace model subgraphs, organized into three categories:
  - `sole_op_subgraphs`: Single-operator subgraphs (5,939)
  - `fusible_subgraphs`: Fusible subgraphs (22,870)
  - `typical_subgraphs`: Typical subgraphs (25,151)
`graph_lists/` stores subgraph path lists, UID groupings, and other information used for sample filtering and group management.

Subgraph Path Lists (line format: `subgraph_UID\tsubgraph_relative_path`)

| File | Subgraphs | Description |
|---|---|---|
| `fusible_subgraphs.txt` | 1,455 | Example fusible subgraph paths |
| `hf_sole_op_subgraphs.txt` | 1,410 | Legacy sole op subgraph paths |
| `hf_fusible_subgraphs.txt` | 4,166 | Legacy fusible subgraph paths |
| `hf_typical_subgraphs.txt` | 6,157 | Legacy typical subgraph paths |
| `hf_sole_op_subgraphs_v2.txt` | 5,939 | v2 sole op subgraph paths |
| `hf_fusible_subgraphs_v2.txt` | 22,870 | v2 fusible subgraph paths |
| `hf_typical_subgraphs_v2.txt` | 25,151 | v2 typical subgraph paths |
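
The line format above (`subgraph_UID`, a tab, then the relative path) is straightforward to read programmatically. The following is a minimal sketch, not a PassNet utility: the function names `parse_graph_list` and `load_graph_list` are hypothetical, and skipping blank lines is an assumption about how lenient a reader should be.

```python
from pathlib import Path


def parse_graph_list(text):
    """Parse 'UID<TAB>relative_path' lines into a dict mapping UID -> path."""
    entries = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines (an assumption, not a documented rule)
        uid, sep, rel_path = line.partition("\t")
        if not sep or not uid or not rel_path.strip():
            raise ValueError(f"line {line_no}: expected 'UID<TAB>path', got {line!r}")
        entries[uid] = rel_path.strip()
    return entries


def load_graph_list(list_file):
    """Read a list file such as graph_lists/hf_fusible_subgraphs_v2.txt."""
    return parse_graph_list(Path(list_file).read_text(encoding="utf-8"))
```

For example, `len(load_graph_list("graph_lists/hf_sole_op_subgraphs_v2.txt"))` gives the number of distinct UIDs in that list.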
`samples/` holds the evaluation samples generated from `graphs/`. Each sample is an independently executable evaluation unit:
- `fusible_subgraphs/`: A small set of example samples from TIMM models' fusible subgraphs, organized by `model_name/subgraph_index`
- `hf_subgraphs/` (Legacy): Subgraph samples from the previous version, containing sole op (590), fusible (2,489), and typical (3,382) categories
- `hf_subgraphs_v2/`: v2 subgraph samples with extended multi-dtype support, containing sole op (1,029), fusible (4,676), and typical (4,278) categories, organized by hash path `xx/yy/hash/`; the dataset is published at PassNet/PassNet
Each sample directory contains:
| File | Description |
|---|---|
| `entry.sh` | Evaluation entry script that executes compilation, verification, and performance statistics |
| `graph_list.txt` | List of computation graphs included in the sample |
| `graphs/` | Computation graph definitions (`model.py`, `weight_meta.py`, etc.) |
| `pass_dir/` | Output directory for generated optimization passes |
| `pass_bench/` | Copy of the evaluation framework (for standalone execution within Docker containers) |
| `sample_uids.txt` | Unique sample identifier (`hf_subgraphs_v2` only) |
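
Before running a sample, it can be useful to confirm that its directory has the layout listed above. The check below is an illustrative sketch rather than part of PassBench: `REQUIRED_ENTRIES` and `missing_sample_entries` are hypothetical names, and `sample_uids.txt` is left out because it exists only for `hf_subgraphs_v2` samples.

```python
from pathlib import Path

# Entries every sample directory is described as containing.
REQUIRED_ENTRIES = ("entry.sh", "graph_list.txt", "graphs", "pass_dir", "pass_bench")


def missing_sample_entries(sample_dir):
    """Return the names from REQUIRED_ENTRIES that are absent from sample_dir."""
    root = Path(sample_dir)
    return [name for name in REQUIRED_ENTRIES if not (root / name).exists()]
```

An empty list means the sample directory has every expected file and subdirectory.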
`sample_lists/` stores sample path lists for evaluation and training, organized by purpose and subgraph type and available in both txt and csv formats.

train/ (Training Set)

| File | Samples | Description |
|---|---|---|
| `hf_sole_op_train_samples_v2.txt` | 1,028 | Sole op subgraph training samples |
| `hf_fusible_train_samples_v2.txt` | 4,476 | Fusible subgraph training samples |
| `hf_typical_train_samples_v2.txt` | 4,078 | Typical subgraph training samples |
| `hf_sole_op_train_samples.txt` (Legacy) | 589 | Legacy sole op subgraph training samples |
| `hf_fusible_train_samples.txt` (Legacy) | 2,289 | Legacy fusible subgraph training samples |
| `hf_typical_train_samples.txt` (Legacy) | 3,182 | Legacy typical subgraph training samples |

eval/ (Evaluation Set)

| File | Samples | Description |
|---|---|---|
| `hf_fusible_eval_samples_v2.txt` | 200 | Fusible subgraph evaluation samples |
| `hf_typical_eval_samples_v2.txt` | 200 | Typical subgraph evaluation samples |
| `hf_fusible_eval_samples.txt` (Legacy) | 200 | Legacy fusible subgraph evaluation samples |
| `hf_typical_eval_samples.txt` (Legacy) | 200 | Legacy typical subgraph evaluation samples |

- Python 3.12+
- PyTorch 2.9+ (CUDA 12.8)
- NVIDIA GPU (CUDA support)
- Docker (optional, for containerized evaluation)

Install locally:

    cd /path/to/passnet
    # Install dependencies
    pip install -r requirements.txt
    # Set environment variables
    export PYTHONPATH=$PYTHONPATH:/path/to/passnet
    # Verify sample evaluation
    bash samples/fusible_subgraphs/crossvit_15_dagger_240.in1k/crossvit_15_dagger_240.in1k_0_start14_end16_4/entry.sh

Or build and run with Docker:

    docker build . -t passnet:latest -f Dockerfile.nvidia
    docker run --gpus all --privileged \
      -v <path-to-passnet-project>:/workspace \
      -w /workspace \
      passnet:latest \
      bash samples/fusible_subgraphs/crossvit_15_dagger_240.in1k/crossvit_15_dagger_240.in1k_0_start14_end16_4/entry.sh

The PassNet evaluation pipeline works as follows:
- Analyze computation graph: Read `model.py` and `weight_meta.py` to understand the target subgraph's operators, tensor shapes, and dtypes
- Generate optimization pass: The LLM agent generates a pass file and places it in `pass_dir/`
- Pass matching and replacement: `pass_mgr` matches the pattern in the FX graph and replaces it with the optimized kernel
- Correctness verification: Compare eager and compiled outputs using dtype-specific tolerance thresholds
- Performance benchmarking: Measure speedup, compute ES(t), and output `aggregated_score.json`

To evaluate a hand-written pass on a single sample:

    # place your pass file
    cp MyPass.py samples/<type>/<hash>/pass_dir/
    echo '["MyPass"]' > samples/<type>/<hash>/pass_dir/sorted_output_pass_rule_names.json
    # run evaluation for a single sample
    bash samples/<type>/<hash>/entry.sh

See `pass_bench/README.md` for pass file format and batch evaluation.
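
The copy-and-register step above can also be scripted, for instance when an agent loop writes passes programmatically. This is a sketch under stated assumptions, not a PassBench API: `install_pass` is a hypothetical helper, and it assumes the rule name equals the pass file name without its `.py` suffix, as in the `MyPass` example.

```python
import json
import shutil
from pathlib import Path


def install_pass(pass_file, sample_dir):
    """Copy a pass file into sample_dir/pass_dir and register its rule name."""
    pass_file = Path(pass_file)
    pass_dir = Path(sample_dir) / "pass_dir"
    pass_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy(pass_file, pass_dir / pass_file.name)

    # Assumption: rule name == file stem ("MyPass.py" -> "MyPass").
    rule_names = [pass_file.stem]
    manifest = pass_dir / "sorted_output_pass_rule_names.json"
    manifest.write_text(json.dumps(rule_names) + "\n", encoding="utf-8")
    return manifest
```

After calling `install_pass("MyPass.py", "samples/<type>/<hash>")`, the sample's `entry.sh` can be run as shown above.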
Evaluate agents using the PassAgent framework:

    cd pass_agent
    pip install -r requirements.txt
    python examples/run_pass_agent_demo.py \
      --llm-name openai/glm-4.7 \
      --llm-base-url <your-llm-base-url> \
      --openai-api-key <your-api-key> \
      --dataset datasets/passbench_demo_dataset.jsonl \
      --max-steps 50 \
      --k 10

See `pass_agent/README.md` for details.

For licensing terms, refer to the license file in the project root directory.