feat: Add Inductor backend configs by JewelRoam · Pull Request #688 · PaddlePaddle/GraphNet

JewelRoam · 2026-04-15T08:27:48Z

Overview

This PR introduces configuration templates for the PyTorch Inductor backend,
allowing users to select predefined templates that each set a group of
torch._inductor.config overrides. This extends PyTorch's official "mode"
concept while remaining fully compatible with the existing
test_compiler.py framework.

Motivation

Previously, InductorBackend accepted only basic config parameters through
individual inductor_config dictionary entries. Users could not easily enable
common combinations of Inductor options such as:

CUTLASS-based GEMM kernels - For faster GEMMs on modern GPUs
CUDA Graphs - To reduce kernel launch overhead in small-batch inference
Model freezing - To inline weights as constants for deployment
TMA (Tensor Memory Accelerator) - For H100+ GPUs, which provide dedicated TMA hardware

This PR addresses these limitations by introducing config templates: predefined,
well-tested combinations of torch._inductor.config options that users can select
by name. Templates, mode, and options are mutually exclusive, which keeps their
responsibilities clearly separated.

Changes Summary

1. Inductor Backend Configuration Templates

File: graph_net_bench/torch/backend/inductor_backend.py

Features

_INDUCTOR_CONFIG_TEMPLATES dictionary with 6 predefined templates
Parameter: graph_net_inductor_config_template - selects a template by name
Mutual exclusivity: exactly one of template/mode/options can be specified
No global state mutation: overrides are passed through torch.compile's options parameter

Supported Templates

Template	Description	Config Overrides
`triton`	Default Triton backend	`cpp_wrapper: False`
`cpp_wrapper`	C++ wrapper for kernels	`cpp_wrapper: True`
`cudagraphs`	CUDA Graphs	`triton.cudagraphs: True`
`max_autotune`	Comprehensive autotuning	4 autotune options
`freezing`	Model freezing	`freezing: True`
`tma`	TMA persistent matmul	`triton.enable_persistent_tma_matmul: True`
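A minimal sketch of how the features and table above could fit together, assuming the template dictionary mirrors the table and the backend forwards the result to torch.compile's `mode`/`options` keywords. `INDUCTOR_CONFIG_TEMPLATES` and `build_compile_kwargs` are hypothetical stand-ins, not the code in inductor_backend.py; `max_autotune` is left out because its four options are not listed above, and "none given" is treated as an error by reading "exactly one" literally.

```python
from typing import Any, Dict, Optional

# Hypothetical mirror of the table above (max_autotune omitted: its options are not listed).
INDUCTOR_CONFIG_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "triton": {"cpp_wrapper": False},
    "cpp_wrapper": {"cpp_wrapper": True},
    "cudagraphs": {"triton.cudagraphs": True},
    "freezing": {"freezing": True},
    "tma": {"triton.enable_persistent_tma_matmul": True},
}


def build_compile_kwargs(config: Dict[str, Any]) -> Dict[str, Any]:
    """Turn a backend config into keyword arguments for torch.compile.

    Exactly one of template / mode / options must be present. Nothing is
    written to torch._inductor.config globally; overrides travel via options.
    """
    template: Optional[str] = config.get("graph_net_inductor_config_template")
    mode: Optional[str] = config.get("mode")
    options: Optional[Dict[str, Any]] = config.get("options")

    given = [
        name
        for name, value in (("template", template), ("mode", mode), ("options", options))
        if value is not None
    ]
    if len(given) != 1:
        raise ValueError(f"exactly one of template/mode/options must be set, got {given or 'none'}")

    if template is not None:
        if template not in INDUCTOR_CONFIG_TEMPLATES:
            raise ValueError(f"unknown template {template!r}; choose from {sorted(INDUCTOR_CONFIG_TEMPLATES)}")
        # Return a copy so callers cannot mutate the shared template
        return {"options": dict(INDUCTOR_CONFIG_TEMPLATES[template])}
    if mode is not None:
        return {"mode": mode}
    return {"options": dict(options)}
```

With PyTorch installed, the result would be unpacked into the compile call, e.g. `torch.compile(model, backend="inductor", **build_compile_kwargs(cfg))`. Refusing `mode` and `options` together is also consistent with torch.compile itself, which rejects receiving both at once.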

The tma template works across GPU architectures:

H100+ (CC >= 9.0): Enables TMA persistent kernels
A100 or other GPUs: Enables non-TMA persistent kernels as fallback

No runtime error occurs on GPUs without TMA support.
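The architecture split described above comes down to a compute-capability comparison. The sketch below only restates that rule for illustration; the function name and return strings are hypothetical, and Inductor makes the actual kernel choice internally.

```python
from typing import Tuple

# Threshold stated in the PR text: TMA persistent kernels on CC >= 9.0 (H100+)
TMA_MIN_CAPABILITY: Tuple[int, int] = (9, 0)


def persistent_matmul_flavor(capability: Tuple[int, int]) -> str:
    """Describe which persistent matmul path the tma template is expected to take.

    This mirrors the behaviour described in the PR; it is not Inductor's dispatch code.
    """
    # Tuple comparison orders (major, minor) correctly, e.g. (8, 9) < (9, 0)
    if capability >= TMA_MIN_CAPABILITY:
        return "tma-persistent"
    return "non-tma-persistent"
```

On a CUDA machine the tuple could come from `torch.cuda.get_device_capability()`, which returns `(major, minor)`; an A100 reports `(8, 0)` and therefore lands on the non-TMA fallback.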

2. CUDA Graphs Compatibility Fix

File: graph_net_bench/torch/test_compiler.py

When CUDA Graphs are enabled, compiled outputs are backed by CUDA Graph buffers.
Subsequent model calls overwrite these buffers, which causes errors when the compiled
output is accessed after the eager run. The fix clones the outputs immediately:

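# Clone compiled outputs right away: with CUDA Graphs they alias static
# buffers that the next call to the compiled model will overwrite.
if isinstance(outs, torch.Tensor):
    outs = outs.clone()
elif isinstance(outs, tuple):
    outs = tuple(t.clone() if isinstance(t, torch.Tensor) else t for t in outs)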

Note: eval_backend_perf.py and eval_backend_diff.py are unaffected,
because torch.save/torch.load produce independent copies.
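The aliasing problem behind this fix can be reproduced without a GPU. The toy below is a hypothetical pure-Python stand-in: `StaticBufferRunner` writes every result into one preallocated list and returns that same list, much as replayed CUDA Graphs reuse static output memory. Holding on to the returned reference exposes the overwritten values, while a copy taken right away (the role `outs.clone()` plays) survives the next call.

```python
from typing import List


class StaticBufferRunner:
    """Toy stand-in for a CUDA-Graph-replayed callable.

    Every call writes into the same preallocated buffer and returns it.
    """

    def __init__(self, size: int) -> None:
        self._buffer: List[float] = [0.0] * size

    def __call__(self, xs: List[float]) -> List[float]:
        for i, x in enumerate(xs):
            self._buffer[i] = 2.0 * x
        return self._buffer  # the same list object on every call


runner = StaticBufferRunner(3)

compiled_out = runner([1.0, 2.0, 3.0])
compiled_copy = list(compiled_out)  # analogous to outs.clone()

runner([10.0, 20.0, 30.0])  # a later call overwrites the shared buffer

print(compiled_out)   # [20.0, 40.0, 60.0]  stale reference was overwritten
print(compiled_copy)  # [2.0, 4.0, 6.0]     copy taken before the next call survives
```

The toy only shows the data being overwritten silently; in the real case described above, accessing the overwritten compiled output produces errors instead.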

3. Tests

File: test/inductor_backend_test.py (new file, ~290 lines)

Test Coverage:

Template validation: 11 tests
Parameter handling: 5 tests
Config validation: 9 tests
Integration: 1 test
Total: 26 tests

Usage

# Schema: exactly one of template/mode/options can be specified
--config '{"graph_net_inductor_config_template": "<template_name>"}'    # template
--config '{"mode": "<mode_name>"}'                                    # mode
--config '{"options": {...}}'                                          # custom

# Example: use the max_autotune template
python -m graph_net_bench.torch.test_compiler \
    --compiler inductor \
    --model-path samples/torchvision/alexnet \
    --config "$(echo '{"graph_net_inductor_config_template": "max_autotune"}' | base64 -w0)" \
    --trials 5 --warmup 3
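The example above passes `--config` as base64-encoded JSON. Assuming test_compiler.py base64-decodes the argument and then parses it as JSON (the shell example implies this, but the description does not state it), the same value can be built from Python. `encode_config` and `decode_config` are hypothetical helpers. Note that `echo` appends a newline, so the shell-encoded string differs from the Python one, yet both decode to the same JSON.

```python
import base64
import json
import shlex
from typing import Any, Dict


def encode_config(config: Dict[str, Any]) -> str:
    """Serialize a backend config to JSON, then base64-encode it on one line."""
    return base64.b64encode(json.dumps(config).encode("utf-8")).decode("ascii")


def decode_config(encoded: str) -> Dict[str, Any]:
    """Reverse of encode_config; json.loads ignores the trailing newline added by echo."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


if __name__ == "__main__":
    config_arg = encode_config({"graph_net_inductor_config_template": "max_autotune"})
    cmd = [
        "python", "-m", "graph_net_bench.torch.test_compiler",
        "--compiler", "inductor",
        "--model-path", "samples/torchvision/alexnet",
        "--config", config_arg,
        "--trials", "5", "--warmup", "3",
    ]
    # Print a shell-ready command line equivalent to the example above
    print(shlex.join(cmd))
```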

Documentation References

All configuration keys were verified against the PyTorch source code:

Config File: https://github.com/pytorch/pytorch/blob/main/torch/_inductor/config.py
Compile API: https://pytorch.org/docs/stable/generated/torch.compile.html
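One way to keep the templates in sync with config.py is a small check that every override key is a known option. `find_unknown_keys` is a hypothetical helper; it takes the set of valid names as a parameter so it runs without PyTorch.

```python
from typing import Any, Dict, Iterable, List, Tuple


def find_unknown_keys(
    templates: Dict[str, Dict[str, Any]],
    known_keys: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (template, key) pairs whose key is not in known_keys, in sorted order."""
    known = set(known_keys)
    return [
        (name, key)
        for name, overrides in sorted(templates.items())
        for key in sorted(overrides)
        if key not in known
    ]


if __name__ == "__main__":
    templates = {
        "cudagraphs": {"triton.cudagraphs": True},
        "freezing": {"freezing": True},
        "typo": {"triton.cudagraph": True},  # deliberately misspelled key
    }
    known = {"cpp_wrapper", "freezing", "triton.cudagraphs"}
    print(find_unknown_keys(templates, known))  # [('typo', 'triton.cudagraph')]
```

In recent PyTorch releases, `torch._inductor.list_options()` returns the available option names in the dotted form that torch.compile's `options` accepts (e.g. `triton.cudagraphs`), so it could supply `known_keys`; confirm this against the installed version before relying on it.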

## Overview
This PR introduces a flexible configuration system for PyTorch Inductor backend with 8 predefined config templates, CUDA Graphs compatibility fix, and comprehensive unit tests (28 tests total).

## Changes
- Inductor backend with 8 config templates (triton, cpp_wrapper, cutlass, aten, cudagraphs, max_autotune, freezing, tma)
- CUDA Graphs output buffer overwrite fix in test_compiler.py
- 28 unit tests in test/inductor_backend_test.py

## Testing
- All config keys verified against PyTorch 2.7.1 source code
- All templates tested with actual model compilation
- Unit tests pass: 28/28 OK
- TMA config gracefully falls back on non-TMA GPUs (A100)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

paddle-bot · 2026-04-15T08:27:56Z

Thanks for your contribution!

…to bench

…integration
- Rename to for clarity
- Remove redundant and templates (merge into )
- Add mutual exclusion check: exactly one of template/mode/options can be specified
- Remove global config modification; use torch.compile's parameter directly
- Clean up mapping (template no longer affects mode)
- Update tests to reflect template exclusivity and parameter changes
- Simplify __call__ with inline conditional kwargs expansion

Templates now exclusively control torch._inductor.config options, while is passed directly to torch.compile without interference.

Xreki

LGTM

JewelRoam changed the title ~~feat: Add Inductor backend config templates and comprehensive test suite~~ feat: Add Inductor backend config templates Apr 15, 2026

JewelRoam added 2 commits April 15, 2026 17:29

Merge branch 'develop' of https://github.com/PaddlePaddle/GraphNet in…

cf2571a

…to bench

JewelRoam changed the title ~~feat: Add Inductor backend config templates~~ feat: Add Inductor backend configs Apr 16, 2026

Xreki approved these changes Apr 23, 2026


JewelRoam merged commit ac35ce2 into PaddlePaddle:develop Apr 23, 2026
3 checks passed

JewelRoam merged 3 commits into PaddlePaddle:develop from JewelRoam:bench