
Add quantization examples and documentation for IREE #3

Draft
Copilot wants to merge 5 commits into main from copilot/add-mobilenet-v2-quantization-scripts

Conversation


Copilot AI commented Dec 7, 2025

Users lack documentation and examples for quantization support in IREE. This PR adds comprehensive reference material and working scripts for the INT8, INT4, FP8, and FP4 quantization formats.

Changes

Documentation

  • README.md - Quick start guide covering all quantization types, hardware requirements, and compilation workflows
  • QUANTIZATION_SUPPORT.md - Technical reference documenting supported formats (INT8, INT4, FP8 E4M3/E5M2 variants, experimental FP4), compiler passes, and performance characteristics

Scripts

  • quantize_mobilenet_v2.py - End-to-end example demonstrating INT8 quantization via ONNX Runtime, INT4/FP8 pattern generation, and model download
  • int8_quantization.py - Dynamic and static INT8 quantization with calibration (see the sketch after this list)
  • int4_quantization.py - Generates MLIR examples showing i4 types, grouped quantization, and dequantization fusion patterns (the grouped scheme is illustrated after the table below)
  • fp8_quantization.py - Generates hardware-specific FP8 examples (E4M3FNUZ/E5M2FNUZ for AMD MI300, E4M3FN/E5M2 for NVIDIA Hopper)
  • test.sh - Validation suite
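
For reference, the two modes in int8_quantization.py map onto ONNX Runtime's quantization API. A minimal sketch, assuming the onnxruntime package is installed; the input tensor name "input", the output filenames, and the random calibration batches are illustrative assumptions, not what the script necessarily does:

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType,
    quantize_dynamic, quantize_static,
)

# Dynamic: weights are quantized to INT8 offline, activations at runtime.
quantize_dynamic(
    model_input="mobilenet_v2.onnx",
    model_output="mobilenet_v2_int8_dynamic.onnx",
    weight_type=QuantType.QInt8,
)

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random NCHW batches as stand-in calibration data."""
    def __init__(self, input_name="input", num_batches=8):
        self._batches = iter(
            {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        )
    def get_next(self):
        return next(self._batches, None)

# Static: activations are also quantized offline, using calibration statistics.
quantize_static(
    model_input="mobilenet_v2.onnx",
    model_output="mobilenet_v2_int8.onnx",
    calibration_data_reader=RandomCalibrationReader(),
    quant_format=QuantFormat.QDQ,
)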

Supported Quantization Types

Format | Hardware | Status | Compiler Location
INT8 (i8/si8/ui8) | Universal | Production | compiler/src/iree/compiler/Dialect/Flow/IR/FlowBase.td
INT4 (i4/si4/ui4) | CPU, GPU | Production | compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp
FP8 E4M3FNUZ/E5M2FNUZ | AMD gfx942, gfx950 | Production | compiler/plugins/target/ROCM/builtins/mlir_ukernel/
FP8 E4M3FN/E5M2 | NVIDIA SM 90+ | Production | compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp
FP4 E2M1FN | Limited | Experimental | tests/e2e/linalg/fp4_f32_conversion.mlir
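
The INT4 row relies on grouped quantization, where each small group of weights shares one scale factor. A minimal numpy sketch of a symmetric per-group i4 scheme; the group size of 32 and the scale formula are illustrative assumptions, not IREE's exact implementation:

import numpy as np

def quantize_int4_grouped(weights, group_size=32):
    """Symmetric per-group quantization of a flat weight vector to signed i4."""
    groups = weights.reshape(-1, group_size)
    # One scale per group: map the group's max magnitude onto the i4 maximum (7).
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True), 1e-8) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Conceptually, this multiply is what FuseDequantizationMatmul folds
    # into the consuming matmul instead of materializing f32 weights.
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4_grouped(w)
print("max abs error:", np.abs(dequantize(q, s) - w).max())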

Usage Example

# Download and quantize MobileNet V2 in all formats
python quantize_mobilenet_v2.py --download
python quantize_mobilenet_v2.py --model mobilenet_v2.onnx --all

# Compile the INT8 model for CPU
iree-import-onnx mobilenet_v2_int8.onnx -o model_int8.mlir
iree-compile model_int8.mlir --iree-hal-target-backends=llvm-cpu -o model_int8.vmfb

# Compile the FP8 model for AMD GPU (import its ONNX output first;
# mobilenet_v2_fp8.onnx is a placeholder for whatever filename the script emits)
iree-import-onnx mobilenet_v2_fp8.onnx -o model_fp8.mlir
iree-compile model_fp8.mlir --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx942 -o model_fp8.vmfb
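
To smoke-test a compiled module from Python, a sketch along these lines should work, assuming the iree-runtime pip package is installed; the entry-point name main_graph and the input shape are assumptions that depend on the imported ONNX model:

import numpy as np
import iree.runtime as ireert

# Load the CPU-compiled module and invoke its (assumed) entry point.
module = ireert.load_vm_flatbuffer_file("model_int8.vmfb", driver="local-task")
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = module.main_graph(x)  # hypothetical entry-point name
print(np.asarray(logits).shape)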

All scripts include inline documentation explaining IREE's quantization patterns, optimization passes (FuseDequantizationMatmul), and hardware-specific considerations.

Original prompt

Which types of quantization are supported by IREE?

Can you generate scripts so that, given a mobilenet_v2.onnx, we can actually get all the different quantization formats that IREE supports?

I was interested in int4, int8, f8 and f4, but I don't know whether all of those are supported or what the best way to get them is.



Copilot AI and others added 3 commits December 7, 2025 03:48
Co-authored-by: copparihollmann <70057799+copparihollmann@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add scripts for mobile net v2 quantization formats in IREE" to "Add quantization examples and documentation for IREE" Dec 7, 2025
Copilot AI requested a review from copparihollmann December 7, 2025 03:54