
Add quantization examples and documentation for IREE #3

Draft
Copilot wants to merge 5 commits into main from copilot/add-mobilenet-v2-quantization-scripts

Conversation


Copilot AI commented Dec 7, 2025

Users lack documentation and examples for quantization support in IREE. This PR adds comprehensive reference material and working scripts for the INT8, INT4, FP8, and FP4 quantization formats.

Changes

Documentation

  • README.md - Quick start guide covering all quantization types, hardware requirements, and compilation workflows
  • QUANTIZATION_SUPPORT.md - Technical reference documenting supported formats (INT8, INT4, FP8 E4M3/E5M2 variants, experimental FP4), compiler passes, and performance characteristics

Scripts

  • quantize_mobilenet_v2.py - End-to-end example demonstrating INT8 quantization via ONNX Runtime, INT4/FP8 pattern generation, and model download
  • int8_quantization.py - Dynamic and static INT8 quantization with calibration (see the sketch after this list)
  • int4_quantization.py - Generates MLIR examples showing i4 types, grouped quantization, and dequantization fusion patterns (the grouped scheme is illustrated after the table below)
  • fp8_quantization.py - Generates hardware-specific FP8 examples (E4M3FNUZ/E5M2FNUZ for AMD MI300, E4M3FN/E5M2 for NVIDIA Hopper)
  • test.sh - Validation suite
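
For reference, the two modes in int8_quantization.py map onto ONNX Runtime's quantization API. A minimal sketch, assuming the onnxruntime package is installed; the input tensor name "input", the output filenames, and the random calibration batches are illustrative assumptions, not what the script necessarily does:

import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType,
    quantize_dynamic, quantize_static,
)

# Dynamic: weights are quantized to INT8 offline, activations at runtime.
quantize_dynamic(
    model_input="mobilenet_v2.onnx",
    model_output="mobilenet_v2_int8_dynamic.onnx",
    weight_type=QuantType.QInt8,
)

class RandomCalibrationReader(CalibrationDataReader):
    """Feeds a few random NCHW batches as stand-in calibration data."""
    def __init__(self, input_name="input", num_batches=8):
        self._batches = iter(
            {input_name: np.random.rand(1, 3, 224, 224).astype(np.float32)}
            for _ in range(num_batches)
        )
    def get_next(self):
        return next(self._batches, None)

# Static: activations are also quantized offline, using calibration statistics.
quantize_static(
    model_input="mobilenet_v2.onnx",
    model_output="mobilenet_v2_int8.onnx",
    calibration_data_reader=RandomCalibrationReader(),
    quant_format=QuantFormat.QDQ,
)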

Supported Quantization Types

Format | Hardware | Status | Compiler Location
INT8 (i8/si8/ui8) | Universal | Production | compiler/src/iree/compiler/Dialect/Flow/IR/FlowBase.td
INT4 (i4/si4/ui4) | CPU, GPU | Production | compiler/src/iree/compiler/GlobalOptimization/FuseDequantizationMatmul.cpp
FP8 E4M3FNUZ/E5M2FNUZ | AMD gfx942, gfx950 | Production | compiler/plugins/target/ROCM/builtins/mlir_ukernel/
FP8 E4M3FN/E5M2 | NVIDIA SM 90+ | Production | compiler/src/iree/compiler/Codegen/Dialect/GPU/IR/IREEGPUAttrs.cpp
FP4 E2M1FN | Limited | Experimental | tests/e2e/linalg/fp4_f32_conversion.mlir
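
The INT4 row relies on grouped quantization, where each small group of weights shares one scale factor. A minimal numpy sketch of a symmetric per-group i4 scheme; the group size of 32 and the scale formula are illustrative assumptions, not IREE's exact implementation:

import numpy as np

def quantize_int4_grouped(weights, group_size=32):
    """Symmetric per-group quantization of a flat weight vector to signed i4."""
    groups = weights.reshape(-1, group_size)
    # One scale per group: map the group's max magnitude onto the i4 maximum (7).
    scales = np.maximum(np.abs(groups).max(axis=1, keepdims=True), 1e-8) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales

def dequantize(q, scales):
    # Conceptually, this multiply is what FuseDequantizationMatmul folds
    # into the consuming matmul instead of materializing f32 weights.
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(4096).astype(np.float32)
q, s = quantize_int4_grouped(w)
print("max abs error:", np.abs(dequantize(q, s) - w).max())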

Usage Example

# Download and quantize MobileNet V2 in all formats
python quantize_mobilenet_v2.py --download
python quantize_mobilenet_v2.py --model mobilenet_v2.onnx --all

# Compile the INT8 model for CPU
iree-import-onnx mobilenet_v2_int8.onnx -o model_int8.mlir
iree-compile model_int8.mlir --iree-hal-target-backends=llvm-cpu -o model_int8.vmfb

# Compile the FP8 model for AMD GPU (import its ONNX output first;
# mobilenet_v2_fp8.onnx is a placeholder for whatever filename the script emits)
iree-import-onnx mobilenet_v2_fp8.onnx -o model_fp8.mlir
iree-compile model_fp8.mlir --iree-hal-target-backends=rocm --iree-rocm-target-chip=gfx942 -o model_fp8.vmfb
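
To smoke-test a compiled module from Python, a sketch along these lines should work, assuming the iree-runtime pip package is installed; the entry-point name main_graph and the input shape are assumptions that depend on the imported ONNX model:

import numpy as np
import iree.runtime as ireert

# Load the CPU-compiled module and invoke its (assumed) entry point.
module = ireert.load_vm_flatbuffer_file("model_int8.vmfb", driver="local-task")
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = module.main_graph(x)  # hypothetical entry-point name
print(np.asarray(logits).shape)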

All scripts include inline documentation explaining IREE's quantization patterns, optimization passes (FuseDequantizationMatmul), and hardware-specific considerations.

Original prompt

Which types of quantization are supported by IREE?

Can you generate scripts so that, given a mobilenet_v2.onnx, we can actually get all the different quantization formats that IREE supports?

I was interested in int4, int8, f8 and f4, but I don't know whether all of those are supported or what the best way to get them is.



Copilot AI and others added 3 commits December 7, 2025 03:48
Co-authored-by: copparihollmann <70057799+copparihollmann@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add scripts for mobile net v2 quantization formats in IREE" to "Add quantization examples and documentation for IREE" Dec 7, 2025
Copilot AI requested a review from copparihollmann December 7, 2025 03:54