AMD Quark is a cross-platform toolkit for quantizing deep learning models. It supports both PyTorch and ONNX models, and helps developers optimize models for deployment on a wide range of hardware backends, targeting performance gains while aiming to preserve accuracy.
Feature Set | PyTorch backend | ONNX backend |
---|---|---|
Data Types | int4, uint4, int8, uint8, float16, bfloat16, OCP FP8 E4M3/E5M2, OCP MX int8, OCP MX FP4, OCP MX FP6 E3M2/E2M3, OCP MX FP8 E4M3/E5M2 | int8, uint8, int16, uint16, int32, uint32, float16, bfloat16 |
Quant Mode | eager mode, FX graph mode | ONNX graph mode |
Quant Strategy | static quant, dynamic quant, weight-only | static quant, dynamic quant, weight-only |
Quant Scheme | per-tensor, per-channel, per-group | per-tensor, per-channel |
Symmetric | symmetric, asymmetric | symmetric, asymmetric |
Calibration Method | MinMax, Percentile, MSE | MinMax, Percentile, MinMSE, Entropy, NonOverflow |
Scale Type | float16, float32 | float16, float32 |
KV-Cache Quant | FP8 KV-Cache Quant | N/A |
Supported Ops. | nn.Linear, nn.Conv2d, nn.ConvTranspose2d, nn.Embedding, nn.EmbeddingBag, nn.BatchNorm2d, nn.BatchNorm3d, nn.LeakyReLU, nn.AvgPool2d, nn.AdaptiveAvgPool2d | Most ONNX ops. (Full List) |
Pre-Quant Optimization | SmoothQuant | QuaRot, SmoothQuant (Single_GPU/CPU), CLE, Bias Correction |
Quantization Algorithm | AWQ, GPTQ | AdaQuant, AdaRound, GPTQ |
Export Format | ONNX, JSON-Safetensors, GGUF(Q4_1) | N/A |
Operating Systems | Linux {ROCm, CUDA, CPU}, Windows {CPU} | Linux {ROCm, CUDA, CPU}, Windows {CPU} |
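To make a few of the table's terms concrete (MinMax calibration, symmetric vs. asymmetric, per-tensor vs. per-channel), here is a minimal sketch in plain Python. It does not use the AMD Quark API; every function name and all the sample values below are hypothetical and chosen only for illustration. Per-group quantization, and the other calibration methods, are not covered.

```python
# Illustrative sketch in plain Python; this is NOT the AMD Quark API.

def symmetric_params(values, num_bits=8):
    """MinMax calibration, symmetric: the zero-point is fixed at 0."""
    qmax = 2 ** (num_bits - 1) - 1                  # 127 for int8
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    return scale, 0


def asymmetric_params(values, num_bits=8):
    """MinMax calibration, asymmetric: [min, max] maps onto the unsigned range."""
    qmax = 2 ** num_bits - 1                        # 255 for uint8
    lo = min(min(values), 0.0)                      # keep 0.0 representable
    hi = max(max(values), 0.0)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = round(-lo / scale)
    return scale, zero_point


def quantize(values, scale, zero_point, qmin, qmax):
    """Scale, round, shift by the zero-point, then clamp to [qmin, qmax]."""
    return [min(qmax, max(qmin, round(v / scale) + zero_point)) for v in values]


def dequantize(q_values, scale, zero_point):
    """Map integer codes back to approximate real values."""
    return [(q - zero_point) * scale for q in q_values]


def max_abs_error(values, scale, zero_point, qmin, qmax):
    """Largest round-trip error after quantize + dequantize."""
    q = quantize(values, scale, zero_point, qmin, qmax)
    return max(abs(a - b) for a, b in zip(values, dequantize(q, scale, zero_point)))


# Two output channels with very different magnitudes (made-up weights).
weights = [[0.5, -1.0, 0.25], [10.0, -20.0, 5.0]]

# Per-tensor: one scale shared by every element.
flat = [v for row in weights for v in row]
tensor_scale, tensor_zp = symmetric_params(flat)
per_tensor_err = max_abs_error(weights[0], tensor_scale, tensor_zp, -128, 127)

# Per-channel: one scale per row, so the small channel keeps its resolution.
channel_scale, channel_zp = symmetric_params(weights[0])
per_channel_err = max_abs_error(weights[0], channel_scale, channel_zp, -128, 127)

# Asymmetric uint8 on a range that is not centered on zero.
activations = [-1.0, 0.0, 3.0]
act_scale, act_zp = asymmetric_params(activations)
act_q = quantize(activations, act_scale, act_zp, 0, 255)

print("per-tensor error on channel 0: ", per_tensor_err)
print("per-channel error on channel 0:", per_channel_err)
print("asymmetric zero-point and codes:", act_zp, act_q)
```

In this toy example, the per-channel scale gives a smaller round-trip error on the low-magnitude channel than the shared per-tensor scale does, which is the usual motivation for finer-grained quantization schemes.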
Quantization Technique | Supported Models |
---|---|
LLM Pruning | Model Support |
LLM Post Training Quantization (PTQ) | Model Support |
LLM Quantization Aware Training (QAT) | Model Support |
Vision Model Quantization | Model Support |
Quark for ONNX | Model Support |
Official releases of AMD Quark are available on PyPI (https://pypi.org/project/amd-quark/) and can be installed with pip:
pip install amd-quark
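As a quick sanity check after installing, the following sketch uses only the Python standard library to look up the installed version of the `amd-quark` distribution (the name shown on PyPI above). It is not one of the official verification steps; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "amd-quark" is the distribution name on PyPI.
version = installed_version("amd-quark")
print("amd-quark:", version if version else "not installed")
```

This only confirms that pip registered the package in the current environment; it does not exercise any quantization functionality.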
For full instructions to install AMD Quark from Python wheels or ZIP files, refer to our 🛠️Installation Guide. The Installation Guide also contains verification steps that apply to building from source.
- Clone or download this repository.
- Follow the steps from the PyTorch website to install the appropriate PyTorch package for your system.
- Build and install AMD Quark and its dependencies (listed in requirements.txt) by running:
git clone --recursive https://github.com/AMD/Quark
cd Quark
# [Optional] run git submodule if you are updating an existing Quark repository
git submodule sync
git submodule update --init --recursive
pip install .
AMD Quark's documentation site contains Getting Started, API documentation for both PyTorch and ONNX backends, and other detailed information. The Installation Guide includes our Recommended First Time User Installation guide, to get set up with Quark quickly. Check out our Frequently Asked Questions for both PyTorch and ONNX for more details.
AMD Quark provides examples of Language Model and Image Classification model quantization, which can be found under examples/torch/ and examples/onnx/. These examples are documented here:
The examples folder also contains integrations of other quantizers under examples/torch/extensions/. You can read about those here:
AMD Quark is not set up to accept community contributions (bug reports, feature requests, or Pull Requests) just yet. Please watch this space!
Copyright (C) 2025, Advanced Micro Devices, Inc. All rights reserved. SPDX-License-Identifier: MIT. See the LICENSE file for details.