Hi,
I'd like some guidance on how to prepare a calibration dataset for quantizing models with GPTQ. Are there recommended datasets that are a good starting point and work for the majority of models (Llama, Qwen, etc.)? Or do we have to use a different dataset for each model, depending on the actual task it will perform?
In the GPTQ paper (https://arxiv.org/pdf/2210.17323), the authors use 128 samples from the C4 dataset. They also note:
"We emphasize that this means that GPTQ does not see any task-specific data, and our results thus remain actually “zero-shot”."
Also, the HF documentation (https://huggingface.co/docs/transformers/v4.49.0/en/quantization/gptq) says:
"You could also pass your own dataset as a list of strings, but it is highly recommended to use the same dataset from the GPTQ paper."
I'm curious whether C4 works well in practice. Does anyone have experience to share? Thanks a lot!