Step-by-Step
============
This document describes the step-by-step instructions to run [VLM quantization for LLaVA](https://huggingface.co/liuhaotian/llava-v1.5-7b) using AutoRound quantization.

# Run Quantization on Multimodal Models

In this example, we introduce a straightforward way to quantize popular multimodal models such as LLaVA.

Please note that the quantized LLaVA model currently only supports inference in the **auto_round** format.

## 1. Install
If you are not using Linux, do NOT proceed; see the instructions for [macOS](https://github.com/haotian-liu/LLaVA/blob/main/docs/macOS.md) and [Windows](https://github.com/haotian-liu/LLaVA/blob/main/docs/Windows.md).

1. Clone this repository and navigate to the LLaVA folder
```shell
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
```

2. Install the package
```shell
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

## 2. Download the Calibration/Evaluation Data

Our calibration process resembles the official visual instruction tuning process, to align with the official implementation of [LLaVA](https://github.com/haotian-liu/LLaVA/tree/main?tab=readme-ov-file#visual-instruction-tuning).

Please download the annotation of the final instruction tuning data mixture, [llava_v1_5_mix665k.json](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/blob/main/llava_v1_5_mix665k.json), and download the images from the constituting datasets:

COCO: [train2017](http://images.cocodataset.org/zips/train2017.zip); unzip the image folder to any directory you desire.

Please refer to [llava_eval_datasets](https://github.com/haotian-liu/LLaVA/blob/main/docs/Evaluation.md#scripts) to download the TextVQA dataset for evaluation.
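For reference, each entry in the downloaded annotation file pairs a relative image path with a conversation. The sketch below illustrates the expected schema with a hypothetical sample entry; the field names follow the LLaVA-Instruct format, and the `image_root` value is an assumption for illustration, not a required layout:

```python
import os

# Hypothetical sample entry mirroring the llava_v1_5_mix665k.json schema;
# the id and path values are made up for illustration.
sample_annotations = [
    {
        "id": "000000033471",
        "image": "coco/train2017/000000033471.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nWhat is shown in the picture?"},
            {"from": "gpt", "value": "A city street with parked cars."},
        ],
    }
]

def resolve_image_paths(annotations, image_root):
    """Join each entry's relative image path with the image download directory."""
    return [os.path.join(image_root, e["image"]) for e in annotations if "image" in e]

paths = resolve_image_paths(sample_annotations, image_root="playground/data")
```

Entries without an `"image"` key (text-only conversations) are skipped, so the resolved list can be shorter than the annotation list.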

<br />

## 3. Run Examples
Enter the examples folder and install the requirements:

```bash
pip install -r requirements.txt
```

- **Default settings:**
```bash
CUDA_VISIBLE_DEVICES=0 python3 main.py --model_name liuhaotian/llava-v1.5-7b --bits 4 --group_size 128 --quantize
```
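To make the flags concrete: `--bits 4 --group_size 128` (W4G128) means each group of 128 weights shares one scale and every weight is stored as a 4-bit integer. The sketch below illustrates only this numeric format, using plain round-to-nearest on a toy group; AutoRound itself instead *learns* the rounding via signed gradient descent, so this is not its algorithm:

```python
def quantize_group(weights, bits=4):
    """Symmetric per-group quantization: returns (4-bit integer codes, scale)."""
    qmax = 2 ** (bits - 1) - 1                      # 7 for 4-bit symmetric
    scale = max(abs(w) for w in weights) / qmax or 1.0
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale

def dequantize_group(codes, scale):
    """Reconstruct approximate weights from codes and the shared scale."""
    return [c * scale for c in codes]

# Toy "group" of 8 weights; a real W4G128 group holds 128 values.
group = [0.05 * i for i in range(-4, 4)]
codes, scale = quantize_group(group)
recon = dequantize_group(codes, scale)
```

With round-to-nearest, each reconstructed weight is within half a scale step of the original; AutoRound's learned rounding decides, per weight, whether rounding up or down minimizes the end-to-end error.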

## 4. Results
We use the [COCO 2017](https://cocodataset.org/) and [LLaVA-Instruct-150K](https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K) datasets for quantization calibration, and the TextVQA dataset for evaluation. When the vision components are excluded from quantization, the accuracy loss stays within 1%. The results for the fake-quantized LLaVA-7b are as follows:

| Model | Config | Precision | Hyperparameter | Accuracy% | Relative drop |
| :----: | :----: | :----: | :----: | :----: | :----: |
| liuhaotian/llava-v1.5-7b | - | FP16 | - | 58.21 | - |
| liuhaotian/llava-v1.5-7b | W4G128 | FP16 | with vision | 56.39 | -3.13% |
| liuhaotian/llava-v1.5-7b | W4G128 | FP16 | w/o vision | 58.08 | -0.22% |
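The relative-drop column is simply the percentage change against the FP16 baseline, reproducible from the accuracy numbers above:

```python
def relative_drop(quantized_acc, baseline_acc):
    """Percentage accuracy change of the quantized model vs. the FP16 baseline."""
    return (quantized_acc - baseline_acc) / baseline_acc * 100

with_vision = round(relative_drop(56.39, 58.21), 2)     # -3.13
without_vision = round(relative_drop(58.08, 58.21), 2)  # -0.22
```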


## 5. Known Issues
* The Hugging Face format model is not supported yet, e.g. llava-1.5-7b-hf.
* Setting seqlen to 2048 does not work yet.


## 6. Environment

PyTorch 1.8 or a higher version is needed.
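A quick way to check the requirement from Python; the version-parsing helper below is our own sketch, not part of PyTorch:

```python
def meets_minimum(version, minimum="1.8"):
    """Compare dotted version strings numerically on (major, minor),
    ignoring local suffixes such as '+cu118'."""
    parse = lambda v: [int(p) for p in v.split("+")[0].split(".")[:2]]
    return parse(version) >= parse(minimum)

try:
    import torch
    assert meets_minimum(torch.__version__), "PyTorch >= 1.8 is required"
except ImportError:
    pass  # torch is not installed in this environment
```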


## Reference
If you find SignRound useful for your research, please cite our paper:
```bibtex
@article{cheng2023optimize,
  title={Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}
```