
Commit 498f7ac

Move Compress dependencies to setup.py (#539)

**Type of change:** Documentation

**Overview:** Replace Dockerfile for Puzzletron compression with dependencies in `setup.py`

Signed-off-by: Liana Mikaelyan <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
1 parent e337869 commit 498f7ac

File tree

3 files changed (+11 lines, -31 lines)


examples/compress/Dockerfile

Lines changed: 0 additions & 26 deletions
This file was deleted.

examples/compress/README.md

Lines changed: 9 additions & 5 deletions
@@ -1,5 +1,6 @@
 # Compress Algorithm Tutorial
 
+This tutorial demonstrates how to compress large language models using the Compress algorithm based on the [Puzzle paper](https://arxiv.org/abs/2411.19146).
 This tutorial demonstrates how to compress large language models using the compress algorithm based on the [Puzzle paper](https://arxiv.org/abs/2411.19146).
 The goal of the algorithm it to find the most optimal modifications to MLP and attention layers of the model, resulting in a heterogeneous model architecture.
 The supported modifications are:
@@ -13,16 +14,19 @@ In this example, we compress the [meta-llama/Llama-3.1-8B-Instruct](https://hugg
 
 ## Environment
 
-- [Dockerfile](./Dockerfile) to use.
-- 2x NVIDIA H100 80GB HBM3 (1 card will be good as well).
+- Install TensorRT-Model-Optimizer in editable mode with the corresponding dependencies:
+```bash
+pip install -e .[hf,compress]
+```
+- For this example we are using 2x NVIDIA H100 80GB HBM3 to show multi-GPU steps. You can use also use s single GPU.
 
 ## Compress the Model
 
 1. Specify the `puzzle_dir`, `input_hf_model_path`, `dataset_path`, `intermediate_size_list`, and `target_memory` arguments in the [llama-3_1-8B_pruneffn_memory.yaml](./configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml) configuration file.
 
 **_NOTE:_**
 How to choose `intermediate_size_list`?
-The list specifies the candidate FFN sizes that we wish to search over. It is recommended to choose several pruning sizes (e.g. 15%, 20%, 30% etc of the original). Note that the values must be hardware-friendly (divisible by a multiple of 2) to avoid issues with tensor operations in subsequent steps.
+The list specifies the candidate FFN sizes that we wish to search over. It is recommended to choose several pruning sizes (e.g. 15%, 20%, 30% etc of the original). Note that the values must be hardware-friendly (divisible by a 256) to avoid issues with tensor operations in subsequent steps.
 
 Let's first shoot for 32% GPU memory reduction setting `target_memory = 78_000` GiB. This means that the algorithm will choose the candidates with highest accuracy that also meet the specified requirements.
 

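The NOTE in this hunk suggests searching over several pruned FFN sizes (e.g. 15%, 20%, 30% of the original) rounded to hardware-friendly multiples of 256. A minimal sketch of how such a candidate list could be generated (the helper name, the keep-fraction interpretation, and the Llama-3.1-8B intermediate size of 14336 are assumptions, not part of this commit):

```python
# Hypothetical helper, not part of the repo: propose values for
# `intermediate_size_list` by scaling the original FFN intermediate size
# and rounding down to a hardware-friendly multiple of 256.
def candidate_ffn_sizes(original_size, keep_fractions, multiple=256):
    """Return pruned FFN sizes, each rounded down to a multiple of `multiple`."""
    sizes = []
    for frac in keep_fractions:
        size = int(original_size * frac) // multiple * multiple
        if size > 0 and size not in sizes:
            sizes.append(size)
    return sizes

# Llama-3.1-8B uses an FFN intermediate size of 14336.
print(candidate_ffn_sizes(14336, [0.15, 0.20, 0.30, 0.50]))
# → [2048, 2816, 4096, 7168]
```

Every value produced this way is divisible by 256, satisfying the hardware-friendliness constraint the NOTE mentions.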
@@ -179,12 +183,12 @@ block_14: attention no_op ffn intermediate_3072
 
 ## Evaluation
 
-Once the model is ready, you can evaluate it using [Language Model Evaluation Harness](https://pypi.org/project/lm-eval/). For example, run the following to evaluate the model on a subset of [MMLU](https://huggingface.co/datasets/cais/mmlu).
+Once the model is ready, you can evaluate it using [Language Model Evaluation Harness](https://pypi.org/project/lm-eval/). For example, run the following to evaluate the model on [Massive Multitask Language Understanding](https://huggingface.co/datasets/cais/mmlu) benchmark.
 
 ```bash
 lm_eval --model hf \
     --model_args pretrained=path/to/model,dtype=bfloat16,trust_remote_code=true,parallelize=True \
-    --tasks mmlu_humanities \
+    --tasks mmlu \
     --num_fewshot 5 \
     --batch_size 4
 ```

setup.py

Lines changed: 2 additions & 0 deletions
@@ -104,6 +104,8 @@
         "fire",
         "hydra-core==1.3.2",
         "omegaconf==2.3.0",
+        "wandb~=0.17.5",
+        "lru-dict",
     ],
 }
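The two added lines extend an optional dependency group in `setup.py`, which is what lets the README's `pip install -e .[hf,compress]` pull these packages in. A minimal sketch of that mechanism (the surrounding `setup()` structure and any extra names beyond `compress` are assumptions, not the repo's actual file):

```python
# Illustrative extras table, mirroring the entries visible in this diff.
# `~=0.17.5` is a compatible-release pin: it allows 0.17.5 <= version < 0.18.
COMPRESS_EXTRA = [
    "fire",
    "hydra-core==1.3.2",
    "omegaconf==2.3.0",
    "wandb~=0.17.5",  # added by this commit
    "lru-dict",       # added by this commit
]

# In setup.py this list would sit under extras_require, roughly:
#   setup(..., extras_require={"compress": COMPRESS_EXTRA, ...})
# so `pip install -e .[compress]` installs the package plus these extras.
print(COMPRESS_EXTRA[-2:])
# → ['wandb~=0.17.5', 'lru-dict']
```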

0 commit comments
