# Compress Algorithm Tutorial

This tutorial demonstrates how to compress large language models using the Compress algorithm based on the [Puzzle paper](https://arxiv.org/abs/2411.19146).

The goal of the algorithm is to find optimal modifications to the MLP and attention layers of the model, resulting in a heterogeneous model architecture.

The supported modifications are:

In this example, we compress the [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) model.

## Environment

- Install TensorRT-Model-Optimizer in editable mode with the corresponding dependencies:

  ```bash
  pip install -e .[hf,compress]
  ```

- For this example we use 2x NVIDIA H100 80GB HBM3 to show the multi-GPU steps. You can also use a single GPU.

## Compress the Model
1. Specify the `puzzle_dir`, `input_hf_model_path`, `dataset_path`, `intermediate_size_list`, and `target_memory` arguments in the [llama-3_1-8B_pruneffn_memory.yaml](./configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml) configuration file.
**_NOTE:_**
How to choose `intermediate_size_list`?
The list specifies the candidate FFN sizes to search over. It is recommended to choose several pruning sizes (e.g., 15%, 20%, 30% of the original size). Note that the values must be hardware-friendly (divisible by 256) to avoid issues with tensor operations in subsequent steps.

Let's first aim for a 32% GPU memory reduction by setting `target_memory = 78_000` GiB. This means the algorithm will choose the candidates with the highest accuracy that also meet the specified memory requirement.
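
To make this concrete, here is a minimal sketch of how these arguments could look in [llama-3_1-8B_pruneffn_memory.yaml](./configs/llama-3_1-8B_pruneffn_memory/llama-3_1-8B_pruneffn_memory.yaml). The paths and exact key names below are illustrative assumptions; consult the shipped config for the authoritative schema:

```yaml
# Illustrative sketch only -- key names mirror the arguments listed above,
# and the paths are placeholders.
puzzle_dir: /workspace/puzzle                  # where search artifacts are written
input_hf_model_path: meta-llama/Llama-3.1-8B-Instruct
dataset_path: /workspace/data/calibration      # calibration/scoring dataset

# Candidate FFN sizes: roughly 15%, 20%, and 30% prunes of Llama-3.1-8B's
# original intermediate_size of 14336, rounded to multiples of 256.
intermediate_size_list: [12288, 11520, 10240]

target_memory: 78_000                          # memory budget for the search
```
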
Once the model is ready, you can evaluate it using the [Language Model Evaluation Harness](https://pypi.org/project/lm-eval/). For example, run the following to evaluate the model on the [Massive Multitask Language Understanding (MMLU)](https://huggingface.co/datasets/cais/mmlu) benchmark.
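
A sketch of such an invocation, assuming the compressed model was exported as a Hugging Face-style checkpoint (the `<puzzle_dir>/exported_model` path is a placeholder; point it at wherever your compressed checkpoint actually lives):

```bash
pip install lm-eval

# Placeholder path: substitute the directory containing your exported checkpoint.
lm_eval --model hf \
  --model_args pretrained=<puzzle_dir>/exported_model \
  --tasks mmlu \
  --batch_size 4
```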