Compress tutorial (PoC) #492
Conversation
using MIP-based NAS search algorithm. Signed-off-by: Daniel Korzekwa <[email protected]>
…ation. Signed-off-by: Daniel Korzekwa <[email protected]>
…ress module. Signed-off-by: Daniel Korzekwa <[email protected]>
…ntal/ folder to not be run by CICD yet. Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
…tmp_path. Signed-off-by: Daniel Korzekwa <[email protected]>
…thm. Signed-off-by: Daniel Korzekwa <[email protected]>
…o_decilm_convertion
…as_convert Signed-off-by: Daniel Korzekwa <[email protected]>
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## feature/compress #492 +/- ##
=================================================
Coverage 73.40% 73.40%
=================================================
Files 180 180
Lines 18127 18127
=================================================
Hits 13306 13306
Misses 4821 4821
=================================================
@@ -0,0 +1,64 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Why does this need to be its own file?
It is how it was designed. Any suggestions?
We can rename from modelopt/torch/_compress/dataset/prepare_dataset.py to modelopt/torch/_compress/utils/dataset_utils.py and later unify with https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/utils/dataset_utils.py
We already have nemotron-post-training-dataset-v2 supported in modelopt/torch/utils/dataset_utils.py, so ideally we should be able to just use that.
It seems made for the Nemotron post-training dataset rather than being generic. Which file even uses this?
There is already modelopt/torch/_compress/utils/data/dataset.py, created as part of the dkorzekwa/mip branch. Once the dkorzekwa/mip branch is merged to feature/compress, we can refactor the dataset module, taking modelopt/torch/utils/dataset_utils into account.
Created internal issue: issues/58
)

# mip_and_realize_models (distributed processing)
# TODO: How to make it part of mnt.search() api, similarly to run_full_compress() API
I think this can be improved once everything is self-contained in modelopt. We don't need a separate function for mip_only: we can re-run the same run_full_compress, but internally, for each sub-step, it should check whether a checkpoint already exists and skip that step.
This generic solution will also help in other cases where the whole compress pipeline takes too long and we want to resume from some intermediate step.
Yes, this is a possible solution: run the whole pipeline but skip some steps.
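The checkpoint-skip idea discussed above could be sketched as follows. This is a hypothetical illustration, not the actual modelopt API: the step names, `run_full_compress`, and the `.done` marker convention are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable

def run_step(name: str, fn: Callable[[], None], ckpt_dir: Path) -> None:
    """Run one pipeline step unless its completion marker already exists."""
    marker = ckpt_dir / f"{name}.done"
    if marker.exists():
        print(f"Skipping '{name}': checkpoint found at {marker}")
        return
    fn()
    marker.touch()  # record completion so a re-run resumes after this step

def run_full_compress(ckpt_dir: Path) -> None:
    """Re-runnable pipeline: completed sub-steps are skipped automatically."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    steps = [
        ("score_pruning_activations", lambda: None),  # placeholder bodies
        ("mip_and_realize_models", lambda: None),
        ("final_export", lambda: None),
    ]
    for name, fn in steps:
        run_step(name, fn, ckpt_dir)
```

With this shape, resuming after a failure or running only the MIP step both fall out of the same entry point, which is the generic behavior proposed in the review comment.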
The supported modifications are:

- `ffn_intermediate_size`: different FFN intermediate sizes
- `attention op/noop`: complete removal of attention layers
Didn't we decide to keep the PoC to just FFN pruning, with no attention module replacement?
We also use attention op/noop, as it is part of the solid compression example we did internally at NVIDIA.
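The per-block search space described above could be enumerated roughly like this. This is an illustrative sketch, not the modelopt implementation: the `"op"`/`"noop"` attention choice comes from the tutorial, while the list of FFN intermediate sizes is hypothetical (only 14336 appears in the PR output).

```python
# Candidate choices per transformer block (sizes other than 14336 are assumed).
ATTENTION_CHOICES = ["op", "noop"]
FFN_INTERMEDIATE_SIZES = [7168, 14336]

def block_choices():
    """Enumerate candidate configurations for a single transformer block."""
    return [
        {"attention": attn, "ffn_intermediate_size": ffn}
        for attn in ATTENTION_CHOICES
        for ffn in FFN_INTERMEDIATE_SIZES
    ]
```

A MIP-based search then picks one entry per block subject to a global budget, which is why the space stays tractable even though the per-block choices multiply.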
```bash
...
block_0: attention gqa_4 ffn intermediate_14336
```
GQA4 will only work with TP4 if training in Megatron-fw. Maybe for deployment too, but I don't know for sure. Should we remove GQA pruning from the search space?
GQA is not in the search space, only attention op/noop.
GQA4: there are 8 groups, each with 4 KV heads, not 4 groups. Added internal NV issues/60 to clarify it.
A bunch of code quality checks are also failing.
**Type of change:** Documentation
**Overview:** Updated the tutorial with more details on how to choose the required config parameters and added MMLU evaluation.
Signed-off-by: Liana Mikaelyan <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
…rRT-Model-Optimizer into dkorzekwa/compress_tutorial
import score_pruning_activations
import scoring
import torch
from logger import mprint
Does this import path need to be fixed?
fixed: from modelopt.torch._compress.tools.logger import mprint
if self.global_rank == 0:
    color = LogColors.GREEN
elif self.local_rank == self.world_size - 1:
    color = LogColors.RED
else:
    color = LogColors.CYAN
Why do we use this color scheme? Red implies error. Let's use the same color for all ranks, or a different one for rank 0 and the same for all other ranks.
Done: using GREEN for logging across all ranks.
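The fix described above (a single GREEN for all ranks) could look roughly like this. This is a standalone sketch: `LogColors` and `mprint` are names from the PR, but this version simply wraps `print` with ANSI codes and is not the actual modelopt implementation.

```python
class LogColors:
    GREEN = "\033[92m"
    RESET = "\033[0m"

def mprint(message: str, rank: int = 0) -> str:
    """Print a rank-prefixed message in GREEN, regardless of rank."""
    line = f"{LogColors.GREEN}[rank {rank}] {message}{LogColors.RESET}"
    print(line)
    return line
```

Using one color avoids the original scheme's problem that RED on the last rank reads as an error even when nothing failed.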
**Type of change:** Documentation
**Overview:** Replace Dockerfile for Puzzletron compression with dependencies in `setup.py`.
Signed-off-by: Liana Mikaelyan <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
…rRT-Model-Optimizer into dkorzekwa/compress_tutorial
What does this PR do?
Compress tutorial (PoC) plus a compress CLI app.