
Conversation

@danielkorzekwa

What does this PR do?

Compress tutorial (PoC) + compress CLI app.

danielkorzekwa and others added 30 commits October 27, 2025 11:50
using MIP-based NAS search algorithm.

Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
…ntal/ folder to not be run by CICD yet.

Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
@danielkorzekwa danielkorzekwa requested review from a team as code owners November 3, 2025 13:56
@danielkorzekwa danielkorzekwa requested review from kevalmorabia97 and removed request for a team November 3, 2025 13:56
@codecov

codecov bot commented Nov 3, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.40%. Comparing base (1c12fd8) to head (68c875f).
⚠️ Report is 2 commits behind head on feature/compress.

Additional details and impacted files
@@                Coverage Diff                @@
##           feature/compress     #492   +/-   ##
=================================================
  Coverage             73.40%   73.40%           
=================================================
  Files                   180      180           
  Lines                 18127    18127           
=================================================
  Hits                  13306    13306           
  Misses                 4821     4821           

@LianaMikael LianaMikael requested a review from a team as a code owner November 4, 2025 10:03
@@ -0,0 +1,64 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Contributor

Why does this need to be its own file?

Author

It is how it was designed; any suggestions?

Collaborator

We can rename from modelopt/torch/_compress/dataset/prepare_dataset.py to modelopt/torch/_compress/utils/dataset_utils.py and later unify with https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/utils/dataset_utils.py

We already have nemotron-post-training-dataset-v2 supported in modelopt/torch/utils/dataset_utils.py, so ideally we should be able to just use that.
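Reusing that utility could look roughly like this (a minimal sketch; the exact `get_dataset_dataloader` signature should be checked against `modelopt/torch/utils/dataset_utils.py`, and the tokenizer model is a placeholder):

```python
# Sketch: load the already-supported dataset via the shared utility
# instead of a separate prepare_dataset.py (signature assumed).
from transformers import AutoTokenizer

from modelopt.torch.utils.dataset_utils import get_dataset_dataloader

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
dataloader = get_dataset_dataloader(
    dataset_name="nemotron-post-training-dataset-v2",
    tokenizer=tokenizer,
    batch_size=4,
    num_samples=512,
)
```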

Contributor

It seems made for the Nemotron post-training dataset rather than being generic. Which file even uses this?

Author

There is already modelopt/torch/_compress/utils/data/dataset.py, created as part of the dkorzekwa/mip branch. Once the dkorzekwa/mip branch is merged to feature/compress, we can refactor the dataset module to account for modelopt/torch/utils/dataset_utils.

Created internal issue: issues/58

)

# mip_and_realize_models (distributed processing)
# TODO: How to make it part of mnt.search() api, similarly to run_full_compress() API
Collaborator

I think this can be improved once everything is self-contained in modelopt. We don't need a separate function for mip_only. We can re-run the same run_full_compress, but internally each sub-step should check whether its checkpoint already exists and skip itself if so.

This generic solution will also help in other cases where the whole compress pipeline takes too long and we want to resume from an intermediate step, as sketched below.
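A minimal sketch of that resume behavior (step names and the checkpoint marker scheme are hypothetical, not the actual `run_full_compress` internals):

```python
from pathlib import Path
from typing import Callable


def run_step(name: str, step_fn: Callable[[], None], ckpt_dir: Path) -> None:
    """Run one pipeline step unless its checkpoint marker already exists."""
    marker = ckpt_dir / f"{name}.done"
    if marker.exists():
        print(f"Skipping {name}: found {marker}")
        return
    step_fn()
    marker.touch()  # mark the step as completed


def run_full_compress(ckpt_dir: Path) -> None:
    # Re-running the pipeline skips completed steps, so resuming from an
    # intermediate step falls out of the same code path.
    run_step("score_activations", lambda: ..., ckpt_dir)
    run_step("mip_and_realize_models", lambda: ..., ckpt_dir)
```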

Author

Yes, this is a possible solution: run the whole pipeline but skip completed steps.

The supported modifications are:

- `ffn_intermediate_size`: different FFN intermediate sizes
- `attention op/noop`: complete removal of attention layers
Collaborator

@kevalmorabia97 kevalmorabia97 Nov 7, 2025

Didn't we decide to keep the PoC to just FFN pruning, with no attention module replacement?

Author

We also use attention op/noop, as this is part of the solid compression example we did internally at NVIDIA.
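Concretely, the two modification axes could be expressed along these lines (a hypothetical illustration, not the actual Puzzletron config schema; the candidate sizes other than 14336 are made up):

```python
# Hypothetical sketch of the PoC search space described above.
search_space = {
    # candidate FFN intermediate sizes to search over per block
    "ffn_intermediate_size": [3072, 7168, 14336],
    # each attention layer is either kept as-is ("op") or removed ("noop")
    "attention": ["op", "noop"],
}
```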


```bash
...
block_0: attention gqa_4 ffn intermediate_14336
```
Collaborator

GQA4 will only work with TP4 if training in Megatron-FW. Maybe for deployment too, but I don't know for sure. Should we remove GQA pruning from the search space?

Author

GQA is not in the search space, only attention op/noop.

GQA4: there are 8 groups, each with 4 KV heads, not 4 groups. Created internal NV issue issues/60 to clarify this.

@kevalmorabia97
Collaborator

A bunch of code quality checks are also failing.

**Type of change:**
Documentation

**Overview:**
Updated the tutorial with more details on how to choose the required
config parameters and added MMLU evaluation.

---------

Signed-off-by: Liana Mikaelyan <[email protected]>
@LianaMikael LianaMikael force-pushed the dkorzekwa/compress_tutorial branch from bb91d73 to 59d0b46 Compare November 12, 2025 10:34
import score_pruning_activations
import scoring
import torch
from logger import mprint
Collaborator

Does this import path need to be fixed?

Author

Fixed: `from modelopt.torch._compress.tools.logger import mprint`
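Applied to the hunk above, the imports would become (sketch; the sibling `score_pruning_activations`/`scoring` imports are left as in the original hunk):

```python
import score_pruning_activations
import scoring
import torch

# fixed: fully qualified module path instead of the bare `logger` import
from modelopt.torch._compress.tools.logger import mprint
```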

Signed-off-by: Daniel Korzekwa <[email protected]>
Comment on lines 79 to 84
if self.global_rank == 0:
color = LogColors.GREEN
elif self.local_rank == self.world_size - 1:
color = LogColors.RED
else:
color = LogColors.CYAN
Collaborator

Why do we use this color scheme? Red implies an error. Let's use the same color for all ranks, or a different one for rank 0 and the same for all other ranks.

Author

Done: using GREEN for logging across all ranks.
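That is, the rank-based branching quoted above collapses to a single assignment (sketch of the resolution, assumed from the comment):

```python
# All ranks now log in the same color; red stays reserved for errors.
color = LogColors.GREEN
```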

@kevalmorabia97 kevalmorabia97 requested a review from a team as a code owner November 12, 2025 15:30
@kevalmorabia97 kevalmorabia97 requested review from kevalmorabia97 and removed request for a team November 12, 2025 15:30
**Type of change:** Documentation

**Overview:** Replace Dockerfile for Puzzletron compression with
dependencies in `setup.py`

---------

Signed-off-by: Liana Mikaelyan <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
@kevalmorabia97 kevalmorabia97 force-pushed the dkorzekwa/compress_tutorial branch from 342f901 to 498f7ac Compare November 12, 2025 15:33
kevalmorabia97 and others added 3 commits November 12, 2025 07:36
Signed-off-by: Keval Morabia <[email protected]>
…rRT-Model-Optimizer into dkorzekwa/compress_tutorial
@danielkorzekwa danielkorzekwa merged commit 50a580c into feature/compress Nov 12, 2025
21 checks passed
@danielkorzekwa danielkorzekwa deleted the dkorzekwa/compress_tutorial branch November 12, 2025 22:56