Compress tutorial (PoC) #492
Conversation
using MIP-based NAS search algorithm. Signed-off-by: Daniel Korzekwa <[email protected]>
…ation. Signed-off-by: Daniel Korzekwa <[email protected]>
…ress module. Signed-off-by: Daniel Korzekwa <[email protected]>
…ntal/ folder to not be run by CICD yet. Signed-off-by: Daniel Korzekwa <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
…tmp_path. Signed-off-by: Daniel Korzekwa <[email protected]>
…thm. Signed-off-by: Daniel Korzekwa <[email protected]>
…o_decilm_convertion
…as_convert Signed-off-by: Daniel Korzekwa <[email protected]>
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
## feature/compress #492 +/- ##
=================================================
Coverage 73.40% 73.40%
=================================================
Files 180 180
Lines 18127 18127
=================================================
Hits 13306 13306
Misses 4821 4821
=================================================
@@ -0,0 +1,64 @@
# SPDX-FileCopyrightText: Copyright (c) 2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Why does this need to be its own file?
It is how it was designed. Any suggestions?
We can rename from modelopt/torch/_compress/dataset/prepare_dataset.py to modelopt/torch/_compress/utils/dataset_utils.py and later unify with https://github.com/NVIDIA/TensorRT-Model-Optimizer/blob/main/modelopt/torch/utils/dataset_utils.py
We already have nemotron-post-training-dataset-v2 supported in modelopt/torch/utils/dataset_utils.py, so ideally we should be able to just use that.
It seems made for the Nemotron post-training dataset rather than being generic. Which file even uses this?
There is already modelopt/torch/_compress/utils/data/dataset.py, created as part of the dkorzekwa/mip branch. Once the dkorzekwa/mip branch is merged to feature/compress, we can refactor the dataset module, taking modelopt/torch/utils/dataset_utils into account.
Created internal issue: issues/58
)

# mip_and_realize_models (distributed processing)
# TODO: How to make it part of mnt.search() api, similarly to run_full_compress() API
I think this can be improved once everything is self-contained in modelopt. We don't need a separate function for mip_only: we can re-run the same run_full_compress, but internally, for each sub-step, it should check whether a checkpoint already exists and skip that step.
This generic solution will also help in other cases where the whole compress pipeline takes too long and we want to resume from some intermediate step.
Yes, this is a possible solution: run the whole pipeline but skip some steps.
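The checkpoint-skip idea discussed above could be sketched as follows. This is a hypothetical illustration, not the actual modelopt API: the step names, `run_full_compress`, and the `.done` marker convention are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable

def run_step(name: str, fn: Callable[[], None], ckpt_dir: Path) -> None:
    """Run one pipeline step unless its completion marker already exists."""
    marker = ckpt_dir / f"{name}.done"
    if marker.exists():
        print(f"Skipping '{name}': checkpoint found at {marker}")
        return
    fn()
    marker.touch()  # record completion so a re-run resumes after this step

def run_full_compress(ckpt_dir: Path) -> None:
    """Re-runnable pipeline: completed sub-steps are skipped automatically."""
    ckpt_dir.mkdir(parents=True, exist_ok=True)
    steps = [
        ("score_pruning_activations", lambda: None),  # placeholder bodies
        ("mip_and_realize_models", lambda: None),
        ("final_export", lambda: None),
    ]
    for name, fn in steps:
        run_step(name, fn, ckpt_dir)
```

With this shape, resuming after a failure or running only the MIP step both fall out of the same entry point, which is the generic behavior proposed in the review comment.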
The supported modifications are:

- `ffn_intermediate_size`: different FFN intermediate sizes
- `attention op/noop`: complete removal of attention layers
Didn't we decide to keep the PoC to just FFN pruning, with no attention module replacement?
We also use attention op/noop, as it is part of the solid compression example we did internally at NVIDIA.
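The per-block search space described above could be enumerated roughly like this. This is an illustrative sketch, not the modelopt implementation: the `"op"`/`"noop"` attention choice comes from the tutorial, while the list of FFN intermediate sizes is hypothetical (only 14336 appears in the PR output).

```python
# Candidate choices per transformer block (sizes other than 14336 are assumed).
ATTENTION_CHOICES = ["op", "noop"]
FFN_INTERMEDIATE_SIZES = [7168, 14336]

def block_choices():
    """Enumerate candidate configurations for a single transformer block."""
    return [
        {"attention": attn, "ffn_intermediate_size": ffn}
        for attn in ATTENTION_CHOICES
        for ffn in FFN_INTERMEDIATE_SIZES
    ]
```

A MIP-based search then picks one entry per block subject to a global budget, which is why the space stays tractable even though the per-block choices multiply.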
```bash
...
block_0: attention gqa_4 ffn intermediate_14336
```
GQA4 will only work with TP4 if training in Megatron-fw. Maybe for deployment too, but I don't know for sure. Should we remove GQA pruning from the search space?
GQA is not in the search space, only attention op/noop.
GQA4: there are 8 groups, each with 4 KV heads, not 4 groups. Added internal NV issues/60 to clarify it.
A bunch of code quality checks are also failing.
**Type of change:** Documentation
**Overview:** Updated the tutorial with more details on how to choose the required config parameters and added MMLU evaluation.
Signed-off-by: Liana Mikaelyan <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
…rRT-Model-Optimizer into dkorzekwa/compress_tutorial
import score_pruning_activations
import scoring
import torch
from logger import mprint
Does this import path need to be fixed?
fixed: from modelopt.torch._compress.tools.logger import mprint
if self.global_rank == 0:
    color = LogColors.GREEN
elif self.local_rank == self.world_size - 1:
    color = LogColors.RED
else:
    color = LogColors.CYAN
Why do we use this color scheme? Red implies error. Let's use the same color for all ranks, or a different one for rank 0 and the same for all other ranks.
Done: using GREEN for logging across all ranks.
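The fix described above (a single GREEN for all ranks) could look roughly like this. This is a standalone sketch: `LogColors` and `mprint` are names from the PR, but this version simply wraps `print` with ANSI codes and is not the actual modelopt implementation.

```python
class LogColors:
    GREEN = "\033[92m"
    RESET = "\033[0m"

def mprint(message: str, rank: int = 0) -> str:
    """Print a rank-prefixed message in GREEN, regardless of rank."""
    line = f"{LogColors.GREEN}[rank {rank}] {message}{LogColors.RESET}"
    print(line)
    return line
```

Using one color avoids the original scheme's problem that RED on the last rank reads as an error even when nothing failed.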
**Type of change:** Documentation
**Overview:** Replace Dockerfile for Puzzletron compression with dependencies in `setup.py`.
Signed-off-by: Liana Mikaelyan <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Co-authored-by: Keval Morabia <[email protected]>
Signed-off-by: Keval Morabia <[email protected]>
Signed-off-by: Daniel Korzekwa <[email protected]>
…rRT-Model-Optimizer into dkorzekwa/compress_tutorial
What does this PR do?
Compress tutorial (PoC) plus a compress CLI app.