fix regression issue and command in mix-precision example #2317
base: master
Conversation
Signed-off-by: He, Xin3 <[email protected]>
PR Reviewer Guide 🔍
Here are some key observations to aid the review process:

PR Code Suggestions ✨
Signed-off-by: He, Xin3 <[email protected]>
Better to change the AR dependency to the pip-released v0.8 version after AR v0.8 is released.
parser.add_argument("--device_map", type=str, default=None, help="device map for model") | ||
parser.add_argument("--use_recipe", action="store_true", help="whether to use recipe to quantize model") | ||
parser.add_argument("--recipe_file", type=str, default="recipes/Meta-Llama-3.1-8B-Instruct_6bits.json", help="path of recipe file") | ||
parser.add_argument("--mem_per_param_scale", default=13, type=int, help="memory per param scale factor") |
I don't see this arg used in the example. Is it added for further tuning, and is there any guideline on how users should set the value?
Yes, it's for the Llama 3.3 70B pipeline-parallel run. It's added in case a user wants to run 70B without TP.
That's not the suggested way, though; the suggested way is to use the main branch with my compile fix, so I intend not to introduce it in the example.
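For context, a scale factor like this usually feeds a rough memory budget when deciding how layers are placed across devices. A minimal sketch of that idea, with a hypothetical helper name (not the actual implementation):

```python
# Hypothetical illustration only: reserve roughly mem_per_param_scale bytes
# per parameter when estimating how much device memory a layer needs during
# quantization. The function name and formula are assumptions.
def estimate_layer_memory(num_params: int, mem_per_param_scale: int = 13) -> int:
    return num_params * mem_per_param_scale
```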
The merge can wait until the binary is published.
User description
Type of Change
bug fix
PR Type
Enhancement, Bug fix
Description
- Added mem_per_param_scale and enable_torch_compile arguments
- Updated dtype handling for uNVFP4 and NVFP4+
- Fixed regression issues in dtype mapping and layer configuration (see the sketch after this list)
- Updated README to include enable_torch_compile in the example command
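To illustrate the dtype-mapping part of the fix, the sketch below shows the general shape of an alias table for the new names; the dtype keys come from this PR, while the internal identifiers and function are assumptions:

```python
# Illustrative only: normalize user-facing dtype names to internal ones.
# The internal identifiers here are assumptions, not the library's table.
DTYPE_ALIASES = {
    "uNVFP4": "nvfp4_unsigned",
    "NVFP4+": "nvfp4_plus",
}

def resolve_dtype(name: str) -> str:
    # Fall back to the raw name when no alias is registered.
    return DTYPE_ALIASES.get(name, name)
```

Diagram Walkthrough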
File Walkthrough
quantize.py
Add mem_per_param_scale and enable_torch_compile
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/quantize.py
- Added mem_per_param_scale and enable_torch_compile arguments
- Updated dtype handling for uNVFP4 and NVFP4+
README.md
Update README with enable_torch_compile
examples/pytorch/nlp/huggingface_models/language-modeling/quantization/mix-precision/README.md
- Added enable_torch_compile to the example command
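For reference, the flag added to the README command typically gates a torch.compile wrap like the one below; this is a sketch of the assumed effect, not the example's actual code:

```python
import torch

def maybe_compile(model: torch.nn.Module, enable_torch_compile: bool) -> torch.nn.Module:
    # Assumed effect of --enable_torch_compile: compile the model before
    # quantization; otherwise return it unchanged.
    return torch.compile(model) if enable_torch_compile else model
```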