[MoE Calibration] Simplify MoE calibration interface #1851
Conversation
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.
@kylesayrs @dsikka A few clarifications:
Force-pushed from 7fefaac to ba42881
@sairampillai, regarding DCO, you can ignore that. We can sign it via GitHub once reviewed/approved.
This looks good, but I worry that this implementation uses more abstraction than is necessary. I like the idea of "contextual" vs "permanent" changes, and we should definitely log to the user which one is being used.
Please consider simplifying to a single mapping dictionary and a single ABC class to handle the from_original and restore functions. Don't be afraid to remove/refactor existing code!
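For readers following along, the single-ABC shape being suggested might look roughly like this. This is a sketch: `is_permanent`, `from_original`, and `restore` come from this PR's description, but the exact signatures are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Dict, Type

import torch


class MoECalibrationModule(torch.nn.Module, ABC):
    """Base class for calibration-time replacements of MoE modules."""

    # If True, the replacement is kept after calibration instead of restored.
    is_permanent: bool = False

    @classmethod
    @abstractmethod
    def from_original(cls, original: torch.nn.Module) -> "MoECalibrationModule":
        """Build the calibration module from the original MoE module."""

    def restore(self) -> torch.nn.Module:
        """Rebuild the original module; only required when not permanent."""
        raise NotImplementedError


# Single mapping from original module class name to calibration class
MOE_CALIBRATION_MODULES: Dict[str, Type[MoECalibrationModule]] = {}
```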
Hey @sairampillai! Are you still interested in contributing to this PR? If not, please let me know and I can assign someone to pick up where you left off!
@kylesayrs I am working on the updates, I will push an update soon for review!
Looks great so far, thanks for following up!
Looks awesome! Is this ready to be tested?
```python
MOE_CALIBRATION_MODULES: Dict[str, Type[MoECalibrationModule]] = {}


def register_moe_calibration(module_class_name: str):
```
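For context, a registration decorator of this shape typically just records the class in the module-level dict and returns it unchanged. A minimal self-contained sketch (the body is an assumption, not the PR's exact code):

```python
from typing import Callable, Dict

MOE_CALIBRATION_MODULES: Dict[str, type] = {}


def register_moe_calibration(module_class_name: str) -> Callable[[type], type]:
    """Record a calibration class under the original module's class name."""

    def decorator(cls: type) -> type:
        MOE_CALIBRATION_MODULES[module_class_name] = cls
        return cls

    return decorator


@register_moe_calibration("DeepseekV3MoE")
class CalibrationDeepseekV3MoE:  # would subclass MoECalibrationModule in the PR
    ...


assert MOE_CALIBRATION_MODULES["DeepseekV3MoE"] is CalibrationDeepseekV3MoE
```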
Something like this is also implemented via the RegistryMixin, but we can standardize that in a follow-up as well.
Your registry is slightly different; let's leave this for a follow-up.
Thanks for the contribution! I think you can remove the from_original class methods and just use the constructors directly
```python
# MoE calibration is now handled automatically by the pipeline.
# The `SequentialLlama4TextMoe` modules will be applied during calibration
# to enable proper expert calibration and vLLM compatibility.
```
Do we want to keep this note in all the examples? It might be cleaner without them; what do people think?
I felt it was helpful to have the note in varied examples, since this will be a breaking change once we deprecate the older methods. Open to recommendations.
```python
return CalibrationDeepseekV3MoE.from_original(
    original=module,
    config=config,
    calibrate_all_experts=calibrate_all_experts,
```
Why not just use the constructor? We can probably remove the from_original class method
Suggested change:
```diff
-return CalibrationDeepseekV3MoE.from_original(
+return CalibrationDeepseekV3MoE(
     original=module,
     config=config,
     calibrate_all_experts=calibrate_all_experts,
```
```python
    Legacy replacement function.
    Use SequentialLlama4TextMoe.from_original() instead.
    """
    return SequentialLlama4TextMoe.from_original(
```
Same here, should just be able to use the constructor directly, no?
Introduce a standardized MoE calibration interface and deprecate the legacy `replace_modules_for_calibration`
Summary
Implements a simplified, decorator-based registration system for MoE model calibration using a single `MoECalibrationModule` base class, making MoE model integration easier, and deprecates the legacy `replace_modules_for_calibration` function.

Problem

MoE model calibration currently requires module replacement logic scattered across `replace_modules_for_calibration` and manual context management. This makes contributing new MoE model support difficult and error-prone. Additionally, each model required a custom replacement function with duplicated boilerplate code.

Relevant Issues
Fixes #1829
Solution
- `MoECalibrationModule` abstract base class implementation
  - `from_original()` classmethod and optional `restore()`
  - `is_permanent` flag to specify whether the module replacement is to be restored using `restore()`

Decorator-Based Registration:
- `@register_moe_calibration("ModuleName")` decorator
- `MOE_CALIBRATION_MODULES` registry

New Model Integration: Adding MoE support requires only a registered calibration module, as sketched below:
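As an illustration of the integration path described above (the import path is an assumption, and `MyModelMoE` is a hypothetical model class, not one from this PR):

```python
import torch

# Import path assumed; MyModelMoE is a hypothetical original module class.
from llmcompressor.modeling import MoECalibrationModule, register_moe_calibration


@register_moe_calibration("MyModelMoE")
class CalibrationMyModelMoE(MoECalibrationModule):
    """Calibration-time replacement for the hypothetical `MyModelMoE`."""

    def __init__(self, original: torch.nn.Module, config, calibrate_all_experts: bool = True):
        super().__init__()
        self.gate = original.gate
        self.experts = original.experts
        self.calibrate_all_experts = calibrate_all_experts

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # When calibrating all experts, run every token through every expert so
        # each one accumulates full activation statistics, while still combining
        # outputs with the router weights so the module's output is unchanged.
        ...
```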
Dataset Arguments: New: `moe_calibrate_all_experts: bool = True` - Controls whether all experts see all tokens during calibration
- `True` (default): All experts receive all tokens for proper quantization statistics
- `False`: Normal routing behavior (only routed experts are used)
- Exposed via `oneshot()` and `DatasetArguments`
- Passed to `moe_calibration_context` by the pipelines

Automatic Context Management:
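As a usage sketch of the new argument (the model id, dataset, and recipe here are illustrative placeholders):

```python
from llmcompressor import oneshot

oneshot(
    model="deepseek-ai/DeepSeek-V3",  # illustrative model id
    dataset="open_platypus",          # illustrative calibration dataset
    recipe=recipe,                    # a quantization recipe defined elsewhere
    moe_calibrate_all_experts=False,  # opt out: only routed experts see tokens
)
```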
- `moe_calibration_context` integrated into the pipelines
- No manual context management needed in `oneshot.py`

Backward Compatibility: Deprecation of `replace_modules_for_calibration` with warnings

Test Plan
Testing
Migration Guide
Before:
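A sketch of the legacy pattern being deprecated (the import path for `replace_modules_for_calibration` is an assumption):

```python
from llmcompressor import oneshot
from llmcompressor.modeling import replace_modules_for_calibration  # path assumed

# Manually swap MoE modules before calibration
model = replace_modules_for_calibration(model)
oneshot(model=model, dataset=dataset, recipe=recipe)
```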
After:
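And a sketch of the replacement flow, where the pipeline applies the registered calibration modules automatically:

```python
from llmcompressor import oneshot

# No manual module replacement; the pipeline applies the registered
# MoE calibration modules inside `moe_calibration_context`.
oneshot(
    model=model,
    dataset=dataset,
    recipe=recipe,
    moe_calibrate_all_experts=True,  # default; set False to keep normal routing
)
```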