Conversation

@sairampillai sairampillai commented Sep 22, 2025

Introduce standardized MoE calibration interface and deprecate legacy replace_modules_for_calibration

Summary

Implements a simplified, decorator-based registration system for MoE model calibration built around a single MoECalibrationModule base class. This makes integrating new MoE models easier and deprecates the legacy replace_modules_for_calibration function.

Problem

MoE model calibration currently relies on module replacement logic scattered across replace_modules_for_calibration and manual context management, which makes contributing support for new MoE models difficult and error-prone. Additionally, each model requires a custom replacement function with duplicated boilerplate code.

Relevant Issues

Fixes #1829

Solution

MoECalibrationModule abstract base class implementation (see the sketch after this list)

  • Only two methods to implement: a required from_original() classmethod and an optional restore()
  • is_permanent flag specifies whether the replacement stays in place permanently or is undone via restore()
  • Clear contract: permanent modules stay in calibration form; non-permanent modules are restored after context exit
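
In sketch form (the torch.nn.Module base class and the restore() signature shown here are illustrative assumptions, not the exact implementation):

from abc import ABC, abstractmethod

import torch


class MoECalibrationModule(torch.nn.Module, ABC):
    # When True, the replacement persists after calibration;
    # when False, restore() is invoked on context exit.
    is_permanent: bool = False

    @classmethod
    @abstractmethod
    def from_original(cls, original, config, calibrate_all_experts=True):
        """Build the calibration replacement from the original MoE module."""

    def restore(self, original):
        """Return the module to reinstall after calibration (assumed signature)."""
        return original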

Decorator-Based Registration: @register_moe_calibration("ModuleName") decorator (sketch after this list)

  • Automatic registration in MOE_CALIBRATION_MODULES registry
  • Models self-register when their module is imported
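
Given the registry snippet quoted later in this review, the decorator is likely just an insert into a module-level dictionary; this body is a sketch, not the exact code:

from typing import Dict, Type

MOE_CALIBRATION_MODULES: Dict[str, Type["MoECalibrationModule"]] = {}


def register_moe_calibration(module_class_name: str):
    """Map an original module class name to its calibration replacement."""

    def decorator(cls):
        MOE_CALIBRATION_MODULES[module_class_name] = cls
        return cls

    return decorator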

New Model Integration: Adding MoE support requires only:

@register_moe_calibration("YourMoEModule")
class CalibrationYourMoE(MoECalibrationModule):
    # True: keep the calibration module after the context exits;
    # False: restore the original module via restore()
    is_permanent = True

    @classmethod
    def from_original(cls, original, config, calibrate_all_experts=True):
        # Build the calibration replacement from the original MoE block
        return cls(config, original, calibrate_all_experts)

Dataset Arguments: new moe_calibrate_all_experts: bool = True controls whether all experts see all tokens during calibration (usage sketch after this list)

  • True (default): All experts receive all tokens for proper quantization statistics
  • False: Normal routing behavior (only routed experts are used)
  • Used by both oneshot() and DatasetArguments
  • Automatically passed to moe_calibration_context by pipelines
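
Usage sketch (model, recipe, and the dataset name are placeholders; only moe_calibrate_all_experts is the argument introduced here):

from llmcompressor import oneshot

oneshot(
    model=model,                     # placeholder: your loaded MoE model
    dataset="open_platypus",         # placeholder calibration dataset
    recipe=recipe,                   # placeholder quantization recipe
    moe_calibrate_all_experts=True,  # default: every expert sees every token
)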

Automatic Context Management: moe_calibration_context integrated into pipelines (hypothetical sketch after this list)

  • Wraps calibration automatically in oneshot.py
  • Handles module replacement and restoration transparently
  • No manual context management required by users
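
For intuition, a hypothetical reconstruction of the context manager on top of the registry sketched above; the set_submodule/get_submodule traversal and use of model.config are assumptions, not the PR's actual code:

from contextlib import contextmanager


@contextmanager
def moe_calibration_context(model, calibrate_all_experts=True):
    # Find submodules whose class name has a registered calibration replacement
    targets = [
        (name, module)
        for name, module in model.named_modules()
        if type(module).__name__ in MOE_CALIBRATION_MODULES
    ]
    originals = {}
    for name, module in targets:
        calibration_cls = MOE_CALIBRATION_MODULES[type(module).__name__]
        originals[name] = module
        model.set_submodule(
            name,
            calibration_cls.from_original(module, model.config, calibrate_all_experts),
        )
    try:
        yield model
    finally:
        # Undo only the non-permanent replacements
        for name, original in originals.items():
            replacement = model.get_submodule(name)
            if not replacement.is_permanent:
                model.set_submodule(name, replacement.restore(original))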

Backward Compatibility: replace_modules_for_calibration deprecated with warnings (hypothetical shim after this list)

  • Legacy function preserved for compatibility
  • Clear migration path documented in deprecation message
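
The shim presumably looks something like the following; the warning text and delegation are assumptions:

import warnings


def replace_modules_for_calibration(model, calibrate_all_experts=True):
    """Deprecated entry point kept for backward compatibility."""
    warnings.warn(
        "replace_modules_for_calibration is deprecated; "
        "use moe_calibration_context instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the permanent-replacement path of the new interface
    ...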

Test Plan

  • ✅ Unit tests for contextual MoE calibration with automatic module restoration
  • ✅ Unit tests for permanent MoE calibration persistence
  • ✅ Integration tests with Qwen3, Llama4, and DeepSeek V3 models
  • ✅ Verification that all experts receive data during calibration
  • ✅ Deprecation warning verification for legacy functions

Testing

  • ✅ All unit tests pass
  • ✅ Calibration types working correctly
  • ✅ Model structure correctly modified and restored inside/outside contexts
  • ✅ Linting and type checking pass
  • ✅ Backward compatibility verified with deprecation warnings

Migration Guide

Before:

# Required defining MoEModelConfig entries and handling the context manually
from llmcompressor.modeling.prepare import replace_modules_for_calibration
model = replace_modules_for_calibration(model, calibrate_all_experts=True)

After:

# Automatic - just use moe_calibration_context
from llmcompressor.modeling import moe_calibration_context

with moe_calibration_context(model, calibrate_all_experts=True):
    # Run calibration - modules replaced automatically
    for batch in dataloader:
        model(**batch)
# Modules restored automatically (if not permanent)

@github-actions

👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review.

Note: this is required to complete the testing suite; please only add the label once the PR is code complete and local testing has been performed.

@sairampillai (Author)

@kylesayrs @dsikka A few clarifications:

  • I pushed a couple of commits without signing; how do you suggest I fix that?
  • I have deprecated the calibrate_moe_context parameter; do we want to plan how to phase it out?
  • I have tested using unit tests but without a GPU (GPU poor); can you point me to the best way to test this change end-to-end?

@sairampillai sairampillai force-pushed the moe_calibration_refactor branch from 7fefaac to ba42881 Compare September 24, 2025 17:07
@sairampillai sairampillai marked this pull request as ready for review September 24, 2025 17:07
@brian-dellabetta (Collaborator)

@sairampillai, regarding DCO, you can ignore that. We can sign it via GitHub once reviewed/approved.

@sairampillai sairampillai requested a review from dsikka September 26, 2025 13:26
@sairampillai sairampillai changed the title Moe calibration refactor [MoE Calibration] Simplify MoE calibration logic application and contribution Sep 26, 2025
@sairampillai sairampillai changed the title [MoE Calibration] Simplify MoE calibration logic application and contribution [MoE Calibration] Simplify MoE calibration interface Sep 26, 2025
@dsikka (Collaborator) commented Oct 1, 2025

@kylesayrs

@kylesayrs (Collaborator) left a comment

This looks good, but I worry that this implementation uses more abstraction than is necessary. I like the idea of "contextual" vs. "permanent" changes, and we should definitely log to the user which one is being used.

Please consider simplifying to a single mapping dictionary and a single ABC class to handle the from_original and restore functions. Don't be afraid to remove/refactor existing code!

@kylesayrs (Collaborator)

Hey @sairampillai! Are you still interested in contributing to this PR? If not, please let me know and I can assign someone to pick up where you left off!

@sairampillai (Author)

@kylesayrs I am working on the updates and will push an update soon for review!

@kylesayrs (Collaborator) left a comment

Looks great so far, thanks for following up!

@kylesayrs (Collaborator) left a comment

Looks awesome! Is this ready to be tested?

MOE_CALIBRATION_MODULES: Dict[str, Type[MoECalibrationModule]] = {}


def register_moe_calibration(module_class_name: str):
Collaborator:

Something like this is also implemented via the RegistryMixin, but we can standardize that in a follow-up as well.

Collaborator:

Your registry is slightly different; let's leave this for a follow-up.

@kylesayrs kylesayrs self-requested a review October 21, 2025 14:33
@brian-dellabetta (Collaborator) left a comment

Thanks for the contribution! I think you can remove the from_original class methods and just use the constructors directly.

Comment on lines +12 to +14
# MoE calibration is now handled automatically by the pipeline.
# The `SequentialLlama4TextMoe` modules will be applied during calibration
# to enable proper expert calibration and vLLM compatibility.
Collaborator:

Do we want to keep this note for all examples? It might be cleaner without them; what do people think?

@sairampillai (Author):

I felt it was helpful to have the note in the various examples, since this will be a breaking change once we deprecate the older methods. Open to recommendations.

Comment on lines +100 to +103
return CalibrationDeepseekV3MoE.from_original(
original=module,
config=config,
calibrate_all_experts=calibrate_all_experts,
Collaborator:

Why not just use the constructor? We can probably remove the from_original class method

Suggested change
- return CalibrationDeepseekV3MoE.from_original(
+ return CalibrationDeepseekV3MoE(
      original=module,
      config=config,
      calibrate_all_experts=calibrate_all_experts,

Legacy replacement function.
Use SequentialLlama4TextMoe.from_original() instead.
"""
return SequentialLlama4TextMoe.from_original(
Collaborator:

Same here; you should just be able to use the constructor directly, no?
