
mtp for verl #62

Open

ArronHZG wants to merge 9 commits into ISEEKYAN:main from ArronHZG:feature/verl_mtp

Conversation

@ArronHZG

No description provided.

```diff
 def __init__(self, hf_dir: str):
     index_file = os.path.join(hf_dir, "model.safetensors.index.json")
-    config = AutoConfig.from_pretrained(hf_dir)
+    config = AutoConfig.from_pretrained(hf_dir, trust_remote_code=True)
```
Owner
We can't set trust_remote_code by default.
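A minimal sketch of the pattern the owner is asking for, with a stub `load_config` standing in for `AutoConfig.from_pretrained` so the snippet is self-contained (both names here are hypothetical): thread the caller's flag through instead of hardcoding `trust_remote_code=True`.

```python
# Hypothetical sketch: forward the caller's trust_remote_code flag rather than
# defaulting it to True. `load_config` is only a stand-in for
# AutoConfig.from_pretrained; the parameter-threading is what matters.
def load_config(hf_dir: str, trust_remote_code: bool = False) -> dict:
    return {"hf_dir": hf_dir, "trust_remote_code": trust_remote_code}

class Loader:
    def __init__(self, hf_dir: str, trust_remote_code: bool = False):
        # The default stays False; callers must opt in to remote code explicitly.
        self.config = load_config(hf_dir, trust_remote_code=trust_remote_code)
```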

```python
    hf_model_path, trust_remote_code=trust_remote_code
)

if hasattr(config, "num_nextn_predict_layers"):
```
Owner

Why do we set this? Is this debug code?

Contributor

@ArronHZG This should not be needed once the patches in #62 (comment) are applied.

Comment on lines 7 to 40
```diff
@@ -27,9 +27,17 @@ def init_distributed():

 def load_model(hf_model_path, trust_remote_code=False):
     """Load model"""
-    bridge = AutoBridge.from_pretrained(
+    # use AutoConfig to change hf config
+    config = AutoConfig.from_pretrained(
+        hf_model_path, trust_remote_code=trust_remote_code
+    )
+
+    if hasattr(config, "num_nextn_predict_layers"):
+        config.num_nextn_predict_layers = 0
+
+    bridge = AutoBridge.from_config(config)
```

Contributor

These changes should not be needed

Comment on lines 84 to 92
```python
# Handle transformer components within MTP
# Check if this is a transformer_layer component
if "transformer_layer" in name:
    # Create a proxy name to use with parent class methods
    # Convert mtp.layers.{idx}.transformer_layer.* to decoder.layers.{idx}.*
    proxy_name = name.replace(
        f"mtp.layers.{mtp_layer_idx}.transformer_layer",
        f"decoder.layers.{mtp_layer_idx}",
    )
```
Contributor

These changes are needed (replace lines 84 to 92 with the suggested code) so that we don't have to disable num_nextn_predict_layers when loading from HF weights, allowing the MTP weights to be loaded correctly.

Suggested change
```python
# Handle transformer components within MTP. MCore may expose these under
# either "...transformer_layer.*" or "...mtp_model_layer.*".
layer_prefixes = ("transformer_layer", "mtp_model_layer")
proxy_name = None
for layer_prefix in layer_prefixes:
    mcore_prefix = f"mtp.layers.{mtp_layer_idx}.{layer_prefix}"
    if mcore_prefix in name:
        proxy_name = name.replace(
            mcore_prefix,
            f"decoder.layers.{mtp_layer_idx}",
        )
        break
if proxy_name is not None:
```
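For illustration, the suggested remapping can be lifted into a small standalone helper (the function name here is hypothetical, not part of the PR): it returns the `decoder.layers.{idx}` proxy name when either MCore prefix matches, and `None` otherwise.

```python
from typing import Optional

# Hypothetical standalone version of the suggested remapping above. MCore may
# expose MTP sublayers under either "transformer_layer" or "mtp_model_layer";
# both are mapped onto the equivalent "decoder.layers.{idx}" parameter name so
# the parent class's loading logic can handle them.
def mtp_proxy_name(name: str, mtp_layer_idx: int) -> Optional[str]:
    layer_prefixes = ("transformer_layer", "mtp_model_layer")
    for layer_prefix in layer_prefixes:
        mcore_prefix = f"mtp.layers.{mtp_layer_idx}.{layer_prefix}"
        if mcore_prefix in name:
            # Rewrite only the matched prefix; the tail of the name is kept.
            return name.replace(mcore_prefix, f"decoder.layers.{mtp_layer_idx}")
    return None  # not a transformer sublayer of this MTP layer
```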
