
mtp for verl #62

Open

ArronHZG wants to merge 9 commits into ISEEKYAN:main from ArronHZG:feature/verl_mtp

Conversation

@ArronHZG

No description provided.

```diff
 def __init__(self, hf_dir: str):
     index_file = os.path.join(hf_dir, "model.safetensors.index.json")
-    config = AutoConfig.from_pretrained(hf_dir)
+    config = AutoConfig.from_pretrained(hf_dir, trust_remote_code=True)
```
Owner
We can't set trust_remote_code by default.
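A minimal sketch of the pattern the owner is asking for, with a stub `load_config` standing in for `AutoConfig.from_pretrained` so the snippet is self-contained (both names here are hypothetical): thread the caller's flag through instead of hardcoding `trust_remote_code=True`.

```python
# Hypothetical sketch: forward the caller's trust_remote_code flag rather than
# defaulting it to True. `load_config` is only a stand-in for
# AutoConfig.from_pretrained; the parameter-threading is what matters.
def load_config(hf_dir: str, trust_remote_code: bool = False) -> dict:
    return {"hf_dir": hf_dir, "trust_remote_code": trust_remote_code}

class Loader:
    def __init__(self, hf_dir: str, trust_remote_code: bool = False):
        # The default stays False; callers must opt in to remote code explicitly.
        self.config = load_config(hf_dir, trust_remote_code=trust_remote_code)
```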

```python
    hf_model_path, trust_remote_code=trust_remote_code
)

if hasattr(config, "num_nextn_predict_layers"):
```
Owner

Why do we set this? Is this debug code?

Contributor

@ArronHZG This should not be needed once the patches in #62 (comment) are applied.

Comment on lines 7 to 40
```diff
@@ -27,9 +27,17 @@ def init_distributed():

 def load_model(hf_model_path, trust_remote_code=False):
     """Load model"""
-    bridge = AutoBridge.from_pretrained(
+    # use AutoConfig to change hf config
+    config = AutoConfig.from_pretrained(
+        hf_model_path, trust_remote_code=trust_remote_code
+    )
+
+    if hasattr(config, "num_nextn_predict_layers"):
+        config.num_nextn_predict_layers = 0
+
+    bridge = AutoBridge.from_config(config)
```

Contributor

These changes should not be needed

Comment on lines 84 to 92
```python
# Handle transformer components within MTP
# Check if this is a transformer_layer component
if "transformer_layer" in name:
    # Create a proxy name to use with parent class methods
    # Convert mtp.layers.{idx}.transformer_layer.* to decoder.layers.{idx}.*
    proxy_name = name.replace(
        f"mtp.layers.{mtp_layer_idx}.transformer_layer",
        f"decoder.layers.{mtp_layer_idx}",
    )
```
Contributor

These changes are needed (replace lines 84 to 92 with the suggested code) so that we don't have to disable num_nextn_predict_layers when loading from HF weights, allowing the MTP weights to be loaded correctly.

Suggested change
```python
# Handle transformer components within MTP. MCore may expose these under
# either "...transformer_layer.*" or "...mtp_model_layer.*".
layer_prefixes = ("transformer_layer", "mtp_model_layer")
proxy_name = None
for layer_prefix in layer_prefixes:
    mcore_prefix = f"mtp.layers.{mtp_layer_idx}.{layer_prefix}"
    if mcore_prefix in name:
        proxy_name = name.replace(
            mcore_prefix,
            f"decoder.layers.{mtp_layer_idx}",
        )
        break
if proxy_name is not None:
```
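For illustration, the suggested remapping can be lifted into a small standalone helper (the function name here is hypothetical, not part of the PR): it returns the `decoder.layers.{idx}` proxy name when either MCore prefix matches, and `None` otherwise.

```python
from typing import Optional

# Hypothetical standalone version of the suggested remapping above. MCore may
# expose MTP sublayers under either "transformer_layer" or "mtp_model_layer";
# both are mapped onto the equivalent "decoder.layers.{idx}" parameter name so
# the parent class's loading logic can handle them.
def mtp_proxy_name(name: str, mtp_layer_idx: int) -> Optional[str]:
    layer_prefixes = ("transformer_layer", "mtp_model_layer")
    for layer_prefix in layer_prefixes:
        mcore_prefix = f"mtp.layers.{mtp_layer_idx}.{layer_prefix}"
        if mcore_prefix in name:
            # Rewrite only the matched prefix; the tail of the name is kept.
            return name.replace(mcore_prefix, f"decoder.layers.{mtp_layer_idx}")
    return None  # not a transformer sublayer of this MTP layer
```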
