
Conversation

@Kuangdd01 (Contributor) commented Jan 13, 2026

Description

  • IMO, GLM4_MOE can be treated as a combination of deepseek_v3 and qwen/llama: it keeps the first layer dense, uses one shared expert per sparse layer, and adds a routing bias (route_bias), all with GQA attention. So we just combine the templates of qwen and deepseek_v3 and specify a few arguments such as rope_percent (see the sketch below this list).

  • This modification was verified with a tiny random model on LlamaFactory.

(loss curve from the tiny random model run)
  • training and the merge process both pass ✅
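A minimal sketch of what that combined template amounts to, assuming a plain config object; every field name and value below (rope_percent, first_k_dense_replace, expert counts, etc.) is an illustrative placeholder, not the real template API:

```python
# Hypothetical sketch only: GLM4_MOE as deepseek_v3-style MoE plus qwen/llama-style GQA.
# Field names and default values are placeholders, not the actual template code.
from dataclasses import dataclass


@dataclass
class Glm4MoeTemplateSketch:
    # qwen/llama side: GQA attention with partial rotary embedding.
    num_attention_heads: int = 32
    num_key_value_heads: int = 4      # GQA: fewer KV heads than query heads
    rope_percent: float = 0.5         # only part of each head dim gets RoPE

    # deepseek_v3 side: sparse MoE layers with a routing bias.
    first_k_dense_replace: int = 1    # keep the first decoder layer dense
    n_shared_experts: int = 1         # one shared expert per sparse layer
    n_routed_experts: int = 64        # placeholder count
    use_router_bias: bool = True      # route_bias on the router scores


# The tiny random model used for verification would simply shrink the sizes.
tiny_cfg = Glm4MoeTemplateSketch(
    num_attention_heads=4, num_key_value_heads=2, n_routed_experts=8
)
```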

Help Needed

  • The real model still needs to be tested
  • The MTP module should be addressed after GLM4.5

cc @hiyouga @chocoded

@chocoded (Collaborator) commented:

It appears that the final layer (Layer 46) has a different structure compared to the intermediate layers. This requires some special handling which seems to have been overlooked in the current implementation. Could you please look into this?

[Screenshot 2026-01-13 20:14:33]
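For context, a quick way to see exactly which tensors the final layer adds is to diff its parameter names against an intermediate layer. This is only a diagnostic sketch; the checkpoint path, layer indices, and safetensors layout are assumptions:

```python
# Minimal diagnostic sketch: compare parameter names of the last decoder layer
# (46 in this checkpoint) against an intermediate one to list the extra tensors.
from pathlib import Path

from safetensors import safe_open

ckpt_dir = Path("path/to/GLM4_MOE")  # placeholder path
keys: list[str] = []
for shard in sorted(ckpt_dir.glob("*.safetensors")):
    with safe_open(str(shard), framework="pt") as f:
        keys.extend(f.keys())


def layer_keys(idx: int) -> set[str]:
    """Parameter names of decoder layer `idx`, with the layer prefix stripped."""
    prefix = f"model.layers.{idx}."
    return {k[len(prefix):] for k in keys if k.startswith(prefix)}


# Tensors that exist only in the final layer, i.e. the MTP-specific ones.
print(sorted(layer_keys(46) - layer_keys(10)))
```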

@PanAndy (Collaborator) commented Jan 13, 2026

@chocoded

@PanAndy PanAndy requested a review from chocoded January 13, 2026 12:31
@chocoded chocoded self-assigned this Jan 13, 2026
@Kuangdd01 (Contributor, Author) commented:

Sure, the last layer is an MTP-specific layer and we will look into this.

@chocoded chocoded closed this Jan 13, 2026
@chocoded chocoded reopened this Jan 13, 2026
@chocoded (Collaborator) commented:

For reference, you can check out the implementation here: converters and hf_invalid_keys. Could you please update the code to account for this?
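In case it helps, a minimal sketch of the kind of filter that update could add. The suffix names below are illustrative and should come from the actual key diff, and the constant/function names are not the real converters/hf_invalid_keys API:

```python
# Hypothetical sketch: have the converter skip the MTP-only tensors of the final
# GLM4_MOE layer. Suffixes are examples; use the names found in the real checkpoint
# and adapt to however hf_invalid_keys is actually structured in the repo.
MTP_LAYER_INDEX = 46  # the extra layer flagged above; read it from the config in practice

MTP_ONLY_SUFFIXES = (  # illustrative, DeepSeek-V3-style MTP tensor names
    "eh_proj.weight",
    "enorm.weight",
    "hnorm.weight",
)


def is_hf_invalid_key(name: str) -> bool:
    """Return True for HF tensor names the converter should ignore."""
    prefix = f"model.layers.{MTP_LAYER_INDEX}."
    return name.startswith(prefix) and name.endswith(MTP_ONLY_SUFFIXES)
```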
