
model : Port Minimax M2 from mainline #907

Merged: ikawrakow merged 1 commit into main from fcp/minimax on Nov 6, 2025
Conversation

@firecoperana (Collaborator)

Add support for Minimax M2: ggml-org/llama.cpp#16831
Closes #877
This only adds model support; tool-calling support is not yet merged.

cb(cur, "attn_norm", il);

// Q, K, V projections
ggml_tensor * Qcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wq, cur);
@ikawrakow (Owner)

The code from here to line 8416 can now be replaced with

            auto [Qcur, Kcur, Vcur] = llm_build_mul_mat_qkv(gf, cur, 
                    model.layers[il].wqkv, nullptr, model.layers[il].wqk, nullptr,
                    model.layers[il].wq, nullptr, model.layers[il].wk, nullptr, model.layers[il].wv, nullptr,
                    model.layers[il].attn_q_norm, model.layers[il].attn_k_norm, 0, il); 

That way we don't need to worry about how Q, K, V should be reshaped, we automatically get Q/K/V fusion for this model if the user requested it via -mqkv, and we use a standardized way to compute attention (when attention is standard, as it is here).

It would be useful to do that so it can get tested as part of this PR.
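
(For context, here is a minimal sketch of the idea behind the -mqkv fusion that llm_build_mul_mat_qkv enables: the separate Q, K and V projections collapse into one matmul against a concatenated weight, and the three results are recovered as views into the single output tensor. The names below, build_fused_qkv_sketch, n_embd_q and so on, are placeholders for illustration, not the actual ik_llama.cpp implementation.)

    #include "ggml.h"

    // Sketch only: one GEMM against a concatenated wqkv weight, then views
    // split the result back into Q, K and V.
    static void build_fused_qkv_sketch(ggml_context * ctx0,
                                       ggml_tensor  * wqkv,  // [n_embd, n_embd_q + n_embd_k + n_embd_v]
                                       ggml_tensor  * cur,   // [n_embd, n_tokens]
                                       int64_t n_embd_q, int64_t n_embd_k, int64_t n_embd_v) {
        // Single matmul instead of three separate ones for Q, K and V.
        ggml_tensor * qkv = ggml_mul_mat(ctx0, wqkv, cur);   // [n_embd_q + n_embd_k + n_embd_v, n_tokens]

        const int64_t n_tokens = cur->ne[1];
        const size_t  es       = ggml_element_size(qkv);

        ggml_tensor * Qcur = ggml_view_2d(ctx0, qkv, n_embd_q, n_tokens, qkv->nb[1], 0);
        ggml_tensor * Kcur = ggml_view_2d(ctx0, qkv, n_embd_k, n_tokens, qkv->nb[1],  n_embd_q             * es);
        ggml_tensor * Vcur = ggml_view_2d(ctx0, qkv, n_embd_v, n_tokens, qkv->nb[1], (n_embd_q + n_embd_k) * es);

        // In the real helper these would be normalized, reshaped per head and
        // fed into the standardized attention path.
        (void) Qcur; (void) Kcur; (void) Vcur;
    }

(The payoff is one large GEMM instead of three smaller ones, which generally schedules better on both CPU and GPU back ends.)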

@firecoperana (Collaborator, Author), Nov 6, 2025

I get the error ik_llama.cpp\ggml\src\ggml.c:6358: GGML_ASSERT(ggml_can_repeat(b, a)) failed when loading the model with the CPU-only backend after incorporating both changes. I think the crash is caused by the buffer of a being null.
Call stack:

    ggml.dll!ggml_abort(const char * file, int line, const char * fmt, ...) Line 272  C
    ggml.dll!ggml_mul_impl(ggml_context * ctx, ggml_tensor * a, ggml_tensor * b, bool inplace) Line 6360  C
  > ggml.dll!ggml_fused_rms_norm_impl(ggml_context * ctx, ggml_tensor * a, ggml_tensor * b, float eps, bool inplace) Line 7277  C
    ggml.dll!ggml_fused_rms_norm(ggml_context * ctx, ggml_tensor * a, ggml_tensor * b, float eps) Line 7305  C
    llama.dll!llm_build_context::llm_build_norm(ggml_context * ctx, ggml_tensor * cur, const llama_hparams & hparams, ggml_tensor * mw, ggml_tensor * mb, llm_norm_type type, const std::function<void __cdecl(ggml_tensor *,char const *,int)> & cb, int il, float scale_eps) Line 582  C++
    llama.dll!llm_build_context::llm_build_mul_mat_qkv(ggml_cgraph * gf, ggml_tensor * cur, ggml_tensor * wqkv, ggml_tensor * bqkv, ggml_tensor * wqk, ggml_tensor * bqk, ggml_tensor * wq, ggml_tensor * bq, ggml_tensor * wk, ggml_tensor * bk, ggml_tensor * wv, ggml_tensor * bv, ggml_tensor * q_norm, ggml_tensor * k_norm, float attention_scale, int il) Line 1348  C++
    llama.dll!llm_build_context::build_minimaxm2() Line 8418  C++
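
(For context, the assertion that fires comes from ggml_mul, which requires that the norm weight b can be broadcast, that is repeated, over the activation a. Below is a paraphrase of that rule, assuming the usual 4-dimension ggml tensor shape; the actual ggml_can_repeat is internal to ggml.c and may differ between versions.)

    #include <cstdint>

    // Paraphrase of the broadcast rule behind GGML_ASSERT(ggml_can_repeat(b, a))
    // in ggml_mul: b can only be multiplied into a if every dimension of a is a
    // whole multiple of the corresponding dimension of b.
    static bool can_repeat_paraphrase(const int64_t b_ne[4], const int64_t a_ne[4]) {
        for (int i = 0; i < 4; ++i) {
            if (a_ne[i] % b_ne[i] != 0) {
                return false; // b cannot be tiled to cover a along dimension i
            }
        }
        return true;
    }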

@ikawrakow (Owner)

Oh, I see. The attn_q_norm and attn_k_norm tensors are not just 1D with the size of one attention head, but rather head_size x head_count, so the usual trick of fusing the RMS_NORM does not work. Sorry, ignore the two comments then.
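
(To make the shape argument concrete, here is a small self-contained sketch with assumed dimensions, head_size = 128, head_count = 32, n_tokens = 7; the real Minimax M2 values may differ. A 1-D weight of size head_size * head_count broadcasts over the [n_embd, n_tokens] RMS_NORM output, while a head_size x head_count weight does not, which is what trips the assertion above.)

    #include <cstdint>
    #include <cstdio>

    // Same divisibility rule as in the ggml_can_repeat paraphrase above.
    static bool broadcast_ok(const int64_t b_ne[4], const int64_t a_ne[4]) {
        for (int i = 0; i < 4; ++i) {
            if (a_ne[i] % b_ne[i] != 0) return false;
        }
        return true;
    }

    int main() {
        const int64_t head_size = 128, head_count = 32, n_tokens = 7;          // assumed values
        const int64_t act[4]     = { head_size * head_count, n_tokens, 1, 1 }; // RMS_NORM output: [n_embd, n_tokens]
        const int64_t norm_1d[4] = { head_size * head_count, 1, 1, 1 };        // usual 1-D norm weight
        const int64_t norm_2d[4] = { head_size, head_count, 1, 1 };            // head_size x head_count weight

        std::printf("1-D weight broadcasts over activation: %d\n", broadcast_ok(norm_1d, act)); // prints 1
        std::printf("2-D weight broadcasts over activation: %d\n", broadcast_ok(norm_2d, act)); // prints 0
        return 0;
    }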

@ikawrakow ikawrakow merged commit 0378f38 into main Nov 6, 2025
@firecoperana firecoperana deleted the fcp/minimax branch November 14, 2025 18:29


Development

Successfully merging this pull request may close these issues.

Feature Request: Support for the new model Minimax M2
