```cpp
cb(cur, "attn_norm", il);

// Q, K, V projections
ggml_tensor * Qcur = llm_build_lora_mm(lctx, ctx0, model.layers[il].wq, cur);
```
The code from here to line 8416 can now be replaced with

```cpp
auto [Qcur, Kcur, Vcur] = llm_build_mul_mat_qkv(gf, cur,
    model.layers[il].wqkv, nullptr, model.layers[il].wqk, nullptr,
    model.layers[il].wq, nullptr, model.layers[il].wk, nullptr, model.layers[il].wv, nullptr,
    model.layers[il].attn_q_norm, model.layers[il].attn_k_norm, 0, il);
```

That way we don't need to worry about how Q, K, V should be reshaped, we automatically get Q/K/V fusion for this model if the user requested it via `-mqkv`, and we use a standardized way to compute attention (when the attention is standard, as it is here).
It would be useful to do that so it can get tested as part of this PR.
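For context, here is a rough sketch of the fused-vs-separate dispatch a helper like this typically performs. This is a hypothetical illustration, not the actual `llm_build_mul_mat_qkv` implementation; it assumes the surrounding graph-builder state (`lctx`, `ctx0`, `cur`, `n_tokens`) and hypothetical projection widths `n_embd_q`/`n_embd_k`/`n_embd_v`:

```cpp
// Hypothetical sketch of the fused-vs-separate Q/K/V dispatch; not the
// actual ik_llama.cpp helper. Assumes the usual llm_build_context state.
ggml_tensor * Qcur, * Kcur, * Vcur;
if (wqkv) {
    // single fused projection, then Q/K/V as views into the result
    ggml_tensor * qkv = llm_build_lora_mm(lctx, ctx0, wqkv, cur);
    const size_t es = ggml_element_size(qkv);
    Qcur = ggml_view_2d(ctx0, qkv, n_embd_q, n_tokens, qkv->nb[1], 0);
    Kcur = ggml_view_2d(ctx0, qkv, n_embd_k, n_tokens, qkv->nb[1], n_embd_q*es);
    Vcur = ggml_view_2d(ctx0, qkv, n_embd_v, n_tokens, qkv->nb[1], (n_embd_q + n_embd_k)*es);
} else {
    // separate projections when no fused tensor was built (no -mqkv)
    Qcur = llm_build_lora_mm(lctx, ctx0, wq, cur);
    Kcur = llm_build_lora_mm(lctx, ctx0, wk, cur);
    Vcur = llm_build_lora_mm(lctx, ctx0, wv, cur);
}
```

The suggested call passes the fused `wqkv`, the partially fused `wqk`, and the separate `wq`/`wk`/`wv`, presumably so the helper can use whichever set of tensors the loader actually materialized.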
I get the error `ik_llama.cpp\ggml\src\ggml.c:6358: GGML_ASSERT(ggml_can_repeat(b, a)) failed` when loading the model with the CPU-only backend after incorporating both changes. I think the crash is caused by the buffer of `a` being null.
Call stack:

```
  ggml.dll!ggml_abort(const char * file, int line, const char * fmt, ...) Line 272  C
  ggml.dll!ggml_mul_impl(ggml_context * ctx, ggml_tensor * a, ggml_tensor * b, bool inplace) Line 6360  C
> ggml.dll!ggml_fused_rms_norm_impl(ggml_context * ctx, ggml_tensor * a, ggml_tensor * b, float eps, bool inplace) Line 7277  C
  ggml.dll!ggml_fused_rms_norm(ggml_context * ctx, ggml_tensor * a, ggml_tensor * b, float eps) Line 7305  C
  llama.dll!llm_build_context::llm_build_norm(ggml_context * ctx, ggml_tensor * cur, const llama_hparams & hparams, ggml_tensor * mw, ggml_tensor * mb, llm_norm_type type, const std::function<void __cdecl(ggml_tensor *,char const *,int)> & cb, int il, float scale_eps) Line 582  C++
  llama.dll!llm_build_context::llm_build_mul_mat_qkv(ggml_cgraph * gf, ggml_tensor * cur, ggml_tensor * wqkv, ggml_tensor * bqkv, ggml_tensor * wqk, ggml_tensor * bqk, ggml_tensor * wq, ggml_tensor * bq, ggml_tensor * wk, ggml_tensor * bk, ggml_tensor * wv, ggml_tensor * bv, ggml_tensor * q_norm, ggml_tensor * k_norm, float attention_scale, int il) Line 1348  C++
  llama.dll!llm_build_context::build_minimaxm2() Line 8418  C++
```
Oh, I see. The `attn_q_norm` and `attn_k_norm` tensors are not just 1D with the size of one attention head, but rather `head_size x head_count`, so the usual trick of fusing the RMS_NORM does not work. Sorry, ignore the two comments then.
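To make the shape constraint concrete, here is a minimal standalone sketch — an illustration, not code from the PR — that mirrors the per-dimension divisibility rule `ggml_can_repeat()` enforces inside `ggml_mul()`, using hypothetical MiniMax-M2-like dimensions:

```cpp
// Standalone illustration (not PR code) of why the fused RMS_NORM trick
// fails when the norm weight spans head_size * head_count.
#include "ggml.h"
#include <cstdio>

// Mirrors the divisibility rule ggml_can_repeat() checks before
// ggml_mul() broadcasts b over a.
static bool can_repeat(const ggml_tensor * b, const ggml_tensor * a) {
    for (int i = 0; i < GGML_MAX_DIMS; ++i) {
        if (a->ne[i] % b->ne[i] != 0) return false;
    }
    return true;
}

int main() {
    ggml_init_params params = { /*mem_size=*/ 1u << 20, /*mem_buffer=*/ nullptr, /*no_alloc=*/ true };
    ggml_context * ctx = ggml_init(params);

    const int64_t head_size = 128, n_head = 8, n_tokens = 4; // hypothetical dims

    // Q after the usual reshape: [head_size, n_head, n_tokens]
    ggml_tensor * q = ggml_new_tensor_3d(ctx, GGML_TYPE_F32, head_size, n_head, n_tokens);

    // Common case: a 1D per-head norm weight broadcasts across heads and tokens.
    ggml_tensor * w_head = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, head_size);
    printf("per-head weight broadcasts:   %d\n", can_repeat(w_head, q)); // 1

    // MiniMax M2 case: the weight covers head_size * head_count, so its first
    // dimension does not divide q->ne[0] and ggml_mul() would assert.
    ggml_tensor * w_full = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, head_size * n_head);
    printf("full-width weight broadcasts: %d\n", can_repeat(w_full, q)); // 0

    ggml_free(ctx);
    return 0;
}
```

With these dimensions the sketch prints `1` then `0`, matching the `GGML_ASSERT(ggml_can_repeat(b, a))` failure seen in the stack trace above.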
Add support for Minimax M2 (ggml-org/llama.cpp#16831)
Closes #877
This only adds model support; tool-calling support is not yet merged.