Bug description
    l, h, q, t = (
        model_args.n_layers,
        model_args.n_heads,
        model_args.dim // model_args.n_heads,
        seq_len,
    )
However, head_dim is not necessarily equal to dim // n_heads.
For example, Qwen3-4B has dim=2560, n_heads=32, and head_dim=128, whereas dim // n_heads gives 80, so q is wrong for such models.
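One possible fix, sketched under assumptions: use an explicit head_dim when the model args define one, and fall back to dim // n_heads otherwise. The `ModelArgs` dataclass and the `resolve_head_dim` helper below are hypothetical stand-ins for illustration, not the project's actual classes or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelArgs:
    # Hypothetical stand-in for the real model args; field names are assumptions.
    dim: int
    n_heads: int
    head_dim: Optional[int] = None  # explicit per-head width, if the model sets one


def resolve_head_dim(args: ModelArgs) -> int:
    """Prefer an explicit head_dim; otherwise fall back to dim // n_heads."""
    if args.head_dim is not None:
        return args.head_dim
    return args.dim // args.n_heads


# Values from the report above: Qwen3-4B has dim=2560, n_heads=32, head_dim=128.
qwen_like = ModelArgs(dim=2560, n_heads=32, head_dim=128)
q_fixed = resolve_head_dim(qwen_like)            # uses the explicit head_dim
q_current = qwen_like.dim // qwen_like.n_heads   # what the snippet above computes
```

With this, q would be taken from `resolve_head_dim(model_args)` instead of `model_args.dim // model_args.n_heads`, so models without an explicit head_dim keep the current behavior.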
Versions
latest main