Asking for advice: [relation between model structure and size] Is there a constraining relationship between model structure and model parameter scale? #5101
cccc0der started this conversation in Community | General
Replies: 0 comments
The situation is:
We pretrained a model with a Llama 2-like structure (RMSNorm + GQA). The loss decreased very well during training, but the final generation results were very poor, and the model severely overfit to certain formats.
We then stopped training, replaced GQA with MHA, and restarted pretraining. Although the loss did not decrease as quickly as it did for the first model, the final generation results were good, much better than those of the first model.
My questions:
Is there a relationship between model structure and parameter scale? That is, is a given structure only appropriate once the model reaches a certain parameter scale, and does it otherwise have the opposite effect?
With the same pretraining corpus and data distribution, does a faster-decreasing loss not necessarily mean better final generation quality?
Thanks
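For reference, the structural change between the two runs is confined to the attention K/V projections: GQA uses fewer K/V heads, each shared by a group of query heads, while MHA gives every query head its own K/V head. The following is a minimal sketch, assuming a Llama-style attention layer with no biases and head_dim = d_model / n_heads. It counts the projection weights per layer and shows which shared K/V head each query head reads. The sizes used (d_model=4096, 32 query heads, 8 K/V heads) are hypothetical illustrations, not values from the post; setting n_kv_heads equal to n_heads reduces GQA to MHA.

```python
def attention_proj_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Count weights in one attention layer's Q, K, V and output projections.

    Assumes a Llama-style layer with no biases and head_dim = d_model // n_heads.
    n_kv_heads == n_heads gives MHA; 1 < n_kv_heads < n_heads gives GQA.
    """
    if d_model % n_heads or n_heads % n_kv_heads:
        raise ValueError("d_model must divide by n_heads, and n_heads by n_kv_heads")
    head_dim = d_model // n_heads
    q = d_model * n_heads * head_dim          # W_q: d_model -> n_heads * head_dim
    kv = 2 * d_model * n_kv_heads * head_dim  # W_k and W_v: d_model -> n_kv_heads * head_dim each
    o = n_heads * head_dim * d_model          # W_o: n_heads * head_dim -> d_model
    return q + kv + o


def kv_head_for_query_head(q_head: int, n_heads: int, n_kv_heads: int) -> int:
    """Index of the K/V head a query head uses under GQA.

    Consecutive groups of n_heads // n_kv_heads query heads share one K/V head.
    """
    group_size = n_heads // n_kv_heads
    return q_head // group_size


if __name__ == "__main__":
    d_model, n_heads = 4096, 32  # hypothetical sizes, not taken from the post
    mha = attention_proj_params(d_model, n_heads, n_kv_heads=32)
    gqa = attention_proj_params(d_model, n_heads, n_kv_heads=8)
    print(f"MHA: {mha:,} weights per layer")
    print(f"GQA: {gqa:,} weights per layer ({1 - gqa / mha:.1%} fewer)")
    # Query heads 0-3 share K/V head 0, heads 4-7 share K/V head 1, and so on.
    print([kv_head_for_query_head(h, n_heads, 8) for h in range(8)])
```

Under these assumptions, MHA has 67,108,864 attention projection weights per layer and GQA has 41,943,040 (37.5% fewer). If the other dimensions were kept equal, this is one concrete way the two pretraining runs differed.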