Question: [relation between model structure and size] Is there a constraining relationship between model structure and parameter scale? #5101

cccc0der · 2023-11-23T03:52:05Z

Background:

We are pretraining a model with a LLaMA 2-like architecture that uses RMSNorm and GQA. The training loss decreased well, but the final generation quality was very poor, with severe overfitting to certain output formats.

We then stopped that run, replaced GQA with MHA, and restarted pretraining. The loss did not decrease as fast as in the first run, but the generated results were good, much better than those of the first model.
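
For context, here is a minimal sketch of one structural difference between the two runs: GQA shares key/value heads across groups of query heads, so its K and V projections are smaller than in MHA. The function name `attention_params` and the dimensions in the usage lines are illustrative assumptions, not the configuration used in these runs, and the count assumes bias-free Q, K, V, and output projections.

```python
def attention_params(d_model: int, n_heads: int, n_kv_heads: int) -> int:
    """Weight count of one bias-free attention block (Q, K, V, O projections).

    n_kv_heads == n_heads  -> multi-head attention (MHA)
    n_kv_heads <  n_heads  -> grouped-query attention (GQA)
    """
    if d_model % n_heads != 0:
        raise ValueError("d_model must be divisible by n_heads")
    if n_heads % n_kv_heads != 0:
        raise ValueError("n_heads must be divisible by n_kv_heads")

    head_dim = d_model // n_heads
    q_proj = d_model * n_heads * head_dim           # one query head per attention head
    kv_proj = 2 * d_model * n_kv_heads * head_dim   # K and V share n_kv_heads heads
    o_proj = n_heads * head_dim * d_model           # output projection
    return q_proj + kv_proj + o_proj


# Hypothetical sizes, chosen only for illustration.
mha = attention_params(d_model=4096, n_heads=32, n_kv_heads=32)
gqa = attention_params(d_model=4096, n_heads=32, n_kv_heads=8)
print(mha, gqa, gqa / mha)
```

Under these assumed sizes the GQA block has fewer attention weights than the MHA block, so two otherwise identical models do not have the same parameter count unless the difference is compensated elsewhere.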

My questions:

Is there a relationship between model architecture and parameter scale? That is, is a given architecture only appropriate once the model reaches a certain parameter scale, and counterproductive below it?
With the same pretraining corpus and data distribution, does a faster loss decrease not necessarily mean better final generation quality?
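
One way to make the second question concrete is to compare the loss on training data with the loss on held-out text, since a low training loss alone does not show how the model behaves on text it has not seen. The sketch below is an assumption about how such a check could look, not something done in these runs; the helper names and the log-probability values are hypothetical.

```python
import math


def mean_nll(token_logprobs):
    """Average negative log-likelihood per token (the usual LM loss)."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return -sum(token_logprobs) / len(token_logprobs)


def perplexity(token_logprobs):
    """Perplexity is the exponential of the mean per-token loss."""
    return math.exp(mean_nll(token_logprobs))


def generalization_gap(train_logprobs, heldout_logprobs):
    """Held-out loss minus training loss; a large gap suggests overfitting."""
    return mean_nll(heldout_logprobs) - mean_nll(train_logprobs)


# Made-up per-token log-probabilities, for illustration only.
train_lp = [-0.1, -0.2, -0.1, -0.2]    # model is confident on training-style text
heldout_lp = [-1.0, -2.0, -1.5, -1.5]  # much less confident on unseen text
print(perplexity(train_lp), perplexity(heldout_lp))
print(generalization_gap(train_lp, heldout_lp))
```

If the gap for one run is much larger than for the other, the faster-falling training loss may partly reflect fitting to recurring patterns in the corpus rather than better modeling in general.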

Thanks



Replies: 0 comments
