Question: Falcon's attention weight is in fused_qkv form. Is it correct to shard it with col_nn.Linear1D_Col? With tp_size=2, would the split be wrong? Why not use FusedLinear1D_Col? #5222
laiqinghan asked this question in Community | Q&A
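I'd like to ask a question. Falcon's attention weight is in fused_qkv form. Is it correct to shard it with col_nn.Linear1D_Col? Suppose tp_size=2: will the weight be split incorrectly? Why isn't FusedLinear1D_Col used instead? This way of splitting shouldn't raise an error at runtime, but when fine-tuning a Hugging Face model, could it scramble the computation? If [QKV] is split directly with tp_size=2, could K end up split while Q and V are not? If so, the code below that splits q, k, v would still run, but would the computation be logically wrong? 🙏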
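To make the concern concrete, here is a minimal plain-Python sketch that tracks which output rows of a fused QKV weight each tensor-parallel rank receives. It assumes a contiguous [Q | K | V] layout with equal-sized blocks, which is the layout the question describes; it is not taken from Falcon's or ColossalAI's code. The helpers `naive_col_split`, `fused_col_split` and `split_qkv_local` are hypothetical stand-ins for "plain column-parallel split", "fused-aware split" and "per-rank q/k/v split", not the actual `Linear1D_Col` / `FusedLinear1D_Col` APIs. Whether a given Falcon checkpoint really uses this layout may depend on its config (Hugging Face's Falcon reshapes `fused_qkv` differently for the multi-query and new-decoder-architecture variants), so the model's own reshape code is worth checking before drawing conclusions.

```python
# Hypothetical illustration: each string labels one output row of a fused QKV
# weight. ASSUMPTION: contiguous [Q | K | V] layout with equal-sized blocks.


def naive_col_split(rows, tp_size, rank):
    """Contiguous split of the output dim, as a plain column-parallel linear does."""
    assert len(rows) % tp_size == 0, "output dim must divide evenly by tp_size"
    chunk = len(rows) // tp_size
    return rows[rank * chunk:(rank + 1) * chunk]


def fused_col_split(rows, split_sizes, tp_size, rank):
    """Separate Q, K, V first, shard each block, then re-concatenate per rank."""
    assert sum(split_sizes) == len(rows)
    shard = []
    start = 0
    for size in split_sizes:
        block = rows[start:start + size]
        shard.extend(naive_col_split(block, tp_size, rank))
        start += size
    return shard


def split_qkv_local(shard):
    """Per-rank q/k/v split that assumes the local shard is [q | k | v] in equal thirds."""
    assert len(shard) % 3 == 0
    n = len(shard) // 3
    return shard[:n], shard[n:2 * n], shard[2 * n:]


hidden = 4
rows = (
    [f"q{i}" for i in range(hidden)]
    + [f"k{i}" for i in range(hidden)]
    + [f"v{i}" for i in range(hidden)]
)
tp_size = 2

for rank in range(tp_size):
    naive = split_qkv_local(naive_col_split(rows, tp_size, rank))
    fused = split_qkv_local(fused_col_split(rows, [hidden] * 3, tp_size, rank))
    print(f"rank {rank} naive (q, k, v): {naive}")
    print(f"rank {rank} fused (q, k, v): {fused}")
```

Under this assumed layout, the naive split gives rank 0 all of Q plus half of K, and rank 1 the other half of K plus all of V, which is exactly the "K is split but Q and V are not" situation in the question. The local q/k/v split still succeeds because the shapes line up, but on rank 0 the tensor labeled `k` actually holds Q rows, so nothing errors while the attention math is wrong. The fused-aware split gives each rank matching halves of Q, K and V.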