Hi LiveAvatar team, thanks again for the great work! I am trying to reimplement LiveAvatar training and have a question about the text prompt details. Do you use the same data processing as Wan2.2 S2V, i.e., an MLLM to extract very detailed text prompts for the training data? And do you also tune the text embedding and text cross-attention layers during training and distillation?