@qiqiApink Thank you for sharing your impressive work!

In your paper, you mention that the input can be composed of an initial pose + text to generate subsequent motions. However, in the demo you provided, motion tokens are placed directly into the prompt.

I've tried to encode a single pose (in SMPL format) into tokens. As I understand it, I first need to convert the SMPL pose into the HumanML3D format to use as input to the VQ-VAE. However, when I feed the converted HumanML3D data into the VQ-VAE, I get the following error:

`RuntimeError: Calculated padded input size per channel: (3). Kernel size: (4). Kernel size can't be greater than actual input size`

I know this is caused by the shape of the input data. A single pose yields only a single row of HumanML3D features, i.e. shape (1, 263), but your VQ-VAE requires an input of at least 4 frames, i.e. (4, 263). My question is: how do I obtain HumanML3D data of shape (4, 263) from just a single pose to use as input to the VQ-VAE? Or, could you tell me how to correctly obtain motion tokens for a single pose?
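For context, one common workaround (not confirmed by the authors) is to tile the single frame along the time axis so the temporal convolution sees at least its kernel size of 4 frames, treating the initial pose as a static 4-frame motion. A minimal sketch with NumPy, with the shapes taken from the error above:

```python
import numpy as np

# Hypothetical single-pose input: one HumanML3D feature frame of 263 dims.
pose = np.zeros((1, 263), dtype=np.float32)

# The VQ-VAE's temporal conv uses kernel size 4, so repeat the static pose
# along the time axis until the input reaches the minimum length (4, 263).
motion = np.repeat(pose, 4, axis=0)

print(motion.shape)  # (4, 263)
```

Whether a repeated static pose quantizes to a meaningful token (rather than being interpreted as "standing still") depends on how the VQ-VAE was trained, so this is only a way to make the shapes valid, not a guaranteed fix.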