How to encode a single pose #20

Open
NEO946B opened this issue Feb 27, 2024 · 0 comments

NEO946B commented Feb 27, 2024

@qiqiApink Thank you for sharing your impressive work!
In your paper, it is mentioned that the input can be an initial pose + text, which is used to generate the subsequent motion. However, in the demo you provided, motion tokens are placed directly into the prompt.
I've tried to encode a single pose (in SMPL format) into tokens. As I understand it, I first need to convert the SMPL data into the HumanML3D format to use it as input to the VQ-VAE. However, when I feed the converted HumanML3D data into the VQ-VAE, I get the following error: "RuntimeError: Calculated padded input size per channel: (3). Kernel size: (4). Kernel size can't be greater than actual input size"

I know this is caused by the shape of the input data. A single pose yields only a single row of HumanML3D features, i.e. shape (1, 263), but your VQ-VAE needs an input of at least 4 frames, i.e. (4, 263). My question is: how do I obtain HumanML3D data with a shape of (4, 263) from just a single pose to use as input to the VQ-VAE? Or, could you tell me how to correctly obtain motion tokens for a single pose?
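
For reference, here is a minimal sketch of the kind of workaround I'm considering: repeating the single pose along the time axis so the clip reaches the minimum length required by the temporal convolution. The `vqvae.encode(...)` call and the expected `(batch, frames, 263)` layout are assumptions about the API, not something confirmed from the repo:

```python
import torch

# Single HumanML3D frame converted from the SMPL pose, shape (1, 263).
pose = torch.randn(1, 263)

# The encoder's Conv1d uses kernel_size=4, so a clip needs >= 4 frames.
# Tiling the pose produces a static 4-frame "motion" of shape (4, 263).
clip = pose.repeat(4, 1)

# Add a batch dimension -> (1, 4, 263); adjust if the model expects
# channels-first input instead.
clip = clip.unsqueeze(0)

# Assumed call: an encode() method that returns motion token indices.
# tokens = vqvae.encode(clip)
```

This at least avoids the RuntimeError, but I'm not sure whether tokens from a repeated static pose are what the model expects as an "initial pose" prompt, hence my question.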
