How to encode a single pose #20

Open
NEO946B opened this issue Feb 27, 2024 · 0 comments

NEO946B commented Feb 27, 2024

@qiqiApink Thank you for sharing your impressive work!
In your paper, it is mentioned that the input can be an initial pose + text, which is used to generate the subsequent motion. However, in the demo you provided, motion tokens are placed directly into the prompt.
I've tried to encode a single pose (in SMPL format) into tokens. As I understand it, I first need to convert the SMPL data into the HumanML3D format to use it as input to the VQ-VAE. However, when I feed the converted HumanML3D data into the VQ-VAE, I get the following error: "RuntimeError: Calculated padded input size per channel: (3). Kernel size: (4). Kernel size can't be greater than actual input size"

I know this is caused by the shape of the input data. A single pose yields only a single row of HumanML3D features, i.e. shape (1, 263), but your VQ-VAE needs an input of at least 4 frames, i.e. (4, 263). My question is: how do I obtain HumanML3D data with a shape of (4, 263) from just a single pose to use as input to the VQ-VAE? Or, could you tell me how to correctly obtain motion tokens for a single pose?
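
For reference, here is a minimal sketch of the kind of workaround I'm considering: repeating the single pose along the time axis so the clip reaches the minimum length required by the temporal convolution. The `vqvae.encode(...)` call and the expected `(batch, frames, 263)` layout are assumptions about the API, not something confirmed from the repo:

```python
import torch

# Single HumanML3D frame converted from the SMPL pose, shape (1, 263).
pose = torch.randn(1, 263)

# The encoder's Conv1d uses kernel_size=4, so a clip needs >= 4 frames.
# Tiling the pose produces a static 4-frame "motion" of shape (4, 263).
clip = pose.repeat(4, 1)

# Add a batch dimension -> (1, 4, 263); adjust if the model expects
# channels-first input instead.
clip = clip.unsqueeze(0)

# Assumed call: an encode() method that returns motion token indices.
# tokens = vqvae.encode(clip)
```

This at least avoids the RuntimeError, but I'm not sure whether tokens from a repeated static pose are what the model expects as an "initial pose" prompt, hence my question.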
