
Conversion Script from SPIN code base to Metadata parameters in HumanNeRF #65

Open
avinabsaha opened this issue Mar 31, 2023 · 26 comments

Comments

@avinabsaha

@chungyiweng, could you please provide the script used to obtain the metadata parameters for the monocular in-the-wild videos after running SPIN?

@Dipankar1997161

I don't think there is a conversion script in this repo.
Take the NumPy file you get from SPIN/VIBE and set it up as described in the README.
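For reference, the per-frame structure described in the HumanNeRF README for wild videos looks roughly like the sketch below (written from memory; verify the exact key names against the repo before relying on them):

# Rough sketch of HumanNeRF's metadata.json layout, one entry per frame.
# Key names should be double-checked against the HumanNeRF README.
metadata = {
    "frame_000000": {
        "poses": [...],           # 72 SMPL pose parameters
        "betas": [...],           # 10 SMPL shape parameters
        "cam_intrinsics": [...],  # 3x3 intrinsic matrix
        "cam_extrinsics": [...],  # 4x4 extrinsic matrix
    },
}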

@avinabsaha

I ran the demo code provided by SPIN: https://github.com/nkolot/SPIN/blob/master/demo.py

Can you tell me how to get the body pose and shape parameters, and the camera intrinsic and extrinsic parameters, from this code?

Thanks for the help!

@Dipankar1997161

Dipankar1997161 commented Apr 2, 2023

I did not use SPIN; I used VIBE and ROMP instead. From ROMP you get the poses and shapes in a NumPy file.
The camera parameters depend on whether you are using your own custom file.

For example, in my case I am using my own data, so I have my own camera params. But from ROMP you can still get the cams and cam_trans; set them up according to the file format given in HumanNeRF and you are good to go.

Note that these camera params won't be a 4x4 matrix; ROMP gives you a 1x3 vector, so adapt it to your use case.
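As a rough sketch of that setup, assuming the output keys follow simple-romp's naming (smpl_thetas, smpl_betas, cam, cam_trans; inspect your own .npz to confirm):

import numpy as np

# Load one ROMP result file; the key names here are assumptions based
# on simple-romp's output and may differ in your version.
results = np.load('frame_0001.npz', allow_pickle=True)['results'][()]

poses = results['smpl_thetas'][0]    # (72,) SMPL pose parameters
betas = results['smpl_betas'][0]     # (10,) SMPL shape parameters
cam = results['cam'][0]              # (3,) weak-perspective camera (s, tx, ty)
cam_trans = results['cam_trans'][0]  # (3,) estimated camera translation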

@gushengbo

Hello, did you use your own camera params and VIBE?

@Dipankar1997161

No, you cannot use your own camera params.
From VIBE and ROMP you get a weak-perspective camera model (s, tx, ty); convert that to the extrinsic translation (tx, ty, tz), while the rotation part remains np.eye(3). Then use the intrinsic parameters given by VIBE/ROMP, with cx and cy as your image center and the focal length from their config file.
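As a minimal sketch of assembling those intrinsics and the identity rotation (the focal length and image size below are placeholders; take the real values from the ROMP/VIBE config and your frames):

import numpy as np

# Placeholder values; use the focal length from the ROMP/VIBE config
# and the width/height of your actual frames.
focal_length = 5000.0
W, H = 512, 512

# Pinhole intrinsics with the principal point at the image center.
K = np.array([
    [focal_length, 0.0,          W / 2.0],
    [0.0,          focal_length, H / 2.0],
    [0.0,          0.0,          1.0],
])

# The rotation part of the extrinsics stays identity, as noted above.
R = np.eye(3)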

@gushengbo

Thank you for your reply, but I have another question: how do I convert (s, tx, ty) to (tx, ty, tz)?

@Dipankar1997161

Apologies for the late response.

Here is the code for the conversion (change the values to match your setup):

import numpy as np

def convert_weak_perspective_to_perspective(
    weak_perspective_camera,
    focal_length=5000.,
    img_res=224,
):
    # Convert weak-perspective camera [s, tx, ty] to camera translation
    # [tx, ty, tz] in 3D, given the bounding-box size. This translation
    # can be used in a full-perspective projection.
    perspective_camera = np.stack(
        [
            weak_perspective_camera[1],
            weak_perspective_camera[2],
            2 * focal_length / (img_res * weak_perspective_camera[0] + 1e-9),
        ],
        axis=-1,
    )
    return perspective_camera

Then the extrinsic parameters will be:

E = [[1, 0, 0, tx],
[0, 1, 0, ty],
[0, 0, 1, tz],
[0, 0, 0, 1]]
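
For example, a quick usage sketch with a made-up camera prediction, assembling the 4x4 extrinsic in NumPy:

import numpy as np

# Made-up weak-perspective prediction (s, tx, ty) for illustration only.
weak_cam = np.array([0.9, 0.05, 0.1])

t = convert_weak_perspective_to_perspective(
    weak_cam, focal_length=5000., img_res=224)

# 4x4 extrinsic: identity rotation, translation from the conversion.
E = np.eye(4)
E[:3, 3] = t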

@gushengbo

gushengbo commented Jul 9, 2023


Thank you very much! But what is img_res? There is no img_res in ROMP. Is it the image size? My image size is (1920, 1080), which has two values.

@Dipankar1997161

Yes, it's the image resolution.

@gushengbo

Which value should I choose, 1920 or 1080?

@Dipankar1997161

It can be whichever resolution you want:
512, 1024, and so on.

Try that; otherwise, calculate the resolution from the image pixels.
I would try the first method, since HumanNeRF mostly uses 512 or 1080 as the image resolution.
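To see how this choice feeds into the conversion above (tz = 2 * focal_length / (img_res * s)), here is a quick check with a made-up scale s = 0.9 and the default focal length of 5000:

focal_length = 5000.0  # default from the conversion function above
s = 0.9                # made-up weak-perspective scale for illustration

for img_res in (512, 1024, 1080):
    tz = 2 * focal_length / (img_res * s)
    print(img_res, round(tz, 2))  # 512 -> 21.7, 1024 -> 10.85, 1080 -> 10.29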

@gushengbo

I have resized the images to 1080x1080 and set img_res to 1080. The results are as follows:
[image: rendering result]
Does this look like an error in the SMPL predicted by the ROMP model?

@Dipankar1997161

After how many iterations is this? Could you tell me?

@gushengbo

20,000, but I think the result is wrong.

@Dipankar1997161

20,000 iterations is far too few for rendering with HumanNeRF. You need at least 60K or more to get accurate results.

Also, how many images do you have? And can you show me the SMPL mesh generated by ROMP for any frame?

@gushengbo

OK, thank you very much! There are 50 images, but they are all in the same pose.
[image: SMPL mesh generated by ROMP]

The hand is wrong.

@Dipankar1997161

There are a few errors here:

  1. The image size is larger than that of the SMPL mesh, so the arm and face accuracy after rendering may be incorrect. Try reducing the image size and running ROMP again to see if there is any improvement (see the sketch after this list).

  2. The pose doesn't matter; what matters is that most of the human body is visible, if not all of it. This ensures proper rendering; otherwise the backside may be incorrect.

  3. Try to increase the number of images to more than 100 if possible.
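
A minimal OpenCV resize sketch for point 1 (the paths and target size are placeholders, not from this thread):

import cv2

# Hypothetical paths; adjust to your own frame layout and target size.
img = cv2.imread('frames/frame_0001.png')
img = cv2.resize(img, (512, 512), interpolation=cv2.INTER_AREA)
cv2.imwrite('frames_resized/frame_0001.png', img)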

@gushengbo

OK, I will try it. Thank you very much.

@gushengbo

I resized the images to (512, 512), but I find that the hand is still wrong.
[images: rendering results]

@Dipankar1997161

Do you have a video of your data? Try running ROMP directly on the video once to see if there is any difference. Don't run it on individual images; run it on the video.

If that doesn't work, I will tell you about more methods.

@gushengbo

gushengbo commented Jul 11, 2023

Sorry, I don't have a video. But I successfully rendered the images after 145,000 iterations. I think it benefited from the pose correction function in HumanNeRF. Thank you very much! Best wishes to you!

@gacu068

gacu068 commented Jul 11, 2023


@Dipankar1997161 Hello, how do I get the intrinsic parameters in VIBE? Did you mean the config.yaml in the config folder?

@Dipankar1997161

I was talking about ROMP; for VIBE I need to check.

@three-legs

@Dipankar1997161 Hello, how can I get the focal_length, or is it just an arbitrary value I set myself?

@Dipankar1997161

It is not a random value; it depends on which method you are using. For ROMP and VIBE, for example, the focal length is predefined in their config files, so you can check there. If you are using your own device, there is a FOV-based formula with which you can calculate the focal length (ROMP also has that formula in its config).
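
For reference, the pinhole relation is focal_length = (img_res / 2) / tan(FOV / 2). With ROMP's defaults (as far as I remember, FOV = 60 degrees and a 512-pixel input) that works out to roughly 443:

import math

fov_deg = 60.0  # field of view in degrees (ROMP's default, to my knowledge)
img_res = 512   # input resolution in pixels

focal_length = (img_res / 2.0) / math.tan(math.radians(fov_deg) / 2.0)
print(round(focal_length, 1))  # ~443.4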

@three-legs

Thank you for your reply! But I still have a problem: where is the config file? Did you mean the files in the configs folder? I didn't find any parameters for focal_length or the intrinsics there.
