Clarification on Egocentric Image Dataset and Training Script #12

Open
juanitapuentes opened this issue Feb 12, 2025 · 1 comment


juanitapuentes commented Feb 12, 2025

Hi :)

I appreciate the work done in this repository and the provided training scripts. I have successfully trained the motion prior using SLAM poses (script 1_train_motion_prior.py). However, I am trying to understand the branch that incorporates egocentric images and HaMeR for guiding the diffusion. I have a few questions regarding this setup:

Dataset Source: I couldn't find a reference to the dataset used for egocentric images. Could you clarify where the dataset is defined or loaded in the code?

Training Script: The only training script I found is for training the motion prior with SLAM poses. Is there a separate script that handles the full training pipeline using egocentric images and HaMeR?

Any guidance on this would be greatly appreciated. Thank you for your time!

Owner

brentyi commented Feb 13, 2025

Glad you got training working!!

Images are only used at test time, so we don't have a dataset or script for training. The diffusion model in EgoAllo is only conditioned on SLAM poses.

If you want to trace how the images are used in the code:

  1. We run HaMeR on images from test sequences. For Aria data this is done with 2_run_hamer_on_vrs.py.
  2. The inference script (3_aria_inference.py) loads hand information in the form of HaMeR outputs + optionally Aria MPS wrist detections.
    • The loading code is here.
  3. The hand inputs are passed to the guidance optimizer. You can find this in the sampling function here.
  4. We minimize a mix of prior, 3D, and reprojection losses in do_guidance_optimization().
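To make step 4 concrete, here is a minimal sketch of what "a mix of prior, 3D, and reprojection losses" could look like as a single weighted cost. This is not the code from do_guidance_optimization(): every name here (`guidance_cost`, `project_pinhole`, `mean_sq_dist`), the pinhole camera model, the intrinsics, and the weights are assumptions chosen for illustration only.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def project_pinhole(p: Point, focal: float, center: Tuple[float, float]) -> Point:
    """Project a 3D camera-frame point to pixels with an ideal pinhole model (assumed)."""
    x, y, z = p
    return (focal * x / z + center[0], focal * y / z + center[1])


def mean_sq_dist(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between paired points."""
    total = 0.0
    for pa, pb in zip(a, b):
        total += sum((u - v) ** 2 for u, v in zip(pa, pb))
    return total / len(a)


def guidance_cost(
    joints: Sequence[Point],        # current estimate being optimized
    prior_joints: Sequence[Point],  # what the diffusion sample predicted
    observed_3d: Sequence[Point],   # e.g. 3D hand keypoints from a detector
    observed_2d: Sequence[Point],   # e.g. 2D hand keypoints in the image
    focal: float = 500.0,           # hypothetical intrinsics
    center: Tuple[float, float] = (320.0, 240.0),
    w_prior: float = 1.0,           # hypothetical weights
    w_3d: float = 1.0,
    w_reproj: float = 0.01,
) -> float:
    """Weighted sum of a prior term, a 3D term, and a reprojection term."""
    prior_term = mean_sq_dist(joints, prior_joints)
    term_3d = mean_sq_dist(joints, observed_3d)
    projected = [project_pinhole(p, focal, center) for p in joints]
    reproj_term = mean_sq_dist(projected, observed_2d)
    return w_prior * prior_term + w_3d * term_3d + w_reproj * reproj_term


# A single joint that agrees with every signal has zero cost.
joints = [(0.0, 0.0, 1.0)]
cost_at_agreement = guidance_cost(joints, joints, joints, [(320.0, 240.0)])
```

The prior term keeps the optimized result close to what the diffusion model sampled, while the 3D and reprojection terms pull it toward the image-derived hand observations. In a real optimizer this scalar would be minimized over the pose parameters rather than evaluated once.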
