
question about Model D training #11

Closed
jetyingjia opened this issue Jan 29, 2024 · 5 comments

@jetyingjia

Awesome work, congratulations!
I have some questions about the Model D training.
1. In this model, you pre-train with [Mask, Concept] pairs. Does "concept" mean the text embeddings (2,560 categories)? If so, how are concepts obtained for the 1B masks?
2. The paper mentions 2.25TB of image embeddings. How is this data used?

@PhyscalX
Collaborator

PhyscalX commented Jan 29, 2024

Hi, @jetyingjia

  1. Each mask has a pre-computed image embedding, which is used to encode the (log) target via encode_tgt(...); see the sketch after this list.
  2. The 2.25TB image embedding database contains 1B embeddings for the 1B masks, used as described in 1.
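
Here is a minimal sketch of that target-encoding step, assuming the target is a temperature-scaled softmax of similarities between a mask's image embedding and the 2,560 concept (text) embeddings. The names below (encode_concept_target, text_embeds, tau) are illustrative, not the repo's actual encode_tgt(...) signature:

```python
import torch
import torch.nn.functional as F

def encode_concept_target(mask_embeds, text_embeds, tau=0.01):
    """Project pre-computed image embeddings onto the concept vocabulary
    and return soft target distributions over the 2,560 concepts.

    mask_embeds: (B, D) embeddings looked up from the offline database.
    text_embeds: (2560, D) concept (text) embeddings.
    tau: softmax temperature; 0.01 is an assumed value, not from the paper.
    """
    mask_embeds = F.normalize(mask_embeds, dim=-1)
    text_embeds = F.normalize(text_embeds, dim=-1)
    logits = mask_embeds @ text_embeds.t() / tau  # (B, 2560) similarities
    return F.softmax(logits, dim=-1)              # soft targets for a KL loss
```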

BTW, it would take roughly 60 days to compute 1B EVA-CLIP-E embeddings using 8 NVIDIA A100s 😅.
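
For scale, that estimate works out to roughly 24 embeddings per GPU-second:

```python
# Back-of-envelope check of the 60-day figure (numbers from the comment above).
num_embeddings = 1_000_000_000       # one EVA-CLIP-E embedding per mask
gpu_seconds = 8 * 60 * 86_400        # 8 A100s for 60 days
print(num_embeddings / gpu_seconds)  # ~24 embeddings per GPU-second
```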

@jetyingjia
Author

Hi, @PhyscalX
1. So Model D's classification branch target is the concept distribution (the image embedding projected to 2,560-dimensional logits), not region pseudo labels (which many papers use, e.g. OWL)?
2. Is the idea of learning a concept distribution used or recommended by other papers?
Thank you!

@PhyscalX
Collaborator

  1. We have clarified in Sec. 3.1 that we use a KL divergence loss; a minimal sketch follows this list.
  2. This method is used by many CLIP-based distillation papers (e.g. RegionCLIP, a modified Faster R-CNN
    for open-vocabulary classification). However, it is challenging to integrate this method into SAM with 1B masks.
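
A minimal sketch of such a KL objective in standard PyTorch, assuming soft targets like the ones sketched earlier in the thread (illustrative, not the repo's released loss code):

```python
import torch
import torch.nn.functional as F

def concept_kl_loss(pred_logits, target_dist):
    """KL divergence between the classification branch's predicted concept
    distribution and the pre-computed CLIP-derived target distribution.

    pred_logits: (B, 2560) raw logits from the model's concept head.
    target_dist: (B, 2560) soft targets (probabilities summing to 1).
    """
    log_pred = F.log_softmax(pred_logits, dim=-1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_pred, target_dist, reduction="batchmean")
```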

@jetyingjia
Author

@PhyscalX
Good idea. Do you have a plan to release the full project (including training)? I would like to fine-tune this model on my own datasets.

@PhyscalX
Collaborator

Refer to issue #5; currently, we have no plan to release the full code.
Instead, we have released the visual prompter and the losses for pre-training and fine-tuning.
