Discuss anything here #6
Is there a web demo? Something like a Gradio one? |
@ucas010 Yes, we are working with huggingface to develop a gradio demo, it will be ready in the next few days. |
very good work |
Huge respect to the authors! |
What are the hardware requirements for people who want to run this locally? I did not see that listed anywhere. |
We don't have any specific requirements. If you can run SDXL locally (24GB VRAM), everything should be fine. |
Does it work with SD1.5? |
Hello, I'm really excited to try this out. Would you mind supplying the necessary diffusers, CUDA, and PyTorch requirements? I'm not sure why it is failing to load the pipeline without throwing an error. It just says pipe is 'None': File "/home/ubuntu/instantID.py", line 37, in Thanks in advance! |
Let me answer both questions in this post. @JD1234JD1234 No, we only provide checkpoints for SDXL. @caldwecg It seems to be a problem with your pipeline. I believe diffusers==0.25.0 will work. |
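For reference, a minimal loading sketch, assuming the repository's `pipeline_stable_diffusion_xl_instantid.py` is on the Python path and the checkpoints have been downloaded; the local paths and base-model id below are placeholders, not the only valid choice:

```python
import torch
from diffusers.models import ControlNetModel
# shipped with this repository
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline

# IdentityNet (the InstantID ControlNet) and the face adapter weights
controlnet = ControlNetModel.from_pretrained(
    "./checkpoints/ControlNetModel", torch_dtype=torch.float16
)

pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "wangqixun/YamerMIX_v8",          # SDXL base model used in the demo
    controlnet=controlnet,
    torch_dtype=torch.float16,
)
pipe.to("cuda")
pipe.load_ip_adapter_instantid("./checkpoints/ip-adapter.bin")

# from_pretrained raises on failure instead of returning None, so a `None`
# pipe usually means the construction code above never actually ran.
```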
@SlZeroth We use this realistic base model |
Does this only work with faces? How could we adapt it to other things like objects or animals? |
Only human faces. |
When there are multiple faces, it seems that only one person can be recognized. Can this be fixed? |
At the moment, we only detect the biggest face in the given image. Multi-person, multi-style support will be added later. @kn12 |
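For anyone wiring this up themselves, a short sketch of that "biggest face" selection with insightface (the model name and paths are assumptions; `bbox`, `kps`, and `embedding` are standard insightface face attributes):

```python
import cv2
from insightface.app import FaceAnalysis

# assumes the 'antelopev2' model pack has been downloaded under ./models
app = FaceAnalysis(name="antelopev2", root="./",
                   providers=["CUDAExecutionProvider", "CPUExecutionProvider"])
app.prepare(ctx_id=0, det_size=(640, 640))

image = cv2.imread("group_photo.jpg")
faces = app.get(image)

# keep only the face with the largest bounding-box area
def bbox_area(face):
    x1, y1, x2, y2 = face.bbox
    return (x2 - x1) * (y2 - y1)

face = max(faces, key=bbox_area)
id_embedding = face.embedding   # identity condition fed to the adapter / IdentityNet
keypoints = face.kps            # 5 facial keypoints used to draw the pose image
```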
@haofanwang will there be an SD 1.5 version? "our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin" - https://arxiv.org/abs/2401.07519 |
Would it be possible for us to train an alternative adapter that would work on other inputs? |
When I create images where the face covers most of the frame, the quality and likeness are VERY good, but once I go to full-body portraits, like someone riding a horse, the quality drops noticeably (the kind of situation where one would normally click "restore face" in A1111). Does anyone have a good tip on how to do an "ADetailer"-ish fix, basically focus on and upscale the face, swap it, and then downscale it back again? Not even sure that is possible. I am not yet fluent enough in SD Python coding. |
The WeChat group QR code is out of date. |
@arjun810 Sure. It definitely deserves a try. |
@zewolf5 Is the face too small? |
@I8Robot Updated. |
Hey guys, I'm trying to port InstantID natively to ComfyUI. I worked on the IPAdapter extension and the code looks very similar. If I understand correctly, it's like a FaceID model with an additional controlnet. Contrary to IPAdapter, you don't use zeroed uncond embeds, is that right? I was a little surprised to see that. Also, the controlnet just seems to take the keypoints from insightface, right? The controlnet doesn't seem to work in ComfyUI and I'm not sure why yet. I'm surprised it's so effective with just the keypoints, btw. |
I do not think the face is too small. If I generate a normal non-InstantID image, the face looks OK. I am using SDXL base models other than YamerMIX_v8; I tried a few and they mostly behave the same. Below I am using RealitiesEdgeXL_20, with 5 random Swift pictures as the face and a normally generated image as the pose reference. There tend to be some bad areas between the nose and mouth, and then often the eyes get pushed up, as if the face had been rendered at a lower resolution and upscaled. Just my subjective opinion. So what I see is that the quality of the face reaches its "point of degradation" earlier, once the face gets smaller than in a normal rendering, almost like SD1.5. Other than the smaller faces, the results are fantastic. Wishlist: an ADetailer-style solution for all the faces in the image. |
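In the spirit of the ADetailer-style fix asked about above, a rough sketch of the crop, upscale, refine, paste-back bookkeeping. `refine_fn` is a hypothetical placeholder for a low-denoise img2img/inpaint call with the same InstantID conditioning, and a feathered blend at the seam would be an obvious improvement:

```python
import cv2

def refine_face(image_bgr, face_bbox, refine_fn, pad=0.3, work_size=1024):
    """Crop the face region with padding, upscale it, run a refinement pass,
    then paste the result back into the original image."""
    h, w = image_bgr.shape[:2]
    x1, y1, x2, y2 = [int(v) for v in face_bbox]

    # pad the box so hair/jawline context is included
    bw, bh = x2 - x1, y2 - y1
    x1 = max(0, int(x1 - pad * bw)); y1 = max(0, int(y1 - pad * bh))
    x2 = min(w, int(x2 + pad * bw)); y2 = min(h, int(y2 + pad * bh))

    crop = image_bgr[y1:y2, x1:x2]
    scale = work_size / max(x2 - x1, y2 - y1)
    crop_up = cv2.resize(crop, None, fx=scale, fy=scale,
                         interpolation=cv2.INTER_LANCZOS4)

    refined = refine_fn(crop_up)  # hypothetical low-denoise img2img pass

    out = image_bgr.copy()
    out[y1:y2, x1:x2] = cv2.resize(refined, (x2 - x1, y2 - y1),
                                   interpolation=cv2.INTER_AREA)
    return out
```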
Maybe you can refer to https://github.com/xiaohu2015/IP-Adapter/blob/instantid/instantid_demo.ipynb. The controlnet uses the ID embedding as the condition instead of the text embeds. |
Hey, nice to see you here @xiaohu2015 😄 Yeah, that is what I'm doing; the IPAdapter part is done but the controlnet doesn't seem to react well. I'm sending the KPS to the controlnet but I get something like this, so I'm trying to understand where the problem is. |
I was able to get good results from the following configuration: https://huggingface.co/spaces/InstantX/InstantID So I replicated it with the same settings, as seen in the attached text file. The only thing I changed was resizing the image to 512x512, because I keep running out of CUDA memory when trying a bigger image. Is there anything else that would be causing it to look so poor? Thanks |
For 512x512 generation, you can try the SDXL-Turbo model, which performs much better than SDXL at this size. @caldwecg |
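If memory rather than resolution is the real constraint, the standard diffusers memory-saving switches may be worth trying before dropping to 512x512. This continues from a loaded `pipe`; the generation arguments mirror the repo's demo, and `face_emb`/`face_kps` are assumed to come from insightface as in the earlier sketch:

```python
# trade speed for lower peak VRAM
pipe.enable_model_cpu_offload()   # keeps only the active sub-module on the GPU
pipe.enable_vae_tiling()          # decodes latents in tiles to cap VAE memory

prompt = "analog film photo of a person, masterpiece"  # placeholder prompt

image = pipe(
    prompt,
    image_embeds=face_emb,                 # identity embedding from insightface
    image=face_kps,                        # keypoint image for IdentityNet
    controlnet_conditioning_scale=0.8,
).images[0]
```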
How do I release VRAM after generating an image? |
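A short sketch of the usual way to free VRAM once generation is done (plain PyTorch, nothing InstantID-specific):

```python
import gc
import torch

# save/keep whatever outputs you need first, then drop the references
del pipe
gc.collect()                # let Python free the pipeline objects
torch.cuda.empty_cache()    # return cached CUDA blocks to the driver
torch.cuda.ipc_collect()    # also clear inter-process cached memory, if any

print(f"{torch.cuda.memory_allocated() / 1024**3:.2f} GiB still allocated")
```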
InstantID finally supported natively in ComfyUI (instead of sandboxed with diffusers). Have fun! |
The QR code of the WeChat group has expired. |
@haofangwang QR code expired. Is there any other channel for discussion? |
Do you have any guidelines for using this base model? I have made unsuccessful attempts based on the GitHub code. Thank you! |
The QR code has expired, could you share it again? I'd like to join the group. |
A significant number of generated portraits feature a rather long and unnatural neck; is there a workaround to alleviate this issue? |
You can add another pose ControlNet. |
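A sketch of how a second, openpose-style ControlNet could be stacked alongside IdentityNet, following the usual diffusers multi-ControlNet convention of passing lists; whether the InstantID pipeline accepts it in exactly this form depends on the repo version, and the model ids, paths, and the `openpose_image`/`face_emb`/`face_kps` inputs are assumptions:

```python
import torch
from diffusers.models import ControlNetModel
from pipeline_stable_diffusion_xl_instantid import StableDiffusionXLInstantIDPipeline

identitynet = ControlNetModel.from_pretrained(
    "./checkpoints/ControlNetModel", torch_dtype=torch.float16)
pose_cn = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)

pipe = StableDiffusionXLInstantIDPipeline.from_pretrained(
    "wangqixun/YamerMIX_v8",
    controlnet=[identitynet, pose_cn],      # one entry per ControlNet
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_ip_adapter_instantid("./checkpoints/ip-adapter.bin")

image = pipe(
    prompt,
    image_embeds=face_emb,                        # identity embedding from insightface
    image=[face_kps, openpose_image],             # one conditioning image per ControlNet
    controlnet_conditioning_scale=[0.8, 0.8],     # one scale per ControlNet
).images[0]
```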
@wuliebucha @Lotayou @k15201363625 Updated. |
Can you update it again? Didn't notice. |
Thanks @haofanwang! It works (and adds a bit of latency, obviously). Another question: do you think the model is capable of processing group pictures, i.e. with more than one individual? Is it something you tried? Extracting facial keypoints for all individuals is not an issue, but I have trouble figuring out how this would work on the face embedding side / how the model would correctly assign the right face embedding to the right face in the picture. |
The QR code has expired, could you kindly update it again? Many thanks @haofanwang |
Could you provide some information on how to retrain this model? |
@haofanwang Could you update the WeChat group QR code? Thank you very much! |
Please let me join. |
The QR code has expired. |
Please add me to the group. Thank you. |
Please update the QR code, thanks. |
Updated. |
Could you update the QR code again? |
Sure. Updated. @kisstea |
Why is torch > 2.1 necessary? Due to my CUDA version I can only use torch 2.0.1 - is this a problem? |
@pranurs 2.0.1 should be fine. |
@haofanwang Could you update the QR code? I've been wanting to join the group but missed it. Thanks! |
@wuliebucha Sure, updated. |
Hi, do you know when there will be a release that doesn't use insightface? I have read that you are working on it but haven't seen any updates. |
Could you kindly update the QR code again? I haven't joined the group yet either, thank you! @haofanwang |
Thanks for all the interest in our project. To make things clearer, we illustrate the differences from previous works as follows.
(1) Compared to DreamBooth, Textual Inversion, LoRA, etc., we are tuning-free during the inference phase, which means we do not need to collect multiple images of a specific person and fine-tune on them. We consider the recent work PhotoMaker to be a type of LoRA, as it trains the UNet (in a PEFT manner) and requires building a human-centered text-image dataset. Surprisingly, our results are comparable to, or even better than, those of the fine-tuned approaches.
(2) Our work is most similar to IP-Adapter. We follow its decoupled cross-attention design and remain just as pluggable and compatible with other models in the community, but we additionally introduce IdentityNet (a variant of ControlNet) to obtain better ID-retention capability.
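To make the decoupled cross-attention point concrete, a conceptual single-head sketch (names and shapes are illustrative, not the project's actual module): text tokens and ID tokens get separate key/value projections, and the two attention outputs are summed with a tunable ID weight.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoupledCrossAttention(nn.Module):
    def __init__(self, dim, id_scale=1.0):
        super().__init__()
        self.to_q = nn.Linear(dim, dim, bias=False)
        self.to_k_text = nn.Linear(dim, dim, bias=False)
        self.to_v_text = nn.Linear(dim, dim, bias=False)
        self.to_k_id = nn.Linear(dim, dim, bias=False)  # extra, trainable K for ID tokens
        self.to_v_id = nn.Linear(dim, dim, bias=False)  # extra, trainable V for ID tokens
        self.id_scale = id_scale

    def forward(self, hidden_states, text_embeds, id_embeds):
        q = self.to_q(hidden_states)
        # attention over the text tokens (the original, frozen path)
        text_out = F.scaled_dot_product_attention(
            q, self.to_k_text(text_embeds), self.to_v_text(text_embeds))
        # attention over the ID tokens (the newly trained path)
        id_out = F.scaled_dot_product_attention(
            q, self.to_k_id(id_embeds), self.to_v_id(id_embeds))
        return text_out + self.id_scale * id_out
```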
We are open to discussing anything here; you can post your findings and share them with us. We have also created a WeChat group to facilitate discussion.