Discuss anything here #6

Open
haofanwang opened this issue Jan 17, 2024 · 60 comments

Comments

@haofanwang
Member

haofanwang commented Jan 17, 2024

Thanks for all the interest in our project. To make things clearer, we outline the differences from previous work below.

(1) Compared to DreamBooth, Textual Inversion, LoRA, etc., we are tuning-free at inference time, which means we do not need to collect multiple images of a specific person and fine-tune on them. We consider the recent work PhotoMaker to be a type of LoRA, as it trains the UNet, albeit in a PEFT manner, and requires building a human-centered text-image dataset. Surprisingly, our results are comparable to or even better than those of the fine-tuned approaches.

(2) Our work is most similar to IP-Adapter. We follow its decoupled cross-attention design and are just as pluggable and compatible with other models in the community. In addition, we introduce IdentityNet (a variant of ControlNet) to obtain better ID retention.

We are open to discussing anything here; feel free to post your findings and share them with us. We have also created a WeChat group to facilitate discussion.

[Image: WeChat group QR code]

@ucas010

ucas010 commented Jan 18, 2024

Is there a web demo? Something like a Gradio one?

@haofanwang
Member Author

@ucas010 Yes, we are working with Hugging Face to develop a Gradio demo. It will be ready in the next few days.

@xiaohu2015

very good work

@h3clikejava

very good work

I bow to the masters.

@wangqixun
Member

Is there a web demo? Something like a Gradio one?

https://huggingface.co/spaces/InstantX/InstantID

@bsenftner

What are the hardware requirements for people who want to run this locally? I did not see them listed anywhere.

@haofanwang
Member Author

We don't have any specific requirements. If you can run SDXL locally (24GB VRAM), everything should be fine.

@SlZeroth

SlZeroth commented Jan 22, 2024

@haofanwang

image

What kind of custom Stable Diffusion model do you use to generate this link in the image?

thank you!

@JD1234JD1234

does it work with sd1.5?

@caldwecg

Hello, I'm really excited to try this out. Would you mind listing the required diffusers, CUDA, and PyTorch versions? I'm not sure why it fails to load the pipeline without throwing an error. It just says pipe is 'None'.

File "/home/ubuntu/instantID.py", line 37, in
pipe.load_ip_adapter_instantid(face_adapter)
AttributeError: 'NoneType' object has no attribute 'load_ip_adapter_instantid'

thanks in advance!

@haofanwang
Member Author

Let me answer these together in one post.

@JD1234JD1234 No, we only provide checkpoints for SDXL.

@caldwecg It seems to be a problem with your pipeline. I believe diffusers==0.25.0 will be OK.
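
A minimal sketch of how one might check both points before calling `load_ip_adapter_instantid`: which diffusers version is installed, and whether the pipeline object was actually created. The helper names `installed_version` and `require_pipeline` are hypothetical and are not part of InstantID or diffusers.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def require_pipeline(pipe):
    """Fail early with a readable message instead of an AttributeError later on."""
    if pipe is None:
        raise RuntimeError(
            "Pipeline is None: construction failed or its result was not assigned. "
            f"Installed diffusers version: {installed_version('diffusers')}"
        )
    return pipe
```

Wrapping the pipeline in `require_pipeline(...)` right after it is built would surface the problem at the point of construction rather than at the first method call.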

@haofanwang
Member Author

@SlZeroth We use this realistic base model

@arjun810

arjun810 commented Jan 23, 2024

Does this only work with faces? How could we adapt it to other things like objects or animals?

@haofanwang
Member Author

Only human faces.

@kn12

kn12 commented Jan 23, 2024

When there are multiple faces, it seems that only one person can be recognized. Can this be fixed?

@haofanwang
Member Author

At the moment we only detect the largest face in the given image. We will add support for multiple people in multiple styles later. @kn12
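
Picking the largest face can be sketched as below. This assumes each detection is a dict-like object with a `bbox` entry of the form `[x1, y1, x2, y2]`, as insightface detections provide; the function names and the sample data are illustrative only.

```python
def bbox_area(face):
    """Area of a detection whose 'bbox' is [x1, y1, x2, y2]."""
    x1, y1, x2, y2 = face["bbox"]
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def largest_face(faces):
    """Return the detection with the largest bounding box, or None if there are none."""
    if not faces:
        return None
    return max(faces, key=bbox_area)


# Hypothetical detections, for illustration only.
faces = [
    {"bbox": [10, 10, 60, 70], "name": "small"},     # 50 x 60
    {"bbox": [100, 40, 300, 320], "name": "large"},  # 200 x 280
]
print(largest_face(faces)["name"])  # prints "large"
```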

@JD1234JD1234

@haofanwang will there be a SD 1.5 version? "our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin" - https://arxiv.org/abs/2401.07519

@ResearcherXman ResearcherXman pinned this issue Jan 24, 2024
@arjun810

Only human faces.

Would it be possible for us to train an alternative adapter that would work on other inputs?

@ResearcherXman ResearcherXman unpinned this issue Jan 24, 2024
@ResearcherXman ResearcherXman pinned this issue Jan 24, 2024
@zewolf5

zewolf5 commented Jan 24, 2024

When I create images where the face covers most of the image, the quality and likeness are VERY good, but once I go to full-body portraits, like someone riding a horse, the quality drops noticeably (like the situations where one would normally click "restore face" in a1111). Does anyone know a good tip for doing an "ADetailer"-ish fix: basically focus on and upscale the face, swap, and then downscale it back again? I'm not even sure that is possible. I am not yet fluent enough in SD Python coding.
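
The geometry of such a crop, upscale, regenerate, downscale, and paste-back loop can be sketched as follows. This is only an illustration of the idea under stated assumptions, not something InstantID provides: the function names, the 50% margin, and the 1024-pixel target are all hypothetical choices.

```python
def padded_face_crop(bbox, image_size, margin=0.5):
    """Expand a face box [x1, y1, x2, y2] by `margin` per side and clamp it to the image."""
    x1, y1, x2, y2 = bbox
    width, height = image_size
    pad_x = (x2 - x1) * margin
    pad_y = (y2 - y1) * margin
    left = max(0, int(x1 - pad_x))
    top = max(0, int(y1 - pad_y))
    right = min(width, int(x2 + pad_x))
    bottom = min(height, int(y2 + pad_y))
    return left, top, right, bottom


def upscale_factor(crop_box, target_side=1024):
    """Scale that brings the longer side of the crop up to `target_side` pixels."""
    left, top, right, bottom = crop_box
    return target_side / max(right - left, bottom - top)


crop = padded_face_crop([400, 200, 500, 320], image_size=(1024, 1024))
scale = upscale_factor(crop)
# The crop would be resized by `scale`, regenerated at that size,
# resized by 1 / scale, and pasted back at (crop[0], crop[1]).
```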

@I8Robot

I8Robot commented Jan 25, 2024

The WeChat group QR code is out of date.

@haofanwang
Member Author

@arjun810 Sure. It's definitely worth a try.

@haofanwang
Member Author

@zewolf5 Is the face too small?

@haofanwang
Member Author

@I8Robot Updated.

@cubiq

cubiq commented Jan 25, 2024

hey guys, I'm trying to port InstantID natively to comfyUI. I worked on the IPAdapter extension and the code looks very similar.

If I understand correctly it's like a FaceID model with additional controlnet. Contrary to IPAdapter you don't use zeroed uncond embeds, is that right? I was a little surprised to see that.

Also, the controlnet just seems to take the keypoints from insightface, right? The controlnet doesn't seem to work in ComfyUI and I'm not sure why yet. I'm surprised it's so effective with just the keypoints, btw.

@zewolf5

zewolf5 commented Jan 25, 2024

@zewolf5 Is the face too small?

I do not think the face is too small. If I generate a normal non-InstantID image, the face looks OK. I am using SDXL base models other than YamerMIX_v8. I tried a few models and they mostly behave the same. Below I am using RealitiesEdgeXL_20.

Using 5 random Swift pictures as the face, and an image generated normally as the pose reference:
CloseUpPortrait
CloseUpPortrait_InstantID

Then further away:
WomanRidingHorse
WomanRidingHorse_InstantID

There tend to be some bad areas between the nose and mouth, and the eyes often get pushed up, as if the face had been rendered at a lower resolution and then upscaled. Just my subjective opinion.

So what I see is that the quality of the face has an earlier "point of degradation" as the face gets smaller, compared with normal rendering. Almost like SD 1.5.

Other than the smaller faces, the results are fantastic.

Wishlist: ADetailer solution for all the faces in the image.

@xiaohu2015

xiaohu2015 commented Jan 25, 2024

hey guys, I'm trying to port InstantID natively to comfyUI. I worked on the IPAdapter extension and the code looks very similar.

If I understand correctly it's like a FaceID model with additional controlnet. Contrary to IPAdapter you don't use zeroed uncond embeds, is that right? I was a little surprised to see that.

Also, the controlnet just seems to take the keypoints from insightface, right? The controlnet doesn't seem to work in ComfyUI and I'm not sure why yet. I'm surprised it's so effective with just the keypoints, btw.

maybe you can refer to https://github.com/xiaohu2015/IP-Adapter/blob/instantid/instantid_demo.ipynb

the controlnet uses id embedding as condition instead of text embeds

@cubiq

cubiq commented Jan 25, 2024

maybe you can refer to https://github.com/xiaohu2015/IP-Adapter/blob/instantid/instantid_demo.ipynb

the controlnet uses id embedding as condition instead of text embeds

hey nice to see you here @xiaohu2015 😄

yeah that is what I'm doing, the IPAdapter part is done but the controlnet doesn't seem to react well.

I'm sending the KPS to the controlnet but I get something like this

ComfyUI_temp_bjtfr_00006_

So I'm trying to understand where the problem is

@caldwecg

I was able to get good results from the following configuration: https://huggingface.co/spaces/InstantX/InstantID

So I replicated it with the same settings as seen in the text file attached.
instantID.txt

The only thing I changed was the image resizing, to 512x512, because I keep running out of CUDA memory when trying with a bigger image. Is there anything else that could be causing it to look so poor? Thanks.

Below are the input and output:
image_0 (1)
image

@haofanwang
Member Author

haofanwang commented Jan 29, 2024

For 512x512 generation, you can try the SDXL-Turbo model, which performs much better than SDXL at this size. @caldwecg
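
When shrinking inputs to save memory, it can help to keep the aspect ratio rather than forcing a square. A small sketch of that size computation follows; the function name `fit_size` is hypothetical, and rounding each side down to a multiple of 64 is an assumption based on a common convention for latent diffusion models.

```python
def fit_size(width, height, max_side=512, multiple=64):
    """Scale (width, height) so the longer side becomes `max_side`, keeping the
    aspect ratio and rounding each side down to a multiple of `multiple`."""
    longer = max(width, height)
    # Integer arithmetic avoids floating-point rounding surprises.
    scaled_w = width * max_side // longer
    scaled_h = height * max_side // longer
    new_w = max(multiple, scaled_w // multiple * multiple)
    new_h = max(multiple, scaled_h // multiple * multiple)
    return new_w, new_h


print(fit_size(1920, 1080))  # prints (512, 256)
```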

@kackbob

kackbob commented Jan 30, 2024

How can I release VRAM after generating an image?
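
A common PyTorch pattern for this, sketched under the assumption that the pipeline object is the only thing holding the GPU tensors: drop every reference to it, run garbage collection, then ask PyTorch to release its cached blocks. `release_vram` and the `state` dictionary are hypothetical names used only for illustration.

```python
import gc


def release_vram(state, key="pipe"):
    """Drop the pipeline reference held in `state` and free cached GPU memory.

    Returns True only if a CUDA cache flush was actually performed.
    """
    state.pop(key, None)  # remove our reference to the pipeline
    gc.collect()          # collect anything that is now unreachable
    try:
        import torch      # optional: only flush if PyTorch is installed
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    torch.cuda.empty_cache()  # hand cached, unused blocks back to the driver
    return True
```

Memory stays allocated as long as any other variable still references the pipeline or its outputs, so those references need to be dropped as well.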

@cubiq

cubiq commented Feb 10, 2024

InstantID is finally supported natively in ComfyUI (instead of being sandboxed with diffusers). Have fun!

https://github.com/cubiq/ComfyUI_InstantID

@k15201363625

The QR code for the WeChat group has expired.

@Lotayou

Lotayou commented Feb 19, 2024

@haofangwang The QR code has expired. Is there any other channel for discussion?

@vrrusso

vrrusso commented Feb 22, 2024

@SlZeroth We use this realistic base model

Do you have any guidelines for using this base model? I have made unsuccessful attempts based on the GitHub code. Thank you!

@wuliebucha

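Thanks for all the interest in our project. To make things clearer, we outline the differences from previous work below.

(1) Compared to DreamBooth, Textual Inversion, LoRA, etc., we are tuning-free at inference time, which means we do not need to collect multiple images of a specific person and fine-tune on them. We consider the recent work PhotoMaker to be a type of LoRA, as it trains the UNet, albeit in a PEFT manner, and requires building a human-centered text-image dataset. Surprisingly, our results are comparable to or even better than those of the fine-tuned approaches.

(2) Our work is most similar to IP-Adapter. We follow its decoupled cross-attention design and are just as pluggable and compatible with other models in the community. In addition, we introduce IdentityNet (a variant of ControlNet) to obtain better ID retention.

We are open to discussing anything here; feel free to post your findings and share them with us. We have also created a WeChat group to facilitate discussion.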

[Image: WeChat group QR code]

The QR code has expired. Could you share it again? I'd like to join the group.

@plienhar

plienhar commented Mar 1, 2024

A significant number of generated portraits feature a rather long and unnatural neck. Is there a workaround to alleviate this issue?

@haofanwang
Member Author

A significant number of generated portraits feature a rather long and unnatural neck. Is there a workaround to alleviate this issue?

You can add another pose ControlNet.

@haofanwang
Member Author

@wuliebucha @Lotayou @k15201363625 Updated.

@tobuta

tobuta commented Mar 15, 2024

@wuliebucha @Lotayou @k15201363625 Updated.

Can you update it again? I didn't notice in time.

@plienhar

Thanks @haofanwang! It works (and adds a bit of latency, obviously). Another question: do you think the model is capable of processing group pictures, i.e. with more than one individual? Is it something you have tried? Extracting facial keypoints for all individuals is not an issue, but I have trouble figuring out how this would work on the face-embedding side, i.e. how the model would assign the right face embedding to the right face in the picture.

@bettyYsj

The QR code has expired. Could you please update it once more? Thank you very much. @haofanwang

@remember00000

Could you provide some information on how to retrain this model?

@janced

janced commented Apr 21, 2024

@haofanwang Could you update the WeChat group QR code? Thank you very much!

@zdxpan

zdxpan commented Apr 24, 2024

I'd like to join, so we can discuss recent portrait-feature techniques together!
image

@yml-blog

I'd like to join.

@yml-blog

The QR code has expired.

@CsChoy

CsChoy commented Apr 29, 2024

I'd like to join the group. Thanks.

@niuxiaozhang

Please update the QR code. Thanks.

@haofanwang
Member Author

Updated.

@kisstea

kisstea commented May 21, 2024

Could you update the QR code again?

@haofanwang
Member Author

Sure. Updated. @kisstea

@pranurs

pranurs commented Jun 22, 2024

Why is torch > 2.1 necessary? Due to my CUDA version I can only use torch 2.0.1 - is this a problem?

@haofanwang
Member Author

@pranurs 2.0.1 should be fine.

@wuliebucha

@haofanwang Could you update the QR code? I've been wanting to join the group but missed it. Thanks.

@haofanwang
Member Author

@wuliebucha Sure, updated.

@michaelmalice

Hi, do you know when there will be a release that doesn't use insightface? I have read that you are working on it but haven't seen any updates.

@ykj467422034

Could you please update the QR code once more? I haven't joined the group yet either. Thanks! @haofanwang
