Discuss anything here #6

haofanwang · 2024-01-17T12:14:58Z

Thanks for all the interest in our project. To make things clearer, we outline the differences from previous work below.

(1) Compared to DreamBooth, Textual Inversion, LoRA, etc., we are tuning-free during the inference phase, which means we do not need to collect multiple images of a specific person and fine-tune on them. We consider the recent work PhotoMaker to be a type of LoRA, since it trains the UNet in a PEFT manner and requires building a human-centered text-image dataset. Surprisingly, our results are comparable to or even better than those of the fine-tuned approaches.

(2) Our work is most similar to IP-Adapter. We follow its decoupled cross-attention design and are just as pluggable and compatible with other models in the community. In addition, we introduce IdentityNet (a variant of ControlNet) to obtain better ID retention.

We are open to discussing anything here; feel free to post your findings and share them with us. We have also created a WeChat group to facilitate discussion.

ucas010 · 2024-01-18T01:22:41Z

Is there a web demo? The Gradio kind?

haofanwang · 2024-01-18T02:08:03Z

@ucas010 Yes, we are working with Hugging Face to develop a Gradio demo. It will be ready in the next few days.

xiaohu2015 · 2024-01-21T09:34:20Z

very good work

h3clikejava · 2024-01-22T01:31:02Z

very good work

Hats off to the masters.

wangqixun · 2024-01-22T13:27:49Z

Is there a web demo? The Gradio kind?

https://huggingface.co/spaces/InstantX/InstantID

bsenftner · 2024-01-22T13:30:05Z

What are the hardware requirements for people who want to run it locally? I did not see them listed anywhere.

haofanwang · 2024-01-22T14:03:58Z

We don't have any specific requirement. If you can run SDXL locally (24GB VRAM), everything should be fine.

SlZeroth · 2024-01-22T15:10:37Z

@haofanwang

Which custom Stable Diffusion model did you use to generate the image at this link?

thank you!

JD1234JD1234 · 2024-01-22T18:50:09Z

does it work with sd1.5?

caldwecg · 2024-01-23T03:49:32Z

Hello, I'm really excited to try this out. Would you mind listing the required diffusers, CUDA, and PyTorch versions? I'm not sure why the pipeline fails to load without throwing an error. It just says pipe is 'None'.

File "/home/ubuntu/instantID.py", line 37, in
pipe.load_ip_adapter_instantid(face_adapter)
AttributeError: 'NoneType' object has no attribute 'load_ip_adapter_instantid'

thanks in advance!

haofanwang · 2024-01-23T03:55:26Z

Let me answer these together in one post.

@JD1234JD1234 No, we only provide checkpoints for SDXL.

@caldwecg It seems to be a problem with your pipeline. I believe diffusers==0.25.0 will work.
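
For anyone comparing their environment against the version mentioned above, here is a minimal sketch that reports installed package versions using only the standard library. The helper name `installed_version` is hypothetical, and nothing here confirms a required torch or CUDA version; it only prints what is installed.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # The package names below are the ones asked about in this thread.
    for name in ("diffusers", "torch"):
        print(name, installed_version(name))
```

If `diffusers` prints something other than 0.25.0, installing that version is the first thing to try based on the reply above.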

haofanwang · 2024-01-23T03:56:40Z

@SlZeroth We use this realistic base model

arjun810 · 2024-01-23T10:10:29Z

Does this only work with faces? How could we adapt it to other things like objects or animals?

haofanwang · 2024-01-23T10:28:03Z

Only human faces.

kn12 · 2024-01-23T16:40:16Z

When there are multiple faces, it seems that only one person is recognized. Can this be fixed?

haofanwang · 2024-01-23T16:42:02Z

We only detect the largest face in the given image at the moment. Support for multiple people in multiple styles will be added later. @kn12
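
A minimal sketch of the "largest face" selection described above. It assumes each detection is a dict with a `bbox` entry of `[x1, y1, x2, y2]`, which is an assumption about the detector output rather than something stated in this thread; the function name `pick_largest_face` is hypothetical.

```python
def pick_largest_face(faces):
    """Return the detection with the largest bounding-box area, or None if the list is empty."""
    if not faces:
        return None

    def area(face):
        # bbox is assumed to be [x1, y1, x2, y2] in pixels.
        x1, y1, x2, y2 = face["bbox"][:4]
        return max(0.0, x2 - x1) * max(0.0, y2 - y1)

    return max(faces, key=area)


# Two made-up detections for illustration only.
faces = [
    {"bbox": [10, 10, 60, 60], "name": "small"},
    {"bbox": [100, 50, 300, 320], "name": "large"},
]
largest = pick_largest_face(faces)
```

With this kind of selection, any additional faces in the input are simply ignored, which matches the behaviour reported above.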

JD1234JD1234 · 2024-01-23T18:55:54Z

@haofanwang will there be a SD 1.5 version? "our work seamlessly integrates with popular pre-trained text-to-image diffusion models like SD1.5 and SDXL, serving as an adaptable plugin" - https://arxiv.org/abs/2401.07519

arjun810 · 2024-01-24T10:42:18Z

Only human faces.

Would it be possible for us to train an alternative adapter that would work on other inputs?

zewolf5 · 2024-01-24T19:10:02Z

When I create images where the face covers most of the image, the quality and likeness are VERY good, but once I move to full-body portraits, like someone riding a horse, the quality drops noticeably (like the situations where one would normally click "restore face" in a1111). Does anyone know a good way to do an "ADetailer"-style fix: basically crop and upscale the face, swap it, and then downscale it back again? I'm not even sure that is possible. I am not yet fluent enough in SD Python coding.

I8Robot · 2024-01-25T02:41:08Z

The WeChat group is out of date.

haofanwang · 2024-01-25T03:02:19Z

@arjun810 Sure. It's definitely worth a try.

haofanwang · 2024-01-25T03:06:23Z

@zewolf5 Is the face too small?

haofanwang · 2024-01-25T03:08:34Z

@I8Robot Updated.

cubiq · 2024-01-25T07:23:10Z

hey guys, I'm trying to port InstantID natively to comfyUI. I worked on the IPAdapter extension and the code looks very similar.

If I understand correctly it's like a FaceID model with additional controlnet. Contrary to IPAdapter you don't use zeroed uncond embeds, is that right? I was a little surprised to see that.

Also the controlnet just seems to take the keypoints from insightface, right? The controlnet doesn't seem to work in comfyui and I'm not sure why yet. I'm surprised it's so effective with just the keypoints btw

zewolf5 · 2024-01-25T07:59:01Z

@zewolf5 Is the face too small?

I do not think the face is too small. If I generate a normal non-InstantID image, the face looks OK. I am using SDXL base models other than YamerMIX_v8. I tried a few models and they mostly behave the same. Below I am using RealitiesEdgeXL_20.

Using 5 random Swift pictures as the face, and an image generated normally as the pose reference:

Then further away:

There tend to be some bad areas between the nose and mouth, and the eyes often get pushed up, as if the face had been rendered at a lower resolution and then upscaled. Just my subjective opinion.

So what I see is that the quality of the face reaches its "point of degradation" earlier than in normal rendering as the face gets smaller. Almost like SD15.

Other than the smaller faces, the results are fantastic.

Wishlist: ADetailer solution for all the faces in the image.
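
The "ADetailer"-style workflow asked about in this thread (crop around the face, upscale, regenerate, downscale, paste back) starts with choosing a padded crop region and a scale factor. Below is a minimal sketch of just that arithmetic, assuming a face box given as `(x1, y1, x2, y2)` in pixels. The function names, the 50% padding, and the 1024-pixel target are all hypothetical choices, not part of InstantID, and the regeneration step itself is not shown.

```python
def face_crop_box(bbox, image_size, padding=0.5):
    """Expand a face box by `padding` (fraction of its size) on each side, clamped to the image."""
    x1, y1, x2, y2 = bbox
    width, height = image_size
    pad_x = (x2 - x1) * padding
    pad_y = (y2 - y1) * padding
    left = max(0, int(x1 - pad_x))
    top = max(0, int(y1 - pad_y))
    right = min(width, int(x2 + pad_x))
    bottom = min(height, int(y2 + pad_y))
    return left, top, right, bottom


def upscale_factor(box, target_side=1024):
    """Scale factor that brings the longest side of `box` up to `target_side` pixels."""
    left, top, right, bottom = box
    longest = max(right - left, bottom - top)
    return target_side / longest


# A small face in a 1024x1024 full-body image (made-up numbers).
box = face_crop_box((400, 100, 500, 220), (1024, 1024))
scale = upscale_factor(box)
```

The crop would be resized by `scale`, regenerated at that resolution, resized by `1 / scale`, and pasted back at `(left, top)`; blending the seam is left out of this sketch.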

xiaohu2015 · 2024-01-25T08:56:23Z

hey guys, I'm trying to port InstantID natively to comfyUI. I worked on the IPAdapter extension and the code looks very similar.

If I understand correctly it's like a FaceID model with additional controlnet. Contrary to IPAdapter you don't use zeroed uncond embeds, is that right? I was a little surprised to see that.

Also the controlnet just seems to take the keypoints from insightface, right? The controlnet doesn't seem to work in comfyui and I'm not sure why yet. I'm surprised it's so effective with just the keypoints btw

maybe you can refer to https://github.com/xiaohu2015/IP-Adapter/blob/instantid/instantid_demo.ipynb

the controlnet uses id embedding as condition instead of text embeds

cubiq · 2024-01-25T10:11:09Z

maybe you can refer to https://github.com/xiaohu2015/IP-Adapter/blob/instantid/instantid_demo.ipynb

the controlnet uses id embedding as condition instead of text embeds

hey nice to see you here @xiaohu2015 😄

yeah that is what I'm doing, the IPAdapter part is done but the controlnet doesn't seem to react well.

I'm sending the KPS to the controlnet but I get something like this

So I'm trying to understand where the problem is

caldwecg · 2024-01-29T03:15:59Z

I was able to get good results from the following configuration: https://huggingface.co/spaces/InstantX/InstantID

So I replicated it with the same settings as seen in the text file attached.
instantID.txt

The only thing I changed was resizing the image to 512x512, because I keep running out of CUDA memory with a bigger image. Is there anything else that could be causing it to look so poor? Thanks

Below are the input and output:

haofanwang · 2024-01-29T13:32:41Z

For 512x512 generation, you can try the SDXL-Turbo model, which performs much better than SDXL at this size. @caldwecg

kackbob · 2024-01-30T09:13:01Z

How can I release VRAM after generating an image?
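
This question was not answered in the thread. One common PyTorch pattern, shown here as a hedged sketch rather than a confirmed InstantID recommendation, is to drop every reference to the pipeline, run the garbage collector, and then ask PyTorch to release its cached CUDA blocks. The helper name `release_gpu_memory` is hypothetical.

```python
import gc


def release_gpu_memory():
    """Collect garbage, then release PyTorch's cached CUDA memory if a GPU is present.

    Returns True if the CUDA cache was cleared, False otherwise.
    """
    gc.collect()
    try:
        import torch
    except ImportError:
        return False
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
        return True
    return False
```

In a script this would be called after `del pipe` (and after deleting any other variables that still hold the model or its outputs). Memory held by objects that are still referenced is not freed by this call.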

cubiq · 2024-02-10T12:15:10Z

InstantID is finally supported natively in ComfyUI (instead of sandboxed with diffusers). Have fun!

https://github.com/cubiq/ComfyUI_InstantID

k15201363625 · 2024-02-14T02:25:33Z

The QR code for the WeChat group has expired.

Lotayou · 2024-02-19T08:20:53Z

@haofangwang QR code expired. Is there any other channel for discussion?

vrrusso · 2024-02-22T19:10:08Z

@SlZeroth We use this realistic base model

Do you have any guidelines for using this base model? My attempts based on the GitHub code have been unsuccessful. Thank you!

wuliebucha · 2024-02-26T07:55:12Z

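Thanks for all the interest in our project. To make things clearer, we outline the differences from previous work below.

(1) Compared to DreamBooth, Textual Inversion, LoRA, etc., we are tuning-free during the inference phase, which means we do not need to collect multiple images of a specific person and fine-tune on them. We consider the recent work PhotoMaker to be a type of LoRA, since it trains the UNet in a PEFT manner and requires building a human-centered text-image dataset. Surprisingly, our results are comparable to or even better than those of the fine-tuned approaches.

(2) Our work is most similar to IP-Adapter. We follow its decoupled cross-attention design and are just as pluggable and compatible with other models in the community. In addition, we introduce IdentityNet (a variant of ControlNet) to obtain better ID retention.

We are open to discussing anything here; feel free to post your findings and share them with us. We have also created a WeChat group to facilitate discussion.

The QR code has expired. Could you share it again? I'd like to join the group.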

plienhar · 2024-03-01T13:11:56Z

A significant number of generated portraits feature a rather long and unnatural neck. Is there a workaround to alleviate this issue?

haofanwang · 2024-03-01T13:57:31Z

A significant number of generated portraits feature a rather long and unnatural neck. Is there a workaround to alleviate this issue?

You can add another pose ControlNet.

haofanwang · 2024-03-01T13:59:20Z

@wuliebucha @Lotayou @k15201363625 Updated.

tobuta · 2024-03-15T06:00:18Z

@wuliebucha @Lotayou @k15201363625 Updated.

Can you update it again? Didn't notice.

plienhar · 2024-03-22T11:51:55Z

Thanks @haofanwang! It works (and adds a bit of latency, obviously). Another question: do you think the model is capable of processing group pictures, i.e. with more than one individual? Is it something you have tried? Extracting facial keypoints for all individuals is not an issue, but I have trouble figuring out how this would work on the face embedding side, i.e. how the model would assign the right face embedding to the right face in the picture.

bettyYsj · 2024-03-27T10:03:18Z

The QR code has expired. Could you please update it once more? Thank you very much @haofanwang

remember00000 · 2024-04-02T07:17:06Z

Could you provide some information on how to retrain this model?

janced · 2024-04-21T14:45:58Z

@haofanwang Could you update the WeChat group QR code? Thank you very much!

zdxpan · 2024-04-24T10:01:13Z

Requesting to join. I'd like to discuss recent portrait feature techniques together!

yml-blog · 2024-04-24T15:34:54Z

Requesting to join.

yml-blog · 2024-04-24T15:36:42Z

The QR code has expired.

CsChoy · 2024-04-29T07:49:02Z

Requesting to join the group. Thanks.

niuxiaozhang · 2024-05-10T01:38:58Z

Please update the QR code, thanks.

haofanwang · 2024-05-10T05:32:15Z

Updated.

kisstea · 2024-05-21T12:26:23Z

Could you update the QR code again?

haofanwang · 2024-05-22T03:00:12Z

Sure. Updated. @kisstea

pranurs · 2024-06-22T20:33:46Z

Why is torch > 2.1 necessary? Due to my CUDA version I can only use torch 2.0.1 - is this a problem?

haofanwang · 2024-06-23T07:49:30Z

@pranurs 2.0.1 should be fine.

wuliebucha · 2024-06-26T02:18:09Z

@haofanwang Could you update the QR code? I have been wanting to join the group but keep missing it. Thanks.

haofanwang · 2024-06-26T02:57:33Z

@wuliebucha Sure, updated.

michaelmalice · 2024-06-28T06:18:04Z

Hi, do you know when there will be a release that doesn't use insightface? I have read that you are working on it but haven't seen any updates.

ykj467422034 · 2024-08-09T01:48:13Z

Could you please update the QR code again? I haven't joined the group yet either. Thanks! @haofanwang

ResearcherXman pinned this issue Jan 24, 2024

ResearcherXman unpinned this issue Jan 24, 2024

ResearcherXman pinned this issue Jan 24, 2024
