Model/Pipeline/Scheduler description
SUPIR is a super-resolution model that appears to produce excellent results.
Github Repo: https://github.com/Fanghua-Yu/SUPIR
The model is quite memory intensive, so the optimisation features available in diffusers might be quite helpful in making this accessible to lower resource GPUs.
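For instance, a minimal sketch of the kind of diffusers memory optimizations that could apply, using plain SDXL as a stand-in since there is no SUPIR pipeline in diffusers yet (the applicability to SUPIR is an assumption):

```python
# Sketch: diffusers memory optimizations a SUPIR pipeline could reuse.
# SDXL stands in here; SUPIR itself is not in diffusers yet (assumption).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)
pipe.enable_model_cpu_offload()  # move submodules to GPU only when needed
pipe.enable_vae_tiling()         # decode large latents tile by tile
image = pipe("a photo", num_inference_steps=20).images[0]
```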
Open source status
- The model implementation is available.
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response
nxbringr commented on Mar 5, 2024
Hey @DN6, can I please work on this?
yiyixuxu commented on Mar 5, 2024
@ihkap11 hey! sure!
Bhavay-2001 commented on Mar 18, 2024
Hi @yiyixuxu, is anyone working on this? Can I also contribute? Please let me know how I may proceed.
nxbringr commented on Mar 18, 2024
Hey @Bhavay-2001, I'm currently working on this. Will post the PR here soon.
I can tag you on the PR if there is something I need help with :)
Bhavay-2001 commented on Mar 18, 2024
ok great. Pls let me know.
Thanks
landmann commented on Mar 29, 2024
@ihkap11 how's it going 😁 I'd loooooove to have this
nxbringr commented on Mar 29, 2024
Hey @landmann, I'll post the PR this weekend and tag you if you want to contribute to it :) Apologies for the delay, it's my first new-model implementation PR.
landmann commented on Mar 29, 2024
You a real champ 🙌
Happy Friday, my gal/dude!
nxbringr commented on Mar 31, 2024
Initial Update:
Paper Insights
Motivation: scale up image restoration (IR) with a strong generative prior to improve the perceptual quality and intelligence of IR results.
Architecture Overview:
Generative Prior: The authors choose SDXL (Stable Diffusion XL) as the backbone for their generative prior due to its high-resolution image generation capability without hierarchical design.
Degradation-Robust Encoder: They fine-tune the SDXL encoder to make it robust to degradation, enabling effective mapping of low-quality (LQ) images to the latent space.
Large-Scale Adaptor: The authors design a new adaptor with network trimming and a ZeroSFT connector to control the generation process at the pixel level; the paper covers the issues with existing adaptors and why a new one is needed (see the sketches after this list).
Multi-Modality Language Guidance: They incorporate the LLaVA multi-modal large language model to understand image content and guide the restoration process using textual prompts.
Restoration-Guided Sampling: They propose a modified sampling method to selectively guide the prediction results to be close to the LQ image, ensuring fidelity in the restored image.
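To make the ZeroSFT idea concrete, here's a rough sketch of how I read it (illustrative only, not the official SUPIR module): a spatial feature transform whose projections are zero-initialized, so the connector starts as an identity, similar to ControlNet's zero convolutions.

```python
# Hedged sketch of a ZeroSFT-style connector; names and structure are
# illustrative, not the official SUPIR implementation.
import torch
import torch.nn as nn

def zero_init(module: nn.Module) -> nn.Module:
    # zero all parameters so the connector starts as a no-op
    for p in module.parameters():
        nn.init.zeros_(p)
    return module

class ZeroSFTSketch(nn.Module):
    def __init__(self, control_channels: int, unet_channels: int):
        super().__init__()
        self.to_scale = zero_init(nn.Conv2d(control_channels, unet_channels, kernel_size=1))
        self.to_shift = zero_init(nn.Conv2d(control_channels, unet_channels, kernel_size=1))

    def forward(self, unet_feat: torch.Tensor, control_feat: torch.Tensor) -> torch.Tensor:
        # spatial feature transform: per-pixel scale and shift predicted from
        # the control branch; zero init makes this the identity at the start
        scale = self.to_scale(control_feat)
        shift = self.to_shift(control_feat)
        return unet_feat * (1 + scale) + shift
```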
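And a minimal sketch of the restoration-guided sampling idea: at each step the predicted clean latent is pulled toward the LQ latent to preserve fidelity. The real per-step weighting schedule lives in the SUPIR code; `tau` here is just an illustrative blend factor.

```python
# Hedged sketch of the restoration-guided blend; the actual schedule for tau
# over the sampling trajectory comes from the SUPIR code, not shown here.
import torch

def restoration_guidance(pred_z0: torch.Tensor, z_lq: torch.Tensor, tau: float) -> torch.Tensor:
    # tau = 0 keeps the model prediction; tau = 1 returns the LQ latent
    return pred_z0 + tau * (z_lq - pred_z0)
```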
Thoughts on implementation details:
Use sd_xl_base_1.0_0.9vae.safetensors as the base pre-trained generative prior.
To cover later:
I'm currently in the process of breaking down SUPIR code into diffusers artefacts and figuring out optimization techniques to make it compatible with low-resource GPUs.
Feel free to correct me or start a discussion on this thread. Let me know if you wish to collaborate, I'm happy to set up discussions and work on it together :).
landmann commented on Apr 1, 2024
Looks fantastic! How far along did you get, @ihkap11 ?
Btw, a good reference for the input parameters is here: https://replicate.com/cjwbw/supir?prediction=32glqstbvpjjppxmvcge5gsncu
landmann commented on Apr 3, 2024
@ihkap11 how are you doing? Which part are you stuck on?
9 remaining items
gitlabspy commented on Jun 21, 2024
Any progress👀?
elismasilva commented on Jun 27, 2024
hi @ihkap11, any news?
nxbringr commented on Jun 27, 2024
Hey! I tried but couldn't get this working. Feel free to take over the implementation for this Issue.
elismasilva commented on Jun 27, 2024
But do you have a branch where we can continue where you left off? I might try this after I finish a project I'm involved with.
sayakpaul commented on Jun 29, 2024
Cc: @asomoza
asomoza commented on Jun 30, 2024
Just in case, this is not an easy task, everything is in the sgm format so there's a lot of conversion involved. It requires a deep understanding of the original code and diffusers.
Probably the best choice here is to start as a research project and convert all the sgm code to diffusers, and then when stuck, get help from the maintainers and the community.
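For reference, the weight-level part of that sgm-to-diffusers conversion is mostly state-dict key remapping. A rough sketch follows; the prefixes reflect the usual sgm checkpoint layout, but the exact SUPIR mapping is an assumption and would need to be worked out key by key.

```python
# Hedged sketch of sgm -> diffusers key remapping. The prefix map below is
# illustrative, not the verified SUPIR mapping.
KEY_MAP = {
    "model.diffusion_model.": "unet.",        # UNet weights in sgm checkpoints
    "first_stage_model.": "vae.",             # VAE weights
    "conditioner.embedders.0.": "text_encoder.",
}

def convert_sgm_keys(sgm_state_dict: dict) -> dict:
    converted = {}
    for key, tensor in sgm_state_dict.items():
        for old, new in KEY_MAP.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        converted[key] = tensor
    return converted
```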
zdxpan commented on Sep 4, 2024
According to the paper and the ComfyUI implementation, the following will be implemented:
3. SUPIR ControlNet: takes latents and timesteps in, and outputs ControlNet residuals (downsample and mid-block samples); see the call-signature sketch below.
Not yet implemented:
4. A hacked UNet that modifies the connector of each down and up block to use ZeroSFT.
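For the call signature, diffusers' ControlNetModel already follows the "latents + timestep in, down/mid residuals out" contract, so a SUPIR ControlNet port could mirror it. The sketch below uses the SD1.5 canny ControlNet purely to illustrate shapes; the SUPIR variant itself is an assumption.

```python
# Hedged sketch: the ControlNetModel contract that a SUPIR ControlNet could
# mirror. SD1.5 canny ControlNet and dummy tensors are used for illustration.
import torch
from diffusers import ControlNetModel

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny")

latents = torch.randn(1, 4, 64, 64)       # noisy latents in
timestep = torch.tensor([10])             # timestep in
prompt_embeds = torch.randn(1, 77, 768)   # text conditioning
cond_image = torch.randn(1, 3, 512, 512)  # conditioning image (LQ input for SUPIR)

down_res, mid_res = controlnet(
    sample=latents,
    timestep=timestep,
    encoder_hidden_states=prompt_embeds,
    controlnet_cond=cond_image,
    return_dict=False,
)
# down_res (list) and mid_res are then added into the UNet's down/mid blocks
```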
elismasilva commented on Jun 9, 2025
hi @DN6, I already did it and it is working fine, see results: https://imgsli.com/Mzg2NTUx/0/1. But then I saw that they changed the license and it seems that it is very restrictive. I don't know how to deal with this case. I even sent an email to the owner, but it's been a month with no response.
asomoza commented on Jun 9, 2025
They changed the license from MIT to "something unnamed" literally two days after the original post here, didn't see that.
Don't know why they keep changing it to a custom one, probably because there's some other party involved, but basically, this license is the same as Flux, you can use it and change it freely if it's non-commercial which is fine, we have other models with the same type of custom license.
AFAIK this is still the SOTA upscaler model so it would be nice to have it.
elismasilva commented on Jun 9, 2025
they have this on their new license: "b. Proprietary Neural Network Components: Notwithstanding the open-source license granted in Section 2(a) for open-source components, the Licensor retains all rights, title, and interest in and to the proprietary Neural Network Components developed and trained by the Licensor. The license granted herein does not include rights to use, copy, modify, distribute, or create derivative works of these proprietary Neural Network Components without obtaining express written permission from the Licensor."
The license is a bit contradictory: it says parts of the code are open source, yet at the same time the neural network components cannot be used or modified without permission. I wonder if, in the case where I converted the model to .safetensors, I would be violating this clause.
asomoza commented on Jun 9, 2025
I understand that part (I'm not an expert) as referring to derivative works only; it's a clause limiting the use of the model to do something similar to circumvent the license and then use it commercially, like using its outputs to train a similar model.
When integrating it with diffusers and changing the format for diffusers, and of course copying the license and referencing them as copyright owners, you're not using it as a derivative work but as the original model.
On the other hand, if we interpret it as covering everything and not just derivative works, it invalidates everything before it, and at that point they should just remove "open source" from it and say that any use of the model needs their permission.
But still, it would be a lot better to get permission directly from the authors, since the license is not easy to understand.
elismasilva commented on Jun 9, 2025
I agree, it's not easy to understand this license. Let's see if anyone else has any thoughts on this. I don't think the authors will answer me as it's been a long time.
vladmandic commented on Jun 9, 2025
imo, it's a non-standard license, but it's pretty clear since it explicitly defines "neural network architecture, weights and biases" several times, not the model file and/or packaging as such. conversion to safetensors is NOT a derivative work since it fully maintains the original "architecture, weights and biases".
as an example, under this license, converting to something like gguf with pre-quantized weights would be a no-no without explicit permission.
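as a concrete illustration of that reading (file names are hypothetical), a safetensors conversion just re-serializes the same tensors:

```python
# Hedged sketch: converting a checkpoint to .safetensors re-serializes the
# same tensors, leaving architecture, weights and biases untouched.
# File names are hypothetical.
import torch
from safetensors.torch import load_file, save_file

ckpt = torch.load("SUPIR-v0Q.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # unwrap if nested
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
save_file(state_dict, "SUPIR-v0Q.safetensors")

# round-trip check: every tensor is numerically identical
reloaded = load_file("SUPIR-v0Q.safetensors")
assert all(torch.equal(state_dict[k], reloaded[k]) for k in state_dict)
```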
elismasilva commented on Jun 10, 2025
Good point. And in this case I saw a version of the model files that seems to have been pruned, because it was only 2GB, or maybe converted to fp16, I don't know for sure. In my case I just split the modules into diffusers format to be loaded as pretrained, so at most some keys had to be adapted in the case of the encoder; even so, there was no change in the content.