Model/Pipeline/Scheduler description
SUPIR is a super-resolution model that looks like it produces excellent results.
GitHub repo: https://github.com/Fanghua-Yu/SUPIR
The model is quite memory intensive, so the optimisation features available in diffusers might be quite helpful in making it accessible on lower-resource GPUs.
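As a rough illustration of the kind of optimisation features meant here, the sketch below switches on a few memory savers that diffusers pipelines commonly expose (`enable_model_cpu_offload`, `enable_vae_slicing`, `enable_vae_tiling`). The helper name `apply_memory_savers` is hypothetical, and a SUPIR pipeline exposing these methods is an assumption; the helper skips anything the pipeline lacks.

```python
from typing import Any, List

# Method names that diffusers pipelines such as StableDiffusionXLPipeline
# expose for reducing peak GPU memory. A SUPIR pipeline offering them is
# an assumption, since no such pipeline exists in diffusers yet.
MEMORY_SAVERS = (
    "enable_model_cpu_offload",
    "enable_vae_slicing",
    "enable_vae_tiling",
)


def apply_memory_savers(pipe: Any) -> List[str]:
    """Call every memory-saving method the pipeline offers; return the names applied."""
    applied = []
    for name in MEMORY_SAVERS:
        method = getattr(pipe, name, None)
        if callable(method):
            method()  # each of these takes no required arguments
            applied.append(name)
    return applied
```

With diffusers installed, this could be tried on the SDXL base pipeline, for example `apply_memory_savers(StableDiffusionXLPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0"))`; how much memory it saves for SUPIR specifically is untested.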
Open source status
- The model implementation is available.
- The model weights are available (Only relevant if addition is not a scheduler).
Provide useful links for the implementation
No response
nxbringr commented on Mar 5, 2024
Hey @DN6, can I please work on this?
yiyixuxu commented on Mar 5, 2024
@ihkap11 hey! sure!
Bhavay-2001 commented on Mar 18, 2024
Hi @yiyixuxu, is anyone working on this? Can I also contribute? Please let me know how I may proceed.
nxbringr commented on Mar 18, 2024
Hey @Bhavay-2001 I'm currently working on this. Will post the PR here soon.
I can tag you on the PR if there is something I need help with :)
Bhavay-2001 commented on Mar 18, 2024
ok great. Pls let me know.
Thanks
landmann commented on Mar 29, 2024
@ihkap11 how's it going 😁 I'd loooooove to have this
nxbringr commented on Mar 29, 2024
Hey @landmann I'll post the PR this weekend and tag you if you want to contribute to it :) apologies for the delay, it's my first new model implementation PR
landmann commented on Mar 29, 2024
You a real champ 🙌
Happy Friday, my gal/dude!
nxbringr commented on Mar 31, 2024
Initial Update:
Paper Insights
Motivation:
perceptual effects and intelligence of IR results.
Architecture Overview:
Generative Prior: The authors choose SDXL (Stable Diffusion XL) as the backbone for their generative prior due to its high-resolution image generation capability without hierarchical design.
Degradation-Robust Encoder: They fine-tune the SDXL encoder to make it robust to degradation, enabling effective mapping of low-quality (LQ) images to the latent space.
Large-Scale Adaptor: The authors designed a new adaptor with network trimming and a ZeroSFT connector to control the generation process at the pixel level.
Issues with existing adaptors
Why do we need this?
Multi-Modality Language Guidance: They incorporate the LLaVA multi-modal large language model to understand image content and guide the restoration process using textual prompts.
Restoration-Guided Sampling: They propose a modified sampling method to selectively guide the prediction results to be close to the LQ image, ensuring fidelity in the restored image.
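The restoration-guided sampling idea can be sketched as one Euler-style sampler step in which the denoised prediction is pulled toward the LQ latent before the update. This is a minimal sketch, not the SUPIR code: the blend weight `(sigma_t / sigma_max) ** tau_r`, the function name `restoration_guided_step`, and the flat-list latents are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence


def restoration_guided_step(
    z_t: Sequence[float],
    z_lq: Sequence[float],
    sigma_t: float,
    sigma_next: float,
    sigma_max: float,
    tau_r: float,
    denoise: Callable[[Sequence[float], float], Sequence[float]],
) -> List[float]:
    """One Euler step whose denoised estimate is blended toward the LQ latent."""
    if sigma_t <= 0 or sigma_max <= 0:
        raise ValueError("sigma_t and sigma_max must be positive")
    # Model's estimate of the clean latent at the current noise level.
    z_hat = denoise(z_t, sigma_t)
    # Assumed guidance weight: for tau_r > 0 it is largest at high noise
    # and fades as sigma_t shrinks, so late steps are left mostly unguided.
    k = (sigma_t / sigma_max) ** tau_r
    guided = [zh + k * (lq - zh) for zh, lq in zip(z_hat, z_lq)]
    # Standard Euler update, using the guided estimate as the denoised target.
    return [
        z + (z - g) / sigma_t * (sigma_next - sigma_t)
        for z, g in zip(z_t, guided)
    ]
```

Here `denoise` stands in for the diffusion model call, and `tau_r` is a hypothetical knob trading fidelity to the LQ image against generative freedom. Stepping straight to `sigma_next = 0` returns the guided estimate itself.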
Thoughts on implementation details:
sd_xl_base_1.0_0.9vae.safetensors as base pre-trained generative prior.
To cover later:
I'm currently in the process of breaking down SUPIR code into diffusers artefacts and figuring out optimization techniques to make it compatible with low-resource GPUs.
Feel free to correct me or start a discussion on this thread. Let me know if you wish to collaborate, I'm happy to set up discussions and work on it together :).
landmann commented on Apr 1, 2024
Looks fantastic! How far along did you get, @ihkap11 ?
Btw, a good reference for the input parameters is here: https://replicate.com/cjwbw/supir?prediction=32glqstbvpjjppxmvcge5gsncu
landmann commented on Apr 3, 2024
@ihkap11 how you doing? Which part are you stuck on?