
Add SUPIR Upscaler #7219

Open

Description

@DN6
Collaborator

Model/Pipeline/Scheduler description

SUPIR is a super-resolution model that appears to produce excellent results.

Github Repo: https://github.com/Fanghua-Yu/SUPIR

The model is quite memory intensive, so the optimisation features available in diffusers might be quite helpful in making this accessible to lower resource GPUs.
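As a sketch of the kind of optimisation meant here, the helper below applies the standard memory savers that diffusers pipelines already expose (the helper name is illustrative; the `enable_*` methods are existing diffusers pipeline APIs, though which ones help will depend on the final SUPIR pipeline):

```python
def apply_memory_optimizations(pipe, offload=True, slice_attention=True, tile_vae=True):
    """Apply standard diffusers memory savers to a loaded pipeline."""
    if offload:
        # Moves each sub-model to the GPU only while it is in use.
        pipe.enable_model_cpu_offload()
    if slice_attention:
        # Computes attention in chunks instead of one large matmul.
        pipe.enable_attention_slicing()
    if tile_vae:
        # Decodes the latent in tiles; important for large upscales.
        pipe.enable_vae_tiling()
    return pipe
```

Usage would be something like `apply_memory_optimizations(StableDiffusionXLPipeline.from_pretrained(...))` once a SUPIR pipeline exists.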

Open source status

  • The model implementation is available.
  • The model weights are available (only relevant if the addition is not a scheduler).

Provide useful links for the implementation

No response

Activity

nxbringr

nxbringr commented on Mar 5, 2024

@nxbringr
Contributor

Hey @DN6, can I please work on this?

yiyixuxu

yiyixuxu commented on Mar 5, 2024

@yiyixuxu
Collaborator

@ihkap11 hey! sure!

Bhavay-2001

Bhavay-2001 commented on Mar 18, 2024

@Bhavay-2001
Contributor

Hi @yiyixuxu, is anyone working on this? Can I also contribute? Please let me know how I may proceed.

nxbringr

nxbringr commented on Mar 18, 2024

@nxbringr
Contributor

Hey @Bhavay-2001, I'm currently working on this. Will post the PR here soon.
I can tag you on the PR if there is something I need help with :)

Bhavay-2001

Bhavay-2001 commented on Mar 18, 2024

@Bhavay-2001
Contributor

ok great. Pls let me know.
Thanks

landmann

landmann commented on Mar 29, 2024

@landmann
Contributor

@ihkap11 how's it going 😁 I'd loooooove to have this

nxbringr

nxbringr commented on Mar 29, 2024

@nxbringr
Contributor

Hey @landmann, I'll post the PR this weekend and tag you if you want to contribute to it :) Apologies for the delay; it's my first new-model implementation PR.

landmann

landmann commented on Mar 29, 2024

@landmann
Contributor

You a real champ 🙌
Happy Friday, my gal/dude!

nxbringr

nxbringr commented on Mar 31, 2024

@nxbringr
Contributor

Initial Update:

  • Understood the paper (paper highlights below)
  • Currently defining the paper components that will become diffusers artefacts (WIP: breaking down the SUPIR code)

Paper Insights

Motivation:

  • IR methods based on generative priors leverage powerful pre-trained generative models to introduce high-quality generation and prior knowledge into IR, bringing significant progress in the
    perceptual quality and intelligence of IR results.
  • Continuously enhancing the capabilities of the generative prior is key to achieving more intelligent IR results, with model scaling being a crucial and effective approach.
  • The authors propose scaling up generative priors and training data to address these limitations.

Architecture Overview:

  1. Generative Prior: The authors choose SDXL (Stable Diffusion XL) as the backbone for their generative prior due to its high-resolution image generation capability without hierarchical design.

  2. Degradation-Robust Encoder: They fine-tune the SDXL encoder to make it robust to degradation, enabling effective mapping of low-quality (LQ) images to the latent space.

  3. Large-Scale Adaptor: The authors designed a new adaptor with network trimming and a ZeroSFT connector to control the generation process at the pixel level.

    Issues with existing adaptors:
    • LoRA limits generation but struggles with LQ image control.
    • T2I adapters lack the capacity for effective LQ image content identification.
    • Directly copying ControlNet is challenging at the SDXL model scale.
    1. Network Trimming: Modify the adaptor architecture by trimming half of the ViT blocks in each encoder block (of SDXL) to balance network capacity against computational feasibility.
    2. Redesigning the Connector: The introduced ZeroSFT module is built upon zero convolution and incorporates an additional spatial feature transfer (SFT) operation and group normalization.
    Why do we need this?
    • The authors note that while SDXL's generative capacity delivers excellent visual effects, it also makes precise pixel-level control challenging.
    • ControlNet uses zero convolution for generation guidance, but relying solely on residuals is insufficient for the level of control required by IR tasks.
  4. Multi-Modality Language Guidance: They incorporate the LLaVA multi-modal large language model to understand image content and guide the restoration process using textual prompts.

  5. Restoration-Guided Sampling: They propose a modified sampling method to selectively guide the prediction results to be close to the LQ image, ensuring fidelity in the restored image.
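As a rough sketch of the restoration-guided sampling idea in point 5 (the blend schedule below is illustrative, not the paper's exact formula; latents are represented as flat lists of floats for simplicity):

```python
def restoration_guided_step(pred, lq_latent, t, num_steps, k=0.5):
    """Pull the denoised prediction toward the LQ latent.

    t counts down from num_steps to 0, so the blend weight decays over the
    sampling trajectory: early (high-noise) steps are anchored to the LQ
    structure for fidelity, while late steps are left free to add detail.
    """
    w = k * (t / num_steps)
    return [(1 - w) * p + w * lq for p, lq in zip(pred, lq_latent)]
```

At `t = num_steps` the prediction is blended half-way toward the LQ latent (with the default `k = 0.5`); by `t = 0` the guidance has vanished entirely.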

Thoughts on implementation details:

  • The trainable components are the degradation-robust encoder and the trimmed ControlNet.
  • Extend the SDXL class from Diffusers and use SDXL checkpoint = sd_xl_base_1.0_0.9vae.safetensors as base pre-trained generative prior.
  • The SUPIR model will first load pre-trained weights from the SDXL checkpoint, then it will load SUPIR-specific weights, which include the modifications and additions made to adapt the SDXL model for image restoration tasks.
  • Trimmed ControlNet encoder which trims half of the ViT blocks from each encoder block. (Todo: Figure out where to make this change)
  • In the SUPIR model, SDXL (Stable Diffusion XL) is used as the backbone for the generative prior, and the GLVControl and LightGLVUNet modules are used as the adaptor to guide the SDXL model for image restoration. Todo: Convert to Diffusers Artifact
  • A dummy sketch might look like this (GLVControl and LightGLVUNet are the SUPIR modules still to be ported):
import torch.nn as nn
from diffusers import StableDiffusionXLPipeline


class SUPIRModel(nn.Module):
    def __init__(self, sdxl_model_path):
        super().__init__()
        self.sdxl_pipeline = StableDiffusionXLPipeline.from_pretrained(sdxl_model_path)
        self.glv_control = GLVControl(in_channels=3, out_channels=64, context_dim=128)
        self.light_glv_unet = LightGLVUNet(in_channels=3, out_channels=3)

    def forward(self, lq_image, context, num_inference_steps=50):
        # Generate the control signal from the LQ image using GLVControl
        control_signal = self.glv_control(lq_image, context)

        # Use the SDXL pipeline for guided diffusion
        # (note: the stock SDXL pipeline has no control_image argument;
        # this is where a SUPIR-specific pipeline would diverge)
        restored_image = self.sdxl_pipeline(
            prompt="",
            image=lq_image,
            control_image=control_signal,
            num_inference_steps=num_inference_steps,
        ).images[0]

        # Refine the restored image using LightGLVUNet
        refined_image = self.light_glv_unet(restored_image, control_signal)

        return refined_image
  • ZeroSFT acts as a connector. Todo: Convert to Diffusers Artifact
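For illustration, a minimal PyTorch sketch of a ZeroSFT-style connector as described in the paper summary above, combining a zero-initialised convolution, an SFT (scale/shift) branch, and group normalization (the real SUPIR module will differ in detail):

```python
import torch
import torch.nn as nn


class ZeroSFT(nn.Module):
    """Sketch of a ZeroSFT-style connector (not SUPIR's actual implementation).

    A zero-initialised 1x1 convolution produces the additive residual (as in
    ControlNet), while an SFT branch predicts a per-channel scale and shift
    that modulate the group-normalised UNet features.
    """

    def __init__(self, cond_channels, feat_channels, groups=32):
        super().__init__()
        self.norm = nn.GroupNorm(min(groups, feat_channels), feat_channels)
        # Zero-initialised conv: contributes nothing at the start of training.
        self.zero_conv = nn.Conv2d(cond_channels, feat_channels, 1)
        nn.init.zeros_(self.zero_conv.weight)
        nn.init.zeros_(self.zero_conv.bias)
        # SFT branch: predicts scale (gamma) and shift (beta) from the condition.
        self.sft = nn.Conv2d(cond_channels, feat_channels * 2, 1)

    def forward(self, feat, cond):
        gamma, beta = self.sft(cond).chunk(2, dim=1)
        modulated = self.norm(feat) * (1 + gamma) + beta
        return modulated + self.zero_conv(cond)
```

The zero initialisation means the connector starts out as (nearly) an identity perturbation, so training begins from the pre-trained SDXL behaviour, which is the same motivation as ControlNet's zero convolution.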

To cover later:

  • LLaVA for multi-modality language guidance.

I'm currently in the process of breaking down SUPIR code into diffusers artefacts and figuring out optimization techniques to make it compatible with low-resource GPUs.

Feel free to correct me or start a discussion on this thread. Let me know if you wish to collaborate, I'm happy to set up discussions and work on it together :).

landmann

landmann commented on Apr 1, 2024

@landmann
Contributor

Looks fantastic! How far along did you get, @ihkap11 ?

Btw, a good reference for the input parameters is here: https://replicate.com/cjwbw/supir?prediction=32glqstbvpjjppxmvcge5gsncu

landmann

landmann commented on Apr 3, 2024

@landmann
Contributor

@ihkap11 how are you doing? Which part are you stuck on?


gitlabspy

gitlabspy commented on Jun 21, 2024

@gitlabspy

Any progress👀?

elismasilva

elismasilva commented on Jun 27, 2024

@elismasilva
Contributor

hi @ihkap11, have news?

nxbringr

nxbringr commented on Jun 27, 2024

@nxbringr
Contributor

Hey! I tried but couldn't get this working. Feel free to take over the implementation for this Issue.

elismasilva

elismasilva commented on Jun 27, 2024

@elismasilva
Contributor

Hey! I tried but couldn't get this working. Feel free to take over the implementation for this Issue.

But do you have a branch where we can continue where you left off? I might try this after I finish a project I'm involved with.

sayakpaul

sayakpaul commented on Jun 29, 2024

@sayakpaul
Member
asomoza

asomoza commented on Jun 30, 2024

@asomoza
Member

Just in case: this is not an easy task. Everything is in the sgm format, so there's a lot of conversion involved, and it requires a deep understanding of both the original code and diffusers.

Probably the best choice here is to start as a research project and convert all the sgm code to diffusers, and then when stuck, get help from the maintainers and the community.
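To give a flavour of the conversion work involved, here is the kind of key renaming an sgm-to-diffusers conversion script does (the prefixes below follow common sgm checkpoint conventions, but the real conversion has many more rules, and SUPIR-specific keys are not covered here):

```python
# Illustrative prefix map only; real conversion scripts also reshape and
# split individual tensors, not just rename keys.
SGM_TO_DIFFUSERS_PREFIXES = {
    "model.diffusion_model.": "unet.",
    "first_stage_model.": "vae.",
    "conditioner.embedders.0.transformer.": "text_encoder.",
}


def remap_sgm_keys(state_dict):
    """Rename sgm-style checkpoint keys to diffusers-style prefixes."""
    out = {}
    for key, value in state_dict.items():
        for old, new in SGM_TO_DIFFUSERS_PREFIXES.items():
            if key.startswith(old):
                key = new + key[len(old):]
                break
        out[key] = value
    return out
```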

zdxpan

zdxpan commented on Sep 4, 2024

@zdxpan

according to the paper and the ComfyUI implementation, I will implement the following:

  1. SUPIR model loader -> SUPIR_MODEL, SUPIR_VAE
  • SUPIR_VAE = vae.from_config and load the converted state_dict
  2. SUPIR first-stage denoiser: takes the low-quality image in, and outputs a blurred/smoothed image and its latent
  • this stage includes a SUPIR VAE with a vae-encoder and vae-decoder
  • SUPIR_VAE.encoder(LQ_image) -> supir_latent -> SUPIR_VAE.decoder(supir_latent)
  3. SUPIR ControlNet: takes latents and timesteps in, and generates ControlNet residuals (downsamples and midsample) out
  • class trim_controlnet(Controlnet_normal): not implemented yet
  4. A hacked UNet which modifies the connector of each down and up block to use ZeroSFT
  • replace the UNet zero_conv connector
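As a sketch of step 2, the first-stage encode/decode round trip, assuming a diffusers AutoencoderKL-style interface (illustrative, not the actual SUPIR code):

```python
import torch


@torch.no_grad()
def first_stage_denoise(vae, lq_image):
    """Encode the LQ image with the (degradation-robust) VAE and immediately
    decode it. Because the encoder was fine-tuned to be robust to degradation,
    this round trip smooths artefacts out of the image before the diffusion
    stage. `vae` is assumed to follow the diffusers AutoencoderKL interface.
    """
    latent = vae.encode(lq_image).latent_dist.sample()
    smoothed = vae.decode(latent).sample
    return smoothed, latent
```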
elismasilva

elismasilva commented on Jun 9, 2025

@elismasilva
Contributor

hi @DN6 I already did it and it is working fine, see results: https://imgsli.com/Mzg2NTUx/0/1. But then I saw that they changed the license and it seems to be very restrictive. I don't know how to deal with this case. I even sent an email to the owner, but it's been a month with no response.

asomoza

asomoza commented on Jun 9, 2025

@asomoza
Member

They changed the license from MIT to "something unnamed" literally two days after the original post here, didn't see that.

Don't know why they keep changing it to a custom one, probably because there's some other party involved, but basically, this license is the same as Flux, you can use it and change it freely if it's non-commercial which is fine, we have other models with the same type of custom license.

AFAIK this is still the SOTA upscaler model so it would be nice to have it.

elismasilva

elismasilva commented on Jun 9, 2025

@elismasilva
Contributor

They changed the license from MIT to "something unnamed" literally two days after the original post here, didn't see that.

Don't know why they keep changing it to a custom one, probably because there's some other party involved, but basically, this license is the same as Flux, you can use it and change it freely if it's non-commercial which is fine, we have other models with the same type of custom license.

AFAIK this is still the SOTA upscaler model so it would be nice to have it.

they have this on their new license: "b. Proprietary Neural Network Components: Notwithstanding the open-source license granted in Section 2(a) for open-source components, the Licensor retains all rights, title, and interest in and to the proprietary Neural Network Components developed and trained by the Licensor. The license granted herein does not include rights to use, copy, modify, distribute, or create derivative works of these proprietary Neural Network Components without obtaining express written permission from the Licensor."

The license is a bit contradictory: it says parts of the code are open source, yet at the same time another part cannot be changed. I wonder if, in the case where I converted the model to .safetensors, I would be violating this clause.

asomoza

asomoza commented on Jun 9, 2025

@asomoza
Member

I understand that part (I'm no expert) as referring to derivative works only: it's a clause limiting the use of the model to create something similar in order to circumvent the license and then use it commercially, like using its outputs to train a similar model.

When integrating it with diffusers and changing the format for diffusers, and of course copying the license and referencing them as copyright owners, you're not using it as a derivative work but as the original model.

On the other hand, if we interpret it as covering everything and not just derivative works, it invalidates everything before it, and at that point they should just remove "open source" from it and say that any use of the model needs their permission.

But it would still be a lot better to get permission directly from the authors, since it's not an easy license to understand.

elismasilva

elismasilva commented on Jun 9, 2025

@elismasilva
Contributor

I agree, it's not easy to understand this license. Let's see if anyone else has any thoughts on this. I don't think the authors will answer me as it's been a long time.

vladmandic

vladmandic commented on Jun 9, 2025

@vladmandic
Contributor

imo, it's a non-standard license, but it's pretty clear, since it explicitly defines "neural network architecture, weights and biases" several times, not the model file and/or packaging as such. conversion to safetensors is NOT a derivative work, since it fully maintains the original "architecture, weights and biases".

as an example, under this license, converting to something like gguf with pre-quantized weights would be a no-no without explicit permission.

elismasilva

elismasilva commented on Jun 10, 2025

@elismasilva
Contributor

imo, it's a non-standard license, but it's pretty clear, since it explicitly defines "neural network architecture, weights and biases" several times, not the model file and/or packaging as such. conversion to safetensors is NOT a derivative work, since it fully maintains the original "architecture, weights and biases".

as an example, under this license, converting to something like gguf with pre-quantized weights would be a no-no without explicit permission.

Good point. And in this case I saw a version of the model files that seems to have been pruned (it was only 2GB), or they converted it to fp16; I don't know for sure. In my case I just split the modules into diffusers format to be loaded as pretrained, so at most some keys had to be adapted in the case of the encoder; even so, there was no change in the content.
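For illustration, splitting a flat checkpoint into per-module state dicts is roughly this (the prefixes are examples of diffusers' per-subfolder layout; the actual SUPIR module names may differ):

```python
def split_by_module(state_dict, prefixes=("unet.", "vae.", "text_encoder.")):
    """Split one flat checkpoint into per-module state dicts, as done when
    repackaging a monolithic checkpoint into diffusers' per-subfolder layout.
    Keys matching no prefix are returned separately for inspection.
    """
    modules = {p.rstrip("."): {} for p in prefixes}
    leftover = {}
    for key, value in state_dict.items():
        for p in prefixes:
            if key.startswith(p):
                modules[p.rstrip(".")][key[len(p):]] = value
                break
        else:
            leftover[key] = value
    return modules, leftover
```

Since only the key names change and the tensors themselves are untouched, this kind of split preserves the original "architecture, weights and biases" exactly.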



          Add SUPIR Upscaler · Issue #7219 · huggingface/diffusers