Image to image style transfer with LoRA fine-tuning on diffusion model

trapoom555/Diffusion-LoRA-Style-Transfer


Diffusion LoRA Style Transfer

This project tackles the neural style transfer problem with a diffusion model fine-tuned using the LoRA technique. We demonstrate that style transfer can be achieved with under one minute of fine-tuning on a single RTX 3090 using this method.

Motivation

Neural style transfer methods such as CycleGAN achieve high-quality results, but they require long training times and heavy memory usage because four neural networks must be optimized from scratch. This project explores an alternative based on diffusion models: by leveraging their large-scale pretraining, a simpler training objective, and a low memory footprint, we achieve efficient style transfer with much faster convergence.

Methodology

We first fine-tune the diffusion model on Monet paintings, using each painting's title as its caption. Inspired by DreamBooth, we prepend "A Monet painting," as an identifier to every caption so that this phrase becomes associated with the Monet style. The model is fine-tuned with the LoRA parameter-efficient fine-tuning method.
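The caption construction can be sketched as follows. This is a minimal illustration of the identifier-prepending idea, not the actual `data/caption.py` script; the titles and helper name are illustrative.

```python
# Hedged sketch: build training captions by prepending the style
# identifier phrase to each painting title (DreamBooth-style).
def make_caption(title: str) -> str:
    """Associate the identifier phrase with the Monet style."""
    return f"A Monet painting, {title}"

# Illustrative titles; the real captions come from the WikiArt metadata.
captions = [make_caption(t) for t in ["Water Lilies", "Impression, Sunrise"]]
```

During LoRA fine-tuning, every training image is paired with such a caption, so sampling with the prompt "A Monet painting, ..." later steers the model toward the learned style.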

Once the model has learned the target style distribution, we diffuse the latent vector of the content image forward for N steps and let the model denoise it back. We designed this pipeline based on the insight from SDEdit that the reverse SDE can be solved from any intermediate timestep to modify the original image. To retain the details of the original image, we additionally add IP-Adapter as an image condition to the denoiser.
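The forward-diffusion step of this SDEdit-style pipeline can be sketched as below. This is a minimal NumPy sketch assuming a DDPM-like linear noise schedule; the denoiser itself (and the IP-Adapter conditioning) is omitted and the schedule values are illustrative.

```python
import numpy as np

def diffuse_to_step(latent, t, alphas_cumprod, rng):
    """Noise a clean latent so it matches the forward marginal at timestep t:
    x_t = sqrt(a_bar_t) * x_0 + sqrt(1 - a_bar_t) * eps."""
    a_bar = alphas_cumprod[t]
    noise = rng.standard_normal(latent.shape)
    return np.sqrt(a_bar) * latent + np.sqrt(1.0 - a_bar) * noise

# Illustrative DDPM-style linear beta schedule over 1000 timesteps.
betas = np.linspace(1e-4, 0.02, 1000)
alphas_cumprod = np.cumprod(1.0 - betas)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((4, 8, 8))  # stand-in for an encoded image latent
x_t = diffuse_to_step(x0, t=400, alphas_cumprod=alphas_cumprod, rng=rng)
# The fine-tuned denoiser would then run the reverse process from t=400
# rather than from pure noise, which keeps the output close to the input.
```

Starting the reverse process from an intermediate t (rather than the final timestep) is what lets the output trade off between preserving the content image and matching the Monet distribution.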

Data

We use the Monet painting dataset from WikiArt as our experimental dataset. It can be downloaded here.

How to use this repository

Monet dataset caption generation

python ./data/caption.py

LoRA fine-tuning

./scripts/train.sh

Transfer style to content image

./scripts/infer_img.sh

Apply style to all content images in a folder

./scripts/infer_img.sh

Hyperparameter Tips

The outcome depends largely on two hyperparameters: --image_cond_scale and --strength. The first determines how strongly the original image conditions the output; to keep the output closer to the original image, set this value high (close to 1.0). The second sets how many steps we diffuse the latent vector: the higher it is, the closer the output is to the Monet distribution, but if the strength is too high, the outcome will drift far from the original image.
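In diffusers-style img2img pipelines, strength typically determines the fraction of denoising steps actually run. A minimal sketch, assuming that convention (the function name is illustrative, not part of this repository):

```python
def denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Number of reverse-diffusion steps actually run for a given strength,
    following the common diffusers img2img convention (an assumption here)."""
    return min(int(num_inference_steps * strength), num_inference_steps)

# strength close to 1.0 regenerates almost from pure noise (strong Monet
# style, weak content preservation); small strengths stay near the input.
print(denoising_steps(50, 0.6))  # 30 of 50 steps: a moderate restyling
```

This is why raising --strength pulls the result toward the Monet distribution: more of the reverse process is spent under the fine-tuned model, leaving less of the original latent intact.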

Footnote

This work is one of the experiments in the final project of the Big Data Intelligence Fall 2024 course at Tsinghua University 🟣. We would like to express our sincere gratitude to this course!
