This project tackles the neural style transfer problem with a diffusion model fine-tuned using the LoRA technique. We demonstrate that style transfer can be achieved with under one minute of fine-tuning on a single RTX 3090.
Neural style transfer methods such as CycleGAN achieve high performance. However, they require extensive training time and memory because four neural networks must be optimized from scratch. This project explores an alternative based on diffusion models: by leveraging their large-scale pretraining, we obtain faster convergence with a simpler objective and a lower memory footprint, enabling efficient style transfer.
We first fine-tune the diffusion model on Monet paintings, using the corresponding painting titles as captions. Inspired by DreamBooth, we prepend the identifier "A Monet painting," to each caption so that the model associates this phrase with the Monet style. Fine-tuning is done with the LoRA parameter-efficient fine-tuning method.
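To illustrate the idea behind LoRA (this is a toy NumPy sketch, not the actual training code; all names here are ours): a frozen pretrained weight `W` is left untouched, and only a low-rank pair of matrices `A` and `B` is trained, whose scaled product is added as a residual.

```python
import numpy as np

# Toy sketch of the LoRA idea: adapt a frozen weight W with a
# trainable low-rank update, W' = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 8, 8, 2, 4

W = rng.standard_normal((d_out, d_in))  # frozen pretrained weight
A = rng.standard_normal((r, d_in))      # trainable down-projection
B = np.zeros((d_out, r))                # trainable up-projection, zero-initialized

def lora_forward(x):
    # Because B is initialized to zero, the adapted layer exactly
    # matches the pretrained layer before any training step.
    return (W + (alpha / r) * B @ A) @ x

x = rng.standard_normal(d_in)
assert np.allclose(lora_forward(x), W @ x)  # identity at initialization
```

Since only `A` and `B` (rank `r` much smaller than the weight dimensions) receive gradients, the memory and time cost of fine-tuning drops sharply compared with updating the full model.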
Once the model has learned the target style distribution, we use it to denoise a latent vector that has been diffused for N steps. We designed this pipeline based on the insight from SDEdit that the reverse SDE can be solved from any intermediate timestep to modify the original image. To retain details of the original image, we further add an IP-Adapter as an image condition to the denoiser.
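The partial-denoising schedule can be sketched as follows (a minimal illustration in the style of common img2img pipelines; the function name and exact indexing are our assumptions, not the project's code): with a strength in [0, 1], the first part of the reverse process is skipped and only the last `int(strength * num_steps)` steps are denoised.

```python
# Hedged sketch of the SDEdit-style schedule: diffuse the latent
# partway, then run only the tail of the reverse process.
def denoising_steps(num_inference_steps: int, strength: float) -> list:
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    t_start = num_inference_steps - init_timestep
    # Timesteps count down from num_inference_steps - 1 to 0;
    # we keep only the final init_timestep entries.
    return list(range(num_inference_steps - 1, -1, -1))[t_start:]

print(denoising_steps(10, 0.3))  # -> [2, 1, 0]
```

A larger strength starts the reverse process from a noisier latent, so the output moves further toward the learned style distribution and further from the source image.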
We use the Monet painting dataset from WikiArt as our experimental dataset. It can be downloaded here.
python ./data/caption.py
./scripts/train.sh
./scripts/infer_img.sh
The outcomes largely depend on two hyperparameters: `--image_cond_scale` and `--strength`. The first determines how strongly the original image conditions the output; set it high (close to 1.0) if you want the output to stay close to the original image. The second indicates for how many steps we diffuse the latent vector: the higher this value, the closer the output is to the Monet distribution, but if the strength is too high, the outcome will stray far from the original image.
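As a toy illustration of how `--image_cond_scale` is assumed to act (this mirrors IP-Adapter's decoupled cross-attention, but is not the project's actual code): the image-conditioned attention output is scaled and added to the text-conditioned one, so values near 1.0 pull the result toward the original image.

```python
import numpy as np

# Toy sketch: blend text and image cross-attention outputs,
# with image_cond_scale weighting the image branch.
def combine(text_attn, image_attn, image_cond_scale):
    return text_attn + image_cond_scale * image_attn

text_attn = np.array([0.2, -0.1])
image_attn = np.array([1.0, 1.0])
print(combine(text_attn, image_attn, 0.0))  # text conditioning only
print(combine(text_attn, image_attn, 1.0))  # full image conditioning
```

At scale 0.0 the image condition is ignored entirely, which is why low values give the model more freedom to restyle the content.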
This work is one of the experiments in the final project of the Big Data Intelligence (Fall 2024) course at Tsinghua University 🟣. We would like to express our sincere gratitude to this course!