In this example, we'll train a Lumina2 LoRA or run a full model fine-tune.
Lumina2 is a 2B-parameter model, making it much more accessible than larger models like Flux or SD3. When training a rank-16 LoRA, it uses:
- Approximately 12-14GB VRAM for LoRA training
- Approximately 16-20GB VRAM for full model fine-tuning
- About 20-30GB of system RAM during startup
You'll need:
- Minimum: A single RTX 3060 12GB or RTX 4060 Ti 16GB
- Recommended: RTX 3090, RTX 4090, or A100 for faster training
- System RAM: At least 32GB recommended
Make sure that you have Python installed; SimpleTuner works well with 3.10 through 3.12.

You can check this by running:

```bash
python --version
```

If you don't have Python 3.12 installed on Ubuntu, you can try the following:

```bash
apt -y install python3.12 python3.12-venv
```

For Vast, RunPod, and TensorDock (among others), the following will work on a CUDA 12.2-12.8 image:

```bash
apt -y install nvidia-cuda-toolkit
```

Install SimpleTuner via pip:

```bash
pip install simpletuner[cuda]
```

For manual installation or development setup, see the installation documentation.
To run SimpleTuner, you will need to set up a configuration file, the dataset and model directories, and a dataloader configuration file.
Copy config/config.json.example to config/config.json:
```bash
cp config/config.json.example config/config.json
```

There, you will need to modify the following variables:

- `model_type` - Set this to `lora` for LoRA training or `full` for full fine-tuning.
- `model_family` - Set this to `lumina2`.
- `output_dir` - Set this to the directory where you want to store your checkpoints and validation images. It's recommended to use a full path here.
- `train_batch_size` - Can be 1-4 depending on your GPU memory and dataset size.
- `validation_resolution` - Lumina2 supports multiple resolutions. Common options: `1024x1024`, `512x512`, `768x768`.
- `validation_guidance` - Lumina2 uses classifier-free guidance. Values of 3.5-7.0 work well.
- `validation_num_inference_steps` - 20-30 steps work well for Lumina2.
- `gradient_accumulation_steps` - Can be used to simulate larger batch sizes. A value of 2-4 works well.
- `optimizer` - `adamw_bf16` is recommended. `lion` and `optimi-stableadamw` also work well.
- `mixed_precision` - Keep this as `bf16` for best results.
- `gradient_checkpointing` - Set to `true` to save VRAM.
- `learning_rate` - For LoRA: `1e-4` to `5e-5`. For full fine-tuning: `1e-5` to `1e-6`.
This goes into `config.json`:

```json
{
  "base_model_precision": "int8-torchao",
  "checkpoint_step_interval": 50,
  "data_backend_config": "config/lumina2/multidatabackend.json",
  "disable_bucket_pruning": true,
  "eval_steps_interval": 50,
  "evaluation_type": "clip",
  "flow_schedule_auto_shift": true,
  "gradient_checkpointing": true,
  "hub_model_id": "lumina2-lora",
  "learning_rate": 1e-4,
  "lora_alpha": 16,
  "lora_rank": 16,
  "lora_type": "standard",
  "lr_scheduler": "constant",
  "max_train_steps": 400000,
  "model_family": "lumina2",
  "model_type": "lora",
  "num_train_epochs": 0,
  "optimizer": "adamw_bf16",
  "output_dir": "output/lumina2",
  "push_checkpoints_to_hub": true,
  "push_to_hub": true,
  "quantize_via": "cpu",
  "report_to": "wandb",
  "seed": 42,
  "tracker_project_name": "lumina2-training",
  "tracker_run_name": "lumina2-lora",
  "train_batch_size": 4,
  "use_ema": true,
  "vae_batch_size": 1,
  "validation_disable_unconditional": true,
  "validation_guidance": 4.0,
  "validation_guidance_rescale": 0.0,
  "validation_negative_prompt": "ugly, cropped, blurry, low-quality, mediocre average",
  "validation_num_inference_steps": 40,
  "validation_prompt": "A photo-realistic image of a cat",
  "validation_prompt_library": false,
  "validation_resolution": "1024x1024",
  "validation_seed": 42,
  "validation_step_interval": 50
}
```

For LyCORIS training, switch `lora_type` to `lycoris`.
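LyCORIS additionally reads its own settings file, referenced via the `lycoris_config` option. The fragment below is an illustrative example for the LoKr algorithm; the exact path, keys, and defaults may differ in your SimpleTuner version, so treat it as a starting point rather than a canonical config:

```json
{
  "algo": "lokr",
  "multiplier": 1.0,
  "linear_dim": 10000,
  "linear_alpha": 1,
  "factor": 16
}
```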
Inside `config/config.json`, `validation_prompt` is the primary validation prompt. Additionally, you can create a prompt library file:
```json
{
  "portrait": "a high-quality portrait photograph with natural lighting",
  "landscape": "a breathtaking landscape photograph with dramatic lighting",
  "artistic": "an artistic rendering with vibrant colors and creative composition",
  "detailed": "a highly detailed image with sharp focus and rich textures",
  "stylized": "a stylized illustration with unique artistic flair"
}
```

Add to your config:
```json
{
  "--user_prompt_library": "config/user_prompt_library.json"
}
```

Lumina2 benefits from high-quality training data. Create the dataloader configuration referenced by `data_backend_config` (e.g. `config/lumina2/multidatabackend.json`):
```json
[
  {
    "id": "lumina2-training",
    "type": "local",
    "crop": true,
    "crop_aspect": "square",
    "crop_style": "center",
    "resolution": 1024,
    "minimum_image_size": 512,
    "maximum_image_size": 2048,
    "target_downsample_size": 1024,
    "resolution_type": "pixel_area",
    "cache_dir_vae": "cache/vae/lumina2/training",
    "instance_data_dir": "/datasets/training",
    "caption_strategy": "textfile",
    "metadata_backend": "discovery"
  },
  {
    "id": "text-embeds",
    "type": "local",
    "dataset_type": "text_embeds",
    "default": true,
    "cache_dir": "cache/text/lumina2",
    "disabled": false,
    "write_batch_size": 128
  }
]
```

Create your dataset directory. Be sure to update this path with your actual location.
```bash
mkdir -p /datasets/training
# Place your images and caption files in /datasets/training/
```

Caption files should have the same name as the image with a `.txt` extension.
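Since `caption_strategy` is `textfile`, every image must have a sibling caption file. A quick stdlib check for missing captions might look like this (the dataset path and image extensions are assumptions; adjust them to your setup):

```python
from pathlib import Path

def find_missing_captions(dataset_dir, extensions=(".png", ".jpg", ".jpeg", ".webp")):
    """Return image files under dataset_dir that lack a matching .txt caption."""
    missing = []
    for image in sorted(Path(dataset_dir).rglob("*")):
        if image.suffix.lower() in extensions:
            caption = image.with_suffix(".txt")
            if not caption.exists():
                missing.append(image)
    return missing

# Report any images without captions before starting training.
for image in find_missing_captions("/datasets/training"):
    print(f"missing caption: {image}")
```

Running this before training avoids discovering captioning gaps mid-way through the caching phase.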
SimpleTuner has optional tracker support, primarily focused on Weights & Biases. You can disable this with `report_to=none`.

To enable wandb, run the following command:

```bash
wandb login
```

To push checkpoints to Hugging Face Hub, ensure you are logged in:

```bash
huggingface-cli login
```

From the SimpleTuner directory, you have several options to start training.

Option 1 (Recommended - pip install):

```bash
pip install simpletuner[cuda]
simpletuner train
```

Option 2 (Git clone method):

```bash
simpletuner train
```

Option 3 (Legacy method - still works):

```bash
./train.sh
```

This will begin the text embed and VAE output caching to disk.
- For LoRA, start with `1e-4` and adjust based on results
- Lumina2 trains quickly, so monitor early iterations closely
- Ranks 8-32 work well for most use cases; 64-128 may require closer monitoring, and 256-512 may be useful for training new tasks into the model
- For full fine-tuning, use lower learning rates: `1e-5` to `5e-6`
- Consider using EMA (Exponential Moving Average) for stability
- Gradient clipping (`max_grad_norm`) of 1.0 is recommended
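EMA maintains a shadow copy of the weights, updated each optimizer step as `shadow = decay * shadow + (1 - decay) * weight`, and validation/export then uses the shadow weights. A minimal, framework-free sketch of the idea (SimpleTuner's `use_ema` implementation differs in detail; this is purely illustrative):

```python
class EMA:
    """Minimal exponential moving average over a flat dict of weights."""

    def __init__(self, weights, decay=0.999):
        self.decay = decay
        self.shadow = dict(weights)  # start from the current weights

    def update(self, weights):
        # Standard EMA step: pull the shadow slowly toward the live weights.
        for name, value in weights.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1 - self.decay) * value

# Validate/export with ema.shadow instead of the raw, noisier weights.
ema = EMA({"w": 0.0}, decay=0.9)
ema.update({"w": 1.0})
print(ema.shadow["w"])  # ~0.1 after one step
```

The high decay (0.999 by default here) is why EMA smooths out step-to-step noise: any single update moves the shadow only a fraction of the way.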
Lumina2 supports flexible resolutions:
- Training at 1024x1024 provides the best quality
- Mixed resolution training (512px, 768px, 1024px) has not been field-tested for quality impact
- Aspect ratio bucketing works well with Lumina2
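Aspect bucketing with `resolution_type=pixel_area` groups images by aspect ratio and scales each bucket to roughly the same pixel area (here ~1024x1024, about one megapixel). A rough sketch of that computation, for intuition only (not SimpleTuner's exact rounding; the `step` snapping value is an assumption):

```python
import math

def bucket_size(aspect_ratio, resolution=1024, step=64):
    """Pick a (width, height) with ~resolution**2 pixels for a W/H aspect ratio."""
    target_area = resolution * resolution
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio
    # Snap both edges to a multiple of `step`, since latent models need
    # dimensions divisible by the VAE/patch downscale factor.
    def snap(x):
        return max(step, round(x / step) * step)
    return snap(width), snap(height)

print(bucket_size(1.0))     # (1024, 1024)
print(bucket_size(16 / 9))  # (1344, 768)
```

This is why mixed-aspect datasets still train at a consistent memory cost: every bucket lands near the same total pixel count.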
Due to Lumina2's efficient 2B parameter size:
- LoRA training often converges in 500-2000 steps
- Full fine-tuning may need 2000-5000 steps
- Monitor validation images frequently as the model trains quickly
- Model converging too quickly: Lower the learning rate, switch from Lion optimiser to AdamW
- Artifacts in generated images: Ensure high-quality training data and consider reducing learning rate
- Out of memory: Enable gradient checkpointing and reduce batch size
- Overfits easily: Use regularisation datasets
Lumina2 models can be used with:
- Diffusers library directly
- ComfyUI with appropriate nodes
- Other inference frameworks supporting Gemma2-based models
- Guidance scale: 4.0-6.0
- Inference steps: 20-50
- Use the same resolution you trained on for best results
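The guidance scale above is classifier-free guidance: each denoising step combines an unconditional and a prompt-conditioned prediction as `uncond + g * (cond - uncond)`. A scalar sketch of that arithmetic (illustrative only; real pipelines apply it to full latent tensors):

```python
def apply_cfg(uncond, cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the prompt-conditioned one."""
    return uncond + guidance_scale * (cond - uncond)

# guidance_scale = 1.0 just returns the conditional prediction;
# larger values exaggerate the prompt's influence on each step.
print(apply_cfg(0.0, 1.0, 1.0))  # 1.0
print(apply_cfg(0.0, 1.0, 4.0))  # 4.0
```

This is also why very high guidance values can cause artifacts: the prediction is pushed well outside the range of either branch.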
- Fast training due to 2B parameter size
- Good quality-to-size ratio
- Supports various training modes (LoRA, LyCORIS, full)
- Efficient memory usage
- No ControlNet support yet
- Limited to text-to-image generation
- Requires high-quality captions for best results
Unlike larger models, Lumina2 typically doesn't require:
- Model quantization
- Extreme memory optimization techniques
- Complex mixed precision strategies