In this example, we'll train a Lumina2 LoRA or run a full model fine-tune.
Lumina2 is a 2B-parameter model, making it much more accessible than larger models like Flux or SD3. When training a rank-16 LoRA, it uses:
- Approximately 12-14GB VRAM for LoRA training
- Approximately 16-20GB VRAM for full model fine-tuning
- About 20-30GB of system RAM during startup
You'll need:
- Minimum: A single RTX 3060 12GB or RTX 4060 Ti 16GB
- Recommended: RTX 3090, RTX 4090, or A100 for faster training
- System RAM: At least 32GB recommended
Make sure that you have Python installed; SimpleTuner works well with 3.10 through 3.12.

You can check this by running:

```bash
python --version
```

If you don't have Python 3.12 installed on Ubuntu, you can try the following:

```bash
apt -y install python3.12 python3.12-venv
```

For Vast, RunPod, and TensorDock (among others), the following will work on a CUDA 12.2-12.8 image:

```bash
apt -y install nvidia-cuda-toolkit
```

Install SimpleTuner via pip:

```bash
pip install simpletuner[cuda]
```

For manual installation or development setup, see the installation documentation.
To run SimpleTuner, you will need to set up a configuration file, the dataset and model directories, and a dataloader configuration file.
Copy config/config.json.example to config/config.json:
```bash
cp config/config.json.example config/config.json
```

There, you will need to modify the following variables:

- `model_type` - Set this to `lora` for LoRA training or `full` for full fine-tuning.
- `model_family` - Set this to `lumina2`.
- `output_dir` - Set this to the directory where you want to store your checkpoints and validation images. It's recommended to use a full path here.
- `train_batch_size` - Can be 1-4 depending on your GPU memory and dataset size.
- `validation_resolution` - Lumina2 supports multiple resolutions. Common options: `1024x1024`, `512x512`, `768x768`.
- `validation_guidance` - Lumina2 uses classifier-free guidance. Values of 3.5-7.0 work well.
- `validation_num_inference_steps` - 20-30 steps work well for Lumina2.
- `gradient_accumulation_steps` - Can be used to simulate larger batch sizes. A value of 2-4 works well.
- `optimizer` - `adamw_bf16` is recommended. `lion` and `optimi-stableadamw` also work well.
- `mixed_precision` - Keep this as `bf16` for best results.
- `gradient_checkpointing` - Set to `true` to save VRAM.
- `learning_rate` - For LoRA: `1e-4` to `5e-5`. For full fine-tuning: `1e-5` to `1e-6`.
This goes into `config.json`:

```json
{
  "base_model_precision": "int8-torchao",
  "checkpoint_step_interval": 50,
  "data_backend_config": "config/lumina2/multidatabackend.json",
  "disable_bucket_pruning": true,
  "eval_steps_interval": 50,
  "evaluation_type": "clip",
  "flow_schedule_auto_shift": true,
  "gradient_checkpointing": true,
  "hub_model_id": "lumina2-lora",
  "learning_rate": 1e-4,
  "lora_alpha": 16,
  "lora_rank": 16,
  "lora_type": "standard",
  "lr_scheduler": "constant",
  "max_train_steps": 400000,
  "model_family": "lumina2",
  "model_type": "lora",
  "num_train_epochs": 0,
  "optimizer": "adamw_bf16",
  "output_dir": "output/lumina2",
  "push_checkpoints_to_hub": true,
  "push_to_hub": true,
  "quantize_via": "cpu",
  "report_to": "wandb",
  "seed": 42,
  "tracker_project_name": "lumina2-training",
  "tracker_run_name": "lumina2-lora",
  "train_batch_size": 4,
  "use_ema": true,
  "vae_batch_size": 1,
  "validation_disable_unconditional": true,
  "validation_guidance": 4.0,
  "validation_guidance_rescale": 0.0,
  "validation_negative_prompt": "ugly, cropped, blurry, low-quality, mediocre average",
  "validation_num_inference_steps": 40,
  "validation_prompt": "A photo-realistic image of a cat",
  "validation_prompt_library": false,
  "validation_resolution": "1024x1024",
  "validation_seed": 42,
  "validation_step_interval": 50
}
```

For LyCORIS training, switch `lora_type` to `lycoris`.
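LyCORIS additionally reads its own settings file, referenced via the `lycoris_config` option. The fragment below is an illustrative example for the LoKr algorithm; the exact path, keys, and defaults may differ in your SimpleTuner version, so treat it as a starting point rather than a canonical config:

```json
{
  "algo": "lokr",
  "multiplier": 1.0,
  "linear_dim": 10000,
  "linear_alpha": 1,
  "factor": 16
}
```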
Inside `config/config.json`, `validation_prompt` is the primary validation prompt. Additionally, you can create a prompt library file:
```json
{
  "portrait": "a high-quality portrait photograph with natural lighting",
  "landscape": "a breathtaking landscape photograph with dramatic lighting",
  "artistic": "an artistic rendering with vibrant colors and creative composition",
  "detailed": "a highly detailed image with sharp focus and rich textures",
  "stylized": "a stylized illustration with unique artistic flair"
}
```

Add to your config:
```json
{
  "--user_prompt_library": "config/user_prompt_library.json"
}
```

Lumina2 benefits from high-quality training data. Create the dataloader configuration referenced by `data_backend_config` (e.g. `config/lumina2/multidatabackend.json`):
```json
[
  {
    "id": "lumina2-training",
    "type": "local",
    "crop": true,
    "crop_aspect": "square",
    "crop_style": "center",
    "resolution": 1024,
    "minimum_image_size": 512,
    "maximum_image_size": 2048,
    "target_downsample_size": 1024,
    "resolution_type": "pixel_area",
    "cache_dir_vae": "cache/vae/lumina2/training",
    "instance_data_dir": "/datasets/training",
    "caption_strategy": "textfile",
    "metadata_backend": "discovery"
  },
  {
    "id": "text-embeds",
    "type": "local",
    "dataset_type": "text_embeds",
    "default": true,
    "cache_dir": "cache/text/lumina2",
    "disabled": false,
    "write_batch_size": 128
  }
]
```

Create your dataset directory. Be sure to update this path with your actual location.
```bash
mkdir -p /datasets/training
# Place your images and caption files in /datasets/training/
```

Caption files should have the same name as the image with a `.txt` extension.
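Since `caption_strategy` is `textfile`, every image must have a sibling caption file. A quick stdlib check for missing captions might look like this (the dataset path and image extensions are assumptions; adjust them to your setup):

```python
from pathlib import Path

def find_missing_captions(dataset_dir, extensions=(".png", ".jpg", ".jpeg", ".webp")):
    """Return image files under dataset_dir that lack a matching .txt caption."""
    missing = []
    for image in sorted(Path(dataset_dir).rglob("*")):
        if image.suffix.lower() in extensions:
            caption = image.with_suffix(".txt")
            if not caption.exists():
                missing.append(image)
    return missing

# Report any images without captions before starting training.
for image in find_missing_captions("/datasets/training"):
    print(f"missing caption: {image}")
```

Running this before training avoids discovering captioning gaps mid-way through the caching phase.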
SimpleTuner has optional tracker support, primarily focused on Weights & Biases. You can disable this with `report_to=none`.

To enable wandb, run the following command:

```bash
wandb login
```

To push checkpoints to Hugging Face Hub, ensure you are logged in:

```bash
huggingface-cli login
```

From the SimpleTuner directory, you have several options to start training.

Option 1 (Recommended - pip install):

```bash
pip install simpletuner[cuda]
simpletuner train
```

Option 2 (Git clone method):

```bash
simpletuner train
```

Option 3 (Legacy method - still works):

```bash
./train.sh
```

This will begin the text embed and VAE output caching to disk.
- For LoRA, start with `1e-4` and adjust based on results
- Lumina2 trains quickly, so monitor early iterations closely
- Ranks 8-32 work well for most use cases; 64-128 may require closer monitoring, and 256-512 may be useful for training new tasks into the model
- For full fine-tuning, use lower learning rates: `1e-5` to `5e-6`
- Consider using EMA (Exponential Moving Average) for stability
- Gradient clipping (`max_grad_norm`) of 1.0 is recommended
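EMA maintains a shadow copy of the weights, updated each optimizer step as `shadow = decay * shadow + (1 - decay) * weight`, and validation/export then uses the shadow weights. A minimal, framework-free sketch of the idea (SimpleTuner's `use_ema` implementation differs in detail; this is purely illustrative):

```python
class EMA:
    """Minimal exponential moving average over a flat dict of weights."""

    def __init__(self, weights, decay=0.999):
        self.decay = decay
        self.shadow = dict(weights)  # start from the current weights

    def update(self, weights):
        # Standard EMA step: pull the shadow slowly toward the live weights.
        for name, value in weights.items():
            self.shadow[name] = self.decay * self.shadow[name] + (1 - self.decay) * value

# Validate/export with ema.shadow instead of the raw, noisier weights.
ema = EMA({"w": 0.0}, decay=0.9)
ema.update({"w": 1.0})
print(ema.shadow["w"])  # ~0.1 after one step
```

The high decay (0.999 by default here) is why EMA smooths out step-to-step noise: any single update moves the shadow only a fraction of the way.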
Lumina2 supports flexible resolutions:
- Training at 1024x1024 provides the best quality
- Mixed resolution training (512px, 768px, 1024px) has not been field-tested for quality impact
- Aspect ratio bucketing works well with Lumina2
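Aspect bucketing with `resolution_type=pixel_area` groups images by aspect ratio and scales each bucket to roughly the same pixel area (here ~1024x1024, about one megapixel). A rough sketch of that computation, for intuition only (not SimpleTuner's exact rounding; the `step` snapping value is an assumption):

```python
import math

def bucket_size(aspect_ratio, resolution=1024, step=64):
    """Pick a (width, height) with ~resolution**2 pixels for a W/H aspect ratio."""
    target_area = resolution * resolution
    width = math.sqrt(target_area * aspect_ratio)
    height = width / aspect_ratio
    # Snap both edges to a multiple of `step`, since latent models need
    # dimensions divisible by the VAE/patch downscale factor.
    def snap(x):
        return max(step, round(x / step) * step)
    return snap(width), snap(height)

print(bucket_size(1.0))     # (1024, 1024)
print(bucket_size(16 / 9))  # (1344, 768)
```

This is why mixed-aspect datasets still train at a consistent memory cost: every bucket lands near the same total pixel count.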
Due to Lumina2's efficient 2B parameter size:
- LoRA training often converges in 500-2000 steps
- Full fine-tuning may need 2000-5000 steps
- Monitor validation images frequently as the model trains quickly
- Model converging too quickly: Lower the learning rate, switch from Lion optimiser to AdamW
- Artifacts in generated images: Ensure high-quality training data and consider reducing learning rate
- Out of memory: Enable gradient checkpointing and reduce batch size
- Overfits easily: Use regularisation datasets
Lumina2 models can be used with:
- Diffusers library directly
- ComfyUI with appropriate nodes
- Other inference frameworks supporting Gemma2-based models
- Guidance scale: 4.0-6.0
- Inference steps: 20-50
- Use the same resolution you trained on for best results
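The guidance scale above is classifier-free guidance: each denoising step combines an unconditional and a prompt-conditioned prediction as `uncond + g * (cond - uncond)`. A scalar sketch of that arithmetic (illustrative only; real pipelines apply it to full latent tensors):

```python
def apply_cfg(uncond, cond, guidance_scale):
    """Classifier-free guidance: extrapolate from the unconditional
    prediction toward (and past) the prompt-conditioned one."""
    return uncond + guidance_scale * (cond - uncond)

# guidance_scale = 1.0 just returns the conditional prediction;
# larger values exaggerate the prompt's influence on each step.
print(apply_cfg(0.0, 1.0, 1.0))  # 1.0
print(apply_cfg(0.0, 1.0, 4.0))  # 4.0
```

This is also why very high guidance values can cause artifacts: the prediction is pushed well outside the range of either branch.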
- Fast training due to 2B parameter size
- Good quality-to-size ratio
- Supports various training modes (LoRA, LyCORIS, full)
- Efficient memory usage
- No ControlNet support yet
- Limited to text-to-image generation
- Requires high-quality captions for best results
Unlike larger models, Lumina2 typically doesn't require:
- Model quantization
- Extreme memory optimization techniques
- Complex mixed precision strategies