3 changes: 2 additions & 1 deletion .dockerignore
@@ -17,7 +17,8 @@ __pycache__
/venv

# Replicate
model_cache/
ai-toolkit/model_cache/
*.png
*.jpg
*.jpeg
3 changes: 2 additions & 1 deletion .gitignore
@@ -21,4 +21,5 @@ digest.txt
*.wmv
/zeke/*
*.zip
/model_cache/*
output/
30 changes: 30 additions & 0 deletions PR_SUMMARY.md
@@ -0,0 +1,30 @@
# Release Summary: Production-ready Qwen Image LoRA Trainer

## Highlights

- **Revamped predictor (`predict.py`)**
  - Streamlined inputs for both text-to-image and img2img flows.
  - Automatic LoRA hot-swapping with metadata caching (sketched below) and graceful fallbacks when files are missing.
  - Guide-image support that resizes to safe multiples of 16 and blends noise by configurable strength.
  - Deterministic seeding, configurable step counts, and timestamped outputs for easier batch generation.
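
  The metadata cache is essentially a memoised read of each safetensors file. A minimal sketch (the cache size and metadata key names are assumptions, not the exact values in `predict.py`):

  ```python
  from functools import lru_cache

  from safetensors import safe_open


  @lru_cache(maxsize=16)
  def lora_metadata(path: str) -> dict:
      """Read rank/alpha metadata from a safetensors file once; serve repeats from cache."""
      with safe_open(path, framework="pt") as f:
          meta = f.metadata() or {}
      # "ss_network_dim"/"ss_network_alpha" are common kohya-style keys; real files may differ.
      return {
          "rank": int(meta.get("ss_network_dim", 32)),
          "alpha": float(meta.get("ss_network_alpha", 32)),
      }
  ```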

- **Adaptive trainer (`train.py`)**
  - Hardware-aware defaults that adjust resolution tiers and gradient checkpointing based on detected VRAM (see the sketch below).
  - Clean dataset extraction with auto-caption backfilling and Pruna-compatible safetensor conversion.
  - Packaging of weights, settings, and configs into a ready-to-download ZIP for Replicate deployment.
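
  A rough sketch of the VRAM-based tier selection (the thresholds and return shape are illustrative; `train.py` holds the real values):

  ```python
  import torch


  def pick_resolution_tier() -> dict:
      """Choose training resolutions and gradient checkpointing from detected VRAM."""
      if not torch.cuda.is_available():
          return {"resolutions": [512], "gradient_checkpointing": True}
      vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
      if vram_gb >= 70:  # H100/H200-class cards
          return {"resolutions": [512, 768, 1024], "gradient_checkpointing": False}
      if vram_gb >= 40:  # A100-class cards
          return {"resolutions": [512, 768], "gradient_checkpointing": True}
      return {"resolutions": [512], "gradient_checkpointing": True}
  ```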

- **Utility and docs refresh**
  - `safetensor_utils.py` offers a focused, verifiable rename helper for diffusion→transformer keys (simplified below).
  - README rewritten for production use with SEO-friendly guidance, quickstarts, and troubleshooting sections.
  - `.gitignore` expanded to exclude generated outputs and personal artefacts.
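
  At its core the rename helper is a prefix swap over the state dict. A simplified sketch (the exact prefixes handled by `safetensor_utils.py` may differ):

  ```python
  from safetensors.torch import load_file, save_file


  def rename_diffusion_to_transformer(src: str, dst: str) -> int:
      """Rewrite 'diffusion_model.' key prefixes to 'transformer.' and save the result."""
      state = load_file(src)
      renamed = {
          ("transformer." + k[len("diffusion_model."):]) if k.startswith("diffusion_model.") else k: v
          for k, v in state.items()
      }
      save_file(renamed, dst)
      return sum(k.startswith("diffusion_model.") for k in state)  # number of keys renamed
  ```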

## Testing

- `PYTHONPYCACHEPREFIX=/tmp/pycache python -m compileall predict.py train.py safetensor_utils.py`
- Manual smoke tests: `cog train` with portrait dataset, `cog predict` for both text-to-image and img2img using trained ZIP.

## Next Steps

1. Run `cog build` to ensure container reproducibility.
2. Publish a tagged release (e.g., `v1.0.0`) with changelog excerpts from this summary.
3. Update the GitHub repo description & topics (see suggested copy in final report) for SEO and discoverability.
136 changes: 104 additions & 32 deletions README.md
@@ -1,60 +1,132 @@
# Qwen Image LoRA

[![Run on Replicate](https://replicate.com/qwen/qwen-image-lora/badge)](https://replicate.com/qwen/qwen-image-lora)

Production-ready toolkit for fine-tuning and deploying [Qwen/Qwen-Image](https://huggingface.co/Qwen/Qwen-Image) LoRAs. Optimised for Replicate's H100/H200 fleet, yet lightweight enough for local experimentation. Build stylistic LoRAs, character likenesses, and brand-specific generators with a workflow that indie hackers can understand and extend in minutes.

## Why this repo?

- **One-command fine-tuning** – `cog train` configures the ai-toolkit backend, converts LoRA keys for Pruna/FlashAttention, and packages a ready-to-share ZIP.
- **Battle-tested inference** – `cog predict` supports text-to-image and img2img, dynamic LoRA loading, and deterministic seeds while keeping the codebase approachable.
- **Hardware-aware defaults** – Automatically adapts batch sizes, resolution tiers, and gradient checkpointing based on available VRAM.
- **Hackable by design** – Clear helpers, minimal branching, and readable flow make it easy to add new schedulers, caches, or safety filters without a rewrite.

## Quickstart

Train your own LoRA on [Replicate](https://replicate.com/qwen/qwen-image-lora/train), or run everything locally. For local use, clone with submodules and install Cog:

```bash
git clone --recursive https://github.com/replicate/qwen-image-lora-trainer.git
cd qwen-image-lora-trainer
pip install cog
```

### 1. Train a LoRA

```bash
cog train \
  -i dataset=@path/to/dataset.zip \
  -i default_caption="A photo of <>"
```

What happens under the hood:

- Extracts the dataset, normalises captions, and auto-fills missing `.txt` files (backfill sketched below).
- Detects GPU VRAM to pick safe resolutions and gradient-checkpointing settings.
- Trains a rank-32 LoRA for 1,000 steps at a 5e-4 learning rate (tunable via inputs).
- Converts `lora.safetensors` into Pruna-compatible keys and zips it with config metadata.

Output: `/tmp/qwen_lora_<timestamp>_trained.zip` containing `lora.safetensors`, `config.yaml`, and `settings.txt`. Training typically takes 15-30 minutes on Replicate's Nvidia H100 hardware, depending on dataset size.
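
The caption backfill boils down to writing `default_caption` next to any image that lacks a sibling `.txt`. A minimal sketch (the helper name is illustrative, not the one in `train.py`):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def backfill_captions(dataset_dir: str, default_caption: str) -> int:
    """Create a .txt caption next to every image that does not have one."""
    created = 0
    for img in Path(dataset_dir).iterdir():
        if img.suffix.lower() in IMAGE_EXTS:
            caption = img.with_suffix(".txt")
            if not caption.exists():
                caption.write_text(default_caption)
                created += 1
    return created
```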

### 2. Run inference

```bash
cog predict \
  -i prompt="Studio portrait of <>, cinematic lighting" \
  -i replicate_weights=@/tmp/qwen_lora_123456789_trained.zip \
  -i output_format=webp
```

Want guided transformations? Add `-i [email protected] -i strength=0.6` for img2img. Set `-i go_fast=false` when chasing maximum fidelity.
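
For img2img, the guide image is first snapped to dimensions the VAE tolerates, and `strength` decides how much of it survives denoising. A conceptual sketch (the actual resize and blend live in `predict.py`):

```python
from PIL import Image


def snap_to_multiple_of_16(image: Image.Image, max_side: int = 1024) -> Image.Image:
    """Downscale to fit max_side, then round each dimension down to a multiple of 16."""
    scale = min(max_side / max(image.size), 1.0)
    w = max(16, int(image.width * scale) // 16 * 16)
    h = max(16, int(image.height * scale) // 16 * 16)
    return image.resize((w, h), Image.LANCZOS)

# strength semantics: 0.0 keeps the guide untouched, 1.0 is pure noise.
# In diffusers-style img2img pipelines this maps to skipping roughly the first
# (1 - strength) * num_inference_steps denoising steps.
```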

## Predictor input reference

| Input | Description | Default |
|-------|-------------|---------|
| `prompt` | Primary text prompt | _required_ |
| `enhance_prompt` | Appends a high-detail suffix for sharper renders | `false` |
| `lora_weights` | Path/ZIP for LoRA weights (local paths preferred) | `null` |
| `replicate_weights` | ZIP emitted by `cog train`; overrides `lora_weights` when both are set | `null` |
| `lora_scale` | Multiplier for the loaded LoRA | `1.0` |
| `image` | Optional img2img guide (resized internally) | `null` |
| `strength` | Img2img blend factor (0 = copy guide, 1 = full noise) | `0.9` |
| `negative_prompt` | Concepts to avoid | `" "` (a single space) |
| `aspect_ratio` | Resolution preset when no guide image is supplied | `16:9` |
| `image_size` | Quality vs speed profile | `optimize_for_quality` |
| `go_fast` | Aggressive caching + step clamp (~8 steps) | `true` |
| `num_inference_steps` | Diffusion steps (auto-clamped when `go_fast`) | `30` |
| `guidance` | Classifier-free guidance scale | `3.0` |
| `seed` | Deterministic seed (random when unset) | `null` |
| `output_format` | `webp`, `jpg`, or `png` | `webp` |
| `output_quality` | Quality for lossy formats | `80` |
| `disable_safety_checker` | Placeholder flag – prints a reminder only | `false` |

> LoRA ZIPs created by `cog train` can be fed directly into `replicate_weights`. The predictor extracts and caches the safetensors automatically.

## Dataset guidelines

Pack your dataset as a flat ZIP; a scripted example follows the layout below. Supported image formats: `.jpg`, `.jpeg`, `.png`, `.webp`.

```
my-dataset.zip
├── img001.jpg
├── img001.txt # "A photo of <> wearing a navy hoodie"
├── img002.jpg
└── img003.jpg # Falls back to default_caption
```
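
If you prefer to script the packaging step, a flat ZIP like the layout above takes only a few lines (paths are examples):

```python
import zipfile
from pathlib import Path


def pack_dataset(folder: str, out_zip: str = "my-dataset.zip") -> None:
    """Zip images and captions flat (no directories), as the trainer expects."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(folder).iterdir()):
            if f.suffix.lower() in {".jpg", ".jpeg", ".png", ".webp", ".txt"}:
                zf.write(f, arcname=f.name)  # arcname keeps the ZIP flat


pack_dataset("photos/")
```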

### Prompting best practices for Qwen Image

- Use literal, descriptive language. Qwen learns by overriding existing concepts, not inventing new tokens.
- Avoid placeholder handles like `TOK`, `sks`, or `zzz`. They actively hurt convergence.
- Prefer plain, familiar nouns that match your images ("person", "woman", "dog", "building", "car") over random identifiers.
- Keep captions grounded in real traits (clothing, lighting, scene) so inference prompts can remix them reliably.

## Training defaults & knobs

| Parameter | Default | Notes |
|-----------|---------|-------|
| `steps` | `1000` | Increase for larger datasets; saves occur at the final step. |
| `learning_rate` | `5e-4` | Balanced for portraits and style LoRAs. |
| `lora_rank` | `32` | Alpha matches rank; change for capacity vs size. |
| `batch_size` | `1` | Switch to `2` or `4` on high-VRAM GPUs. |
| `optimizer` | `adamw` | `adamw8bit`, `adam8bit`, and `prodigy` also available. |
| `seed` | random | Provide for reproducible fine-tunes. |

Training artefacts live under `output/<job_name>/` and are cleaned once the final ZIP is created.

## Advanced usage

- **Custom resolutions** – Img2img snaps the guide to multiples of 16. For text-to-image presets, adjust `QUALITY_DIMENSIONS` / `SPEED_DIMENSIONS` in `predict.py`.
- **LoRA hot swapping** – Metadata (rank/alpha) is cached per safetensors file so reloading LoRAs stays instant.
- **Extending safety** – Hook into `result_image` before saving if you want CLIP- or Falcon-based filters.
- **Local caching** – Model archives download to `model_cache/` once; LoRA ZIPs unpack to `/tmp/qwen_lora_cache` using a content hash (see the sketch below).
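
The content-hash cache keys each extraction directory by the ZIP's bytes, so reusing the same weights skips the unpack entirely. Roughly (the exact directory layout is an assumption):

```python
import hashlib
import zipfile
from pathlib import Path

CACHE_ROOT = Path("/tmp/qwen_lora_cache")


def extract_lora_zip(zip_path: str) -> Path:
    """Unpack a LoRA ZIP into a directory named after its content hash."""
    digest = hashlib.sha256(Path(zip_path).read_bytes()).hexdigest()[:16]
    target = CACHE_ROOT / digest
    if not target.exists():  # only unpack on a cache miss
        target.mkdir(parents=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
    return target
```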

## Troubleshooting

- **"LoRA weights not found"** – Check the path. The predictor logs a warning and continues with the base model when it cannot locate the file.
- **OOM during training** – Reduce `batch_size`, lower `steps`, or rely on the automatic resolution downgrade (A100 profile) when VRAM is limited.
- **Outputs look off** – Revisit your captions. Qwen Image rewards detailed, grounded captions that match your dataset.

## Contributing

Pull requests and custom integrations are welcome. The codebase purposely avoids heavy frameworks so you can:

- Swap in alternative schedulers or samplers.
- Add caching strategies for weights or latents.
- Layer on custom safety checkers or watermarking.

Tag releases with meaningful notes so downstream users know which defaults they depend on. Suggestions for better defaults, new dataset pipelines, or inference UX upgrades are always appreciated.

---

Happy fine-tuning! If you build something cool with this trainer, share it with the community. We're eager to see what you create.