3 changes: 2 additions & 1 deletion .dockerignore
@@ -17,7 +17,8 @@ __pycache__
/venv

# Replicate
model_cache/
ai-toolkit/model_cache/
*.png
*.jpg
*.jpeg
3 changes: 2 additions & 1 deletion .gitignore
@@ -21,4 +21,5 @@ digest.txt
*.wmv
/zeke/*
*.zip
/model_cache/*
output/
30 changes: 30 additions & 0 deletions PR_SUMMARY.md
@@ -0,0 +1,30 @@
# Release Summary: Production-ready Qwen Image LoRA Trainer

## Highlights

- **Revamped predictor (`predict.py`)**
  - Streamlined inputs for both text-to-image and img2img flows.
  - Automatic LoRA hot-swapping with metadata caching (sketched below) and graceful fallbacks when files are missing.
  - Guide-image support that resizes to safe multiples of 16 and blends noise by configurable strength.
  - Deterministic seeding, configurable step counts, and timestamped outputs for easier batch generation.
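
  The metadata cache is essentially a memoised read of each safetensors file. A minimal sketch (the cache size and metadata key names are assumptions, not the exact values in `predict.py`):

  ```python
  from functools import lru_cache

  from safetensors import safe_open


  @lru_cache(maxsize=16)
  def lora_metadata(path: str) -> dict:
      """Read rank/alpha metadata from a safetensors file once; serve repeats from cache."""
      with safe_open(path, framework="pt") as f:
          meta = f.metadata() or {}
      # "ss_network_dim"/"ss_network_alpha" are common kohya-style keys; real files may differ.
      return {
          "rank": int(meta.get("ss_network_dim", 32)),
          "alpha": float(meta.get("ss_network_alpha", 32)),
      }
  ```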

- **Adaptive trainer (`train.py`)**
  - Hardware-aware defaults that adjust resolution tiers and gradient checkpointing based on detected VRAM (see the sketch below).
  - Clean dataset extraction with auto-caption backfilling and Pruna-compatible safetensor conversion.
  - Packaging of weights, settings, and configs into a ready-to-download ZIP for Replicate deployment.
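
  A rough sketch of the VRAM-based tier selection (the thresholds and return shape are illustrative; `train.py` holds the real values):

  ```python
  import torch


  def pick_resolution_tier() -> dict:
      """Choose training resolutions and gradient checkpointing from detected VRAM."""
      if not torch.cuda.is_available():
          return {"resolutions": [512], "gradient_checkpointing": True}
      vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
      if vram_gb >= 70:  # H100/H200-class cards
          return {"resolutions": [512, 768, 1024], "gradient_checkpointing": False}
      if vram_gb >= 40:  # A100-class cards
          return {"resolutions": [512, 768], "gradient_checkpointing": True}
      return {"resolutions": [512], "gradient_checkpointing": True}
  ```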

- **Utility and docs refresh**
  - `safetensor_utils.py` offers a focused, verifiable rename helper for diffusion→transformer keys (simplified below).
  - README rewritten for production use with SEO-friendly guidance, quickstarts, and troubleshooting sections.
  - `.gitignore` expanded to exclude generated outputs and personal artefacts.
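
  At its core the rename helper is a prefix swap over the state dict. A simplified sketch (the exact prefixes handled by `safetensor_utils.py` may differ):

  ```python
  from safetensors.torch import load_file, save_file


  def rename_diffusion_to_transformer(src: str, dst: str) -> int:
      """Rewrite 'diffusion_model.' key prefixes to 'transformer.' and save the result."""
      state = load_file(src)
      renamed = {
          ("transformer." + k[len("diffusion_model."):]) if k.startswith("diffusion_model.") else k: v
          for k, v in state.items()
      }
      save_file(renamed, dst)
      return sum(k.startswith("diffusion_model.") for k in state)  # number of keys renamed
  ```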

## Testing

- `PYTHONPYCACHEPREFIX=/tmp/pycache python -m compileall predict.py train.py safetensor_utils.py`
- Manual smoke tests: `cog train` with portrait dataset, `cog predict` for both text-to-image and img2img using trained ZIP.

## Next Steps

1. Run `cog build` to ensure container reproducibility.
2. Publish a tagged release (e.g., `v1.0.0`) with changelog excerpts from this summary.
3. Update the GitHub repo description & topics (see suggested copy in final report) for SEO and discoverability.
136 changes: 104 additions & 32 deletions README.md
@@ -1,60 +1,132 @@
# Qwen Image LoRA

[![Run on Replicate](https://replicate.com/qwen/qwen-image-lora/badge)](https://replicate.com/qwen/qwen-image-lora)

Production-ready toolkit for fine-tuning and deploying [Qwen/Qwen-Image](https://huggingface.co/Qwen/Qwen-Image) LoRAs. Optimised for Replicate's H100/H200 fleet, yet lightweight enough for local experimentation. Build stylistic LoRAs, character likenesses, and brand-specific generators with a workflow that indie hackers can understand and extend in minutes.

## Why this repo?

- **One-command fine-tuning** – `cog train` configures the ai-toolkit backend, converts LoRA keys for Pruna/FlashAttention, and packages a ready-to-share ZIP.
- **Battle-tested inference** – `cog predict` supports text-to-image and img2img, dynamic LoRA loading, and deterministic seeds while keeping the codebase approachable.
- **Hardware-aware defaults** – Automatically adapts batch sizes, resolution tiers, and gradient checkpointing based on available VRAM.
- **Hackable by design** – Clear helpers, minimal branching, and readable flow make it easy to add new schedulers, caches, or safety filters without a rewrite.

## Quickstart

Train your own LoRA on [Replicate](https://replicate.com/qwen/qwen-image-lora/train), or run everything locally. For local use, clone with submodules and install Cog:

```bash
git clone --recursive https://github.com/replicate/qwen-image-lora-trainer.git
cd qwen-image-lora-trainer
pip install cog
```

### 1. Train a LoRA

```bash
cog train \
  -i dataset=@path/to/dataset.zip \
  -i default_caption="A photo of <>"
```

What happens under the hood:

- Extracts the dataset, normalises captions, and auto-fills missing `.txt` files (backfill sketched below).
- Detects GPU VRAM to pick safe resolutions and gradient-checkpointing settings.
- Trains a rank-32 LoRA for 1,000 steps at a 5e-4 learning rate (tunable via inputs).
- Converts `lora.safetensors` into Pruna-compatible keys and zips it with config metadata.

Output: `/tmp/qwen_lora_<timestamp>_trained.zip` containing `lora.safetensors`, `config.yaml`, and `settings.txt`. Training typically takes 15-30 minutes on Replicate's Nvidia H100 hardware, depending on dataset size.
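
The caption backfill boils down to writing `default_caption` next to any image that lacks a sibling `.txt`. A minimal sketch (the helper name is illustrative, not the one in `train.py`):

```python
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}


def backfill_captions(dataset_dir: str, default_caption: str) -> int:
    """Create a .txt caption next to every image that does not have one."""
    created = 0
    for img in Path(dataset_dir).iterdir():
        if img.suffix.lower() in IMAGE_EXTS:
            caption = img.with_suffix(".txt")
            if not caption.exists():
                caption.write_text(default_caption)
                created += 1
    return created
```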

### 2. Run inference

```bash
cog predict \
  -i prompt="Studio portrait of <>, cinematic lighting" \
  -i replicate_weights=@/tmp/qwen_lora_123456789_trained.zip \
  -i output_format=webp
```

Want guided transformations? Add `-i [email protected] -i strength=0.6` for img2img. Set `-i go_fast=false` when chasing maximum fidelity.
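
For img2img, the guide image is first snapped to dimensions the VAE tolerates, and `strength` decides how much of it survives denoising. A conceptual sketch (the actual resize and blend live in `predict.py`):

```python
from PIL import Image


def snap_to_multiple_of_16(image: Image.Image, max_side: int = 1024) -> Image.Image:
    """Downscale to fit max_side, then round each dimension down to a multiple of 16."""
    scale = min(max_side / max(image.size), 1.0)
    w = max(16, int(image.width * scale) // 16 * 16)
    h = max(16, int(image.height * scale) // 16 * 16)
    return image.resize((w, h), Image.LANCZOS)

# strength semantics: 0.0 keeps the guide untouched, 1.0 is pure noise.
# In diffusers-style img2img pipelines this maps to skipping roughly the first
# (1 - strength) * num_inference_steps denoising steps.
```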

## Predictor input reference

| Input | Description | Default |
|-------|-------------|---------|
| `prompt` | Primary text prompt | _required_ |
| `enhance_prompt` | Appends a high-detail suffix for sharper renders | `false` |
| `lora_weights` | Path/ZIP for LoRA weights (local paths preferred) | `null` |
| `replicate_weights` | ZIP emitted by `cog train`; overrides `lora_weights` when both are set | `null` |
| `lora_scale` | Multiplier for the loaded LoRA | `1.0` |
| `image` | Optional img2img guide (resized internally) | `null` |
| `strength` | Img2img blend factor (0 = copy guide, 1 = full noise) | `0.9` |
| `negative_prompt` | Concepts to avoid | `" "` (a single space) |
| `aspect_ratio` | Resolution preset when no guide image is supplied | `16:9` |
| `image_size` | Quality vs speed profile | `optimize_for_quality` |
| `go_fast` | Aggressive caching + step clamp (~8 steps) | `true` |
| `num_inference_steps` | Diffusion steps (auto-clamped when `go_fast`) | `30` |
| `guidance` | Classifier-free guidance scale | `3.0` |
| `seed` | Deterministic seed (random when unset) | `null` |
| `output_format` | `webp`, `jpg`, or `png` | `webp` |
| `output_quality` | Quality for lossy formats | `80` |
| `disable_safety_checker` | Placeholder flag – prints a reminder only | `false` |

> LoRA ZIPs created by `cog train` can be fed directly into `replicate_weights`. The predictor extracts and caches the safetensors automatically.

## Dataset guidelines

Pack your dataset as a flat ZIP; a scripted example follows the layout below. Supported image formats: `.jpg`, `.jpeg`, `.png`, `.webp`.

```
my-dataset.zip
├── img001.jpg
├── img001.txt # "A photo of <> wearing a navy hoodie"
├── img002.jpg
└── img003.jpg # Falls back to default_caption
```
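
If you prefer to script the packaging step, a flat ZIP like the layout above takes only a few lines (paths are examples):

```python
import zipfile
from pathlib import Path


def pack_dataset(folder: str, out_zip: str = "my-dataset.zip") -> None:
    """Zip images and captions flat (no directories), as the trainer expects."""
    with zipfile.ZipFile(out_zip, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in sorted(Path(folder).iterdir()):
            if f.suffix.lower() in {".jpg", ".jpeg", ".png", ".webp", ".txt"}:
                zf.write(f, arcname=f.name)  # arcname keeps the ZIP flat


pack_dataset("photos/")
```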

### Prompting best practices for Qwen Image

- Use literal, descriptive language. Qwen learns by overriding existing concepts, not inventing new tokens.
- Avoid placeholder handles like `TOK`, `sks`, or `zzz`. They actively hurt convergence.
- Prefer plain, familiar nouns that match your images ("person", "woman", "dog", "building", "car") over random identifiers.
- Keep captions grounded in real traits (clothing, lighting, scene) so inference prompts can remix them reliably.

## Training defaults & knobs

| Parameter | Default | Notes |
|-----------|---------|-------|
| `steps` | `1000` | Increase for larger datasets; saves occur at the final step. |
| `learning_rate` | `5e-4` | Balanced for portraits and style LoRAs. |
| `lora_rank` | `32` | Alpha matches rank; change for capacity vs size. |
| `batch_size` | `1` | Switch to `2` or `4` on high-VRAM GPUs. |
| `optimizer` | `adamw` | `adamw8bit`, `adam8bit`, and `prodigy` also available. |
| `seed` | random | Provide for reproducible fine-tunes. |

Training artefacts live under `output/<job_name>/` and are cleaned once the final ZIP is created.

## Advanced usage

- **Custom resolutions** – Img2img snaps the guide to multiples of 16. For text-to-image presets, adjust `QUALITY_DIMENSIONS` / `SPEED_DIMENSIONS` in `predict.py`.
- **LoRA hot swapping** – Metadata (rank/alpha) is cached per safetensors file so reloading LoRAs stays instant.
- **Extending safety** – Hook into `result_image` before saving if you want CLIP- or Falcon-based filters.
- **Local caching** – Model archives download to `model_cache/` once; LoRA ZIPs unpack to `/tmp/qwen_lora_cache` using a content hash (see the sketch below).
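
The content-hash cache keys each extraction directory by the ZIP's bytes, so reusing the same weights skips the unpack entirely. Roughly (the exact directory layout is an assumption):

```python
import hashlib
import zipfile
from pathlib import Path

CACHE_ROOT = Path("/tmp/qwen_lora_cache")


def extract_lora_zip(zip_path: str) -> Path:
    """Unpack a LoRA ZIP into a directory named after its content hash."""
    digest = hashlib.sha256(Path(zip_path).read_bytes()).hexdigest()[:16]
    target = CACHE_ROOT / digest
    if not target.exists():  # only unpack on a cache miss
        target.mkdir(parents=True)
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(target)
    return target
```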

## Troubleshooting

- **"LoRA weights not found"** – Check the path. The predictor logs a warning and continues with the base model when it cannot locate the file.
- **OOM during training** – Reduce `batch_size`, lower `steps`, or rely on the automatic resolution downgrade (A100 profile) when VRAM is limited.
- **Outputs look off** – Revisit your captions. Qwen Image rewards detailed, grounded captions that match your dataset.

## Contributing

Pull requests and custom integrations are welcome. The codebase purposely avoids heavy frameworks so you can:

- Swap in alternative schedulers or samplers.
- Add caching strategies for weights or latents.
- Layer on custom safety checkers or watermarking.

Tag releases with meaningful notes so downstream users know which defaults they depend on. Suggestions for better defaults, new dataset pipelines, or inference UX upgrades are always appreciated.

---

Happy fine-tuning! If you build something cool with this trainer, share it with the community. We're eager to see what you create.