An enhanced fork of the mzbac/zimage.swift project.
Native Swift + MLX implementation of the Tongyi-MAI/Z-Image model family for Apple Silicon.
The repo ships:
- ZImage: a Swift library for macOS and iOS targets
- ZImageCLI: a macOS CLI for text-to-image, ControlNet, inpainting, and quantization workflows
- ZImageServe: a macOS staging daemon/client for queued local generation requests
The practical goal is to run Z-Image locally without a Python runtime while still supporting the model-loading patterns people actually use: Hugging Face snapshots, local Diffusers-style folders, quantized directories, LoRA adapters, and text-to-image AIO / transformer-only .safetensors files.
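Supporting all of these loading patterns means a single --model argument has to be routed to the right loader. The following is a minimal Swift sketch of that routing. It assumes that a filesystem existence check decides between local paths and repo ids. ModelSource, ModelSourceError, and classifyModelArgument are hypothetical names, not the ZImage API.

```swift
import Foundation

// Hypothetical routing of a --model argument; these names are illustrative, not the ZImage API.
enum ModelSource: Equatable {
    case safetensorsFile(URL)     // local AIO or transformer-only checkpoint
    case localDirectory(URL)      // Diffusers-style, quantized, or other local folder
    case huggingFaceRepo(String)  // an "owner/name" repo id such as Tongyi-MAI/Z-Image-Turbo
}

enum ModelSourceError: Error, Equatable {
    case unrecognized(String)
}

func classifyModelArgument(_ argument: String,
                           fileManager: FileManager = .default) throws -> ModelSource {
    let path = NSString(string: argument).expandingTildeInPath
    var isDirectory: ObjCBool = false
    if fileManager.fileExists(atPath: path, isDirectory: &isDirectory) {
        // Existing local paths take priority over the repo-id interpretation.
        let url = URL(fileURLWithPath: path)
        if isDirectory.boolValue {
            return .localDirectory(url)
        }
        if url.pathExtension.lowercased() == "safetensors" {
            return .safetensorsFile(url)
        }
        throw ModelSourceError.unrecognized(argument)
    }
    // Otherwise accept exactly two non-empty components: "owner/name".
    let parts = argument.split(separator: "/", omittingEmptySubsequences: false)
    if parts.count == 2 && parts.allSatisfy({ !$0.isEmpty }) {
        return .huggingFaceRepo(argument)
    }
    throw ModelSourceError.unrecognized(argument)
}
```

One consequence of this ordering is that a two-component relative path, such as models/z-image-turbo, is treated as a repo id when it does not exist locally. A mistyped local path therefore surfaces as a download failure rather than a path error.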
- Text-to-image generation with the Z-Image diffusion transformer and Flow Match scheduler
- ControlNet conditioning and inpainting via ZImageCLI control
- LoRA and LoKr adapters on both generation pipelines and CLI paths
- Optional prompt enhancement on both generation pipelines and CLI paths through the Qwen text encoder generation flow
- 4-bit and 8-bit quantization for base-model and ControlNet directories
- Hugging Face cache reuse, local Diffusers-style directories, and text-to-image AIO / transformer-only .safetensors files
The default CLI model is Tongyi-MAI/Z-Image-Turbo.
Known Tongyi-MAI ids get model-aware defaults:
- Turbo: 1024x1024, 9 steps, guidance 0.0
- Base: 1024x1024, 50 steps, guidance 4.0
Beyond the known ids, inspectable local or cached snapshots and common Z-Image-style aliases also get model-aware defaults. Completely unrecognized models fall back to the Turbo-compatible defaults unless you set --steps and --guidance explicitly.
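The preset logic can be sketched using only the two presets listed above. The types and the exact-match id lookup are hypothetical simplifications. The real CLI also inspects snapshots and aliases, and this sketch omits that step.

```swift
// Sampling presets taken from the values listed above; the types are hypothetical.
struct SamplingDefaults: Equatable {
    var width: Int
    var height: Int
    var steps: Int
    var guidance: Float

    static let turbo = SamplingDefaults(width: 1024, height: 1024, steps: 9, guidance: 0.0)
    static let base = SamplingDefaults(width: 1024, height: 1024, steps: 50, guidance: 4.0)
}

// Simplification: exact matches on the two known ids only; snapshot inspection
// and alias matching are omitted.
func preset(forModelID id: String) -> SamplingDefaults? {
    switch id {
    case "Tongyi-MAI/Z-Image-Turbo": return .turbo
    case "Tongyi-MAI/Z-Image": return .base
    default: return nil
    }
}

// Explicit flags always win; unrecognized models fall back to the Turbo-compatible preset.
func resolveSampling(modelID: String, steps: Int?, guidance: Float?) -> SamplingDefaults {
    var resolved = preset(forModelID: modelID) ?? .turbo
    if let explicitSteps = steps { resolved.steps = explicitSteps }
    if let explicitGuidance = guidance { resolved.guidance = explicitGuidance }
    return resolved
}
```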
Z-Image-Turbo
Note:
- generated with --negative-prompt "卡通,油画质感,低分辨率,塑料材质,光滑" ("cartoon, oil-painting texture, low resolution, plastic material, smooth")
- generated with the default Turbo settings: --steps 9 --guidance 0.0
Z-Image (Base)
Note:
- generated with --negative-prompt "卡通,油画质感,低分辨率,塑料材质,光滑"
- generated with the default Base settings: --steps 50 --guidance 4.0
Note:
- generated with --negative-prompt "卡通,油画质感,低分辨率,塑料材质,光滑"
- generated with the Distill LoRA's recommended recipe: --steps 8 --guidance 1.0 --lora-scale 0.8
Note:
- generated with --negative-prompt "卡通,油画质感,低分辨率,毛绒材质,塑料材质,光滑" ("cartoon, oil-painting texture, low resolution, plush material, plastic material, smooth") and the Turbo default --steps 9
- generated with the adapter's recommended recipe: --guidance 1.0
Note:
- generated with --negative-prompt "卡通,油画质感,低分辨率,塑料材质,光滑" and --control-scale 0.75
- Turbo ControlNet:
  - ControlNet weights: alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1
  - Control file: Z-Image-Turbo-Fun-Controlnet-Union-2.1-2602-8steps.safetensors
- Distill ControlNet:
  - ControlNet weights: alibaba-pai/Z-Image-Fun-Controlnet-Union-2.1
  - Control file: Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors
- Apple Silicon Mac
- macOS 14.0+
- Xcode 16.x (recommended; required for ./scripts/build.sh)
- Metal toolchain component (metal/metallib); ./scripts/build.sh will attempt xcodebuild -downloadComponent MetalToolchain if it is missing
- Network access on first run unless the weights are already cached locally
```bash
./scripts/build.sh
./scripts/verify_fast.sh
```

Note: SwiftPM-built executables (swift build / swift run) need mlx.metallib next to the binary. See docs/DEVELOPMENT.md for the SwiftPM workflow (./scripts/build_mlx_metallib.sh).
```bash
cd .build/xcode/Build/Products/Release
./ZImageCLI --help
./ZImageServe --help
```

```bash
./ZImageCLI -p "a studio photo of a red apple on black velvet" -o output.png
```

The first run downloads the default snapshot into the Hugging Face cache.
Turbo defaults:
```bash
./ZImageCLI -p "a neon-lit alley in the rain" -o turbo.png
```

Base model:
```bash
./ZImageCLI \
  -m Tongyi-MAI/Z-Image \
  -p "a black tiger in a bamboo forest" \
  -o base.png
```

Text-to-image LoRA:
```bash
./ZImageCLI \
  -p "a lion painted like a children's book illustration" \
  --lora ostris/z_image_turbo_childrens_drawings \
  --lora-file adapter.safetensors \
  --lora-scale 1.0 \
  -o lora.png
```

When a LoRA repo or local directory contains multiple .safetensors files, --lora-file is required so the adapter selection stays deterministic.
If a cached Distill snapshot contains exactly one .safetensors file, omitting --lora-file is now allowed; genuinely ambiguous sources still fail closed.
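The adapter-selection rule can be expressed as a small pure function. This sketch assumes the candidates are the file names found in the snapshot or local directory. AdapterSelectionError and selectAdapterFile are illustrative names, not ZImage's API.

```swift
// Hypothetical sketch of deterministic adapter-file selection; names are illustrative.
enum AdapterSelectionError: Error, Equatable {
    case noSafetensors
    case requestedFileMissing(String)
    case ambiguous([String])
}

func selectAdapterFile(candidates: [String], requested: String?) throws -> String {
    // Only .safetensors files are eligible; sorting keeps error output stable.
    let weights = candidates.filter { $0.lowercased().hasSuffix(".safetensors") }.sorted()
    if let name = requested {
        // An explicit --lora-file must name one of the eligible files.
        guard weights.contains(name) else { throw AdapterSelectionError.requestedFileMissing(name) }
        return name
    }
    switch weights.count {
    case 0: throw AdapterSelectionError.noSafetensors
    case 1: return weights[0]                                // unambiguous: safe to auto-select
    default: throw AdapterSelectionError.ambiguous(weights)  // fail closed instead of guessing
    }
}
```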
For the validated Distill adapter path Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors, the CLI now auto-applies the upstream --steps 8 --guidance 1.0 --lora-scale 0.8 recipe when those flags are omitted, and warns when it does so.
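The following sketch fills only the omitted flags with the Distill recipe. It assumes the adapter is identified by the last path component of its file name. The struct, the function, and the warning text are hypothetical.

```swift
import Foundation

// Hypothetical sketch: fill only omitted flags with the upstream Distill recipe and warn.
struct SamplingOverrides: Equatable {
    var steps: Int?
    var guidance: Float?
    var loraScale: Float?
}

func applyDistillRecipe(loraFile: String?, _ flags: SamplingOverrides) -> (SamplingOverrides, [String]) {
    let distillFile = "Z-Image-Fun-Lora-Distill-8-Steps-2603.safetensors"
    // Compare only the last path component so both bare names and full paths match.
    guard let file = loraFile, file.split(separator: "/").last.map(String.init) == distillFile else {
        return (flags, [])
    }
    var resolved = flags
    var filled: [String] = []
    if resolved.steps == nil { resolved.steps = 8; filled.append("--steps 8") }
    if resolved.guidance == nil { resolved.guidance = 1.0; filled.append("--guidance 1.0") }
    if resolved.loraScale == nil { resolved.loraScale = 0.8; filled.append("--lora-scale 0.8") }
    if !filled.isEmpty {
        // Warn on stderr so stdout stays clean for scripting.
        let message = "warning: applying Distill recipe defaults: \(filled.joined(separator: " "))\n"
        FileHandle.standardError.write(Data(message.utf8))
    }
    return (resolved, filled)
}
```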
ControlNet:
```bash
./ZImageCLI control \
  --prompt "a dancer on a stage" \
  --control-image /path/to/pose.jpg \
  --controlnet-weights alibaba-pai/Z-Image-Turbo-Fun-Controlnet-Union-2.1 \
  --control-file Z-Image-Turbo-Fun-Controlnet-Union-2.1-2602-8steps.safetensors \
  --steps 8 \
  --output control.png
```

When a ControlNet repo or local directory contains multiple .safetensors files, --control-file is required. If a cached snapshot contains exactly one .safetensors file, omitting --control-file is allowed. The current Z-Image Fun Base support target is the full Union 2.1 file, Z-Image-Fun-Controlnet-Union-2.1.safetensors; upstream Lite and Tile filenames are rejected explicitly for now.
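The following is a sketch of an explicit filename gate for the Fun Base control path. It assumes that Lite and Tile variants can be recognized by a standalone "lite" or "tile" token in the file name. The actual upstream filenames and ZImage's own check may differ, and the names below are hypothetical.

```swift
// Hypothetical gate for the Fun Base control path. Assumption: Lite and Tile variants
// carry a "lite" or "tile" token in their file names.
enum ControlFileError: Error, Equatable {
    case unsupportedVariant(String)
}

func checkFunBaseControlFile(_ fileName: String) throws {
    // Split on common filename separators so words such as "subtitle" do not match.
    let tokens = fileName.lowercased().split(whereSeparator: { "-_. ".contains($0) })
    if tokens.contains(where: { $0 == "lite" || $0 == "tile" }) {
        throw ControlFileError.unsupportedVariant(fileName)
    }
}
```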
Staging daemon:
```bash
./ZImageServe serve --residency-policy adaptive --warm-model mzbac/z-image-turbo-8bit
./ZImageServe -p "a neon-lit alley in the rain" -o staged.png
./ZImageServe status
./ZImageServe shutdown
```

Structured staged submission:
```bash
./ZImageServe batch jobs.json
./ZImageServe markdown prompts.md
```

ZImageServe reuses the normal generation flags for ad hoc requests and prints the accepted job id so the job can be cancelled later. The status, cancel, and shutdown commands cover daemon operations. JSON and markdown ingestion stay on the client side, so the socket protocol stays canonical. Markdown ingestion accepts single fenced bash/sh/zsh invocations for direct ZImageCLI or ZImageServe commands, including explicit relative or absolute executable paths. Command substitutions are resolved when each markdown item starts. Wrappers, shell control operators, and other shell expansion syntax are still rejected.
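The following simplified validator for one markdown item illustrates these accept/reject rules. It checks the fence language and requires one logical command, with backslash continuations allowed. It also requires the executable's last path component to be ZImageCLI or ZImageServe, which rejects wrappers such as time or env. Unquoted ;, | and & characters are rejected. The sketch has several limits: it does not resolve command substitutions, it does not model the full set of rejected expansions, and its quote handling is naive (no escapes). All names are hypothetical.

```swift
import Foundation

// Hypothetical, simplified validator for one markdown item; not ZImageServe's parser.
enum InvocationError: Error, Equatable {
    case unsupportedFence(String)
    case notSingleCommand
    case unterminatedQuote
    case controlOperator(Character)
    case unsupportedExecutable(String)
}

func validateFencedInvocation(fence: String, body: String) throws -> [String] {
    guard ["bash", "sh", "zsh"].contains(fence.trimmingCharacters(in: .whitespaces)) else {
        throw InvocationError.unsupportedFence(fence)
    }
    // Join backslash-newline continuations, then require exactly one non-empty line.
    let joined = body.replacingOccurrences(of: "\\\n", with: " ")
    let lines = joined.split(separator: "\n")
        .map { $0.trimmingCharacters(in: .whitespaces) }
        .filter { !$0.isEmpty }
    guard lines.count == 1 else { throw InvocationError.notSingleCommand }

    // Naive tokenizer: single or double quotes group words (no escape handling).
    var tokens: [String] = []
    var current = ""
    var quote: Character? = nil
    for c in lines[0] {
        if let open = quote {
            if c == open { quote = nil } else { current.append(c) }
        } else if c == "\"" || c == "'" {
            quote = c
        } else if c == " " || c == "\t" {
            if !current.isEmpty { tokens.append(current); current = "" }
        } else if c == ";" || c == "|" || c == "&" {
            // Unquoted control operators (;, |, ||, &, &&) are rejected outright.
            throw InvocationError.controlOperator(c)
        } else {
            current.append(c)
        }
    }
    guard quote == nil else { throw InvocationError.unterminatedQuote }
    if !current.isEmpty { tokens.append(current) }

    // The executable must be ZImageCLI or ZImageServe, optionally via a relative or
    // absolute path; this also rejects wrappers such as time, env, or sudo.
    let executable = tokens.first ?? ""
    let name = executable.split(separator: "/").last.map(String.init) ?? ""
    guard name == "ZImageCLI" || name == "ZImageServe" else {
        throw InvocationError.unsupportedExecutable(executable)
    }
    return tokens
}
```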
ZImageCLI control also accepts --lora, --lora-file, --lora-scale, --enhance, and --enhance-max-tokens.
Quantize a local base-model directory:
```bash
./ZImageCLI quantize \
  --input models/z-image-turbo \
  --output models/z-image-turbo-q8 \
  --bits 8 \
  --group-size 32
```

The library surface is pipeline-first:
- ZImageGenerationRequest + ZImagePipeline
- ZImageControlGenerationRequest + ZImageControlPipeline
The code map for those entry points lives in docs/ARCHITECTURE.md.
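This section returns to the quantize command shown earlier. The --bits and --group-size flags presumably describe group-wise quantization, where each run of group-size consecutive weights shares its own scale and offset. The following is a minimal sketch of that scheme. It assumes MLX-style affine quantization (w ≈ scale * q + bias). The quantize and dequantize helpers are hypothetical. Bit packing is skipped, so each code occupies a full UInt8.

```swift
// Hypothetical sketch of affine group-wise quantization: one scale and bias per group.
struct QuantizedGroup {
    var codes: [UInt8]   // one code per weight; real 4-bit storage would pack two per byte
    var scale: Float
    var bias: Float      // the group minimum, so code 0 maps back to it
}

func quantize(_ weights: [Float], bits: Int, groupSize: Int) -> [QuantizedGroup] {
    precondition(bits == 4 || bits == 8, "this sketch supports 4- and 8-bit codes only")
    precondition(groupSize > 0 && weights.count % groupSize == 0, "length must be a multiple of the group size")
    let levels = Float((1 << bits) - 1)          // 15 for 4-bit, 255 for 8-bit
    var groups: [QuantizedGroup] = []
    for start in stride(from: 0, to: weights.count, by: groupSize) {
        let group = weights[start..<start + groupSize]
        let lo = group.min()!, hi = group.max()!
        // A constant group gets scale 1 so every code maps back to `lo` exactly.
        let scale = hi > lo ? (hi - lo) / levels : 1
        let codes = group.map { w -> UInt8 in
            let q = ((w - lo) / scale).rounded()
            return UInt8(min(max(q, 0), levels))
        }
        groups.append(QuantizedGroup(codes: codes, scale: scale, bias: lo))
    }
    return groups
}

func dequantize(_ groups: [QuantizedGroup]) -> [Float] {
    groups.flatMap { g in g.codes.map { Float($0) * g.scale + g.bias } }
}
```

Smaller groups track local weight ranges more closely and so reduce rounding error, at the cost of storing more scale/bias pairs.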
Common CLI knobs:
- --model/-m: text-to-image accepts a Hugging Face repo id, a local Diffusers-style directory, or a local .safetensors file; the control path expects a standard snapshot or directory
- --width/-W, --height/-H: output size; values must be at least 64 and divisible by 16
- --steps/-s: literal denoising iterations (transformer forwards)
  - the scheduler keeps one extra terminal sigma internally, so 8 steps means 8 transformer calls and 9 sigma values
  - this repo and Diffusers both treat steps/num_inference_steps as the literal denoising-iteration count
  - the upstream plain Turbo examples use 9, while the upstream Distill and Fun ControlNet 8steps examples use 8; those are artifact-specific recipes, not different meanings of the flag
- --guidance/-g: CFG scale
- --cfg-normalization: clamp the CFG output norm back to the positive-branch norm
- --cfg-truncation: disable CFG once the normalized denoising timestep passes the given threshold
- --weights-variant: prefer fp16 or bf16 component files when the snapshot ships multiple variants
- --force-transformer-override-only: text-to-image only; skip AIO auto-detection for a local .safetensors file
- --cache-limit: MLX GPU cache limit in MB
- --max-sequence-length: prompt token limit for text encoding
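A toy flow-match Euler loop shows how --steps, the extra terminal sigma, and the CFG flags fit together. This is a sketch under stated assumptions, not ZImage's scheduler:

- Sigmas are spaced linearly from 1 to 0 and then time-shifted. The value shift = 3.0 is arbitrary and only illustrative.
- The guided prediction is pos + g * (pos - neg), so guidance 0.0 reduces to the positive branch, which is consistent with Turbo's default.
- --cfg-normalization is modeled as a boolean norm clamp.
- --cfg-truncation compares against progress = 1 - sigma.
- Both branches are assumed to come from one batched model call per step.

```swift
// Toy flow-match Euler loop; every constant and name here is an illustrative assumption.
func flowMatchSigmas(steps: Int, shift: Float = 3.0) -> [Float] {
    // steps + 1 values: one per model call plus the terminal sigma 0.
    (0...steps).map { i -> Float in
        let t = 1 - Float(i) / Float(steps)        // linear 1 ... 0
        return shift * t / (1 + (shift - 1) * t)   // time shift keeps the endpoints at 1 and 0
    }
}

func l2Norm(_ v: [Float]) -> Float {
    v.reduce(0) { $0 + $1 * $1 }.squareRoot()
}

// CFG as pos + g * (pos - neg), so g = 0 returns the positive branch unchanged.
func guidedVelocity(positive: [Float], negative: [Float], guidance: Float,
                    normalize: Bool, truncateAfter: Float?, progress: Float) -> [Float] {
    // Truncation: once progress passes the threshold, skip CFG entirely.
    if let threshold = truncateAfter, progress > threshold { return positive }
    var out = zip(positive, negative).map { p, n in p + guidance * (p - n) }
    if normalize {
        // Clamp the guided norm back to the positive-branch norm.
        let limit = l2Norm(positive), current = l2Norm(out)
        if current > limit { out = out.map { $0 * limit / current } }
    }
    return out
}

func sample(noise: [Float], steps: Int, guidance: Float, normalize: Bool = true,
            truncateAfter: Float? = nil,
            model: ([Float], Float) -> (positive: [Float], negative: [Float])) -> (latent: [Float], modelCalls: Int) {
    let sigmas = flowMatchSigmas(steps: steps)
    var x = noise
    var modelCalls = 0
    for i in 0..<steps {                          // iterate over steps, not sigmas.count
        let (pos, neg) = model(x, sigmas[i])      // assumed: both branches from one batched call
        modelCalls += 1
        let v = guidedVelocity(positive: pos, negative: neg, guidance: guidance,
                               normalize: normalize, truncateAfter: truncateAfter,
                               progress: 1 - sigmas[i])
        let dt = sigmas[i + 1] - sigmas[i]        // negative: sigma decreases toward 0
        x = zip(x, v).map { $0 + dt * $1 }
    }
    return (x, modelCalls)
}
```

With steps = 8, the loop makes 8 model calls while flowMatchSigmas returns 9 values. This matches the 8-calls / 9-sigmas description above.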
Validation errors now exit non-zero and print the relevant command usage.
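The size rule for --width and --height is simple enough to check before any weights load. The following sketch uses hypothetical function names. This README does not document the CLI's exact exit status, so EXIT_FAILURE is only a placeholder for "non-zero".

```swift
import Foundation

// Hypothetical helpers mirroring the documented size rule; not the CLI's own code.
func sizeValidationErrors(width: Int, height: Int) -> [String] {
    var errors: [String] = []
    for (flag, value) in [("--width", width), ("--height", height)] {
        if value < 64 {
            errors.append("\(flag) must be at least 64 (got \(value))")
        } else if value % 16 != 0 {
            errors.append("\(flag) must be divisible by 16 (got \(value))")
        }
    }
    return errors
}

// Print every problem plus usage to stderr, then exit with a non-zero status.
func exitIfInvalid(_ errors: [String], usage: String) {
    guard !errors.isEmpty else { return }
    for message in errors {
        FileHandle.standardError.write(Data("error: \(message)\n".utf8))
    }
    FileHandle.standardError.write(Data((usage + "\n").utf8))
    exit(EXIT_FAILURE)
}
```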
Environment variables:
- HF_HUB_CACHE or HF_HOME: override the Hugging Face cache root
- HF_TOKEN: authenticate for gated or private Hugging Face repos
- HF_ENDPOINT: override the Hugging Face API host
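The following sketch shows how the cache root could be resolved from these variables. It follows huggingface_hub's documented precedence: HF_HUB_CACHE first, then HF_HOME/hub, then ~/.cache/huggingface/hub. It also uses huggingface_hub's models--owner--name folder layout. The sketch ignores XDG_CACHE_HOME, which huggingface_hub also consults. Whether ZImage matches this precedence in every detail is an assumption, and the function names are hypothetical.

```swift
import Foundation

// Hypothetical cache-root resolution following huggingface_hub's documented convention.
func huggingFaceCacheRoot(environment: [String: String] = ProcessInfo.processInfo.environment) -> URL {
    if let hubCache = environment["HF_HUB_CACHE"], !hubCache.isEmpty {
        return URL(fileURLWithPath: NSString(string: hubCache).expandingTildeInPath)
    }
    if let home = environment["HF_HOME"], !home.isEmpty {
        return URL(fileURLWithPath: NSString(string: home).expandingTildeInPath)
            .appendingPathComponent("hub")
    }
    return FileManager.default.homeDirectoryForCurrentUser
        .appendingPathComponent(".cache/huggingface/hub")
}

// Cached repos live under models--<owner>--<name>/snapshots/<revision>/.
func cachedRepoDirectory(repoID: String, cacheRoot: URL) -> URL {
    cacheRoot.appendingPathComponent("models--" + repoID.replacingOccurrences(of: "/", with: "--"))
}
```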
The detailed behavior for cache lookup, local-path handling, AIO checkpoints, quantization manifests, and ControlNet weight loading lives in docs/MODELS_AND_WEIGHTS.md.
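Auto-detecting AIO versus transformer-only .safetensors files presumably comes down to inspecting tensor names, and the safetensors format makes that cheap. A file starts with an 8-byte little-endian header length. That is followed by a JSON header mapping tensor names to dtype, shape, and data offsets, plus an optional __metadata__ entry. The following minimal sketch only lists tensor names. How ZImage classifies those names is described in docs/MODELS_AND_WEIGHTS.md and is not reproduced here. The function and error names are hypothetical.

```swift
import Foundation

enum SafetensorsHeaderError: Error {
    case truncated
    case invalidJSON
}

// Hypothetical helper: returns the tensor names declared in a safetensors header.
func safetensorsTensorNames(_ data: Data) throws -> [String] {
    guard data.count >= 8 else { throw SafetensorsHeaderError.truncated }
    // Bytes 0..<8: header length as a little-endian UInt64.
    var headerLength: UInt64 = 0
    for (i, byte) in data.prefix(8).enumerated() {
        headerLength |= UInt64(byte) << (8 * UInt64(i))
    }
    guard headerLength <= UInt64(data.count - 8) else { throw SafetensorsHeaderError.truncated }
    let start = data.startIndex + 8
    let headerBytes = data[start..<start + Int(headerLength)]
    guard let header = try JSONSerialization.jsonObject(with: headerBytes) as? [String: Any] else {
        throw SafetensorsHeaderError.invalidJSON
    }
    // Every key except the optional metadata entry names a tensor.
    return header.keys.filter { $0 != "__metadata__" }.sorted()
}
```

For multi-gigabyte checkpoints, loading the file with Data(contentsOf: url, options: .alwaysMapped) maps it instead of reading everything into memory up front.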
- Model-aware defaults cover known ids, inspectable local or cached snapshots, and common Z-Image-style aliases. Completely unrecognized models still need explicit --steps and --guidance if you do not want the Turbo-compatible preset.
- Text-to-image supports local AIO / transformer-only .safetensors files; the control path currently expects a standard model snapshot or local directory instead.
- Third-party LoRA cards can recommend different sampling settings. The CLI does not parse adapter metadata into presets.
- First-time downloads are large, and higher-resolution runs still stress unified memory.
- The CLI is macOS-only. The package declares an iOS library target, but the repo does not ship a first-party sample app.
- docs/README.md: docs index and task-based reading order
- docs/CLI.md: CLI commands, flags, and examples
- docs/MODELS_AND_WEIGHTS.md: model ids, local paths, cache lookup, AIO checkpoints, quantization
- docs/ARCHITECTURE.md: runtime layout, entry points, and source-of-truth files
- docs/DEVELOPMENT.md: build, test, CI, packaging, and validation workflows
- The original mzbac/zimage.swift repo for the initial implementation and reference point
- The Tongyi-MAI/Z-Image and Tongyi-MAI/Z-Image-Turbo teams for the models and reference outputs
- The MLX team for the Swift bindings and runtime work that made the port practical
MIT License