Leverages Qwen 3.5/3/2.5 VL for prompt inversion & caption generation in ComfyUI
❌ This plugin does not download models automatically. You can reuse the `qwen_2.5_vl_7b.safetensors` file provided by ComfyOrg, or manually download other Qwen VL models.
- Qwen XX VL Caption: image/video prompt inversion
- Qwen XX VL Batch Caption: batch image prompt inversion (folder input)
- Ovis 2.5 Run: runs an Ovis 2.5 model
- ASID_Caption: runs an ASID Captioner model
a. Via ComfyUI Manager

b. Manual install:
- Copy the plugin folder to `ComfyUI/custom_nodes/`
- Update the dependency: `transformers>=4.57.0` (`>=5.2.0` for Qwen3.5)
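To confirm that the installed `transformers` meets the stated minimum before starting ComfyUI, a small check like the following may help. It is only a sketch: the helper names (`version_tuple`, `meets_minimum`, `check_transformers`) are made up for illustration, and the comparison ignores pre-release suffixes such as `.dev0`.

```python
from importlib import metadata


def version_tuple(version: str) -> tuple:
    """Parse the leading numeric parts of a version string into a 3-tuple."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # stop at suffixes such as "dev0"
        parts.append(int(digits))
    # Pad so that "4.57" compares equal to "4.57.0".
    return tuple((parts + [0, 0, 0])[:3])


def meets_minimum(installed: str, required: str) -> bool:
    return version_tuple(installed) >= version_tuple(required)


def check_transformers(use_qwen35: bool = False) -> str:
    """Report whether the installed transformers satisfies the README minimum."""
    required = "5.2.0" if use_qwen35 else "4.57.0"
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return f"transformers is not installed (need >={required})"
    if meets_minimum(installed, required):
        return f"transformers {installed} is OK (need >={required})"
    return f"transformers {installed} is too old (need >={required})"


if __name__ == "__main__":
    print(check_transformers(use_qwen35=False))
```

Run it with the same Python interpreter that ComfyUI uses, since a portable ComfyUI build ships its own environment.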
- Download the model
- Edit the prompt templates (optional)
- Adjust the node inputs
- Click "Run"
- Model path: ComfyUI's `text_encoders` folder (place downloaded models there manually).
To reuse `qwen_2.5_vl_7b.safetensors`:
- Create a folder in `ComfyUI/models/text_encoders`
- Rename the model file to `model.safetensors` and move it into that folder
- Add the required config files from Qwen 2.5 VL's official Hugging Face repo: https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
✅ No extra disk usage: the model file remains usable with ComfyUI's Qwen Image/Edit models.
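The reuse steps can be scripted roughly as follows. This is a sketch under stated assumptions: the folder name `Qwen2.5-VL-7B-Instruct` is an arbitrary choice, and `EXPECTED_CONFIG_FILES` is a guess at the non-weight files in the official repo, so verify it against the repo's file listing before relying on it.

```python
import shutil
from pathlib import Path

# Assumed list of config/tokenizer files; check the Hugging Face repo to confirm.
EXPECTED_CONFIG_FILES = [
    "config.json",
    "generation_config.json",
    "preprocessor_config.json",
    "tokenizer.json",
    "tokenizer_config.json",
    "vocab.json",
    "merges.txt",
    "chat_template.json",
]


def prepare_model_folder(text_encoders_dir, weights_file,
                         folder_name="Qwen2.5-VL-7B-Instruct"):
    """Create the model folder, move the weights in as model.safetensors,
    and return (model_dir, list of config files still missing)."""
    model_dir = Path(text_encoders_dir) / folder_name
    model_dir.mkdir(parents=True, exist_ok=True)

    source = Path(weights_file)
    target = model_dir / "model.safetensors"
    # Move rather than copy, so no extra disk space is used.
    if source.exists() and not target.exists():
        shutil.move(str(source), str(target))

    missing = [name for name in EXPECTED_CONFIG_FILES
               if not (model_dir / name).exists()]
    return model_dir, missing


if __name__ == "__main__":
    folder, missing = prepare_model_folder(
        "ComfyUI/models/text_encoders",
        "ComfyUI/models/text_encoders/qwen_2.5_vl_7b.safetensors",
    )
    print("Model folder:", folder)
    print("Config files still to add:", missing)
```

After running it, the ComfyUI text-encoder loader should list the file under its new subfolder path, so any existing workflow that referenced the old filename needs to be pointed at the new one.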
Alternatively, download the official Qwen 2.5/3 VL repo from Hugging Face and place the whole folder in `text_encoders`.
https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-8B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-4B-Instruct
Users in mainland China can also download from a cloud drive: https://pan.quark.cn/s/b3975e789c3c
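Since the plugin does not fetch models itself, one way to do the manual download is with the `huggingface_hub` package. The sketch below assumes `huggingface_hub` is installed and that naming the local folder after the repo is acceptable; `local_model_dir` and `download_repo` are illustrative helper names, not part of the plugin.

```python
from pathlib import Path


def local_model_dir(text_encoders_dir, repo_id: str) -> Path:
    """Folder inside text_encoders where the repo will be placed."""
    return Path(text_encoders_dir) / repo_id.split("/")[-1]


def download_repo(text_encoders_dir, repo_id: str) -> Path:
    """Download a full Hugging Face repo into text_encoders."""
    # Imported lazily so the path helper works without the package installed.
    from huggingface_hub import snapshot_download

    target = local_model_dir(text_encoders_dir, repo_id)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


if __name__ == "__main__":
    print(download_repo("ComfyUI/models/text_encoders",
                        "Qwen/Qwen3-VL-4B-Instruct"))
```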
Ovis 2.5 models are now supported:
https://huggingface.co/AIDC-AI/Ovis2.5-2B
https://huggingface.co/AIDC-AI/Ovis2.5-9B
ASID Captioner models are now supported:
https://huggingface.co/AudioVisual-Caption/ASID-Captioner-3B
https://huggingface.co/AudioVisual-Caption/ASID-Captioner-7B
You can now enter an instruction directly, or edit `prompts.txt` in the plugin folder under `custom_nodes` (follow the existing format):
- Multiple prompts are supported
- The nodes use the last prompt that matches the selected language
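The "last matching prompt wins" rule can be illustrated as below. This is not the plugin's parser: the actual `prompts.txt` format is not described here, so the sketch works on a hypothetical in-memory list of `(language, text)` pairs and the function name `pick_prompt` is invented for the example.

```python
def pick_prompt(prompts, language):
    """Return the text of the last entry whose language matches, else None.

    `prompts` is a hypothetical list of (language, text) pairs in file order.
    """
    chosen = None
    for lang, text in prompts:
        if lang == language:
            chosen = text  # later entries override earlier ones
    return chosen


if __name__ == "__main__":
    entries = [
        ("en", "Describe the image briefly."),
        ("zh", "Describe the image in Chinese."),
        ("en", "Describe the image in detail, including lighting and style."),
    ]
    print(pick_prompt(entries, "en"))
```

In practice this means a new prompt appended at the end of the file takes effect without deleting the older ones.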
| VRAM | Recommended Precision |
|---|---|
| 6-8GB | Qwen 2.5 VL 7B (4bit) / Qwen 3 VL 8B (4bit) / Qwen 3 VL 4B (8bit) |
| 10-16GB | Qwen 2.5 VL 7B (8bit) / Qwen 3 VL 8B (8bit) / Qwen 3 VL 4B (bf16) |
| 16GB+ | bf16 (full precision) |
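The table can be read as a simple lookup. The sketch below encodes it with two assumptions that the table itself leaves open: values between 8 and 10 GB fall back to the lower tier, and exactly 16 GB is treated as part of the 10-16GB tier. The function name `recommend` is illustrative only.

```python
TIER_LOW = "Qwen 2.5 VL 7B (4bit) / Qwen 3 VL 8B (4bit) / Qwen 3 VL 4B (8bit)"
TIER_MID = "Qwen 2.5 VL 7B (8bit) / Qwen 3 VL 8B (8bit) / Qwen 3 VL 4B (bf16)"
TIER_HIGH = "bf16 (full precision)"


def recommend(vram_gb: float):
    """Map available VRAM (in GB) to the table's recommendation.

    Returns None below 6 GB, where the table gives no recommendation.
    """
    if vram_gb > 16:
        return TIER_HIGH
    if vram_gb >= 10:
        return TIER_MID
    if vram_gb >= 6:
        return TIER_LOW  # also covers the 8-10 GB gap (assumption)
    return None


if __name__ == "__main__":
    for gb in (4, 8, 12, 24):
        print(gb, "GB ->", recommend(gb))
```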
- Pre-scales the image so that its longer side matches this size
- Larger values may reduce processing speed
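The pre-scaling arithmetic amounts to scaling both dimensions by the same factor so the aspect ratio is preserved. The sketch below shows that calculation only; whether the node also enlarges images smaller than the target, and which resampling filter it uses, is not stated here, so treat `prescaled_size` as a hypothetical helper.

```python
def prescaled_size(width: int, height: int, longer_side: int):
    """Return (new_width, new_height) with the longer side equal to `longer_side`."""
    if width <= 0 or height <= 0 or longer_side <= 0:
        raise ValueError("width, height and longer_side must be positive")
    scale = longer_side / max(width, height)
    # Round to whole pixels and never drop below 1 pixel.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(prescaled_size(1920, 1080, 1024))  # landscape input
    print(prescaled_size(768, 1024, 512))    # portrait input
```

The pixel count grows with the square of this setting, which is a likely reason why larger values slow processing.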
- Set to True to keep the model in VRAM across consecutive prompt inversion tasks
- In the batch node, False unloads the model only after all images have been processed, so it does not affect performance during the run
- Before loading a new model, attempts to unload all models via ComfyUI's model management, to avoid loading failures caused by insufficient free VRAM.
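For reference, unloading through ComfyUI's model management might look like the sketch below. It assumes ComfyUI's `comfy.model_management` module with its `unload_all_models()` and `soft_empty_cache()` functions, and it only works inside a ComfyUI process; how the plugin itself performs this step is not shown here, and `try_unload_all_models` is an illustrative name.

```python
def try_unload_all_models() -> bool:
    """Ask ComfyUI to unload every loaded model; return False outside ComfyUI."""
    try:
        import comfy.model_management as mm
    except ImportError:
        # Not running inside ComfyUI, so there is nothing to unload.
        return False
    mm.unload_all_models()  # release models tracked by ComfyUI
    mm.soft_empty_cache()   # ask the backend to free cached VRAM
    return True


if __name__ == "__main__":
    print("Unloaded via ComfyUI:", try_unload_all_models())
```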
- If `save_path` is not set, the output is saved to `image_path`
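The fallback rule can be expressed in a few lines. This is a sketch of the rule as described, not the plugin's code; `resolve_output_dir` is a hypothetical helper, and treating a whitespace-only `save_path` as unset is an assumption.

```python
from pathlib import Path


def resolve_output_dir(image_path: str, save_path: str = "") -> Path:
    """Return save_path when it is set, otherwise fall back to image_path."""
    chosen = save_path.strip() if save_path else ""
    return Path(chosen) if chosen else Path(image_path)


if __name__ == "__main__":
    print(resolve_output_dir("/data/images"))
    print(resolve_output_dir("/data/images", "/data/captions"))
```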