Fix embeddings with quantized models #601
Conversation
Update: @stduhpf, please ignore that strange question; I've already checked it myself. 😅 What's more, my patch is probably wrong, because it produces an oversaturated image compared with your result, which looks more natural.
@stduhpf, on second thought, disabling quantization for the conditioner has a disadvantage: CLIP will consume more memory, and depending on the quantization, significantly more. But if it doesn't complicate things much, @stduhpf, it's a good idea.
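To get a feel for the memory cost being discussed, here is a small sketch that estimates weight storage for different GGML storage types. The ~123M parameter count for the CLIP text encoder and the bits-per-weight figures are assumptions for illustration, not numbers from this PR.

```cpp
#include <cassert>

// Rough per-type memory estimate for a weight tensor, given bits per
// weight (bpw). GGML block quants carry per-block scale overhead:
// q8_0 stores 32 weights in 34 bytes (8.5 bpw),
// q4_0 stores 32 weights in 18 bytes (4.5 bpw).
double weight_bytes(double n_params, double bits_per_weight) {
    return n_params * bits_per_weight / 8.0;
}

// For an assumed ~123M-parameter CLIP text encoder, this gives roughly:
//   f32 (32 bpw)  -> ~469 MiB
//   q8_0 (8.5 bpw) -> ~125 MiB
//   q4_0 (4.5 bpw) ->  ~66 MiB
// so keeping the conditioner in f32 costs a few hundred MiB extra.
```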
Well, with this PR it's not the whole CLIP model being forced to f32, "just" the … There would be about a hundred megabytes of compute buffer to save with proper support for quantized concat(), compared to forcing f32. But then the custom embeddings would get quantized too, which might cause significant quality loss... (maybe that could explain why your result is oversaturated?)
Edit: I just tried your ggml patch; I still get black images when …
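One way to picture the approach under discussion (keeping only the embedding-related tensors in f32 so custom textual-inversion embeddings can be concatenated losslessly, while the rest of the model stays quantized) is a type override at weight-load time. This is a hypothetical sketch; the type names and the `token_embedding` substring match are illustrative, not the PR's actual identifiers.

```cpp
#include <cassert>
#include <string>

// Illustrative weight-storage types.
enum WType { W_F32, W_Q8_0 };

// Decide the storage type for a tensor by name: force f32 for the
// token-embedding table (so custom embeddings can be concatenated
// without quantization loss), honor the requested quantization
// everywhere else.
WType pick_type(const std::string& name, WType requested) {
    if (name.find("token_embedding") != std::string::npos)
        return W_F32;
    return requested;
}
```

With such a rule only the embedding table pays the f32 cost, instead of the whole text encoder.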
@stduhpf, maybe it's because I'm building for ARM? If you want, here is the version I'm currently building from; will it make black images too? I didn't sync SD_TYPE_COUNT and GGML_TYPE_COUNT, so the weight type must always be specified. It includes your patch, and to temporarily re-enable quantization there is an option "--keep-quantization". The build options are in the file "build_a55.sh", which makes a clean build. But note the CPU options: they're tuned for ARM.
I was getting normal-looking images with your version of the code, but that was just because of this line:
@stduhpf, setting keep_quant to false turns the patch on. I see; then it doesn't work, and maybe it's just my luck that it works for me... A friend even tried to run it on a 3 GB phone; it barely runs the SD1.5 model with q4_0 quantization, closing Google services due to lack of memory. 🙃 I tried it without BLAS and it works too (and I'm building in Termux; any phone with Android 7+ and 6+ GB of RAM should be sufficient). Thanks for your work!
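The inverted flag logic mentioned above (the patch is active by default, and "--keep-quantization" restores the old quantized behavior) could be sketched like this. The parsing code and names are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <string>
#include <vector>

// keep_quant defaults to false, which means the patch is ACTIVE
// (embedding tensors kept unquantized). Passing --keep-quantization
// flips it to true and restores the previous quantized behavior.
bool parse_keep_quant(const std::vector<std::string>& args) {
    for (const auto& a : args)
        if (a == "--keep-quantization")
            return true;
    return false;  // default: patch active
}
```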
Thank you for your contribution.
fixes #600