I'm developing a Flutter wrapper around the sdcpp implementation and have been testing its performance. While I'm seeing a 1.15-1.2x speedup on Android compared to the original sdcpp repo, I've noticed an unexpected pattern in memory usage: the Winograd implementation shows higher UNet compute buffer sizes than im2col, which seems counterintuitive given that im2col typically requires more memory for its intermediate matrices.
Here are some results I compiled during my testing:
**Stable Diffusion 1.5 (SD1.5)**

| Resolution | Winograd UNET Buffer Size | im2col UNET Buffer Size | Difference (Winograd - im2col) |
|---|---|---|---|
| 512x512 | 611.79 MB | 559.71 MB | +52.08 MB |
| 384x384 | 244.16 MB | 192.08 MB | +52.08 MB |
| 256x256 | 100.41 MB | 49.43 MB | +50.98 MB |
**Stable Diffusion XL (SDXL)**

| Resolution | Winograd UNET Buffer Size | im2col UNET Buffer Size | Difference (Winograd - im2col) |
|---|---|---|---|
| 1024x1024 | 864.29 MB | 830.19 MB | +34.10 MB |
| 512x512 | 156.29 MB | 131.85 MB | +24.44 MB |
| 384x384 | 113.32 MB | 95.47 MB | +17.85 MB |
| 256x256 | 96.29 MB | 60.31 MB | +35.98 MB |
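One possible explanation (a back-of-envelope sketch, not taken from the actual ggml graph, so treat it as an assumption): im2col needs a single scratch matrix of (C·k²) × (H_out·W_out) floats, while a Winograd F(2x2, 3x3) implementation typically keeps the input-transform buffer, the output-transform buffer, and the transformed weights live at the same time, and the transformed weights alone are 16/9 the size of the original 3x3 kernels. For a hypothetical 320-channel 3x3 conv on SD1.5's 64x64 latent:

```shell
# Back-of-envelope scratch-memory estimate for one hypothetical SD1.5 UNet conv:
# 3x3 conv, C_in = C_out = 320, 64x64 latent, FP32 (4 bytes/element).
# The buffer layout is an assumption, not read from the sdcpp source.
C=320; K=320; H=64; W=64; BYTES=4

# im2col: one (C * 3*3) x (H * W) scratch matrix.
im2col=$((C * 9 * H * W * BYTES))

# Winograd F(2x2, 3x3): 4x4 input tiles with stride 2 -> P = (H/2)*(W/2) tiles.
# Input transform V (16*C*P), output transform M (16*K*P), and transformed
# weights U (16*K*C) are assumed to be live simultaneously.
P=$(((H / 2) * (W / 2)))
winograd=$(((16 * C * P + 16 * K * P + 16 * K * C) * BYTES))

echo "im2col scratch:   $((im2col / 1024 / 1024)) MiB"
echo "Winograd scratch: $((winograd / 1024 / 1024)) MiB"
```

Under these assumptions the Winograd working set for this one layer already edges past im2col's, which would be consistent with the small constant-ish gap in the tables above.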
I also tried compiling with OpenCL, since your report mentions that it is supported on Android, but I ran into crashes and compilation errors. I would appreciate it if you could confirm its support and include a build guide in the README.
EDIT:
For anyone wondering how to build CLBlast on Termux (note: it has worse performance than CPU-only execution since only some operations are offloaded to the GPU, so it's generally not worth it):
1. Install the dependencies and libraries:

```shell
pkg update
pkg upgrade
pkg install git cmake clblast opencl-headers binutils termux-elf-cleaner ocl-icd clinfo mesa opencl-clhpp opencl-vendor-driver
```

Note: binutils-is-llvm works just as well as binutils.

2. Build sdcpp.

3. Fix the binary alignment.
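The "build sdcpp" and "fixing alignment" steps above can be sketched roughly as below. This is only a sketch under assumptions: the repo path and the CMake flag for the CLBlast backend (`-DGGML_CLBLAST=ON`) are guesses that may not match your sdcpp version, so check the project's own build docs; the alignment fix is just running `termux-elf-cleaner` on the produced binary.

```shell
# Sketch only: repo URL and CMake flag are assumptions and may differ
# for your sdcpp version.
git clone --recursive https://github.com/leejet/stable-diffusion.cpp
cd stable-diffusion.cpp
mkdir build && cd build
cmake .. -DGGML_CLBLAST=ON      # assumed flag for the CLBlast backend
cmake --build . --config Release -j

# "Fixing alignment": termux-elf-cleaner patches the ELF headers so
# Android's linker accepts the binary.
termux-elf-cleaner ./bin/sd
```

You can then run `clinfo` to confirm the OpenCL driver actually exposes your GPU before blaming the build.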
You should now have a fully working CLBlast binary. I got my Mali GPU working out of the box, but Adreno GPUs might require additional steps. See here: termux/termux-packages#16852