@@ -22,7 +22,7 @@ Inference of Stable Diffusion and Flux in pure C/C++
2222- Accelerated memory-efficient CPU inference
2323 - Only requires ~ 2.3GB when using txt2img with fp16 precision to generate a 512x512 image, enabling Flash Attention just requires ~ 1.8GB.
2424- AVX, AVX2 and AVX512 support for x86 architectures
25- - Full CUDA, Metal, Vulkan and SYCL backend for GPU acceleration.
25+ - Full CUDA, Metal, Vulkan, OpenCL and SYCL backend for GPU acceleration.
2626- Can load ckpt, safetensors and diffusers models/checkpoints. Standalone VAEs models
2727 - No need to convert to ` .ggml ` or ` .gguf ` anymore!
2828- Flash Attention for memory usage optimization
@@ -160,6 +160,73 @@ cmake .. -DSD_VULKAN=ON
160160cmake --build . --config Release
161161```
162162
163+ ##### Using OpenCL (for Adreno GPU)
164+
165+ Currently, the OpenCL backend supports only Adreno GPUs and is primarily optimized for the Q4_0 quantization type.
166+
167+ To build for Windows on ARM, please refer to the
168+ [Windows 11 Arm64](https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/OPENCL.md#windows-11-arm64) instructions.
169+
170+ Building for Android:
171+
172+ Android NDK:
173+ Download and install the Android NDK from the [official Android developer site](https://developer.android.com/ndk/downloads).
174+
175+ Set up OpenCL dependencies for the NDK:
176+
177+ You need to provide OpenCL headers and the ICD loader library to your NDK sysroot.
178+
179+ * OpenCL Headers:
180+ ``` bash
181+ # In a temporary working directory
182+ git clone https://github.com/KhronosGroup/OpenCL-Headers
183+ cd OpenCL-Headers
184+ # Replace <YOUR_NDK_PATH> with your actual NDK installation path
185+ # e.g., cp -r CL /path/to/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include
186+ sudo cp -r CL <YOUR_NDK_PATH>/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include
187+ cd ..
188+ ```
189+
190+ * OpenCL ICD Loader:
191+ ``` bash
192+ # In the same temporary working directory
193+ git clone https://github.com/KhronosGroup/OpenCL-ICD-Loader
194+ cd OpenCL-ICD-Loader
195+ mkdir build_ndk && cd build_ndk
196+
197+ # Replace <YOUR_NDK_PATH> in the CMAKE_TOOLCHAIN_FILE and OPENCL_ICD_LOADER_HEADERS_DIR
198+ cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release \
199+ -DCMAKE_TOOLCHAIN_FILE=<YOUR_NDK_PATH>/build/cmake/android.toolchain.cmake \
200+ -DOPENCL_ICD_LOADER_HEADERS_DIR=<YOUR_NDK_PATH>/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/include \
201+ -DANDROID_ABI=arm64-v8a \
202+ -DANDROID_PLATFORM=24 \
203+ -DANDROID_STL=c++_shared
204+
205+ ninja
206+ # Replace <YOUR_NDK_PATH>
207+ # e.g., cp libOpenCL.so /path/to/android-ndk-r26c/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
208+ sudo cp libOpenCL.so <YOUR_NDK_PATH>/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android
209+ cd ../..
210+ ```
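
Before configuring the main build, it can help to confirm that both copies landed in the sysroot. The following is a minimal sketch, not part of `stable-diffusion.cpp`: the function name `check_ndk_opencl` is hypothetical, and it assumes the `linux-x86_64` prebuilt toolchain layout used in the commands above.

```bash
# Hypothetical helper: checks that the OpenCL headers and the ICD loader
# are present in the NDK sysroot. Assumes the linux-x86_64 host layout
# and the arm64 (aarch64-linux-android) target used above.
check_ndk_opencl() {
    local ndk="$1"
    local sysroot="$ndk/toolchains/llvm/prebuilt/linux-x86_64/sysroot"
    local header="$sysroot/usr/include/CL/cl.h"
    local loader="$sysroot/usr/lib/aarch64-linux-android/libOpenCL.so"
    local status=0

    if [ ! -f "$header" ]; then
        echo "missing: $header" >&2
        status=1
    fi
    if [ ! -f "$loader" ]; then
        echo "missing: $loader" >&2
        status=1
    fi
    return "$status"
}
```

For example, `check_ndk_opencl /path/to/android-ndk-r26c && echo "OpenCL sysroot looks complete"` prints the message only when both files exist.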
211+
212+ Build `stable-diffusion.cpp` for Android with OpenCL:
213+
214+ ``` bash
215+ mkdir build-android && cd build-android
216+
217+ # Replace <YOUR_NDK_PATH> with your actual NDK installation path
218+ # e.g., -DCMAKE_TOOLCHAIN_FILE=/path/to/android-ndk-r26c/build/cmake/android.toolchain.cmake
219+ cmake .. -G Ninja \
220+ -DCMAKE_TOOLCHAIN_FILE=<YOUR_NDK_PATH>/build/cmake/android.toolchain.cmake \
221+ -DANDROID_ABI=arm64-v8a \
222+ -DANDROID_PLATFORM=android-28 \
223+ -DGGML_OPENMP=OFF \
224+ -DSD_OPENCL=ON
225+
226+ ninja
227+ ```
228+ *(Note: Don't forget to set `LD_LIBRARY_PATH=/vendor/lib64` on the command line before running the binary.)*
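
The note above could be applied as in the sketch below, which pushes the binary to a device over `adb` and launches it with `LD_LIBRARY_PATH` set. Everything specific here is an assumption for illustration: the function name `run_sd_on_device`, the device directory `/data/local/tmp/sd`, and the location of the built binary. Only `adb push` and `adb shell` are used.

```bash
# Hypothetical helper: copies a built binary to the device and runs it with
# the vendor library path set so the device's OpenCL driver can be found.
# Arguments after the binary path are passed through unquoted, so values
# containing spaces would need extra quoting.
run_sd_on_device() {
    local binary="$1"                      # local path to the built binary (assumed)
    shift
    local device_dir="/data/local/tmp/sd"  # assumed writable directory on the device
    local name
    name=$(basename "$binary")

    adb shell mkdir -p "$device_dir"
    adb push "$binary" "$device_dir/"
    adb shell "cd $device_dir && chmod +x ./$name && LD_LIBRARY_PATH=/vendor/lib64 ./$name $*"
}
```

A call such as `run_sd_on_device build-android/bin/sd --help` assumes the binary ended up at that path in your build directory; model files would have to be pushed to the device separately.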
229+
163230##### Using SYCL
164231
165232Using SYCL makes the computation run on the Intel GPU. Please make sure you have installed the related driver and the [Intel® oneAPI Base toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) before you start. For more details and steps, refer to the [llama.cpp SYCL backend](https://github.com/ggerganov/llama.cpp/blob/master/docs/backend/SYCL.md#linux) documentation.