PR: Refine ggml-hexagon backend (Qualcomm Hexagon NPU backend) for latest ggml, whisper.cpp, llama.cpp #12326

Closed
149 commits
4de23b7
ggml-qnn: add Qualcomm QNN backend for GGML
jeffzhou2000 Feb 14, 2025
0403217
ggml-qnn: santiy check
jeffzhou2000 Feb 15, 2025
afaca02
ggml-qnn: update script build-run-android.sh to compare peformance of…
jeffzhou2000 Feb 16, 2025
c827bd0
ggml-qnn: fix minor issue in test-backend-ops.cpp
jeffzhou2000 Feb 17, 2025
4b706a7
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
jeffzhou2000 Feb 18, 2025
f581970
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 18, 2025
cc22a50
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
jeffzhou2000 Feb 19, 2025
5f72aa5
ggml-qnn: remove redundant codes
jeffzhou2000 Feb 20, 2025
dbfee49
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
3996ca1
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
09465a1
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 21, 2025
6678248
ggml-qnn: add Qualcomm QNN backend for GGML
jeffzhou2000 Feb 14, 2025
a5697b3
ggml-qnn: merge QNN RPC feature from https://github.com/zhouwg/kantv/…
jeffzhou2000 Feb 18, 2025
b4012bd
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 18, 2025
2921a56
ggml-qnn: a concise approach to offload mulmat to QNN backend(sync fr…
jeffzhou2000 Feb 19, 2025
3fb44a8
ggml-qnn: remove redundant codes
jeffzhou2000 Feb 20, 2025
1a0377d
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
f2515f3
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 20, 2025
581b73e
ggml-qnn: sync from branch kantvai-ggmlqnn-npurpc
jeffzhou2000 Feb 21, 2025
70746e9
ggml-qnn: fix a minior typo in internal doc
jeffzhou2000 Feb 23, 2025
16a4bea
ggml-qnn: refine function ggml_qnn_create_general_tensor() to avoid c…
jeffzhou2000 Feb 23, 2025
13de3d1
ggml-qnn: fix a minor typo in source code
jeffzhou2000 Feb 24, 2025
566a4f5
build: avoid ggml-qnn backend breaking other backend's builds
jeffzhou2000 Feb 24, 2025
7f660d0
ggml-qnn: remove redundant codes to make PR reviewers happy
jeffzhou2000 Feb 25, 2025
b0d0ba3
ggml-qnn: refine code format
jeffzhou2000 Feb 25, 2025
30581ab
ggml-qnn: offload quantized type mulmat to QNN backend
jeffzhou2000 Feb 26, 2025
0871b50
ggml-qnn: refine source code structure to make code more clearly
jeffzhou2000 Feb 27, 2025
240381e
ggml-qnn: enable release build with necessary logs to make reviewers …
jeffzhou2000 Feb 27, 2025
2457ee8
ggml-qnn: enable all quantize type with 2d mulmat
jeffzhou2000 Feb 27, 2025
20988fd
ggml-qnn: enable log output of GGMLQNN_LOG_INFO in command line mode …
jeffzhou2000 Feb 28, 2025
4091cea
ggml-qnn: Windows port --- step2
jeffzhou2000 Feb 28, 2025
9ef3fae
ggml-qnn: merge UT code and corresponding script from local dev branc…
jeffzhou2000 Mar 2, 2025
6f27d6c
ggml-qnn: merge ggml_qnn_mul_mat_4d from local dev branch to make wor…
jeffzhou2000 Mar 2, 2025
e4505d1
ggml-qnn: submit AI-assisted ggml_qnn_mul_mat_4d(not worked currently…
jeffzhou2000 Mar 2, 2025
4253fed
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step2
jeffzhou2000 Mar 2, 2025
b1f7d93
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step3
jeffzhou2000 Mar 2, 2025
c66cda6
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step4
jeffzhou2000 Mar 2, 2025
bf5547b
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step5
jeffzhou2000 Mar 2, 2025
3651e26
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step6
jeffzhou2000 Mar 2, 2025
ab29b77
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step7
jeffzhou2000 Mar 2, 2025
546844b
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step8
jeffzhou2000 Mar 2, 2025
5cbd2c8
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- good in step9
jeffzhou2000 Mar 2, 2025
7a75a69
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
jeffzhou2000 Mar 2, 2025
2dc48f6
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step10
jeffzhou2000 Mar 2, 2025
e306a8a
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- narrow down t…
jeffzhou2000 Mar 2, 2025
98b958b
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- step11
jeffzhou2000 Mar 2, 2025
b88fed8
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 --- both ok in st…
jeffzhou2000 Mar 2, 2025
384b815
ggml-qnn: AI-assisted ggml_qnn_mul_mat_4d by Grok 3 ---finalizing ver…
jeffzhou2000 Mar 2, 2025
5a5666d
ggml-qnn: refine ggml_qnn_mul_mat and ggml_qnn_general_node according…
jeffzhou2000 Mar 2, 2025
8c18a83
ggml-qnn: remove no-needed comments
jeffzhou2000 Mar 2, 2025
6a018ff
ggml-qnn: Windows port --- step3
jeffzhou2000 Mar 3, 2025
ace495d
ggml-qnn: remove un-needed function
jeffzhou2000 Mar 4, 2025
a5f0ce9
ggml-qnn:rebase to upstream
jeffzhou2000 Mar 4, 2025
13f2407
ggml-qnn: fix a minior issue during rebase to upstream
jeffzhou2000 Mar 4, 2025
239db12
ggml-qnn: update script according to https://github.com/ggml-org/llam…
jeffzhou2000 Mar 4, 2025
9275a9c
ggml-qnn: fix a minior issue in ggmlqnn_create_general_tensor()
jeffzhou2000 Mar 4, 2025
d0cc868
ggml-qnn: active member variable _device_id in class qnn_instance
jeffzhou2000 Mar 4, 2025
86cd2a2
ggml-qnn: refine ggml_qnn_general_node and ggml_qnn_mul_mat to make c…
jeffzhou2000 Mar 4, 2025
6aea261
ggml-qnn: Windows port --- step4
jeffzhou2000 Mar 6, 2025
c1dee7a
ggml-qnn: Windows port -- step5
jeffzhou2000 Mar 7, 2025
ff61e33
ggml-qnn: WoA(Windows on ARM) -- step6
jeffzhou2000 Mar 8, 2025
3b33ff3
ggml-qnn: rebase to upstream
jeffzhou2000 Mar 9, 2025
0fe9f86
ggml-qnn: pr to upstream
jeffzhou2000 Mar 11, 2025
fbd962a
ggml-qnn: rebase to upstream
jeffzhou2000 Mar 18, 2025
0cb7f7d
ggml-qnn: self code-review
jeffzhou2000 Mar 18, 2025
8e47733
ggml-qnn: rebase upstream
jeffzhou2000 Mar 19, 2025
9cf4d7b
ggml-qnn: add approach through Hexagon cDSP
jeffzhou2000 Mar 22, 2025
70948f0
ggml-qnn: refine general approach through Hexagon cDSP
jeffzhou2000 Mar 23, 2025
f84314a
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
jeffzhou2000 Mar 24, 2025
cfe57f6
ggml-qnn: refine the entire ggml-qnn.cpp to make code more clear
jeffzhou2000 Mar 24, 2025
3ebdf4b
ggml-qnn: add build script for libggmlop_skel.so
jeffzhou2000 Mar 24, 2025
65a6046
ggml-qnn: remove redundant functions in this PR and make codes more c…
jeffzhou2000 Mar 25, 2025
c7b4e58
ggml-qnn: original ggml_compute_forward_add and ggml_compute_forward_…
jeffzhou2000 Mar 25, 2025
a98a24a
ggml-qnn: modify build-run-android.sh to verify mulmat and validate m…
jeffzhou2000 Mar 25, 2025
51c5292
ggml-qnn: make host code(ggml-qnn.cpp) more clear and more stable
jeffzhou2000 Mar 26, 2025
4280ce6
ggml-qnn: refine code according to self code-review and make code mor…
jeffzhou2000 Mar 26, 2025
dea2790
ggml-qnn: offload more ggml op to Hexagon cDSP
jeffzhou2000 Mar 27, 2025
2bc5b9d
ggml-hexagon: code on AP(arm-cpu) side is stable now
jeffzhou2000 Mar 28, 2025
f78c995
ggml-hexagon: optimize GGML_OP_ADD on cDSP side
jeffzhou2000 Mar 28, 2025
8cea7af
ggml-hexagon: simplify hexagon-kernel build logic in CMakeLists.txt
jeffzhou2000 Mar 29, 2025
e11c748
ggml-hexagon: release ggml-hexagon v0.98
jeffzhou2000 Mar 29, 2025
48d30d9
ggml-hexagon: release ggml-hexagon v0.99
jeffzhou2000 Mar 29, 2025
b85b348
ggml-hexagon: try to offload q6_k mulmat to cDSP
jeffzhou2000 Mar 29, 2025
1b9fa84
ggml-hexagon: fix minior issue in ggml-hexagon.cpp after self code-re…
jeffzhou2000 Mar 29, 2025
e847e58
ggml-hexagon: check validation of ggml-hexagon.cfg before create appr…
jeffzhou2000 Mar 30, 2025
7357641
ggml-hexagon: fix all compiler warnings in ggml-hexagon.cpp
jeffzhou2000 Mar 30, 2025
dcfc33a
ggml-hexagon: enable only one backend device for HWACCEL_CDSP and ena…
jeffzhou2000 Mar 31, 2025
07f4a7e
ggml-hexagon: rpc ion memory pool and test-backend-ops works fine in …
jeffzhou2000 Mar 31, 2025
ae55221
ggml-hexagon: make comprision of mulmat performance between HWACCEL_Q…
jeffzhou2000 Mar 31, 2025
da21d65
ggml-hexagon: release ggml-hexagon v1.00
jeffzhou2000 Mar 31, 2025
4de775b
ggml-hexagon: rebase to upstream
jeffzhou2000 Apr 1, 2025
78589d0
ggml-hexagon: check configuration of enable_rpc_dma_mempool in functi…
jeffzhou2000 Apr 1, 2025
eff3de4
ggml-hexagon: uniform rpc_ion_memsize and rpc_ion_usage between HWACC…
jeffzhou2000 Apr 1, 2025
b7b780c
ggml-hexagon: make buffer mechanism more clear in HWACCEL_CDSP approach
jeffzhou2000 Apr 1, 2025
791771e
ggml-hexagon: add perf function in hexagon kernerls on cDSP side
jeffzhou2000 Apr 2, 2025
1c3628b
ggml-hexagon: fix a stupid issue of why set rpc latency failure and i…
jeffzhou2000 Apr 2, 2025
ba5e26e
ggml-hexagon: make helper function ggmlhexagon_get_timestring() threa…
jeffzhou2000 Apr 2, 2025
dc3fef8
ggml-hexagon: fix a typo in ggml-hexagon.cpp
jeffzhou2000 Apr 2, 2025
8b652c7
ggml-hexagon: list all known todo and fixme tasks in ggml-hexagon.cpp
jeffzhou2000 Apr 2, 2025
05c7521
ggml-hexagon: fix units MB -> MiB
jeffzhou2000 Apr 2, 2025
1c12ad0
ggml-hexagon: try to make ggml-hexagon backend works fine in a standa…
jeffzhou2000 Apr 3, 2025
b936d27
ggml-hexagon: remove reduament code and make debug log more clear
jeffzhou2000 Apr 3, 2025
0d82f2c
ggml-hexagon: add gemma-3-4b-it-Q8_0.gguf to verify q8_0 mulmat on cDSP
jeffzhou2000 Apr 3, 2025
73e7733
ggml-hexagon:add skeleton code of offload GGML_OP_SOFT_MAX/GGML_OP_RM…
jeffzhou2000 Apr 3, 2025
9f1b22e
ggml-hexagon: release ggml-dsp v0.60 on cDSP side
jeffzhou2000 Apr 4, 2025
3634b02
ggml-hexagon: merge build logic in kernels/Makefile to ggml-hexagon/C…
jeffzhou2000 Apr 5, 2025
aefdb3c
ggml-hexagon: fix a typo in ggml-hexagon.cpp
jeffzhou2000 Apr 5, 2025
b0a8e96
ggml-hexagon: uniform NDEBUG usage in ggml-hexagon.cpp and ggml-dsp.c
jeffzhou2000 Apr 6, 2025
f3c91c0
ggml-hexagon: add profiler feature for purpose of visualize NPU perfo…
jeffzhou2000 Apr 7, 2025
b3a1312
ggml-hexagon: remove so-called dma memory pool to avoid confusion and…
jeffzhou2000 Apr 8, 2025
ee81ee4
ggml-hexagon: make function ggmlhexagon_init_rpcmempool in ggml-hexag…
jeffzhou2000 Apr 8, 2025
42b1c6f
ggml-hexagon: fix potential resource leak in class hexagon_profiler
jeffzhou2000 Apr 8, 2025
b308084
ggml-hexagon: enable multi-threading feature on cDSP side
jeffzhou2000 Apr 8, 2025
63effdf
ggml-hexagon: upgrade QNN SDK to v2.33.0.250327
jeffzhou2000 Apr 9, 2025
6114a0e
ggml-hexagon: fix typo in ggml-hexagon.cpp
jeffzhou2000 Apr 9, 2025
a417702
ggml-dsp: probe QuRT RTOS information in function ggmlop_dsp_open
jeffzhou2000 Apr 9, 2025
a6371ee
ggml-hexagon: setting enable_rpc_ion_mempool to 1 and make test-backe…
jeffzhou2000 Apr 10, 2025
9af188b
ggml-hexagon: check whether user's specified htp arch is valid in CMa…
jeffzhou2000 Apr 10, 2025
eaaf19e
ggml-hexagon: sync with upstream
jeffzhou2000 Apr 11, 2025
0685571
ggml-hexagon: refine pinned-memory feature
jeffzhou2000 Apr 11, 2025
9fe6434
ggml-hexagon: refine build system in ggml-hexagon
jeffzhou2000 Apr 11, 2025
abba6f2
ggml-hexagon: remove redundant code in struct ggml_backend_hexagon_bu…
jeffzhou2000 Apr 11, 2025
c66333a
ggml-hexagon: upgrade Android NDK to android-ndk-r28
jeffzhou2000 Apr 11, 2025
754fb54
ggml-dsp: split ggml-dsp.c into multiple files and cleanup
jeffzhou2000 Apr 11, 2025
8d26b68
ggml-dsp: refine ggml-dsp and make ggml-dsp more clear
jeffzhou2000 Apr 12, 2025
0d2fb27
ggml-hexagon: fix a minior issue in dev ops
jeffzhou2000 Apr 12, 2025
5b1caf3
ggml-hexagon: fix a build issue in CI
jeffzhou2000 Apr 12, 2025
f4a68ff
ggml-dsp: cleanup code
jeffzhou2000 Apr 15, 2025
65c7fe5
ggml-hexagon: sync with upstream
jeffzhou2000 Apr 15, 2025
a1e66bc
ggml-dsp: cleanup code
jeffzhou2000 Apr 16, 2025
878165d
ggml-dsp:refine ggmlhexagon_dsp_add_f32
jeffzhou2000 Apr 16, 2025
93d8683
ggml-dsp: refine logic of thread_counts
jeffzhou2000 Apr 17, 2025
efda38a
ggml-hexagon: release v1.06 and ready for code review
jeffzhou2000 Apr 17, 2025
6ded507
ggml-dsp: make GGML_OP_ADD more faster on cDSP side
jeffzhou2000 Apr 19, 2025
919c870
ggml-hexagon: sync from project kantv(make ggml-hexagon backend can w…
jeffzhou2000 Apr 24, 2025
8fb4d15
sync with upstream llama.cpp and sync ggml-hexagon.cpp from project k…
jeffzhou2000 Apr 29, 2025
811515c
sync with upstream
jeffzhou2000 May 7, 2025
3019fb2
sync with upstream
jeffzhou2000 May 10, 2025
8063847
ggml-hexagon: upgrade QNN SDK to v2.34.0.250424
jeffzhou2000 May 11, 2025
c809ab8
sync with upstream
jeffzhou2000 May 16, 2025
ae160d1
ggml-hexagon: sync from project kantv(fix a long-term issue which int…
jeffzhou2000 May 17, 2025
3f7f607
ggml-hexagon: sync with upstream llama.cpp
jeffzhou2000 May 23, 2025
3509526
ggml-hexagon: add set_hexagon_cfg(int new_hexagon_backend, int new_hw…
jeffzhou2000 Jun 3, 2025
62792f5
ggml-hexagon: sync with branch self-build
jeffzhou2000 Jun 19, 2025
8465fe3
ggml-hexagon:sycn with branch self-build
jeffzhou2000 Jun 23, 2025
d02cf1d
project: sync with upstream(PR-14501:remove kompute backend)
jeffzhou2000 Jul 3, 2025
7c07a78
ggml:fix minior issue during rebase upstream PR-14501: remove kompute…
jeffzhou2000 Jul 4, 2025
657ccee
ggml-hexagon: minimum viable PR
jeffzhou2000 Jul 7, 2025
b32ba64
ggml-hexagon: sync with self-build branch
jeffzhou2000 Jul 10, 2025
2 changes: 2 additions & 0 deletions .gitignore
@@ -146,3 +146,5 @@ poetry.toml
# Local scripts
/run-vim.sh
/run-chat.sh

/prebuilts
25 changes: 25 additions & 0 deletions CMakeLists.txt
@@ -7,6 +7,30 @@ set(CMAKE_WARN_UNUSED_CLI YES)

set(CMAKE_EXPORT_COMPILE_COMMANDS ON)

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
set(CMAKE_VERBOSE_MAKEFILE ON)
if(DEFINED HTP_ARCH_VERSION)
if (${HTP_ARCH_VERSION} STREQUAL "v75" OR ${HTP_ARCH_VERSION} STREQUAL "v79")
#works fine on Snapdragon 8Gen3 & 8Elite
set(OPT_FLAG " -O3 -march=armv8.7-a+dotprod+fp16+i8mm -mcpu=cortex-x1 -mtune=cortex-x1 -ffp-model=fast -fno-finite-math-only")
else()
#should work fine with mainstream mobile SoCs
set(OPT_FLAG " -O3 -march=armv8.2-a+dotprod+fp16 -ffp-model=fast -fno-finite-math-only")
endif()
else()
#should work fine with mainstream mobile SoCs
set(OPT_FLAG " -O3 -march=armv8.2-a+dotprod+fp16 -ffp-model=fast -fno-finite-math-only")
endif()

message("OPT_FLAG:${OPT_FLAG}")
#ensure the same toolchain optimization for ggml-opencl, ggml-vulkan, ggml-hexagon on Android phone
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} ${DEBUG_FLAG} ${OPT_FLAG}")

endif()

if (NOT XCODE AND NOT MSVC AND NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE Release CACHE STRING "Build type" FORCE)
set_property(CACHE CMAKE_BUILD_TYPE PROPERTY STRINGS "Debug" "Release" "MinSizeRel" "RelWithDebInfo")
@@ -127,6 +151,7 @@ llama_option_depr(WARNING LLAMA_RPC GGML_RPC)
llama_option_depr(WARNING LLAMA_SYCL GGML_SYCL)
llama_option_depr(WARNING LLAMA_SYCL_F16 GGML_SYCL_F16)
llama_option_depr(WARNING LLAMA_CANN GGML_CANN)
llama_option_depr(WARNING LLAMA_HEXAGON GGML_HEXAGON)

if (NOT MSVC)
if (LLAMA_SANITIZE_THREAD)
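The Android branch added to the top-level CMakeLists.txt picks one of two flag sets depending on HTP_ARCH_VERSION. As a rough illustration only (this is not code from the PR), the same decision can be sketched in C. `select_opt_flag` is a hypothetical helper, and a NULL argument stands in for the CMake variable being undefined; the flag strings are copied from the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Flag strings copied verbatim from the CMake hunk above. */
static const char *OPT_FLAG_V75_V79 =
    " -O3 -march=armv8.7-a+dotprod+fp16+i8mm -mcpu=cortex-x1 -mtune=cortex-x1"
    " -ffp-model=fast -fno-finite-math-only";
static const char *OPT_FLAG_DEFAULT =
    " -O3 -march=armv8.2-a+dotprod+fp16 -ffp-model=fast -fno-finite-math-only";

/*
 * Hypothetical helper (not part of the PR): mirrors the if/else chain in the
 * hunk. htp_arch == NULL models `if(DEFINED HTP_ARCH_VERSION)` being false.
 */
static const char *select_opt_flag(const char *htp_arch) {
    if (htp_arch != NULL &&
        (strcmp(htp_arch, "v75") == 0 || strcmp(htp_arch, "v79") == 0)) {
        /* Snapdragon 8 Gen3 / 8 Elite path */
        return OPT_FLAG_V75_V79;
    }
    /* Every other value, and the undefined case, gets the armv8.2-a flags. */
    return OPT_FLAG_DEFAULT;
}
```

Both `else()` branches in the hunk set the same string, so the sketch collapses them into a single fallback.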
2 changes: 2 additions & 0 deletions ggml/CMakeLists.txt
@@ -206,6 +206,7 @@ option(GGML_OPENCL_EMBED_KERNELS "ggml: embed kernels"
option(GGML_OPENCL_USE_ADRENO_KERNELS "ggml: use optimized kernels for Adreno" ON)
set (GGML_OPENCL_TARGET_VERSION "300" CACHE STRING
"gmml: OpenCL API version to target")
option(GGML_HEXAGON "ggml: use HEXAGON" OFF)

# toolchain for vulkan-shaders-gen
set (GGML_VULKAN_SHADERS_GEN_TOOLCHAIN "" CACHE FILEPATH "ggml: toolchain file for vulkan-shaders-gen")
@@ -270,6 +271,7 @@ set(GGML_PUBLIC_HEADERS
include/ggml-rpc.h
include/ggml-sycl.h
include/ggml-vulkan.h
include/ggml-hexagon.h
include/gguf.h)

set_target_properties(ggml PROPERTIES PUBLIC_HEADER "${GGML_PUBLIC_HEADERS}")
48 changes: 48 additions & 0 deletions ggml/include/ggml-hexagon.h
@@ -0,0 +1,48 @@
#pragma once

#include "ggml.h"
#include "ggml-backend.h"

#ifdef __cplusplus
extern "C" {
#endif

#define GGML_HEXAGON_MAX_DEVICES 4
#define GGML_HEXAGON_BACKEND_NAME "hexagon"

enum HEXAGONBackend {
HEXAGON_BACKEND_QNNCPU = 0,
HEXAGON_BACKEND_QNNGPU = 1,
HEXAGON_BACKEND_QNNNPU = 2,
HEXAGON_BACKEND_CDSP = 3,
HEXAGON_BACKEND_GGML = 4, //"fake" HEXAGON backend for comparing performance between the HEXAGON backend and the ggml backend
};

//0: general approach through QNN: offload ggml ops to QNN (QNNCPU, QNNGPU, QNNNPU)
//1: special approach through QNN-SINGLEGRAPH: map the entire ggml cgraph to a single QNN graph
//2: general approach through Hexagon cDSP: offload ggml ops to the Hexagon cDSP directly
enum hwaccel_approach_type {
HWACCEL_QNN = 0,
HWACCEL_QNN_SINGLEGRAPH= 1,
HWACCEL_CDSP = 2,
};

GGML_BACKEND_API ggml_backend_t ggml_backend_hexagon_init(size_t dev_num, const char * qnn_lib_path);

GGML_BACKEND_API bool ggml_backend_is_hexagon(ggml_backend_t backend);

GGML_BACKEND_API int ggml_backend_hexagon_get_device_count(void);

GGML_BACKEND_API ggml_backend_reg_t ggml_backend_hexagon_reg(void);

GGML_BACKEND_API const char * ggml_backend_hexagon_get_devname(size_t dev_num);

GGML_BACKEND_API void ggml_backend_hexagon_set_cfg(int new_hexagon_backend, int new_hwaccel_approach);

GGML_BACKEND_API int ggml_backend_hexagon_get_mulmat_algotype(void);

GGML_BACKEND_API void ggml_backend_hexagon_set_mulmat_algotype(int new_mulmat_algotype);

#ifdef __cplusplus
}
#endif
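The header pairs a backend selector (`HEXAGONBackend`) with an offload approach (`hwaccel_approach_type`), both passed as plain ints to `ggml_backend_hexagon_set_cfg`. The following is a minimal sketch of how a caller might sanity-check a pair before passing it on. The enumerators are copied from the header; `hexagon_backend_name` and `hexagon_cfg_is_consistent` are hypothetical helpers, and the pairing rule itself is an assumption read off the header comments, not something the PR states or enforces.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Enumerators copied from ggml-hexagon.h in this diff. */
enum HEXAGONBackend {
    HEXAGON_BACKEND_QNNCPU = 0,
    HEXAGON_BACKEND_QNNGPU = 1,
    HEXAGON_BACKEND_QNNNPU = 2,
    HEXAGON_BACKEND_CDSP   = 3,
    HEXAGON_BACKEND_GGML   = 4,
};

enum hwaccel_approach_type {
    HWACCEL_QNN             = 0,
    HWACCEL_QNN_SINGLEGRAPH = 1,
    HWACCEL_CDSP            = 2,
};

/* Hypothetical helper: printable label for a backend id (labels are made up). */
static const char *hexagon_backend_name(int backend) {
    switch (backend) {
        case HEXAGON_BACKEND_QNNCPU: return "QNN-CPU";
        case HEXAGON_BACKEND_QNNGPU: return "QNN-GPU";
        case HEXAGON_BACKEND_QNNNPU: return "QNN-NPU";
        case HEXAGON_BACKEND_CDSP:   return "cDSP";
        case HEXAGON_BACKEND_GGML:   return "ggml";
        default:                     return "unknown";
    }
}

/*
 * Hypothetical check. ASSUMPTION: QNN approaches go with the three QNN
 * backends, the cDSP approach goes with the cDSP backend, and the "fake"
 * ggml backend is accepted with any valid approach.
 */
static bool hexagon_cfg_is_consistent(int backend, int approach) {
    bool approach_ok = approach >= HWACCEL_QNN && approach <= HWACCEL_CDSP;
    if (!approach_ok) {
        return false;
    }
    if (backend == HEXAGON_BACKEND_GGML) {
        return true;
    }
    if (approach == HWACCEL_CDSP) {
        return backend == HEXAGON_BACKEND_CDSP;
    }
    return backend >= HEXAGON_BACKEND_QNNCPU && backend <= HEXAGON_BACKEND_QNNNPU;
}
```

A caller could run such a check before calling `ggml_backend_hexagon_set_cfg(new_hexagon_backend, new_hwaccel_approach)`. How the real backend reacts to a mismatched pair is not visible in this part of the diff.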
1 change: 1 addition & 0 deletions ggml/src/CMakeLists.txt
@@ -371,6 +371,7 @@ ggml_add_backend(RPC)
ggml_add_backend(SYCL)
ggml_add_backend(Vulkan)
ggml_add_backend(OpenCL)
ggml_add_backend(HEXAGON)

foreach (target ggml-base ggml)
target_include_directories(${target} PUBLIC $<BUILD_INTERFACE:${CMAKE_CURRENT_SOURCE_DIR}/../include> $<INSTALL_INTERFACE:include>)
8 changes: 8 additions & 0 deletions ggml/src/ggml-backend-reg.cpp
@@ -61,6 +61,10 @@
#include "ggml-cann.h"
#endif

#ifdef GGML_USE_HEXAGON
#include "ggml-hexagon.h"
#endif

// disable C++17 deprecation warning for std::codecvt_utf8
#if defined(__clang__)
# pragma clang diagnostic push
@@ -185,6 +189,9 @@ struct ggml_backend_registry {
#ifdef GGML_USE_RPC
register_backend(ggml_backend_rpc_reg());
#endif
#ifdef GGML_USE_HEXAGON
register_backend(ggml_backend_hexagon_reg());
#endif
#ifdef GGML_USE_CPU
register_backend(ggml_backend_cpu_reg());
#endif
@@ -574,6 +581,7 @@ void ggml_backend_load_all_from_path(const char * dir_path) {
ggml_backend_load_best("vulkan", silent, dir_path);
ggml_backend_load_best("opencl", silent, dir_path);
ggml_backend_load_best("musa", silent, dir_path);
ggml_backend_load_best("hexagon", silent, dir_path);
ggml_backend_load_best("cpu", silent, dir_path);
// check the environment variable GGML_BACKEND_PATH to load an out-of-tree backend
const char * backend_path = std::getenv("GGML_BACKEND_PATH");
128 changes: 128 additions & 0 deletions ggml/src/ggml-hexagon/CMakeLists.txt
@@ -0,0 +1,128 @@
project(ggml-hexagon)
message(STATUS "Using HEXAGON backend")
message("CMAKE_SYSTEM_NAME : ${CMAKE_SYSTEM_NAME}")

set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)

if(NOT DEFINED QNN_SDK_PATH)
message(FATAL_ERROR "QNN_SDK_PATH not defined")
endif()

if(NOT DEFINED HEXAGON_SDK_PATH)
message(FATAL_ERROR "HEXAGON_SDK_PATH not defined")
endif()

message("QNN_SDK_PATH : ${QNN_SDK_PATH}")
message("HEXAGON_SDK_PATH: ${HEXAGON_SDK_PATH}")
message("HTP_ARCH_VERSION: ${HTP_ARCH_VERSION}")

if (CMAKE_BUILD_TYPE STREQUAL "Debug")
set(DEBUG_FLAG "-DDEBUG -Wall")
message("Debug mode:${DEBUG_FLAG}")
else()
set(DEBUG_FLAG "-DNDEBUG -Wall")
#to make NPU performance comparisons through llama-bench clearer, all verbose logs
#can be disabled manually here in ggml-hexagon/CMakeLists.txt:
#set(DEBUG_FLAG "-DNDEBUG -Wall -DDISABLE_ALL_LOG")
message("Release mode:${DEBUG_FLAG}")
endif()

#v68 --- Snapdragon 888
#v69 --- Snapdragon 8 Gen1
#v73 --- Snapdragon 8 Gen2
#v75 --- Snapdragon 8 Gen3
#v79 --- Snapdragon 8 Elite
if(NOT DEFINED HTP_ARCH_VERSION)
message(FATAL_ERROR "HTP_ARCH_VERSION not defined, valid htp arch: v68,v69,v73,v75,v79")
endif()

#check whether user's specified htp arch is valid
set(CHECK_HTP_ARCH "WRONG")
foreach (feat v68 v69 v73 v75 v79)
if (${feat} STREQUAL ${HTP_ARCH_VERSION})
set(CHECK_HTP_ARCH "GOOD")
endif()
endforeach()
if (${CHECK_HTP_ARCH} STREQUAL "WRONG")
message(FATAL_ERROR "ggml-hexagon backend only supports htp arch v68,v69,v73,v75,v79")
endif()

#check optimization flags
message("OPT_FLAG:${OPT_FLAG}")

if(CMAKE_SYSTEM_NAME STREQUAL "Android")
find_library(LOG_LIB log)

add_library(cdsprpc
SHARED
IMPORTED)
set_target_properties(cdsprpc
PROPERTIES
IMPORTED_LOCATION
${HEXAGON_SDK_PATH}/ipc/fastrpc/remote/ship/android_aarch64/libcdsprpc.so)

set(QNN_LINK_LIBRARIES ${LOG_LIB} cdsprpc)
set(QNN_DEFAULT_LIB_SEARCH_PATH "/data/local/tmp/" CACHE STRING "customized library search path for QNN backend")

include_directories(${HEXAGON_SDK_PATH}/incs)
include_directories(${HEXAGON_SDK_PATH}/incs/stddef)
include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/incs)
include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/rpcmem/inc)
include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/remote/ship/android_Debug_aarch64)
include_directories(${HEXAGON_SDK_PATH}/utils/examples)
include_directories(${HEXAGON_SDK_PATH}/ipc/fastrpc/rtld/ship/android_aarch64)
include_directories(${HEXAGON_SDK_PATH}/libs/atomic/inc)
include_directories(${HEXAGON_SDK_PATH}/libs/atomic/android_Debug_aarch64/ship)
include_directories(${CMAKE_SOURCE_DIR}/ggml/src/ggml-hexagon/)
include_directories(${CMAKE_SOURCE_DIR}/ggml/src/ggml-hexagon/kernels/)
elseif(CMAKE_SYSTEM_NAME STREQUAL "Windows")
set(QNN_DEFAULT_LIB_SEARCH_PATH "C:\\" CACHE STRING "customized library search path for QNN backend")
else()
message(FATAL_ERROR "ggml-hexagon is currently only available on Android and Windows (Windows on ARM)")
endif()

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_C_FLAGS_RELEASE "${CMAKE_C_FLAGS_RELEASE} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")
set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -DGGML_USE_HEXAGON ${DEBUG_FLAG} ${OPT_FLAG}")

file(GLOB HEXAGON_SOURCES "${CMAKE_CURRENT_LIST_DIR}/*.cpp" "${CMAKE_CURRENT_LIST_DIR}/kernels/stub.c")
ggml_add_backend_library(ggml-hexagon ${HEXAGON_SOURCES})

target_include_directories(ggml-hexagon PRIVATE ${QNN_SDK_PATH}/include/QNN ${HEXAGON_SDK_PATH} ${CMAKE_CURRENT_LIST_DIR})
target_link_libraries(ggml-hexagon PRIVATE ${QNN_LINK_LIBRARIES})

string(REGEX REPLACE "/$" "" QNN_DEFAULT_LIB_SEARCH_PATH "${QNN_DEFAULT_LIB_SEARCH_PATH}")
target_compile_definitions(ggml-hexagon PRIVATE QNN_DEFAULT_LIB_SEARCH_PATH="${QNN_DEFAULT_LIB_SEARCH_PATH}/")

#cross-compile the source code of the hexagon kernels, which run on the cDSP side
function(ggml_hexagon_build_kernel KNAME)
message(STATUS "ggml_hexagon: build hexagon-kernel ${KNAME}")

add_custom_command(
TARGET ${PROJECT_NAME}
POST_BUILD
COMMAND echo "current working path:`pwd`\n"
COMMAND echo "${CMAKE_CURRENT_LIST_DIR}/kernels"
COMMAND make -C ${CMAKE_CURRENT_LIST_DIR}/kernels/ clean
COMMAND make -C ${CMAKE_CURRENT_LIST_DIR}/kernels/ HEXAGON_SDK_PATH=${HEXAGON_SDK_PATH} HTP_ARCH_VERSION=${HTP_ARCH_VERSION} DEBUG_FLAG=${DEBUG_FLAG}
COMMAND echo "current working path:`pwd`\n"
COMMAND ls -l ../../../bin/libggmldsp-skel.so
COMMENT "build hexagon-kernel"
)
endfunction()

function(ggml_hexagon_setup_cfg KNAME)
message(STATUS "ggml_hexagon: setup runtime configuration file ${KNAME}")
add_custom_command(
TARGET ${PROJECT_NAME}
POST_BUILD
COMMAND echo "current working path:`pwd`\n"
COMMAND /bin/cp -fv ../../../../../scripts/${KNAME} ../../../bin/
COMMENT "setup runtime configuration file"
)
endfunction()

ggml_hexagon_build_kernel("cdsp")
ggml_hexagon_setup_cfg("ggml-hexagon.cfg")
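Two small pieces of logic in this CMake file are easy to misread in flattened form: the HTP arch whitelist (with the SoC names given in the comments) and the trailing-slash normalization of QNN_DEFAULT_LIB_SEARCH_PATH. The sketch below restates both in C purely for illustration. `htp_arch_to_soc` and `normalize_lib_search_path` are hypothetical names that do not appear in the PR; the table contents are taken from the comments in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct htp_arch_entry {
    const char *arch;
    const char *soc;
};

/* Arch-to-SoC pairs as listed in the comments of ggml-hexagon/CMakeLists.txt. */
static const struct htp_arch_entry HTP_ARCH_TABLE[] = {
    {"v68", "Snapdragon 888"},
    {"v69", "Snapdragon 8 Gen1"},
    {"v73", "Snapdragon 8 Gen2"},
    {"v75", "Snapdragon 8 Gen3"},
    {"v79", "Snapdragon 8 Elite"},
};

/*
 * Hypothetical helper: returns the SoC name for a supported arch, or NULL for
 * anything else. A NULL result corresponds to the FATAL_ERROR branch in the
 * CMake foreach() check.
 */
static const char *htp_arch_to_soc(const char *arch) {
    if (arch == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < sizeof(HTP_ARCH_TABLE) / sizeof(HTP_ARCH_TABLE[0]); i++) {
        if (strcmp(HTP_ARCH_TABLE[i].arch, arch) == 0) {
            return HTP_ARCH_TABLE[i].soc;
        }
    }
    return NULL;
}

/*
 * Hypothetical helper: strips at most one trailing '/' and then appends
 * exactly one, like string(REGEX REPLACE "/$" "" ...) followed by the "/"
 * added in target_compile_definitions(). Returns 0 on success, -1 if the
 * output buffer is too small.
 */
static int normalize_lib_search_path(const char *in, char *out, size_t out_size) {
    size_t len = strlen(in);
    if (len > 0 && in[len - 1] == '/') {
        len--;
    }
    if (len + 2 > out_size) {
        return -1;
    }
    memcpy(out, in, len);
    out[len] = '/';
    out[len + 1] = '\0';
    return 0;
}
```

With this normalization, a path given with or without a trailing slash ends in exactly one slash, which is presumably why the CMake code strips before appending.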