Releases: stevenkuang-tencent/llama.cpp
b6345
convert : remove redundant code (#15708)
Signed-off-by: Jie Fu <[email protected]>
b6098
CANN: add support for ACL Graph (#15065)
* feat(cann): add optional support for ACL Graph execution
This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:
-DUSE_CANN_GRAPH=ON
By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.
Key additions:
- CMake option to toggle graph mode
- Graph capture and execution logic using
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
is unset or invalid
This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.
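The tensor property matching step listed above can be illustrated with a minimal sketch. It assumes a captured graph may be replayed only while every node keeps the same buffer address, operation, shape, and strides. `NodeProps` and `graph_update_required` are hypothetical names used for illustration, not the identifiers used in the CANN backend.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical snapshot of the per-node properties that must stay
// unchanged for a previously captured graph to remain reusable.
struct NodeProps {
    const void *data;            // address of the tensor buffer
    int op;                      // operation identifier
    std::array<int64_t, 4> ne;   // number of elements per dimension
    std::array<size_t, 4> nb;    // stride in bytes per dimension

    bool operator==(const NodeProps &o) const {
        return data == o.data && op == o.op && ne == o.ne && nb == o.nb;
    }
};

// Returns true when the cached snapshot no longer matches the current
// graph, meaning the graph has to be captured again before execution.
bool graph_update_required(const std::vector<NodeProps> &cached,
                           const std::vector<NodeProps> &current) {
    if (cached.size() != current.size()) {
        return true;  // node count changed
    }
    for (size_t i = 0; i < cached.size(); ++i) {
        if (!(cached[i] == current[i])) {
            return true;  // some property of node i changed
        }
    }
    return false;  // identical layout: the captured graph can be replayed
}
```

In this sketch a caller would keep the snapshot taken at capture time, compare it against the incoming graph on every compute call, and re-capture only when the function returns true.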
Signed-off-by: noemotiovon <[email protected]>
* Fix review comments
Signed-off-by: noemotiovon <[email protected]>
* rename USE_CANN_GRAPH to USE_ACL_GRAPH
Signed-off-by: noemotiovon <[email protected]>
* fix typo
Signed-off-by: noemotiovon <[email protected]>
---------
Signed-off-by: noemotiovon <[email protected]>
b5988
mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)
* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip
* Update export-lora.cpp
* Update clip.cpp
* Update export-lora.cpp
* format: use space to replace tab
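The exact change made in this commit is not shown in the release note. As a generic illustration of the class of problem, a 64-bit size that is narrowed to `size_t` can be silently truncated on targets where `size_t` is 32 bits wide. The sketch below uses a hypothetical helper, `checked_size`, which is not part of llama.cpp, to make that conversion explicit and checked.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// Hypothetical helper: convert a 64-bit size to size_t and fail loudly
// instead of truncating when size_t is narrower than 64 bits.
inline size_t checked_size(uint64_t n) {
    if (n > std::numeric_limits<size_t>::max()) {
        throw std::overflow_error("value does not fit in size_t");
    }
    return static_cast<size_t>(n);
}
```

On a 64-bit target the check never fires. On a 32-bit target it turns a silent truncation into an explicit error.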
b5977
docs: add libcurl-dev install hint for Linux distros (#14801)
* docs: add libcurl-dev install hint for Linux distros
Signed-off-by: PouyaGhahramanian <[email protected]>
* Update docs/build.md
---------
Signed-off-by: PouyaGhahramanian <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>
b5952
kleidiai: add support for get_rows (#14676)
* kleidiai: add support for get_rows
* apply fixes based on code review
* apply more fixes based on code review
b5929
graph : refactor context to not pass gf explicitly (#14629)
ggml-ci
b5896
ggml : refactor llamafile_sgemm PPC code (#14673)
Remove unnecessary templates from class definition and packing functions
Reduce deeply nested conditionals and if-else switching in the mnpack function
Replace repetitive code with inline functions in packing functions
2 ~ 7% improvement in Q8 model
15 ~ 50% improvement in Q4 model
Signed-off-by: Shalini Salomi Bodapati <[email protected]>