Releases · stevenkuang-tencent/llama.cpp

01 Sep 16:56

4b20d8b

b6345 Latest

convert : remove redundant code (#15708)

Signed-off-by: Jie Fu <[email protected]>

Assets 15

cudart-llama-bin-win-cuda-12.4-x64.zip
sha256:8c79a9b226de4b3cacfd1f83d24f962d0773be79f1e7b75c6af4ded7e32ae1d6
373 MB 2025-09-01T16:56:55Z

llama-b6345-bin-macos-arm64.zip
sha256:81ea84a4c3b0ac58d6b8e98319167ab1faf8351b9b2045ca1cb62235cbb6a40b
11 MB 2025-09-01T16:57:05Z

llama-b6345-bin-macos-x64.zip
sha256:9a127cb382b791c8a5229c3971a638fa24ef2db04ad7589f70ee9f79cfdc89f3
28.4 MB 2025-09-01T16:57:06Z

llama-b6345-bin-ubuntu-vulkan-x64.zip
sha256:453ef8da3abb4848317d0be35d86edf2bcb08b75274e4145021a43e92513bc56
25.8 MB 2025-09-01T16:57:07Z

llama-b6345-bin-ubuntu-x64.zip
sha256:6f2926c5e0f9dac10cb5c3be94a384919085b8da06292b9a4576610e8b6002c8
13 MB 2025-09-01T16:57:08Z

llama-b6345-bin-win-cpu-arm64.zip
sha256:c6e7d442182cf21a644be9f49876bc3d7fb50650ed6e780b7ee2f789048b28ab
11.2 MB 2025-09-01T16:57:09Z

llama-b6345-bin-win-cpu-x64.zip
sha256:df1d45fc634aea500657507796a4f8ac4bdb161f164c74ac5f1be7bcfd0dc01a
14.2 MB 2025-09-01T16:57:10Z

llama-b6345-bin-win-cuda-12.4-x64.zip
sha256:01f87e221964493366782a755534c5d5541d7b0e6f764a50e8ad4c86da7f27ba
138 MB 2025-09-01T16:57:11Z

llama-b6345-bin-win-hip-radeon-x64.zip
sha256:c1d9128e057f88695a2f6041f5bff65cc4041fcc03d572cdb9f51b2229661571
287 MB 2025-09-01T16:57:16Z

llama-b6345-bin-win-opencl-adreno-arm64.zip
sha256:beba9d71dab52185d761329a65d54b4ad37cb0ab2e5a820b64bc94c51839ce54
11.6 MB 2025-09-01T16:57:24Z

Source code (zip)
2025-09-01T15:53:31Z

Source code (tar.gz)
2025-09-01T15:53:31Z
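
Each binary asset above is published with a `sha256:` digest, so a downloaded archive can be checked before it is unpacked. The following is a minimal sketch of such a check using only the Python standard library. The function names `sha256_of_file` and `matches_published_digest` are hypothetical helpers introduced here for illustration; they are not part of llama.cpp or of GitHub's tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the lowercase hex SHA-256 digest of the file at `path`.

    The file is read in chunks so that large archives (some assets are
    several hundred MB) do not have to fit in memory at once.
    """
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published):
    """Compare a local file against a digest string from the release page.

    `published` may be given with or without the leading "sha256:" prefix,
    and in either letter case.
    """
    expected = published.split(":", 1)[-1].strip().lower()
    return sha256_of_file(path) == expected


# Hypothetical usage, assuming the archive was saved in the current directory:
#
#   ok = matches_published_digest(
#       "llama-b6345-bin-ubuntu-x64.zip",
#       "sha256:6f2926c5e0f9dac10cb5c3be94a384919085b8da06292b9a4576610e8b6002c8",
#   )
#   if not ok:
#       raise SystemExit("checksum mismatch, do not use this archive")
```

A mismatch most likely means a truncated or corrupted download; in that case the safest response is to fetch the archive again rather than to use it.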

06 Aug 08:05

github-actions

b6098

2241453

CANN: add support for ACL Graph (#15065)

* feat(cann): add optional support for ACL Graph execution

This commit adds support for executing ggml computational graphs using
Huawei's ACL graph mode via the USE_CANN_GRAPH flag. The support can be
enabled at compile time using the CMake option:

    -DUSE_CANN_GRAPH=ON

By default, ACL graph execution is **disabled**, and the fallback path
uses node-by-node execution.

Key additions:
- CMake option  to toggle graph mode
- Graph capture and execution logic using
- Tensor property matching to determine whether graph update is required
- Safe fallback and logging if the environment variable LLAMA_SET_ROWS
  is unset or invalid

This prepares the backend for performance improvements in repetitive graph
execution scenarios on Ascend devices.
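
The control flow described above (capture a graph once, replay it while the tensor properties still match, otherwise capture again, and fall back to node-by-node execution when graph mode is off) can be illustrated with a small language-neutral sketch. This is not the CANN backend's code and it calls no ACL API: `TensorProps`, `GraphCache`, and the choice of which properties are compared are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class TensorProps:
    """Hypothetical snapshot of the properties compared between runs."""
    shape: Tuple[int, ...]
    strides: Tuple[int, ...]
    dtype: str
    data_addr: int


class GraphCache:
    """Tracks whether a previously captured graph can be replayed."""

    def __init__(self) -> None:
        self._captured: Optional[Tuple[TensorProps, ...]] = None
        self.captures = 0
        self.replays = 0

    def needs_update(self, props: Iterable[TensorProps]) -> bool:
        # A re-capture is needed if nothing was captured yet, or if any
        # tensor property differs from the ones seen at capture time.
        return self._captured is None or self._captured != tuple(props)

    def run(self, props: Iterable[TensorProps], graph_enabled: bool = True) -> str:
        props = tuple(props)
        if not graph_enabled:
            # Fallback path: execute every node individually.
            return "node-by-node"
        if self.needs_update(props):
            self._captured = props
            self.captures += 1
            return "capture"
        self.replays += 1
        return "replay"


if __name__ == "__main__":
    cache = GraphCache()
    a = TensorProps((4, 8), (8, 1), "f32", 0x1000)
    b = TensorProps((4, 16), (16, 1), "f32", 0x1000)  # shape changed
    print(cache.run([a]))                        # first call captures
    print(cache.run([a]))                        # identical properties replay
    print(cache.run([b]))                        # changed shape re-captures
    print(cache.run([b], graph_enabled=False))   # graph mode off
```

The benefit of this pattern comes from repeated runs over an unchanged graph: the comparison is cheap relative to a capture, so replays dominate when shapes and buffers stay stable.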

Signed-off-by: noemotiovon <[email protected]>

* Fix review comments

Signed-off-by: noemotiovon <[email protected]>

* rename USE_CANN_GRAPH to USE_ACL_GRAPH

Signed-off-by: noemotiovon <[email protected]>

* fix typo

Signed-off-by: noemotiovon <[email protected]>

---------

Signed-off-by: noemotiovon <[email protected]>

Assets 15

25 Jul 11:31

github-actions

b5988

749e0d2

mtmd : fix 32-bit narrowing issue in export-lora and mtmd clip (#14503)

* [fix] Fix 32-bit narrowing issue in export-lora and mtmd clip

* Update export-lora.cpp

* Update clip.cpp

* Update export-lora.cpp

* format: use space to replace tab

Assets 15

24 Jul 10:55

github-actions

b5977

39cffdf

docs: add libcurl-dev install hint for Linux distros (#14801)

* docs: add libcurl-dev install hint for Linux distros

Signed-off-by: PouyaGhahramanian <[email protected]>

* Update docs/build.md

---------

Signed-off-by: PouyaGhahramanian <[email protected]>
Co-authored-by: Xuan-Son Nguyen <[email protected]>

Assets 15

21 Jul 16:23

github-actions

b5952

9220426

kleidiai: add support for get_rows (#14676)

* kleidiai: add support for get_rows

* apply fixes based on code review

* apply more fixes based on code review

Assets 15

18 Jul 06:32

github-actions

b5929

8f974bc

graph : refactor context to not pass gf explicitly (#14629)

ggml-ci

Assets 15

14 Jul 15:18

github-actions

b5896

55c509d

ggml : refactor llamafile_sgemm PPC code (#14673)

Remove unnecessary templates from the class definition and packing functions.
Reduce deeply nested conditionals and if-else switching in the mnpack function.
Replace repetitive code with inline functions in the packing functions.

2 ~ 7% improvement in Q8 Model
15 ~ 50% improvement in Q4 Model

Signed-off-by: Shalini Salomi Bodapati <[email protected]>

Assets 15
