ggml-cpu: arm64: q4_K repack gemm and gemv implementations (i8mm) #16739
base: master
Conversation
Signed-off-by: Alberto Cabrera <[email protected]>
    q4sb_scales[i] = vmovl_s8(vld1_s8(aux_q4sb));
}

const uint8_t *q4_base = q4_ptr[b].qs + sb * QK_K;
Fix a few instances of this code style:
- const uint8_t *q4_base = q4_ptr[b].qs + sb * QK_K;
+ const uint8_t * q4_base = q4_ptr[b].qs + sb * QK_K;
Applied clang-format. Sorry about that!
This PR improves the q4_k_q8_k gemm and gemv on arm64 using i8mm and vector dot-product instructions.
Tested on an Apple M4 with Liquid LFM2-1.2B model:
Master build: 8cf6b42 (6824)
This PR: c4f1358
Perplexity remains unchanged (tested current build vs master):
As for test-backend-ops, I've checked the output of the layer tensors manually, comparing REPACK vs master, since #16182 is still ongoing.
Any suggestions on how to better test this PR are welcome.