Bug: KV quantization fails when using vulkan #9551

Open
jmars opened this issue Sep 19, 2024 · 2 comments
Labels
bug-unconfirmed, medium severity (used to report medium severity bugs in llama.cpp, e.g. malfunctioning features that are still usable)

Comments

jmars commented Sep 19, 2024

What happened?

I'm trying to run a model with a large context (128k) split across 2 GPUs (an A770 and a P40), but running with -c 131072 -ctk q4_0 -ctv q4_0 fails with "V cache quantization requires flash_attn". If I then add -fa, it fails with "ggml-backend.c:1174: pre-allocated tensor in a backend that cannot run the operation".
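
For reference, the two invocations look roughly like this (llama-cli and the model path are placeholders standing in for my actual command):

llama-cli -m model.gguf -c 131072 -ctk q4_0 -ctv q4_0
llama-cli -m model.gguf -c 131072 -ctk q4_0 -ctv q4_0 -fa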

Is this possible to support? If so, what would be required? Would someone (possibly myself) just need to implement FA for the Vulkan backend?

Name and Version

Version: 3758 (3c7989f)

What operating system are you seeing the problem on?

Windows

Relevant log output

No response

jmars added the bug-unconfirmed and medium severity labels on Sep 19, 2024
JohannesGaessler (Collaborator) commented Sep 19, 2024

> Is this possible to support? If so, what would be required? Would someone (possibly myself) just need to implement FA for the Vulkan backend?

That would be the most straightforward solution. Alternatively, support for V cache quantization without FA would also work (though this is currently not supported by any backend, would be more work to implement than FA, and would yield worse results).

JohannesGaessler (Collaborator) commented Sep 19, 2024

I forgot: if you end up deciding to implement FA for Vulkan, take a look at the corresponding tests in tests/test-backend-ops.cpp. You don't have to implement support for all of those cases, but for the cases where ggml_backend_vk_supports_op returns true, the tests should pass (defined as giving the same results as the CPU backend within some numerical tolerance).
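
For example, the FA tests can be run in isolation against the Vulkan backend with something like the following (the exact binary path, op filter name, and backend name may differ depending on your build):

./tests/test-backend-ops test -o FLASH_ATTN_EXT -b Vulkan0

The test harness compares the backend's output against the CPU backend internally, so a passing run there is the bar described above.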
