Name and Version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 CUDA devices:
Device 0: Tesla T4, compute capability 7.5, VMM: yes
version: 4790 (438a839)
Operating systems
Linux
GGML backends
CUDA
Hardware
Tesla T4
Models
GGUF deepseek-r1:14b, downloaded from Ollama
Problem description & steps to reproduce
With --chat-template-file /root/git/llama.cpp/models/templates/llama-cpp-deepseek-r1.jinja, the server uses llama-cpp-deepseek-r1.jinja as the template. However, in stream mode the output content is missing the opening <think> tag, while the closing </think> tag is still present. If the --chat-template-file flag is removed, the problem goes away. A minimal reproduction sketch follows.
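For reference, a minimal reproduction sketch against the server's OpenAI-compatible /v1/chat/completions streaming endpoint. The server launch command in the comment, the port, and the prompt are assumptions, not taken from this report:

```python
# Assumed server launch (model path is hypothetical):
#   llama-server -m deepseek-r1-14b.gguf \
#     --chat-template-file /root/git/llama.cpp/models/templates/llama-cpp-deepseek-r1.jinja
import json
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed default port
    json={
        "messages": [{"role": "user", "content": "What is 2 + 2?"}],
        "stream": True,  # the missing <think> tag only shows up in stream mode
    },
    stream=True,
)

# Accumulate the streamed delta content from the SSE chunks.
text = ""
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    text += chunk["choices"][0]["delta"].get("content") or ""

# Expected: both tags present. Observed with the template file: only </think>.
print("<think> present:", "<think>" in text)
print("</think> present:", "</think>" in text)
```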
First Bad Commit
No response
Relevant log output