What happened?
llama-server + Mistral Vibe: after some work it fails with a core dump. After restarting llama-server and replaying the same context, it fails again at the same point.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x0000772d5904527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x0000772d590288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000772d594a5ff5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#6 0x0000772d594bb0da in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#7 0x0000772d594a5a55 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58
#8 0x0000772d594bb391 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x5cc26c0bf790 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x772d594d2150 <std::runtime_error::~runtime_error()>)
at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:98
#9 0x00005cc26ae7aef5 in llama_grammar_accept_token(llama_grammar&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .cold] ()
#10 0x00005cc26b20a624 in llama_grammar_accept_impl(llama_grammar&, llama_vocab const*, llama_sampling const*, int) ()
#11 0x00005cc26b09175a in common_sampler_accept(llama_sampling_context*, llama_context*, int, bool) ()
#12 0x00005cc26af7af74 in server_context::process_batch_tokens(int&) ()
#13 0x00005cc26af7c280 in server_context::update_slots() ()
#14 0x00005cc26af1be67 in server_queue::start_loop() ()
#15 0x00005cc26ae943ee in main ()
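For context, the abort originates from a `std::runtime_error` thrown in the grammar-accept path (frame #9) that nothing catches, so it escapes through `__cxa_throw` to `std::terminate`. A minimal sketch of that failure mode, with invented names (`accept_piece`, the contents of the state stack) and none of the real parsing logic:

```cpp
#include <cassert>
#include <stack>
#include <stdexcept>
#include <string>

// Illustrative sketch only, not the actual llama.cpp implementation: the
// grammar sampler tracks a stack of pending grammar states, and each accepted
// piece advances or pops states. If the stack is already empty when another
// piece arrives, there is no grammar state left to advance, and the accept
// path throws -- matching the "Unexpected empty grammar stack after accepting
// piece: =search" message in the log below.
void accept_piece(std::stack<std::string> & states, const std::string & piece) {
    if (states.empty()) {
        throw std::runtime_error(
            "Unexpected empty grammar stack after accepting piece: " + piece);
    }
    states.pop(); // consume one pending state per piece (heavily simplified)
}
```

Because the throw happens deep inside the sampling loop and is never caught, the server aborts with a core dump instead of returning an error response for the offending request.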
Name and Version
./work/ik_llama.cpp/build/bin/llama-server --version
version: 4191 (1fdbc0d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
INFO [ main] HTTP server listening | tid="131036636012736" timestamp=1771667791 n_threads_http="23" port="8080" hostname="0.0.0.0"
INFO [ slots_idle] all slots are idle | tid="131036636012736" timestamp=1771667791
======== Prompt cache: cache size: 0, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 0.00, cache_ram_similarity: 0.50
INFO [ launch_slot_with_task] slot is processing task | tid="131036636012736" timestamp=1771667847 id_slot=0 id_task=0
======== Cache: cache_size = 0, n_past0 = 0, n_past1 = 0, n_past_prompt1 = 0, n_past2 = 0, n_past_prompt2 = 0
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667847 id_slot=0 id_task=0 p0=0
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667851 id_slot=0 id_task=0 p0=2048
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667858 id_slot=0 id_task=0 p0=4096
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667866 id_slot=0 id_task=0 p0=6144
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667875 id_slot=0 id_task=0 p0=8192
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667887 id_slot=0 id_task=0 p0=10240
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667900 id_slot=0 id_task=0 p0=12288
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667915 id_slot=0 id_task=0 p0=14336
INFO [ release_slots] slot released | tid="131036636012736" timestamp=1771667933 id_slot=0 id_task=0 n_ctx=32768 n_past=14840 n_system_tokens=0 n_cache_tokens=14840 truncated=false
slot print_timing: id 0 | task -1 |
prompt eval time = 70315.84 ms / 14586 tokens ( 4.82 ms per token, 207.44 tokens per second)
eval time = 15744.81 ms / 255 tokens ( 61.74 ms per token, 16.20 tokens per second)
total time = 86060.65 ms / 14841 tokens
INFO [ slots_idle] all slots are idle | tid="131036636012736" timestamp=1771667933
INFO [ log_server_request] request | tid="131015620921024" timestamp=1771667933 remote_addr="127.0.0.1" remote_port=38840 status=200 method="POST" path="/v1/chat/completions" params={}
======== Prompt cache: cache size: 14840, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 1.00, cache_ram_similarity: 0.50
INFO [ launch_slot_with_task] slot is processing task | tid="131036636012736" timestamp=1771667933 id_slot=0 id_task=263
======== Cache: cache_size = 14840, n_past0 = 14765, n_past1 = 14765, n_past_prompt1 = 14765, n_past2 = 14767, n_past_prompt2 = 14766
Common part does not match fully
cache : parameter=todos>
[
{
"id": "1",
"content": "xxxxxxxx xxxxx xxxxx
prompt: parameter=todos>
[{'id': '1', 'content': 'xxxxxxxx xxxxx xxxxx xxxx
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667933 id_slot=0 id_task=263 p0=14765
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece: =search (96598)
Aborted (core dumped)
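The "Common part does not match fully" lines may explain how the grammar state desyncs: the cached tokens hold a pretty-printed JSON todos list while the new prompt carries a Python-style dict repr of the same data, so prompt-cache reuse stops at token 14765 of 14840 and generation resumes from there against tokens the sampler never produced. A sketch of the prefix-matching idea, assuming the cache reuses only the longest common prefix of token ids (the function name is illustrative, not the actual ik_llama.cpp API):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative sketch: prompt-cache reuse is limited to the longest common
// prefix of the cached and incoming token ids; everything after the first
// mismatching token must be re-evaluated, which is consistent with the log
// resuming at p0=14765 out of 14840 cached tokens.
std::size_t common_token_prefix(const std::vector<int> & cached,
                                const std::vector<int> & prompt) {
    const std::size_t lim = std::min(cached.size(), prompt.size());
    std::size_t n = 0;
    while (n < lim && cached[n] == prompt[n]) ++n;
    return n;
}
```

If that reading is right, the client re-serialized an earlier tool result differently between turns, and the crash is the grammar sampler's (unhandled) reaction to resuming mid-mismatch rather than a problem with the model itself.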