What happened?
llama-server + Mistral Vibe: after some work it fails with a core dump. After restarting llama-server and replaying the same context, it fails again at the same point.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=<optimized out>) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=<optimized out>, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x0000772d5904527e in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x0000772d590288ff in __GI_abort () at ./stdlib/abort.c:79
#5 0x0000772d594a5ff5 in __gnu_cxx::__verbose_terminate_handler () at ../../../../src/libstdc++-v3/libsupc++/vterminate.cc:95
#6 0x0000772d594bb0da in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:48
#7 0x0000772d594a5a55 in std::terminate () at ../../../../src/libstdc++-v3/libsupc++/eh_terminate.cc:58
#8 0x0000772d594bb391 in __cxxabiv1::__cxa_throw (obj=<optimized out>, tinfo=0x5cc26c0bf790 <typeinfo for std::runtime_error@GLIBCXX_3.4>, dest=0x772d594d2150 <std::runtime_error::~runtime_error()>)
at ../../../../src/libstdc++-v3/libsupc++/eh_throw.cc:98
#9 0x00005cc26ae7aef5 in llama_grammar_accept_token(llama_grammar&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) [clone .cold] ()
#10 0x00005cc26b20a624 in llama_grammar_accept_impl(llama_grammar&, llama_vocab const*, llama_sampling const*, int) ()
#11 0x00005cc26b09175a in common_sampler_accept(llama_sampling_context*, llama_context*, int, bool) ()
#12 0x00005cc26af7af74 in server_context::process_batch_tokens(int&) ()
#13 0x00005cc26af7c280 in server_context::update_slots() ()
#14 0x00005cc26af1be67 in server_queue::start_loop() ()
#15 0x00005cc26ae943ee in main ()
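For context, the abort originates from a `std::runtime_error` thrown in the grammar-accept path (frame #9) that nothing catches, so it escapes through `__cxa_throw` to `std::terminate`. A minimal sketch of that failure mode, with invented names (`accept_piece`, the contents of the state stack) and none of the real parsing logic:

```cpp
#include <cassert>
#include <stack>
#include <stdexcept>
#include <string>

// Illustrative sketch only, not the actual llama.cpp implementation: the
// grammar sampler tracks a stack of pending grammar states, and each accepted
// piece advances or pops states. If the stack is already empty when another
// piece arrives, there is no grammar state left to advance, and the accept
// path throws -- matching the "Unexpected empty grammar stack after accepting
// piece: =search" message in the log below.
void accept_piece(std::stack<std::string> & states, const std::string & piece) {
    if (states.empty()) {
        throw std::runtime_error(
            "Unexpected empty grammar stack after accepting piece: " + piece);
    }
    states.pop(); // consume one pending state per piece (heavily simplified)
}
```

Because the throw happens deep inside the sampling loop and is never caught, the server aborts with a core dump instead of returning an error response for the offending request.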
Name and Version
./work/ik_llama.cpp/build/bin/llama-server --version
version: 4191 (1fdbc0d)
built with cc (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
INFO [ main] HTTP server listening | tid="131036636012736" timestamp=1771667791 n_threads_http="23" port="8080" hostname="0.0.0.0"
INFO [ slots_idle] all slots are idle | tid="131036636012736" timestamp=1771667791
======== Prompt cache: cache size: 0, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 0.00, cache_ram_similarity: 0.50
INFO [ launch_slot_with_task] slot is processing task | tid="131036636012736" timestamp=1771667847 id_slot=0 id_task=0
======== Cache: cache_size = 0, n_past0 = 0, n_past1 = 0, n_past_prompt1 = 0, n_past2 = 0, n_past_prompt2 = 0
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667847 id_slot=0 id_task=0 p0=0
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667851 id_slot=0 id_task=0 p0=2048
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667858 id_slot=0 id_task=0 p0=4096
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667866 id_slot=0 id_task=0 p0=6144
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667875 id_slot=0 id_task=0 p0=8192
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667887 id_slot=0 id_task=0 p0=10240
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667900 id_slot=0 id_task=0 p0=12288
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667915 id_slot=0 id_task=0 p0=14336
INFO [ release_slots] slot released | tid="131036636012736" timestamp=1771667933 id_slot=0 id_task=0 n_ctx=32768 n_past=14840 n_system_tokens=0 n_cache_tokens=14840 truncated=false
slot print_timing: id 0 | task -1 |
prompt eval time = 70315.84 ms / 14586 tokens ( 4.82 ms per token, 207.44 tokens per second)
eval time = 15744.81 ms / 255 tokens ( 61.74 ms per token, 16.20 tokens per second)
total time = 86060.65 ms / 14841 tokens
INFO [ slots_idle] all slots are idle | tid="131036636012736" timestamp=1771667933
INFO [ log_server_request] request | tid="131015620921024" timestamp=1771667933 remote_addr="127.0.0.1" remote_port=38840 status=200 method="POST" path="/v1/chat/completions" params={}
======== Prompt cache: cache size: 14840, n_keep: 0, n_discarded_prompt: 0, cache_ram_n_min: 0, f_keep: 1.00, cache_ram_similarity: 0.50
INFO [ launch_slot_with_task] slot is processing task | tid="131036636012736" timestamp=1771667933 id_slot=0 id_task=263
======== Cache: cache_size = 14840, n_past0 = 14765, n_past1 = 14765, n_past_prompt1 = 14765, n_past2 = 14767, n_past_prompt2 = 14766
Common part does not match fully
cache : parameter=todos>
[
{
"id": "1",
"content": "xxxxxxxx xxxxx xxxxx
prompt: parameter=todos>
[{'id': '1', 'content': 'xxxxxxxx xxxxx xxxxx xxxx
INFO [ batch_pending_prompt] kv cache rm [p0, end) | tid="131036636012736" timestamp=1771667933 id_slot=0 id_task=263 p0=14765
terminate called after throwing an instance of 'std::runtime_error'
what(): Unexpected empty grammar stack after accepting piece: =search (96598)
Aborted (core dumped)
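The "Common part does not match fully" lines may explain how the grammar state desyncs: the cached tokens hold a pretty-printed JSON todos list while the new prompt carries a Python-style dict repr of the same data, so prompt-cache reuse stops at token 14765 of 14840 and generation resumes from there against tokens the sampler never produced. A sketch of the prefix-matching idea, assuming the cache reuses only the longest common prefix of token ids (the function name is illustrative, not the actual ik_llama.cpp API):

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Illustrative sketch: prompt-cache reuse is limited to the longest common
// prefix of the cached and incoming token ids; everything after the first
// mismatching token must be re-evaluated, which is consistent with the log
// resuming at p0=14765 out of 14840 cached tokens.
std::size_t common_token_prefix(const std::vector<int> & cached,
                                const std::vector<int> & prompt) {
    const std::size_t lim = std::min(cached.size(), prompt.size());
    std::size_t n = 0;
    while (n < lim && cached[n] == prompt[n]) ++n;
    return n;
}
```

If that reading is right, the client re-serialized an earlier tool result differently between turns, and the crash is the grammar sampler's (unhandled) reaction to resuming mid-mismatch rather than a problem with the model itself.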