I built the examples/llama.android example successfully with Android Studio. The app runs fine and loads the offline model, but when I send a message, the model never seems to emit an end-of-generation token: the assistant keeps producing text indefinitely.
To rule out the model file, I checked the same GGUF in other apps. For example, I installed PocketPal AI and loaded the same GGUF file there; the output terminates normally. So I don't think the GGUF file is the problem.
I suspect the problem lies in how the model is loaded or in the inference process, especially the chat template. Does this example not apply a chat template? Does anyone have any ideas?
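For reference, here is the ChatML-style prompt format Qwen2.5-Instruct expects. This is just an illustrative sketch in plain Python string formatting, not the example app's actual code; a proper fix would use llama.cpp's `llama_chat_apply_template()` with the template stored in the GGUF's `tokenizer.chat_template` metadata:

```python
# Minimal sketch of the ChatML prompt format Qwen2.5-Instruct is trained on.
# Illustrative only -- real code should call llama_chat_apply_template()
# on the template embedded in the GGUF metadata.

def format_chatml(user_msg: str,
                  system_msg: str = "You are a helpful assistant.") -> str:
    """Wrap a single user turn in ChatML markers and open the assistant turn."""
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = format_chatml("who are you")
```

Without these markers the model is effectively doing raw text completion, so it has little reason to ever emit its EOS token `<|im_end|>`.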
Settings: the default code from this example.
Problems:
I: who are you
Assistant:
? I am a large language model created by Alibaba Cloud. I'm called Qwen. How can I assist you today? You can ask me questions, and I'll do my best to provide you with helpful answers. Let's get started! If you have any specific topic or question in mind, feel
(it only stops here because of the nlen = 64 limit)
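The other half of the picture is the stop condition: besides hitting nlen, a generation loop should stop as soon as the sampled token is an end-of-generation token. A hedged sketch of that check (the EOG ids below are the ones printed for this GGUF in the log; real code should query `llama_vocab_is_eog()` rather than hardcode ids):

```python
# Sketch of a decode-loop stop condition. EOG ids taken from the log output
# for this Qwen2.5 GGUF (e.g. 151645 = '<|im_end|>'); in llama.cpp proper
# you would ask llama_vocab_is_eog() instead of hardcoding them.
EOG_IDS = {151643, 151645, 151662, 151663, 151664}

def generate(sample_next, n_len: int = 64):
    """Collect sampled token ids until an EOG token or the n_len cap."""
    out = []
    for _ in range(n_len):
        tok = sample_next()
        if tok in EOG_IDS:  # model signalled the end of its turn
            break
        out.append(tok)
    return out

# e.g. a hypothetical stream that yields three tokens and then <|im_end|>:
stream = iter([14623, 525, 498, 151645, 30])
tokens = generate(lambda: next(stream))
# tokens == [14623, 525, 498]
```

If the prompt is not formatted with the chat template, the model rarely samples `<|im_end|>` at all, so only the n_len cap ever fires, which matches the behaviour described above.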
First Bad Commit
No response
Relevant log output
2025-03-03 14:10:26.042 1314-1314 vendor.qti...al-service ven...qti.hardware.perf-hal-service E PerfLockHelper: GetCustBoostConfig() 342: applist: com.example.llama, it->applist com.miui.calculator, com.miui.weather2, com.miui.notes, com.miui.gallery, com.android.calendar, com.android.deskclock, com.android.soundrecorder, com.android.contacts, com.android.mms, com.duokan.reader, com.miui.securitycenter, com.android.settings, com.xiaomi.market, com.android.quicksearchbox, com.android.fileexplorer, com.xiaomi.gamecenter, com.miui.video
2025-03-03 14:10:26.077 13702-14207 ActivityManagerWrapper com.miui.home E getRecentTasks: mainTaskId=11255 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.example.llama/com.example.llama.MainActivity} }
2025-03-03 14:10:26.124 11494-11841 SuggestManager com.miui.securitycenter.remote E openApp name = com.example.llama
---------------------------- PROCESS STARTED (10569) for package com.example.llama ----------------------------
2025-03-03 14:10:26.140 10569-10569 ziparchive com.example.llama W Unable to open '/data/data/com.example.llama/code_cache/.overlay/base.apk/classes4.dm': No such file or directory
2025-03-03 14:10:26.140 10569-10569 ziparchive com.example.llama W Unable to open '/data/app/~~3SipRxKykJ_b3BwLrltlMQ==/com.example.llama-GDKXE2hnIan_uK5gtVXhPg==/base.dm': No such file or directory
2025-03-03 14:10:26.140 10569-10569 ziparchive com.example.llama W Unable to open '/data/app/~~3SipRxKykJ_b3BwLrltlMQ==/com.example.llama-GDKXE2hnIan_uK5gtVXhPg==/base.dm': No such file or directory
2025-03-03 14:10:26.167 13702-14207 ActivityManagerWrapper com.miui.home E getRecentTasks: mainTaskId=11255 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.example.llama/com.example.llama.MainActivity} }
2025-03-03 14:10:26.173 13702-14207 ActivityManagerWrapper com.miui.home E getRecentTasks: mainTaskId=11255 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.example.llama/com.example.llama.MainActivity} }
2025-03-03 14:10:26.211 10569-10569 Perf com.example.llama I Connecting to perf service.
2025-03-03 14:10:26.214 10569-10569 GraphicsEnvironment com.example.llama V ANGLE Developer option for'com.example.llama'set to: 'default'
2025-03-03 14:10:26.214 10569-10569 GraphicsEnvironment com.example.llama V ANGLE GameManagerService for com.example.llama: false
2025-03-03 14:10:26.214 10569-10569 GraphicsEnvironment com.example.llama V Updatable production driver is not supported on the device.
2025-03-03 14:10:26.216 10569-10569 ForceDarkHelperStubImpl com.example.llama I initialize for com.example.llama , ForceDarkOrigin
2025-03-03 14:10:26.218 10569-10569 m.example.llama com.example.llama D JNI_OnLoad success
2025-03-03 14:10:26.218 10569-10569 MiuiForceDarkConfig com.example.llama I setConfig density:3.500000, mainRule:0, secondaryRule:0, tertiaryRule:0
2025-03-03 14:10:26.220 10569-10569 NetworkSecurityConfig com.example.llama D No Network Security Config specified, using platform default
2025-03-03 14:10:26.220 10569-10569 NetworkSecurityConfig com.example.llama D No Network Security Config specified, using platform default
2025-03-03 14:10:26.235 10569-10569 MiuiMultiWindowAdapter com.example.llama D MiuiMultiWindowAdapter::getFreeformVideoWhiteListInSystem::LIST_ABOUT_SUPPORT_LANDSCAPE_VIDEO = [com.hunantv.imgo.activity, com.tencent.qqlive, com.qiyi.video, com.hunantv.imgo.activity.inter, com.tencent.qqlivei18n, com.iqiyi.i18n, tv.danmaku.bili]
2025-03-03 14:10:26.269 10569-10569 libc com.example.llama W Access denied finding property "ro.vendor.df.effect.conflict"
2025-03-03 14:10:26.264 10569-10569 m.example.llama com.example.llama W type=1400 audit(0.0:1987456): avc: denied { read } for name="u:object_r:vendor_displayfeature_prop:s0" dev="tmpfs" ino=388 scontext=u:r:untrusted_app:s0:c139,c257,c512,c768 tcontext=u:object_r:vendor_displayfeature_prop:s0 tclass=file permissive=0 app=com.example.llama
2025-03-03 14:10:26.285 10569-25898 ViewContentFactory com.example.llama D initViewContentFetcherClass
2025-03-03 14:10:26.285 10569-25898 ViewContentFactory com.example.llama D getInterceptorPackageInfo
2025-03-03 14:10:26.286 10569-25898 ViewContentFactory com.example.llama D getInitialApplication took 0ms
2025-03-03 14:10:26.286 10569-25898 ViewContentFactory com.example.llama D packageInfo.packageName: com.miui.catcherpatch
2025-03-03 14:10:26.291 10569-25898 ViewContentFactory com.example.llama D initViewContentFetcherClass took 6ms
2025-03-03 14:10:26.291 10569-25898 ContentCatcher com.example.llama I ViewContentFetcher : ViewContentFetcher
2025-03-03 14:10:26.291 10569-25898 ViewContentFactory com.example.llama D createInterceptor took 6ms
2025-03-03 14:10:26.298 10569-10569 IS_CTS_MODE com.example.llama D false
2025-03-03 14:10:26.298 10569-10569 MULTI_WINDOW_ENABLED com.example.llama D false
2025-03-03 14:10:26.300 10569-10569 DecorView[] com.example.llama D getWindowModeFromSystem windowmode is 1
2025-03-03 14:10:26.383 10569-10569 m.example.llama com.example.llama W Method java.lang.Object androidx.compose.runtime.snapshots.SnapshotStateMap.mutate(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
Common causes for lock verification issues are non-optimized dex code
and incorrect proguard optimizations.
2025-03-03 14:10:26.383 10569-10569 m.example.llama com.example.llama W Method void androidx.compose.runtime.snapshots.SnapshotStateMap.update(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.384 10569-10569 m.example.llama com.example.llama W Method boolean androidx.compose.runtime.snapshots.SnapshotStateMap.removeIf$runtime_release(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.426 10569-10569 m.example.llama com.example.llama W Method boolean androidx.compose.runtime.snapshots.SnapshotStateList.conditionalUpdate(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.426 10569-10569 m.example.llama com.example.llama W Method java.lang.Object androidx.compose.runtime.snapshots.SnapshotStateList.mutate(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.426 10569-10569 m.example.llama com.example.llama W Method void androidx.compose.runtime.snapshots.SnapshotStateList.update(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.434 10569-10569 Compatibil...geReporter com.example.llama D Compat change id reported: 171228096; UID 10395; state: ENABLED
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I QUALCOMM build : ee4b625, I41c6f366e1
Build Date : 02/16/23
OpenGL ES Shader Compiler Version: EV031.36.08.19
Local Branch :
Remote Branch :
Remote Branch :
Reconstruct Branch :
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I Build Config : S P 12.1.1 AArch64
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I Driver Path : /vendor/lib64/egl/libGLESv2_adreno.so
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I Driver Version : 0615.60
2025-03-03 14:10:26.590 10569-25895 AdrenoGLES-0 com.example.llama I PFP: 0x01730155, ME: 0x00000000
2025-03-03 14:10:26.601 10569-25895 libc com.example.llama W Access denied finding property "vendor.migl.debug"
2025-03-03 14:10:26.603 10569-25895 libEGL com.example.llama E pre_cache appList: ,,
2025-03-03 14:10:26.605 10569-25895 m.example.llama com.example.llama I Support FEAS product mondrian:
2025-03-03 14:10:26.619 10569-25895 m.example.llama com.example.llama D MiuiProcessManagerServiceStub setSchedFifo
2025-03-03 14:10:26.619 10569-25895 MiuiProcessManagerImpl com.example.llama I setSchedFifo pid:10569, mode:3
2025-03-03 14:10:26.624 10569-25895 LB com.example.llama E fail to open file: No such file or directory
2025-03-03 14:10:26.620 10569-10569 RenderThread com.example.llama W type=1400 audit(0.0:1987457): avc: denied { getattr } for path="/sys/module/metis/parameters/minor_window_app" dev="sysfs" ino=69775 scontext=u:r:untrusted_app:s0:c139,c257,c512,c768 tcontext=u:object_r:sysfs_migt:s0 tclass=file permissive=0 app=com.example.llama
2025-03-03 14:10:26.626 10569-25895 Parcel com.example.llama W Expecting binder but got null!
2025-03-03 14:10:26.627 10569-10569 Looper com.example.llama W PerfMonitor doFrame : time=305ms vsyncFrame=0 latency=88ms procState=-1 historyMsgCount=3 (msgIndex=1 wall=88ms seq=4 running=77ms runnable=1ms binder=64ms slowpath=12ms late=117ms h=android.app.ActivityThread$H w=159)
2025-03-03 14:10:26.642 10569-10569 Choreographer com.example.llama I Skipped 35 frames! The application may be doing too much work on its main thread.
2025-03-03 14:10:26.652 10569-10569 DecorView[] com.example.llama D onWindowFocusChanged hasWindowFocus true
2025-03-03 14:10:26.652 10569-10569 HandWritingStubImpl com.example.llama I refreshLastKeyboardType: 1
2025-03-03 14:10:26.652 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:26.661 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:false
2025-03-03 14:10:26.664 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:28.253 10569-25913 LLamaAndroid com.example.llama D Dedicated thread for native code: Llm-RunLoop
2025-03-03 14:10:28.261 10569-25913 LLamaAndroid com.example.llama D CPU : NEON = 1 | ARM_FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
2025-03-03 14:10:28.261 10569-25913 llama-android.cpp com.example.llama I Loading model from /storage/emulated/0/Android/data/com.example.llama/files/qwen2.5-3b-instruct-q5_k_m.gguf
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: loaded meta data with 26 key-value pairs and 435 tensors from /storage/emulated/0/Android/data/com.example.llama/files/qwen2.5-3b-instruct-q5_k_m.gguf (version GGUF V3 (latest))
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 0: general.architecture str = qwen2
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 1: general.type str = model
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 2: general.name str = qwen2.5-3b-instruct
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 3: general.version str = v0.1-v0.1
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 4: general.finetune str = qwen2.5-3b-instruct
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 5: general.size_label str = 3.4B
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 6: qwen2.block_count u32 = 36
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 7: qwen2.context_length u32 = 32768
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 8: qwen2.embedding_length u32 = 2048
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 11008
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 16
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 2
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 1000000.000000
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 14: general.file_type u32 = 17
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
2025-03-03 14:10:28.332 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "", "&", "'", ...
2025-03-03 14:10:28.339 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 24: tokenizer.chat_template str = { 0f tools }\n {{- '<|im_start|>...
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 25: general.quantization_version u32 = 2
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - type f32: 181 tensors
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - type q5_K: 217 tensors
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - type q6_K: 37 tensors
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I print_info: file format = GGUF V3 (latest)
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I print_info: file type = Q5_K - Medium
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I print_info: file size = 2.27 GiB (5.73 BPW)
2025-03-03 14:10:28.489 10569-25913 llama-android.cpp com.example.llama I load: special tokens cache size = 22
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I load: token to piece cache size = 0.9310 MB
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: arch = qwen2
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: vocab_only = 0
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: n_ctx_train = 32768
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd = 2048
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: n_layer = 36
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_head = 16
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_head_kv = 2
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_rot = 128
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_swa = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_head_k = 128
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_head_v = 128
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_gqa = 8
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_k_gqa = 256
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_v_gqa = 256
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_norm_eps = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_norm_rms_eps = 1.0e-06
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_clamp_kqv = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_max_alibi_bias = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_logit_scale = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_ff = 11008
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_expert = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_expert_used = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: causal attn = 1
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: pooling type = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: rope type = 2
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: rope scaling = linear
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: freq_base_train = 1000000.0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: freq_scale_train = 1
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_ctx_orig_yarn = 32768
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: rope_finetuned = unknown
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_d_conv = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_d_inner = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_d_state = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_dt_rank = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_dt_b_c_rms = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: model type = 3B
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: model params = 3.40 B
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: general.name = qwen2.5-3b-instruct
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: vocab type = BPE
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_vocab = 151936
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_merges = 151387
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: BOS token = 151643 '<|endoftext|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOS token = 151645 '<|im_end|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOT token = 151645 '<|im_end|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: PAD token = 151643 '<|endoftext|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: LF token = 198 'Ċ'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM PRE token = 151659 '<|fim_prefix|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM SUF token = 151661 '<|fim_suffix|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM MID token = 151660 '<|fim_middle|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM PAD token = 151662 '<|fim_pad|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM REP token = 151663 '<|repo_name|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM SEP token = 151664 '<|file_sep|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151643 '<|endoftext|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151645 '<|im_end|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151662 '<|fim_pad|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151663 '<|repo_name|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151664 '<|file_sep|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: max token length = 256
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I load_tensors: loading model tensors, this can take a while... (mmap = true)
2025-03-03 14:10:28.838 10569-25913 llama-android.cpp com.example.llama I load_tensors: CPU_Mapped model buffer size = 2320.08 MiB
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I Using 6 threads
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_seq_max = 1
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_ctx = 2048
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_ctx_per_seq = 2048
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_batch = 2048
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_ubatch = 512
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: flash_attn = 0
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: freq_base = 1000000.0
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: freq_scale = 1
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama W llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 36, can_shift = 1
2025-03-03 14:10:28.865 10569-25913 llama-android.cpp com.example.llama I llama_kv_cache_init: CPU KV buffer size = 72.00 MiB
2025-03-03 14:10:28.865 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: KV self size = 72.00 MiB, K (f16): 36.00 MiB, V (f16): 36.00 MiB
2025-03-03 14:10:28.866 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: CPU output buffer size = 0.58 MiB
2025-03-03 14:10:28.868 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: CPU compute buffer size = 300.75 MiB
2025-03-03 14:10:28.868 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: graph nodes = 1266
2025-03-03 14:10:28.868 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: graph splits = 1
2025-03-03 14:10:28.868 10569-25913 LLamaAndroid com.example.llama I Loaded model /storage/emulated/0/Android/data/com.example.llama/files/qwen2.5-3b-instruct-q5_k_m.gguf
2025-03-03 14:10:28.912 10569-25857 m.example.llama com.example.llama I This is non sticky GC, maxfree is 33554432 minfree is 8388608
2025-03-03 14:10:28.913 10569-10569 Compose Focus com.example.llama D Owner FocusChanged(true)
2025-03-03 14:10:28.908 10569-10569 FinalizerDaemon com.example.llama W type=1400 audit(0.0:1987458): avc: denied { getopt } for path="/dev/socket/usap_pool_primary" scontext=u:r:untrusted_app:s0:c139,c257,c512,c768 tcontext=u:r:zygote:s0 tclass=unix_stream_socket permissive=0 app=com.example.llama
2025-03-03 14:10:28.915 10569-25859 StrictMode com.example.llama D StrictMode policy violation: android.os.strictmode.LeakedClosableViolation: A resource was acquired at attached stack trace but never released. See java.io.Closeable for information on avoiding resource leaks.
Callsite: close at android.os.StrictMode$AndroidCloseGuardReporter.report(StrictMode.java:1991)
    at dalvik.system.CloseGuard.warnIfOpen(CloseGuard.java:338)
    at sun.nio.fs.UnixSecureDirectoryStream.finalize(UnixSecureDirectoryStream.java:580)
    at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:319)
    at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:306)
    at java.lang.Daemons$Daemon.run(Daemons.java:140)
    at java.lang.Thread.run(Thread.java:1012)
2025-03-03 14:10:28.935 10569-10569 HandWritingStubImpl com.example.llama I refreshLastKeyboardType: 1
2025-03-03 14:10:28.936 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:28.939 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:28.940 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:false
2025-03-03 14:10:28.941 10569-10569 InsetsController com.example.llama D show(ime(), fromIme=false)
2025-03-03 14:10:28.941 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:true
2025-03-03 14:10:28.941 10569-10569 InputMethodManager com.example.llama D showSoftInput() view=androidx.compose.ui.platform.AndroidComposeView{89074eb VFED..... .F....ID 0,0-1440,3024 aid=1073741824} flags=0 reason=SHOW_SOFT_INPUT_BY_INSETS_API
2025-03-03 14:10:28.944 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInputView app:com.example.llama restarting:false
2025-03-03 14:10:28.971 10569-10569 OnBackInvokedCallback com.example.llama W OnBackInvokedCallback is not enabled for the application.
Set 'android:enableOnBackInvokedCallback="true"' in the application manifest.
2025-03-03 14:10:28.991 10569-10569 InsetsController com.example.llama D show(ime(), fromIme=true)
2025-03-03 14:10:31.444 10569-26535 ProfileInstaller com.example.llama D Installing profile for com.example.llama
2025-03-03 14:10:32.124 1314-1314 vendor.qti...al-service ven...qti.hardware.perf-hal-service E PerfLockHelper: GetCustBoostConfig() 342: applist: com.example.llama, it->applist com.miui.home, com.zhihu.android
2025-03-03 14:10:34.214 10569-25891 m.example.llama com.example.llama I ProcessProfilingInfo new_methods=0 is saved saved_to_disk=0 resolve_classes_delay=8000
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I n_len = 64, n_ctx = 2048, n_kv_req = 64
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I token: `who`-> 14623
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I token: ` are`-> 525
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I token: ` you`-> 498
2025-03-03 14:10:35.812 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:35.816 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:true
2025-03-03 14:10:36.237 10569-25913 llama-android.cpp com.example.llama I cached: ?, new_token_chars: `?`, id: 30
2025-03-03 14:10:36.411 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:36.594 10569-25913 llama-android.cpp com.example.llama I cached: am, new_token_chars: ` am`, id: 1079
2025-03-03 14:10:36.766 10569-25913 llama-android.cpp com.example.llama I cached: a, new_token_chars: ` a`, id: 264
2025-03-03 14:10:36.938 10569-25913 llama-android.cpp com.example.llama I cached: large, new_token_chars: ` large`, id: 3460
2025-03-03 14:10:37.094 10569-25913 llama-android.cpp com.example.llama I cached: language, new_token_chars: ` language`, id: 4128
2025-03-03 14:10:37.244 10569-25913 llama-android.cpp com.example.llama I cached: model, new_token_chars: ` model`, id: 1614
2025-03-03 14:10:37.394 10569-25913 llama-android.cpp com.example.llama I cached: created, new_token_chars: ` created`, id: 3465
2025-03-03 14:10:37.536 10569-25913 llama-android.cpp com.example.llama I cached: by, new_token_chars: ` by`, id: 553
2025-03-03 14:10:37.690 10569-25913 llama-android.cpp com.example.llama I cached: Alibaba, new_token_chars: ` Alibaba`, id: 54364
2025-03-03 14:10:37.838 10569-25913 llama-android.cpp com.example.llama I cached: Cloud, new_token_chars: ` Cloud`, id: 14817
2025-03-03 14:10:37.992 10569-25913 llama-android.cpp com.example.llama I cached: ., new_token_chars: `.`, id: 13
2025-03-03 14:10:38.167 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:38.311 10569-25913 llama-android.cpp com.example.llama I cached: 'm, new_token_chars: `'m`, id: 2776
2025-03-03 14:10:38.464 10569-25913 llama-android.cpp com.example.llama I cached: called, new_token_chars: ` called`, id: 2598
2025-03-03 14:10:38.620 10569-25913 llama-android.cpp com.example.llama I cached: Q, new_token_chars: ` Q`, id: 1207
2025-03-03 14:10:38.771 10569-25913 llama-android.cpp com.example.llama I cached: wen, new_token_chars: `wen`, id: 16948
2025-03-03 14:10:38.927 10569-25913 llama-android.cpp com.example.llama I cached: ., new_token_chars: `.`, id: 13
2025-03-03 14:10:39.093 10569-25913 llama-android.cpp com.example.llama I cached: How, new_token_chars: ` How`, id: 2585
2025-03-03 14:10:39.276 10569-25913 llama-android.cpp com.example.llama I cached: can, new_token_chars: ` can`, id: 646
2025-03-03 14:10:39.457 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:39.644 10569-25913 llama-android.cpp com.example.llama I cached: assist, new_token_chars: ` assist`, id: 7789
2025-03-03 14:10:39.825 10569-25913 llama-android.cpp com.example.llama I cached: you, new_token_chars: ` you`, id: 498
2025-03-03 14:10:40.065 10569-25913 llama-android.cpp com.example.llama I cached: today, new_token_chars: ` today`, id: 3351
2025-03-03 14:10:40.285 10569-25913 llama-android.cpp com.example.llama I cached: ?, new_token_chars: `?`, id: 30
2025-03-03 14:10:40.502 10569-25913 llama-android.cpp com.example.llama I cached: You, new_token_chars: ` You`, id: 1446
2025-03-03 14:10:40.726 10569-25913 llama-android.cpp com.example.llama I cached: can, new_token_chars: ` can`, id: 646
2025-03-03 14:10:40.965 10569-25913 llama-android.cpp com.example.llama I cached: ask, new_token_chars: ` ask`, id: 2548
2025-03-03 14:10:41.215 10569-25913 llama-android.cpp com.example.llama I cached: me, new_token_chars: ` me`, id: 752
2025-03-03 14:10:41.448 10569-25913 llama-android.cpp com.example.llama I cached: questions, new_token_chars: ` questions`, id: 4755
2025-03-03 14:10:41.699 10569-25913 llama-android.cpp com.example.llama I cached: ,, new_token_chars: `,`, id: 11
2025-03-03 14:10:41.896 10569-25913 llama-android.cpp com.example.llama I cached: and, new_token_chars: ` and`, id: 323
2025-03-03 14:10:42.116 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:42.378 10569-25913 llama-android.cpp com.example.llama I cached: 'll, new_token_chars: `'ll`, id: 3278
2025-03-03 14:10:42.663 10569-25913 llama-android.cpp com.example.llama I cached: do, new_token_chars: ` do`, id: 653
2025-03-03 14:10:42.924 10569-25913 llama-android.cpp com.example.llama I cached: my, new_token_chars: ` my`, id: 847
2025-03-03 14:10:43.181 10569-25913 llama-android.cpp com.example.llama I cached: best, new_token_chars: ` best`, id: 1850
2025-03-03 14:10:43.440 10569-25913 llama-android.cpp com.example.llama I cached: to, new_token_chars: ` to`, id: 311
2025-03-03 14:10:43.702 10569-25913 llama-android.cpp com.example.llama I cached: provide, new_token_chars: ` provide`, id: 3410
2025-03-03 14:10:43.977 10569-25913 llama-android.cpp com.example.llama I cached: you, new_token_chars: ` you`, id: 498
2025-03-03 14:10:44.254 10569-25913 llama-android.cpp com.example.llama I cached: with, new_token_chars: ` with`, id: 448
2025-03-03 14:10:44.514 10569-25913 llama-android.cpp com.example.llama I cached: helpful, new_token_chars: ` helpful`, id: 10950
2025-03-03 14:10:44.787 10569-25913 llama-android.cpp com.example.llama I cached: answers, new_token_chars: ` answers`, id: 11253
2025-03-03 14:10:45.056 10569-25913 llama-android.cpp com.example.llama I cached: ., new_token_chars: `.`, id: 13
2025-03-03 14:10:45.295 10569-25913 llama-android.cpp com.example.llama I cached: Let, new_token_chars: ` Let`, id: 6771
2025-03-03 14:10:45.532 10569-25913 llama-android.cpp com.example.llama I cached: 's, new_token_chars: `'s`, id: 594
2025-03-03 14:10:45.783 10569-25913 llama-android.cpp com.example.llama I cached: get, new_token_chars: ` get`, id: 633
2025-03-03 14:10:46.027 10569-25913 llama-android.cpp com.example.llama I cached: started, new_token_chars: ` started`, id: 3855
2025-03-03 14:10:46.266 10569-25913 llama-android.cpp com.example.llama I cached: !, new_token_chars: `!`, id: 0
2025-03-03 14:10:46.491 10569-25913 llama-android.cpp com.example.llama I cached: If, new_token_chars: ` If`, id: 1416
2025-03-03 14:10:46.699 10569-25913 llama-android.cpp com.example.llama I cached: you, new_token_chars: ` you`, id: 498
2025-03-03 14:10:46.893 10569-25913 llama-android.cpp com.example.llama I cached: have, new_token_chars: ` have`, id: 614
2025-03-03 14:10:47.103 10569-25913 llama-android.cpp com.example.llama I cached: any, new_token_chars: ` any`, id: 894
2025-03-03 14:10:47.347 10569-25913 llama-android.cpp com.example.llama I cached: specific, new_token_chars: ` specific`, id: 3151
2025-03-03 14:10:47.570 10569-25913 llama-android.cpp com.example.llama I cached: topic, new_token_chars: ` topic`, id: 8544
2025-03-03 14:10:47.812 10569-25913 llama-android.cpp com.example.llama I
cached: or, new_token_chars: ` or`, id: 4762025-03-03 14:10:48.052 10569-25913 llama-android.cpp com.example.llama I cached: question, new_token_chars: ` question`, id: 34052025-03-03 14:10:48.293 10569-25913 llama-android.cpp com.example.llama I cached: in, new_token_chars: ` in`, id: 3042025-03-03 14:10:48.525 10569-25913 llama-android.cpp com.example.llama I cached: mind, new_token_chars: ` mind`, id: 39712025-03-03 14:10:48.767 10569-25913 llama-android.cpp com.example.llama I cached: ,, new_token_chars: `,`, id: 112025-03-03 14:10:49.005 10569-25913 llama-android.cpp com.example.llama I cached: feel, new_token_chars: ` feel`, id: 2666
Name and Version
The master branch: commit cc473ca
Operating systems
Linux
GGML backends
CPU
Hardware
Android 13
Redmi K60 (mobile)
Models
qwen2.5-3b-instruct-q5_k_m.gguf
downloaded from qwen2.5-3b-instruct-q5_k_m.gguf
Problem description & steps to reproduce
Thanks for this repo.
I tried the examples/llama.android example and compiled it successfully with Android Studio. The app runs fine and loads the offline model successfully. But when I send something, the assistant never seems to emit an end-of-generation token and keeps producing text indefinitely.
To rule out the model file, I checked the same GGUF in other apps. For example, I installed PocketPal AI and loaded the same file; its output is normal there, so I don't think the GGUF file itself is the problem.
I suspect the problem is in the model-loading or inference path, especially the chat template: does this example apply the model's chat template at all? Does anyone have any ideas?
Settings: the default code from this example.
Problems:
I: who are you
Assistant:
? I am a large language model created by Alibaba Cloud. I'm called Qwen. How can I assist you today? You can ask me questions, and I'll do my best to provide you with helpful answers. Let's get started! If you have any specific topic or question in mind, feel
(it stops only because of the nlen = 64 limit)
First Bad Commit
No response
Relevant log output