I built the examples/llama.android example successfully with Android Studio. The app runs fine and loads the offline model, but when I send a message, the model never seems to emit an end-of-generation token: the assistant keeps producing text indefinitely.
To rule out the model file, I checked the same GGUF in other apps. For example, I installed PocketPal AI and loaded the same GGUF file there; the output terminates normally. So I don't think the GGUF file is the problem.
I suspect the problem lies in how the model is loaded or in the inference process, especially the chat template. Does this example not apply a chat template? Does anyone have any ideas?
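For reference, here is the ChatML-style prompt format Qwen2.5-Instruct expects. This is just an illustrative sketch in plain Python string formatting, not the example app's actual code; a proper fix would use llama.cpp's `llama_chat_apply_template()` with the template stored in the GGUF's `tokenizer.chat_template` metadata:

```python
# Minimal sketch of the ChatML prompt format Qwen2.5-Instruct is trained on.
# Illustrative only -- real code should call llama_chat_apply_template()
# on the template embedded in the GGUF metadata.

def format_chatml(user_msg: str,
                  system_msg: str = "You are a helpful assistant.") -> str:
    """Wrap a single user turn in ChatML markers and open the assistant turn."""
    return (
        f"<|im_start|>system\n{system_msg}<|im_end|>\n"
        f"<|im_start|>user\n{user_msg}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = format_chatml("who are you")
```

Without these markers the model is effectively doing raw text completion, so it has little reason to ever emit its EOS token `<|im_end|>`.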
Settings: the default code from this example.
Problems:
I: who are you
Assistant:
? I am a large language model created by Alibaba Cloud. I'm called Qwen. How can I assist you today? You can ask me questions, and I'll do my best to provide you with helpful answers. Let's get started! If you have any specific topic or question in mind, feel
(it only stops here because of the nlen = 64 limit)
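The other half of the picture is the stop condition: besides hitting nlen, a generation loop should stop as soon as the sampled token is an end-of-generation token. A hedged sketch of that check (the EOG ids below are the ones printed for this GGUF in the log; real code should query `llama_vocab_is_eog()` rather than hardcode ids):

```python
# Sketch of a decode-loop stop condition. EOG ids taken from the log output
# for this Qwen2.5 GGUF (e.g. 151645 = '<|im_end|>'); in llama.cpp proper
# you would ask llama_vocab_is_eog() instead of hardcoding them.
EOG_IDS = {151643, 151645, 151662, 151663, 151664}

def generate(sample_next, n_len: int = 64):
    """Collect sampled token ids until an EOG token or the n_len cap."""
    out = []
    for _ in range(n_len):
        tok = sample_next()
        if tok in EOG_IDS:  # model signalled the end of its turn
            break
        out.append(tok)
    return out

# e.g. a hypothetical stream that yields three tokens and then <|im_end|>:
stream = iter([14623, 525, 498, 151645, 30])
tokens = generate(lambda: next(stream))
# tokens == [14623, 525, 498]
```

If the prompt is not formatted with the chat template, the model rarely samples `<|im_end|>` at all, so only the n_len cap ever fires, which matches the behaviour described above.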
First Bad Commit
No response
Relevant log output
2025-03-03 14:10:26.042 1314-1314 vendor.qti...al-service ven...qti.hardware.perf-hal-service E PerfLockHelper: GetCustBoostConfig() 342: applist: com.example.llama, it->applist com.miui.calculator, com.miui.weather2, com.miui.notes, com.miui.gallery, com.android.calendar, com.android.deskclock, com.android.soundrecorder, com.android.contacts, com.android.mms, com.duokan.reader, com.miui.securitycenter, com.android.settings, com.xiaomi.market, com.android.quicksearchbox, com.android.fileexplorer, com.xiaomi.gamecenter, com.miui.video
2025-03-03 14:10:26.077 13702-14207 ActivityManagerWrapper com.miui.home E getRecentTasks: mainTaskId=11255 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.example.llama/com.example.llama.MainActivity} }
2025-03-03 14:10:26.124 11494-11841 SuggestManager com.miui.securitycenter.remote E openApp name = com.example.llama
---------------------------- PROCESS STARTED (10569) for package com.example.llama ----------------------------
2025-03-03 14:10:26.140 10569-10569 ziparchive com.example.llama W Unable to open '/data/data/com.example.llama/code_cache/.overlay/base.apk/classes4.dm': No such file or directory
2025-03-03 14:10:26.140 10569-10569 ziparchive com.example.llama W Unable to open '/data/app/~~3SipRxKykJ_b3BwLrltlMQ==/com.example.llama-GDKXE2hnIan_uK5gtVXhPg==/base.dm': No such file or directory
2025-03-03 14:10:26.140 10569-10569 ziparchive com.example.llama W Unable to open '/data/app/~~3SipRxKykJ_b3BwLrltlMQ==/com.example.llama-GDKXE2hnIan_uK5gtVXhPg==/base.dm': No such file or directory
2025-03-03 14:10:26.167 13702-14207 ActivityManagerWrapper com.miui.home E getRecentTasks: mainTaskId=11255 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.example.llama/com.example.llama.MainActivity} }
2025-03-03 14:10:26.173 13702-14207 ActivityManagerWrapper com.miui.home E getRecentTasks: mainTaskId=11255 userId=0 baseIntent=Intent { act=android.intent.action.MAIN flag=268435456 cmp=ComponentInfo{com.example.llama/com.example.llama.MainActivity} }
2025-03-03 14:10:26.211 10569-10569 Perf com.example.llama I Connecting to perf service.
2025-03-03 14:10:26.214 10569-10569 GraphicsEnvironment com.example.llama V ANGLE Developer option for'com.example.llama'set to: 'default'
2025-03-03 14:10:26.214 10569-10569 GraphicsEnvironment com.example.llama V ANGLE GameManagerService for com.example.llama: false
2025-03-03 14:10:26.214 10569-10569 GraphicsEnvironment com.example.llama V Updatable production driver is not supported on the device.
2025-03-03 14:10:26.216 10569-10569 ForceDarkHelperStubImpl com.example.llama I initialize for com.example.llama , ForceDarkOrigin
2025-03-03 14:10:26.218 10569-10569 m.example.llama com.example.llama D JNI_OnLoad success
2025-03-03 14:10:26.218 10569-10569 MiuiForceDarkConfig com.example.llama I setConfig density:3.500000, mainRule:0, secondaryRule:0, tertiaryRule:0
2025-03-03 14:10:26.220 10569-10569 NetworkSecurityConfig com.example.llama D No Network Security Config specified, using platform default
2025-03-03 14:10:26.220 10569-10569 NetworkSecurityConfig com.example.llama D No Network Security Config specified, using platform default
2025-03-03 14:10:26.235 10569-10569 MiuiMultiWindowAdapter com.example.llama D MiuiMultiWindowAdapter::getFreeformVideoWhiteListInSystem::LIST_ABOUT_SUPPORT_LANDSCAPE_VIDEO = [com.hunantv.imgo.activity, com.tencent.qqlive, com.qiyi.video, com.hunantv.imgo.activity.inter, com.tencent.qqlivei18n, com.iqiyi.i18n, tv.danmaku.bili]
2025-03-03 14:10:26.269 10569-10569 libc com.example.llama W Access denied finding property "ro.vendor.df.effect.conflict"
2025-03-03 14:10:26.264 10569-10569 m.example.llama com.example.llama W type=1400 audit(0.0:1987456): avc: denied { read } for name="u:object_r:vendor_displayfeature_prop:s0" dev="tmpfs" ino=388 scontext=u:r:untrusted_app:s0:c139,c257,c512,c768 tcontext=u:object_r:vendor_displayfeature_prop:s0 tclass=file permissive=0 app=com.example.llama
2025-03-03 14:10:26.285 10569-25898 ViewContentFactory com.example.llama D initViewContentFetcherClass
2025-03-03 14:10:26.285 10569-25898 ViewContentFactory com.example.llama D getInterceptorPackageInfo
2025-03-03 14:10:26.286 10569-25898 ViewContentFactory com.example.llama D getInitialApplication took 0ms
2025-03-03 14:10:26.286 10569-25898 ViewContentFactory com.example.llama D packageInfo.packageName: com.miui.catcherpatch
2025-03-03 14:10:26.291 10569-25898 ViewContentFactory com.example.llama D initViewContentFetcherClass took 6ms
2025-03-03 14:10:26.291 10569-25898 ContentCatcher com.example.llama I ViewContentFetcher : ViewContentFetcher
2025-03-03 14:10:26.291 10569-25898 ViewContentFactory com.example.llama D createInterceptor took 6ms
2025-03-03 14:10:26.298 10569-10569 IS_CTS_MODE com.example.llama D false
2025-03-03 14:10:26.298 10569-10569 MULTI_WINDOW_ENABLED com.example.llama D false
2025-03-03 14:10:26.300 10569-10569 DecorView[] com.example.llama D getWindowModeFromSystem windowmode is 1
2025-03-03 14:10:26.383 10569-10569 m.example.llama com.example.llama W Method java.lang.Object androidx.compose.runtime.snapshots.SnapshotStateMap.mutate(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
Common causes for lock verification issues are non-optimized dex code
and incorrect proguard optimizations.
2025-03-03 14:10:26.383 10569-10569 m.example.llama com.example.llama W Method void androidx.compose.runtime.snapshots.SnapshotStateMap.update(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.384 10569-10569 m.example.llama com.example.llama W Method boolean androidx.compose.runtime.snapshots.SnapshotStateMap.removeIf$runtime_release(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.426 10569-10569 m.example.llama com.example.llama W Method boolean androidx.compose.runtime.snapshots.SnapshotStateList.conditionalUpdate(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.426 10569-10569 m.example.llama com.example.llama W Method java.lang.Object androidx.compose.runtime.snapshots.SnapshotStateList.mutate(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.426 10569-10569 m.example.llama com.example.llama W Method void androidx.compose.runtime.snapshots.SnapshotStateList.update(kotlin.jvm.functions.Function1) failed lock verification and will run slower.
2025-03-03 14:10:26.434 10569-10569 Compatibil...geReporter com.example.llama D Compat change id reported: 171228096; UID 10395; state: ENABLED
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I QUALCOMM build : ee4b625, I41c6f366e1
Build Date : 02/16/23
OpenGL ES Shader Compiler Version: EV031.36.08.19
Local Branch :
Remote Branch :
Remote Branch :
Reconstruct Branch :
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I Build Config : S P 12.1.1 AArch64
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I Driver Path : /vendor/lib64/egl/libGLESv2_adreno.so
2025-03-03 14:10:26.588 10569-25895 AdrenoGLES-0 com.example.llama I Driver Version : 0615.60
2025-03-03 14:10:26.590 10569-25895 AdrenoGLES-0 com.example.llama I PFP: 0x01730155, ME: 0x00000000
2025-03-03 14:10:26.601 10569-25895 libc com.example.llama W Access denied finding property "vendor.migl.debug"
2025-03-03 14:10:26.603 10569-25895 libEGL com.example.llama E pre_cache appList: ,,
2025-03-03 14:10:26.605 10569-25895 m.example.llama com.example.llama I Support FEAS product mondrian:
2025-03-03 14:10:26.619 10569-25895 m.example.llama com.example.llama D MiuiProcessManagerServiceStub setSchedFifo
2025-03-03 14:10:26.619 10569-25895 MiuiProcessManagerImpl com.example.llama I setSchedFifo pid:10569, mode:3
2025-03-03 14:10:26.624 10569-25895 LB com.example.llama E fail to open file: No such file or directory
2025-03-03 14:10:26.620 10569-10569 RenderThread com.example.llama W type=1400 audit(0.0:1987457): avc: denied { getattr } for path="/sys/module/metis/parameters/minor_window_app" dev="sysfs" ino=69775 scontext=u:r:untrusted_app:s0:c139,c257,c512,c768 tcontext=u:object_r:sysfs_migt:s0 tclass=file permissive=0 app=com.example.llama
2025-03-03 14:10:26.626 10569-25895 Parcel com.example.llama W Expecting binder but got null!
2025-03-03 14:10:26.627 10569-10569 Looper com.example.llama W PerfMonitor doFrame : time=305ms vsyncFrame=0 latency=88ms procState=-1 historyMsgCount=3 (msgIndex=1 wall=88ms seq=4 running=77ms runnable=1ms binder=64ms slowpath=12ms late=117ms h=android.app.ActivityThread$H w=159)
2025-03-03 14:10:26.642 10569-10569 Choreographer com.example.llama I Skipped 35 frames! The application may be doing too much work on its main thread.
2025-03-03 14:10:26.652 10569-10569 DecorView[] com.example.llama D onWindowFocusChanged hasWindowFocus true
2025-03-03 14:10:26.652 10569-10569 HandWritingStubImpl com.example.llama I refreshLastKeyboardType: 1
2025-03-03 14:10:26.652 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:26.661 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:false
2025-03-03 14:10:26.664 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:28.253 10569-25913 LLamaAndroid com.example.llama D Dedicated thread for native code: Llm-RunLoop
2025-03-03 14:10:28.261 10569-25913 LLamaAndroid com.example.llama D CPU : NEON = 1 | ARM_FMA = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
2025-03-03 14:10:28.261 10569-25913 llama-android.cpp com.example.llama I Loading model from /storage/emulated/0/Android/data/com.example.llama/files/qwen2.5-3b-instruct-q5_k_m.gguf
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: loaded meta data with 26 key-value pairs and 435 tensors from /storage/emulated/0/Android/data/com.example.llama/files/qwen2.5-3b-instruct-q5_k_m.gguf (version GGUF V3 (latest))
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 0: general.architecture str = qwen2
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 1: general.type str = model
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 2: general.name str = qwen2.5-3b-instruct
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 3: general.version str = v0.1-v0.1
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 4: general.finetune str = qwen2.5-3b-instruct
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 5: general.size_label str = 3.4B
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 6: qwen2.block_count u32 = 36
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 7: qwen2.context_length u32 = 32768
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 8: qwen2.embedding_length u32 = 2048
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 9: qwen2.feed_forward_length u32 = 11008
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 10: qwen2.attention.head_count u32 = 16
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 11: qwen2.attention.head_count_kv u32 = 2
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 12: qwen2.rope.freq_base f32 = 1000000.000000
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 13: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 14: general.file_type u32 = 17
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 15: tokenizer.ggml.model str = gpt2
2025-03-03 14:10:28.306 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 16: tokenizer.ggml.pre str = qwen2
2025-03-03 14:10:28.332 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 17: tokenizer.ggml.tokens arr[str,151936] = ["!", "\"", "#", "$", "", "&", "'", ...
2025-03-03 14:10:28.339 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 18: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 19: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 20: tokenizer.ggml.eos_token_id u32 = 151645
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 21: tokenizer.ggml.padding_token_id u32 = 151643
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 22: tokenizer.ggml.bos_token_id u32 = 151643
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 23: tokenizer.ggml.add_bos_token bool = false
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 24: tokenizer.chat_template str = { 0f tools }\n {{- '<|im_start|>...
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - kv 25: general.quantization_version u32 = 2
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - type f32: 181 tensors
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - type q5_K: 217 tensors
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I llama_model_loader: - type q6_K: 37 tensors
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I print_info: file format = GGUF V3 (latest)
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I print_info: file type = Q5_K - Medium
2025-03-03 14:10:28.361 10569-25913 llama-android.cpp com.example.llama I print_info: file size = 2.27 GiB (5.73 BPW)
2025-03-03 14:10:28.489 10569-25913 llama-android.cpp com.example.llama I load: special tokens cache size = 22
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I load: token to piece cache size = 0.9310 MB
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: arch = qwen2
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: vocab_only = 0
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: n_ctx_train = 32768
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd = 2048
2025-03-03 14:10:28.541 10569-25913 llama-android.cpp com.example.llama I print_info: n_layer = 36
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_head = 16
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_head_kv = 2
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_rot = 128
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_swa = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_head_k = 128
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_head_v = 128
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_gqa = 8
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_k_gqa = 256
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_embd_v_gqa = 256
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_norm_eps = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_norm_rms_eps = 1.0e-06
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_clamp_kqv = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_max_alibi_bias = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: f_logit_scale = 0.0e+00
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_ff = 11008
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_expert = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_expert_used = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: causal attn = 1
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: pooling type = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: rope type = 2
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: rope scaling = linear
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: freq_base_train = 1000000.0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: freq_scale_train = 1
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_ctx_orig_yarn = 32768
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: rope_finetuned = unknown
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_d_conv = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_d_inner = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_d_state = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_dt_rank = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: ssm_dt_b_c_rms = 0
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: model type = 3B
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: model params = 3.40 B
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: general.name = qwen2.5-3b-instruct
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: vocab type = BPE
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_vocab = 151936
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: n_merges = 151387
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: BOS token = 151643 '<|endoftext|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOS token = 151645 '<|im_end|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOT token = 151645 '<|im_end|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: PAD token = 151643 '<|endoftext|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: LF token = 198 'Ċ'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM PRE token = 151659 '<|fim_prefix|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM SUF token = 151661 '<|fim_suffix|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM MID token = 151660 '<|fim_middle|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM PAD token = 151662 '<|fim_pad|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM REP token = 151663 '<|repo_name|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: FIM SEP token = 151664 '<|file_sep|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151643 '<|endoftext|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151645 '<|im_end|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151662 '<|fim_pad|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151663 '<|repo_name|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: EOG token = 151664 '<|file_sep|>'
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I print_info: max token length = 256
2025-03-03 14:10:28.542 10569-25913 llama-android.cpp com.example.llama I load_tensors: loading model tensors, this can take a while... (mmap = true)
2025-03-03 14:10:28.838 10569-25913 llama-android.cpp com.example.llama I load_tensors: CPU_Mapped model buffer size = 2320.08 MiB
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I Using 6 threads
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_seq_max = 1
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_ctx = 2048
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_ctx_per_seq = 2048
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_batch = 2048
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: n_ubatch = 512
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: flash_attn = 0
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: freq_base = 1000000.0
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: freq_scale = 1
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama W llama_init_from_model: n_ctx_per_seq (2048) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
2025-03-03 14:10:28.841 10569-25913 llama-android.cpp com.example.llama I llama_kv_cache_init: kv_size = 2048, offload = 1, type_k = 'f16', type_v = 'f16', n_layer = 36, can_shift = 1
2025-03-03 14:10:28.865 10569-25913 llama-android.cpp com.example.llama I llama_kv_cache_init: CPU KV buffer size = 72.00 MiB
2025-03-03 14:10:28.865 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: KV self size = 72.00 MiB, K (f16): 36.00 MiB, V (f16): 36.00 MiB
2025-03-03 14:10:28.866 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: CPU output buffer size = 0.58 MiB
2025-03-03 14:10:28.868 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: CPU compute buffer size = 300.75 MiB
2025-03-03 14:10:28.868 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: graph nodes = 1266
2025-03-03 14:10:28.868 10569-25913 llama-android.cpp com.example.llama I llama_init_from_model: graph splits = 1
2025-03-03 14:10:28.868 10569-25913 LLamaAndroid com.example.llama I Loaded model /storage/emulated/0/Android/data/com.example.llama/files/qwen2.5-3b-instruct-q5_k_m.gguf
2025-03-03 14:10:28.912 10569-25857 m.example.llama com.example.llama I This is non sticky GC, maxfree is 33554432 minfree is 8388608
2025-03-03 14:10:28.913 10569-10569 Compose Focus com.example.llama D Owner FocusChanged(true)
2025-03-03 14:10:28.908 10569-10569 FinalizerDaemon com.example.llama W type=1400 audit(0.0:1987458): avc: denied { getopt } for path="/dev/socket/usap_pool_primary" scontext=u:r:untrusted_app:s0:c139,c257,c512,c768 tcontext=u:r:zygote:s0 tclass=unix_stream_socket permissive=0 app=com.example.llama
2025-03-03 14:10:28.915 10569-25859 StrictMode com.example.llama D StrictMode policy violation: android.os.strictmode.LeakedClosableViolation: A resource was acquired at attached stack trace but never released. See java.io.Closeable for information on avoiding resource leaks.
Callsite: close at android.os.StrictMode$AndroidCloseGuardReporter.report(StrictMode.java:1991)
    at dalvik.system.CloseGuard.warnIfOpen(CloseGuard.java:338)
    at sun.nio.fs.UnixSecureDirectoryStream.finalize(UnixSecureDirectoryStream.java:580)
    at java.lang.Daemons$FinalizerDaemon.doFinalize(Daemons.java:319)
    at java.lang.Daemons$FinalizerDaemon.runInternal(Daemons.java:306)
    at java.lang.Daemons$Daemon.run(Daemons.java:140)
    at java.lang.Thread.run(Thread.java:1012)
2025-03-03 14:10:28.935 10569-10569 HandWritingStubImpl com.example.llama I refreshLastKeyboardType: 1
2025-03-03 14:10:28.936 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:28.939 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:28.940 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:false
2025-03-03 14:10:28.941 10569-10569 InsetsController com.example.llama D show(ime(), fromIme=false)
2025-03-03 14:10:28.941 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:true
2025-03-03 14:10:28.941 10569-10569 InputMethodManager com.example.llama D showSoftInput() view=androidx.compose.ui.platform.AndroidComposeView{89074eb VFED..... .F....ID 0,0-1440,3024 aid=1073741824} flags=0 reason=SHOW_SOFT_INPUT_BY_INSETS_API
2025-03-03 14:10:28.944 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInputView app:com.example.llama restarting:false
2025-03-03 14:10:28.971 10569-10569 OnBackInvokedCallback com.example.llama W OnBackInvokedCallback is not enabled for the application.
Set 'android:enableOnBackInvokedCallback="true"' in the application manifest.
2025-03-03 14:10:28.991 10569-10569 InsetsController com.example.llama D show(ime(), fromIme=true)
2025-03-03 14:10:31.444 10569-26535 ProfileInstaller com.example.llama D Installing profile for com.example.llama
2025-03-03 14:10:32.124 1314-1314 vendor.qti...al-service ven...qti.hardware.perf-hal-service E PerfLockHelper: GetCustBoostConfig() 342: applist: com.example.llama, it->applist com.miui.home, com.zhihu.android
2025-03-03 14:10:34.214 10569-25891 m.example.llama com.example.llama I ProcessProfilingInfo new_methods=0 is saved saved_to_disk=0 resolve_classes_delay=8000
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I n_len = 64, n_ctx = 2048, n_kv_req = 64
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I token: `who`-> 14623
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I token: ` are`-> 525
2025-03-03 14:10:35.807 10569-25913 llama-android.cpp com.example.llama I token: ` you`-> 498
2025-03-03 14:10:35.812 10569-10569 HandWritingStubImpl com.example.llama I getCurrentKeyboardType: 1
2025-03-03 14:10:35.816 9186-9186 BaseInputMethodService com.sohu.inputmethod.sogou.xiaomi E onStartInput app:com.example.llama restarting:true
2025-03-03 14:10:36.237 10569-25913 llama-android.cpp com.example.llama I cached: ?, new_token_chars: `?`, id: 30
2025-03-03 14:10:36.411 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:36.594 10569-25913 llama-android.cpp com.example.llama I cached: am, new_token_chars: ` am`, id: 1079
2025-03-03 14:10:36.766 10569-25913 llama-android.cpp com.example.llama I cached: a, new_token_chars: ` a`, id: 264
2025-03-03 14:10:36.938 10569-25913 llama-android.cpp com.example.llama I cached: large, new_token_chars: ` large`, id: 3460
2025-03-03 14:10:37.094 10569-25913 llama-android.cpp com.example.llama I cached: language, new_token_chars: ` language`, id: 4128
2025-03-03 14:10:37.244 10569-25913 llama-android.cpp com.example.llama I cached: model, new_token_chars: ` model`, id: 1614
2025-03-03 14:10:37.394 10569-25913 llama-android.cpp com.example.llama I cached: created, new_token_chars: ` created`, id: 3465
2025-03-03 14:10:37.536 10569-25913 llama-android.cpp com.example.llama I cached: by, new_token_chars: ` by`, id: 553
2025-03-03 14:10:37.690 10569-25913 llama-android.cpp com.example.llama I cached: Alibaba, new_token_chars: ` Alibaba`, id: 54364
2025-03-03 14:10:37.838 10569-25913 llama-android.cpp com.example.llama I cached: Cloud, new_token_chars: ` Cloud`, id: 14817
2025-03-03 14:10:37.992 10569-25913 llama-android.cpp com.example.llama I cached: ., new_token_chars: `.`, id: 13
2025-03-03 14:10:38.167 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:38.311 10569-25913 llama-android.cpp com.example.llama I cached: 'm, new_token_chars: `'m`, id: 2776
2025-03-03 14:10:38.464 10569-25913 llama-android.cpp com.example.llama I cached: called, new_token_chars: ` called`, id: 2598
2025-03-03 14:10:38.620 10569-25913 llama-android.cpp com.example.llama I cached: Q, new_token_chars: ` Q`, id: 1207
2025-03-03 14:10:38.771 10569-25913 llama-android.cpp com.example.llama I cached: wen, new_token_chars: `wen`, id: 16948
2025-03-03 14:10:38.927 10569-25913 llama-android.cpp com.example.llama I cached: ., new_token_chars: `.`, id: 13
2025-03-03 14:10:39.093 10569-25913 llama-android.cpp com.example.llama I cached: How, new_token_chars: ` How`, id: 2585
2025-03-03 14:10:39.276 10569-25913 llama-android.cpp com.example.llama I cached: can, new_token_chars: ` can`, id: 646
2025-03-03 14:10:39.457 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:39.644 10569-25913 llama-android.cpp com.example.llama I cached: assist, new_token_chars: ` assist`, id: 7789
2025-03-03 14:10:39.825 10569-25913 llama-android.cpp com.example.llama I cached: you, new_token_chars: ` you`, id: 498
2025-03-03 14:10:40.065 10569-25913 llama-android.cpp com.example.llama I cached: today, new_token_chars: ` today`, id: 3351
2025-03-03 14:10:40.285 10569-25913 llama-android.cpp com.example.llama I cached: ?, new_token_chars: `?`, id: 30
2025-03-03 14:10:40.502 10569-25913 llama-android.cpp com.example.llama I cached: You, new_token_chars: ` You`, id: 1446
2025-03-03 14:10:40.726 10569-25913 llama-android.cpp com.example.llama I cached: can, new_token_chars: ` can`, id: 646
2025-03-03 14:10:40.965 10569-25913 llama-android.cpp com.example.llama I cached: ask, new_token_chars: ` ask`, id: 2548
2025-03-03 14:10:41.215 10569-25913 llama-android.cpp com.example.llama I cached: me, new_token_chars: ` me`, id: 752
2025-03-03 14:10:41.448 10569-25913 llama-android.cpp com.example.llama I cached: questions, new_token_chars: ` questions`, id: 4755
2025-03-03 14:10:41.699 10569-25913 llama-android.cpp com.example.llama I cached: ,, new_token_chars: `,`, id: 11
2025-03-03 14:10:41.896 10569-25913 llama-android.cpp com.example.llama I cached: and, new_token_chars: ` and`, id: 323
2025-03-03 14:10:42.116 10569-25913 llama-android.cpp com.example.llama I cached: I, new_token_chars: ` I`, id: 358
2025-03-03 14:10:42.378 10569-25913 llama-android.cpp com.example.llama I cached: 'll, new_token_chars: `'ll`, id: 3278
2025-03-03 14:10:42.663 10569-25913 llama-android.cpp com.example.llama I cached: do, new_token_chars: ` do`, id: 653
2025-03-03 14:10:42.924 10569-25913 llama-android.cpp com.example.llama I cached: my, new_token_chars: ` my`, id: 847
2025-03-03 14:10:43.181 10569-25913 llama-android.cpp com.example.llama I cached: best, new_token_chars: ` best`, id: 1850
2025-03-03 14:10:43.440 10569-25913 llama-android.cpp com.example.llama I cached: to, new_token_chars: ` to`, id: 311
2025-03-03 14:10:43.702 10569-25913 llama-android.cpp com.example.llama I cached: provide, new_token_chars: ` provide`, id: 3410
2025-03-03 14:10:43.977 10569-25913 llama-android.cpp com.example.llama I cached: you, new_token_chars: ` you`, id: 498
2025-03-03 14:10:44.254 10569-25913 llama-android.cpp com.example.llama I cached: with, new_token_chars: ` with`, id: 448
2025-03-03 14:10:44.514 10569-25913 llama-android.cpp com.example.llama I cached: helpful, new_token_chars: ` helpful`, id: 10950
2025-03-03 14:10:44.787 10569-25913 llama-android.cpp com.example.llama I cached: answers, new_token_chars: ` answers`, id: 11253
2025-03-03 14:10:45.056 10569-25913 llama-android.cpp com.example.llama I cached: ., new_token_chars: `.`, id: 13
2025-03-03 14:10:45.295 10569-25913 llama-android.cpp com.example.llama I cached: Let, new_token_chars: ` Let`, id: 6771
2025-03-03 14:10:45.532 10569-25913 llama-android.cpp com.example.llama I cached: 's, new_token_chars: `'s`, id: 594
2025-03-03 14:10:45.783 10569-25913 llama-android.cpp com.example.llama I cached: get, new_token_chars: ` get`, id: 633
2025-03-03 14:10:46.027 10569-25913 llama-android.cpp com.example.llama I cached: started, new_token_chars: ` started`, id: 3855
2025-03-03 14:10:46.266 10569-25913 llama-android.cpp com.example.llama I cached: !, new_token_chars: `!`, id: 0
2025-03-03 14:10:46.491 10569-25913 llama-android.cpp com.example.llama I cached: If, new_token_chars: ` If`, id: 1416
2025-03-03 14:10:46.699 10569-25913 llama-android.cpp com.example.llama I cached: you, new_token_chars: ` you`, id: 498
2025-03-03 14:10:46.893 10569-25913 llama-android.cpp com.example.llama I cached: have, new_token_chars: ` have`, id: 614
2025-03-03 14:10:47.103 10569-25913 llama-android.cpp com.example.llama I cached: any, new_token_chars: ` any`, id: 894
2025-03-03 14:10:47.347 10569-25913 llama-android.cpp com.example.llama I cached: specific, new_token_chars: ` specific`, id: 3151
2025-03-03 14:10:47.570 10569-25913 llama-android.cpp com.example.llama I cached: topic, new_token_chars: ` topic`, id: 8544
2025-03-03 14:10:47.812 10569-25913 llama-android.cpp com.example.llama I
cached: or, new_token_chars: ` or`, id: 4762025-03-03 14:10:48.052 10569-25913 llama-android.cpp com.example.llama I cached: question, new_token_chars: ` question`, id: 34052025-03-03 14:10:48.293 10569-25913 llama-android.cpp com.example.llama I cached: in, new_token_chars: ` in`, id: 3042025-03-03 14:10:48.525 10569-25913 llama-android.cpp com.example.llama I cached: mind, new_token_chars: ` mind`, id: 39712025-03-03 14:10:48.767 10569-25913 llama-android.cpp com.example.llama I cached: ,, new_token_chars: `,`, id: 112025-03-03 14:10:49.005 10569-25913 llama-android.cpp com.example.llama I cached: feel, new_token_chars: ` feel`, id: 2666
Name and Version
The master branch: commit cc473ca
Operating systems
Linux
GGML backends
CPU
Hardware
Android 13
Redmi K60 (mobile)
Models
qwen2.5-3b-instruct-q5_k_m.gguf
downloaded from qwen2.5-3b-instruct-q5_k_m.gguf
Problem description & steps to reproduce
Thanks for this repo.
I tried the examples/llama.android example and compiled it successfully with Android Studio. The app runs fine and loads the offline model successfully. But when I send something, the assistant never seems to emit an end-of-generation token and keeps producing text indefinitely.
To rule out the model file, I checked the same GGUF in other apps. For example, I installed PocketPal AI and loaded the same file; its output is normal there, so I don't think the GGUF file itself is the problem.
I suspect the problem is in the model-loading or inference path, especially the chat template: does this example apply the model's chat template at all? Does anyone have any ideas?
Settings: the default code from this example.
Problems:
I: who are you
Assistant:
? I am a large language model created by Alibaba Cloud. I'm called Qwen. How can I assist you today? You can ask me questions, and I'll do my best to provide you with helpful answers. Let's get started! If you have any specific topic or question in mind, feel
(it stops only because of the nlen = 64 limit)
First Bad Commit
No response
Relevant log output