Details:
I am currently integrating GPU support for my projects using llama.cpp, as it is the only solution that supports GPU acceleration in my environment. However, I've run into an issue: the /api/generate endpoint, which I believe is used by Ollama-Logseq and Copilot for Obsidian, is not supported by llama.cpp's server.
Issue:
When I send a request to /api/generate, the server returns a 404 error ({"error":{"code":404,"message":"File Not Found","type":"not_found_error"}}).
llama.cpp instead exposes the /v1/completions endpoint for text generation.
Reference:
For more details on the correct API paths supported by llama.cpp, please see their official API documentation.
Request:
Could you update the integrations, or provide guidance on how to configure Ollama-Logseq and Copilot for Obsidian to work with the /v1/completions endpoint? This would greatly help users like me who rely on llama.cpp for GPU support. (A possible workaround using the OpenAI-compatible chat endpoint is sketched after the test output below.)
Test
#!/bin/bash

# Base URL for llama.cpp server
BASE_URL="http://localhost:11434"

# Test /api/generate endpoint
echo "Testing /api/generate endpoint..."
curl -X POST "$BASE_URL/api/generate" \
  -H "Content-Type: application/json" \
  -d '{"model": "ggml-model-q8_0.gguf", "prompt": "Test prompt"}'

# Test /v1/completions endpoint
echo "Testing /v1/completions endpoint..."
curl -X POST "$BASE_URL/v1/completions" \
  -H "Content-Type: application/json" \
  -d '{"model": "ggml-model-q8_0.gguf", "prompt": "Test prompt", "max_tokens": 50}'
Output
Testing /api/generate endpoint...
{"error":{"code":404,"message":"File Not Found","type":"not_found_error"}}
Testing /v1/completions endpoint...
{"content":":\nWrite a letter to your friend describing your experience with a recent hike you went on.\nDear [Friend],\n\nI hope this letter finds you doing well. I wanted to share with you my recent experience on a hike that I went on last weekend.","id_slot":0,"stop":true,"model":"ggml-model-q8_0.gguf","tokens_predicted":50,"tokens_evaluated":3,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"ggml-model-q8_0.gguf","seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"penalty_prompt_tokens":[],"use_penalty_prompt_tokens":false,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":50,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"]},"prompt":"Test prompt","truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":52,"timings":{"prompt_n":3,"prompt_ms":245.916,"prompt_per_token_ms":81.972,"prompt_per_second":12.199287561606402,"predicted_n":50,"predicted_ms":6826.133,"predicted_per_token_ms":136.52266,"predicted_per_second":7.324791356980592}}
Thank you for your attention to this matter.