API Endpoint /api/generate Not Supported by llama.cpp – Request for Compatibility #27

vanabel opened this issue Aug 26, 2024 · 2 comments
vanabel commented Aug 26, 2024

Details:
I am currently using llama.cpp to provide GPU support for my projects, as it is the only GPU-capable solution in my environment. However, I've run into a problem: the /api/generate endpoint, which I believe both Ollama-Logseq and Copilot for Obsidian use, is not supported by llama.cpp.

Issue:

  • When attempting to use /api/generate, the server returns a 404 error ({"error":{"code":404,"message":"File Not Found","type":"not_found_error"}}).
  • Instead, llama.cpp uses the /v1/completions endpoint for text generation.

Reference:
For more details on the API paths that llama.cpp supports, please see its official server API documentation.

Request:
Could you update the integrations or provide guidance on how to configure Ollama-Logseq and Copilot for Obsidian to work with the /v1/completions endpoint? This would greatly help users like me who rely on llama.cpp for GPU support.
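Workaround sketch:
Until the integrations are updated, one possible client-side workaround is a small translation shim that accepts Ollama-style /api/generate requests and forwards them to llama.cpp's /v1/completions. The following is only a minimal sketch (Python standard library, no streaming support): the shim port 11435, the num_predict-to-max_tokens mapping, and the response-field mapping are my assumptions, based on Ollama's documented request shape and the llama.cpp response shown under Output below.

#!/usr/bin/env python3
# Hypothetical shim: accepts Ollama-style POST /api/generate requests and
# forwards them to llama.cpp's /v1/completions endpoint. A minimal sketch
# under the assumptions stated above, not a tested implementation.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import Request, urlopen

LLAMA_CPP_URL = "http://localhost:11434"  # llama.cpp server, as in the test below

class GenerateShim(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/api/generate":
            self.send_error(404)
            return
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        # Translate the Ollama-style payload into an OpenAI-style completion request.
        payload = json.dumps({
            "model": body.get("model", ""),
            "prompt": body.get("prompt", ""),
            "max_tokens": body.get("options", {}).get("num_predict", 128),
        }).encode()
        req = Request(f"{LLAMA_CPP_URL}/v1/completions", data=payload,
                      headers={"Content-Type": "application/json"})
        with urlopen(req) as resp:
            upstream = json.loads(resp.read())
        # Pull the generated text from either an OpenAI-style "choices" list or
        # the top-level "content" field seen in the llama.cpp output below.
        text = upstream.get("choices", [{}])[0].get("text",
               upstream.get("content", ""))
        out = json.dumps({"model": body.get("model", ""),
                          "response": text, "done": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(out)))
        self.end_headers()
        self.wfile.write(out)

if __name__ == "__main__":
    # Listen on a separate port so the shim and llama.cpp can coexist.
    HTTPServer(("localhost", 11435), GenerateShim).serve_forever()

With the shim running, pointing Ollama-Logseq or Copilot for Obsidian at http://localhost:11435 instead of the llama.cpp port should, in principle, let their /api/generate calls reach llama.cpp unchanged; I have not verified this against either plugin.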
Test:

#!/bin/bash

# Base URL for llama.cpp server
BASE_URL="http://localhost:11434"

# Test /api/generate endpoint
echo "Testing /api/generate endpoint..."
curl -X POST "$BASE_URL/api/generate" \
-H "Content-Type: application/json" \
-d '{"model": "ggml-model-q8_0.gguf", "prompt": "Test prompt"}'

# Test /v1/completions endpoint
echo "Testing /v1/completions endpoint..."
curl -X POST "$BASE_URL/v1/completions" \
-H "Content-Type: application/json" \
-d '{"model": "ggml-model-q8_0.gguf", "prompt": "Test prompt", "max_tokens": 50}'

Output:

Testing /api/generate endpoint...
{"error":{"code":404,"message":"File Not Found","type":"not_found_error"}}Testing /v1/completions endpoint...
{"content":":\nWrite a letter to your friend describing your experience with a recent hike you went on.\nDear [Friend],\n\nI hope this letter finds you doing well. I wanted to share with you my recent experience on a hike that I went on last weekend.","id_slot":0,"stop":true,"model":"ggml-model-q8_0.gguf","tokens_predicted":50,"tokens_evaluated":3,"generation_settings":{"n_ctx":8192,"n_predict":-1,"model":"ggml-model-q8_0.gguf","seed":4294967295,"temperature":0.800000011920929,"dynatemp_range":0.0,"dynatemp_exponent":1.0,"top_k":40,"top_p":0.949999988079071,"min_p":0.05000000074505806,"tfs_z":1.0,"typical_p":1.0,"repeat_last_n":64,"repeat_penalty":1.0,"presence_penalty":0.0,"frequency_penalty":0.0,"penalty_prompt_tokens":[],"use_penalty_prompt_tokens":false,"mirostat":0,"mirostat_tau":5.0,"mirostat_eta":0.10000000149011612,"penalize_nl":false,"stop":[],"max_tokens":50,"n_keep":0,"n_discard":0,"ignore_eos":false,"stream":false,"logit_bias":[],"n_probs":0,"min_keep":0,"grammar":"","samplers":["top_k","tfs_z","typical_p","top_p","min_p","temperature"]},"prompt":"Test prompt","truncated":false,"stopped_eos":false,"stopped_word":false,"stopped_limit":true,"stopping_word":"","tokens_cached":52,"timings":{"prompt_n":3,"prompt_ms":245.916,"prompt_per_token_ms":81.972,"prompt_per_second":12.199287561606402,"predicted_n":50,"predicted_ms":6826.133,"predicted_per_token_ms":136.52266,"predicted_per_second":7.324791356980592}}

Thank you for your attention to this matter.
