[BUG]: Can't change the output token limit with novita ai #3046

Open
nidupb opened this issue Jan 29, 2025 · 1 comment

nidupb commented Jan 29, 2025

How are you running AnythingLLM?

Docker (local)

What happened?

When experimenting with DeepSeek, Novita seems to limit the output to 2048 tokens when it should go up to 8196.

Novita provides an example API implementation that uses this base limit:

from openai import OpenAI

# Novita exposes an OpenAI-compatible endpoint
client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

model = "deepseek/deepseek-r1"
stream = True  # or False
max_tokens = 2048
system_content = """Be a helpful assistant"""
temperature = 1
top_p = 1
min_p = 0
top_k = 50
presence_penalty = 0
frequency_penalty = 0
repetition_penalty = 1
response_format = {"type": "text"}

chat_completion_res = client.chat.completions.create(
    model=model,
    messages=[
        {
            "role": "system",
            "content": system_content,
        },
        {
            "role": "user",
            "content": "Hi there!",
        },
    ],
    stream=stream,
    max_tokens=max_tokens,
    temperature=temperature,
    top_p=top_p,
    presence_penalty=presence_penalty,
    frequency_penalty=frequency_penalty,
    response_format=response_format,
    # Non-standard sampling parameters are passed through extra_body
    extra_body={
        "top_k": top_k,
        "repetition_penalty": repetition_penalty,
        "min_p": min_p,
    },
)

if stream:
    for chunk in chat_completion_res:
        print(chunk.choices[0].delta.content or "", end="")
else:
    print(chat_completion_res.choices[0].message.content)

Are there known steps to reproduce?

Ask any complex question that is likely to produce a response longer than 2048 tokens.
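
To help narrow down whether the 2048 cap is enforced by the Novita API itself or by the request AnythingLLM sends, here is a minimal sketch (not part of Novita's sample) that calls the same endpoint directly with max_tokens raised to 8192 and inspects the usage data; the prompt and the 8192 value are only illustrative assumptions.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.novita.ai/v3/openai",
    api_key="<YOUR Novita AI API Key>",
)

# Illustrative request: raise max_tokens well above the suspected 2048 cap.
res = client.chat.completions.create(
    model="deepseek/deepseek-r1",
    messages=[{"role": "user", "content": "Explain TCP congestion control in depth."}],
    max_tokens=8192,
    stream=False,
)

# A completion_tokens count above 2048 (or a finish_reason other than "length")
# would suggest the cap comes from the client request, not from the API.
print(res.usage.completion_tokens)
print(res.choices[0].finish_reason)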

nidupb added the possible bug label on Jan 29, 2025
timothycarambat added the investigating label on Jan 29, 2025

Karasowl commented Feb 4, 2025

I wanted to report that I reproduced the issue using the standard AnythingLLM setup (not running in Docker) with the deepseek model via Novita AI. The response is still truncated.

[Screenshot of the truncated response attached]
