Truncated Response Issue ? #2762

Open
fredmo opened this issue Feb 26, 2025 · 3 comments
Labels: bug (Something isn't working), respond


fredmo commented Feb 26, 2025

Hello,

I am encountering an issue with the llama-3.1-405b model when using it through a Python script. Here is the script I am using:

import g4f, asyncio, sys
from g4f.client import Client
import subprocess

def gpt_free(QUERY):
    client = Client()
    response = client.chat.completions.create(
        model="llama-3.1-405b",
        messages=[{"role": "user", "content": QUERY}],
        web_search=False,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    arguments = sys.stdin.read().splitlines()
    QUERY = ' '.join(arguments)
    ANSWER = gpt_free(QUERY)
    print(ANSWER)

Execution Command:

my_script.py "my_question"

Issue:
The response returned by response.choices[0].message.content is truncated. The end of the reply is missing, particularly when I ask the model to generate a Python script.

Questions:
Is there a parameter to set a maximum size for the response to avoid truncation?
If the response is truncated, is there a way to access the rest of the response?
Are there known limitations with the llama-3.1-405b model regarding response length?

Additional Information:
I am requesting the model to generate Python scripts, which require relatively long responses.
I have tried adjusting the max_tokens parameter, but the issue persists.
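For reference, this is how I am passing it (assuming the g4f client forwards max_tokens to the provider like the other OpenAI-style options; 4096 is just the value I tried):

# Same call as in the script above, with an explicit max_tokens.
# Assumption: g4f forwards this option to the provider as-is; 4096 is an arbitrary test value.
response = client.chat.completions.create(
    model="llama-3.1-405b",
    messages=[{"role": "user", "content": QUERY}],
    web_search=False,
    max_tokens=4096,
)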

Thank you for your assistance.
Best regards,

fredmo added the bug (Something isn't working) label Feb 26, 2025
hlohaus (Collaborator) commented Feb 27, 2025

Several providers impose internal maximum token limits. The G4F platform also supports a max_tokens parameter; however, the only limit currently defined within G4F is the HuggingFace one, which is 4000 tokens in total, with a 2000-token limit each for the input and the generated text. Consequently, increasing the token limit within G4F is not possible. @fredmo
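If you want to control which provider answers (and therefore which limit applies), you can pin one explicitly. A rough sketch, assuming the provider named below exists in your installed version and serves this model; substitute any provider you prefer:

from g4f.client import Client
from g4f.Provider import Blackbox  # example only; pick a provider that serves llama-3.1-405b

# Pin the provider instead of letting g4f rotate through its defaults,
# so you know whose token limit applies to the reply.
client = Client(provider=Blackbox)
response = client.chat.completions.create(
    model="llama-3.1-405b",
    messages=[{"role": "user", "content": "Write a long Python script."}],
)
print(response.choices[0].message.content)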

fredmo (Author) commented Feb 27, 2025

I have the feeling that a month ago the responses were not truncated.
I switched to a large max_tokens without seeing any improvement. (Yes, a small max_tokens will truncate it more.)

Isn't there a timer to configure somewhere for how long to wait for the answer? Could it be a timeout that is too short?
Or is there another field in the "response" structure to check for more details?
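For example, is the finish reason the right field to look at? This is what I was planning to check (assuming the g4f response object mirrors the OpenAI schema, which I have not verified):

# Guess: if g4f mirrors the OpenAI schema, finish_reason should say whether the
# reply ended normally ("stop") or was cut off by a token limit ("length").
choice = response.choices[0]
print("finish_reason:", getattr(choice, "finish_reason", None))
print("content length:", len(choice.message.content or ""))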

hlohaus (Collaborator) commented Feb 27, 2025

To facilitate troubleshooting, please enable debug logging on your server by adding the --debug argument. Additionally, review the response to identify the responding provider. @fredmo
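If you are calling g4f from a script rather than through the API server, the equivalent is to turn on the debug flag in code (a sketch; check that the flag name matches your installed version):

import g4f

# Print which provider handles the request and any fallbacks that occur.
g4f.debug.logging = True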
