Hello,

I am encountering an issue with the llama-3.1-405b model when using it through a Python script. Here is the script I am using:

import g4f, asyncio, sys
from g4f.client import Client
import subprocess

def gpt_free(QUERY):
    client = Client()
    response = client.chat.completions.create(
        model="llama-3.1-405b",
        messages=[{"role": "user", "content": QUERY}],
        web_search=False,
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    arguments = sys.stdin.read().splitlines()
    QUERY = ' '.join(arguments)
    ANSWER = gpt_free(QUERY)
    print(ANSWER)

Execution Command:
my_script.py "my_question"

Issue:
The response returned by response.choices[0].message.content is truncated. The end of the reply is missing, particularly when I ask the model to generate a Python script.

Questions:
Is there a parameter to set a maximum size for the response to avoid truncation?
If the response is truncated, is there a way to access the rest of the response?
Are there known limitations with the llama-3.1-405b model regarding response length?

Additional Information:
I am requesting the model to generate Python scripts, which require relatively long responses.
I have tried adjusting the max_tokens parameter, but the issue persists.

Thank you for your assistance.
Best regards,
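A quick way to confirm on the client side whether the reply was actually cut short by the backend is to inspect the finish reason of the returned choice. This is a minimal sketch assuming the g4f client mirrors the OpenAI response schema; finish_reason is an assumption here and may be missing or None for some providers:

import sys
from g4f.client import Client

def gpt_free_checked(QUERY):
    client = Client()
    response = client.chat.completions.create(
        model="llama-3.1-405b",
        messages=[{"role": "user", "content": QUERY}],
        web_search=False,
    )
    choice = response.choices[0]
    # In the OpenAI schema, "length" means the backend hit its token limit;
    # not every g4f provider necessarily populates this attribute.
    reason = getattr(choice, "finish_reason", None)
    if reason not in (None, "stop"):
        print(f"warning: generation stopped early (finish_reason={reason})", file=sys.stderr)
    return choice.message.content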
Several providers impose internal maximum token limits. The G4F platform also supports a max_tokens parameter; however, only the HuggingFace limit is currently defined within G4F, which is set at 4000 tokens total, with a 2000-token limit for both input and generated text. Consequently, increasing the token limit within G4F is not possible. @fredmo
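Since the provider-side cap cannot be raised from G4F, one workaround is to feed the partial answer back into the conversation and ask the model to continue, then concatenate the pieces. The sketch below is a generic pattern, not a G4F feature; the stopping heuristic and the round limit are arbitrary choices:

from g4f.client import Client

def gpt_free_long(QUERY, max_rounds=3):
    client = Client()
    messages = [{"role": "user", "content": QUERY}]
    parts = []
    for _ in range(max_rounds):
        response = client.chat.completions.create(
            model="llama-3.1-405b",
            messages=messages,
            web_search=False,
        )
        part = response.choices[0].message.content
        parts.append(part)
        # Crude heuristic: assume the answer is complete if it ends like a sentence
        # or a closed code fence; use finish_reason instead if the provider reports it.
        if part.rstrip().endswith((".", "?", "!", "```")):
            break
        messages.append({"role": "assistant", "content": part})
        messages.append({"role": "user", "content": "Continue exactly where you stopped, without repeating anything."})
    return "".join(parts)

The stitched output may still need manual cleanup, since models often repeat the last sentence or reopen a code block when asked to continue.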
I have the feeling that a month ago the responses were not truncated.
I switched to a large max_tokens without seeing any improvement (yes, a small max_tokens will truncate more).
Isn't there a timer to configure somewhere to wait for the answer? Could it be a timeout that is too short?
Or is there another element in the "response" structure to check for more details?
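On that last point: even without knowing the exact response classes, plain Python introspection shows which fields the object actually carries, so the sketch below relies only on the standard library rather than on any undocumented g4f attribute:

from g4f.client import Client

client = Client()
response = client.chat.completions.create(
    model="llama-3.1-405b",
    messages=[{"role": "user", "content": "ping"}],
    web_search=False,
)
choice = response.choices[0]
# List the attributes of the response object and of the first choice.
print(type(response).__name__, vars(response) if hasattr(response, "__dict__") else dir(response))
print(type(choice).__name__, vars(choice) if hasattr(choice, "__dict__") else dir(choice))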
To facilitate troubleshooting, please enable debug logging on your server by adding the --debug argument. Additionally, review the response to identify the responding provider. @fredmo
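For a standalone script (rather than the API server started with --debug), the debug switch can also be flipped in process. The sketch below uses g4f's documented g4f.debug.logging flag; the response.provider attribute is an assumption, so it falls back to "unknown", and the debug log itself should name the provider that answered:

import g4f
from g4f.client import Client

g4f.debug.logging = True  # print provider selection and request details

client = Client()
response = client.chat.completions.create(
    model="llama-3.1-405b",
    messages=[{"role": "user", "content": "Write a short Python script."}],
    web_search=False,
)
print("provider:", getattr(response, "provider", "unknown"))  # assumed attribute
print(response.choices[0].message.content)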