Just looking into this, but wanted to report it in case it is a known issue or someone has more information.
While running guidellm against a locally running vllm serve, I am seeing a very large number of these log messages in the vLLM output:
WARNING 06-27 14:57:36 [protocol.py:58] The following fields were present in the request but ignored: {'max_completion_tokens'}
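For context, this warning appears to come from vLLM's request validation, which logs any fields that are present in the request body but not part of the endpoint's request schema. Below is a minimal, hypothetical sketch of that pattern; the model shape and field set are illustrative assumptions, not vLLM's actual protocol.py code:

```python
# Hypothetical sketch of how a server can warn about unrecognized request
# fields (the pattern behind warnings like the one above). The request model
# here is illustrative only, not vLLM's real schema.
from typing import Optional

from pydantic import BaseModel, ConfigDict


class CompletionRequest(BaseModel):
    # Allow unknown fields through so they can be collected and reported.
    model_config = ConfigDict(extra="allow")

    model: str
    prompt: str
    max_tokens: Optional[int] = None  # no max_completion_tokens declared


req = CompletionRequest.model_validate({
    "model": "llama3.1-8b-instruct",
    "prompt": "hi",
    "max_completion_tokens": 35,  # not a declared field on this model
})

# model_extra holds every field that was not declared on the schema.
ignored = set(req.model_extra or {})
if ignored:
    print(f"The following fields were present in the request "
          f"but ignored: {ignored}")
```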
Running a request manually against the endpoint succeeds, with no such warnings in the vLLM logs:
curl -k -s -H "Content-Type: application/json" http://localhost:8000/v1/chat/completions -d '{"model":"llama3.1-8b-instruct","messages":[{"role":"user","content":"What is an AI tensorized weight?"}],"max_completion_tokens":35}' | jq .
{
  "id": "chatcmpl-e00ce83f-121f-482a-b43c-5c4494ad29ae",
  "object": "chat.completion",
  "created": 1751037186,
  "model": "llama3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "A tensorized weight, also known as a tensor weight or a weight tensor, is a type of weight used in artificial neural networks (ANNs) and deep learning models.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "total_tokens": 78,
    "completion_tokens": 35,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}
This leads me to believe that the request body formed by guidellm must be placing max_completion_tokens
somewhere other than as a top-level property of the request.
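One way to check this hypothesis would be to point guidellm's target at a trivial local server that prints each request body verbatim. The sketch below is a debugging aid under stated assumptions (the port and the canned empty response are mine); it is only good for inspecting the first payloads guidellm sends, since the fake responses will not satisfy a real benchmark run:

```python
# Minimal request-echo server: run this, point guidellm's target at
# http://localhost:9000, and inspect exactly where max_completion_tokens
# lands in the outgoing JSON. Port and canned response are assumptions.
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = json.loads(self.rfile.read(length) or b"{}")
        # Print the path and the raw payload for inspection.
        print(self.path, json.dumps(body, indent=2))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(b'{"choices": []}')


HTTPServer(("localhost", 9000), EchoHandler).serve_forever()
```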