
Possible invalid request formatting for max_completion_tokens #208

Closed
@stmcginnis

Description

Just looking into this, but I wanted to report it in case it is a known issue or someone has more information.

While running guidellm against a locally running vllm serve instance, I am seeing a very large number of these log messages in the vLLM output:

WARNING 06-27 14:57:36 [protocol.py:58] The following fields were present in the request but ignored: {'max_completion_tokens'}

Running a request manually against the endpoint succeeds, with no warnings in the vLLM logs:

curl -k -s -H "Content-Type: application/json" http://localhost:8000/v1/chat/completions -d '{"model":"llama3.1-8b-instruct","messages":[{"role":"user","content":"What is an AI tensorized weight?"}],"max_completion_tokens":35}' | jq .
{
  "id": "chatcmpl-e00ce83f-121f-482a-b43c-5c4494ad29ae",
  "object": "chat.completion",
  "created": 1751037186,
  "model": "llama3.1-8b-instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "A tensorized weight, also known as a tensor weight or a weight tensor, is a type of weight used in artificial neural networks (ANNs) and deep learning models.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "length",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 43,
    "total_tokens": 78,
    "completion_tokens": 35,
    "prompt_tokens_details": null
  },
  "prompt_logprobs": null
}

This leads me to believe the request body being formed by guidellm must be placing max_completion_tokens somewhere other than as a top-level property of the request payload.
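
For illustration, here is a minimal Python sketch of the two payload shapes in question, assuming the standard OpenAI-compatible /v1/chat/completions endpoint. The nested variant is hypothetical, just to show the kind of misplacement that could trigger the warning; I have not confirmed this is what guidellm actually sends:

import requests

URL = "http://localhost:8000/v1/chat/completions"

# Accepted shape: max_completion_tokens as a top-level property of the
# request body, matching the curl request above that produces no warnings.
valid_payload = {
    "model": "llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "What is an AI tensorized weight?"}],
    "max_completion_tokens": 35,
}

# Hypothetical misplaced shape: if a client nested the field inside some
# other object instead of at the top level, vLLM's request parsing would
# not pick it up as a recognized parameter.
misplaced_payload = {
    "model": "llama3.1-8b-instruct",
    "messages": [{"role": "user", "content": "What is an AI tensorized weight?"}],
    "sampling_params": {"max_completion_tokens": 35},  # hypothetical nesting
}

resp = requests.post(URL, json=valid_payload, timeout=60)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])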
