Feature request
Hi,
The /v1/generate endpoint returns a request_id as part of the JSON response. I assume that when finished is set to false, I can somehow use this request ID to query for the rest of the output later. However, the OpenAPI documentation served at http://127.0.0.1:3000/ does not appear to specify which endpoint to use for that. Or am I mistaken, and is this not possible?
I am using the latest ghcr.io/bentoml/openllm Docker image, started like this:

docker run --rm -it -p 3000:3000 --platform linux/x86_64 ghcr.io/bentoml/openllm start facebook/opt-1.3b --backend pt
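For reference, here is a minimal sketch of the call in question against that server. The exact request body schema is my assumption (the real schema is in the server's OpenAPI docs); the fields I am asking about are request_id and finished in the JSON response:

# Sketch only: the JSON body below is an assumed minimal payload,
# not the documented schema for /v1/generate.
curl -s -X POST http://127.0.0.1:3000/v1/generate \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "Hello"}'

When finished comes back as false, it is unclear which endpoint, if any, accepts the returned request_id to fetch the remaining output.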
Kind regards,
Alexander
Motivation
This would let me find out how the request_id can be used to follow up on incomplete generations, if that is possible at all.
Other
No response