API call from deployment to deployment hangs forever #424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @Clement-Lelievre 👋 Thanks for the detailed report! This behavior is likely due to network restrictions or internal timeouts in the deployed environment, especially when one deployment tries to poll another repeatedly. Here are a few things to try:
Then handle the result via webhook, or periodically check the final status externally.
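The "periodically check the final status" option can be given a hard deadline so the caller never waits indefinitely. Below is a minimal sketch, assuming a hypothetical `fetch_status` callable that returns the prediction's current status string (for example, by wrapping a GET on the prediction URL). The function name, default timeout, and injectable `sleep`/`clock` parameters are illustrative and not part of the replicate client.

```python
import time
from typing import Callable

# Statuses after which a prediction no longer changes.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def wait_with_deadline(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    interval_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll fetch_status() until it is terminal or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() >= deadline:
            # Give up instead of polling forever.
            raise TimeoutError(f"prediction still '{status}' after {timeout_s}s")
        sleep(interval_s)
```

Raising `TimeoutError` lets the calling deployment fail fast and surface the problem, rather than hanging until the platform kills it.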
Let me know if it helps. Happy to dig deeper if needed!
Thanks. As it was urgent, I changed the approach to avoid this issue; I'll come back to this when I get the time and let you know.
Hi,
I'm having an issue that I don't get locally. It happens in the following scenario:
Deployments
cog==0.13.7, replicate==1.0.4, the cog CLI 0.14.3, python 3.11, ubuntu==22.04
Here's how I call one deployment from the other:
The called deployment does complete the inference, and I can see its status as succeeded on Replicate. In the logs of the calling deployment, I can see about 30 GET requests, all looking like:
INFO:httpx:HTTP Request: GET https://api.replicate.com/v1/predictions/7atmc23wmsrga0cp7ag9y5s6pm "HTTP/1.1 200 OK"
I have investigated the replicate Python client source code, and I can see that the prediction.wait() method calls the .reload() method, which itself performs the GET requests. I've tried increasing the env var REPLICATE_POLL_INTERVAL, but to no effect.
Strange thing is, as said above, locally it works! That is, with cog predict -i ..., inference goes through, but at the end, after my inference completes, I get this error log:
{"logger": "cog.server.worker", "timestamp": "2025-04-15T19:11:52.878929Z", "exception": "Traceback (most recent call last):\n File \"/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py\", line 299, in _consume_events\n self._consume_events_inner()\n File \"/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py\", line 337, in _consume_events_inner\n ev = self._events.recv()\n ^^^^^^^^^^^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.10/lib/python3.11/multiprocessing/connection.py\", line 251, in recv\n return _ForkingPickler.loads(buf.getbuffer())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTypeError: URLPath.__init__() missing 3 required keyword-only arguments: 'source', 'filename', and 'fileobj'", "severity": "ERROR", "message": "unhandled error in _consume_events"}
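The traceback shows the failure inside `_ForkingPickler.loads`, that is, while an event received from the worker is being unpickled. Below is a minimal sketch of how this class of error can arise, using hypothetical stand-in classes (`PathLike`, `UrlHandle`) rather than cog's actual `URLPath`. It assumes the object is rebuilt by calling its class with positional arguments only, which fails when `__init__` requires keyword-only arguments. The `rebuild` helper mimics that reconstruction step directly instead of going through `pickle`.

```python
class PathLike:
    """Stand-in base class that pickles as (class, positional args)."""

    def __reduce__(self):
        # Same shape pickle stores for reconstruction: a callable plus
        # positional arguments. No keyword arguments can be carried here.
        return (self.__class__, ())


class UrlHandle(PathLike):
    """Hypothetical subclass whose constructor needs keyword-only arguments."""

    def __init__(self, *, source, filename):
        self.source = source
        self.filename = filename


def rebuild(obj):
    """Mimic how unpickling applies a __reduce__ result: callable(*args)."""
    factory, args = obj.__reduce__()
    return factory(*args)


def describe_failure(obj):
    """Return the TypeError message raised on rebuild, or None on success."""
    try:
        rebuild(obj)
    except TypeError as exc:
        return str(exc)
    return None
```

Calling `describe_failure(UrlHandle(source="https://example.com/a.png", filename="a.png"))` returns a "missing ... required keyword-only arguments" message of the same kind as the one in the log. Whether this is the exact mechanism in cog is an assumption; the error also appears only after local inference completes, so it may be unrelated to the hang in the deployed environment.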
So far I'm clueless as to why everything suddenly hangs, which makes my whole project unusable. I guess it's due to the deployed environment.
@zeke @erbridge @meatballhat @aron @mattt
thanks for your help