API call from deployment to deployment hangs forever #424

Clement-Lelievre · 2025-04-15T18:59:15Z

Hi,

I'm hitting an issue that I can't reproduce locally. It happens in the following scenario:

I have two cog models deployed on Replicate (as Deployments).
One of them at some point calls the other (see snippet below).
They were built and deployed with cog==0.13.7, replicate==1.0.4, the cog CLI 0.14.3, Python 3.11, and Ubuntu 22.04.

Here's how I call one deployment from the other:

import replicate
from replicate.helpers import base64_encode_file

vectorizer_deployment = replicate.deployments.get(VECTORIZER_DEPLOYMENT)

# Encode the image so it can be sent as part of the JSON input
with open(img_path, "rb") as f:
    b64 = base64_encode_file(f)

prediction = vectorizer_deployment.predictions.create(
    input={"images": [b64]},
)
logger.debug(f"{prediction.id=}")
prediction.wait()  # this line hangs forever after 30-ish GET requests

The called deployment does complete the inference, and I can see the status as succeeded on Replicate.
In the logs of the calling deployment, I can see about 30 GET requests, all looking like: INFO:httpx:HTTP Request: GET https://api.replicate.com/v1/predictions/7atmc23wmsrga0cp7ag9y5s6pm "HTTP/1.1 200 OK"

I looked into the replicate Python client source code: the prediction.wait() method calls the .reload() method, which itself performs the GET requests.
I tried increasing the REPLICATE_POLL_INTERVAL environment variable, but it had no effect.
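
One way to narrow this down is to temporarily swap prediction.wait() for an explicit loop that records the status seen after each reload(). The sketch below assumes only what is described above, namely that the prediction object exposes a status attribute and a reload() method. The helper name poll_and_log, the max_polls cap, the set of terminal statuses, and the StubPrediction class are all hypothetical and exist so the example runs without network access.

```python
import time

# Assumed terminal states; adjust if the API reports others.
TERMINAL_STATUSES = {"succeeded", "failed", "canceled"}


def poll_and_log(prediction, max_polls=60, interval_s=0.5, log=print):
    """Reload `prediction` at most `max_polls` times and return every status seen."""
    seen = [prediction.status]
    for _ in range(max_polls):
        if prediction.status in TERMINAL_STATUSES:
            break
        time.sleep(interval_s)
        prediction.reload()  # on a real prediction this issues one GET request
        seen.append(prediction.status)
        log(f"status after reload: {prediction.status}")
    return seen


class StubPrediction:
    """Hypothetical stand-in for a real prediction, for offline testing only."""

    def __init__(self, statuses):
        self._statuses = iter(statuses)
        self.status = next(self._statuses)

    def reload(self):
        # Advance to the next scripted status, or stay on the last one.
        self.status = next(self._statuses, self.status)


if __name__ == "__main__":
    stub = StubPrediction(["starting", "processing", "succeeded"])
    print(poll_and_log(stub, interval_s=0))
```

If the returned list never contains a terminal status even though the dashboard shows succeeded, the client in the deployed environment is likely not seeing the updated state; if it does contain one, the hang is probably somewhere other than the status check.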

The strange thing is that, as mentioned above, it works locally, i.e.:

When I call the main endpoint locally in Python (predictor.predict(...)), everything works well.
When I run locally with cog predict -i ..., inference goes through, but after it completes I get this error log:
{"logger": "cog.server.worker", "timestamp": "2025-04-15T19:11:52.878929Z", "exception": "Traceback (most recent call last):\n File \"/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py\", line 299, in _consume_events\n self._consume_events_inner()\n File \"/root/.pyenv/versions/3.11.10/lib/python3.11/site-packages/cog/server/worker.py\", line 337, in _consume_events_inner\n ev = self._events.recv()\n ^^^^^^^^^^^^^^^^^^^\n File \"/root/.pyenv/versions/3.11.10/lib/python3.11/multiprocessing/connection.py\", line 251, in recv\n return _ForkingPickler.loads(buf.getbuffer())\n ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\nTypeError: URLPath.__init__() missing 3 required keyword-only arguments: 'source', 'filename', and 'fileobj'", "severity": "ERROR", "message": "unhandled error in _consume_events"}

So far I have no idea why everything suddenly hangs, which makes my whole project unusable. I guess it's due to the deployed environment.

@zeke @erbridge @meatballhat @aron @mattt

thanks for your help

Ivan-developer0 · 2025-05-15T15:32:35Z

Hi @Clement-Lelievre 👋

Thanks for the detailed report!

This behavior is likely due to network restrictions or internal timeouts in the deployed environment, especially when one deployment tries to poll another repeatedly.

Here are a few things to try:

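Pass a webhook URL and webhook_events_filter=["completed"] when creating the prediction to avoid active polling (the URL below is a placeholder):
prediction = vectorizer_deployment.predictions.create(input={"images": [b64]}, webhook="https://example.com/your-webhook", webhook_events_filter=["completed"])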

Then handle the result via webhook, or periodically check the final status externally.

Avoid synchronous .wait() in production deployments — it’s better suited for local or CLI environments. Instead, check the prediction status in a non-blocking way or poll with delays and max retries.
Double-check if both deployments are in the same region and using compatible versions of replicate and cog.
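
The "poll with delays and max retries" suggestion could look roughly like the sketch below. It is not part of the replicate client: wait_with_deadline, PredictionTimeout, and the default timings are hypothetical names and values chosen for illustration. It assumes only that the prediction exposes status and reload(), and that "succeeded", "failed", and "canceled" are the terminal states.

```python
import time


class PredictionTimeout(Exception):
    """Raised when a prediction does not reach a terminal state in time."""


def wait_with_deadline(prediction, timeout_s=300.0, interval_s=1.0,
                       max_interval_s=10.0, sleep=time.sleep,
                       clock=time.monotonic):
    """Poll `prediction` with a growing delay until it finishes or times out."""
    deadline = clock() + timeout_s
    delay = interval_s
    while prediction.status not in ("succeeded", "failed", "canceled"):
        if clock() >= deadline:
            raise PredictionTimeout(
                f"still {prediction.status!r} after {timeout_s} seconds"
            )
        sleep(delay)
        prediction.reload()  # fetch the latest state
        # Double the delay after each poll, capped at max_interval_s.
        delay = min(delay * 2, max_interval_s)
    return prediction
```

In the calling deployment this would replace prediction.wait(), for example wait_with_deadline(prediction, timeout_s=120). It does not explain the hang, but it turns an indefinite block into an exception that can be logged and handled.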

Let me know if it helps — happy to dig deeper if needed!
Thanks again 🙌

Clement-Lelievre · 2025-05-15T19:09:10Z

Thanks. As it was urgent, I changed my approach to avoid this issue. I'll come back to this when I get the time and let you know.

Clement-Lelievre changed the title ~~API call hangs forever, extremely annoying~~ API call from deployment to deployment hangs forever Apr 16, 2025

Clement-Lelievre mentioned this issue Apr 17, 2025

Don't force pydantic to downgrade to <2 replicate/cog#2253

Closed
