Pre-requisites

I have tested with the :latest image tag (i.e. quay.io/argoproj/workflow-controller:latest) and can confirm the issue still exists on :latest. If not, I have explained why, in detail, in my description below.
I have searched existing issues and could not find a match for this bug.

What happened? What did you expect to happen?

I am running 10,000s of workflows a day, and all have been fine except one that I noticed was still running for more than 24 hours. The only step within it was still 'Running', but when I looked at the pod it had completed (all 3 containers inside had completed as well). I saw this in the workflow controller logs:
{"time":"2025-01-18T23:35:00.111391931Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.974Z\" level=info msg=\"Outbound nodes of mywf-2812951024 is [mywf-468570289]\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111400081Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.974Z\" level=info msg=\"node mywf-2812951024 phase Running -> Succeeded\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111405241Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.974Z\" level=info msg=\"node mywf-2812951024 finished: 2025-01-18 23:34:39.974935407 +0000 UTC\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111411672Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.975Z\" level=info msg=\"node mywf-835224713 phase Running -> Succeeded\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111417542Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.975Z\" level=info msg=\"node mywf-835224713 message: retryStrategy.expression evaluated to false\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111421852Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.975Z\" level=info msg=\"node mywf-835224713 finished: 2025-01-18 23:34:39.975516899 +0000 UTC\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111427012Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.975Z\" level=info msg=\"Updated phase Running -> Succeeded\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111431822Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.975Z\" level=info msg=\"Marking workflow completed\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111437272Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:39.975Z\" level=info msg=\"Marking workflow as pending archiving\" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:00.111453102Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:47.911Z\" level=warning msg=\"Waited for 7.93542039s, request: Update:https://clusteripredact:443/apis/argoproj.io/v1alpha1/namespaces/myns/workflows/mywf\""
{"time":"2025-01-18T23:35:00.111459222Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:34:47.911Z\" level=warning msg=\"Error updating workflow: rpc error: code = Unavailable desc = error reading from server: read tcp ip1redact:52824->ip2redact:2379: read: connection timed out \" namespace=myns workflow=mywf"
{"time":"2025-01-18T23:35:43.810561717Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:35:43.806Z\" level=info msg=\"Workflow processing has been postponed due to max parallelism limit\" key=myns/mywf"
{"time":"2025-01-18T23:36:28.781077898Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:35:55.733Z\" level=info msg=\"Workflow processing has been postponed due to max parallelism limit\" key=myns/mywf"
{"time":"2025-01-18T23:43:00.05557023Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:41:15.809Z\" level=info msg=\"Workflow processing has been postponed due to max parallelism limit\" key=myns/mywf"
{"time":"2025-01-18T23:50:00.102084482Z","stream":"stdout","_p":"F","log":"time=\"2025-01-18T23:49:57.811Z\" level=info msg=\"Workflow processing has been postponed due to max parallelism limit\" key=myns/mywf"
Relevant code: argo-workflows/workflow/controller/operator.go, line 763 in 89d75a6.
Even manually deleting the pod did not help the workflow succeed; it stays in Running.
I have seen a couple of other 'Unavailable' errors around the same time; however, all of those workflows appear to have completed gracefully:
2nd case (note that it's trying Delete, not Update)
3rd case (seems to be for creating pods)
4th case (seems to be for DeleteCollection on workflowtaskresults)
Even activeDeadlineSeconds is being ignored!
I am also seeing "Non-transient error: etcdserver: request timed out".
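For context only, here is a minimal sketch of what retrying these transient API errors could look like using client-go's retry helpers. This is an assumption about a possible approach, not the workflow-controller's actual logic; isTransient and PersistWithRetry are hypothetical names, and whether the etcd-backed "Unavailable" errors above would be classified as transient by these checks is also an assumption:

```go
// Hypothetical sketch, not the workflow-controller's implementation: wrap the
// status-update call so that transient API-server errors (like the Unavailable /
// timeout errors shown in the logs) are retried with backoff instead of being dropped.
package retrysketch

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
	"k8s.io/client-go/util/retry"
)

// isTransient guesses whether an update failure looks retriable (timeouts,
// throttling, temporary unavailability) rather than a validation/conflict error.
func isTransient(err error) bool {
	return apierrors.IsServerTimeout(err) ||
		apierrors.IsTimeout(err) ||
		apierrors.IsTooManyRequests(err) ||
		apierrors.IsServiceUnavailable(err) ||
		apierrors.IsInternalError(err)
}

// PersistWithRetry retries updateWorkflowStatus (a placeholder for whatever call
// persists the Workflow's status) on transient failures with exponential backoff.
func PersistWithRetry(updateWorkflowStatus func() error) error {
	return retry.OnError(retry.DefaultBackoff, isTransient, updateWorkflowStatus)
}
```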
Version(s)
3.4.11
Paste a minimal workflow that reproduces the issue. We must be able to run the workflow; don't enter a workflow that uses private images.
n/a
Logs from the workflow controller
(see the controller log excerpt in the description above)
Logs from in your workflow's wait container