Get resource is stuck #9398
Replies: 9 comments 11 replies
Moved this over to a discussion and then we can make an issue if we figure out concrete steps to reproduce. I think you've provided enough info here. To summarize what I'm seeing so far:
Some assumptions I'll make when trying to reproduce this:
@Kump3r This sounds similar to the issue you were DM'ing me about, doesn't it?
Well, this sounds similar to #8639 as far as I can tell.
From what I was able to deduce purely from the code, the time resource has a problem with ttrpc. As I understand it, the check of the time resource emits a single request, and there is a case where it leaks or is missed, leaving ttrpc stuck waiting for the event, emitting logs like: and eventually:

I'm not sure whether the idea here would be to adjust the Run/Wait of our containerd implementation to something similar to #8639, or to make the time resource more robust so that the version is somehow guaranteed to be "received". I'm still trying to find a foolproof way to reproduce this, but our clusters that have a lot of "check-every" time resources seem to hit it from time to time.
I think I was able to confirm a leak at least, by:
Sadly there is no metric for |
I have also seen a stuck task. Here is the information I was able to pull from our side:

Environment

Symptoms
Pipeline step hangs until timeouts are reached. Intermittent, usually right after container start.

Process state:
The container task PID corresponds to gdn-init:
Process tree:
Observations:

Notably NOT the cause
Explicitly ruled out:

Trigger characteristics
Logs during the reproduction, compared with an earlier check of the same resource:
This is potentially more reproducible when you have a lot of resource versions; in our case there are around 20,000 versions of that resource.
Converting to issue #9511