I tried to ask this in a generic way, but let me give a specific use case. I keep a bunch of images in Harbor that I use for tasks, and I have 6 workers. Some of those images are relatively large (~400 MB from Harbor, closer to 1 GB on disk). I only maintain one active version of each image at a time. When an image updates, I'd love for it to be cached on every worker relatively quickly, so that streaming between workers wouldn't have to happen.
Assumptions that may be wrong:
When ATC is preparing to stream a resource from worker A to worker B, it ensures that worker B doesn't already have version X of the resource, since streaming it if it already existed on B would be unnecessary.
Streamed resources are cached.
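To make the two assumptions above concrete, here is a minimal Python model of the behaviour I'm assuming. It is not Concourse's actual code: `Worker`, `ensure_on_worker`, and the cache layout are hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Worker:
    """Hypothetical stand-in for a Concourse worker and its local resource cache."""
    name: str
    cache: set = field(default_factory=set)  # (resource, version) pairs held locally


def ensure_on_worker(resource: str, version: str, source: Worker, dest: Worker) -> bool:
    """Make (resource, version) available on dest; return True only if a stream occurred."""
    key = (resource, version)
    if key in dest.cache:
        # Assumption 1: the destination already has this version, so skip streaming.
        return False
    if key not in source.cache:
        raise LookupError(f"{source.name} does not hold {resource}@{version}")
    # Assumption 2: a streamed resource is cached on the destination for later use.
    dest.cache.add(key)
    return True
```

If both assumptions hold, the first task to land on a worker pays the streaming cost once, and later tasks on that worker reuse the cached copy until the version changes.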
I was chatting with @taylorsilva about it and I had some ideas, plus some I didn't mention:
1. A gossip-style "protocol" between workers (directly if p2p sharing is enabled, or through ATC) to ensure every worker has all the resources the others have. This feels bulky and potentially insecure.
2. A DaemonSet-style pipeline, to borrow the k8s concept, that would run on every worker that matches the pipeline's config (i.e. team and tags).
3. Maybe add a `cache-globally: bool` arg to a resource definition, which would allow every worker that could collect that resource (i.e. has the correct tag / team combo) to run the `check` container and then `get` when a new version exists -- this has the drawback that the cache either has to fetch the "raw" version of the resource, to allow `get` params to vary, or has to parse pipelines for all `get`s of a resource and collect its variants.
4. This is more specific to the container issue, but gets at the heart of my problem: ensure that `image_resource` fetches are a) cached like normal resources and b) executed on the same worker as the task that references them.
My gut instinct is that option 4 is the most Concourse-y while also maintaining the most control for the end user, but it's also the least generic.
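For the `cache-globally` idea, the part that decides which workers would pre-fetch a new version could look roughly like the sketch below. The matching rule here (a worker is either shared or belongs to the team, and it carries every tag the resource asks for) is my simplification, not Concourse's real placement logic, and `WorkerInfo` and `workers_to_prewarm` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkerInfo:
    """Hypothetical description of a worker as the scheduler might see it."""
    name: str
    team: Optional[str]            # None models a worker shared by all teams
    tags: frozenset = frozenset()  # tags the worker was registered with


def workers_to_prewarm(workers, team, resource_tags):
    """Return the names of workers that could run a get for the resource."""
    wanted = frozenset(resource_tags)
    return [
        w.name
        for w in workers
        # Simplified rule: team must match (or worker is shared) and
        # the worker must carry every tag the resource requires.
        if w.team in (None, team) and wanted <= w.tags
    ]
```

With my 6 workers, that would mean each new image version triggers at most 6 fetches, one per eligible worker, rather than streaming on demand when a task lands somewhere cold.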