kind should GC unused images #735

BenTheElder · 2019-07-24T03:28:40Z

TL;DR:

we need to (and currently do) turn off disk eviction, because we cannot guarantee that any particular portion of the host disk is free; at the very least we need a very high eviction threshold
the independent image GC options are deprecated in favor of disk eviction thresholds, which also trigger image GC in kubelet
we need to not GC images used for Kubernetes components, as they may not be pullable again (e.g. side-loaded, locally built images that were never pushed)
we do, however, want to GC old versions of images the user side-loaded, as well as side-loaded images that are not in use.

Possible options:

add an option to kubelet upstream to blacklist a set of images from GC, and convince sig-node to un-deprecate the separate imageGC flags
block the deletion in containerd / at the CRI layer
use our own GC that only GCs unused images and isn't tied to disk eviction
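To make the third option concrete, here is a minimal Python sketch of a GC that ignores disk usage entirely: it only deletes images that no container is using and that are not on a protected list of Kubernetes component images. `Image`, `gc_candidates`, and the `protected_refs` set are hypothetical names, not kind or containerd APIs; how kind would actually list images and in-use image IDs (for example through the CRI) is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Image:
    ref: str       # tag, e.g. "myapp:v1" (hypothetical model)
    image_id: str  # content ID; several tags may share one ID


def gc_candidates(images, in_use_ids, protected_refs):
    """Return refs that are safe to delete under a simple policy:
    no running container uses the image and the ref is not protected.
    No disk-usage threshold is consulted at all."""
    return sorted(
        img.ref
        for img in images
        if img.image_id not in in_use_ids and img.ref not in protected_refs
    )
```

This alone doesn't answer the timing question of a side-loaded image that no pod has started yet: it counts as "unused" and would be selected, so a real policy would likely need a grace period as well.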

I spoke to @Random-Liu about this offline; the "don't GC these images" option might make sense upstream, but the deprecation of the image GC flags suggests the last option might be best.

https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#deprecation-of-existing-feature-flags-to-reclaim-disk

/priority important-longterm


BenTheElder · 2019-08-16T22:44:03Z

This one is tricky: how do we actually know when an image should be removed? If I have long-running tests and I side-load the images, how do I ensure they don't get GCed midway through? How is this distinguished from repeatedly loading an app?

I'm thinking we could maybe specifically GC image layers that are not referenced by a current tag, but that may not be sufficient, especially for workflows that generate unique automatic tags for every build.
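A minimal sketch of the layer-level idea, assuming we can map each tag to the layer digests of the manifest it currently points at. `unreferenced_layers` is a hypothetical helper, not a containerd API. Retagging `myapp:dev` onto a new build frees the old app layer, but with a unique tag per build every old layer stays referenced and nothing is freed, which is exactly the caveat above.

```python
def unreferenced_layers(all_layers, tags):
    """all_layers: every layer digest present on the node.
    tags: maps a tag (e.g. "myapp:dev") to the layer digests of the
    manifest that tag currently points at.
    Returns the layers no current tag references (GC candidates)."""
    referenced = set()
    for layers in tags.values():
        referenced.update(layers)
    return set(all_layers) - referenced
```

For example, `unreferenced_layers({"base", "app1", "app2"}, {"myapp:dev": ["base", "app2"]})` yields `{"app1"}`, while tagging each build separately (`myapp:build1`, `myapp:build2`) yields an empty set.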

aojea · 2019-08-19T15:37:09Z

Isn't this a problem for Kubernetes in general, or is it specific to kind?

BenTheElder · 2019-08-19T16:48:53Z

Kubernetes has imageGC but it's tied to how much disk is left and not particularly sophisticated (basically evict pods when disk is low and evict images not used by pods). On a non-noisy non-shared host the cluster administrator sets a threshold and the disk is dedicated to Kubernetes.

With kind, the disk is just the user's host disk and isn't dedicated. Users boot with low disk all the time with kind because we turned off evicting pods ...
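For comparison, a simplified Python model of the kubelet behaviour described above (this is not kubelet's code): nothing happens until disk usage reaches a high threshold, then unused, unpinned images are removed least-recently-used first until usage would fall to a low threshold. The 85/80 defaults mirror kubelet's `imageGCHighThresholdPercent` / `imageGCLowThresholdPercent` defaults; `CachedImage` and `threshold_gc` are hypothetical names, and details such as kubelet's minimum image age are omitted.

```python
from dataclasses import dataclass


@dataclass
class CachedImage:
    ref: str
    size_bytes: int
    last_used: float      # epoch seconds the image was last used by a container
    in_use: bool
    pinned: bool = False  # stands in for images marked as pinned


def threshold_gc(images, capacity, available, high_pct=85, low_pct=80):
    """Return refs to delete. capacity/available are bytes for the whole
    filesystem, not just the space used by images."""
    usage_pct = 100 - available * 100 // capacity
    if usage_pct < high_pct:
        return []  # below the high threshold: no image GC at all
    # Bytes that must be freed to bring usage down to low_pct.
    to_free = capacity * (100 - low_pct) // 100 - available
    victims = []
    for img in sorted(images, key=lambda i: i.last_used):  # LRU first
        if to_free <= 0:
            break
        if img.in_use or img.pinned:
            continue
        victims.append(img.ref)
        to_free -= img.size_bytes
    return victims
```

The inputs are what make this awkward for kind: `capacity` and `available` describe the whole host filesystem, so unrelated files on a developer machine can push usage over the threshold (and then kind's unused images get deleted), while a mostly empty disk never triggers GC no matter how many stale images pile up.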

BenTheElder · 2019-08-19T16:50:35Z

This problem is specific to long lived kind clusters that load / pull lots of images.

aojea · 2019-08-19T17:47:51Z

Then it seems to be the same problem that docker has :)

docker system prune
WARNING! This will remove:
        - all stopped containers
        - all networks not used by at least one container
        - all dangling images
        - all dangling build cache
Are you sure you want to continue? [y/N]

BenTheElder · 2019-08-19T17:57:57Z

A subset of docker prune is basically what kubelet does. The difference is the user manually triggers docker prune from the host. With Kubernetes you expect kubelet to handle this, but for kind we have it disabled.


WalkerGriggs · 2019-08-20T16:13:13Z

With the image GC flag deprecated, the last option feels like the most reasonable one. Thoughts on either of these solutions?

prune can filter images by age. Instead of a fixed time frame, could we prune images that have been unused for a time proportional to the cluster's lifetime (above a certain minimum, of course)?
Maybe this could be a configurable feature that leaves the user responsible for their own host resources?
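One way to read the "time proportional to the cluster's lifetime" suggestion, as a hypothetical sketch: the unused-age cutoff is a fraction of the cluster's age, with a floor so that young clusters don't prune aggressively. `stale_images`, `fraction`, and `min_age` (and their default values) are made up for illustration. It also assumes per-image last-used timestamps are available; `docker image prune --filter until=...` doesn't provide that, since its `until` filter is based on image creation time.

```python
def stale_images(last_used, now, cluster_created, fraction=0.25, min_age=3600.0):
    """last_used maps image ref -> last time (epoch seconds) a container used it.
    An image is stale if it has been unused for longer than a cutoff that
    grows with the cluster's age but never drops below min_age seconds."""
    cluster_age = now - cluster_created
    cutoff = max(min_age, fraction * cluster_age)
    return sorted(ref for ref, t in last_used.items() if now - t > cutoff)
```

With a cluster that is 40000 s old, the cutoff is 10000 s, so an image unused for 20000 s is pruned while one unused for 5000 s is kept; on an 8000 s old cluster the 3600 s floor applies instead.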

BenTheElder · 2020-01-29T00:02:13Z

kubernetes/enhancements#1007: I think we want this. Whitelist the system images somehow and then leave GC enabled.

BenTheElder · 2020-03-15T21:01:33Z

That KEP seems to be abandoned, but the concept seems to be generally accepted. I will try to file an updated version sometime.

BenTheElder · 2020-04-28T06:54:24Z

reviving the KEP kubernetes/enhancements#1717

vaibhav2107 · 2022-03-09T10:24:56Z

@BenTheElder, the image GC flags were supposed to be deprecated in favour of disk eviction; why haven't they been deprecated yet?

BenTheElder · 2022-03-10T07:34:11Z

I couldn't say, you'd have to check in with the owners in SIG Node.

BenTheElder · 2022-08-08T17:31:12Z

This one remains relevant, see also considerations outlined in #2865 (comment)

gvkhna · 2022-08-08T17:35:30Z

This one remains relevant, see also considerations outlined in #2865 (comment)

@BenTheElder Thanks Ben, I believe I came across this issue while searching for possible fixes, but it didn't have the solution.

On your point mentioned in my issue: I understand kind wasn't meant for long-lived deployments, but that's really a shame. It's been rock solid for some months now and I hope it continues to be. I looked into k3s and k8s, but as I'm running on Unraid OS, kind has been fantastic for my use case, pretty much exactly what I was looking for.

Hope the information in my issue is helpful and that a remedy can be found. Otherwise, I expect a simple periodic prune will do the trick. Cheers.

matheuscscp · 2024-08-07T22:41:30Z

+1 for this. I have a tiny development instance in GCP where I installed kind, and it quickly fills up the disk with the constant stream of new images I ship for my app.

BenTheElder · 2024-08-07T23:00:29Z

Please use the thumbs up button unless you have a comment about how we might accomplish this.

Unfortunately it's not simple to solve this in general, because kubelet's GC behavior is based on a disk usage percentage, which isn't aimed at development machines.

In your specific case, you can use a config patch to enable it, or you can exec into the node container and use crictl rmi.
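A minimal sketch of such a config patch, assuming kind merges a `kind: KubeletConfiguration` entry under `kubeadmConfigPatches` into the kubelet config it generates (kind's generated kubelet config sets `imageGCHighThresholdPercent: 100`, which effectively turns image GC off). The thresholds below are just kubelet's usual defaults, not a recommendation, and disk eviction stays disabled. `kind_config` and `create_cluster` are hypothetical helpers; the YAML is piped to `kind create cluster --config=-`.

```python
import subprocess


def kind_config(high_pct=85, low_pct=80):
    """Build a kind cluster config that re-enables threshold-based image GC.
    Assumption: kind applies this patch to its generated KubeletConfiguration."""
    if not 0 <= low_pct < high_pct <= 100:
        raise ValueError("need 0 <= low_pct < high_pct <= 100")
    return f"""\
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
kubeadmConfigPatches:
- |
  kind: KubeletConfiguration
  apiVersion: kubelet.config.k8s.io/v1beta1
  imageGCHighThresholdPercent: {high_pct}
  imageGCLowThresholdPercent: {low_pct}
"""


def create_cluster(name="kind", high_pct=85, low_pct=80):
    """Create a cluster with the patched config, read from stdin."""
    subprocess.run(
        ["kind", "create", "cluster", "--name", name, "--config=-"],
        input=kind_config(high_pct, low_pct),
        text=True,
        check=True,
    )
```

Keep in mind the thresholds are still percentages of the host disk, so on a shared development machine they inherit the problems described earlier in this thread.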

We have made some very small progress recently by adopting the newly available containerd feature that marks core images as pinned (not deletable), which is one part of the problem, but that work is incomplete.

matheuscscp · 2024-08-07T23:14:16Z

In your specific case you can use a config patch to enable it

Are there any docs showing how to do this for this specific config? (I did try using crictl rmi --prune and it does work well despite being a manual solution)
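Since `crictl rmi --prune` works but is manual, a small hypothetical wrapper could run it on every node of a cluster, for example from a cron job. This assumes Docker is the node provider (node containers named like `kind-control-plane`) and that `kind get nodes --name <cluster>` lists them one per line; `node_names`, `prune_commands`, and `prune_all` are illustrative names.

```python
import subprocess


def node_names(cluster="kind"):
    """List the node containers of a kind cluster via `kind get nodes`."""
    out = subprocess.run(
        ["kind", "get", "nodes", "--name", cluster],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]


def prune_commands(nodes, runtime="docker"):
    """One `<runtime> exec <node> crictl rmi --prune` command per node;
    `crictl rmi --prune` removes images not used by any container."""
    return [[runtime, "exec", node, "crictl", "rmi", "--prune"] for node in nodes]


def prune_all(cluster="kind", runtime="docker"):
    """Run the prune on every node of the cluster."""
    for cmd in prune_commands(node_names(cluster), runtime):
        subprocess.run(cmd, check=True)
```

`--prune` removes any image not used by a container at that moment, so a side-loaded image that no pod has started yet is deleted too; that is the same timing concern raised earlier in the thread.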

BenTheElder · 2025-02-25T20:45:13Z

xref #3441

BenTheElder added kind/bug Categorizes issue or PR as related to a bug. kind/feature Categorizes issue or PR as related to a new feature. labels Jul 24, 2019

BenTheElder self-assigned this Jul 24, 2019

k8s-ci-robot added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jul 24, 2019

BenTheElder added the kind/design Categorizes issue or PR as related to design. label Aug 16, 2019

k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 18, 2019

BenTheElder added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Nov 18, 2019

kubernetes-sigs deleted a comment from fejta-bot Nov 18, 2019

nicks mentioned this issue Feb 19, 2021

image garbage collection in dev clusters tilt-dev/tilt#4228

Open

BenTheElder removed their assignment Jul 29, 2021

BenTheElder added priority/backlog Higher priority than priority/awaiting-more-evidence. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Jul 29, 2021

BenTheElder mentioned this issue Aug 8, 2022

kind-control-plane fails to cleanup snapshots in a timely manner [SOLVED] #2865

Closed

BenTheElder mentioned this issue Jul 24, 2023

node disk full #3313

Closed

BenTheElder mentioned this issue Nov 30, 2023

leverage containerd image pinning #3441

Open

BenTheElder mentioned this issue Feb 25, 2025

[e2e test flake] Failed to run clusterctl move...failed calling webhook kubernetes-sigs/cluster-api#11856

Open
