clean up orphaned loop devices: use unique kubelet path #1333

Open
pohly opened this issue Feb 13, 2020 · 6 comments
Labels
  • kind/feature: Categorizes issue or PR as related to a new feature.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.

Comments

@pohly
Contributor

pohly commented Feb 13, 2020

When a container running under KIND binds a file to a loop device and is then terminated, the file remains bound even after the entire KIND cluster is removed. This is a problem in particular for Prow, because those leaked loop devices and their associated resources can accumulate over time.

There's no good fix, because it's impossible to look at a bound loop device and determine whether it is still needed. All one has is the backing file name, which is the same inside the original container and outside (there is no namespacing of any kind).

What would you like to be added:

Here's a workaround for Prow:

  • add an option to KIND which changes the /var/lib/kubelet path so that it contains a unique ID chosen by the caller
  • use that for Kubernetes-CSI test jobs to ensure that kubelet and CSI drivers bind files whose full path contains a unique ID (such as the Prow job ID)
  • add cleanup code somewhere (TBD) which unbinds all loop devices whose path contains that unique ID (see the sketch below)

Why is this needed:

This way we should be able to clean up after most tests that could, at least theoretically, leak loop devices.
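
A rough sketch of what that cleanup step could look like (hypothetical script; the PROW_JOB_ID variable and the path-matching scheme are assumptions, not an agreed design):

    #!/usr/bin/env bash
    # Hypothetical cleanup sketch: detach every loop device whose
    # backing file path contains our unique job ID.
    set -euo pipefail

    JOB_ID="${PROW_JOB_ID:?need a unique job ID}"

    # "losetup -nl -O NAME,BACK-FILE" prints device/backing-file pairs
    # without headings (util-linux).
    losetup -nl -O NAME,BACK-FILE | while read -r dev backing; do
      if [[ "$backing" == *"${JOB_ID}"* ]]; then
        echo "detaching orphaned loop device $dev ($backing)"
        losetup -d "$dev"
      fi
    done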

@BenTheElder
Member

add an option to KIND which changes the /var/lib/kubelet path so that it contains a unique ID chosen by the caller

this is somewhat awkward as a knob.
today it could be accomplished with a kubeadm config patch in the kind cluster config; I'll prototype something soon.
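
For illustration, such a patch might look roughly like this (untested sketch; the apiVersion and the root-dir plumbing through kubeletExtraArgs are assumptions):

    # kind-config.yaml -- untested sketch: give kubelet a unique root
    # directory per job by patching the kubeadm InitConfiguration
    # through the kind cluster config.
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    nodes:
    - role: control-plane
      kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            root-dir: /var/lib/kubelet-UNIQUE_ID

    # then:
    kind create cluster --config kind-config.yaml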

add cleanup code somewhere (TBD) which unbinds all loop devices whose path contains that unique ID

probably in the test-infra docker-in-docker logic

@pohly
Contributor Author

pohly commented Feb 14, 2020

today it could be accomplished with a kubeadm config patch in the kind cluster config, I'll prototype something soon.

That's fine. It's really a corner case, so the solution doesn't have to be nice. I had looked into that briefly, but it wasn't immediately obvious where that path might be changed, so an example would be good.

@BenTheElder BenTheElder added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Feb 18, 2020
@BenTheElder BenTheElder added this to the 2020 goals milestone Feb 27, 2020
@BenTheElder
Member

another thought from twitter: https://lkml.org/lkml/2020/4/8/506
see this thread 🧵: https://twitter.com/filbranden/status/1249724120599691269

brauner pushed a commit to brauner/linux that referenced this issue Apr 22, 2020
This implements loopfs, a loop device filesystem. It takes inspiration
from the binderfs filesystem I implemented about two years ago, with
which we have had good experiences overall so far. Parts of it are also
based on [3], but it's mostly a new and, imho, cleaner approach.

Loopfs allows applications to create private loop device instances for
various use-cases. It covers the use-case, expressed on-list and in
person, of getting programmatic access to private loop devices for
image building in sandboxes. An illustration of this is provided in
[4].

Loopfs is also intended to provide loop devices to privileged and
unprivileged containers, which has been a frequent request from various
major tools (Chromium, Kubernetes, LXD, Moby/Docker, systemd). I'm
providing a non-exhaustive list of issues and requests (cf. [5]) around
this feature, mainly to illustrate that I'm not making the use-cases
up. Currently none of this can be done safely, since handing a loop
device from the host into a container means that the container can see
anything the host is doing with that loop device, as well as what other
containers are doing with it. And (bind-)mounting devtmpfs inside of
containers is not secure at all, so that is not an option either
(though it is apparently sometimes done out of despair).

The workloads people run in containers are supposed to be indiscernible
from workloads run on the host, and the tools inside a container are
not supposed to need to be aware that they are running inside a
container, apart from containerization tools themselves. This is
especially true when running older distros in containers that existed
before containers were as ubiquitous as they are today. With loopfs, a
user can call mount -o loop, and in a correctly set-up container things
work the same way they would on the host. The filesystem representation
allows us to do this in a very simple way. At container setup, a
container manager can mount a private instance of loopfs somewhere,
e.g. at /dev/loopfs, then bind-mount or symlink
/dev/loopfs/loop-control to /dev/loop-control, pre-allocate and symlink
the standard number of devices into their standard locations, and have
a service file or rules in place that symlink additionally allocated
loop devices into place through losetup as well.
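
To make that sequence concrete, a container manager might do something
along these lines (illustrative sketch only; loopfs was proposed but
never merged upstream, so the mount invocation and allocation behavior
are taken from this description, not from a shipped kernel):

    # Mount a private loopfs instance for this container.
    mkdir -p /dev/loopfs
    mount -t loopfs none /dev/loopfs

    # Expose the private loop-control node where tools expect it.
    ln -sf /dev/loopfs/loop-control /dev/loop-control

    # Pre-allocate a few devices and symlink them into the standard
    # locations; losetup -f asks loop-control for the first free device.
    for i in 0 1 2 3; do
      dev="$(losetup -f)"
      ln -sf "$dev" "/dev/loop${i}"
    done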
With the new syscall interception logic this is also possible for
unprivileged containers. In these cases, when a user calls mount -o
loop <image> <mountpoint>, it will be possible to completely set up the
loop device in the container. The final mount syscall is handled
through syscall interception, which we already implemented and released
in earlier kernels (see [1] and [2]) and which is actively used in
production workloads. The mount is often rewritten to a fuse binary to
provide safe access for unprivileged containers.

Loopfs also allows the creation of hidden/detached dynamic loop devices
and associated mounts, which was also an often-voiced request. With the
old mount API this can be achieved by creating a temporary loopfs,
stashing a file descriptor to the mount point and the loop-control
device, and immediately unmounting the loopfs instance. With the new
mount API a detached mount can be created directly (i.e. a mount not
visible anywhere in the filesystem). New loop devices can then be
allocated and configured. They can be mounted through
/proc/self/fd/<nr> with the old mount API or by using the fd directly
with the new mount API. Combined with a mount namespace this allows for
loop devices that are fully auto-cleaned-up on program crash. This ties
back to various use-cases and is illustrated in [4].
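
A sketch of the old-mount-API trick described above (again assuming the
proposed loopfs interface; keeping the instance alive only through the
open fds is the point of the exercise):

    # Create a temporary loopfs instance and stash fds to the mount
    # point and to loop-control before making it invisible.
    mkdir -p /tmp/loopfs
    mount -t loopfs none /tmp/loopfs
    exec 3</tmp/loopfs 4</tmp/loopfs/loop-control
    umount -l /tmp/loopfs   # lazy unmount: instance lives on via the fds

    # Devices allocated through fd 4 can now be mounted via
    # /proc/self/fd/<nr>; when this process exits or crashes, the
    # references vanish and everything is cleaned up automatically.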

The filesystem representation requires the standard boilerplate
filesystem code we know from other tiny filesystems, and all of the
loopfs code is hidden behind a config option that defaults to off. This
specifically means that none of the code even exists for users who have
no use-case for loopfs. In addition, the loopfs code does not alter how
loop devices behave at all, i.e. there are no changes to any existing
workloads, and I've taken care to ifdef all loopfs-specific things out.

Each loopfs mount is a separate instance. As such loop devices created
in one instance are independent of loop devices created in another
instance. This specifically entails that loop devices are only visible
in the loopfs instance they belong to.

The number of loop devices available in loopfs instances is
hierarchically limited through /proc/sys/user/max_loop_devices via the
ucount infrastructure (thanks to David Rheinsberg for pointing out that
missing piece). An administrator could e.g. run
echo 3 > /proc/sys/user/max_loop_devices, at which point any loopfs
instance mounted by uid x can only create 3 loop devices, no matter how
many loopfs instances they mount. This limit applies hierarchically to
all user namespaces.

In addition, loopfs has a "max" mount option which allows setting a
limit on the number of loop devices for a given loopfs instance. This
mainly covers use-cases where a single loopfs mount is shared as a
bind-mount between multiple parties that are prevented from creating
other loopfs mounts, and it is equivalent in semantics to the binderfs
and devpts "max" mount option.
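
For instance (illustrative, using the option exactly as described
above):

    # Cap this shared instance at 4 loop devices, analogous to the
    # binderfs/devpts "max" semantics.
    mount -t loopfs -o max=4 none /dev/loopfs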

Note that in __loop_clr_fd() we now need to check not just whether bdev
is valid but also whether bdev->bd_disk is valid. This wasn't necessary
before, because in order to call LOOP_CLR_FD the loop device would need
to be open, and thus bdev->bd_disk was guaranteed to be allocated. For
loopfs loop devices we allow callers to simply unlink them, just as we
do for binderfs binder devices, and we also need to account for the
case where a loopfs superblock is shut down while backing files might
still be associated with some loop devices. In such cases no bd_disk
device will be attached to bdev. This is not in itself noteworthy; it's
more about documenting the "why" of the added bdev->bd_disk check for
posterity.

[1]: 6a21cc5 ("seccomp: add a return code to trap to userspace")
[2]: fb3c538 ("seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE")
[3]: https://lore.kernel.org/lkml/[email protected]
[4]: https://gist.github.com/brauner/dcaf15e6977cc1bfadfb3965f126c02f
[5]: kubernetes-sigs/kind#1333
     kubernetes-sigs/kind#1248
     https://lists.freedesktop.org/archives/systemd-devel/2017-August/039453.html
     https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md#loop-mount
     https://gitlab.com/gitlab-com/support-forum/issues/3732
     moby/moby#27886
     https://twitter.com/_AkihiroSuda_/status/1249664478267854848
     https://serverfault.com/questions/701384/loop-device-in-a-linux-container
     https://discuss.linuxcontainers.org/t/providing-access-to-loop-and-other-devices-in-containers/1352
     https://discuss.concourse-ci.org/t/exposing-dev-loop-devices-in-privileged-mode/813
Cc: Jens Axboe <[email protected]>
Cc: Steve Barber <[email protected]>
Cc: Filipe Brandenburger <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Benjamin Elder <[email protected]>
Cc: Seth Forshee <[email protected]>
Cc: Stéphane Graber <[email protected]>
Cc: Tom Gundersen <[email protected]>
Cc: Serge Hallyn <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christian Kellner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dylan Reid <[email protected]>
Cc: David Rheinsberg <[email protected]>
Cc: Akihiro Suda <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
/* v2 */
- David Rheinsberg <[email protected]> /
  Christian Brauner <[email protected]>:
  - Correctly clean up loop devices that are in use after the loopfs
    instance has been shut down. This is important for some use-cases
    that David pointed out, where they effectively create a loopfs
    instance, allocate devices, and drop unnecessary references to it.
- Christian Brauner <[email protected]>:
  - Replace the lo_loopfs_i inode member in struct loop_device with a
    custom struct lo_info pointer which is only allocated for loopfs
    loop devices.
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
(Same commit message as the previous push, now with
Reviewed-by: Serge Hallyn <[email protected]> added; the changelog
gains:)

/* v3 */
unchanged
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
(Same commit message as the previous push; the changelog gains:)

/* v3 */
- Christian Brauner <[email protected]>:
  - Fix loopfs_access() to not care about non-loopfs devices.
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
(Same commit message and changelog as the previous push.)
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
(Same commit message as the previous push; the v3 changelog gains:)

/* v3 */
- Christian Brauner <[email protected]>:
  - Fix loopfs_access() to not care about non-loopfs devices.
- David Rheinsberg <[email protected]> /
  Serge Hallyn <[email protected]>:
  - Remove "max" mount option.
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
(Same commit message and changelog as the previous push.)
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
This implements loopfs, a loop device filesystem. It takes inspiration
from the binderfs filesystem I implemented about two years ago and with
which we had overall good experiences so far. Parts of it are also
based on [3] but it's mostly a new, imho cleaner approach.

Loopfs allows to create private loop devices instances to applications
for various use-cases. It covers the use-case that was expressed on-list
and in-person to get programmatic access to private loop devices for
image building in sandboxes. An illustration for this is provided in
[4].

Also loopfs is intended to provide loop devices to privileged and
unprivileged containers which has been a frequent request from various
major tools (Chromium, Kubernetes, LXD, Moby/Docker, systemd). I'm
providing a non-exhaustive list of issues and requests (cf. [5]) around
this feature mainly to illustrate that I'm not making the use-cases up.
Currently none of this can be done safely since handing a loop device
from the host into a container means that the container can see anything
that the host is doing with that loop device and what other containers
are doing with that device too. And (bind-)mounting devtmpfs inside of
containers is not secure at all so also not an option (though sometimes
done out of despair apparently).

The workloads people run in containers are supposed to be indiscernible
from workloads run on the host and the tools inside of the container are
supposed to not be required to be aware that they are running inside a
container apart from containerization tools themselves. This is
especially true when running older distros in containers that did exist
before containers were as ubiquitous as they are today. With loopfs user
can call mount -o loop and in a correctly setup container things work
the same way they would on the host. The filesystem representation
allows us to do this in a very simple way. At container setup, a
container manager can mount a private instance of loopfs somehwere, e.g.
at /dev/loopfs and then bind-mount or symlink /dev/loopfs/loop-control
to /dev/loop-control, pre allocate and symlink the number of standard
devices into their standard location and have a service file or rules in
place that symlink additionally allocated loop devices through losetup
into place as well.
With the new syscall interception logic this is also possible for
unprivileged containers. In these cases when a user calls mount -o loop
<image> <mountpoint> it will be possible to completely setup the loop
device in the container. The final mount syscall is handled through
syscall interception which we already implemented and released in
earlier kernels (see [1] and [2]) and is actively used in production
workloads. The mount is often rewritten to a fuse binary to provide safe
access for unprivileged containers.

Loopfs also allows the creation of hidden/detached dynamic loop
devices and associated mounts, which was also a frequently issued
request. With the old mount API this can be achieved by creating a
temporary loopfs, stashing a file descriptor to the mount point and
the loop-control device, and immediately unmounting the loopfs
instance. With the new mount API a detached mount can be created
directly (i.e. a mount not visible anywhere in the filesystem). New
loop devices can then be allocated and configured. They can be mounted
through /proc/self/fd/<nr> with the old mount API or by using the fd
directly with the new mount API. Combined with a mount namespace this
allows for fully auto-cleaned-up loop devices on program crash. This
ties back to various use-cases and is illustrated in [4].
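
The auto-cleanup property can already be sketched with regular loop
devices and a mount namespace, since mount -o loop sets the autoclear
flag: once the namespace (and with it the mount) goes away, the device
is detached. A minimal illustration, where do-work stands in for the
actual job:

    # Everything inside the namespace, including the loop mount, is
    # torn down when the child exits or crashes; autoclear then
    # detaches the loop device.
    unshare --mount --propagation private sh -c '
        mount -o loop disk.img /mnt
        do-work /mnt
    '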

The filesystem representation requires the standard boilerplate
filesystem code we know from other tiny filesystems, and all of the
loopfs code is hidden behind a config option that defaults to false.
This specifically means that none of the code even exists when users
do not have any use-case for loopfs.
In addition, the loopfs code does not alter how loop devices behave at
all, i.e. there are no changes for any existing workloads, and I've
taken care to ifdef all loopfs-specific things out.

Each loopfs mount is a separate instance. As such, loop devices
created in one instance are independent of loop devices created in
another instance. This specifically entails that loop devices are only
visible in the loopfs instance they belong to.

The number of loop devices available in loopfs instances is
hierarchically limited through /proc/sys/user/max_loop_devices via the
ucount infrastructure (thanks to David Rheinsberg for pointing out
that missing piece). An administrator could, for example, run
echo 3 > /proc/sys/user/max_loop_devices, at which point any loopfs
instance mounted by uid x can only create 3 loop devices, no matter
how many loopfs instances that uid mounts. This limit applies
hierarchically to all user namespaces.

In addition, loopfs has a "max" mount option which allows setting a
limit on the number of loop devices for a given loopfs instance. This
mainly covers use-cases where a single loopfs mount is shared as a
bind-mount between multiple parties that are prevented from creating
other loopfs mounts, and it matches the semantics of the binderfs and
devpts "max" mount option.
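
Side by side, the two knobs would be used like this (both the sysctl
and the mount option exist only in this patch set):

    # Hierarchical per-uid cap, enforced via the ucount
    # infrastructure:
    echo 3 > /proc/sys/user/max_loop_devices

    # Per-instance cap, with the same semantics as the binderfs and
    # devpts "max" option:
    mount -t loopfs -o max=2 none /dev/loopfs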

Note that in __loop_clr_fd() we now need to check not just whether
bdev is valid but also whether bdev->bd_disk is valid. This wasn't
necessary before because in order to call LOOP_CLR_FD the loop device
had to be open, and thus bdev->bd_disk was guaranteed to be allocated.
For loopfs loop devices we allow callers to simply unlink them, just
as we do for binderfs binder devices, and we also need to account for
the case where a loopfs superblock is shut down while backing files
might still be associated with some loop devices. In such cases no
bd_disk device will be attached to bdev. This is not in itself
noteworthy; it's more about documenting the "why" of the added
bdev->bd_disk check for posterity.

[1]: 6a21cc5 ("seccomp: add a return code to trap to userspace")
[2]: fb3c538 ("seccomp: add SECCOMP_USER_NOTIF_FLAG_CONTINUE")
[3]: https://lore.kernel.org/lkml/[email protected]
[4]: https://gist.github.com/brauner/dcaf15e6977cc1bfadfb3965f126c02f
[5]: kubernetes-sigs/kind#1333
     kubernetes-sigs/kind#1248
     https://lists.freedesktop.org/archives/systemd-devel/2017-August/039453.html
     https://chromium.googlesource.com/chromiumos/docs/+/master/containers_and_vms.md#loop-mount
     https://gitlab.com/gitlab-com/support-forum/issues/3732
     moby/moby#27886
     https://twitter.com/_AkihiroSuda_/status/1249664478267854848
     https://serverfault.com/questions/701384/loop-device-in-a-linux-container
     https://discuss.linuxcontainers.org/t/providing-access-to-loop-and-other-devices-in-containers/1352
     https://discuss.concourse-ci.org/t/exposing-dev-loop-devices-in-privileged-mode/813
Cc: Jens Axboe <[email protected]>
Cc: Steve Barber <[email protected]>
Cc: Filipe Brandenburger <[email protected]>
Cc: Kees Cook <[email protected]>
Cc: Benjamin Elder <[email protected]>
Cc: Seth Forshee <[email protected]>
Cc: Stéphane Graber <[email protected]>
Cc: Tom Gundersen <[email protected]>
Cc: Tejun Heo <[email protected]>
Cc: Christian Kellner <[email protected]>
Cc: Greg Kroah-Hartman <[email protected]>
Cc: "David S. Miller" <[email protected]>
Cc: Dylan Reid <[email protected]>
Cc: David Rheinsberg <[email protected]>
Cc: Akihiro Suda <[email protected]>
Cc: Dmitry Vyukov <[email protected]>
Cc: "Rafael J. Wysocki" <[email protected]>
Reviewed-by: Serge Hallyn <[email protected]>
Signed-off-by: Christian Brauner <[email protected]>
---
/* v2 */
- David Rheinsberg <[email protected]> /
  Christian Brauner <[email protected]>:
  - Correctly clean up loop devices that are in use after the loopfs
    instance has been shut down. This is important for some use-cases
    that David pointed out where they effectively create a loopfs
    instance, allocate devices and drop unnecessary references to it.
- Christian Brauner <[email protected]>:
  - Replace lo_loopfs_i inode member in struct loop_device with a custom
    struct lo_info pointer which is only allocated for loopfs loop
    devices.

/* v3 */
- Christian Brauner <[email protected]>:
  - Fix loopfs_access() to not care about non-loopfs devices.
- David Rheinsberg <[email protected]> /
  Serge Hallyn <[email protected]>:
  - Remove "max" mount option.
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
brauner pushed a commit to brauner/linux that referenced this issue Apr 23, 2020
fengguang pushed a commit to 0day-ci/linux that referenced this issue Apr 23, 2020
brauner pushed a commit to brauner/linux that referenced this issue Apr 24, 2020
brauner pushed a commit to brauner/linux that referenced this issue Apr 24, 2020
@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 15, 2020
@BenTheElder
Copy link
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 27, 2020
@kubernetes-sigs kubernetes-sigs deleted a comment from fejta-bot Jul 27, 2020
@pohly
Copy link
Contributor Author

pohly commented Dec 7, 2020

Did we agree to implement something which embeds a unique ID in a non-standard kubelet data directory? After the comment about loopfs I'm no longer sure.

As pointed out in kubernetes/kubernetes#92664, the CSI tests must then be configured to use the modified data directory.
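
A minimal sketch of the matching cleanup step, assuming the unique ID
(e.g. Prow's PROW_JOB_ID) ends up in the backing-file paths as
proposed; it detaches every loop device whose backing file contains
the ID (cleanup-loops.sh is an illustrative helper name):

    #!/bin/sh
    # Usage: cleanup-loops.sh <unique-id>
    ID="$1"
    # List every attached loop device with its backing file and detach
    # the ones whose backing file path contains the unique ID.
    losetup --list --noheadings --output NAME,BACK-FILE |
    while read -r dev backing; do
        case "$backing" in
            *"$ID"*) losetup --detach "$dev" ;;
        esac
    done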

@BenTheElder
Copy link
Member

Did we agree to implement something which embeds a unique ID in a non-standard kubelet data directory? After the comment about loopfs I wasn't sure anymore.

I think it's a bit of an awkward layering issue, but something we should probably still consider.
It's technically possible to do already via kubeadm config patches.
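
A sketch of what that could look like in a kind cluster config
(untested; whether kind tolerates a non-standard kubelet root-dir is
exactly what would need prototyping, and PROW_JOB_ID is just one
possible source of the unique ID):

    cat <<EOF > kind-config.yaml
    kind: Cluster
    apiVersion: kind.x-k8s.io/v1alpha4
    kubeadmConfigPatches:
    - |
      kind: InitConfiguration
      nodeRegistration:
        kubeletExtraArgs:
          root-dir: /var/lib/kubelet-${PROW_JOB_ID}
    EOF
    kind create cluster --config kind-config.yaml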

@BenTheElder BenTheElder modified the milestones: 2020 goals, 2021 goals Jan 25, 2021
@BenTheElder BenTheElder removed this from the 2021 goals milestone Feb 4, 2022