failed to initialize snapshotter: initialize filesystem thin layer: xxxx bootstrap xxxx/image.boot: not found" #632

Open
zzzyuan0 opened this issue Feb 7, 2025 · 5 comments
Labels
bug Something isn't working

Comments

@zzzyuan0

zzzyuan0 commented Feb 7, 2025

The following error occurs when using nydus-snapshotter:

time="2025-02-07T16:20:29.466399849+08:00" level=warning msg="Can't umount /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/62/mnt, canonicalise path for /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/62/mnt: lstat /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/62: no such file or directory"
time="2025-02-07T16:20:29.466431719+08:00" level=warning msg="Can't delete residual unix socket /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/crptj2omfs89thb6io60/api.sock, remove /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/crptj2omfs89thb6io60/api.sock: no such file or directory"
time="2025-02-07T16:20:29.466291878+08:00" level=info msg="Unmounting /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/20/mnt when clear vestige"
time="2025-02-07T16:20:29.466596520+08:00" level=warning msg="Can't umount /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/20/mnt, canonicalise path for /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/20/mnt: lstat /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/20: no such file or directory"
time="2025-02-07T16:20:29.466618880+08:00" level=warning msg="Can't delete residual unix socket /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/crpob4gmfs89thb6io50/api.sock, remove /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/crpob4gmfs89thb6io50/api.sock: no such file or directory"
time="2025-02-07T16:20:29.466291918+08:00" level=info msg="Unmounting /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/44/mnt when clear vestige"
time="2025-02-07T16:20:29.466712151+08:00" level=warning msg="Can't umount /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/44/mnt, canonicalise path for /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/44/mnt: lstat /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/44: no such file or directory"
time="2025-02-07T16:20:29.466758001+08:00" level=warning msg="Can't delete residual unix socket /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/crpr6pgmfs89thb6io5g/api.sock, remove /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/crpr6pgmfs89thb6io5g/api.sock: no such file or directory"
time="2025-02-07T16:20:29.466352948+08:00" level=warning msg="Can't umount /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/96/mnt, canonicalise path for /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/96/mnt: lstat /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/96: no such file or directory"
time="2025-02-07T16:20:29.466835191+08:00" level=warning msg="Can't delete residual unix socket /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/ctjad60mfs89u58sbs4g/api.sock, remove /container_data/containerd/io.containerd.snapshotter.v1.nydus/socket/ctjad60mfs89u58sbs4g/api.sock: no such file or directory"
time="2025-02-07T16:20:29.466900792+08:00" level=fatal msg="failed to start nydus-snapshotter" error="failed to initialize snapshotter: initialize filesystem thin layer: start daemon crptj2omfs89thb6io60: create command for daemon crptj2omfs89thb6io60: locate bootstrap : bootstrap /container_data/containerd/io.containerd.snapshotter.v1.nydus/snapshots/62/fs/image.boot: not found"

The above error occurs after the machine is powered off and restarted. Is it possible that the snapshot was cleaned up earlier, and the initialization on restart did not check for that? However, I checked the corresponding config and ran ctr -n k8s.io image ls | grep xxxx, and the corresponding image does exist.

@zzzyuan0 zzzyuan0 changed the title on Feb 7, 2025
from: failed to initialize snapshotter: initialize filesystem thin layer: xxxx bootstrap xxxx\image.boot: not found"
to: failed to initialize snapshotter: initialize filesystem thin layer: xxxx bootstrap xxxx/image.boot: not found"
@imeoer imeoer added the bug Something isn't working label Feb 10, 2025
@zzzyuan0
Author

zzzyuan0 commented Feb 10, 2025

The startup of nydus-snapshotter should not depend on the startup of the daemons it launches. I think a parent process should not depend on its child processes.

In theory, nydus-snapshotter and its daemons should start asynchronously. Alternatively, if a daemon fails to run, it should be cleaned up or the image should be pulled again.

@imeoer
Collaborator

imeoer commented Feb 10, 2025

@zzzyuan0 Did you change the containerd root directory to /container_data/containerd?

@zzzyuan0
Author

zzzyuan0 commented Feb 10, 2025

We found why the bootstrap was lost. Our disk mount program and nydus ran in the wrong order, so nydus-snapshotter read the root directory while it was still unmounted and concluded that the current config was empty. By the time it cleaned up the bootstraps, the mount was in place and the real data was read, so everything was cleaned up. This left the config data and the bootstrap data inconsistent.
However, I think the scenario I mentioned above still needs to be handled. Otherwise the snapshotter cannot recover on its own in abnormal scenarios and will restart indefinitely.

@imeoer
Collaborator

imeoer commented Feb 11, 2025

@zzzyuan0 That makes sense. We need to improve the daemon recovery logic in the boot stage of the snapshotter so that it ignores the failed daemons here:

Image

@BraveY

BraveY commented Feb 11, 2025

I'd like to work on this issue. I'll start working on it and submit a PR soon.
