cri-containerd doesn't successfully reboot kubelet #26

@w9n

Description

Steps to reproduce the issue:
Start the master, wait until all pods are started, then power off and restart. In fewer than ~10% of attempts it reboots successfully.

Describe the results you received:
kubelet log

kubelet.sh: kubelet already configured
kubelet.sh: waiting for /etc/kubernetes/kubelet.conf
kubelet.sh: /etc/kubernetes/kubelet.conf has arrived
I1213 19:31:02.058981     572 feature_gate.go:156] feature gates: map[]
I1213 19:31:02.059026     572 controller.go:114] kubelet config controller: starting controller
I1213 19:31:02.059031     572 controller.go:118] kubelet config controller: validating combination of defaults and flags
I1213 19:31:02.075793     572 feature_gate.go:156] feature gates: map[]
W1213 19:31:02.075912     572 server.go:289] --cloud-provider=auto-detect is deprecated. The desired cloud provider should be set explicitly
I1213 19:31:02.079398     572 manager.go:149] cAdvisor running in container: "/sys/fs/cgroup/cpu"
W1213 19:31:02.080964     572 manager.go:153] Unable to connect to Docker: Cannot connect to the Docker daemon. Is the docker daemon running on this host?
W1213 19:31:02.082506     572 manager.go:157] unable to connect to Rkt api service: rkt: cannot tcp Dial rkt api service: dial tcp [::1]:15441: getsockopt: connection refused
W1213 19:31:02.082564     572 manager.go:166] unable to connect to CRI-O api service: Get http://%2Fvar%2Frun%2Fcrio.sock/info: dial unix /var/run/crio.sock: connect: no such file or directory
I1213 19:31:02.082950     572 fs.go:139] Filesystem UUIDs: map[d358f4da-8808-4f6c-8b8a-fcc647914e7d:/dev/sda1]
I1213 19:31:02.082962     572 fs.go:140] Filesystem partitions: map[shm:{mountpoint:/dev/shm major:0 minor:18 fsType:tmpfs blockSize:0} tmpfs:{mountpoint:/run major:0 minor:15 fsType:tmpfs blockSize:0} /dev/sda1:{mountpoint:/var/lib major:8 minor:1 fsType:ext4 blockSize:0}]
I1213 19:31:02.091136     572 info.go:51] Couldn't collect info from any of the files in "/etc/machine-id,/var/lib/dbus/machine-id"

I1213 19:31:02.093083     572 manager.go:216] Machine: {NumCores:1 CpuFrequency:3400022 MemoryCapacity:4574240768 HugePages:[{PageSize:2048 NumPages:0}] MachineID: SystemUUID:D740F169-85BB-4FD2-9F1E-A81CED65D3FD BootID:799bdd7f-4d34-4d88-9189-c376c589bf85 Filesystems:[{Device:shm DeviceMajor:0 DeviceMinor:18 Capacity:2287120384 Type:vfs Inodes:558379 HasInodes:true} {Device:tmpfs DeviceMajor:0 DeviceMinor:15 Capacity:457424896 Type:vfs Inodes:558379 HasInodes:true} {Device:/dev/sda1 DeviceMajor:8 DeviceMinor:1 Capacity:8386961408 Type:vfs Inodes:524288 HasInodes:true} {Device:overlay DeviceMajor:0 DeviceMinor:49 Capacity:2287120384 Type:vfs Inodes:558379 HasInodes:true}] DiskMap:map[43:0:{Name:nbd0 Major:43 Minor:0 Size:0 Scheduler:none} 43:3:{Name:nbd3 Major:43 Minor:3 Size:0 Scheduler:none} 43:5:{Name:nbd5 Major:43 Minor:5 Size:0 Scheduler:none} 43:9:{Name:nbd9 Major:43 Minor:9 Size:0 Scheduler:none} 8:0:{Name:sda Major:8 Minor:0 Size:8589934592 Scheduler:deadline} 43:15:{Name:nbd15 Major:43 Minor:15 Size:0 Scheduler:none} 43:2:{Name:nbd2 Major:43 Minor:2 Size:0 Scheduler:none} 43:7:{Name:nbd7 Major:43 Minor:7 Size:0 Scheduler:none} 43:8:{Name:nbd8 Major:43 Minor:8 Size:0 Scheduler:none} 43:10:{Name:nbd10 Major:43 Minor:10 Size:0 Scheduler:none} 43:4:{Name:nbd4 Major:43 Minor:4 Size:0 Scheduler:none} 43:12:{Name:nbd12 Major:43 Minor:12 Size:0 Scheduler:none} 43:13:{Name:nbd13 Major:43 Minor:13 Size:0 Scheduler:none} 43:14:{Name:nbd14 Major:43 Minor:14 Size:0 Scheduler:none} 43:6:{Name:nbd6 Major:43 Minor:6 Size:0 Scheduler:none} 43:1:{Name:nbd1 Major:43 Minor:1 Size:0 Scheduler:none} 43:11:{Name:nbd11 Major:43 Minor:11 Size:0 Scheduler:none}] NetworkDevices:[{Name:eth0 MacAddress:82:3f:cf:43:6d:fc Speed:-1 Mtu:1500} {Name:ip6tnl0 MacAddress:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 Speed:0 Mtu:1452} {Name:tunl0 MacAddress:00:00:00:00 Speed:0 Mtu:1480}] Topology:[{Id:0 Memory:0 Cores:[{Id:0 Threads:[0] Caches:[{Size:32768 Type:Data Level:1} {Size:32768 Type:Instruction Level:1} {Size:4194304 Type:Unified Level:2}]}] Caches:[{Size:16777216 Type:Unified Level:3}]}] CloudProvider:Unknown InstanceType:Unknown InstanceID:None}
I1213 19:31:02.099803     572 manager.go:222] Version: {KernelVersion:4.9.62-linuxkit ContainerOsVersion:LinuxKit Kubernetes Project DockerVersion:Unknown DockerAPIVersion:Unknown CadvisorVersion: CadvisorRevision:}
I1213 19:31:02.122241     572 container_manager_linux.go:257] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:remote CgroupsPerQOS:false CgroupRoot: CgroupDriver:cgroupfs ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: EnforceNodeAllocatable:map[] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>}]} ExperimentalQOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s}
I1213 19:31:02.124036     572 container_manager_linux.go:288] Creating device plugin handler: false
I1213 19:31:02.124167     572 kubelet.go:273] Adding manifest file: /etc/kubernetes/manifests
I1213 19:31:02.124211     572 kubelet.go:283] Watching apiserver
E1213 19:31:02.127909     572 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:422: Failed to list *v1.Node: Get https://10.10.10.127:6443/api/v1/nodes?fieldSelector=metadata.name%3Dlinuxkit-823fcf436dfc&resourceVersion=0: dial tcp 10.10.10.127:6443: getsockopt: connection refused
E1213 19:31:02.167876     572 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:413: Failed to list *v1.Service: Get https://10.10.10.127:6443/api/v1/services?resourceVersion=0: dial tcp 10.10.10.127:6443: getsockopt: connection refused
E1213 19:31:02.167939     572 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://10.10.10.127:6443/api/v1/pods?fieldSelector=spec.nodeName%3Dlinuxkit-823fcf436dfc&resourceVersion=0: dial tcp 10.10.10.127:6443: getsockopt: connection refused
W1213 19:31:02.260304     572 kubelet_network.go:62] Hairpin mode set to "promiscuous-bridge" but container runtime is "remote", ignoring
I1213 19:31:02.263340     572 kubelet.go:517] Hairpin mode set to "none"
I1213 19:31:02.263547     572 remote_runtime.go:43] Connecting to runtime service unix:///var/run/cri-containerd.sock
2017/12/13 19:31:02 grpc: addrConn.resetTransport failed to create client transport: connection error: desc = "transport: dial unix /var/run/cri-containerd.sock: connect: no such file or directory"; Reconnecting to {/var/run/cri-containerd.sock <nil>}
E1213 19:31:02.282627     572 remote_runtime.go:69] Version from runtime service failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
E1213 19:31:02.282857     572 kuberuntime_manager.go:160] Get runtime version failed: rpc error: code = Unavailable desc = grpc: the connection is unavailable
error: failed to run Kubelet: failed to create kubelet: rpc error: code = Unavailable desc = grpc: the connection is unavailable

cri-containerd log

I1213 19:31:00.959698     535 cri_containerd.go:100] Run cri-containerd &{Config:{ContainerdConfig:{RootDir:/var/lib/containerd Snapshotter:overlayfs Endpoint:/run/containerd/containerd.sock Runtime:io.containerd.runtime.v1.linux RuntimeEngine: RuntimeRoot:} CniConfig:{NetworkPluginBinDir:/var/lib/cni/opt/bin NetworkPluginConfDir:/var/lib/cni/etc/net.d} SocketPath:/var/run/cri-containerd.sock RootDir:/var/lib/cri-containerd StreamServerAddress: StreamServerPort:10010 CgroupPath: EnableSelinux:false SandboxImage:gcr.io/google_containers/pause:3.0 StatsCollectPeriod:10 SystemdCgroup:false OOMScore:-999 EnableProfiling:true ProfilingPort:10011 ProfilingAddress:127.0.0.1} ConfigFilePath:/etc/cri-containerd/config.toml}
I1213 19:31:00.961008     535 cri_containerd.go:104] Start profiling server
I1213 19:31:00.961020     535 cri_containerd.go:108] Run cri-containerd grpc server on socket "/var/run/cri-containerd.sock"
I1213 19:31:00.966705     535 service.go:155] Get device uuid "d358f4da-8808-4f6c-8b8a-fcc647914e7d" for image filesystem "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
time="2017-12-13T19:31:00Z" level=info msg="CNI network weave (type=weave-net) is used from /var/lib/cni/etc/net.d/10-weave.conf" 
time="2017-12-13T19:31:00Z" level=info msg="CNI network weave (type=weave-net) is used from /var/lib/cni/etc/net.d/10-weave.conf" 
I1213 19:31:00.977500     535 service.go:182] Start cri-containerd service
I1213 19:31:00.977526     535 service.go:184] Start recovering state
I1213 19:31:02.346005     535 service.go:190] Start event monitor
I1213 19:31:02.346016     535 service.go:194] Start snapshots syncer
I1213 19:31:02.346028     535 service.go:203] Start streaming server
I1213 19:31:02.346032     535 service.go:214] Start grpc server
I1213 19:31:02.346127     535 events.go:94] TaskExit event &TaskExit{ContainerID:rngd,ID:rngd,Pid:636,ExitStatus:1,ExitedAt:2017-12-13 19:31:01.834075925 +0000 UTC,}
E1213 19:31:02.346164     535 events.go:100] Failed to get container "rngd": does not exist
I1213 19:31:02.347662     535 events.go:94] TaskExit event &TaskExit{ContainerID:kubelet,ID:kubelet,Pid:572,ExitStatus:1,ExitedAt:2017-12-13 19:31:02.300502105 +0000 UTC,}
E1213 19:31:02.347679     535 events.go:100] Failed to get container "kubelet": does not exist

Describe the results you expected:
A running kubelet after the reboot.

Additional information you deem important (e.g. issue happens only occasionally):
I could reproduce this going back to at least f9a2a31.
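
Reading the timestamps together, kubelet (pid 572) fails its CRI dial at 19:31:02.28 with "dial unix /var/run/cri-containerd.sock: connect: no such file or directory", while cri-containerd only finishes state recovery and starts its grpc server at 19:31:02.34, so this looks like a startup ordering race. A minimal sketch of a possible workaround, assuming kubelet.sh can gate on the runtime socket the same way it already waits for /etc/kubernetes/kubelet.conf (the wait loop and kubelet flags below are illustrative, not the actual LinuxKit script):

#!/bin/sh
# Hypothetical addition to kubelet.sh: block until the cri-containerd socket
# exists before exec'ing kubelet, so the CRI gRPC dial cannot race the
# runtime's startup and state recovery seen in the logs above.
CRI_SOCKET=/var/run/cri-containerd.sock
until [ -S "$CRI_SOCKET" ]; do
    echo "kubelet.sh: waiting for $CRI_SOCKET"
    sleep 1
done
echo "kubelet.sh: $CRI_SOCKET has arrived"
exec kubelet \
    --container-runtime=remote \
    --container-runtime-endpoint="unix://$CRI_SOCKET" \
    "$@"

This only papers over the ordering; retrying the runtime connection inside kubelet (or restarting the kubelet service on failure) would address the same race without changing the script.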
