Install AMD GPU Kernel drivers if required #5875
Open
Azure Pipelines / Agentbaker E2E
failed
Feb 27, 2025 in 23m 8s
Build #20250227.2 had test failures
Details
- Failed: 3 (4.11%)
- Passed: 70 (95.89%)
- Other: 0 (0.00%)
- Total: 73
Annotations
Check failure on line 9631 in Build log
azure-pipelines / Agentbaker E2E
Build log #L9631
Bash exited with code '1'.
Check failure on line 1 in Test_Ubuntu2204_AirGap_NonAnonymousACR
azure-pipelines / Agentbaker E2E
Test_Ubuntu2204_AirGap_NonAnonymousACR
Failed
Raw output
cluster.go:269: cluster abe2e-kubenet-nonanonpull-airgap-b9a80 already exists in rg abe2e-westus3
cluster.go:123: node resource group: MC_abe2e-westus3_abe2e-kubenet-nonanonpull-airgap-b9a80_westus3
cluster.go:134: using private acr "privateace2enonanonpullwestus3" isAnonyomusPull true
aks_model.go:208: Creating private Azure Container Registry privateace2enonanonpullwestus3 in rg abe2e-westus3
aks_model.go:338: Checking if private Azure Container Registry cache rules are correct in rg abe2e-westus3
aks_model.go:353: Private ACR cache is correct
aks_model.go:217: Private ACR already exists at id /subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/abe2e-westus3/providers/Microsoft.ContainerRegistry/registries/privateace2enonanonpullwestus3, skipping creation
aks_model.go:72: Adding network settings for airgap cluster abe2e-kubenet-nonanonpull-airgap-b9a80 in rg MC_abe2e-westus3_abe2e-kubenet-nonanonpull-airgap-b9a80_westus3
aks_model.go:156: Checking if private endpoint for private container registry is in rg MC_abe2e-westus3_abe2e-kubenet-nonanonpull-airgap-b9a80_westus3
aks_model.go:197: Private Endpoint already exists with ID: /subscriptions/8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8/resourceGroups/MC_abe2e-westus3_abe2e-kubenet-nonanonpull-airgap-b9a80_westus3/providers/Microsoft.Network/privateEndpoints/PE-for-ABE2ETests
aks_model.go:165: Private Endpoint already exists, skipping creation
aks_model.go:108: updated cluster abe2e-kubenet-nonanonpull-airgap-b9a80 subnet with airgap settings
cluster.go:205: assigning ACR-Pull role to a0b1a1cd-db2f-4ffd-b22f-d91d13bec140
kube.go:364: Creating daemonset debug-mariner-tolerated with image privateace2enonanonpullwestus3.azurecr.io/cbl-mariner/base/core:2.0
kube.go:364: Creating daemonset debugnonhost-mariner-tolerated with image privateace2enonanonpullwestus3.azurecr.io/cbl-mariner/base/core:2.0
kube.go:85: waiting for pod app=debug-mariner-tolerated in "default" namespace to be ready
kube.go:106: time before timeout: 14m43.868294178s
kube.go:268: {
"Name": "debug-mariner-tolerated-f7vfc",
"Namespace": "default",
"Containers": [
{
"Name": "mariner",
"Image": "privateace2enonanonpullwestus3.azurecr.io/cbl-mariner/base/core:2.0",
"Ports": null
}
],
"Conditions": null,
"Phase": "Pending",
"StartTime": "2025-02-26T22:10:48Z",
"Events": [
{
"Reason": "FailedToRetrieveImagePullSecret",
"Message": "Unable to retrieve some image pull secrets (acr-secret-code2); attempting to pull the image may not succeed.",
"Count": 1355,
"LastTimestamp": "2025-02-27T03:05:53Z"
}
],
"Logs": "{\"kind\":\"Status\",\"apiVersion\":\"v1\",\"metadata\":{},\"status\":\"Failure\",\"message\":\"container \\\"mariner\\\" in pod \\\"debug-mariner-tolerated-f7vfc\\\" is waiting to start: trying and failing to pull image\",\"reason\":\"BadRequest\",\"code\":400}\n"
}
kube.go:106: time before timeout: 9m43.867558684s
kube.go:268: {
"Name": "debug-mariner-tolerated-f7vfc",
"Namespace": "default",
"Containers": [
{
"Name": "mariner",
"Image": "privateace2enonanonpullwestus3.azurecr.io/cbl-mariner/base/core:2.0",
"Ports": null
}
],
"Conditions": null,
"Phase": "Pending",
"StartTime": "2025-02-26T22:10:48Z",
"Events": [
{
"Reason": "FailedToRetrieveImagePullSecret",
"Message": "Unable to retrieve some image pull secrets (acr-secret-code2); attempting to pull the image may not succeed.",
"Count": 1378,
"LastTimestamp": "2025-02-27T03:11:01Z"
}
],
Check failure on line 1 in Test_Ubuntu2204_GPUNoDriver
azure-pipelines / Agentbaker E2E
Test_Ubuntu2204_GPUNoDriver
Failed
Raw output
azure.go:501: creating VMSS uish-2025-02-27-ubuntu2204gpunodriver in resource group MC_abe2e-westus3_abe2e-kubenet-322d3_westus3
azure.go:514: created VMSS uish-2025-02-27-ubuntu2204gpunodriver in resource group MC_abe2e-westus3_abe2e-kubenet-322d3_westus3
exec.go:190: SSH Instructions: (VM will be automatically deleted after the test finishes, set KEEP_VMSS=true to preserve it or pause the test with a breakpoint before the test finishes)
========================
az account set --subscription 8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8
az aks get-credentials --resource-group abe2e-westus3 --name abe2e-kubenet-322d3 --overwrite-existing
kubectl exec -it debug-mariner-tolerated-swglt -- bash -c "chroot /proc/1/root /bin/bash -c 'ssh -i sshkey102240109 -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=5 [email protected]'"
scenario_helpers_test.go:146: vmss uish-2025-02-27-ubuntu2204gpunodriver creation succeeded
kube.go:147: waiting for node uish-2025-02-27-ubuntu2204gpunodriver to be ready
kube.go:170: node uish-2025-02-27-ubuntu2204gpunodriver000000 is ready. Taints: [{"key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true","effect":"NoSchedule"}] Conditions: [{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2025-02-27T03:03:14Z","lastTransitionTime":"2025-02-27T03:03:14Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2025-02-27T03:03:14Z","lastTransitionTime":"2025-02-27T03:03:14Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2025-02-27T03:03:14Z","lastTransitionTime":"2025-02-27T03:03:14Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2025-02-27T03:03:14Z","lastTransitionTime":"2025-02-27T03:03:14Z","reason":"KubeletReady","message":"kubelet is posting ready status"}]
scenario_helpers_test.go:101: Choosing the private ACR "privateacre2ewestus3" for the vm validation
pod.go:18: creating pod "uish-2025-02-27-ubuntu2204gpunodriver000000-test-pod"
kube.go:85: waiting for pod metadata.name=uish-2025-02-27-ubuntu2204gpunodriver000000-test-pod in "default" namespace to be ready
kube.go:106: time before timeout: 9m49.253335098s
kube.go:268: {
"Name": "uish-2025-02-27-ubuntu2204gpunodriver000000-test-pod",
"Namespace": "default",
"Containers": [
{
"Name": "mariner",
"Image": "mcr.microsoft.com/cbl-mariner/busybox:2.0",
"Ports": [
{
"containerPort": 80,
"protocol": "TCP"
}
]
}
],
"Conditions": null,
"Phase": "Pending",
"StartTime": null,
"Events": [
{
"Reason": "FailedScheduling",
"Message": "0/43 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 10 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 32 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }. preemption: 0/43 nodes are available: 43 Preemption is not helpful for scheduling.",
"Count": 0,
"LastTimestamp": null
},
{
"Reason": "FailedScheduling",
"Message": "0/44 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 10 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 33 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }. preemption: 0/44 nodes are available: 44 Preemption is not helpful for scheduling.",
"Count": 0,
Check failure on line 1 in Test_Ubuntu2404Gen2_GPUNoDriver
azure-pipelines / Agentbaker E2E
Test_Ubuntu2404Gen2_GPUNoDriver
Failed
Raw output
vhd.go:211: finding the latest image version for 2404gen2containerd,
azure.go:412: found the latest image version for 2404gen2containerd, 1.1740597228.1051
vhd.go:224: found the latest image version for 2404gen2containerd, /subscriptions/c4c3550e-a965-4993-a50c-628fd38cd3e1/resourceGroups/aksvhdtestbuildrg/providers/Microsoft.Compute/galleries/PackerSigGalleryEastUS/images/2404gen2containerd/versions/1.1740597228.1051
azure.go:501: creating VMSS ahw6-2025-02-27-ubuntu2404gen2gpunodriver in resource group MC_abe2e-westus3_abe2e-kubenet-322d3_westus3
azure.go:514: created VMSS ahw6-2025-02-27-ubuntu2404gen2gpunodriver in resource group MC_abe2e-westus3_abe2e-kubenet-322d3_westus3
exec.go:190: SSH Instructions: (VM will be automatically deleted after the test finishes, set KEEP_VMSS=true to preserve it or pause the test with a breakpoint before the test finishes)
========================
az account set --subscription 8ecadfc9-d1a3-4ea4-b844-0d9f87e4d7c8
az aks get-credentials --resource-group abe2e-westus3 --name abe2e-kubenet-322d3 --overwrite-existing
kubectl exec -it debug-mariner-tolerated-swglt -- bash -c "chroot /proc/1/root /bin/bash -c 'ssh -i sshkey10224026 -o PasswordAuthentication=no -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ConnectTimeout=5 [email protected]'"
scenario_helpers_test.go:146: vmss ahw6-2025-02-27-ubuntu2404gen2gpunodriver creation succeeded
kube.go:147: waiting for node ahw6-2025-02-27-ubuntu2404gen2gpunodriver to be ready
kube.go:170: node ahw6-2025-02-27-ubuntu2404gen2gpunodriver000000 is ready. Taints: [{"key":"node.cloudprovider.kubernetes.io/uninitialized","value":"true","effect":"NoSchedule"}] Conditions: [{"type":"MemoryPressure","status":"False","lastHeartbeatTime":"2025-02-27T03:03:52Z","lastTransitionTime":"2025-02-27T03:03:52Z","reason":"KubeletHasSufficientMemory","message":"kubelet has sufficient memory available"},{"type":"DiskPressure","status":"False","lastHeartbeatTime":"2025-02-27T03:03:52Z","lastTransitionTime":"2025-02-27T03:03:52Z","reason":"KubeletHasNoDiskPressure","message":"kubelet has no disk pressure"},{"type":"PIDPressure","status":"False","lastHeartbeatTime":"2025-02-27T03:03:52Z","lastTransitionTime":"2025-02-27T03:03:52Z","reason":"KubeletHasSufficientPID","message":"kubelet has sufficient PID available"},{"type":"Ready","status":"True","lastHeartbeatTime":"2025-02-27T03:03:52Z","lastTransitionTime":"2025-02-27T03:03:52Z","reason":"KubeletReady","message":"kubelet is posting ready status"}]
scenario_helpers_test.go:101: Choosing the private ACR "privateacre2ewestus3" for the vm validation
pod.go:18: creating pod "ahw6-2025-02-27-ubuntu2404gen2gpunodriver000000-test-pod"
kube.go:85: waiting for pod metadata.name=ahw6-2025-02-27-ubuntu2404gen2gpunodriver000000-test-pod in "default" namespace to be ready
kube.go:106: time before timeout: 9m3.846523656s
kube.go:268: {
"Name": "ahw6-2025-02-27-ubuntu2404gen2gpunodriver000000-test-pod",
"Namespace": "default",
"Containers": [
{
"Name": "mariner",
"Image": "mcr.microsoft.com/cbl-mariner/busybox:2.0",
"Ports": [
{
"containerPort": 80,
"protocol": "TCP"
}
]
}
],
"Conditions": null,
"Phase": "Pending",
"StartTime": null,
"Events": [
{
"Reason": "FailedScheduling",
"Message": "0/52 nodes are available: 4 node(s) had untolerated taint {node.cloudprovider.kubernetes.io/uninitialized: true}, 42 node(s) had untolerated taint {node.kubernetes.io/network-unavailable: }, 6 node(s) didn't match Pod's node affinity/selector. preemption: 0/52 nodes are available: 52 Preemption is not helpful for scheduling.",
"Count": 0,
"LastTimestamp": nul