This directory contains files related to the AgentBaker E2E testing framework.
AgentBaker E2E tests verify that node bootstrapping artifacts generated by the AgentBaker API are correct and capable of integrating Azure VMs into Azure Kubernetes Service (AKS) clusters.
The E2E scenario template is defined in RunScenario. At a high level, for each scenario:
- A new VMSS containing a single VM is created.
- CSE and custom data are generated and applied to the new VM so it can bootstrap and register with the AKS cluster apiserver.
- Liveness and health checks are then run to make sure the new VM's kubelet is posting NodeReady, and that workload pods can be scheduled and run on the new node.
To write an E2E scenario:
- Choose a testing cluster. Several are defined in cache.go, e.g.:
  - ClusterKubenet
  - ClusterAzureNetwork
  - ClusterAzureOverlayNetwork
  - ClusterAzureOverlayNetworkDualStack
  - ClusterCiliumNetwork
  - ClusterLatestKubernetesVersion
  - ClusterAzureBootstrapProfileCache (private ACR)
  - ClusterAzureNetworkIsolated (no internet access)
- Use NodeBootstrappingConfiguration (nbc) to set up your scenario. It is used to invoke the primary node-bootstrapping API, GetLatestNodeBootstrapping. To modify agent pool properties, you usually need to set both nbc.containerService.properties.AgentPoolProfiles[0].xxx and nbc.agentPoolProfile, because this is how the RP sets these properties when it invokes AgentBaker, and the E2E tests follow the same pattern.
- Use VMConfigMutator to set VMSS properties such as SKU when needed. Check vmss for other configs. If you change the VMSS SKU, you must also set nbc.agentPoolProfile.VMSize to match it.
- Use Validator to add your own verification of the VM's live state, such as file existence, sysctl settings, etc.
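The requirement to keep the two agent pool profile locations and the VMSS SKU in sync can be illustrated with a small sketch. The types below are simplified, hypothetical stand-ins (field names such as SKUName are assumptions, not the real AgentBaker or Azure SDK definitions), and setVMSize is an illustrative helper rather than a function provided by the framework.

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins for the real AgentBaker and VMSS types.
type AgentPoolProfile struct{ VMSize string }

type Properties struct{ AgentPoolProfiles []*AgentPoolProfile }

type ContainerService struct{ Properties *Properties }

type NodeBootstrappingConfiguration struct {
	ContainerService *ContainerService
	AgentPoolProfile *AgentPoolProfile
}

type VMSS struct{ SKUName string }

// setVMSize writes the VM size to both agent pool profile locations on the
// nbc and to the VMSS model, so the three values cannot drift apart.
func setVMSize(nbc *NodeBootstrappingConfiguration, vmss *VMSS, size string) {
	nbc.ContainerService.Properties.AgentPoolProfiles[0].VMSize = size
	nbc.AgentPoolProfile.VMSize = size
	vmss.SKUName = size
}

func main() {
	nbc := &NodeBootstrappingConfiguration{
		ContainerService: &ContainerService{
			Properties: &Properties{
				AgentPoolProfiles: []*AgentPoolProfile{{VMSize: "Standard_D2ds_v5"}},
			},
		},
		AgentPoolProfile: &AgentPoolProfile{VMSize: "Standard_D2ds_v5"},
	}
	vmss := &VMSS{SKUName: "Standard_D2ds_v5"}

	// Switch the scenario to a GPU SKU in one place.
	setVMSize(nbc, vmss, "Standard_NC6s_v3")
	fmt.Println(nbc.AgentPoolProfile.VMSize, vmss.SKUName)
}
```

In a real scenario the nbc change and the VMSS change would typically live in separate mutator callbacks; the point of the sketch is only that all three values must agree.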
All E2E clusters share a single VNet and Azure Bastion in the abe2e-{location} resource group. This
avoids creating a per-cluster Bastion (~10 min each) and ensures all clusters are reachable from a
single SSH entry point.
graph TB
subgraph RG["abe2e-{location} Resource Group"]
subgraph VNET["abe2e-shared-vnet (10.0.0.0/8)"]
BASTION_SUBNET["AzureBastionSubnet<br/>10.0.0.0/26"]
FW_SUBNET["AzureFirewallSubnet<br/>10.0.1.0/24"]
PE_SUBNET["abe2e-pe-subnet<br/>10.0.2.0/24<br/>(shared private endpoints)"]
KUBENET_SUBNET["aks-subnet-abe2e-kubenet-v5<br/>10.x.x.0/20"]
AZNET_SUBNET["aks-subnet-abe2e-azure-network-v4<br/>10.x.x.0/20"]
MORE_SUBNETS["... more cluster subnets"]
end
BASTION["abe2e-shared-bastion<br/>(Standard SKU, Tunneling)"]
FIREWALL["abe2e-fw<br/>(Azure Firewall)"]
IDENTITY["abe2e-cluster-identity<br/>(User-Assigned MSI)"]
PE_ACR["PE-for-abe2eprivate{location}<br/>PE-for-abe2eprivatenonanon{location}<br/>(shared ACR private endpoints)"]
DNS_ZONE["privatelink.azurecr.io<br/>(Private DNS Zone)"]
ACR_ANON["abe2eprivate{location}<br/>(Private ACR)"]
ACR_NONANON["abe2eprivatenonanon{location}<br/>(Non-anonymous Private ACR)"]
end
subgraph MC_KUBENET["MC_abe2e-kubenet-v5 Resource Group"]
VMSS_K["VMSS (system pool)"]
VMSS_K_TEST["VMSS (test VMs)"]
RT_K["Route Table<br/>(pod routes + firewall)"]
end
subgraph MC_NI["MC_abe2e-azure-networkisolated-v2 Resource Group"]
VMSS_NI["VMSS (system pool)"]
NSG_NI["NSG<br/>(blocks internet)"]
end
BASTION --> BASTION_SUBNET
FIREWALL --> FW_SUBNET
PE_ACR --> PE_SUBNET
DNS_ZONE -.->|VNet link| VNET
VMSS_K --> KUBENET_SUBNET
RT_K -.->|associated| KUBENET_SUBNET
VMSS_NI --> AZNET_SUBNET
NSG_NI -.->|associated| AZNET_SUBNET
DEV["Developer / CI"]
DEV -->|SSH via tunnel| BASTION
BASTION -->|"connects to any VM<br/>in shared VNet"| VMSS_K_TEST
The shared infrastructure is created automatically on first test run via cached idempotent functions — no separate setup script is needed.
| Resource | Name | Details |
|---|---|---|
| VNet | abe2e-shared-vnet | 10.0.0.0/8, supports ~4096 /20 cluster subnets |
| Bastion | abe2e-shared-bastion | Standard SKU with tunneling enabled for native SSH |
| Bastion Subnet | AzureBastionSubnet | 10.0.0.0/26 (required by Azure Bastion) |
| Firewall Subnet | AzureFirewallSubnet | 10.0.1.0/24 |
| PE Subnet | abe2e-pe-subnet | 10.0.2.0/24, hosts shared private endpoints for ACRs |
| Identity | abe2e-cluster-identity | User-assigned MSI with Network Contributor on the VNet |
| Private DNS Zone | privatelink.azurecr.io | Shared zone in abe2e-{location} RG, linked to the VNet |
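The "cached idempotent functions" that create these shared resources can be pictured as a per-key memoized call: the first caller for a location does the work, and every later caller gets the stored result. The sketch below is one possible way to do that with sync.Once; onceCache and its methods are hypothetical names for illustration and are not taken from the framework.

```go
package main

import (
	"fmt"
	"sync"
)

// onceEntry holds the result of a single create call.
type onceEntry[V any] struct {
	once sync.Once
	val  V
	err  error
}

// onceCache memoizes an expensive setup call per key (hypothetical helper).
type onceCache[V any] struct {
	mu      sync.Mutex
	entries map[string]*onceEntry[V]
}

func newOnceCache[V any]() *onceCache[V] {
	return &onceCache[V]{entries: make(map[string]*onceEntry[V])}
}

// Get runs create at most once per key. Concurrent callers for the same key
// block until the first call finishes and then observe the same result.
func (c *onceCache[V]) Get(key string, create func() (V, error)) (V, error) {
	c.mu.Lock()
	e, ok := c.entries[key]
	if !ok {
		e = &onceEntry[V]{}
		c.entries[key] = e
	}
	c.mu.Unlock()

	e.once.Do(func() { e.val, e.err = create() })
	return e.val, e.err
}

func main() {
	cache := newOnceCache[string]()
	calls := 0
	ensureVNet := func() (string, error) {
		calls++ // stands in for a slow create-or-verify call against Azure
		return "abe2e-shared-vnet", nil
	}
	for i := 0; i < 3; i++ {
		name, _ := cache.Get("westus3", ensureVNet)
		fmt.Println(name)
	}
	fmt.Println("create calls:", calls)
}
```

Because the underlying create call is itself idempotent (create or verify), caching only saves time within a test run; a fresh process can safely repeat the call.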
Each AKS cluster gets its own /20 subnet (4091 usable IPs) in the shared VNet. The subnet is
named aks-subnet-{clusterName}. CIDRs are auto-allocated from a hash of the cluster name to
avoid collisions.
All clusters use BYOV (Bring Your Own VNet) with the shared VNet. They differ in networking plugin, isolation level, and whether private ACR is needed.
| Cluster | Network Plugin | Special Features | Private ACR |
|---|---|---|---|
| abe2e-kubenet-v5 | Kubenet | Basic pod routing via route table | ❌ |
| abe2e-azure-network-v4 | Azure CNI | Pods get IPs from subnet (MaxPods=30) | ❌ |
| abe2e-azure-overlay-network-v4 | Azure CNI Overlay | Pods in virtual overlay, not subnet | ❌ |
| abe2e-azure-overlay-dualstack-v4 | Azure CNI Overlay | IPv4+IPv6 dual-stack | ❌ |
| abe2e-cilium-network-v4 | Azure CNI + Cilium | eBPF dataplane, replaces kube-proxy | ❌ |
| abe2e-latest-kubernetes-version-v2 | Kubenet | Auto-discovers latest GA K8s version | ❌ |
| abe2e-azure-bootstrapprofile-cache-v2 | Azure CNI | Bootstrap artifact caching from private ACR | ✅ |
| abe2e-azure-networkisolated-v2 | Azure CNI | NSG blocks all internet except allowlist | ✅ |
The network-isolated cluster adds an NSG to its subnet that blocks all outbound traffic except
management.azure.com, the cluster FQDN, and packages.aks.azure.com. Private endpoints for
the ACRs are in the shared PE subnet, with DNS records in the shared privatelink.azurecr.io zone.
- CachedEnsureSharedInfra: runs once per location per test run. Creates/verifies the shared VNet, Bastion, Firewall, PE subnet, and user-assigned identity.
- configureSharedVNet: tags the cluster model for BYOV. After the cluster name is hashed, CachedEnsureClusterSubnet creates the cluster's dedicated /20 subnet.
- prepareCluster: creates/gets the AKS cluster, then runs a DAG of parallel tasks:
  - Bastion lookup (shared)
  - Firewall route table (non-isolated clusters)
  - NSG association (network-isolated cluster)
  - Private DNS zone + VNet link (if ACR needed, runs once before ACR tasks)
  - Private ACR + PE creation (bootstrapprofile-cache and network-isolated)
  - VMSS garbage collection
  - Debug daemonsets
- SSH to test VMs goes through the shared Bastion, which can reach any VM in the VNet.
sequenceDiagram
participant CI as Developer / CI
participant Infra as Shared Infra (cached)
participant ARM as Azure Resource Manager
participant AB as AgentBaker API
participant Bastion as Shared Bastion
participant VM as Test VM
participant K8s as Kube API Server
CI->>Infra: Ensure shared VNet + Bastion
Infra-->>CI: Ready (cached after first run)
CI->>Infra: Ensure cluster subnet
Infra-->>CI: Subnet ID
CI->>ARM: Create/Get AKS cluster (BYOV subnet)
ARM-->>CI: Cluster details
CI->>AB: Generate CSE + CustomData
AB-->>CI: VM configuration
CI->>ARM: Create VMSS in cluster subnet
ARM-->>CI: VM instance
CI->>Bastion: SSH tunnel to VM private IP
Bastion->>VM: Forward SSH connection
CI->>VM: Run health checks + validators
VM-->>CI: Results
CI->>K8s: Verify node ready
K8s-->>CI: Node ready ✓
Bastion-->>CI: Close tunnel
Note: if you have changed code or artifacts used to generate custom data or custom script extension payloads, you
should set DISABLE_SCRIPTLESS=true in .env. Otherwise, scriptless provisioning uses only the scripts that are built into the VHD.
To run the E2E test suite locally, use e2e-local.sh. This script sets up the go test command.
Check config.go for the default configuration parameters. You can override these parameters by setting ENV variables.
Create a .env file in the e2e directory to set environment variables and avoid manual setup each time you run tests.
Refer to .env.sample for an example.
Use TAGS_TO_RUN= to specify scenarios based on tags. By default, all scenarios run. Multiple tags should be
comma-separated and are case-insensitive. Check logs for test tags.
Example:
TAGS_TO_RUN="os=ubuntu,arch=amd64,wasm=false,gpu=false,imagename=2404gen2containerd" ./e2e-local.sh

To exclude scenarios, use TAGS_TO_SKIP=. Scenarios with any of the specified tags are skipped (this logic differs from
TAGS_TO_RUN).
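The filter semantics can be summarized in a short sketch. It assumes that TAGS_TO_RUN requires every listed tag to match (which is what the contrast with TAGS_TO_SKIP suggests), that a filter list consisting only of name= entries matches any of the listed names, and that skip filters take precedence. The function names are hypothetical and this is not the framework's actual matcher.

```go
package main

import (
	"fmt"
	"strings"
)

// parseTags splits "k=v,k=v" into lower-cased key/value pairs.
func parseTags(s string) [][2]string {
	var out [][2]string
	for _, part := range strings.Split(s, ",") {
		k, v, ok := strings.Cut(strings.TrimSpace(part), "=")
		if !ok {
			continue // ignore empty or malformed entries
		}
		out = append(out, [2]string{strings.ToLower(k), strings.ToLower(v)})
	}
	return out
}

// matches reports whether the scenario carries the tag, ignoring case.
func matches(scenario map[string]string, f [2]string) bool {
	for k, v := range scenario {
		if strings.EqualFold(k, f[0]) && strings.EqualFold(v, f[1]) {
			return true
		}
	}
	return false
}

// shouldRun applies skip filters first (any match excludes), then run filters
// (all must match, unless every filter is a name filter, in which case any
// match is enough).
func shouldRun(scenario map[string]string, tagsToRun, tagsToSkip string) bool {
	for _, f := range parseTags(tagsToSkip) {
		if matches(scenario, f) {
			return false
		}
	}
	run := parseTags(tagsToRun)
	if len(run) == 0 {
		return true // by default, all scenarios run
	}
	allNames, matched := true, 0
	for _, f := range run {
		if f[0] != "name" {
			allNames = false
		}
		if matches(scenario, f) {
			matched++
		}
	}
	if allNames {
		return matched > 0
	}
	return matched == len(run)
}

func main() {
	scenario := map[string]string{"name": "Test_ubuntu2204", "os": "ubuntu", "arch": "amd64", "gpu": "false"}
	fmt.Println(shouldRun(scenario, "os=ubuntu,arch=amd64", ""))
	fmt.Println(shouldRun(scenario, "os=ubuntu,arch=arm64", ""))
	fmt.Println(shouldRun(scenario, "name=Test_azurelinuxv2,name=Test_ubuntu2204", ""))
	fmt.Println(shouldRun(scenario, "", "gpu=false"))
}
```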
To run a specific test, use the test name:
TAGS_TO_RUN="name=Test_azurelinuxv2" ./e2e-local.sh
# or
go test -run Test_azurelinuxv2 -v -timeout 90m

To run multiple specific scenarios by name, provide multiple Name= filters. When all filters are
Name= filters, OR semantics are used automatically (matching any of the listed names):
TAGS_TO_RUN="name=Test_azurelinuxv2,name=Test_ubuntu2204" ./e2e-local.sh

Set KEEP_VMSS=true to retain bootstrapped VMs for debugging. Setting this also includes the VM's private SSH key
in each scenario's log bundle. When using this flag, run only the tests you need to debug, because the
VMs will not be deleted after the test run.
Run tests with custom arguments after setting required environment variables:
go test -parallel 100 -timeout 90m -v -count 1

Important go test flags:
- -v: Verbose output
- -parallel 100: Run 100 tests in parallel; the default is limited to the number of cores
- -timeout 90m: Set the timeout; the default is 10 minutes, which is often exceeded
- -count 1: Disable test caching
Azure resources are deleted periodically by an external garbage collector. Locally stopped tests attempt a graceful
shutdown to clean up resources. Old VMs are deleted on startup unless created with KEEP_VMSS=true.
Shell: set GOFLAGS="-timeout=90m -parallel=100" in your shell configuration file.
GoLand: in Run > Edit Configurations..., set -timeout=90m -parallel=100 in the Go tool arguments field.
VS Code: add to settings.json:
{
"go.testFlags": [
"-parallel=100",
"-v"
],
"go.testTimeout": "90m"
}

The top-level package of the Golang E2E implementation is named e2e and is entirely separate from all AgentBaker
packages.
The definitions and entry points for each test scenario, run by go test, are located
in scenario_test.go.
Node images are pushed to a Shared Image Gallery (SIG). Each image is tagged with the branch name and build ID.
By default, E2E tests use the latest image versions from the SIG with the branch=refs/heads/main tag.
Set SIG_VERSION_TAG_NAME and SIG_VERSION_TAG_VALUE to specify custom VHD builds:
SIG_VERSION_TAG_NAME=buildId SIG_VERSION_TAG_VALUE=123456789 TAGS_TO_RUN="os=ubuntu2204" ./e2e-local.sh

When adding tests for a new VHD image, add a delete lock to prevent the garbage collector from deleting the image version.
E2E scenarios can be configured with VMSS configuration mutators that change or set properties on the VMSS model used to deploy the new VM to be bootstrapped. This is primarily useful when testing different VM SKUs, especially for GPU-enabled scenarios, which affect the code paths AgentBaker uses to generate CSE and custom data.
Further, to support E2E scenarios that test different underlying AKS cluster configurations, such as the cluster's network plugin, each E2E scenario uses one of the predefined clusters. The same cluster can be reused across test runs. If the cluster doesn't exist, a new one is created automatically.
Lastly, E2E scenarios also consist of a list of live VM validators. Each live VM validator consists of a description, a bash command which will actually be run on the newly bootstrapped VM, and an "asserter" function that will perform assertions on the contents of both the stdout and stderr streams that result from the execution of the command. The validators can be used to assert on numerous types of properties of the live VM, such as the live file system and kernel state.
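The three parts of a live VM validator (description, command, asserter) can be sketched as follows. The LiveVMValidator struct, runValidator and sysctlValidator are hypothetical names used for illustration, and the command runner is a stub standing in for execution over SSH on the bootstrapped VM.

```go
package main

import (
	"fmt"
	"strings"
)

// LiveVMValidator is a hypothetical shape for a validator: a description,
// a bash command, and an asserter over the command's stdout and stderr.
type LiveVMValidator struct {
	Description string
	Command     string
	Asserter    func(stdout, stderr string) error
}

// runValidator executes the command through run (a stand-in for running the
// command on the VM) and applies the asserter to its output.
func runValidator(v LiveVMValidator, run func(cmd string) (stdout, stderr string)) error {
	stdout, stderr := run(v.Command)
	if err := v.Asserter(stdout, stderr); err != nil {
		return fmt.Errorf("validator %q failed: %w", v.Description, err)
	}
	return nil
}

// sysctlValidator builds a validator that checks one kernel setting.
func sysctlValidator(key, want string) LiveVMValidator {
	return LiveVMValidator{
		Description: fmt.Sprintf("sysctl %s is %s", key, want),
		Command:     "sysctl -n " + key,
		Asserter: func(stdout, stderr string) error {
			if stderr != "" {
				return fmt.Errorf("unexpected stderr: %s", stderr)
			}
			if got := strings.TrimSpace(stdout); got != want {
				return fmt.Errorf("got %q, want %q", got, want)
			}
			return nil
		},
	}
}

func main() {
	v := sysctlValidator("net.ipv4.ip_forward", "1")
	// Canned output in place of a real SSH session.
	fakeVM := func(cmd string) (string, string) { return "1\n", "" }
	if err := runValidator(v, fakeVM); err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("ok:", v.Description)
}
```

Keeping the asserter separate from the command makes it easy to reuse one command with different expectations, or to test the assertion logic without a VM.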
Each E2E scenario will generate its own logs after execution. Currently, these logs consist of:
- cluster-provision.log: CSE execution log, retrieved from /var/log/azure/aks/cluster-provision.log (collected in success and CSE failure cases)
- kubelet.log: the kubelet systemd unit's logs, retrieved by running journalctl -u kubelet on the VM after bootstrapping has finished (collected in success and CSE failure cases)
- vmssId.txt: a single-line text file containing the unique resource ID of the VMSS created by the respective scenario, mainly collected for post hoc resource deletion (collected in all cases where the VMSS can be created)
These logs will be uploaded in a bundle of the format:
└── scenario-logs
    └── <scenario>
        ├── cluster-provision.log
        ├── kubelet.log
        └── vmssId.txt

After a PR is created in AgentBaker's repo on GitHub, a pipeline calculating code coverage changes will automatically run.
We are utilizing coveralls to display the coverage report. The coverage report will be available in the PR's description. You can also view previous runs for the AgentBaker repo here.
We calculate code coverage for both unit tests and E2E tests.
To generate E2E coverage reports, we use the coverage instrumentation for built binaries (go build -cover) introduced in Go 1.20.
The coverage report is generated by running AgentBaker's API server locally as a binary built with the -cover flag. E2E tests are then run against that binary.
The following packages are used during calculation of coverage for E2E tests:
- github.com/Azure/agentbaker/apiserver
- github.com/Azure/agentbaker/cmd
- github.com/Azure/agentbaker/cmd/starter
- github.com/Azure/agentbaker/pkg/agent
- github.com/Azure/agentbaker/pkg/agent/datamodel
- github.com/Azure/agentbaker/pkg/templates
You can generate an E2E coverage report while running the E2E tests locally. To do so, follow the steps below:
- Build the AgentBaker server binary with the -cover flag:
cd cmd
go build -cover -o baker -covermode count
- Create a directory for the coverage report files:
mkdir -p covdatafiles
- Run the binary:
GOCOVERDIR=covdatafiles ./baker start &
- Run the E2E tests locally (from the repository root):
/bin/bash e2e/e2e-local.sh
- Stop the binary. Once the tests finish executing, you have to stop the binary with exit code 0 to generate the report. See the docs here.
kill $(pgrep baker)
- Display the coverage report within the terminal:
go tool covdata percent -i=./cmd/covdatafiles