KangalPatch automates rolling upgrades of Talos Linux nodes in Kubernetes clusters. The operator handles node draining, OS updates, reboots, and readiness checks while respecting PodDisruptionBudgets and failure thresholds.
Key features:
- Controlled concurrent node upgrades
- Automatic workload draining with PDB enforcement
- Configurable failure budgets with automatic halt on threshold breach
- Maintenance window support with date exclusions
- Pause and resume support for manual intervention
- Real-time upgrade status tracking
The operator watches PatchPlan resources. When you create one, it:
- Selects nodes based on labels and role
- For each node: drain → upgrade → reboot → verify
- Respects your concurrency, timing, and failure settings
If failures exceed your threshold, it stops automatically.
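The halt-on-threshold behaviour can be pictured with a small shell simulation. This is only a sketch, not the operator's code: `patch_node` and `run_plan` are hypothetical names, nodes are processed one at a time (as with `maxConcurrency: 1`), and the plan is assumed to halt as soon as the failure count exceeds the threshold.

```shell
# Hypothetical stand-in for drain -> upgrade -> reboot -> verify.
# It "fails" for any node whose name ends in "-bad".
patch_node() {
  case "$1" in
    *-bad) return 1 ;;
    *) return 0 ;;
  esac
}

# run_plan MAX_FAILURES NODE...
# Processes nodes in order and halts once failures exceed MAX_FAILURES.
run_plan() {
  max_failures=$1
  shift
  failures=0
  for node in "$@"; do
    if patch_node "$node"; then
      echo "ok $node"
    else
      failures=$((failures + 1))
      echo "failed $node"
    fi
    if [ "$failures" -gt "$max_failures" ]; then
      echo "halted"
      return 1
    fi
  done
  echo "completed"
}

# One failure is tolerated; the second one stops the run before w5.
run_plan 1 w1 w2-bad w3 w4-bad w5 || echo "plan stopped early"
```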
Prerequisites:
- Kubernetes cluster running Talos Linux
- `kubectl` configured to access your cluster
- Helm 3.x (for Helm installation)
# Install KangalPatch
helm install kangal-patch oci://ghcr.io/uozalp/helm/kangal-patch \
  --version 0.1.2 \
  --namespace kangal-patch \
  --create-namespace

# Install CRDs
kubectl apply -k config/crd
# Install RBAC and operator
kubectl apply -k config/manager

First, create a secret containing your Talos API credentials:
# Extract credentials from your talosconfig (typically ~/.talos/config)
# Encode credentials to base64
CA_CERT=$(base64 -w0 < /path/to/ca.crt)
CLIENT_CERT=$(base64 -w0 < /path/to/client.crt)
CLIENT_KEY=$(base64 -w0 < /path/to/client.key)
# Create the secret with base64-encoded values
kubectl create secret generic talos-credentials \
  --namespace kangal-patch \
  --from-literal=ca.crt="$CA_CERT" \
  --from-literal=tls.crt="$CLIENT_CERT" \
  --from-literal=tls.key="$CLIENT_KEY"

Create a PatchPlan custom resource to define your upgrade:
apiVersion: kangalpatch.ozalp.dk/v1alpha1
kind: PatchPlan
metadata:
  name: simple-upgrade
spec:
  target:
    version: v1.11.6
    source: ghcr
  # Patch workers first, then control plane
  patchWorkers: true
  patchControlPlane: true
  controlPlaneFirst: false
  # Batch configuration
  maxConcurrency: 1
  # Timing
  delayBetweenNodes: 300s
  # Safety
  respectPDBs: true
  drainTimeout: 5m
  rebootTimeout: 10m
  maxFailures: 1
  # Talos API
  talosConfig:
    endpoints:
      - 10.0.0.10:50000
    secretRef:
      name: talos-credentials
      namespace: kangal-patch

Example using Talos Factory images:
The target specification uses individual fields to construct the factory image URL. The operator builds the full URL in the format:
factory.talos.dev/{installer}-installer[-secureboot]/{schematicID}:{version}
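As a quick sanity check of that format, the URL can be assembled by hand. The helper below is a sketch that follows only the pattern stated above; `factory_image` is a hypothetical name, and the operator's actual construction logic may differ in details such as validation.

```shell
# factory_image INSTALLER SCHEMATIC_ID VERSION [SECURE_BOOT]
# SECURE_BOOT is "true" or "false" (default: false).
factory_image() {
  installer=$1
  schematic=$2
  version=$3
  secure_boot=${4:-false}

  # The -secureboot suffix is appended only when secure boot is enabled.
  suffix=""
  if [ "$secure_boot" = "true" ]; then
    suffix="-secureboot"
  fi

  printf 'factory.talos.dev/%s-installer%s/%s:%s\n' \
    "$installer" "$suffix" "$schematic" "$version"
}

factory_image aws 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba v1.12.1 true
```

Comparing the printed reference with the image you expect from the Talos factory is an easy way to catch a wrong installer type or schematic ID before applying a plan.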
Field breakdown:
target:
  version: v1.11.6            # The Talos version tag
  source: factory             # Use factory.talos.dev (vs ghcr)
  installer: nocloud          # The installer type (aws, azure, nocloud, etc.)
  schematicID: 95d432d6bb...  # The factory schematic hash
  secureBoot: true            # Adds -secureboot suffix to installer

Full example:
apiVersion: kangalpatch.ozalp.dk/v1alpha1
kind: PatchPlan
metadata:
  name: talos-upgrade-factory
spec:
  target:
    version: v1.12.1
    source: factory
    installer: aws
    schematicID: 376567988ad370138ad8b2698212367b8edcb69b5fd68c80be1f2ec7d603b4ba
    secureBoot: true
  patchWorkers: true
  patchControlPlane: true
  maxConcurrency: 2
  talosConfig:
    endpoints:
      - 10.0.0.10:50000
    secretRef:
      name: talos-credentials
      namespace: kangal-patch

Apply the PatchPlan:
kubectl apply -f patchplan.yaml

Restrict patching to specific time windows and exclude certain dates:
apiVersion: kangalpatch.ozalp.dk/v1alpha1
kind: PatchPlan
metadata:
  name: talos-upgrade-maintenance
spec:
  target:
    version: v1.11.6
    source: ghcr
  # ... other configuration ...
  # Maintenance windows
  maintenance:
    # Exclude specific dates (holidays, blackout periods)
    excludeDates:
      - "2026-12-24"
      - "2026-12-25"
      - "2026-12-26"
      - "2026-12-31"
      - "2027-01-01"
    # Define when patching is allowed (UTC)
    windows:
      # Monday and Friday early morning
      - days: ["Monday", "Friday"]  # Supports: "Monday", "Mon", "monday"
        startTime: "01:00"
        endTime: "05:00"
      # Wednesday night window
      - days: ["Wed"]
        startTime: "22:00"
        endTime: "02:00"  # Spans midnight
      # Every day window (omit days field or use ["Any"])
      - startTime: "03:00"
        endTime: "04:00"

Notes on maintenance windows:
- All times are in UTC
- Day names accept full names ("Monday") or 3-letter abbreviations ("Mon") and are case-insensitive
- Omit the `days` field or use `["Any"]` to match all days
- Windows can span midnight (e.g., 22:00 to 02:00)
- Patching will be paused outside maintenance windows
- Exclude dates use YYYY-MM-DD format
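The midnight-spanning rule is the least obvious of these notes, so here is a minimal sketch of one way to evaluate it. It is not taken from the operator: `to_minutes` and `in_window` are hypothetical helpers, the start time is assumed to be inclusive and the end time exclusive, and day-of-week matching and `excludeDates` are left out.

```shell
# Convert HH:MM to minutes since midnight.
to_minutes() {
  h=${1%%:*}
  m=${1##*:}
  # Strip one leading zero so values like "08" are not read as octal.
  h=${h#0}
  m=${m#0}
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# in_window START END NOW (all HH:MM, UTC); succeeds if NOW is inside.
in_window() {
  start=$(to_minutes "$1")
  end=$(to_minutes "$2")
  now=$(to_minutes "$3")
  if [ "$start" -le "$end" ]; then
    # Same-day window, e.g. 01:00 to 05:00.
    [ "$now" -ge "$start" ] && [ "$now" -lt "$end" ]
  else
    # Window spans midnight, e.g. 22:00 to 02:00.
    [ "$now" -ge "$start" ] || [ "$now" -lt "$end" ]
  fi
}

in_window 22:00 02:00 23:30 && echo "23:30 is inside 22:00-02:00"
in_window 22:00 02:00 12:00 || echo "12:00 is outside 22:00-02:00"
```

To test against the current time, pass `"$(date -u +%H:%M)"` as the third argument.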
Watch the upgrade progress:
# Watch status in real-time
$ kubectl get patchplan -w
NAME             PHASE       TARGET    TOTAL   COMPLETED   FAILED   AGE
simple-upgrade   Completed   v1.11.6   6       6           0        79m
# Check individual node status
$ kubectl get patchplan simple-upgrade -o jsonpath='{.status}' | jq
{
  "completedNodes": 6,
  "completionTime": "2025-12-28T20:54:03Z",
  "failedNodes": 0,
  "lastNodeScheduledAt": "2025-12-28T20:54:03Z",
  "message": "all nodes processed",
  "phase": "Completed",
  "startTime": "2025-12-28T19:32:29Z",
  "targetVersion": "v1.11.6",
  "totalNodes": 6
}

Pause an ongoing upgrade:
kubectl patch patchplan simple-upgrade --type merge -p '{"spec":{"paused":true}}'

Resume:
kubectl patch patchplan simple-upgrade --type merge -p '{"spec":{"paused":false}}'

PatchPlan spec fields:

| Field | Type | Description | Default |
|---|---|---|---|
| `target` | object | Target Talos image specification | Required |
| `nodeSelector` | map | Label selector for nodes | {} |
| `maxConcurrency` | int | Max nodes to patch concurrently | 1 |
| `maxFailures` | int | Max allowed failures before stopping | 0 |
| `delayBetweenNodes` | duration | Delay between nodes | 5m |
| `respectPDBs` | bool | Respect PodDisruptionBudgets | true |
| `drainTimeout` | duration | Max time for node drain | 5m |
| `rebootTimeout` | duration | Max time for reboot | 10m |
| `patchControlPlane` | bool | Patch control plane nodes | true |
| `patchWorkers` | bool | Patch worker nodes | true |
| `controlPlaneFirst` | bool | Patch control plane first | false |
| `paused` | bool | Pause operation | false |
| `maintenance` | object | Maintenance window configuration | nil |

`target` fields:

| Field | Type | Description | Default |
|---|---|---|---|
| `version` | string | Talos version (e.g., v1.12.1) | Required |
| `source` | string | Image source: "ghcr" or "factory" | ghcr |
| `installer` | string | Installer type (e.g., "aws", "nocloud"). Required when source=factory | - |
| `schematicID` | string | Talos factory schematic ID. Required when source=factory | - |
| `secureBoot` | bool | Enable secure boot. Only applicable when source=factory | false |

`maintenance` fields:

| Field | Type | Description |
|---|---|---|
| `excludeDates` | []string | List of dates (YYYY-MM-DD) to exclude from patching |
| `windows` | []MaintenanceWindow | List of time windows when patching is allowed |

MaintenanceWindow fields:

| Field | Type | Description |
|---|---|---|
| `days` | []string | Days of week (e.g., "Monday", "Mon"). Empty = all days |
| `startTime` | string | Start time in HH:MM format (UTC) |
| `endTime` | string | End time in HH:MM format (UTC) |
| `disabled` | bool | Temporarily disable this window |
# Build the binary
make build
# Run tests
make test
# Build Docker image
make docker-build IMG=ghcr.io/uozalp/kangal-patch:dev
# Generate manifests
make manifests
# Generate code
make generate

Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
