Commit 12c889c

Add docs for decommissioning nodes with the operator
1 parent 7d48082 commit 12c889c

3 files changed: +150 -0 lines changed
src/current/v25.2/scale-cockroachdb-operator.md

Lines changed: 50 additions & 0 deletions
@@ -104,3 +104,53 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C
~~~ shell
kubectl get pods
~~~

## Decommission nodes

When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node, which safely moves data and workloads away before the node goes offline.

{{site.data.alerts.callout_info}}
Decommissioning begins immediately once the annotation is applied; the annotation does not mark the node for future removal. Once decommissioned, the node is cordoned so that no further pods are scheduled on it.

If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling.
{{site.data.alerts.end}}

The following prerequisites must be met before the {{ site.data.products.cockroachdb-operator }} can decommission a CockroachDB node:

- The `--enable-k8s-node-controller=true` flag must be set in the operator's `.yaml` values file, for example:
    {% include_cached copy-clipboard.html %}
    ~~~ yaml
    containers:
    - name: cockroach-operator
      image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }}
      args:
      - "-enable-k8s-node-controller=true"
    ~~~
- The role-based access control system must be configured to allow the operator to patch nodes.
- At least one replica of the operator must not be on the target node.
- There must be no under-replicated ranges on the CockroachDB cluster.
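
The RBAC prerequisite above can be checked before any node is annotated. The following is a minimal sketch, not part of the operator itself: the helper name `operator_can_patch_nodes` is hypothetical, and the namespace and service account are placeholders you must supply. It relies on `kubectl auth can-i` with impersonation, so the user running it needs permission to impersonate service accounts.

```shell
# Hypothetical helper: succeeds (exit 0) if the given service account may patch nodes.
# Arguments: namespace and service account name of the operator (placeholders, supplied by you).
operator_can_patch_nodes() {
  ns="$1"
  sa="$2"
  # `kubectl auth can-i` prints "yes" or "no"; --as impersonates the service account.
  # `|| true` keeps a "no" answer (non-zero exit) from aborting scripts that use `set -e`.
  answer="$(kubectl auth can-i patch nodes --as="system:serviceaccount:${ns}:${sa}" 2>/dev/null || true)"
  [ "$answer" = "yes" ]
}
```

For example, `operator_can_patch_nodes {operator-namespace} {operator-service-account} && echo "operator can patch nodes"`.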

To decommission a node, follow these steps:

1. Identify the name of the Kubernetes node that is to be removed.

1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. For example, using `kubectl`:

    {% include_cached copy-clipboard.html %}
    ~~~ shell
    kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
    ~~~

1. Monitor the cluster:
    - Confirm that the decommissioned node is cordoned:
        {% include_cached copy-clipboard.html %}
        ~~~ shell
        kubectl describe node {example-node-name}
        ~~~
    - Monitor operator events and logs for decommission start and completion messages:
        {% include_cached copy-clipboard.html %}
        ~~~ shell
        kubectl logs {operator-pod-name}
        ~~~

If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.
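
The monitoring step can also be scripted. The sketch below is an illustration under stated assumptions rather than an operator feature: the helper name `wait_for_cordon` and its default retry values are hypothetical. It polls the node's `spec.unschedulable` field, which Kubernetes sets to `true` on a cordoned node, and gives up after a fixed number of attempts.

```shell
# Hypothetical helper: poll until the node reports spec.unschedulable=true (cordoned).
# Arguments: node name, max attempts (default 30), seconds between attempts (default 10).
wait_for_cordon() {
  node="$1"
  attempts="${2:-30}"
  interval="${3:-10}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # The field is absent (empty output) on a schedulable node.
    state="$(kubectl get node "$node" -o jsonpath='{.spec.unschedulable}' 2>/dev/null || true)"
    if [ "$state" = "true" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$interval"
  done
  # Not cordoned within the allotted attempts.
  return 1
}
```

For example, `wait_for_cordon {example-node-name} 60 5` checks every 5 seconds for up to 5 minutes. If it times out, `kubectl get pods --field-selector=status.phase=Pending` lists any pods that are still waiting to be scheduled.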

src/current/v25.3/scale-cockroachdb-operator.md

Lines changed: 50 additions & 0 deletions
@@ -104,3 +104,53 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C
~~~ shell
kubectl get pods
~~~

## Decommission nodes

When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node, which safely moves data and workloads away before the node goes offline.

{{site.data.alerts.callout_info}}
Decommissioning begins immediately once the annotation is applied; the annotation does not mark the node for future removal. Once decommissioned, the node is cordoned so that no further pods are scheduled on it.

If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling.
{{site.data.alerts.end}}

The following prerequisites must be met before the {{ site.data.products.cockroachdb-operator }} can decommission a CockroachDB node:

- The `--enable-k8s-node-controller=true` flag must be set in the operator's `.yaml` values file, for example:
    {% include_cached copy-clipboard.html %}
    ~~~ yaml
    containers:
    - name: cockroach-operator
      image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }}
      args:
      - "-enable-k8s-node-controller=true"
    ~~~
- The role-based access control system must be configured to allow the operator to patch nodes.
- At least one replica of the operator must not be on the target node.
- There must be no under-replicated ranges on the CockroachDB cluster.

To decommission a node, follow these steps:

1. Identify the name of the Kubernetes node that is to be removed.

1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. For example, using `kubectl`:

    {% include_cached copy-clipboard.html %}
    ~~~ shell
    kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
    ~~~

1. Monitor the cluster:
    - Confirm that the decommissioned node is cordoned:
        {% include_cached copy-clipboard.html %}
        ~~~ shell
        kubectl describe node {example-node-name}
        ~~~
    - Monitor operator events and logs for decommission start and completion messages:
        {% include_cached copy-clipboard.html %}
        ~~~ shell
        kubectl logs {operator-pod-name}
        ~~~

If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.

src/current/v25.4/scale-cockroachdb-operator.md

Lines changed: 50 additions & 0 deletions
@@ -104,3 +104,53 @@ Do not scale down to fewer than 3 nodes. This is considered an anti-pattern on C
~~~ shell
kubectl get pods
~~~

## Decommission nodes

When a Kubernetes node is scheduled for removal or maintenance, the {{ site.data.products.cockroachdb-operator }} can be instructed to decommission the node, which safely moves data and workloads away before the node goes offline.

{{site.data.alerts.callout_info}}
Decommissioning begins immediately once the annotation is applied; the annotation does not mark the node for future removal. Once decommissioned, the node is cordoned so that no further pods are scheduled on it.

If cluster capacity is limited, replacement pods may remain in the `Pending` state until new nodes are available. This is expected, as the operator prioritizes data safety and full replication over immediate scheduling.
{{site.data.alerts.end}}

The following prerequisites must be met before the {{ site.data.products.cockroachdb-operator }} can decommission a CockroachDB node:

- The `--enable-k8s-node-controller=true` flag must be set in the operator's `.yaml` values file, for example:
    {% include_cached copy-clipboard.html %}
    ~~~ yaml
    containers:
    - name: cockroach-operator
      image: {{ .Values.image.registry }}/{{ .Values.image.repository }}:{{ .Values.image.tag }}
      args:
      - "-enable-k8s-node-controller=true"
    ~~~
- The role-based access control system must be configured to allow the operator to patch nodes.
- At least one replica of the operator must not be on the target node.
- There must be no under-replicated ranges on the CockroachDB cluster.

To decommission a node, follow these steps:

1. Identify the name of the Kubernetes node that is to be removed.

1. Annotate the Kubernetes node with `crdb.cockroachlabs.com/decommission="true"`. The decommissioning process begins immediately after this annotation is applied. For example, using `kubectl`:

    {% include_cached copy-clipboard.html %}
    ~~~ shell
    kubectl annotate node {example-node-name} crdb.cockroachlabs.com/decommission="true"
    ~~~

1. Monitor the cluster:
    - Confirm that the decommissioned node is cordoned:
        {% include_cached copy-clipboard.html %}
        ~~~ shell
        kubectl describe node {example-node-name}
        ~~~
    - Monitor operator events and logs for decommission start and completion messages:
        {% include_cached copy-clipboard.html %}
        ~~~ shell
        kubectl logs {operator-pod-name}
        ~~~

If the replacement pods remain in a `Pending` state, this typically means there is not enough available capacity in the cluster for these pods to be scheduled.
