### User Stories
#### Story 1: Specialized Hardware
As a Kubernetes user, I want to allocate more resources (CPU, memory) to a node with existing specialized hardware or CPU capabilities (for example: https://www.kernel.org/doc/html/v5.8/arm64/elf_hwcaps.html), so that additional workloads can leverage that hardware and be scheduled and run efficiently without manual intervention.
#### Story 2: Optimize System Performance
As a Performance Analyst, I want the kernel to optimize system performance by making better use of local resources when a node is resized, so that my applications run faster with fewer disruptions. This is achieved when there are:
Fewer Context Switches: With more CPU cores and memory on a resized node, the kernel has a better chance to spread workloads out efficiently. This can reduce contention between processes, leading to fewer context switches (which can be costly in terms of CPU time), less process interference, and lower latency.
Better Memory Allocation: With more memory available, the kernel can allocate larger contiguous memory blocks, which can lead to better memory locality (i.e., keeping related data closer in physical memory) and improved paging and swap limits, thus reducing latency for applications that rely on large datasets, such as database applications.
#### Story 3: Reduce Operational Complexity
As a Site Reliability Engineer (SRE), I want to reduce the operational complexity of managing multiple worker nodes, so that I can focus on fewer resources and simplify troubleshooting and monitoring.
#### Story 4: Increase Compute Capacity on-the-fly
As a Cluster administrator, I want to resize a Kubernetes node dynamically, so that I can quickly hot plug resources without waiting for new nodes to join the cluster.
#### Story 5: Avoid Workload Disruption
As a Cluster administrator, I expect my existing workloads to keep functioning without the disruption induced when capacity addition must be followed by a node/kubelet restart to detect the change in compute capacity, which can bring in further complications.
### Notes/Constraints/Caveats (Optional)
### Handling hotplug events
Once the capacity of the node is altered, the following sequence of events occurs in the kubelet. If an error is observed at any step, the operation is retried from step 1 and a `FailedNodeResize` event and condition are added under the node object.
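The retry-from-step-1 semantics can be sketched as below; the step names, the error, and the printed event line are illustrative placeholders, not the kubelet's actual code:

```go
package main

import (
	"errors"
	"fmt"
)

// step is one stage of the kubelet's hotplug handling. The step names used
// below are illustrative placeholders, not the kubelet's real function names.
type step struct {
	name string
	run  func() error
}

// handleResize runs the steps in order. On any failure it emits a
// FailedNodeResize message (standing in for the node event/condition) and
// retries the whole sequence from step 1, mirroring the retry semantics
// described above. It gives up after maxAttempts full passes.
func handleResize(steps []step, maxAttempts int) (int, error) {
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = nil
		for _, s := range steps {
			if err = s.run(); err != nil {
				fmt.Printf("FailedNodeResize: step %q: %v\n", s.name, err)
				break // abandon this pass; retry from step 1
			}
		}
		if err == nil {
			return attempt, nil
		}
	}
	return maxAttempts, err
}

// demoSteps fails the second step twice before succeeding, simulating a
// transient error while updating node allocatable.
func demoSteps() []step {
	failures := 2
	return []step{
		{"resize existing containers", func() error { return nil }},
		{"update node allocatable", func() error {
			if failures > 0 {
				failures--
				return errors.New("transient cgroup error")
			}
			return nil
		}},
	}
}

func main() {
	attempts, err := handleResize(demoSteps(), 5)
	fmt.Println(attempts, err) // succeeds on the third full pass: 3 <nil>
}
```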
1. Resizing existing containers:
a. With the increased memory capacity of the node, the kubelet proceeds to update fields that are directly related to the available memory on the host. This leads to recalculation of `oom_score_adj` and `swap_limits`.
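Why a capacity change alone forces this recalculation: for a Burstable container, the kubelet derives `oom_score_adj` from the container's memory request as a share of node capacity, so hotplugging memory shifts the score even when the pod spec is unchanged. A simplified sketch follows; the clamp bounds here are illustrative and the real kubelet QoS policy code differs in detail:

```go
package main

import "fmt"

// burstableOOMScoreAdj is a simplified sketch of how a Burstable container's
// oom_score_adj is derived from its memory request relative to node memory
// capacity. When capacity grows via hotplug, the same request yields a higher
// (more OOM-killable) score, so the value must be recomputed. The clamp
// bounds below are illustrative, not the kubelet's exact constants.
func burstableOOMScoreAdj(memoryRequest, memoryCapacity int64) int {
	adj := 1000 - (1000*memoryRequest)/memoryCapacity
	if adj < 2 { // keep burstable above guaranteed/system scores
		adj = 2
	}
	if adj > 999 { // stay below besteffort (1000)
		adj = 999
	}
	return int(adj)
}

func main() {
	req := int64(2 << 30)                      // 2 GiB container request
	before := burstableOOMScoreAdj(req, 4<<30) // 4 GiB node
	after := burstableOOMScoreAdj(req, 8<<30)  // node hotplugged to 8 GiB
	fmt.Println(before, after)                 // 500 750
}
```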
T=0: Node Resources:
Runtime:
- <cgroup_path>/memory.swap.max: 1.33G
T=1: Resize Node to Hotplug Memory:
- Memory: 8G
- Swap: 4G
Pod:
To ensure the Cluster Autoscaler acknowledges resource hotplug, the following approaches were discussed:
2. Identify Nodes Affected by Hotplug:
* By flagging a Node as being impacted by hotplug, the Cluster Autoscaler could revert to a less reliable but more conservative "scale from 0 nodes" logic.
Given that this KEP and the autoscaler are inter-related, the above approaches were discussed in the community with relevant stakeholders, and it was decided to address this problem through approach 1.
This will be targeted around the beta graduation of this KEP.
### Handling HotUnplug Events