Commit 00410c8

Address review comments
1 parent c3f9c1c commit 00410c8

File tree

1 file changed: +28 -26 lines changed
  • keps/sig-node/3953-node-resource-hot-plug


keps/sig-node/3953-node-resource-hot-plug/README.md

Lines changed: 28 additions & 26 deletions
@@ -19,11 +19,11 @@ tags, and then generate with `hack/update-toc.sh`.
 - [Non-Goals](#non-goals)
 - [Proposal](#proposal)
 - [User Stories](#user-stories)
-- [Story 1](#story-1)
-- [Story 2](#story-2)
-- [Story 3](#story-3)
-- [Story 4](#story-4)
-- [Story 5](#story-5)
+- [Story 1: Specialized Hardware](#story-1-specialized-hardware)
+- [Story 2: Optimize System Performance](#story-2-optimize-system-performance)
+- [Story 3: Reduce Operational Complexity](#story-3-reduce-operational-complexity)
+- [Story 4: Increase Compute Capacity on-the-fly](#story-4-increase-compute-capacity-on-the-fly)
+- [Story 5: Avoid Workload Disruption](#story-5-avoid-workload-disruption)
 - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
 - [Risks and Mitigations](#risks-and-mitigations)
 - [Design Details](#design-details)
@@ -113,6 +113,7 @@ restarting the node or at-least restarting the kubelet, which does not have a ce
 
 However, this approach does carry a few drawbacks such as
 - Introducing a downtime for the existing/to-be-scheduled workloads on the cluster until the node is available.
+- For baremetal clusters, it involves a significant amount of time for the Nodes to be available.
 - Necessity to reconfigure the underlying services post node-reboot.
 - Managing the associated nuances that a kubelet restart or node reboot carries such as
   - https://github.com/kubernetes/kubernetes/issues/109595
@@ -157,31 +158,31 @@ With this proposal it's also necessary to recalculate and update OOMScoreAdj and
 
 ### User Stories
 
-#### Story 1
+#### Story 1: Specialized Hardware
 
 As a Kubernetes user, I want to allocate more resources (CPU, memory) to a node with existing specialized hardware or CPU Capabilities (for example: https://www.kernel.org/doc/html/v5.8/arm64/elf_hwcaps.html)
 so that additional workloads can leverage the hardware to be efficiently scheduled and run without manual intervention.
 
-#### Story 2
+#### Story 2: Optimize System Performance
 
-As a Kubernetes Application Developer, I want the kernel to optimize system performance by making better use of local resources when a node is resized, so that my applications run faster with fewer disruptions. This is achieved when there are
+As a Performance Analyst, I want the kernel to optimize system performance by making better use of local resources when a node is resized, so that my applications run faster with fewer disruptions. This is achieved when there are
 Fewer Context Switches: With more CPU cores and memory on a resized node, the kernel has a better chance to spread workloads out efficiently. This can reduce contention between processes, leading to fewer context switches (which can be costly in terms of CPU time)
 and less process interference and also reduces latency.
 Better Memory Allocation: If the kernel has more memory available, it can allocate larger contiguous memory blocks, which can lead to better memory locality (i.e., keeping related data closer in physical memory), improved paging and swap limits, thus
 reducing latency for applications that rely on large datasets, as in the case of database applications.
 
-#### Story 3
+#### Story 3: Reduce Operational Complexity
 
 As a Site Reliability Engineer (SRE), I want to reduce the operational complexity of managing multiple worker nodes, so that I can focus on fewer resources and simplify troubleshooting and monitoring.
 
-#### Story 4
+#### Story 4: Increase Compute Capacity on-the-fly
 
 As a Cluster administrator, I want to resize a Kubernetes node dynamically, so that I can quickly hot plug resources without waiting for new nodes to join the cluster.
 
-#### Story 5
+#### Story 5: Avoid Workload Disruption
 
 As a Cluster administrator, I expect my existing workloads to function without having to undergo a disruption which is induced during capacity addition followed by a node/kubelet restart to
-detect the change in compute capacity, which can bring in additional complications.
+detect the change in compute capacity, which can bring in further complications.
 
 ### Notes/Constraints/Caveats (Optional)
 
@@ -277,7 +278,7 @@ With increase in cluster resources the following components will be updated:
 ### Handling hotplug events
 
 Once the capacity of the node is altered, the following is the sequence of events that occurs in the kubelet. If any errors are
-observed in any of the steps, operation is retried from step 1 along with a `FailedNodeResize` event under the node object.
+observed in any of the steps, the operation is retried from step 1 along with a `FailedNodeResize` event and condition under the node object.
 1. Resizing existing containers:
    a. With the increased memory capacity of the nodes, the kubelet proceeds to update fields that are directly related to
    the available memory on the host. This would lead to recalculation of oom_score_adj and swap_limits.
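
As a rough illustration of the recalculation above: the burstable oom_score_adj heuristic is commonly described as `1000 - (1000 * memoryRequest) / nodeMemoryCapacity`, so a change in node memory capacity shifts the value even when the container itself is unchanged. The Go sketch below is illustrative only; `burstableOOMScoreAdj` is a hypothetical helper and the clamping bounds are assumptions, not the kubelet's actual implementation.

```go
package main

import "fmt"

// burstableOOMScoreAdj mirrors the commonly documented burstable heuristic:
// a container that requests a larger share of node memory gets a lower
// (safer) oom_score_adj. The clamping bounds here are illustrative.
func burstableOOMScoreAdj(memoryRequestBytes, nodeMemoryCapacityBytes int64) int64 {
	adj := 1000 - (1000*memoryRequestBytes)/nodeMemoryCapacityBytes
	if adj < 3 {
		adj = 3
	}
	if adj > 999 {
		adj = 999
	}
	return adj
}

func main() {
	const gi = int64(1) << 30
	request := 4 * gi

	// Same container, different node capacity: hotplugging memory changes the
	// container's share, so the kubelet must rewrite oom_score_adj.
	fmt.Println("before hotplug (12G node):", burstableOOMScoreAdj(request, 12*gi)) // 667
	fmt.Println("after hotplug  (16G node):", burstableOOMScoreAdj(request, 16*gi)) // 750
}
```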
@@ -307,7 +308,7 @@ T=0: Node Resources:
 Runtime:
 - <cgroup_path>/memory.swap.max: 1.33G
 
-T=1: Resize Instance to Hotplug Memory:
+T=1: Resize Node to Hotplug Memory:
 - Memory: 8G
 - Swap: 4G
 Pod:
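
For context on the `memory.swap.max` values in this timeline, a LimitedSwap-style calculation is proportional: a container's swap limit is its share of node memory applied to the node's total swap, which is why hotplugging memory or swap forces the kubelet to rewrite the cgroup limit. The sketch below is illustrative; `proportionalSwapLimit` is a hypothetical helper, and the pre-resize node figures are assumed (chosen only so the result matches the 1.33G shown above), since they are not visible in this hunk.

```go
package main

import "fmt"

// proportionalSwapLimit sketches a LimitedSwap-style calculation: the
// container's share of node memory, applied to the node's total swap.
// Helper name, rounding, and inputs are illustrative, not the kubelet's code.
func proportionalSwapLimit(memoryRequestBytes, nodeMemoryBytes, nodeSwapBytes int64) int64 {
	share := float64(memoryRequestBytes) / float64(nodeMemoryBytes)
	return int64(share * float64(nodeSwapBytes))
}

func main() {
	const gi = int64(1) << 30
	request := 4 * gi

	// Assumed pre-resize node of 6G memory and 2G swap -> ~1.33G for this container.
	fmt.Printf("before resize: %.2fG\n", float64(proportionalSwapLimit(request, 6*gi, 2*gi))/float64(gi))
	// After hotplug to 8G memory and 4G swap the same container's limit becomes 2G,
	// so memory.swap.max has to be rewritten as part of handling the resize.
	fmt.Printf("after resize:  %.2fG\n", float64(proportionalSwapLimit(request, 8*gi, 4*gi))/float64(gi))
}
```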
@@ -333,7 +334,7 @@ To ensure the Cluster Autoscaler acknowledges resource hotplug, the following ap
 2. Identify Nodes Affected by Hotplug:
    * By flagging a Node as being impacted by hotplug, the Cluster Autoscaler could revert to a less reliable but more conservative "scale from 0 nodes" logic.
 
-Given that this KEP and autoscaler are inter-related, the above approaches were discussed in the community with relevant stakeholders, and have decided approaching this problem through the former route.
+Given that this KEP and the autoscaler are inter-related, the above approaches were discussed in the community with the relevant stakeholders, and it was decided to address this problem through approach 1.
 The same will be targeted around the beta graduation of this KEP.
 
 ### Handling HotUnplug Events
@@ -352,19 +353,18 @@ In this case, valid configuration refers to a state which can either be previous
 ```
 T=0: Node initial Resources:
 - Memory: 10G
-- Pod: Memory
 
-T=1: Resize Instance to Hotplug Memory
+T=1: Resize Node to Hotplug Memory
 - Current Memory: 10G
 - Update Memory: 15G
 - Node state: Ready
 
-T=2: Resize Instance to HotUnplug Memory
+T=2: Resize Node to HotUnplug Memory
 - Current Memory: 15G
 - UpdatedMemory: 5G
 - Node state: NotReady
 
-T=3: Resize Instance to Hotplug Memory
+T=3: Resize Node to Hotplug Memory
 - Current Memory: 5G
 - Updated Memory Size: 15G
 - Node state: Ready
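
One possible reading of the Ready/NotReady transitions in this timeline is sketched below. It is an assumption, not the KEP's defined rule: `nodeReadyAfterResize` is a hypothetical helper that treats a resize as valid only while the proposed capacity still covers the memory already committed to pods on the node, which would explain why the hot-unplug to 5G at T=2 leaves the node NotReady until capacity is hotplugged back at T=3.

```go
package main

import "fmt"

// nodeReadyAfterResize is an illustrative check only (not the KEP's logic):
// a proposed capacity is treated as valid while it still covers the memory
// already committed to pods on the node.
func nodeReadyAfterResize(proposedCapacityBytes, committedPodMemoryBytes int64) bool {
	return proposedCapacityBytes >= committedPodMemoryBytes
}

func main() {
	const gi = int64(1) << 30
	committed := 8 * gi // assumed memory already committed to pods on the node

	fmt.Println("T=1 hotplug to 15G: ready =", nodeReadyAfterResize(15*gi, committed))
	fmt.Println("T=2 unplug to 5G:   ready =", nodeReadyAfterResize(5*gi, committed))
	fmt.Println("T=3 hotplug to 15G: ready =", nodeReadyAfterResize(15*gi, committed))
}
```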
@@ -390,6 +390,14 @@ syncCh <-chan time.Time, housekeepingCh <-chan time.Time, plegCh <-chan *pleg.Po
 .
 .
 case machineInfo := <-kl.nodeResourceManager.MachineInfo():
+    // Resync the resource managers.
+    klog.InfoS("Resync resource managers because of change in MachineInfo")
+    if err := kl.containerManager.ResyncComponents(machineInfo); err != nil {
+        klog.ErrorS(err, "Failed to resync resource managers with machine info update")
+        kl.recorder.Eventf(kl.nodeRef, v1.EventTypeWarning, events.FailedNodeResize, err.Error())
+        break
+    }
+
     // Resize the containers.
     klog.InfoS("Resizing containers due to change in MachineInfo")
     if err := resizeContainers(); err != nil {
@@ -398,13 +406,7 @@ syncCh <-chan time.Time, housekeepingCh <-chan time.Time, plegCh <-chan *pleg.Po
         break
     }
 
-    // Resync the resource managers.
-    klog.InfoS("ResyncComponents resource managers because of change in MachineInfo")
-    if err := kl.containerManager.ResyncComponents(machineInfo); err != nil {
-        klog.ErrorS(err, "Failed to resync resource managers with machine info update")
-        kl.recorder.Eventf(kl.nodeRef, v1.EventTypeWarning, events.FailedNodeResize, err.Error())
-        break
-    }
+
 
     // Update the cached MachineInfo.
     kl.setCachedMachineInfo(machineInfo)
