[FEATURE] Memory Limits (#494, @konradmalik)

Merged: 15 commits, Mar 29, 2021

Conversation

@konradmalik (Contributor) commented Feb 12, 2021

What

Docker-based memory limits per node container, with separate limits for servers and agents. Per-node limits are also possible when creating nodes one after another using k3d node create.
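
For illustration, per-node limiting could look roughly like this (node name made up, cluster name taken from the demo below; the --memory argument for node create is listed under Implications):

$ k3d node create extra-agent --cluster test --role agent --memory 2g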

Why

See issue #491 and the issues mentioned there.
This implements Docker-based memory limiting per node type (server/agent) and also tries to make those limits visible to the kubelet via cadvisor, so that kubectl describe node reports the proper (limited) memory capacity and the scheduler stops placing pods once the limits are reached.
In my tests it works as expected; below is a simple demonstration:

Create a cluster:

$ ./k3d cluster create test --no-lb --servers-memory 1g --agents 1 --agents-memory 1.5g

Docker stats:

$ docker stats
CONTAINER ID   NAME                CPU %     MEM USAGE / LIMIT   MEM %     NET I/O          BLOCK I/O         PIDS
0b2607a587b2   k3d-test-agent-0    186.49%   138.3MiB / 1.5GiB   9.00%     68.4MB / 919kB   57.3kB / 33MB     73
7d0f1e5f289e   k3d-test-server-0   28.14%    465.2MiB / 1GiB     45.43%    

Node description (memory fragment):

$ kubectl describe nodes | grep memory
  MemoryPressure       False   Fri, 12 Feb 2021 21:16:59 +0100   Fri, 12 Feb 2021 21:16:28 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  memory:             1073741Ki
  memory:             1073741Ki
  memory             0 (0%)    0 (0%)
  MemoryPressure       False   Fri, 12 Feb 2021 21:17:09 +0100   Fri, 12 Feb 2021 21:16:38 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  memory:             1610612Ki
  memory:             1610612Ki
  memory             70Mi (4%)  170Mi (10%)

Now, provision something that will certainly exceed the limits. I deployed a StatefulSet with 10 replicas of a service requesting 100m CPU and 300Mi memory each (a sketch of such a manifest follows the demo). Replica 8 is stuck in Pending due to lack of memory:

$ kubectl get pods
static-file-server-statefulset-0   1/1     Running   0          57s
static-file-server-statefulset-1   1/1     Running   0          49s
static-file-server-statefulset-2   1/1     Running   0          47s
static-file-server-statefulset-3   1/1     Running   0          44s
static-file-server-statefulset-4   1/1     Running   0          38s
static-file-server-statefulset-5   1/1     Running   0          35s
static-file-server-statefulset-6   1/1     Running   0          33s
static-file-server-statefulset-7   1/1     Running   0          30s
static-file-server-statefulset-8   0/1     Pending   0          28s
$ kubectl describe pod static-file-server-statefulset-8
...
Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  54s   default-scheduler  0/2 nodes are available: 2 Insufficient memory.
  Warning  FailedScheduling  54s   default-scheduler  0/2 nodes are available: 2 Insufficient memory.

Let's see the resources:

$ kubectl describe nodes | grep memory
  MemoryPressure       False   Fri, 12 Feb 2021 21:18:39 +0100   Fri, 12 Feb 2021 21:16:38 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  memory:             1610612Ki
  memory:             1610612Ki
  memory             1570Mi (99%)  1670Mi (106%)
  MemoryPressure       False   Fri, 12 Feb 2021 21:18:58 +0100   Fri, 12 Feb 2021 21:16:28 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  memory:             1073741Ki
  memory:             1073741Ki
  memory             900Mi (85%)  900Mi (85%)

Looks good, I guess? 😉
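
For reference, the workload above requested 100m CPU and 300Mi memory per replica; a minimal, illustrative StatefulSet manifest (image, names, and labels are placeholders, not the exact manifest I used) would look roughly like this:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: static-file-server-statefulset
spec:
  serviceName: static-file-server
  replicas: 10
  selector:
    matchLabels:
      app: static-file-server
  template:
    metadata:
      labels:
        app: static-file-server
    spec:
      containers:
        - name: server
          image: nginx:alpine   # placeholder image
          resources:
            requests:
              cpu: 100m
              memory: 300Mi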

Implications

The route taken here is vastly different from what was tried previously. After investigating the cadvisor source code, it seems that cadvisor uses /proc/meminfo to read the total available memory and swap. The main idea behind this implementation is therefore to create a minimal, fake meminfo file and mount it read-only into the node containers (a standalone sketch of the mechanism follows the list below).
Other things worth mentioning, in "tldr" form:

  • this of course partially breaks free, top, and for example the Prometheus node exporter, but those were already "broken" for k3d since they showed host resources, not container resources
  • serversMemory, agentsMemory options under "runtime" yaml conf
  • servers-memory, agents-memory cli args under cluster create
  • memory arg under node create
  • valid values and units are the same as for docker limits
  • the fake meminfo is created during the creation of any node that has limits. It is a file in a hidden folder under ~/.k3d, since it potentially needs to persist across reboots. The folder is made unique per node by using the name of the node it was created for.
  • removal is performed during node (or cluster) removal
  • even if the container is removed manually and the folder cannot be cleaned up, it is simply ignored by new nodes without limits, or its files are overwritten when a node with the same name is created later
  • the fake meminfo is truly minimal; it contains only the total memory and swap info
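
To make the mechanism concrete, here is a rough standalone sketch of the same trick outside of k3d (paths and values are illustrative; in k3d the file lives in the per-node folder under ~/.k3d described above):

# minimal fake meminfo advertising 1 GiB of memory and no swap
$ printf 'MemTotal:       1048576 kB\nSwapTotal:      0 kB\n' > /tmp/fake-meminfo

# run a container with a matching Docker memory limit and the fake file
# bind-mounted read-only over /proc/meminfo
$ docker run --rm --memory 1g -v /tmp/fake-meminfo:/proc/meminfo:ro busybox cat /proc/meminfo

Anything inside the container that reads /proc/meminfo (cadvisor included) then sees the limited capacity instead of the host's.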

Further work

A potential TODO is a check for the case where someone requests memory limits that, in total, exceed what the current machine has. This is not validated currently, but maybe it should not be checked in the first place: when creating servers and agents without limits, the total memory "limit" across nodes is always larger than the memory of the host anyway.
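
If such a validation were ever added, a minimal sketch of the check could compare the requested limits against what Docker reports for the host (hypothetical, not part of this PR):

# total host memory in bytes as seen by the Docker daemon
$ docker info --format '{{.MemTotal}}'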

CPU limits need a bit more work, but I'm pretty sure they can be implemented in a very similar way. I can start on this after receiving positive feedback on this one.
It seems the number of cores is read from cpuinfo only as a last resort; first, the CPU topology is built from sysfs. Minimal mocking and mounting of CPUs under sysfs crashes k3s, so a full-fledged solution would be needed. In short, CPU limits look like a no-go at the moment, unless something comes to mind along the way.
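
For context, these are roughly the sources involved, illustrated on a typical Linux host (illustrative commands, not part of this PR):

# CPU topology that cadvisor builds first, from sysfs
$ ls /sys/devices/system/cpu/cpu0/topology/
# core count read from cpuinfo only as a last resort
$ grep -c '^processor' /proc/cpuinfo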

Edit 1:
Moved from single files to a folder, to make further mocking easier.

Currently working on an "edac" folder mock. The above solution for the memory limit works on non-ECC memory. I cannot test it, but I think it won't work with ECC memory, as the true, larger capacity would be read from the "edac" folder. We cannot simply always mount a fake edac, because the mount fails when the folder does not exist on the host (for example on my MacBook). Sysfs is read-only in Docker, so Docker cannot create this folder if it does not exist on the system. The solution is to check whether it exists, but this gets tricky on Docker for Mac and Docker for Windows.

Edit 2:
Mocking edac to force the use of meminfo is now working (checked on Linux with an existing edac folder, but no ECC memory). It is done through an in-container directory-existence check, so it is universal and should work across Linux, Windows, and Mac.
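
A rough sketch of what such an in-container check amounts to (the exact path and wiring in the PR may differ):

# only mock edac when the directory actually exists inside the container
$ docker exec k3d-test-server-0 sh -c 'test -d /sys/devices/system/edac && echo present || echo absent'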

@konradmalik konradmalik marked this pull request as draft February 15, 2021 20:17
@konradmalik konradmalik marked this pull request as ready for review February 17, 2021 08:31
@iwilltry42 iwilltry42 self-requested a review March 5, 2021 16:29
@iwilltry42 iwilltry42 added the enhancement New feature or request label Mar 5, 2021
@iwilltry42 iwilltry42 added this to the v4.3.0 milestone Mar 5, 2021
@iwilltry42 (Member) left a comment

That's a tough one... Good job so far 👍

@iwilltry42 (Member) left a comment

...

@iwilltry42 iwilltry42 self-assigned this Mar 8, 2021
@iwilltry42 iwilltry42 modified the milestones: v4.3.0, v4.4.0 Mar 8, 2021
@iwilltry42 (Member) commented Mar 8, 2021

Also, this may be interesting for you: k3s-io/k3s#3005 (good progress lately: https://twitter.com/_AkihiroSuda_/status/1366689973672402945)

@konradmalik (Contributor, Author)

Also, this may be interesting for you: k3s-io/k3s#3005 (good progress lately: https://twitter.com/_AkihiroSuda_/status/1366689973672402945)

Thanks, I'll follow and investigate this one. It seems it requires running systemd inside the container... we'll see.

@konradmalik konradmalik requested a review from iwilltry42 March 13, 2021 16:44
@iwilltry42 (Member)

LGTM!
I'll test it a bit on my systems now and then will go ahead and merge it, if everything's fine 👍

@iwilltry42 (Member) commented Mar 16, 2021

Hi @konradmalik, I just tested this on my Linux machine and it seems to fail 🤔

$ k3d cluster create memtest --servers-memory 1g --agents 1 --agents-memory 1.5g

# ...

$ kubectl get nodes -o go-template='{{range $index, $item := .items}}{{ $item.metadata.name }}{{ ":"}}{{ $item.status.capacity.memory }}{{ "\n" }}{{ end }}'     

k3d-memtest-agent-0:32475116Ki
k3d-memtest-server-0:32475116Ki

$ docker inspect k3d-memtest-server-0 | jq '.[0].HostConfig.Memory'                                                                                                                                             
1073741824

$ docker inspect k3d-memtest-agent-0 | jq '.[0].HostConfig.Memory'                                                                                                                                              
1610612736

Note: edac folder is present on my machine.

@konradmalik (Contributor, Author)

Ha, no wonder it did not work: the volume binding was reversed 😄 Any idea what could cause the order to change? It was definitely working before 🤔

    {                                        
        "Type": "bind",                      
        "Source": "/proc/meminfo",           
        "Destination": "/home/konrad/.k3d/.k3d-memtest-server-0/meminfo",                         
        "Mode": "ro",                        
        "RW": false,                         
        "Propagation": "rprivate"            
    },   

Anyway, the newest commit fixes this.
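
With the fix, Source should be the host path under ~/.k3d and Destination should be /proc/meminfo inside the container; this can be verified with something like:

$ docker inspect k3d-memtest-server-0 | jq '.[0].Mounts[] | select(.Destination == "/proc/meminfo")'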

@konradmalik konradmalik requested a review from iwilltry42 March 20, 2021 18:46
@iwilltry42 (Member) commented Mar 29, 2021

Hi @konradmalik , finally I found some time to get back to this :)

Volume binding was reversed 😄 any idea what could cause the order to change?

Nothing changed afaik, and the order should always have been src:dest (hostpath:containerpath) 🤔

Anyway, I can confirm that it works now on my machine (Linux, edac present).
I'll give it another go on Windows now and then merge it, if everything's fine there 👍

$ kubectl get nodes -o go-template='{{range $index, $item := .items}}{{ $item.metadata.name }}{{ ":"}}{{ $item.status.capacity.memory }}{{ "\n" }}{{ end }}'
k3d-memtest-server-0:1073741Ki
k3d-memtest-agent-0:1610612Ki

Update 1: ✔️ Windows w/ Docker for Desktop w/ WSL2 backend
Update 2: ✔️ Windows w/ Docker for Desktop w/ Hyper-V backend

@iwilltry42 iwilltry42 changed the title [FEATURE] Memory Limits [FEATURE] Memory Limits (#494, @konradmalik) Mar 29, 2021
@iwilltry42 iwilltry42 merged commit e495fe8 into k3d-io:main Mar 29, 2021
rancherio-gh-m pushed a commit that referenced this pull request Mar 29, 2021
Author: Konrad Malik <[email protected]>
Date:   Mon Mar 29 17:52:15 2021 +0200

    [FEATURE] Memory Limits (#494, @konradmalik)