Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(cgroup): Properly handle cgroup v2 systems #2989

Closed
wants to merge 1 commit into from

Conversation

CalvoM
Copy link
Contributor

@CalvoM CalvoM commented Nov 21, 2023

WALA was just handling cgroup v1 systems and failing on cgroup v2 systems

We are adding proper handling of cgroup v2 files

Description

Issue #2815


PR information

  • The title of the PR is clear and informative.
  • There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For information on cleaning up the commits in your pull request, see this page.
  • If applicable, the PR references the bug/issue that it fixes in the description.
  • New Unit tests were added for the changes made

Quality of Code and Contribution Guidelines

@narrieta
Copy link
Member

@CalvoM Thank you.

@maddieford is currently working on adding support for cgroups v2 to the Agent. She can provide context about the approach we are taking and ensure this PR matches that.

@CalvoM CalvoM force-pushed the fix_cgroupv2_handling branch 7 times, most recently from 080c46a to 81f57b4 Compare November 24, 2023 11:25
@CalvoM CalvoM marked this pull request as ready for review November 28, 2023 10:24
@maddieford
Copy link
Contributor

Thanks for the PR @CalvoM, I'm currently working on adding support for v2. I need some time to look into this and I'll get back

@CalvoM
Copy link
Contributor Author

CalvoM commented Feb 12, 2024

Hello @maddieford, any progress on the review of this PR?

WALA was just handling cgroup v1 systems and failing on cgroup v2 systems

We are adding proper handling of cgroup v2 files
@maddieford
Copy link
Contributor

Hey @CalvoM, we had an urgent issue we needed to attend to since early December and just now are resuming work on cgroups v2. We decided to refactor our CGroups API to support v2, work for this is in progress

@mirespace
Copy link
Contributor

Hello @maddieford , I'm working now in the WALinuxAgent package in Ubuntu.

I'm wondering if the refactor you were referring is #3096 ... could you confirm it? Or need more work to be done, e.g.:

#3135
#3136

I'm asking because, in Ubuntu, this PR is handled like a patch over WALinuxAgent, and when upgrading the Ubuntu package to the latest WALinuxAgent release this doesn't apply anymore... I'm looking trough the possible changes to refresh the patch, but I see #3096 is still on develop branch... Guessing if this "skip cgroup monitoring if log collector doesn't start by the agent. #2939 [Added in 2.10.0.8]" could impact too ...

Anyway, If you can give me a clue about the possible (or not) compatibility of this (or if this has been fixed with another approach through other refactors/additions) I'll appreciate so much.

@maddieford
Copy link
Contributor

Hi @mirespace, the support is being merged in incrementally.
The first set of refactoring changes was in #3096
There's another set of refactoring changes in review now: #3135

Even with these changes, cgroup v2 is not yet supported by the agent. Any cgroup usage by the agent is gated behind a flag. In the latest version of the agent, the agent looks for the v1 mountpoints at initialization, and if it doesn't find them then it disables any usage of cgroups (the agent handles this case gracefully so this shouldn't break the agent or require a patch).
Here are guest agent logs from the latest guest agent version on an Ubuntu22 machine which shows the agent disables any cgroup usage since v1 mountpoints can't be identified:

2024-06-12T17:43:01.704957Z INFO ExtHandler ExtHandler [CGW] The CPU cgroup controller is not mounted
2024-06-12T17:43:01.705751Z INFO ExtHandler ExtHandler [CGW] The memory cgroup controller is not mounted
2024-06-12T17:43:01.708836Z INFO ExtHandler ExtHandler [CGI] cgroups v2 mounted at /sys/fs/cgroup.  Controllers: [cpuset cpu io memory hugetlb pids rdma misc]
2024-06-12T17:43:01.709525Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a CPU cgroup
2024-06-12T17:43:01.709882Z INFO ExtHandler ExtHandler [CGW] The agent's process is not within a memory cgroup
2024-06-12T17:43:01.710207Z INFO ExtHandler ExtHandler [CGI] **Agent cgroups enabled: False**

The changes in #3096 update the agent to identify whether v1 or v2 is being used. If v2, then the agent also disables any cgroup usage, but now with a more helpful log. This change has not been released yet.

If the changes in this PR (#2989) were included in the agent, then the agent would start identifying v2 mountpoints without updating other areas of the code (including monitoring) which hasn't been tested and will likely cause unexpected/unwanted behaviors.

What behavior are you seeing from the guest agent that would require a patch?

@mirespace
Copy link
Contributor

Hi @maddieford, I was checking the general status of the Ubuntu deb package and proceeding to update it to version v2.11.1.4, when I discovered that we have a patch (a deb patch) that doesn't apply to that version.

Following the patch information, I saw that it had been forwarded here (this PR is the submitted code). I assumed that the patch fixed #2815, as I read in the description of this PR.

Since #2815 is open, I assumed it is not fixed in another way, so that would be the incorrect behaviour that should still be occurring.

@maddieford
Copy link
Contributor

Hey @mirespace, sorry I missed that the original issue was in regards to Log Collector (#2815)

Right now, it is intentional that the agent does not enable cgroup usage when v2 is detected. The agent requires that cgroup usage is enabled for the log collector to run periodically. As a result, the log collector is not running periodically on v2 machines. We're working on those changes now to support v2.

In 2.9.0.4 there was a bug which prevented users from using the command line option to run log collector due to v2 not yet being supported, as pointed out in #2815. We released a fix for that issue (as you pointed out #2939) in v2.10.0.8 to allow users to run the log collector by command line without cgroup monitoring.

@mirespace
Copy link
Contributor

Hi @maddieford! There's nothing to be sorry about... That's a lot of questions/numbers. Thanks to you for your answer!

@maddieford
Copy link
Contributor

Thanks for opening this. We've been adding cgroup v2 support incrementally: #3096, #3135, #3188. These changes should be included in our next release. I'm going to close this PR since all of the changes are included in these PRs I linked.

@maddieford maddieford closed this Aug 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants