-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add custom choom
helper to lower crocochrome's OOM score
#122
Open
nadiamoe
wants to merge
5
commits into
main
Choose a base branch
from
oom-adjust-helper
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
Changes from all commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
f34502c
chore: cmd/choom: add custom choom binary to change a process OOM score
nadiamoe 153c8f7
chore: move crocochrome main to cmd/crocochrome
nadiamoe 7fb2bae
chore: Dockerfile: build both binaries from their new paths
nadiamoe 76b9d81
feat: crocochrome: try to use choom to lower oom_score_adj, failing g…
nadiamoe dcd9942
fix: exit with status 1 if there is an error setting up http listener
nadiamoe File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# `choom` | ||
|
||
`choom` is a portable, simplified, not compatible version of [`chroom(1)`](https://man7.org/linux/man-pages/man1/choom.1.html), which allows to change a process' `oom_score_adj` to make it more or less attractive to the kernel's OOM killer. | ||
|
||
This tiny program exists because binaries need a specific capability to lower OOM scores, `cap_sys_resource`. This capability, however, is not granted in the [default set docker uses](https://github.com/moby/moby/blob/master/oci/caps/defaults.go#L6-L19). Due to how linux capabilities work, binaries with a given capability added will fail to start if the container does not have that capability added as well. In practice, what this means is that if we granted `cap_sys_resource` to the main binary of the container, and attempted to run the container naively with `docker run crocochrome:latest`, the container would cryptically fail to start. | ||
|
||
This is a UX problem, where users unfamiliar with the codebase would need to troubleshoot a cryptic error message, but also a problem for testing, where developers would need to go out of the "standard route" to figure out how to add specific capabilities to local kubernetes clusters, or testcontainers, in order to perform tests to the container. | ||
|
||
Instead of that, using a tiny helper with the capability added, and calling this helper from the main binary, allows the oom score adjust process to fail gracefully. If the container does not have the required `sys_resource` capability, the OOM score will not be adjusted and a log will be errored: | ||
|
||
``` | ||
{"time":"2025-01-30T12:50:37.771068928Z","level":"ERROR","msg":"Error changing OOM score. Assuming this is a test environment and continuing anyway.","err":"fork/exec /usr/local/bin/choom: operation not permitted"} | ||
``` | ||
|
||
But execution will continue, and as OOM score adjusting is not critical to functionality whatsoever, tests will do their job just fine. In production, we grant the container the `sys_resource` capability to handle low-memory situations better. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
package main | ||
|
||
import ( | ||
"log" | ||
"os" | ||
"path/filepath" | ||
) | ||
|
||
func main() { | ||
if len(os.Args) < 3 { | ||
log.Fatalf("Usage: %s <pid> <oom_score_adj>", os.Args[0]) | ||
} | ||
|
||
pid := os.Args[1] | ||
newOOMScore := os.Args[2] | ||
|
||
adjFile, err := os.OpenFile(filepath.Join("/", "proc", pid, "oom_score_adj"), os.O_WRONLY, os.FileMode(0o600)) | ||
if err != nil { | ||
log.Fatalf("Opening adjust file: %v", err) | ||
} | ||
|
||
defer func() { | ||
if err := adjFile.Close(); err != nil { | ||
log.Fatalf("Closing file: %v", err) | ||
} | ||
}() | ||
|
||
_, err = adjFile.Write([]byte(newOOMScore)) | ||
if err != nil { | ||
log.Fatalf("Writing oom_score_adj: %v", err) | ||
} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -5,6 +5,8 @@ import ( | |
"log/slog" | ||
"net/http" | ||
"os" | ||
"os/exec" | ||
"strconv" | ||
|
||
"github.com/grafana/crocochrome" | ||
crocohttp "github.com/grafana/crocochrome/http" | ||
|
@@ -18,6 +20,17 @@ func main() { | |
Level: slog.LevelDebug, | ||
})) | ||
|
||
const oomScore = -500 | ||
if out, err := choom(oomScore); err != nil { | ||
logger.Error( | ||
"Error changing OOM score, assuming this is not a production environment and continuing anyway", | ||
"err", err, | ||
"choomOutput", string(out), | ||
) | ||
} else { | ||
logger.Info("Main process OOM score adjusted successfully", "oomScore", oomScore) | ||
} | ||
|
||
mux := http.NewServeMux() | ||
|
||
registry := prometheus.NewRegistry() | ||
|
@@ -54,5 +67,13 @@ func main() { | |
err = http.ListenAndServe(address, mux) | ||
if err != nil { | ||
logger.Error("Setting up HTTP listener", "err", err) | ||
os.Exit(1) | ||
return | ||
Comment on lines
+70
to
+71
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This is unrelated to this PR, just couldn't resist. |
||
} | ||
} | ||
|
||
// choom runs the choom helper (source included in this repo) to lower the current process OOM score. | ||
func choom(score int) ([]byte, error) { | ||
choom := exec.Command("choom", strconv.Itoa(os.Getpid()), strconv.Itoa(score)) | ||
return choom.CombinedOutput() | ||
} |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel you are leaving something out here, and I cannot spot what that might be.
If I look at choom.c it's as simple as it gets. The only difference with the program you are adding is 1) the ability to display a PID's score and adjust value; 2) reporting what the previous adjust value was before changing it. That later point is arguably a useful piece of data.
So... what am I missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is nothing special on this choom implementation w.r.t.
choom.c
, they do the same thing. It exists becausechoom
is not installed on alpine by default, and thought that implementing this simple thing was going to be just as easy asapk install
ing something, keeping the Dockerfile simple, and not having to deal with however alpine wants to runchoom
(I suspect they will rely onsetuid
, or just being root, instead of granting the specific capabilities so a regular user can use them).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there's no good reason for doing this, then just install the corresponding package providing choom (util-linux-misc) and copy over the file to the final image.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a very strong reason to do it, but I neither see a strong reason to not do it.
Copying the file is tricky: There might be dependencies that need to be brought along, it might need special requirements, it might use
setuid
at some point without us noticing, etc. It may not be doing these things today, but it may do them tomorrow. The way I see it,procfs
is a stable API, the inner workings of that particularchoom
implementation (which we are coupled with) are not.I still see the homemade one as the safest option here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uhm...
You are putting forward a hypothetical issue with backwards compatibility for a utility that hasn't seen any significant changes in over 6 years while introducing a custom incompatible version of that utility... seriously?
If this was less trivial (external libraries, complex logic, etc), I might be open to the argument, but at this level of triviality it doesn't make sense. I honestly thought there was something I was missing and that's why I asked.