-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Prevent OOM from malformed snappy payloads by validating decoded length #1917
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
cccd992 to
c5b4a57
Compare
A specially crafted remote-write request can declare an extremely large
decoded length while providing only a small encoded payload. Prometheus
allocates memory based on the declared decoded size, so a single request
can trigger an allocation of ~2.5 GB. A few such requests are enough to
crash the process with OOM.
Here's the script that can be used to reproduce the issue:
echo
"97eab4890a170a085f5f6e616d655f5f120b746573745f6d6574726963121009000000000000f03f10d48fc9b2a333"
\
| xxd -r -p \
| curl -X POST \
"http://127.0.0.1:9090/api/v1/write" \
-H "Content-Type: application/x-protobuf" \
-H "Content-Encoding: snappy" \
-H "X-Prometheus-Remote-Write-Version: 0.1.0" \
--data-binary @-
This change adds a hard limit: the requested decoded length must be less
than 32 MB. Requests exceeding the limit are rejected with HTTP 400
before any allocation occurs.
Signed-off-by: Max Kotliar <[email protected]>
c5b4a57 to
18310e4
Compare
kakkoyun
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the contribution.
We need some documentation.
exp/api/remote/remote_api.go
Outdated
| } | ||
| } | ||
|
|
||
| var maxDecodedSize = 32 * 1024 * 1024 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we document how did we come up with this number?
Is there another source of truth we need to be in sync with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the max remote-write size that VictoriaMetrics as const value. Users can override it via a flag -maxInsertRequestSize if needed. Prometheus previously had no limit, so enforcing 32 MB might break some setups.
Need some guidance here, should we raise or lower the value, or make it explicitly configurable?
Co-authored-by: Kemal Akkoyun <[email protected]> Signed-off-by: Max Kotliar <[email protected]>
I can take a look. Can you please point out where I can find the docs' sources? |
bwplotka
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
We need some documentation.
I think @kakkoyun meant just a clear commentary.
We also need a unit test please, but thank you very much for adding this up 💪🏽 important!
Sorry, if it is not clear. Yes, that was what I meant. |
A specially crafted remote-write request can declare an extremely large decoded length while providing only a small encoded payload. Prometheus allocates memory based on the declared decoded size, so a single request can trigger an allocation of ~2.5 GB. A few such requests are enough to crash the process with OOM.
Here's the script that can be used to reproduce the issue:
This change adds a hard limit: the requested decoded length must be less than 32 MB. Requests exceeding the limit are rejected with HTTP 400 before any allocation occurs.