Repeated disk reads during retransformation contributing to iowait spikes #13280

Open
alexandergunnarson opened this issue Feb 12, 2025 · 4 comments
Labels
bug (Something isn't working), needs author feedback (Waiting for additional feedback from the author), needs triage (New issue that requires triage)

Comments

@alexandergunnarson

Describe the bug

I've been trying to figure out why our OTEL-instrumented JVMs keep experiencing massively high iowait, and I suspect the OTEL Java agent is a primary cause.

Below is partial output from running sudo /usr/share/bcc/tools/filetop on an EC2 instance of mine:

[Image: screenshot of filetop output]

This happens to other class files as well, but Throwable.class is the most common.

Steps to reproduce

Start a JVM with the OTEL Java agent attached, running probably any real-world .jar. Then run sudo /usr/share/bcc/tools/filetop and observe that class files are read repeatedly.

Expected behavior

I would expect that the class bytes are fully cached in memory upon first read. I suppose using Caffeine or similar with configurable eviction strategy is fine, but really I'd like zero unnecessary disk reads.
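To make the expectation concrete, here is a minimal sketch of the "read once, then serve from memory" behavior described above. It is not how the OTEL agent locates class files; `ClassBytesCache` and its methods are hypothetical names used only for illustration, and it has no eviction (a bounded cache such as Caffeine could replace the map).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical illustration only: caches class file bytes after the first read.
public final class ClassBytesCache {
    private final ConcurrentHashMap<String, byte[]> cache = new ConcurrentHashMap<>();
    private final AtomicInteger loads = new AtomicInteger();
    private final ClassLoader loader;

    public ClassBytesCache(ClassLoader loader) {
        this.loader = loader;
    }

    // Returns the class file bytes, reading them from the loader only on a cache miss.
    // The cached array is returned directly, so callers must not mutate it.
    public byte[] get(String className) {
        return cache.computeIfAbsent(className, this::readFromLoader);
    }

    private byte[] readFromLoader(String className) {
        loads.incrementAndGet();
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = loader.getResourceAsStream(resource)) {
            // A null result is not stored by computeIfAbsent, so misses are retried.
            return in == null ? null : in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of times the underlying resource was actually read.
    public int loadCount() {
        return loads.get();
    }
}
```

With something like this, calling `get("java.lang.Throwable")` repeatedly would touch the underlying resource once, and `loadCount()` would stay at 1.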

Actual behavior

There are repeated reads from class files, contributing to massive iowait spikes.

Javaagent or library instrumentation version

v2.9.0

Environment

JDK: 24ea (perhaps the EA is part of the problem?)
OS: Amazon Linux, within Docker

Additional context

No response

alexandergunnarson added the bug and needs triage labels on Feb 12, 2025
@laurit
Contributor

laurit commented Feb 14, 2025

Are you seeing this only when your application starts?

@alexandergunnarson
Author

alexandergunnarson commented Feb 14, 2025

No, it's ongoing throughout the lifetime of the application, with huge spikes at times. (I'm not sure why there are spikes.) Both sudo /usr/share/bcc/tools/filetop and bpftrace show that java.lang.Throwable is by far the most commonly read class file (sometimes thousands of times a second), but there are others. The bpftrace signal is weaker because it's complicated to ensure that reported reads are actually persistent disk reads and not ramdisk reads. When I turn off all agents, our application has essentially zero disk reads. When I turn on only the OTEL agent, there are thousands per second.

@alexandergunnarson
Author

For now, I've worked around it by mounting both our uberjar and the JDK on a ramdisk. Now there are essentially zero disk reads, and the iowait and corresponding service lockups are gone.

@laurit
Contributor

laurit commented Feb 14, 2025

Is your application constantly defining new classes? Have you tried to profile your application or capture stack traces that would show what it is doing and which code path might trigger the disk access?
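One way to capture such stack traces from inside the JVM is sketched below, as an alternative to an external profiler or thread dump tool. `StackSampler` is a hypothetical helper, and the package prefixes passed to it (for example `java.io.` or `java.util.zip.`) are assumptions about where file reads might appear, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: samples all thread stacks and keeps those that contain
// a frame whose class name starts with one of the given prefixes.
public final class StackSampler {
    private StackSampler() {}

    // True if any frame's declaring class starts with one of the prefixes.
    public static boolean matches(StackTraceElement[] stack, List<String> prefixes) {
        for (StackTraceElement frame : stack) {
            for (String prefix : prefixes) {
                if (frame.getClassName().startsWith(prefix)) {
                    return true;
                }
            }
        }
        return false;
    }

    // Takes one snapshot of every live thread and formats the matching stacks.
    public static List<String> sample(List<String> prefixes) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            if (!matches(e.getValue(), prefixes)) {
                continue;
            }
            StringBuilder sb = new StringBuilder(e.getKey().getName()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
            out.add(sb.toString());
        }
        return out;
    }
}
```

Calling `StackSampler.sample(List.of("java.io.", "java.util.zip."))` periodically during an iowait spike and logging the result might show which callers are on the stack when class files are being read. A single snapshot can easily miss short reads, so repeated sampling would be needed.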

laurit added the needs author feedback label on Feb 18, 2025