
Spring boot runtime metrics #13078

Open: wants to merge 19 commits into base: main

zeitlinger
Member

Fixes #12812

@zeitlinger zeitlinger requested a review from a team as a code owner January 21, 2025 13:21
@zeitlinger zeitlinger self-assigned this Jan 21, 2025
@github-actions github-actions bot added the test native This label can be applied to PRs to trigger them to run native tests label Jan 21, 2025
@zeitlinger
Member Author

@jeanbisutti, can you help me with the native failures?

  1. Not sure if this is transient:
Failures (1):
  JUnit Jupiter:OtelSpringStarterSmokeTest:shouldSendTelemetry()
    MethodSource [className = 'io.opentelemetry.spring.smoketest.OtelSpringStarterSmokeTest', methodName = 'shouldSendTelemetry', methodParameterTypes = '']
    => org.awaitility.core.ConditionTimeoutException: Assertion condition defined as a Lambda expression in io.opentelemetry.instrumentation.testing.InstrumentationTestRunner
Expecting actual not to be empty within 10 seconds.
       org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:167)
       org.awaitility.core.AssertionCondition.await(AssertionCondition.java:119)
       org.awaitility.core.AssertionCondition.await(AssertionCondition.java:31)
       org.awaitility.core.ConditionFactory.until(ConditionFactory.java:1006)
       org.awaitility.core.ConditionFactory.untilAsserted(ConditionFactory.java:790)
       [...]
     Caused by: org.awaitility.core.DeadlockException: Deadlocked threads detected:


       org.awaitility.core.ConditionAwaiter.await(ConditionAwaiter.java:159)
       [...]
  2. The numLogsCapturedBeforeOtelInstall value of the OpenTelemetry appender is too small. Should we increase the buffer?

  3. Thread started: this is for JFR. I'll try @PreDestroy for this:

The web application [ROOT] appears to have started a thread named [BatchLogRecordProcessor_WorkerThread-1] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread:
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.headers.Pthread.pthread_cond_timedwait(Pthread.java)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park0(PosixPlatformThreads.java:379)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixParker.park(PosixPlatformThreads.java:354)
 org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.parkCurrentPlatformOrCarrierThread(PlatformThreads.java:1001)
 java.base/jdk.internal.misc.Unsafe.park(Unsafe.java:56)
 java.base/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:269)
 java.base/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1763)
 java.base/java.util.concurrent.ArrayBlockingQueue.poll(ArrayBlockingQueue.java:435)
 io.opentelemetry.sdk.logs.export.BatchLogRecordProcessor$Worker.run(BatchLogRecordProcessor.java:246)
 java.base/java.lang.Thread.runWith(Thread.java:1596)
 java.base/java.lang.Thread.run(Thread.java:1583)
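The stack trace shows the worker parked in a timed ArrayBlockingQueue.poll inside BatchLogRecordProcessor$Worker.run. A thread like that only exits once its owner is shut down, which is what a @PreDestroy hook would trigger. The plain-JDK sketch below illustrates that lifecycle under stated assumptions: BatchWorkerSketch, its method names, the 100 ms poll interval and the 5 s join timeout are all hypothetical, and this is not the OpenTelemetry SDK's implementation. In the real application, the hook would presumably shut down the SDK's log processor instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a batching worker; NOT the OpenTelemetry SDK implementation.
class BatchWorkerSketch {
  private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(2048);
  private final List<String> exported = new ArrayList<>();
  private final Thread worker = new Thread(this::run, "BatchWorkerSketch_WorkerThread-1");
  private volatile boolean running = true;

  void start() {
    worker.setDaemon(true);
    worker.start();
  }

  boolean emit(String record) {
    return queue.offer(record); // drops the record if the queue is full
  }

  private void run() {
    // Keep draining until shutdown was requested AND the queue is empty.
    while (running || !queue.isEmpty()) {
      try {
        // Same parking point as in the stack trace: a timed poll on an ArrayBlockingQueue.
        String record = queue.poll(100, TimeUnit.MILLISECONDS);
        if (record != null) {
          synchronized (exported) {
            exported.add(record);
          }
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  // What a @PreDestroy hook would trigger: stop the loop and wait for the worker to exit.
  void shutdown() {
    running = false;
    try {
      worker.join(5_000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  boolean isWorkerAlive() {
    return worker.isAlive();
  }

  List<String> exported() {
    synchronized (exported) {
      return new ArrayList<>(exported);
    }
  }

  public static void main(String[] args) {
    BatchWorkerSketch sketch = new BatchWorkerSketch();
    sketch.start();
    sketch.emit("log-1");
    sketch.emit("log-2");
    sketch.shutdown();
    System.out.println(sketch.isWorkerAlive()); // false
    System.out.println(sketch.exported());      // [log-1, log-2]
  }
}
```

Without the shutdown() call, the worker would stay parked in poll indefinitely. That is the state a container's leak check reports.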

@jeanbisutti
Member

@zeitlinger About 1., it seems to be an Awaitility issue. Does the problem only appear with the new changes? Perhaps it would be worth doing something like

But I am not sure it would be a good thing to do today; it would require further investigation.

About 2., the default value of numLogsCapturedBeforeOtelInstall is already high: 1000 logs. I suspect the warning is caused by something specific to the test.
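The reasoning about the default can be illustrated with a small sketch. Assume the appender queues log events emitted before OpenTelemetry is installed, replays them once it is installed, and drops (and warns about) anything beyond the limit. Under those assumptions, a warning only appears when more than numLogsCapturedBeforeOtelInstall events are logged before installation. PreInstallLogBuffer is a hypothetical class, not the appender's code. Its policy of keeping the oldest events and counting the rest as dropped is also an assumption.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: buffer log events until an SDK is "installed", up to a fixed limit.
class PreInstallLogBuffer {
  private final int capacity; // plays the role of numLogsCapturedBeforeOtelInstall
  private final Deque<String> pending = new ArrayDeque<>();
  private Consumer<String> sink; // null until install() is called
  private long dropped;

  PreInstallLogBuffer(int capacity) {
    this.capacity = capacity;
  }

  synchronized void append(String event) {
    if (sink != null) {
      sink.accept(event); // after install, events go straight through
      return;
    }
    if (pending.size() < capacity) {
      pending.add(event);
    } else {
      dropped++; // this is where an appender would warn that its buffer is too small
    }
  }

  synchronized void install(Consumer<String> newSink) {
    sink = newSink;
    while (!pending.isEmpty()) {
      sink.accept(pending.poll()); // replay buffered events in order
    }
  }

  synchronized long dropped() {
    return dropped;
  }

  public static void main(String[] args) {
    PreInstallLogBuffer buffer = new PreInstallLogBuffer(1000);
    for (int i = 0; i < 1200; i++) {
      buffer.append("startup log " + i);
    }
    List<String> exported = new ArrayList<>();
    buffer.install(exported::add);
    buffer.append("after install");
    // prints "1001 exported, 200 dropped"
    System.out.println(exported.size() + " exported, " + buffer.dropped() + " dropped");
  }
}
```

With a limit of 1000, a test would need to log more than 1000 events before installation to trigger the warning. That is why a test-specific cause seems more likely than a buffer that is too small.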

About 3., it seems related to Tomcat's memory-leak detection. The full log would tell us whether it really comes from Tomcat. It does not seem possible to stop the BatchLogRecordProcessor thread. I am surprised it could be JFR-related. @jack-berg, do you know if users have already reported the following log?

appears to have started a thread named [BatchLogRecordProcessor_WorkerThread-1] but has failed to stop it.
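To make the Tomcat hypothesis concrete, here is a rough sketch of the kind of check that produces this message. It assumes, as Tomcat's web app class loader does when the application stops, that the check looks for live threads whose context class loader is the web app's loader. LeakedThreadCheck and threadsStartedBy are hypothetical names, and the anonymous ClassLoader only stands in for Tomcat's own loader.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

// Rough sketch of a Tomcat-style leak check; not Tomcat's actual code.
class LeakedThreadCheck {
  // Names of live threads (other than the caller) whose context class loader is the given loader.
  static List<String> threadsStartedBy(ClassLoader webappLoader) {
    List<String> names = new ArrayList<>();
    for (Thread t : Thread.getAllStackTraces().keySet()) {
      if (t != Thread.currentThread() && t.isAlive() && t.getContextClassLoader() == webappLoader) {
        names.add(t.getName());
      }
    }
    return names;
  }

  public static void main(String[] args) throws InterruptedException {
    // Hypothetical stand-in for the web app's class loader.
    ClassLoader webappLoader = new ClassLoader(LeakedThreadCheck.class.getClassLoader()) {};
    CountDownLatch stop = new CountDownLatch(1);
    Thread worker = new Thread(() -> {
      try {
        stop.await(); // parks until someone "shuts down" the worker
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, "BatchLogRecordProcessor_WorkerThread-1");
    worker.setContextClassLoader(webappLoader);
    worker.start();

    System.out.println(threadsStartedBy(webappLoader)); // [BatchLogRecordProcessor_WorkerThread-1]
    stop.countDown(); // the equivalent of shutting the processor down
    worker.join();
    System.out.println(threadsStartedBy(webappLoader)); // []
  }
}
```

If this assumption holds, the warning means the worker thread inherited the web app's context class loader and was still running when the check ran. It does not necessarily indicate a real leak in a process that is about to exit.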

Native tests of this PR are failing during the native compilation step:

[native-image-plugin] Native Image written to: /home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/smoke-tests-otel-starter/spring-boot-3.2/build/native/nativeTestCompile

[Incubating] Problems report is available at: file:///home/runner/work/opentelemetry-java-instrumentation/opentelemetry-java-instrumentation/build/reports/problems/problems-report.html

I would focus the investigation on the JMX or JFR metrics in the GraalVM native execution. GraalVM supports some JFR events, but not all of them, so I am not sure that all the JFR metrics can work in native mode today.
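One way to act on this is to gate JFR-based metrics when running as a native image. The sketch below assumes that detection uses the org.graalvm.nativeimage.imagecode system property, the same check the PR's useThreads() performs. GraalVM sets that property to "buildtime" or "runtime" in native-image code, and it is unset on a regular JVM. The class, the method names and the "JMX everywhere, JFR only on the JVM" policy are hypothetical.

```java
// Hypothetical sketch: decide whether JFR-based runtime metrics should be enabled.
class RuntimeMetricsSelection {
  // GraalVM sets "org.graalvm.nativeimage.imagecode" ("buildtime" or "runtime") in native-image code;
  // on a regular JVM the property is unset.
  static boolean isNativeImage() {
    return System.getProperty("org.graalvm.nativeimage.imagecode") != null;
  }

  // Hypothetical policy: JMX-based metrics stay on everywhere, JFR-based metrics only
  // when requested and not running as a native image (where only some JFR events exist).
  static boolean enableJfrMetrics(boolean jfrRequested, boolean nativeImage) {
    return jfrRequested && !nativeImage;
  }

  public static void main(String[] args) {
    boolean nativeImage = isNativeImage();
    System.out.println("native image: " + nativeImage);
    System.out.println("JFR metrics enabled: " + enableJfrMetrics(true, nativeImage));
  }
}
```

A finer-grained alternative would keep JFR in native mode and disable only the unsupported events, but that requires knowing which events GraalVM supports.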

@jack-berg
Member

@jack-berg, would you know if some users have already reported the following log?

I haven't seen that log before.

@zeitlinger
Member Author

We also hit com.oracle.svm.core.jdk.UnsupportedFeatureError: ThreadMXBean methods; see oracle/graal#6101.

@zeitlinger zeitlinger force-pushed the spring-boot-runtime-metrics branch from c0de41a to b3c0f10 Compare January 28, 2025 08:25
@zeitlinger
Member Author

@jeanbisutti, it turned out that all the prior errors were just a side effect of a JMX issue, which is now resolved.

Can you take another look?

@jeanbisutti
Member

@roberttoyonaga If you have time, it would be great if you could have a look at this PR.

private static boolean useThreads() {
  // GraalVM native image does not support ThreadMXBean yet
  // see https://github.com/oracle/graal/issues/6101
  return !isJava9OrNewer() || System.getProperty("org.graalvm.nativeimage.imagecode") != null;
}
Contributor

@roberttoyonaga Jan 29, 2025

I think GraalVM Native Image would work fine with Threads::java8Callback. Native Image implements some ThreadMXBean functionality, notably everything needed by Threads::java8Callback (see here and here). However, Threads::java9AndNewerCallback still won't work with Native Image, since we don't support ThreadMXBean#getAllThreadIds() or ThreadMXBean#getThreadInfo() yet.
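The distinction in this review comment comes down to which ThreadMXBean calls each approach makes. The sketch below contrasts aggregate counters (getThreadCount, getDaemonThreadCount), which the comment says Native Image supports, with per-thread details (getAllThreadIds plus getThreadInfo), which it does not support yet. It only roughly mirrors the two callbacks. ThreadCountSketch, its map keys, and the native-image fallback in threadCounts() are hypothetical, not the instrumentation's actual Threads class.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the two ThreadMXBean usage patterns discussed above.
class ThreadCountSketch {
  private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

  // Aggregate counters only ("java8Callback"-style).
  static Map<String, Long> countByDaemon() {
    long daemon = THREADS.getDaemonThreadCount();
    long total = THREADS.getThreadCount();
    Map<String, Long> counts = new TreeMap<>();
    counts.put("daemon", daemon);
    counts.put("non-daemon", total - daemon);
    return counts;
  }

  // Per-thread details ("java9AndNewerCallback"-style); ThreadInfo#isDaemon() requires Java 9+.
  static Map<String, Long> countByStateAndDaemon() {
    Map<String, Long> counts = new TreeMap<>();
    for (ThreadInfo info : THREADS.getThreadInfo(THREADS.getAllThreadIds())) {
      if (info == null) {
        continue; // the thread terminated between the two calls
      }
      String key = info.getThreadState() + (info.isDaemon() ? "/daemon" : "/non-daemon");
      counts.merge(key, 1L, Long::sum);
    }
    return counts;
  }

  // Hypothetical selection: fall back to aggregate counters inside a native image.
  static Map<String, Long> threadCounts() {
    boolean nativeImage = System.getProperty("org.graalvm.nativeimage.imagecode") != null;
    return nativeImage ? countByDaemon() : countByStateAndDaemon();
  }

  public static void main(String[] args) {
    System.out.println(countByDaemon());
    System.out.println(threadCounts());
  }
}
```

The trade-off is that the aggregate variant loses the per-state breakdown, but it only depends on calls that should work in both environments.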

Member Author

I tried JMX and ran into #13078 (comment).

@zeitlinger zeitlinger force-pushed the spring-boot-runtime-metrics branch from a130e3d to e057797 Compare January 30, 2025 12:12
Labels
test native This label can be applied to PRs to trigger them to run native tests
Development

Successfully merging this pull request may close these issues.

Add runtime-telemetry to spring starter
4 participants