AeronArchive fails to start on some versions of linux. #1766

Closed
NextToBe opened this issue Mar 21, 2025 · 1 comment

NextToBe commented Mar 21, 2025

Hi, sorry to disturb.

We encountered a ConductorServiceTimeoutException after upgrading Aeron to 1.45.0. (We are using Oracle JDK 21; our previous Aeron version was 1.41.4.)

Exception in thread "main" java.lang.reflect.InvocationTargetException
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:118)
        at java.base/java.lang.reflect.Method.invoke(Method.java:580)
        at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
        at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
        at org.springframework.boot.loader.PropertiesLauncher.main(PropertiesLauncher.java:593)
Caused by: io.aeron.exceptions.ConductorServiceTimeoutException: FATAL - service interval exceeded: timeout=10000000000ns, interval=90797486770ns
        at io.aeron.ClientConductor.checkServiceInterval(ClientConductor.java:1525)
        at io.aeron.ClientConductor.checkTimeouts(ClientConductor.java:1509)
        at io.aeron.ClientConductor.awaitResponse(ClientConductor.java:1463)
        at io.aeron.ClientConductor.addSubscription(ClientConductor.java:661)
        at io.aeron.Aeron.addSubscription(Aeron.java:398)
        at io.aeron.archive.ArchiveConductor.<init>(ArchiveConductor.java:165)
        at io.aeron.archive.DedicatedModeArchiveConductor.<init>(DedicatedModeArchiveConductor.java:35)
        at io.aeron.archive.Archive.<init>(Archive.java:86)
        at io.aeron.archive.Archive.launch(Archive.java:192)
        at io.aeron.archive.ArchivingMediaDriver.launch(ArchivingMediaDriver.java:95)
        at io.aeron.archive.ArchivingMediaDriver.main(ArchivingMediaDriver.java:56)
        at java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        ... 5 more

This is similar to issue #1688, and we can reproduce the problem with a very simple launch script:

SYSTEM_PROPERTIES="-Daeron.dir=${1} -Daeron.archive.dir=${2}  \
    -Daeron.archive.control.channel=aeron:udp?endpoint=localhost:${3} \
    -Daeron.archive.replication.channel=aeron:udp?endpoint=localhost:0 \
    -Daeron.archive.control.response.channel=aeron:udp?endpoint=localhost:0 \
    "
export SYSTEM_PROPERTIES && shift 3

bash $bin_dir/run-class.sh "io.aeron.archive.ArchivingMediaDriver" "$@" || usage

After some diagnosis, we found the root cause in a thread dump of the main thread:

"main" #1 [5918] prio=5 os_prio=0 cpu=801.39ms elapsed=71.77s tid=0x00007f73ac0286e0 nid=5918 runnable  [0x00007f73b43ac000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileInputStream.readBytes(java.base@21.0.2/Native Method)
        at java.io.FileInputStream.read(java.base@21.0.2/FileInputStream.java:287)
        at java.io.FilterInputStream.read(java.base@21.0.2/FilterInputStream.java:119)
        at sun.security.provider.NativePRNG$RandomIO.readFully(java.base@21.0.2/NativePRNG.java:426)
        at sun.security.provider.NativePRNG$RandomIO.ensureBufferValid(java.base@21.0.2/NativePRNG.java:529)
        at sun.security.provider.NativePRNG$RandomIO.implNextBytes(java.base@21.0.2/NativePRNG.java:548)
        - locked <0x000000071693c380> (a java.lang.Object)
        at sun.security.provider.NativePRNG$Blocking.engineNextBytes(java.base@21.0.2/NativePRNG.java:270)
        at java.security.SecureRandom.nextBytes(java.base@21.0.2/SecureRandom.java:768)
        at java.security.SecureRandom.next(java.base@21.0.2/SecureRandom.java:824)
        at java.util.Random.nextLong(java.base@21.0.2/Random.java:592)
        at io.aeron.archive.ArchiveConductor.stronglySeededRandom(ArchiveConductor.java:2596)
        at io.aeron.archive.ArchiveConductor.<init>(ArchiveConductor.java:141)
        at io.aeron.archive.DedicatedModeArchiveConductor.<init>(DedicatedModeArchiveConductor.java:35)
        at io.aeron.archive.Archive.<init>(Archive.java:86)
        at io.aeron.archive.Archive.launch(Archive.java:192)
        at io.aeron.archive.ArchivingMediaDriver.launch(ArchivingMediaDriver.java:95)
        at io.aeron.archive.ArchivingMediaDriver.main(ArchivingMediaDriver.java:56)

As described in https://stackoverflow.com/questions/70768857/securerandom-is-unreasonably-slow-or-freezes-the-system, the delay is caused by SecureRandom blocking while it reads from the system random source. However, the solutions suggested there do not apply to our scenario.

In ArchiveConductor, line 2596:

seed = SecureRandom.getInstanceStrong().nextLong();

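This code calls SecureRandom.getInstanceStrong(), which uses the securerandom.strongAlgorithms property in the java.security configuration file to choose the implementation that generates the seed. In our case it resolves to the blocking NativePRNG variant (NativePRNG$Blocking) seen in the thread dump above.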
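To see which implementation getInstanceStrong() would pick on a given machine without risking a blocking read, the relevant security properties can simply be printed. This is a minimal sketch; the class and method names are made up for illustration and are not part of Aeron.

```java
import java.security.Security;

public class StrongRandomConfig
{
    // Returns the ordered "algorithm:provider" list consulted by SecureRandom.getInstanceStrong().
    static String strongAlgorithms()
    {
        return Security.getProperty("securerandom.strongAlgorithms");
    }

    // Returns the configured seed source, typically a file: URL.
    static String seedSource()
    {
        return Security.getProperty("securerandom.source");
    }

    public static void main(final String[] args)
    {
        // Only reads configuration; no random bytes are requested, so this cannot block.
        System.out.println("securerandom.strongAlgorithms=" + strongAlgorithms());
        System.out.println("securerandom.source=" + seedSource());
    }
}
```

On stock Linux JDK builds the first entry is commonly NativePRNGBlocking:SUN, which would match the NativePRNG$Blocking frame in the thread dump, but the actual value depends on the JDK distribution and any local edits to java.security.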

So we have to set the security properties manually to work around it:

Security.setProperty("securerandom.source", "file:/dev/urandom");
Security.setProperty("securerandom.strongAlgorithms", "NativePRNGNonBlocking:SUN,DRBG:SUN");
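A small standalone check of this workaround, assuming the properties are set before anything in the process touches SecureRandom. The class name and helper are hypothetical; only the two Security.setProperty calls come from our actual fix.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.security.Security;

public class NonBlockingStrongRandom
{
    // Applies the workaround, then asks for the "strong" instance the same way ArchiveConductor does.
    static SecureRandom configureAndGet()
    {
        Security.setProperty("securerandom.source", "file:/dev/urandom");
        Security.setProperty("securerandom.strongAlgorithms", "NativePRNGNonBlocking:SUN,DRBG:SUN");
        try
        {
            return SecureRandom.getInstanceStrong();
        }
        catch (final NoSuchAlgorithmException ex)
        {
            throw new IllegalStateException("no strong SecureRandom available", ex);
        }
    }

    public static void main(final String[] args)
    {
        final long startNs = System.nanoTime();
        final SecureRandom random = configureAndGet();
        random.nextLong(); // the call that stalled in our thread dump
        final long elapsedMs = (System.nanoTime() - startNs) / 1_000_000;
        System.out.println("algorithm=" + random.getAlgorithm() + " elapsedMs=" + elapsedMs);
    }
}
```

If NativePRNGNonBlocking is not available on the platform, getInstanceStrong() falls through to the next entry in the list, here DRBG.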

We're not sure whether this is the expected behavior.

Looking forward to any replies.


Other references:

https://tersesystems.com/blog/2015/12/17/the-right-way-to-use-securerandom/

https://www.unix.com/man_page/centos/4/urandom/

vyazelenko (Contributor) commented

We changed how SecureRandom is configured in the 1.47.0 release.

You can now configure the algorithm explicitly via Archive.Context#secureRandomAlgorithm(java.lang.String) or the aeron.secure.random.algorithm system property. The default algorithm is OS-specific: on Windows it is Windows-PRNG, otherwise it is NativePRNGNonBlocking.

public SecureRandom secureRandom()
{
    try
    {
        if ("strong".equalsIgnoreCase(secureRandomAlgorithm))
        {
            return SecureRandom.getInstanceStrong();
        }
        else
        {
            return SecureRandom.getInstance(secureRandomAlgorithm);
        }
    }
    catch (final NoSuchAlgorithmException ex)
    {
        throw new AeronException(
            "unable to create SecureRandom for algorithm=" + secureRandomAlgorithm, ex, ERROR);
    }
}
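For anyone who wants to try the selection behavior outside of Aeron, here is a JDK-only sketch of the idea. The property name and the default values are the ones described above; the class, the helper methods, and the OS check are assumptions made for illustration and are not the Aeron implementation.

```java
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Locale;

public class SecureRandomSelection
{
    // Property name as described above; everything else in this class is illustrative.
    static final String ALGORITHM_PROP = "aeron.secure.random.algorithm";

    // Assumed OS check: Windows gets Windows-PRNG, everything else NativePRNGNonBlocking.
    static String defaultAlgorithm(final String osName)
    {
        return osName.toLowerCase(Locale.ROOT).startsWith("win") ? "Windows-PRNG" : "NativePRNGNonBlocking";
    }

    // "strong" defers to the JDK's strong algorithm list; any other value is looked up by name.
    static SecureRandom create(final String algorithm)
    {
        try
        {
            return "strong".equalsIgnoreCase(algorithm) ?
                SecureRandom.getInstanceStrong() : SecureRandom.getInstance(algorithm);
        }
        catch (final NoSuchAlgorithmException ex)
        {
            throw new IllegalStateException("unable to create SecureRandom for algorithm=" + algorithm, ex);
        }
    }

    public static void main(final String[] args)
    {
        final String algorithm = System.getProperty(
            ALGORITHM_PROP, defaultAlgorithm(System.getProperty("os.name")));
        System.out.println("algorithm=" + create(algorithm).getAlgorithm());
    }
}
```

With this sketch, passing -Daeron.secure.random.algorithm=strong on the command line would restore the old getInstanceStrong() behavior, and any standard algorithm name such as DRBG can be tried the same way.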
