Use ThreadLocal caching for ExtendedRandom PRNG contexts#1255
Conversation
75c4eb8 to
71f77f3
Compare
| OCKContext ockContext; | ||
| final long ockPRNGContextId; | ||
|
|
||
| private static final ThreadLocal<PRNGContextPointer> prngContextBufferSha256 = new ThreadLocal<PRNGContextPointer>(); |
There was a problem hiding this comment.
I think we still have outstanding questions here if we can share contexts in the same thread. In the scenario we are worried about we would have two instances of SecureRandom and the state data associated with one instance will be influenced by the state data in the context by the other instance of SecureRandom. Im not sure if this is a problem or not in OCKC while generating randoms. It might be fine to share state between two streams of random like this but not sure.
There was a problem hiding this comment.
From this code change, ThreadLocal prevents sharing across threads, but yes, it still allows sharing within the same thread between different SecureRandom instances that use the same algorithm. And, we keep synchronized on nextBytes(byte[] bytes) and setSeed(byte[] seed), so access to the native context is serialized.
From NativeInterface_EXTRAND_create, the native context is created based on the algorithm and then returned. I do not see any instance-specific state passed in during creation, so from this code, it looks possible that the same context could be reused within the same thread.
The GSKit user guide states that RNG_CTXs are not intrinsically thread-safe, but it does not specify whether reusing the same context sequentially within a single thread across multiple instances is supported/allowed or not.
From the benchmark tests, we have tests: reusing one SecureRandom instance repeatedly, and creating a new instance each time before calling nextBytes(). I ran the JMH tests with both 1 thread and 16 threads, and I did not see any failures or issues.
So for the question of whether contexts can be shared within the same thread by multiple instances, I don't think we have documentation that clearly proves it. Just based on the tests, I did not see any issue.
And one additional is, in the existing benchmark, we already reuse the same context repeatedly through a single SecureRandom instance on the same thread. That is not exactly the same scenario as here, where multiple SecureRandom instances with the same algorithm may sharing one context on the same thread. However, since I don't see any instance-specific state in context creation, and the benchmark tests did not show any issue, I don't see that same-thread multiple instances sharing is causing a problem.
There was a problem hiding this comment.
We probably need to test this on all the platforms. Threading is different on the different platforms and there maybe a probably on some that do not happen on others.
There was a problem hiding this comment.
So far, since Fyre VM does not have many platform options, I have run it on both x and z Linux and did not find any threading issues. Is there any particular platform you would like me to check?
There was a problem hiding this comment.
Yes, I ran the functional tests on these platforms and they look good. Right now, I’m running the JMH performance tests on these platforms and will share the results once the runs are complete.
There was a problem hiding this comment.
Sorry for the late update. I ran all the benchmark tests in the SecurityPerformancePipeline on both ppc64_aix and x86-64_windows, and the results look good. I did not see any issues.
| return prngCtx.getCtx(); | ||
| } | ||
|
|
||
| public synchronized void nextBytes(byte[] bytes) throws OCKException { |
There was a problem hiding this comment.
I can understand that since the test is crashed so we have to add the synchronized back.
However, i feel there is a contradiction between using the synchronized and the threadlocal, isn't it? I mean the primary advantage of using ThreadLocal is to eliminate contention by giving each thread its own isolated resource. Using synchronized on the instance level, any multi-threaded workload sharing a SecureRandom instance will be forced to execute serially anyway.
There was a problem hiding this comment.
For the synchronized issue we discussed last Friday, the failure does not occur in the newly added test RandomNewInstanceBenchmark. The crash happens in the existing JMH test RandomBenchmark.
I moved the payload initialization into the benchmark method, as shown below, but it still crashes.
@Benchmark
public byte[] runOpenJCEPlusSHA256DRBG() {
byte[] payload = new byte[payloadSize];
random.nextBytes(payload);
randomOpenJCEPlusSHA256DRBG.nextBytes(payload);
return payload;
}
I think it is not only the shared payload, but also the shared SecureRandom instance created in @Setup.
@Setup
public void setup() throws Exception {
insertProvider("OpenJCEPlus");
randomOpenJCEPlusSHA256DRBG = SecureRandom.getInstance("SHA256DRBG", "OpenJCEPlus");
randomOpenJCEPlusSHA512DRBG = SecureRandom.getInstance("SHA512DRBG", "OpenJCEPlus");
randomSUNSHA1PRNG = SecureRandom.getInstance("SHA1PRNG", "SUN");
randomSUNDRBG = SecureRandom.getInstance("DRBG", "SUN");
payload = new byte[payloadSize];
random.nextBytes(payload);
}
With @State(Scope.Benchmark), JMH creates one shared state object for the whole benchmark. That means the fields initialized in @Setup can be accessed by all threads. So, even ExtendedRandom.java uses ThreadLocal for the native PRNG context, the SecureRandom instance itself is still shared across threads because it is created once in @Setup and stored in the shared benchmark state.
ThreadLocal value is resolved during ExtendedRandom initialization, and the native PRNG context ID is then stored inside the object. After that, all threads use the same shared object, so they end up using the same stored native context ID to get the native context, even though that context was originally initialized from one thread’s ThreadLocal.
So either we remove synchronized and change the benchmark test from @State(Scope.Benchmark) to @State(Scope.Thread), or we keep synchronized, since removing it does not show any noticeable performance difference.
| this.ockContext = ockContext; | ||
| this.provider = provider; | ||
|
|
||
| this.provider.registerCleanable(this, cleanOCKResources(prngCtx, ockContext)); |
There was a problem hiding this comment.
I have one concern for the cleaner methods considering a specific scenario which is Thread Pools.
Under this scenario, a thread maintains an internal ThreadLocalMap. In this ThreadLocalMap, the key is the ThreadLocal object itself, and the value is the instance of the internal class PRNGContextPointer in ExtendedRandom class. Because the ThreadLocal variables are defined as private static, so the keys inside the host thread's ThreadLocalMap are held by strong references.
Im afraid that the map entries are never naturally cleared. Because the values (PRNGContextPointer instances) remain strongly reachable indefinitely, they will never be eligible for garbage collection, completely preventing the Cleaner from ever being triggered. This will inevitably result in a native memory leak.
There was a problem hiding this comment.
We discussed this in last Friday’s meeting. If there are any further concerns, just let me know.
4a97876 to
fb77737
Compare
c8e276f to
ff9268f
Compare
|
@johnpeck-us-ibm Sorry, I accidentally clicked the refresh button. I noticed that you had already approved this PR, but I’m not sure if you need to click Approve button again because of that. |
4d16a3f to
fe5383a
Compare
fe5383a to
df3eb8c
Compare
Add ThreadLocal caching for native PRNG contexts used by ExtendedRandom. Each thread creates and reuses a PRNG context for supported DRBG algorithms. This avoids repeated EXTRAND_create calls when instances are created frequently. Benefits: - Reduce native allocation overhead - Reuse PRNG contexts per thread - Improve performance in RNG-heavy workloads Signed-off-by: Tao Liu <tao.liu@ibm.com>
c05f971 to
1de96ad
Compare
Add ThreadLocal caching for native PRNG contexts used by ExtendedRandom. Each thread creates and reuses a PRNG context for supported DRBG algorithms.
This avoids repeated EXTRAND_create calls when instances are created frequently.
Benefits: