Expected behavior

The command should be interrupted once the configured timeout elapses.

Actual behavior

When performing Jedis operations in the production environment, the system experiences lags lasting several minutes. After troubleshooting with jstack, we found that numerous threads enter a WAITING state after calling getSlotConnection(). Upon examining the source code of JedisClusterInfoCache, I noticed that this class uses a ReentrantReadWriteLock, leading me to suspect that a held write lock is causing the prolonged read-lock blocking. Based on this, I built a utility to proactively acquire the write lock and reproduced the issue as outlined below:
1. Initialize JedisCluster.
2. Acquire the write lock.
3. Start a child thread to execute the get command (a thread holding the write lock can still acquire the read lock, so executing get in the same thread would not reproduce the blocking).
// STEP 1: init cluster
JedisCluster cluster = JedisClient.getCluster();
// STEP 2: init lock util
JedisClusterInfoCacheLockUtil util = new JedisClusterInfoCacheLockUtil(cluster);
// STEP 3: acquire the write lock
util.lockWrite();
// STEP 4: start a child thread to execute the get command
// (executing get in the same thread would re-enter the lock and not block)
Executors.newFixedThreadPool(1).execute(() -> {
    // At this point, execution blocks indefinitely.
    cluster.get("test-key");
    log.info("sub finish");
});
Thread.sleep(10000000);
util.unlockWrite();
log.info("main finish");
Execution result:
The child thread’s get command stalls indefinitely, waiting for the write lock to be released. Even with maxWaitMillis, connectionTimeout, soTimeout, and maxAttempts configured, nothing interrupts the blocked call: those settings govern pool borrowing and socket I/O, and the thread never gets that far because it is parked on the lock itself. In production this shows up as a multi-minute blocking delay.
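The mechanism can be reproduced without Jedis at all: ReentrantReadWriteLock.readLock().lock() has no timeout and does not abort on interruption, so no Jedis-level timeout can fire while a thread is parked there. A minimal standalone sketch:

import java.util.concurrent.locks.ReentrantReadWriteLock;

public class LockBlockDemo {
    public static void main(String[] args) throws Exception {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.writeLock().lock(); // main thread holds the write lock, like the demo above

        Thread reader = new Thread(() -> {
            // Parks here with no timeout; this is the state jstack reports as WAITING.
            rwl.readLock().lock();
            try {
                System.out.println("reader acquired the read lock");
            } finally {
                rwl.readLock().unlock();
            }
        });
        reader.start();

        Thread.sleep(2000);
        reader.interrupt(); // no effect: lock() acquires uninterruptibly
        Thread.sleep(1000);
        System.out.println("reader state after interrupt: " + reader.getState()); // WAITING

        rwl.writeLock().unlock(); // only releasing the write lock unblocks the reader
        reader.join();
    }
}

The reader thread stays in WAITING (matching the jstack observation) until the write lock is released; lock() swallows the interrupt and re-asserts the interrupt flag instead of throwing.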
ENV
Jedis Configuration
maxWaitMillis: 4000ms
connectionTimeout: 2000ms
maxAttempts: 3
soTimeout: 350ms
Jedis version: 3.5.0
Redis version: 6.2.14
Java version: 8
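For context, here is a hedged sketch of how these settings are typically wired into a JedisCluster (using a 3.5.0-era constructor; the node address is a placeholder). Notably, maxWaitMillis is not a Jedis parameter at all but comes from Apache Commons Pool's GenericObjectPoolConfig, which may be why it is hard to find in the Jedis codebase:

import org.apache.commons.pool2.impl.GenericObjectPoolConfig;

import redis.clients.jedis.HostAndPort;
import redis.clients.jedis.JedisCluster;

public class ClusterClientFactory {

    // Hedged sketch mapping the reported settings onto JedisCluster 3.5.0.
    public static JedisCluster create() {
        GenericObjectPoolConfig poolConfig = new GenericObjectPoolConfig();
        // maxWaitMillis bounds borrowing a connection from the pool,
        // not lock acquisition inside JedisClusterInfoCache.
        poolConfig.setMaxWaitMillis(4000);

        return new JedisCluster(
                new HostAndPort("127.0.0.1", 7000), // placeholder node
                2000,  // connectionTimeout (ms)
                350,   // soTimeout (ms)
                3,     // maxAttempts
                poolConfig);
    }
}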
Sorry for the long delay in responding. I am trying to understand the possible ways you could end up with what you experienced.
The configuration parameters you mentioned:
Jedis Configuration
maxWaitMillis: 4000ms
connectionTimeout: 2000ms
maxAttempts: 3
soTimeout: 350ms
are all (except maxWaitMillis, which I was not able to find anywhere) centered on the transport-layer protocol: they target its limitations and weaknesses and attempt to make behavior more practical and predictable.
What you are demonstrating, on the other hand, is a race condition/deadlock in client-side code.
Could you provide the stack trace of the WAITING threads, as well as more information about the getSlotConnection() method you mentioned?