
Multi node pipeline executor #4070

Open
wants to merge 5 commits into master

Conversation


@dolavb dolavb commented Jan 22, 2025

This allows clients to provide an ExecutorService at pipeline creation instead of having a new ExecutorService created on every pipeline sync call. Creating an ExecutorService creates new threads, which are expensive resources to create. In a high-throughput application developed internally, we are writing at a rate of ~100k sets per second on a six-node cluster, on an instance equivalent to an EC2 m5.12xlarge. Thread creation uses 40% of CPU and adds substantial latency. This new approach allows clients to supply a pooled Executor that is tuned to their load patterns.

This change is non-breaking, and it also brings a slight optimization for clients currently using the created thread pool. In the current approach, even if a pipeline has a single connection to close, the ExecutorService will create MULTI_NODE_PIPELINE_SYNC_WORKERS threads; with the default value, that means wasting two thread creations.
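A rough sketch of the intended usage (the constructor overload taking an ExecutorService is what this PR proposes, not the released API; names and values are illustrative):

import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

import redis.clients.jedis.ClusterPipeline;
import redis.clients.jedis.DefaultJedisClientConfig;
import redis.clients.jedis.HostAndPort;

public class SharedExecutorPipelineExample {
  public static void main(String[] args) throws Exception {
    // One shared pool, sized for the application's load pattern, reused for every sync.
    ExecutorService sharedPool = Executors.newFixedThreadPool(8);

    Set<HostAndPort> nodes = Set.of(new HostAndPort("127.0.0.1", 7000));
    // Hypothetical overload proposed by this PR: the executor is supplied at pipeline creation.
    try (ClusterPipeline pipeline = new ClusterPipeline(nodes,
        DefaultJedisClientConfig.builder().build(), sharedPool)) {
      pipeline.set("key", "value");
      pipeline.sync();      // per-node syncing runs on sharedPool; no threads are created here
    }

    sharedPool.shutdown();  // the caller owns the pool's lifecycle, not Jedis
  }
}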


From: https://stackoverflow.com/questions/5483047/why-is-creating-a-thread-said-to-be-expensive

Thread lifecycle overhead. Thread creation and teardown are not free. The actual overhead varies across platforms, but thread creation takes time, introducing latency into request processing, and requires some processing activity by the JVM and OS. If requests are frequent and lightweight, as in most server applications, creating a new thread for each request can consume significant computing resources.

From Java Concurrency in Practice
By Brian Goetz, Tim Peierls, Joshua Bloch, Joseph Bowbeer, David Holmes, Doug Lea
Print ISBN-10: 0-321-34960-1

Java thread creation is expensive because there is a fair bit of work involved:

  • A large block of memory has to be allocated and initialized for the thread stack.
  • System calls need to be made to create / register the native thread with the host OS.
  • Descriptors need to be created, initialized and added to JVM-internal data structures.
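As a quick illustration of the difference (an illustrative micro-benchmark sketch of my own, not part of this PR's changes):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ThreadCreationCost {
  public static void main(String[] args) throws Exception {
    Runnable task = () -> { };  // trivial task: what we measure is thread lifecycle overhead
    int iterations = 10_000;

    long t0 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      Thread t = new Thread(task);
      t.start();
      t.join();                 // create + register + tear down a native thread every time
    }
    long perThreadNanos = (System.nanoTime() - t0) / iterations;

    ExecutorService pool = Executors.newFixedThreadPool(4);
    long t1 = System.nanoTime();
    for (int i = 0; i < iterations; i++) {
      pool.submit(task);        // reuses already-created threads
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.MINUTES);
    long perSubmitNanos = (System.nanoTime() - t1) / iterations;

    System.out.printf("new Thread per task: %d ns, pooled submit: %d ns%n",
        perThreadNanos, perSubmitNanos);
  }
}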

dolavb and others added 2 commits January 17, 2025 13:50
This allows passing an ExecutorService when creating a ClusterPipeline. The previous
parallelization approach for pipeline syncing/closing would create a new executor
service for each sync operation, resulting in excessive thread creation and
termination. On an EC2 m5.12xlarge instance with ~100k single writes/sec, this
thread creation consumed 40% CPU and increased operation latency.

The change also optimizes thread usage when no ExecutorService is provided.
Previously, even a single pipeline within a multipipeline would create 3 threads
for syncing. This improvement removes that overhead, though callers are encouraged
to provide their own ExecutorService for optimal CPU usage and latency.
@sazzad16
Contributor

@dolavb Thank you for your effort to improve Jedis.
Your concern is justified. But we have our hands full ATM. We'll try to get to this PR ASAP.

@dolavb dolavb marked this pull request as ready for review January 24, 2025 01:51
@dolavb
Author

dolavb commented Feb 7, 2025

Executed some more load testing, and we have hosts spending 60% of our total CPU usage on starting threads.

The first screenshot shows the latency resulting from starting threads: five times the time spent waiting on Redis to finish the write.
[screenshot: latency breakdown, 2025-02-07]

The second screenshot shows the CPU usage resulting from thread starting.
[screenshot: CPU usage breakdown, 2025-02-07]

@sazzad16
Contributor

sazzad16 commented Feb 9, 2025

cc @uglide @ggivo @atakavci

@atakavci
Contributor

Let's start considering a pool of ExecutorService instances, and discuss the trade-offs of each solution we have. For sure we are doing this for performance, but we need to consider other aspects from an API standpoint, like the principles of information hiding and encapsulation.

Contributor

@ggivo ggivo left a comment


I think the described use case makes sense, and the direction of the PR will help resolve the performance concern.
As @atakavci pointed out, it is worth discussing from an API point of view whether it makes sense to expose ExecutorService directly.
@atakavci Do you have anything in mind?


syncing = false;
private void closeConnection(Map.Entry<HostAndPort, Queue<Response<?>>> entry) {
Contributor


The method name is misleading. It is actually reading the command responses rather than closing the connections.

Author


Thanks, will address.

try {
countDownLatch.await();
awaitAllCompleted.get();
if (executorService != this.executorService) {
Contributor


I suggest a more explicit approach.
E.g., when creating the MultiNodePipelineBase, if an external executor is provided, set a flag useSharedExecutor = true.
This way it will be clearer that the lifecycle is managed externally.
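Something along these lines (an illustrative sketch only; field and constructor shapes do not match the actual class):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

abstract class MultiNodePipelineSketch {

  private final ExecutorService executorService;
  private final boolean useSharedExecutor;   // true when the caller supplied the executor

  protected MultiNodePipelineSketch(ExecutorService externalExecutor) {
    this.useSharedExecutor = externalExecutor != null;
    this.executorService = useSharedExecutor
        ? externalExecutor                    // lifecycle managed externally by the caller
        : Executors.newFixedThreadPool(3);    // owned by the pipeline, shut down after sync
  }

  protected void shutdownIfOwned() {
    if (!useSharedExecutor) {
      executorService.shutdown();             // never shut down a shared executor
    }
  }
}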

Author


Thanks, will address.

@dolavb
Author

dolavb commented Feb 13, 2025

@atakavci Yes, I think the API considerations are important. It is unfortunate that this already has an API that is essentially a global variable in a static field. Otherwise, I would have chosen one of the typical approaches:

  • Have an optional ExecutorService in the client config.
    • Handle the option at the implementation level: if present, parallel calls; if absent, serial calls.
  • Have a thread pool configuration in the client config (thread count and the like).
    • And expose the resulting thread pool in some way for clients who add JMX sensors or other instrumentation to their thread pools, as in my use case.

These approaches would keep the API changes small by avoiding injecting the ExecutorService into the client's pipeline method. The executorService would be a field in the client config and would be injected from there when creating a pipeline, roughly as sketched below.
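A sketch of what such a config option could look like (getPipelineExecutor and PipelineSyncConfig are illustrative names, not existing Jedis API):

import java.util.Optional;
import java.util.concurrent.ExecutorService;

// Illustrative addition to the client config: the Optional drives serial vs. parallel sync.
interface PipelineSyncConfig {

  // Absent: sync each node's responses serially on the calling thread.
  // Present: submit the per-node sync tasks to this caller-managed executor.
  default Optional<ExecutorService> getPipelineExecutor() {
    return Optional.empty();
  }
}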

I did not go for any of these approaches because of the unfortunate public static variable. There might be a way to do it, but I am not sure it would be simpler if we want to maintain backward compatibility. So do we want to break people who have started depending on that static field?

As for information hiding, I am not sure I understand: a thread pool is not information, it is a resource, and if it is the client's responsibility to create it, it needs to be exposed in some way. I am running a system where resources are important; we monitor all of our thread pools for tuning, and we do this using JMX sensors. I would not use a library that hides a thread pool.

@atakavci
Contributor

Hey @dolavb, I partially agree with what you stated. There are a couple of ways to do it without introducing a breaking change.
I was thinking the suggestion to "have a pool of ExecutorService" is kind of self-descriptive, but let me come up with a draft or at least some pseudo code around it.
BTW, with information hiding I'm basically referring to the fact that exposing the use of an ExecutorService is not an ideal thing to do. The ones using the public API have nothing to do with how (or with what) it actually runs under the hood, and it should remain that way. Leaving a small question here to clarify: what happens if we, as library maintainers, decide to move away from the existing ExecutorService implementation tomorrow?

@dolavb
Author

dolavb commented Feb 17, 2025

@atakavci I agree there are a couple of ways to do it without adding a breaking change, but I believe the complexity is not worth it. But your call.

@atakavci
Contributor

atakavci commented Feb 18, 2025

@dolavb, @ggivo please take a look at this and let me know what you think.
The downside of this, as is, is the lack of a mechanism to collect the servicePool.

@ggivo
Contributor

ggivo commented Feb 18, 2025

@dolavb @atakavci

The downside of this, as is, is the lack of a mechanism to collect the servicePool.

The downside mentioned is a critical one, since there is no way to free the resources at will.
Also, with this approach, we still have the configuration of the pool/executors hidden from the user, and they lack the option to optimise it for their use case.

Providing the option to use an external ExecutorService will allow a customisable ExecutorService based on the user's needs, and users will have control over the lifecycle of the ExecutorService.

Why do we consider this to be a breaking change if we preserve the existing behaviour as the default?

  • If an external ExecutorService is not explicitly provided, we still use the existing approach (create a new ExecutorService using the statically provided configuration).
  • To use a shared ExecutorService, users should explicitly set it (via JedisClientConfig, or provide it when constructing the pipeline).

From an API perspective, we are adding a new configuration option, which should not be a breaking change.

@atakavci
Contributor

atakavci commented Feb 18, 2025

To be clear, no breaking solutions are proposed in this PR as far as I can see.
Still, I would like to confirm that we are comfortable adding Executors to the public API.
It's nice that it gives great control to the app!
Maybe we should emphasize/declare the two modes of execution with MultiNodePipelineExecutor more explicitly, not sure.
Question: does it make sense to introduce a simple interface that provides only submit(...)?
Having the shared ExecutorService and the owned one interchangeable, like the way here, could it lead to potential confusion or misuse in the future?

@ggivo
Contributor

ggivo commented Feb 19, 2025

Maybe we should emphasize/declare the two modes of execution with MultiNodePipelineExecutor more explicitly, not sure. Question: does it make sense to introduce a simple interface that provides only submit(...)? Having the shared ExecutorService and the owned one interchangeable, like the way here, could it lead to potential confusion or misuse in the future?

What I can suggest here is to document the API with a clear statement that when an external ExecutorService is provided, its lifecycle is not managed by the Jedis client, and proper resource management (shutdown) is the user's responsibility.

In addition, I suggest wrapping the ExecutorService in a simple PipelineExecutor interface with submit/shutdown methods, which will give us some control in the future if we need to add logging and metrics, or even implement an executor that performs the call directly in the caller thread.

import java.util.concurrent.ExecutorService;

public interface PipelineExecutor {

  void submit(Runnable task);

  default void shutdown() {
    // no-op by default; implement shutdown behavior here if needed
  }

  static PipelineExecutor from(ExecutorService executorService) {
    return new PipelineExecutor() {
      @Override
      public void submit(Runnable task) {
        executorService.submit(task);
      }

      @Override
      public void shutdown() {
        executorService.shutdown();  // Shuts down the ExecutorService
      }
    };
  }
}
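
On the caller side, wiring a shared pool would then be a one-liner, e.g.:

ExecutorService shared = Executors.newFixedThreadPool(8);
PipelineExecutor pipelineExecutor = PipelineExecutor.from(shared);  // lifecycle stays with the caller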

@dolavb @atakavci
Appreciate the time spent & effort in this issue. Let me know if the above suggestion makes sense

Comment on lines 49 to 57
/**
* Sub-classes must call this method, if graph commands are going to be used.
* @param connectionProvider connection provider
*/
protected final void prepareGraphCommands(ConnectionProvider connectionProvider) {
GraphCommandObjects graphCommandObjects = new GraphCommandObjects(connectionProvider);
graphCommandObjects.setBaseCommandArgumentsCreator((comm) -> this.commandObjects.commandArguments(comm));
super.setGraphCommands(graphCommandObjects);
}
Contributor


Graph module support was removed with #4073.

@dolavb
Author

dolavb commented Feb 21, 2025

@ggivo I agree this is a better approach.

@atakavci upon review of your gist I have the following question: why the CountDownLatch? Since the introduction of CompletableFuture (JDK 8), future composition has been the idiomatic way to wait for and merge threads. I don't see any backward compatibility concern here since JDK 7 is EOL.
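For example, roughly (an illustrative sketch, assuming tasks holds the per-node sync Runnables and executorService is the shared executor):

import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;

// One async task per node; allOf(...).join() replaces the per-task countDown()/await() pair.
List<CompletableFuture<Void>> futures = tasks.stream()
    .map(task -> CompletableFuture.runAsync(task, executorService))
    .collect(Collectors.toList());

CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();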

@atakavci
Contributor

I don't mind whether it's CountDownLatch or futures; I just did it with the minimum lines of change to the existing code. The rest is the diff editor's trick.

@dolavb
Author

dolavb commented Feb 22, 2025

While working on this I saw this. Do I understand correctly that we shut down the topologyRefreshExecutor in the JedisClusterInfoCache whenever we close a pipeline?

I apologize for the lack of a clean reference. I am not the most versed in GitHub and could not find a way to reference this better here.

@ggivo
Contributor

ggivo commented Feb 22, 2025

While working on this I saw this. Do I understand correctly that we shut down the topologyRefreshExecutor in the JedisClusterInfoCache whenever we close a pipeline?

I don't think that is the case, but let's keep the discussion in the referred PR so as not to clutter this one.

@dolavb
Author

dolavb commented Feb 22, 2025

As part of this PR, should we mark public static volatile int MULTI_NODE_PIPELINE_SYNC_WORKERS = 3; as deprecated? With a deprecation comment like this:

 * @deprecated Clients using this approach pay the thread creation cost for every pipeline sync. Clients
 * should refer to {@link JedisClientConfig#getPipelineExecutorProvider()} to provide a single Executor
 * for better performance.

I would advise providing a default Executor in the future, and this would be in preparation for that change. Otherwise I will keep the comment but remove the deprecated annotation.

This allows the configuration of a ClusterPipelineExecutor to
sync pipelines in parallel. The default implementation remains
problematic. This new approach will allow clients to address
the performance issue of the default approach.
@dolavb
Author

dolavb commented Feb 22, 2025

@ggivo This makes for a bigger change but offers a path away from the current static implementation. Let me know what you think.

@ggivo
Contributor

ggivo commented Feb 26, 2025

@ggivo This makes for a bigger change but offers a path away from the current static implementation. Let me know what you think.
@dolavb
Sorry for the delay. Got distracted with other tasks, but I will take a look by the end of the week.
