Conversation

craiuconstantintiberiu

What changes were proposed in this pull request?

Makes the number of idle workers in the PythonWorkerFactory pool configurable.

Why are the changes needed?

Without a limit on the queue size, the idle worker pool can grow without bound. A configurable maximum gives better control over how many idle Python workers are kept around.

Does this PR introduce any user-facing change?

Yes. It adds a new optional configuration entry, spark.python.factory.idleWorkerMaxPoolSize, available from Spark 4.1.0.
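
A minimal usage sketch of the new entry from application code (the app name and master below are illustrative, and the key is only honored from Spark 4.1.0 on):

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: cap the idle Python worker pool at 4 workers.
val conf = new SparkConf()
  .setAppName("idle-worker-pool-demo")
  .setMaster("local[*]")
  .set("spark.python.factory.idleWorkerMaxPoolSize", "4")
val sc = new SparkContext(conf)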

How was this patch tested?

This patch adds two new tests that verify the behavior with and without the worker limit configuration.

Was this patch authored or co-authored using generative AI tooling?

No

Comment on lines 107 to 108
mockWorkers.foreach(_.stop())
worker3.stop()

Shall we surround the above with a try block and put these in the finally clause, to make sure all the workers are cleaned up?

Thanks, made this change.
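
A minimal sketch of the suggested pattern, assuming the quoted test body owns mockWorkers and worker3:

try {
  // ... assertions that exercise the idle worker pool ...
} finally {
  // Stop the workers in finally so they are cleaned up even if an assertion fails.
  mockWorkers.foreach(_.stop())
  worker3.stop()
}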

@@ -33,6 +33,12 @@ import org.apache.spark.util.ThreadUtils
// Tests for PythonWorkerFactory.
class PythonWorkerFactorySuite extends SparkFunSuite with SharedSparkContext {

private def getIdleWorkerCount(factory: PythonWorkerFactory): Int = {
val field = factory.getClass.getDeclaredField("idleWorkers")

I guess it's okay to make idleWorkers private[spark]? cc @HyukjinKwon

yes

but let's add a comment that this is exposed for testing purposes.

Thank you, changed to private[spark] and added comment.
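
With idleWorkers widened to private[spark], the suite-side helper quoted above no longer needs reflection; a hedged sketch of what it could look like instead:

// Sketch: read the pool size directly, synchronizing on the factory since the
// production code guards idleWorkers with the factory's own lock.
private def getIdleWorkerCount(factory: PythonWorkerFactory): Int =
  factory.synchronized { factory.idleWorkers.size }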

@benrobby benrobby left a comment

Thanks for making this change, just a few nits

// Visible for testing
private[spark] val idleWorkers = new mutable.Queue[PythonWorker]()
@GuardedBy("self")
private val idleWorkerPoolSize = authHelper.conf.get(PYTHON_FACTORY_IDLE_WORKER_MAX_POOL_SIZE)

why do we access the spark conf via authHelper?

Thanks, changed everywhere to use conf instead of authHelper.conf when needed.

// Visible for testing
private[spark] val idleWorkers = new mutable.Queue[PythonWorker]()
@GuardedBy("self")
private val idleWorkerPoolSize = authHelper.conf.get(PYTHON_FACTORY_IDLE_WORKER_MAX_POOL_SIZE)

could you adjust the variable name to reflect that it's the maximum size?

Thanks, done.
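
A sketch of both nits applied inside PythonWorkerFactory, reading through conf rather than authHelper.conf and using a name that signals it is a maximum (the exact name in the merged code may differ):

// Optional cap on the idle worker pool; None (unset) leaves the pool unbounded.
private val maxIdleWorkerPoolSize: Option[Int] =
  conf.get(PYTHON_FACTORY_IDLE_WORKER_MAX_POOL_SIZE)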

@@ -484,6 +487,15 @@ private[spark] class PythonWorkerFactory(
self.synchronized {
lastActivityNs = System.nanoTime()
idleWorkers.enqueue(worker)
if (idleWorkerPoolSize.exists(idleWorkers.size > _)) {

should this be >=? Otherwise you'll have idleWorkerPoolSize + 1 worker in the pool

Yes, but it should not be a problem at the moment.
If we are at maxIdleWorkerPoolSize workers in the queue and we add one more, we go over the limit and then remove a worker.

It's not a problem now, but it could become an issue in the future if the queue implementation is changed to one that actually enforces the limit and throws an exception.

Changed.

@benrobby benrobby Aug 4, 2025

Makes sense. I see that the worker was enqueued anyway before this check ran, so in either case the pool ends up with at most the configured maximum number of workers.
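
A self-contained sketch of the enqueue-then-evict pattern this thread discusses; the class and names below are illustrative, not the PR's code:

import scala.collection.mutable

final class Worker(val id: Int) { def stop(): Unit = println(s"stopping worker $id") }

final class IdlePool(maxPoolSize: Option[Int]) {
  private val idleWorkers = mutable.Queue.empty[Worker]

  def release(worker: Worker): Unit = synchronized {
    idleWorkers.enqueue(worker)
    // The pool can briefly hold maxPoolSize + 1 workers right after the enqueue;
    // the check below immediately evicts the oldest worker back down to the cap.
    // In this sketch, a >= condition would instead evict one worker earlier, so the
    // steady-state pool would top out one below the configured value.
    if (maxPoolSize.exists(idleWorkers.size > _)) {
      idleWorkers.dequeue().stop()
    }
  }

  def size: Int = synchronized(idleWorkers.size)
}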

@ueshin ueshin left a comment

LGTM, pending @benrobby's comments.

@benrobby benrobby left a comment

LGTM

@ueshin ueshin commented Aug 5, 2025

Thanks! merging to master.

@ueshin ueshin closed this in ca02481 Aug 5, 2025