[SPARK-54346][SS] Introduce state repartition API and repartition runner #53056
Conversation
    },
    "sqlState" : "55019"
  },
  "STATE_REPARTITION_INVALID_PARAMETER" : {
nit: INVALID_OPTIONS ?
I want to avoid confusion, since "options" in Spark means `.option()`. So I'm using "parameter" instead.
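For context, here is a minimal sketch of how a helper could raise the new error condition; the helper name matches the one visible further down in the diff, while the exception type, message parameter names, and any sub-conditions are assumptions, not the PR's actual code:

```scala
// Hedged sketch only: how parameterIsNotGreaterThanZeroError might raise the
// STATE_REPARTITION_INVALID_PARAMETER condition. The exception type and the
// message parameter names are assumptions about this PR.
import org.apache.spark.SparkIllegalArgumentException

object OfflineStateRepartitionErrorsSketch {
  def parameterIsNotGreaterThanZeroError(paramName: String): SparkIllegalArgumentException = {
    new SparkIllegalArgumentException(
      errorClass = "STATE_REPARTITION_INVALID_PARAMETER",
      messageParameters = Map("parameter" -> paramName))
  }
}
```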
/**
 * Returns a `StreamingCheckpointManager` that allows managing any streaming checkpoint.
 */
private[spark] def streamingCheckpointManager: StreamingCheckpointManager =
Do we need to add this to Spark Connect also?
Spark Connect and PySpark will be added in subsequent PRs.
/** @inheritdoc */
override private[spark] def repartition(
    checkpointLocation: String,
    numPartitions: Int,
I guess the underlying recorded value is Int, but should we consider bumping this to Long eventually? It's probably unlikely for users to have that many partitions, though.
Let's keep it as Int, since that is what we record in the checkpoint. Also, using Long is very unrealistic.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

object StreamingUtils {
Can we put this under this directory: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/utils/?
done
  throw OfflineStateRepartitionErrors.parameterIsNotGreaterThanZeroError("numPartitions")
}

val runner = new OfflineStateRepartitionRunner(
Should we encapsulate this whole block in a try-catch, in case we want to catch and log any warnings?
See the run method.
  numPartitions,
  enforceExactlyOnceSink
)
runner.run()
Can we also add some logging to indicate when the repartition started/ended and how long the operation took, along with other identifying information about the query?
See the runner's run method.
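For illustration, a hedged sketch of the kind of wrapping the runner's run method could provide, covering both the try-catch and the start/end/duration logging raised above; the class shape, field names, and log messages are assumptions, not the PR's actual implementation:

```scala
// Illustrative sketch only: wrapping the repartition work with logging,
// timing, and error handling. Names and messages are assumed.
import org.apache.spark.internal.Logging

class OfflineStateRepartitionRunnerSketch(
    checkpointLocation: String,
    numPartitions: Int) extends Logging {

  def run(): Unit = {
    val startTimeMs = System.currentTimeMillis()
    logInfo(s"Starting state repartition for checkpoint=$checkpointLocation " +
      s"to numPartitions=$numPartitions")
    try {
      // Validate the checkpoint, create the repartition batch, and commit it
      // (the actual steps live in the PR's runner).
      logInfo(s"Finished state repartition for checkpoint=$checkpointLocation " +
        s"in ${System.currentTimeMillis() - startTimeMs} ms")
    } catch {
      case e: Throwable =>
        logError(s"State repartition failed for checkpoint=$checkpointLocation", e)
        throw e
    }
  }
}
```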
val newBatchId = createNewBatchIfNeeded(lastBatchId, lastCommittedBatchId)

// todo: Do the repartitioning here, in subsequent PR
Can we create SPARK JIRAs and link them here?
done
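To make the batch-creation line in the diff above concrete, here is a hedged sketch of the idea a createNewBatchIfNeeded helper could implement, inferred only from its name and arguments: if the last batch in the offset log is already committed, allocate a fresh batch id for the repartition; otherwise reuse the pending, uncommitted one. The logic and types below are assumptions, not the PR's actual code:

```scala
// Illustrative sketch of batch selection for the repartition operation.
object RepartitionBatchSketch {
  // lastBatchId: last batch present in the offset log.
  // lastCommittedBatchId: last batch present in the commit log.
  def createNewBatchIfNeeded(lastBatchId: Long, lastCommittedBatchId: Long): Long = {
    if (lastBatchId == lastCommittedBatchId) {
      // Everything is committed, so the repartition gets its own new batch,
      // which would be recorded in the offset log before state is rewritten.
      lastBatchId + 1
    } else {
      // The last batch was written to the offset log but never committed;
      // reuse its id rather than stacking another uncommitted batch on top.
      lastBatchId
    }
  }
}
```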
What changes were proposed in this pull request?
This PR introduces the API for offline repartitioning of streaming state. It is not yet exposed, since it is still in development. It also implements some of the core functionality of the repartition batch runner, which validates the checkpoint, creates the repartition batch, and commits it. Subsequent PRs will build on this; the Spark Connect and PySpark APIs will also be added in subsequent PRs.
It also introduces the streamingCheckpointManager for performing operations on a streaming checkpoint. This is likewise not yet exposed, since it is still in development.
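For reference, a hedged sketch of how the internal API could be invoked from Scala once wired up; the parameter names come from the diff above, but the entry point shown here (spark.streams.streamingCheckpointManager) and the literal values are assumptions:

```scala
// Hypothetical usage sketch; the API is private[spark] and not user-facing yet.
// Assumes an active SparkSession named `spark` and that streamingCheckpointManager
// is reachable via spark.streams (the actual owner may differ in the PR).
val checkpointLocation = "/tmp/checkpoints/my-streaming-query"

spark.streams.streamingCheckpointManager.repartition(
  checkpointLocation,              // checkpoint of a stopped streaming query
  numPartitions = 10,              // new number of state partitions (Int, as recorded in the checkpoint)
  enforceExactlyOnceSink = true)   // flag visible in the diff; exact semantics not shown here
```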
Why are the changes needed?
To support offline repartitioning of streaming state, i.e. changing the number of state store partitions recorded in an existing streaming checkpoint.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New test suite added
Was this patch authored or co-authored using generative AI tooling?
No