[SPARK-54194][CONNECT][FOLLOWUP] Spark Connect Proto Plan Compression - Scala Client
### What changes were proposed in this pull request?
The previous PR for Spark Connect Proto Plan Compression, #52894, implemented both the server-side and PySpark client changes.
This PR implements the corresponding Scala client changes, so plan compression is now supported on the Scala client as well.
To reproduce the issue this PR solves, run the following code on the Spark Connect Scala client:
```
import scala.util.Random
import org.apache.spark.sql.DataFrame
import spark.implicits._

// Build a string of n random letters.
def randomLetters(n: Int): String = {
  Iterator.continually(Random.nextPrintableChar())
    .filter(_.isLetter)
    .take(n)
    .mkString
}

val numUniqueSmallRelations = 5
val sizePerSmallRelation = 512 * 1024

// Five distinct local relations, each holding one 512 KiB string.
val smallDfs: Seq[DataFrame] =
  (0 until numUniqueSmallRelations).map { _ =>
    Seq(randomLetters(sizePerSmallRelation)).toDF("value")
  }

// Union 500 randomly chosen relations so the plan embeds the same data many times.
var resultDf = smallDfs.head
for (_ <- 0 until 500) {
  val idx = Random.nextInt(smallDfs.length)
  resultDf = resultDf.unionByName(smallDfs(idx))
}
resultDf.collect()
```
It fails with a `RESOURCE_EXHAUSTED` error and the message `gRPC message exceeds maximum size 134217728: 269207219`, because the server is trying to send an ExecutePlanResponse of ~260MB to the client, which is above the 134217728-byte (128 MiB) gRPC message limit.
With the improvement introduced by this PR, the above code runs successfully and prints the expected result.
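For intuition on why compression helps with a plan like the one above, here is a minimal, self-contained sketch. It is not the code in this PR: `PlanCompressionSketch`, `compress`, and `decompress` are hypothetical names, GZIP from the JDK stands in for whatever algorithm Spark Connect actually uses, and a plain byte array stands in for the serialized proto plan. The chunks are 1 KiB rather than 512 KiB because DEFLATE only finds repeats within a 32 KiB window.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}
import scala.util.Random

// Hypothetical illustration only; not the Spark Connect implementation.
object PlanCompressionSketch {
  // GZIP-compress a byte array (GZIP is an assumption made for this sketch).
  def compress(bytes: Array[Byte]): Array[Byte] = {
    val buffer = new ByteArrayOutputStream()
    val gzip = new GZIPOutputStream(buffer)
    try gzip.write(bytes) finally gzip.close()
    buffer.toByteArray
  }

  // Inverse of compress: read the GZIP stream back into a byte array.
  def decompress(bytes: Array[Byte]): Array[Byte] = {
    val gzip = new GZIPInputStream(new ByteArrayInputStream(bytes))
    try Iterator.continually(gzip.read()).takeWhile(_ != -1).map(_.toByte).toArray
    finally gzip.close()
  }

  def main(args: Array[String]): Unit = {
    val random = new Random(42)
    // Five unique chunks of random letters, mirroring the five small relations.
    val chunks: Seq[Array[Byte]] = Seq.fill(5) {
      Array.fill(1024)(('a' + random.nextInt(26)).toByte)
    }
    // Stand-in for a plan that embeds 501 randomly chosen copies of those chunks.
    val payload: Array[Byte] =
      Array.fill(501)(chunks(random.nextInt(chunks.length))).flatten

    val compressed = compress(payload)
    println(s"raw bytes: ${payload.length}, compressed bytes: ${compressed.length}")

    // The round trip must reproduce the original payload exactly.
    assert(decompress(compressed).sameElements(payload))
  }
}
```

Because the payload repeats the same few chunks many times, the compressed form is much smaller than the raw bytes, which is the property that lets a highly redundant plan fit under a fixed message size limit.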
### Why are the changes needed?
It improves Spark Connect stability when handling large plans.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
New tests.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #53003 from xi-db/plan-compression-scala-client.
Authored-by: Xi Lyu <[email protected]>
Signed-off-by: Herman van Hovell <[email protected]>
(cherry picked from commit 6cb88c1)
Signed-off-by: Herman van Hovell <[email protected]>