
[CI] Do not upload dockerbuild #15


Open: wants to merge 485 commits into master

Conversation

@EnricoMi commented Sep 4, 2024

What changes were proposed in this pull request?

Why are the changes needed?

Does this PR introduce any user-facing change?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

yaooqinn and others added 29 commits April 25, 2025 11:09
### What changes were proposed in this pull request?

Like what we've improved in apache#50674.

This PR introduces TypedConfigBuilder for Java enums and leverages it for existing configurations that use enums as parameters.

Before this PR, we had to convert back and forth between enumerations and strings, and apply the upper-case transformation and `.checkValues` validation to each configuration individually.

After this PR, those steps are centralized.
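As an illustration only (Spark's actual `TypedConfigBuilder` is Scala; the names below are hypothetical), the centralized enum handling might look like:

```python
from enum import Enum

# Hypothetical stand-in for a real enum-valued Spark configuration.
class Mode(Enum):
    LEGACY = "LEGACY"
    CORRECTED = "CORRECTED"

def enum_conf(enum_cls, raw):
    """Centralize the upper-case transformation and value validation that
    previously had to be repeated for every enum-valued configuration."""
    try:
        return enum_cls[raw.upper()]
    except KeyError:
        valid = ", ".join(e.name for e in enum_cls)
        raise ValueError(f"invalid value {raw!r}; must be one of: {valid}")

# Case-insensitive input resolves directly to the enum member.
assert enum_conf(Mode, "legacy") is Mode.LEGACY
```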

### Why are the changes needed?

Better support for Java enum configurations

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
new tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#50691 from yaooqinn/SPARK-51896.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request?

This PR aims to update `setup-minikube` to the latest version v0.0.19.

### Why are the changes needed?

Currently, we use `v0.0.18` (2024-06-18). We had better use the latest one.
- https://github.com/medyagh/setup-minikube/releases/tag/v0.0.19 (2025-01-23)

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50702 from dongjoon-hyun/SPARK-51908.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
Fix python lint

Closes apache#50705 from zhengruifeng/fix_lint_x.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
### What changes were proposed in this pull request?
Add 4 missing functions to API references

### Why are the changes needed?
for docs

### Does this PR introduce _any_ user-facing change?
doc-only change

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#50709 from zhengruifeng/doc_missing_fcs.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…ug in connect-only mode

### What changes were proposed in this pull request?
Enable SparkConnectDataFrameDebug in connect-only mode

### Why are the changes needed?
to improve test coverage

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
CI

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#50710 from zhengruifeng/connect-only-df-debug.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…roper exception instead of an internal one

### What changes were proposed in this pull request?
The following query throws `Cannot cast NullType to ArrayType`:

```
SELECT get(null, 0);
```

instead of throwing a more user-friendly one. I propose that we fix that.
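A minimal Python sketch of the idea (hypothetical names; the real fix lives in Spark's Scala type-check logic): validate the argument up front so the user sees a clear message instead of an internal cast error.

```python
def get_elem(arr, idx):
    """Return arr[idx], or None when the index is out of bounds, mirroring
    SQL `get`. A NULL array is rejected with a user-facing error instead of
    surfacing an internal 'Cannot cast NullType to ArrayType' failure."""
    if arr is None:
        raise ValueError("the first argument of get() must be an array, got NULL")
    return arr[idx] if 0 <= idx < len(arr) else None

assert get_elem([1, 2, 3], 1) == 2
assert get_elem([1, 2, 3], 5) is None
```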

### Why are the changes needed?
To correct behavior of `get` function.

### Does this PR introduce _any_ user-facing change?
Queries that were failing with an internal error now throw a more user-friendly one.

### How was this patch tested?
Added tests.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50590 from mihailoale-db/getnull.

Authored-by: mihailoale-db <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
…alyzer

### What changes were proposed in this pull request?

Properly throw datatype mismatch in single-pass Analyzer. Currently we don't have a way to pass a resolved operator to `failOnTypeCheckResult`, so we pass `None` - this simply omits the `issueFixedIfAnsiOff` functionality.

### Why are the changes needed?

This improves error message reporting in single-pass Analyzer.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50697 from vladimirg-db/vladimir-golubev_data/throw-datatype-mismatch-in-single-pass-analyzer.

Authored-by: Vladimir Golubev <[email protected]>
Signed-off-by: Max Gekk <[email protected]>
…nd ignoring non-batch files when listing OperatorMetadata files

### What changes were proposed in this pull request?

Currently, we don't want to purge StateSchemaV3 files, so we need to remove the relevant call from MicrobatchExecution.
Additionally, we want to ignore any files in the metadata or state schema directory whose names do not parse as a Long (which would otherwise cause a parse exception).
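The listing filter can be sketched in plain Python (the real code is Scala inside the metadata and state-schema file managers; names here are hypothetical):

```python
def batch_files(names):
    """Keep only entries whose base name parses as a Long (a batch id);
    anything else (temp files, hidden files) is skipped instead of
    raising a parse exception and failing the whole listing."""
    out = []
    for n in names:
        stem = n.split(".")[0]
        try:
            out.append((int(stem), n))
        except ValueError:
            pass  # not a batch file; ignore it
    return [n for _, n in sorted(out)]

assert batch_files(["2", "10", "junk", ".DS_Store"]) == ["2", "10"]
```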

### Why are the changes needed?

The changes are needed because we cannot purge schema files: they are necessary until a full rewrite is implemented.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50700 from ericm-db/remove-async-purge.

Authored-by: Eric Marnadi <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
…angelogReaderFactory for v1

### What changes were proposed in this pull request?

Catch the UTFDataFormatException thrown for v1 in the StateStoreChangelogReaderFactory and assign the version to 1.
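The fallback can be sketched as follows (Python stand-in for the Scala factory; it assumes, as the commit describes, that v1 changelogs have no readable version header, so a decode failure implies version 1):

```python
def changelog_version(header):
    """Sketch: v2+ changelogs start with a version string like 'v2'.
    v1 files have no such header, so decoding fails -- treat that as
    version 1 instead of surfacing the decode error to the caller."""
    try:
        text = header.decode("utf-8")
        if text.startswith("v"):
            return int(text[1:])
    except (UnicodeDecodeError, ValueError):
        pass  # unreadable header: assume the legacy v1 format
    return 1

assert changelog_version(b"v2") == 2
assert changelog_version(b"\xff\xfe\x00") == 1  # undecodable -> v1
```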

### Why are the changes needed?

We should not throw this error.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit test

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50721 from liviazhu-db/liviazhu-db/master.

Authored-by: Livia Zhu <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
…sFinderSuite`

### What changes were proposed in this pull request?
This PR fixes an incorrect `assume` behavior in the `ClassFinderSuite` test suite, which was introduced in SPARK-51623.

The issue stems from the fact that the `expectedClassFiles` list contained file paths without their parent directories. Consequently, the assertion added in SPARK-51623

https://github.com/apache/spark/blob/b634978936499f58f8cb2e8ea16339feb02ffb52/sql/connect/client/jvm/src/test/scala/org/apache/spark/sql/connect/client/ClassFinderSuite.scala#L40

would always evaluate to `false`, causing the test case to be permanently marked as `CANCELED`.

We can observe relevant test cases in the GA testing phase, for example:

- https://github.com/apache/spark/actions/runs/14675551942/job/41191081107

![image](https://github.com/user-attachments/assets/15d37903-63b7-41a0-a628-2379f8385623)

Therefore, we should modify the check to `assume` whether the pre-defined class files exist within the source directory (`classResourcePath`).

### Why are the changes needed?
Fix the erroneous `assume` in the `ClassFinderSuite`.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions
- Local test

```
build/sbt clean "connect-client-jvm/testOnly org.apache.spark.sql.connect.client.ClassFinderSuite"
```

**Before**

```
[info] ClassFinderSuite:
[info] - REPLClassDirMonitor functionality test !!! CANCELED !!! (202 milliseconds)
[info]   p.toFile().exists() was false (ClassFinderSuite.scala:40)
[info]   org.scalatest.exceptions.TestCanceledException:
[info]   at org.scalatest.Assertions.newTestCanceledException(Assertions.scala:475)
[info]   at org.scalatest.Assertions.newTestCanceledException$(Assertions.scala:474)
[info]   at org.scalatest.Assertions$.newTestCanceledException(Assertions.scala:1231)
[info]   at org.scalatest.Assertions$AssertionsHelper.macroAssume(Assertions.scala:1310)
[info]   at org.apache.spark.sql.connect.client.ClassFinderSuite.$anonfun$new$3(ClassFinderSuite.scala:40)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.apache.spark.sql.connect.client.ClassFinderSuite.checkClasses$1(ClassFinderSuite.scala:40)
[info]   at org.apache.spark.sql.connect.client.ClassFinderSuite.$anonfun$new$1(ClassFinderSuite.scala:48)
[info]   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.scala:18)
[info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
[info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
[info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
[info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:226)
[info]   at org.scalatest.TestSuite.withFixture(TestSuite.scala:196)
[info]   at org.scalatest.TestSuite.withFixture$(TestSuite.scala:195)
[info]   at org.scalatest.funsuite.AnyFunSuite.withFixture(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:224)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:236)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:218)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTest(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTests$1(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.SuperEngine.$anonfun$runTestsInBranch$1(Engine.scala:413)
[info]   at scala.collection.immutable.List.foreach(List.scala:334)
[info]   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
[info]   at org.scalatest.SuperEngine.runTestsInBranch(Engine.scala:396)
[info]   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:475)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests(AnyFunSuiteLike.scala:269)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.runTests$(AnyFunSuiteLike.scala:268)
[info]   at org.scalatest.funsuite.AnyFunSuite.runTests(AnyFunSuite.scala:1564)
[info]   at org.scalatest.Suite.run(Suite.scala:1114)
[info]   at org.scalatest.Suite.run$(Suite.scala:1096)
[info]   at org.scalatest.funsuite.AnyFunSuite.org$scalatest$funsuite$AnyFunSuiteLike$$super$run(AnyFunSuite.scala:1564)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$run$1(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.SuperEngine.runImpl(Engine.scala:535)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run(AnyFunSuiteLike.scala:273)
[info]   at org.scalatest.funsuite.AnyFunSuiteLike.run$(AnyFunSuiteLike.scala:272)
[info]   at org.scalatest.funsuite.AnyFunSuite.run(AnyFunSuite.scala:1564)
[info]   at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:321)
[info]   at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:517)
[info]   at sbt.ForkMain$Run.lambda$runTest$1(ForkMain.java:414)
[info]   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
[info]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
[info]   at java.base/java.lang.Thread.run(Thread.java:840)
[info] Run completed in 628 milliseconds.
[info] Total number of tests run: 0
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 0, failed 0, canceled 1, ignored 0, pending 0
[info] No tests were executed.

```

**After**

```
[info] ClassFinderSuite:
[info] - REPLClassDirMonitor functionality test (169 milliseconds)
[info] Run completed in 530 milliseconds.
[info] Total number of tests run: 1
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 1, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

If you manually delete `Hello.class`, `smallClassFile.class`, and `smallClassFileDup.class` and then run the test, it will be skipped.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50725 from LuciferYang/SPARK-51925.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…ble error correctly

### What changes were proposed in this pull request?

As of today, when people use JDBC v2 and try to query a nonexistent table, they get a `FAILED_JDBC.LOAD_TABLE` error. This is confusing, as the real problem is that the table does not exist.

This PR improves the error message by adding a table existence check and throwing a no-such-table error if the table does not exist.
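The improved flow, sketched in Python (hypothetical names; the actual change is in the Scala JDBC v2 catalog):

```python
class NoSuchTableError(Exception):
    """Stand-in for the user-facing 'table or view not found' error."""

def load_table(table_exists, do_load, name):
    """Probe existence first, so a missing table surfaces as a clear
    no-such-table error rather than the generic FAILED_JDBC.LOAD_TABLE
    wrapper around whatever the driver threw."""
    if not table_exists(name):
        raise NoSuchTableError(f"The table or view `{name}` cannot be found.")
    return do_load(name)

tables = {"users": "id INT, name STRING"}
assert load_table(tables.__contains__, tables.__getitem__, "users") == "id INT, name STRING"
```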

### Why are the changes needed?

better error messaging

### Does this PR introduce _any_ user-facing change?

Yes, users will see clearer errors if the JDBC table does not exist.

### How was this patch tested?

updated existing tests

### Was this patch authored or co-authored using generative AI tooling?

no

Closes apache#50706 from cloud-fan/jdbc.

Lead-authored-by: Wenchen Fan <[email protected]>
Co-authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
…dPrefixes

### What changes were proposed in this pull request?
This PR adds `com.mysql.cj` to `spark.sql.hive.metastore.sharedPrefixes`

### Why are the changes needed?

Following upstream changes in https://github.com/mysql/mysql-connector-j

### Does this PR introduce _any_ user-facing change?
no
### How was this patch tested?

existing tests

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#50711 from yaooqinn/SPARK-51914.

Authored-by: Kent Yao <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache `commons-collections4` from 4.4 to 4.5.0.

### Why are the changes needed?
The full release notes as follows:
- https://commons.apache.org/proper/commons-collections/changes.html#a4.5.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50723 from LuciferYang/SPARK-51923.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
### What changes were proposed in this pull request?

This PR proposes to sync the missing python function types which are out of sync between Scala and Python.

### Why are the changes needed?

These types are supposed to be in sync between Scala and Python.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50728 from HeartSaVioR/SPARK-51814-follow-up-sync-function-type.

Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…eption

### What changes were proposed in this pull request?

apache#50693 enabled `SparkConnectErrorTests` in connect-only mode

`toJSON` and `rdd` throw `PySparkNotImplementedError` in connect mode, but `PySparkAttributeError` in connect-only mode

### Why are the changes needed?
to fix https://github.com/apache/spark/actions/runs/14649632571/job/41112060443

### Does this PR introduce _any_ user-facing change?
no

### How was this patch tested?
will be tested in daily builder

### Was this patch authored or co-authored using generative AI tooling?
no

Closes apache#50708 from zhengruifeng/follow_up_connect_error.

Authored-by: Ruifeng Zheng <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
… messageParameters of CAST_INVALID_INPUT and CAST_OVERFLOW

### What changes were proposed in this pull request?

In Spark Connect, we guarantee that older clients are compatible with newer
versions of the Spark Connect service.

A previous change - e28c33b - broke this compatibility by removing the
"ansiConfig" field in the message parameters for two error codes -
"CAST_OVERFLOW" and "CAST_INVALID_INPUT".

The Spark Connect client includes GrpcExceptionConverter.scala\[1] to
convert error codes from the server to produce SQL compliant error codes
on the client. The SQL compliant error codes and corresponding error
messages are included in the error-conditions.json file. Older clients do not
include the change (e28c33b) to this file and still include the `ansiConfig`
parameter. Later versions of the Spark Connect service don't return this
parameter resulting in an internal error\[2] that the correct error condition
could not be formulated.

This change reverts the changes on the server to continue producing the
"ansiConfig" field so older clients can still correctly reformulate the error class.

\[1]: https://github.com/apache/spark/blob/2ba156096e83adf7b0b2f5c38453d6fd37d95ded/sql/connect/common/src/main/scala/org/apache/spark/sql/connect/client/GrpcExceptionConverter.scala#L184
\[2]: https://github.com/apache/spark/blob/2ba156096e83adf7b0b2f5c38453d6fd37d95ded/common/utils/src/main/scala/org/apache/spark/ErrorClassesJSONReader.scala#L58

### Why are the changes needed?

Explained above.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50604 from nija-at/cast-invalid-input.

Authored-by: Niranjan Jayakar <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
…e in PySpark

### What changes were proposed in this pull request?

This PR proposes to support Spark Connect on transformWithState in PySpark. The code is mostly reused between Pandas version and Row version.

We rely on PythonEvalType to determine the user-facing type of the API, hence no proto change.

### Why are the changes needed?

The new API needs to be supported with Spark Connect.

### Does this PR introduce _any_ user-facing change?

Yes, we will expose a new API to be available in Spark Connect.

### How was this patch tested?

New test suites.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50704 from HeartSaVioR/WIP-transform-with-state-python-in-spark-connect.

Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
…o 5.12.2

### What changes were proposed in this pull request?
This PR aims to upgrade `jupiter-interface` from 0.13.3 to 0.14.0 and JUnit 5 to the latest version (Platform 1.12.2 + Jupiter 5.12.2).

### Why are the changes needed?
The full release notes of `jupiter-interface` as follows:

- https://github.com/sbt/sbt-jupiter-interface/releases/tag/v0.14.0

and the full release notes between JUnit 5.11.4 and 5.12.2 as follows:

- https://junit.org/junit5/docs/5.12.2/release-notes/#release-notes-5.12.2

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50724 from LuciferYang/SPARK-51924.

Authored-by: yangjie01 <[email protected]>
Signed-off-by: yangjie01 <[email protected]>
### What changes were proposed in this pull request?

This PR aims to upgrade AWS SDK v2 to 2.29.52.

### Why are the changes needed?

Like [Apache Iceberg v1.8.1](https://iceberg.apache.org/releases/#181-release) and Apache Hadoop 3.4.2 (HADOOP-19485), Apache Spark 4.1.0 had better use the latest one.
- apache/hadoop#7479
- apache/iceberg#12339

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Pass the CIs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50731 from dongjoon-hyun/SPARK-51929.

Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Kent Yao <[email protected]>
…Size

### What changes were proposed in this pull request?
The current implementation of the `prepare` in `OffsetWindowFunctionFrameBase`:
```
override def prepare(rows: ExternalAppendOnlyUnsafeRowArray): Unit = {
  if (offset > rows.length) {
    fillDefaultValue(EmptyRow)
  } else {
    ...
  }
}
```
The current implementation of the `write` in `FrameLessOffsetWindowFunctionFrame`:
```
override def write(index: Int, current: InternalRow): Unit = {
  if (offset > rows.length) {
    // Already use default values in prepare.
  } else {
    ...
  }
}
```

These implementations caused the `LEAD` and `LAG` functions to throw a `NullPointerException` when the default value is not a literal and the offset exceeds the window group size.

This PR introduces a boolean val `onlyLiteralNulls` and modifies `prepare` and `write`.

`onlyLiteralNulls` indicates whether the default value is a literal.

In `prepare`, we first check `onlyLiteralNulls`; if the default value is a literal, we call `fillDefaultValue(EmptyRow)`.

In `write`, if `onlyLiteralNulls` is false, the default value must be non-literal, so we call `fillDefaultValue(current)`.
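The fixed control flow can be sketched in Python (a simplified model, not the Scala frame classes; `default_fn` stands in for evaluating the default expression against a row):

```python
def offset_frame(offset, rows, default_fn, only_literal_nulls):
    """Sketch of the fixed LEAD/LAG flow: when the offset exceeds the
    window group size, a literal default is filled once up front (against
    an empty row), while a non-literal default must be evaluated per
    output row -- evaluating it against an empty row caused the NPE."""
    out = []
    exceeded = offset > len(rows)
    # prepare(): only literal defaults are safe to evaluate without a row.
    prepared = default_fn(None) if exceeded and only_literal_nulls else None
    for current in rows:
        if exceeded:
            # write(): non-literal defaults are evaluated against `current`.
            out.append(prepared if only_literal_nulls else default_fn(current))
        else:
            out.append("<shifted value>")
    return out

# A column-reference default (non-literal) now reads from the current row:
rows = [{"x": 1}, {"x": 2}]
assert offset_frame(5, rows, lambda r: r["x"], only_literal_nulls=False) == [1, 2]
```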

### Why are the changes needed?
Fix `LEAD` and `LAG` causing a `NullPointerException` in window functions (SPARK-51757)

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added the test "lead/lag with column reference as default when offset exceeds window group size" in `org.apache.spark.sql.DataFrameWindowFramesSuite`

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes apache#50552 from xin-aurora/windowFuncFix.

Lead-authored-by: xin-aurora <[email protected]>
Co-authored-by: Xin Zhang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
This PR aims to upgrade Apache `commons-text` from 1.13.0 to 1.13.1.

### Why are the changes needed?
The full release notes as follows:

- https://github.com/apache/commons-text/blob/rel/commons-text-1.13.1/RELEASE-NOTES.txt

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50732 from LuciferYang/SPARK-51928.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
### What changes were proposed in this pull request?

Adds a new state store config `unloadOnCommit` that unloads the state store instance from the executor at task completion. This frees up resources on the executor and prevents potentially unbounded resource usage from continually adding more state store instances to a single executor.

A task completion listener will execute a synchronous maintenance followed by a close on the state store. Since we do the maintenance synchronously, we never need to start the background maintenance thread.
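The lifecycle can be sketched in Python (a toy model of the task-completion listener, not Spark's actual Scala `StateStore` API):

```python
class StateStore:
    """Toy state store tracking whether it is loaded and maintained."""
    def __init__(self):
        self.loaded = True
        self.maintained = False
    def maintenance(self):
        self.maintained = True
    def close(self):
        self.loaded = False

def run_task(store, unload_on_commit, body):
    """With unloadOnCommit set, a task-completion callback runs maintenance
    synchronously and then closes the store, so no background maintenance
    thread is needed and the executor holds no lingering store instance."""
    try:
        return body(store)
    finally:
        if unload_on_commit:
            store.maintenance()  # synchronous, replacing the background thread
            store.close()

store = StateStore()
run_task(store, True, lambda s: "committed")
assert store.maintained and not store.loaded
```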

### Why are the changes needed?

Stateful streams can have trouble scaling to large volumes of data without also increasing the total resources allocated to the application. By unloading state stores on task completion, stateful streams are able to complete with fewer resources, at the cost of slightly higher latency per batch in certain scenarios.

### Does this PR introduce _any_ user-facing change?

Yes, adds a new config for changing the behavior of stateful streams.

### How was this patch tested?

New UT is added to show the config takes effect. I'm not sure what all corner cases may need to be tested with this.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50612 from Kimahriman/state-store-unload-on-commit.

Authored-by: Adam Binford <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
…references

### What changes were proposed in this pull request?

Fix ML cache object python client references.

When a model is copied on the client, multiple client model objects refer to the same server-side cached model.
We therefore need a reference count: only when it drops to zero can we release the server-side cached model.
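The fix amounts to classic reference counting, sketched here in Python (hypothetical names; the real bookkeeping lives in the Spark Connect ML client):

```python
class ModelCache:
    """Each server-side model id keeps a count of client objects referring
    to it; copies bump the count, and the cached model is released only
    when the count reaches zero."""
    def __init__(self):
        self._refs = {}

    def acquire(self, model_id):
        self._refs[model_id] = self._refs.get(model_id, 0) + 1

    def release(self, model_id):
        self._refs[model_id] -= 1
        if self._refs[model_id] == 0:
            del self._refs[model_id]  # now safe to free the server-side model
            return True
        return False

cache = ModelCache()
cache.acquire("m1")  # original client object
cache.acquire("m1")  # model.copy() on the client
assert cache.release("m1") is False  # copy released; original still alive
assert cache.release("m1") is True   # last reference gone; model freed
```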

### Why are the changes needed?

Bugfix.

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Unit tests.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50707 from WeichenXu123/ml-ref-id-fix.

Lead-authored-by: Weichen Xu <[email protected]>
Co-authored-by: WeichenXu <[email protected]>
Signed-off-by: Ruifeng Zheng <[email protected]>
…for PythonEvalType.toString

### What changes were proposed in this pull request?

This PR adds missing type handling for PythonEvalType.toString.

### Why are the changes needed?

Just for completeness's sake. This isn't based on an actual observed failure.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Existing UTs.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#50736 from HeartSaVioR/SPARK-51814-followup-2.

Authored-by: Jungtaek Lim <[email protected]>
Signed-off-by: Jungtaek Lim <[email protected]>
…istTables()

### What changes were proposed in this pull request?

- Revert apache#50515
- Implement error handling rules based on error conditions for Spark errors in spark.catalog.listTables().

### Why are the changes needed?

There are risks associated with working with partial data, especially when unaware that some tables are broken. Throwing an exception instead provides a clear indication that something is wrong.

Instead we can use error handling rules to determine the proper behavior on a case-by-case basis. SparkThrowable should be sufficient to capture the cases we want to handle.
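The case-by-case handling can be sketched in Python (a simplified stand-in for the Scala rules; `SparkThrowable` here is a toy class carrying only an error condition):

```python
class SparkThrowable(Exception):
    """Toy stand-in carrying a Spark error condition name."""
    def __init__(self, condition):
        super().__init__(condition)
        self.condition = condition

def list_tables(names, describe, handled_conditions=()):
    """Errors whose condition is explicitly handled are skipped case by
    case; anything else propagates, so broken tables are no longer
    silently dropped into partial results."""
    out = []
    for name in names:
        try:
            out.append(describe(name))
        except SparkThrowable as e:
            if e.condition not in handled_conditions:
                raise  # surface the failure instead of returning partial data
    return out

def describe(name):
    if name == "broken":
        raise SparkThrowable("TABLE_OR_VIEW_NOT_FOUND")
    return name.upper()

# With an explicit rule, the broken table is skipped deliberately:
assert list_tables(["a", "broken", "b"], describe,
                   handled_conditions=("TABLE_OR_VIEW_NOT_FOUND",)) == ["A", "B"]
```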

### Does this PR introduce _any_ user-facing change?

No

### How was this patch tested?

Existing tests (e.g. `build/sbt "sql/testOnly *CatalogSuite"`, `build/sbt "hive/testOnly *HiveDDLSuite"`)

### Was this patch authored or co-authored using generative AI tooling?

No

Closes apache#50696 from heyihong/SPARK-51899.

Authored-by: Yihong He <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
### What changes were proposed in this pull request?
This PR aims to upgrade `datasketches-java` from 6.1.1 to 6.2.0.

### Why are the changes needed?
Based on the release notes, this version fixes a bug that was discovered in the Theta compression algorithm.
- https://github.com/apache/datasketches-java/releases/tag/6.2.0

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Pass GitHub Actions

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#50733 from LuciferYang/SPARK-51930.

Lead-authored-by: yangjie01 <[email protected]>
Co-authored-by: YangJie <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
@EnricoMi force-pushed the ci-do-not-upload-dockerbuild branch from bde9fa9 to 7c2793e on April 28, 2025 04:41