Conversation

pan3793
Member

pan3793 commented Aug 25, 2025

What changes were proposed in this pull request?

This PR proposes introducing a JDBC Driver for Spark Connect Server.

Note: The JDBC standard defines hundreds of APIs; most JDBC drivers implement only a subset of them. This PR is PoC work that implements only a small subset of the JDBC API, but enough to integrate with BeeLine and use it as a SQL CLI.

This PoC PR handles only NULL, BOOLEAN, BYTE, SHORT, INT, BIGINT, FLOAT, DOUBLE, and STRING in ResultSet.

The JDBC URL reuses the current URL used by the Spark Connect client, with an additional prefix jdbc:, e.g., jdbc:sc://localhost:15002
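For illustration, here is a minimal sketch of how an application might use such a driver through the standard java.sql API. The class and the acceptsUrl helper are hypothetical (not part of this PR); the helper only mirrors the URL-prefix check a JDBC Driver.acceptsURL implementation would typically perform, and the connection step assumes a running Connect Server with the driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class SparkConnectJdbcExample {

    // Hypothetical helper mirroring the prefix check a Driver.acceptsURL
    // implementation would do for the "jdbc:sc://" scheme described above.
    static boolean acceptsUrl(String url) {
        return url != null && url.startsWith("jdbc:sc://");
    }

    public static void main(String[] args) {
        String url = "jdbc:sc://localhost:15002";
        if (!acceptsUrl(url)) {
            throw new IllegalArgumentException("Not a Spark Connect JDBC URL: " + url);
        }
        // Requires a running Spark Connect server and the driver on the classpath.
        try (Connection conn = DriverManager.getConnection(url);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 'Hello, Spark Connect'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        } catch (SQLException e) {
            // No server or driver available in this environment.
            System.out.println("Connection unavailable: " + e.getMessage());
        }
    }
}
```

Because the driver plugs into the standard DriverManager mechanism, any JDBC-aware tool that can construct a `jdbc:sc://` URL should be able to reuse this path, which is exactly how BeeLine is wired up in the manual test below.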

Why are the changes needed?

This enables pure-SQL use cases for the Spark Connect Server.

Does this PR introduce any user-facing change?

Yes, a new feature.

How was this patch tested?

1. Added some basic unit tests.

2. Manual testing with BeeLine.

Start a Connect Server first (using Spark 4.0.0 as an example).

$ sbin/start-connect-server.sh

Package with Hive and STS (required by BeeLine).

$ build/sbt -Phive,hive-thriftserver package

Run BeeLine in interactive mode.

$ SPARK_PREPEND_CLASSES=true bin/beeline -u jdbc:sc://localhost:15002
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.0.0)
Driver: Apache Spark Connect JDBC Driver (version 4.1.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Beeline version 2.3.10 by Apache Hive
0: jdbc:sc://localhost:15002> select 'Hello, Spark Connect', version();
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/08/25 12:04:09 WARN Utils: Your hostname, H27212-MAC-01.local, resolves to a loopback address: 127.0.0.1; using 10.242.159.140 instead (on interface en0)
25/08/25 12:04:09 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
+-----------------------+-------------------------------------------------+
| Hello, Spark Connect  |                    version()                    |
+-----------------------+-------------------------------------------------+
| Hello, Spark Connect  | 4.0.0 fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4  |
+-----------------------+-------------------------------------------------+
1 row selected (1.759 seconds)
0: jdbc:sc://localhost:15002>

Run BeeLine to execute a SQL file

$ cat > /tmp/select.sql <<EOF
select 'Hello, Spark Connect';
select version();
EOF
$ SPARK_PREPEND_CLASSES=true bin/beeline -u jdbc:sc://localhost:15002 -f /tmp/select.sql
NOTE: SPARK_PREPEND_CLASSES is set, placing locally compiled Spark classes ahead of assembly.
WARNING: Using incubator modules: jdk.incubator.vector
Connecting to jdbc:sc://localhost:15002
Connected to: Apache Spark Connect Server (version 4.0.0)
Driver: Apache Spark Connect JDBC Driver (version 4.1.0-SNAPSHOT)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:sc://localhost:15002> select 'Hello, Spark Connect';
Using Spark's default log4j profile: org/apache/spark/log4j2-defaults.properties
25/08/25 12:04:52 WARN Utils: Your hostname, H27212-MAC-01.local, resolves to a loopback address: 127.0.0.1; using 10.242.159.140 instead (on interface en0)
25/08/25 12:04:52 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
+-----------------------+
| Hello, Spark Connect  |
+-----------------------+
| Hello, Spark Connect  |
+-----------------------+
1 row selected (0.462 seconds)
0: jdbc:sc://localhost:15002> select version();
+-------------------------------------------------+
|                    version()                    |
+-------------------------------------------------+
| 4.0.0 fa33ea000a0bda9e5a3fa1af98e8e85b8cc5e4d4  |
+-------------------------------------------------+
1 row selected (0.046 seconds)
0: jdbc:sc://localhost:15002>
0: jdbc:sc://localhost:15002> Closing: 0: jdbc:sc://localhost:15002
$

Was this patch authored or co-authored using generative AI tooling?

No.

Member Author

pan3793 commented Aug 25, 2025

cc @HyukjinKwon @grundprinzip @hvanhovell @LuciferYang @yaooqinn

Please let me know if the Spark community likes this feature; if so, I will continue the work.


itskals commented Aug 25, 2025

This appears promising. JDBC constructs will facilitate easier integration with numerous clients. Please specify the scope of work involved and your expectations to call it done-done. It seems that an SPIP or a more detailed document than the current proposal may be appropriate.

@LuciferYang
Contributor

If this new approach has the potential to be fully compatible with and replace STS, and can enable Spark to completely remove the STS code in a future version, I will strongly support the introduction of this new feature.

Contributor

LuciferYang commented Aug 25, 2025

also cc @cloud-fan and @zhengruifeng

Member Author

pan3793 commented Aug 25, 2025

@itskals thanks for your response!

Please specify the scope of work involved ...

JDBC has well-defined APIs, so there isn't much room for implementation flexibility.

... your expectations to call it done-done.

This is a good question. I can imagine 3 milestones:

  1. Usable: supports all Spark primitive data types, and works well with BeeLine as a SQL CLI to execute SQL and retrieve results.
  2. Done: supports all Spark data types, and JDBC API implementation coverage reaches the level of the Hive JDBC driver, so it can compete with the Spark Thrift Server.
  3. Improvement: implements more JDBC APIs to enable the Spark Connect JDBC driver to integrate with more tools, e.g., DBeaver.

It seems that an SPIP or a more detailed document than the current proposal may be appropriate.

If my two answers above do not address your concerns, I can follow the SPIP guide for this feature. Thank you again for your quick reply!

Member Author

pan3793 commented Aug 25, 2025

If this new approach has the potential to be fully compatible with and replace STS, and can enable Spark to completely remove the STS code in a future version, I will strongly support the introduction of this new feature.

@LuciferYang thanks for your reply!

I suppose this feature could make the Connect Server a drop-in replacement for STS in two typical use cases: 1) using spark-sql/beeline to run SQL; 2) using the JDBC driver to access STS. But for users who access STS through other APIs, e.g., an ODBC driver or the Thrift APIs, additional work is required to migrate from STS to the Connect Server.

@HyukjinKwon
Member

I feel like this would need an SPIP ...

Member Author

pan3793 commented Aug 25, 2025

Okay, I will prepare an SPIP soon. Feedback is still welcome here :)

@grundprinzip
Contributor

I like the idea of a native JDBC driver based on Spark Connect; that makes a lot of sense! I'm supportive of going through an SPIP here, and I think that replacing the Spark Thrift Server is for sure a good idea :)

@hvanhovell
Contributor

@pan3793 nice work! Very much in favor of this!

Member Author

pan3793 commented Aug 26, 2025

@grundprinzip @hvanhovell thanks for your positive feedback. I will likely submit the SPIP doc and start the discussion next week.
