[DRAFT] JDBC Driver for Spark Connect Server #52110
Conversation
cc @HyukjinKwon @grundprinzip @hvanhovell @LuciferYang @yaooqinn Please let me know whether the Spark community likes this feature; if so, I will continue the work.
This appears promising. JDBC constructs will facilitate easier integration with numerous clients. Please specify the scope of work involved and what you expect it to cover before it can be considered complete. An SPIP, or a more detailed document than the current proposal, may be appropriate.
If this new approach has the potential to be fully compatible with and replace STS, and can enable Spark to completely remove the STS code in a future version, I will strongly support the introduction of this new feature.
also cc @cloud-fan and @zhengruifeng |
@itskals thanks for your response!
JDBC has well-defined APIs, so there isn't much room for implementation flexibility.
This is a good question. I can imagine 3 milestones:
If my two answers above do not address your concerns, I can follow the SPIP guide to propose this feature. Thank you again for your quick reply!
@LuciferYang thanks for your reply! I suppose this feature could make Connect Server a drop-in replacement for STS in two typical use cases: 1) use …
I feel like this would need an SPIP ... |
Okay, will prepare an SPIP soon. Feedback is still welcome here :) |
I like the idea of a native JDBC driver based on Spark Connect; that makes a lot of sense! I'm supportive of going through an SPIP here, and I think that replacing the Spark Thrift Server is for sure a good idea :)
@pan3793 nice work! Very much in favor of this! |
@grundprinzip @hvanhovell Thanks for the positive feedback. I will likely submit the SPIP doc and start a discussion next week.
What changes were proposed in this pull request?
This PR proposes introducing a JDBC Driver for Spark Connect Server.
Note: The JDBC standard defines hundreds of APIs, and most JDBC drivers implement only a subset of them. This PR is PoC work that implements only a small subset of the JDBC APIs, but enough to integrate with BeeLine and use it as a SQL CLI.
This PoC PR only handles `NULL`, `BOOLEAN`, `BYTE`, `SHORT`, `INT`, `BIGINT`, `FLOAT`, `DOUBLE`, and `STRING` in `ResultSet`. The JDBC URL reuses the current Spark Connect client URL, with an additional `jdbc:` prefix, e.g. `jdbc:sc://localhost:15002`.
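The URL mapping above (the Spark Connect client URL plus a `jdbc:` prefix) can be sketched with standard JDBC usage. This is an illustrative snippet, not code from this PR: the helper name `toJdbcUrl` is made up here, and the live-connection part assumes a running Connect Server plus the driver from this PR on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ConnectJdbcExample {

    // Derive the JDBC URL from a Spark Connect client URL by adding the
    // "jdbc:" prefix, as described in this PR
    // (e.g. "sc://localhost:15002" -> "jdbc:sc://localhost:15002").
    public static String toJdbcUrl(String connectUrl) {
        return connectUrl.startsWith("jdbc:") ? connectUrl : "jdbc:" + connectUrl;
    }

    public static void main(String[] args) throws Exception {
        String url = toJdbcUrl("sc://localhost:15002");
        System.out.println(url); // prints "jdbc:sc://localhost:15002"

        // Only attempt a live connection when explicitly requested, since it
        // needs a running Spark Connect Server and the (hypothetical) driver
        // from this PR registered with DriverManager.
        if (args.length > 0 && "--run".equals(args[0])) {
            try (Connection conn = DriverManager.getConnection(url);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT 1")) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1));
                }
            }
        }
    }
}
```

The point of the sketch is that any JDBC-aware client should be able to reach the Connect Server through the plain `DriverManager` API once the driver is on the classpath.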
Why are the changes needed?
This enables more purely SQL-based use cases for Spark Connect Server.
Does this PR introduce any user-facing change?
Yes, a new feature.
How was this patch tested?
1. Add some basic UTs.
2. Manual test with BeeLine
Start a Connect Server first (I use Spark 4.0.0 as an example).
Package Spark with Hive and STS (required by BeeLine).
Run BeeLine in interactive mode.
Run BeeLine to execute a SQL file
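The manual test steps above might look like the following shell session. The exact commands were not included here, so this is a sketch: the build profiles and script paths are the standard ones from a Spark source tree, the BeeLine flags are standard BeeLine options, and the `jdbc:sc://` URL comes from this PR.

```shell
# 1. Package Spark with Hive and the Thrift server (BeeLine ships with STS).
#    Assumed invocation of Spark's standard make-distribution script.
./dev/make-distribution.sh --tgz -Phive -Phive-thriftserver

# 2. Start a Spark Connect Server (Spark 4.0.0 used as an example);
#    it listens on port 15002 by default.
./sbin/start-connect-server.sh

# 3. Run BeeLine in interactive mode against the Connect Server,
#    using the JDBC URL format introduced by this PR.
./bin/beeline -u "jdbc:sc://localhost:15002"

# 4. Run BeeLine non-interactively to execute a SQL file.
./bin/beeline -u "jdbc:sc://localhost:15002" -f query.sql
```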
Was this patch authored or co-authored using generative AI tooling?
No.