…related issues
### _Why are the changes needed?_
Currently, the `KyuubiOperationWithEngineSecuritySuite` is not valid, because
1. `InternalSecurityAccessor` is a singleton and only the first initialized instance takes effect, which means some tests may fail if the test order changes.
2. `discoveryClient.startSecretNode` calls `PersistentNode#start` under the hood, which is async; we should call `waitForInitialCreate` to ensure the node is created before running the test.
Based on my analysis, the wait may take 30s. (mtime-ctime)
```
[zk: 10.221.106.196:55408(CONNECTED) 2] get /SECRET
_ENGINE_SECRET_
cZxid = 0x5
ctime = Wed Jul 19 23:01:57 CST 2023
mZxid = 0x7
mtime = Wed Jul 19 23:02:17 CST 2023
pZxid = 0x5
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 15
numChildren = 0
```
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5072 from pan3793/security.
Closes #5072
69cce2935 [Cheng Pan] fix
2d623555c [Cheng Pan] fix
74eb2cb18 [Cheng Pan] fix
6d8f4ce4e [Cheng Pan] KyuubiOperationWithEngineSecurity
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
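The first problem described above, a singleton that only honors its first initialization, can be illustrated with a minimal Java sketch. `SecretAccessor`, `initialize`, `secret`, and `reset` are hypothetical names used only for illustration; they are not the API of Kyuubi's `InternalSecurityAccessor`, and the `reset` hook is just one possible way to make suites independent of test order.

```java
// Hypothetical stand-in for a process-wide singleton; not Kyuubi's actual class.
final class SecretAccessor {
    private static SecretAccessor instance;
    private final String secret;

    private SecretAccessor(String secret) {
        this.secret = secret;
    }

    // Only the first call takes effect; later calls are silently ignored.
    static synchronized void initialize(String secret) {
        if (instance == null) {
            instance = new SecretAccessor(secret);
        }
    }

    static synchronized String secret() {
        if (instance == null) {
            throw new IllegalStateException("not initialized");
        }
        return instance.secret;
    }

    // Test-only hook (an assumption of this sketch) so each suite starts clean.
    static synchronized void reset() {
        instance = null;
    }
}
```

Without `reset()`, a suite that calls `initialize("second")` after another suite already called `initialize("first")` would still observe `"first"`, which is why the outcome depends on the order in which suites run.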
### _Why are the changes needed?_
to fix
```
SparkDeltaOperationSuite:
org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite *** ABORTED ***
  java.lang.RuntimeException: Unable to load a Suite class org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite that was discovered in the runpath: Not Support spark version (4,0)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:80)
  at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  ...
  Cause: java.lang.IllegalArgumentException: Not Support spark version (4,0)
  at org.apache.kyuubi.engine.spark.WithSparkSQLEngine.$init$(WithSparkSQLEngine.scala:42)
  at org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite.<init>(SparkDeltaOperationSuite.scala:25)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at java.lang.Class.newInstance(Class.java:442)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:66)
  at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  ...
```
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5075 from cfmcgrady/spark-4.0.
Closes #5075
ad38c0d98 [Fu Chen] refine test to adapt Spark 4.0
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… allows it release temp files ### _Why are the changes needed?_ fix bug apache/kyuubi#5065 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5066 from ASiegeLion/master. Closes #5065 08d1ac077 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala bf908f5af [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala 9144582f9 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala f1c95e409 [liupeiyue] [KYUUBI-#5065] Call destroy first on killing Spark startup process to allows it release temp files 907123a93 [liupeiyue] [KYUUBI-#5065] Call destroy first on killing Spark startup process to allows it release temp files f30a9fc39 [liupeiyue] [KYUUBI-#5065] Call destroy first on killing Spark startup process to allows it release temp files 449be44d7 [文艺攻城狮] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala 987ffc7fe [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala 995386f98 [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala ad3d11191 [liupeiyue] [KYUUBI-#5065]destroy the spark engine release the submitted temp files Lead-authored-by: liupeiyue <liupeiyue@yy.com> Co-authored-by: 文艺攻城狮 <945076608@qq.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
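The idea named in the commit above, asking the startup process to exit before killing it forcibly so that it has a chance to clean up temp files, can be sketched with the standard `java.lang.Process` API. This is a minimal sketch and not the code in `ProcBuilder.scala`; the class name, method names, and the grace-period parameter are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.TimeUnit;

final class GracefulKill {
    // Returns true if the process exited within the grace period after destroy(),
    // false if destroyForcibly() had to be used.
    static boolean stop(Process process, long graceMillis) {
        // Requests termination; whether this is a "normal" termination is
        // platform dependent (see Process#supportsNormalTermination).
        process.destroy();
        try {
            if (process.waitFor(graceMillis, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and fall through
        }
        process.destroyForcibly();
        return false;
    }

    // Illustrative helper: launches a command and stops it right away.
    static boolean launchAndStop(long graceMillis, String... command) {
        try {
            Process process = new ProcessBuilder(command).start();
            return stop(process, graceMillis);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, `GracefulKill.launchAndStop(2000, "sleep", "30")` on a Unix-like system would normally return `true`, because `sleep` exits as soon as it receives the termination request.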
### _Why are the changes needed?_
Close #5009
When the Kyuubi Server log is huge, it is difficult to find the `Spark Engine Log Path` in the logs. This change passes the path to the Spark conf, so that users can find the engine log path in the Spark UI or the Spark History Server.
The submit command looks like:
```shell
XXXX/bin/spark-submit \
  --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
  --conf spark.kyuubi.engine.engineLog.path=XXXX/kyuubi-spark-sql-engine.log.0 \
  --proxy-user kyuubi XXXX/target/kyuubi-spark-sql-engine_2.12-1.8.0-SNAPSHOT.jar
```
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5011 from zwangsheng/KYUUBI_5009.
Closes #5009
36c772209 [zwangsheng] fix compile
1c20f9264 [zwangsheng] retest
70568c758 [zwangsheng] Fix Unit Test
2bc465740 [zwangsheng] try to fix unit test
2197b3503 [zwangsheng] Narrow the scope of access
a44eefc5c [zwangsheng] [KYUUBI #5009]Pass Spark Engine Log Path to Spark COnf
Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
This is required by Batch V2, as it allows a batch job to be queued in the metastore before being picked up by a Kyuubi Server for scheduling.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
```
mysql> CREATE TABLE IF NOT EXISTS metadata(
-> key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
-> identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
-> session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
-> real_user varchar(255) NOT NULL COMMENT 'the real user',
-> user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
-> ip_address varchar(128) COMMENT 'the client ip address',
-> kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
-> state varchar(128) NOT NULL COMMENT 'the session state',
-> resource varchar(1024) COMMENT 'the main resource',
-> class_name varchar(1024) COMMENT 'the main class name',
-> request_name varchar(1024) COMMENT 'the request name',
-> request_conf mediumtext COMMENT 'the request config map',
-> request_args mediumtext COMMENT 'the request arguments',
-> create_time BIGINT NOT NULL COMMENT 'the metadata create time',
-> engine_type varchar(32) NOT NULL COMMENT 'the engine type',
-> cluster_manager varchar(128) COMMENT 'the engine cluster manager',
-> engine_open_time bigint COMMENT 'the engine open time',
-> engine_id varchar(128) COMMENT 'the engine application id',
-> engine_name mediumtext COMMENT 'the engine application name',
-> engine_url varchar(1024) COMMENT 'the engine tracking url',
-> engine_state varchar(32) COMMENT 'the engine application state',
-> engine_error mediumtext COMMENT 'the engine application diagnose',
-> end_time bigint COMMENT 'the metadata end time',
-> peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
-> UNIQUE INDEX unique_identifier_index(identifier),
-> INDEX user_name_index(user_name),
-> INDEX engine_type_index(engine_type)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.04 sec)
mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.05 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> SHOW CREATE TABLE metadata;
+----------+---------------------------------------------------------------------------+
| Table | Create Table |
+----------+---------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
`key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
`identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
`session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
`real_user` varchar(255) NOT NULL COMMENT 'the real user',
`user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
`ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
`kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
`state` varchar(128) NOT NULL COMMENT 'the session state',
`resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
`class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
`request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
`request_conf` mediumtext COMMENT 'the request config map',
`request_args` mediumtext COMMENT 'the request arguments',
`create_time` bigint NOT NULL COMMENT 'the metadata create time',
`engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
`cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
`engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
`engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
`engine_name` mediumtext COMMENT 'the engine application name',
`engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
`engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
`engine_error` mediumtext COMMENT 'the engine application diagnose',
`end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
`peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
PRIMARY KEY (`key_id`),
UNIQUE KEY `unique_identifier_index` (`identifier`),
KEY `user_name_index` (`user_name`),
KEY `engine_type_index` (`engine_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
```
The Derby SQL is also tested
<img width="1330" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/4eef0742-05dd-4bd6-a77e-e9de0238375e">
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5078 from pan3793/nullable.
Closes #5078
0c5dec85d [Cheng Pan] Make kyuubi_instance nullable in metadata table schema
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ This is a pure code refactor extracted from apache/kyuubi#4790 to reduce the diff. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5081 from pan3793/dialect. Closes #5081 537d62303 [Cheng Pan] Minor refactor JDBCMetadataStore Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
…ing bootstrap ### _Why are the changes needed?_ As titled. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5082 from link3280/KYUUBI-5080. Closes #5080 e8026b89b [Paul Lin] [KYUUBI #4806][FLINK] Improve logs fd78f3239 [Paul Lin] [KYUUBI #4806][FLINK] Fix gateway NPE a0a7c4422 [Cheng Pan] Update externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java 50830d4d4 [Paul Lin] [KYUUBI #5080][FLINK] Fix EmbeddedExecutorFactory not thread-safe during bootstrap Lead-authored-by: Paul Lin <paullin3280@gmail.com> Co-authored-by: Cheng Pan <pan3793@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ Updated [kyuubi on kubernetes config section](https://kyuubi.readthedocs.io/en/master/deployment/kyuubi_on_kubernetes.html#config) to state <code> Kyuubi **does** not recommend using this way on Kubernetes</code> ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5086 from mans2singh/ISSUE-5085. Closes #5086 5faf0df2e [mans2singh] [KYUUBI # 5085] Update config section based on review comments df9f62f36 [mans2singh] [KYUUBI # 5085] Update config section of deploy on kubernetes Authored-by: mans2singh <mans2singh@yahoo.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
… lines in Windows ### _Why are the changes needed?_ close #5090 ### _How was this patch tested?_ After this PR it generates normal settings file in windows. - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5091 from wForget/KYUUBI-5090. Closes #5090 9e974c7f8 [wforget] fix dc1ebfc08 [wforget] fix 2cbec60f9 [wforget] [KYUUBI-5090] Fix AllKyuubiConfiguration to generate redundant blank lines in Windows ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes Lead-authored-by: wforget <643348094@qq.com> Co-authored-by: mans2singh <mans2singh@yahoo.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
…nt version comparison methods
### _Why are the changes needed?_
- Support initializing or comparing versions with the major version only, e.g. "3" is equivalent to "3.0"
- Remove redundant version comparison methods by using semantic versions of Spark, Flink and Kyuubi
- Add a common `toDouble` method
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5039 from bowenliang123/improve-semanticversion.
Closes #5039
b6868264f [liangbowen] nit
d39646b7d [liangbowen] SPARK_ENGINE_RUNTIME_VERSION
9148caad0 [liangbowen] use semantic versions
ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes
Lead-authored-by: liangbowen <liangbowen@gf.com.cn>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
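The major-only parsing and the `toDouble` helper described above can be sketched as follows. `SimpleVersion` is a hypothetical class written for this illustration; it is not Kyuubi's semantic version implementation, and its parsing rules and method signatures are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; Kyuubi's real semantic version class may differ.
final class SimpleVersion implements Comparable<SimpleVersion> {
    // Major is required, minor is optional, anything after that is ignored.
    private static final Pattern PATTERN =
        Pattern.compile("^(\\d+)(?:\\.(\\d+))?(?:[.\\-].*)?$");

    final int major;
    final int minor;

    private SimpleVersion(int major, int minor) {
        this.major = major;
        this.minor = minor;
    }

    // Accepts e.g. "3", "3.4", "3.4.1", "3.4.1-SNAPSHOT"; a missing minor defaults to 0.
    static SimpleVersion parse(String text) {
        Matcher m = PATTERN.matcher(text);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported version: " + text);
        }
        int minor = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
        return new SimpleVersion(Integer.parseInt(m.group(1)), minor);
    }

    @Override
    public int compareTo(SimpleVersion other) {
        int byMajor = Integer.compare(major, other.major);
        return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
    }

    // "major.minor" as a double; note that a minor of 10 would collapse to x.1.
    double toDouble() {
        return Double.parseDouble(major + "." + minor);
    }
}
```

With this sketch, `SimpleVersion.parse("3").compareTo(SimpleVersion.parse("3.0"))` is `0`, so callers no longer need a separate helper for major-only comparisons.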
### _Why are the changes needed?_ ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5094 from dev-lpq/add_python_doc. Closes #5094 c7d50d75a [pengqli] upgrade Python-JayDeBeApi doc 41f96fc1b [pengqli] upgrade Python-JayDeBeApi doc dd0f91bd6 [pengqli] upgrade Python-JayDeBeApi doc ae1b7bc63 [pengqli] upgrade Python-JayDeBeApi doc 189d7c835 [pengqli] upgrade Python-JayDeBeApi doc 2e1e7b418 [pengqli] upgrade Python-JayDeBeApi doc 362a43296 [pengqli] add Python-JayDeBeApi doc Authored-by: pengqli <pengqli@cisco.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
- Remove the existing single quote in the message format, which causes argument 0 to be left unused
- `A single quote itself must be represented by doubled single quotes '' throughout a String.` https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5100 from bowenliang123/datatype-msg.
Closes #5100
8135ff146 [liangbowen] fix
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
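The `MessageFormat` quoting rule quoted above can be demonstrated with a short, self-contained example. The message text and the class name `QuoteDemo` are made up for illustration and are not the strings changed by this patch.

```java
import java.text.MessageFormat;

final class QuoteDemo {
    // A lone single quote opens a quoted section, so {0} is emitted as literal text.
    static String broken(String type) {
        return MessageFormat.format("Can't handle type {0}", type);
    }

    // A doubled single quote produces one literal quote and keeps {0} active.
    static String escaped(String type) {
        return MessageFormat.format("Can''t handle type {0}", type);
    }
}
```

Here `QuoteDemo.broken("INT")` yields `Cant handle type {0}`, while `QuoteDemo.escaped("INT")` yields `Can't handle type INT`. Removing the quote from the pattern, as this patch does, is the other way to keep the argument in use.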
### _Why are the changes needed?_
- remove 2 unused string builders in `KyuubiQueryResultSet` and `KyuubiArrowQueryResultSet`, which only have separators appended to them and are never queried again
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5101 from bowenliang123/unused-sb.
Closes #5101
ccb6fb77d [liangbowen] remove never queried StringBuilders
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_ close #5099 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5103 from lsm1/features/kyuubi_5099. Closes #5099 84a1ecad0 [senmiaoliu] fix doc Authored-by: senmiaoliu <senmiaoliu@trip.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
…e timeout ### _Why are the changes needed?_ #5065 ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5097 from ASiegeLion/master. Closes #5065 d50a388d6 [Cheng Pan] followup 80861dd71 [liupeiyue] [KYUUBI #5065][FOLLOWUP] Graceful close the process when launch engine timeout Lead-authored-by: liupeiyue <liupeiyue@yy.com> Co-authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
As reported by #4825, a large number of engine builder processes may cause high machine load on the Kyuubi server, so I want to add a config to limit engine creation concurrency.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5089 from wForget/engine_builder_limit.
Closes #5089
77507005d [wforget] comment
774a8599b [wforget] comments
373640fc0 [wforget] Limit maximum engine creation concurrency of kyuubi server
ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes
Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
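One common way to cap how many engine builders run at once is a counting semaphore. The sketch below shows that general technique under stated assumptions: `EngineStartupLimiter`, its constructor argument, and the fail-fast behavior are hypothetical and are not taken from this patch, which may implement the limit differently.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of a concurrency cap; not the implementation in this PR.
final class EngineStartupLimiter {
    private final Semaphore permits;

    EngineStartupLimiter(int maxConcurrentStartups) {
        this.permits = new Semaphore(maxConcurrentStartups);
    }

    // Runs the builder only if a permit is free; otherwise fails fast.
    <T> T launch(Supplier<T> engineBuilder) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("engine creation concurrency limit reached");
        }
        try {
            return engineBuilder.get();
        } finally {
            permits.release(); // always return the permit, even if the builder throws
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

A caller would wrap the process launch in `limiter.launch(() -> ...)`. Whether a rejected request should fail immediately, as here, or wait with a timeout is a design choice this sketch does not try to settle.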
### _Why are the changes needed?_ close #5076 ### _How was this patch tested?_ - [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5102 from lsm1/features/kyuubi_5076. Closes #5076 ce7cfe678 [senmiaoliu] kdf support engine url Authored-by: senmiaoliu <senmiaoliu@trip.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_ As titled. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5107 from link3280/engine_fatal_log. Closes #5106 db45392d1 [Paul Lin] [KYUUBI #5106][Flink] Improve logs for fatal errors Authored-by: Paul Lin <paullin3280@gmail.com> Signed-off-by: Paul Lin <paullin3280@gmail.com>
### _Why are the changes needed?_
#### How is it done today?
The current procedure of Batch Job API, called V1
##### CREATE batch job procedure in Batch V1
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM
Client ->> Server : Create Batch Job
Server ->> Server : Create Batch Operator
Server ->> Metastore : Persist Job metadata (PENDING)
Server ->> Server : Put Batch Operator into Execution thread pool
Server ->> Client : Batch Job Info
Server ->> RM : Submit Application (in Execution thread pool)
loop Application Check
Server ->> RM : Query Application Status
Server ->> Metastore : Update Batch Status
end
```
##### GET batch job info procedure in Batch V1
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM
Client ->> Server : Query Batch Job Info
alt KyuubiInstance matched
Server ->> Client : Batch Job Info
else
Server ->> Server : Forward Request to expected KyuubiInstance
end
```
<!--
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM
Client ->> Server : Fetch Batch Job logs
alt KyuubiInstance matched
Server ->> Client : Batch Job logs
else
Server ->> Server : Forward Request to expected KyuubiInstance
end
Client ->> Server : Close Batch Job
alt KyuubiInstance matched
Server ->> RM : Close the Application
Server ->> Metastore : Update Batch Status
Server ->> Client : Closed Batch Job Info
else
Server ->> Server : Forward Request to expected KyuubiInstance
end
```
-->
#### What is new in your approach?
This PR proposes a new way for batch job submission, called V2
##### CREATE batch job procedure in Batch V2
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM
Client ->> Server : Create Batch Job
Server ->> Metastore : Persist Job metadata (INITIALIZED)
Server ->> Client : Batch Job Info
loop Forever in dedicated thread pool
Server ->> Metastore : Pick up and lock INITIALIZED job
Server ->> RM : Submit Application
Server ->> RM : Query Application Status
Server ->> Metastore : Update Batch Status
end
```
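The "Pick up and lock INITIALIZED job" step in the V2 diagram above relies on an atomic state transition, so that only one Kyuubi server wins each job. The following is a minimal in-memory sketch of that idea. `InMemoryMetastore`, `pickAndLock`, and the `PENDING` target state are assumptions made for illustration; a real metastore backend would more likely use a conditional `UPDATE` in the database.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory stand-in for the metastore; not Kyuubi's Metastore API.
final class InMemoryMetastore {
    private final Map<String, String> stateById = new ConcurrentHashMap<>();
    private final Map<String, String> ownerById = new ConcurrentHashMap<>();

    // CREATE in V2: just persist the job in the INITIALIZED state.
    void insert(String batchId) {
        stateById.put(batchId, "INITIALIZED");
    }

    // Atomically moves one INITIALIZED job to PENDING and records who picked it.
    Optional<String> pickAndLock(String kyuubiInstance) {
        for (String batchId : stateById.keySet()) {
            // replace(key, expected, next) succeeds for exactly one caller per job.
            if (stateById.replace(batchId, "INITIALIZED", "PENDING")) {
                ownerById.put(batchId, kyuubiInstance);
                return Optional.of(batchId);
            }
        }
        return Optional.empty();
    }

    String state(String batchId) {
        return stateById.get(batchId);
    }

    String owner(String batchId) {
        return ownerById.get(batchId);
    }
}
```

Each server's submitter thread would call `pickAndLock` in a loop and submit the application only when a job id is returned, so a job queued by one server can be picked up by any other.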
##### GET batch job info procedure in Batch V2
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM
Client ->> Server : Query Batch Job Info
Server ->> Metastore : Query Batch Job Info
Server ->> Client : Batch Job Info
```
<!--
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM
Client ->> Server : Fetch Batch Job logs
alt KyuubiInstance matched
Server ->> Client : Batch Job logs
else
Server ->> Server : Forward Request to expected KyuubiInstance
end
Client ->> Server : Close Batch Job
alt KyuubiInstance matched
Server ->> RM : Close the Application
Server ->> Metastore : Update Batch Status
Server ->> Client : Closed Batch Job Info
else
Server ->> Server : Forward Request to expected KyuubiInstance
end
```
-->
#### What are the limits of current practice, and why do you think it will be successful?
Pros:
1. The CREATE request becomes lightweight and returns faster. In V1, we have struggled with whether the response should wait for the engine to be submitted to RM, and how to report the un-submitted job status to the client; in V2, the CREATE request simply inserts a new record into the metastore and returns with the INITIALIZED state.
2. In common practice, a Kyuubi server cluster is deployed behind a load balancer that does not know the real load of each Kyuubi server. Suppose it uses a Random/RoundRobin/IPHash policy to forward requests: the existing Batch V1 implementation may leave some Kyuubi servers under high load while others are lightly loaded, because it always uses the Kyuubi server that received the request to do the batch submission. In V2, each Kyuubi server can easily determine its own load, e.g. measured by CPU/memory usage or active batch sessions, and then decide whether to pick up new batch jobs. Besides, when all Kyuubi servers are overloaded, V1 cannot benefit immediately even if the admin scales up the cluster.
3. In V1, the metrics are almost independent in each Kyuubi server; in V2, it's easy to expose global metrics of batch jobs when using shared storage as the metastore backend, e.g. we can easily get how many batches are queued in the metastore, and how many batches are managed by each Kyuubi server, by querying the metastore backend directly or via metrics exposed by each Kyuubi server.
Cons:
1. V1 assumes the Kyuubi server can tolerate a long outage of the metastore, whereas V2 strictly depends on the availability of the metastore. But we can move the existing forwarding logic and async retry logic to the implementation of `Metastore` to overcome this regression.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before make a pull request
Closes #4790 from pan3793/batch-v2.
Closes #4790
860698ad6 [Cheng Pan] BATCH_IMPL_VERSION
b9c68aa2f [Cheng Pan] kyuubi.batch.impl.version
17e4f199a [Cheng Pan] submitter.threads=100
7c0bdb0c1 [Cheng Pan] Initial implement Batch v2
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
I'd like to update the LDAP doc to guide users in setting up LDAP authentication in Kyuubi.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
<img width="1395" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/6925a8e3-dfaf-48ad-a442-bb635fe75830">
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5083 from zhaohehuhu/Improvement-0721.
Closes #5083
8c0e149dd [Cheng Pan] polish
22f8d3aa6 [Cheng Pan] nit
822fa66b3 [hezhao2] sync
78ae12345 [hezhao2] further explanation for LDAP filters
7ebc61acf [Cheng Pan] Update docs/security/ldap.md
bb06810f7 [Cheng Pan] Update docs/security/ldap.md
8d19fdf31 [Cheng Pan] Update docs/security/ldap.md
c2fa2806e [Cheng Pan] Update docs/security/ldap.md
2acbb87db [hezhao2] update LDAP doc
22027e1f2 [hezhao2] update LDAP doc
Lead-authored-by: hezhao2 <hezhao2@cisco.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…ootstrap ### _Why are the changes needed?_ As titled. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5109 from link3280/bootstrap_file_not_found. Closes #5108 318199fa2 [Paul Lin] [KYUUBI #5108][Flink] Fix iFileNotFoundException during Flink engine bootstrap Authored-by: Paul Lin <paullin3280@gmail.com> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Fix #3920
https://github.com/apache/kyuubi/actions/runs/5711863703/job/15474230690?pr=4790
```
DockerizedZkServiceDiscoverySuite:
- distribute lock *** FAILED ***
  Expected exception org.apache.kyuubi.KyuubiSQLException to be thrown, but no exception was thrown (DiscoveryClientTests.scala:147)
```
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5112 from pan3793/test-lock.
Closes #3920
d980f87dc [Cheng Pan] Fix flaky test - distribute lock
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
- remove duplicated assignment for the same variable in adjacent lines in `FastHiveDecimalImpl`
- replace redundant `putAll` with collection initialization in `BatchRestApi`
- use `try-with-resources` statement with the reader and avoid declaring two variables in the same line of code in `KyuubiCommands`
- fix `warning: Tag 'return:' is not recognised` compilation warning in `KyuubiGetSqlClassification:L53`
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5117 from bowenliang123/fastsignum.
Closes #5117
595b5747d [liangbowen] simplify
be530fac4 [liangbowen] fix warning: Tag '@return:' is not recognised compilation warning in KyuubiGetSqlClassification:L53
249706905 [liangbowen] use try-with-resources in KyuubiCommands
a54a97fdd [liangbowen] remove redundant addAll call to collection initialization
cc76d5d0f [liangbowen] remove repeated assignment
Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
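Two of the cleanups listed above, `try-with-resources` for a reader and collection initialization instead of a separate `putAll`, follow general Java idioms that can be shown in isolation. The class and method names below are invented for this sketch and do not correspond to the code in `KyuubiCommands` or `BatchRestApi`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

final class CleanupDemo {
    // try-with-resources closes the reader even if readLine() throws.
    static String firstLine(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            String line = reader.readLine();
            return line == null ? "" : line;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // The copy constructor replaces `new HashMap<>()` followed by `putAll(defaults)`.
    static Map<String, String> withOverride(Map<String, String> defaults, String key, String value) {
        Map<String, String> merged = new HashMap<>(defaults);
        merged.put(key, value);
        return merged;
    }
}
```

For example, `CleanupDemo.firstLine(new java.io.StringReader("a\nb"))` returns `a`, and the reader is closed when the method returns.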
### _Why are the changes needed?_ It was planned but actually delayed, remove this dummy module to save CI and avoid confusing users and release managers. ### _How was this patch tested?_ - [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible - [ ] Add screenshots for manual tests if appropriate - [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request Closes #5113 from pan3793/remove-kudu. Closes #5113 ff8fd2e6a [Cheng Pan] Remove Spark Kudu connector Authored-by: Cheng Pan <chengpan@apache.org> Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
close #4940
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5110 from lsm1/features/kyuubi_4940.
Closes #4940
6c0a9a37f [senmiaoliu] add kdf for hive engine
Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
https://hadoop.apache.org/release/3.3.6.html
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5116 from pan3793/hadoop-3.3.6.
Closes #5116
c3717e7fb [Cheng Pan] Bump Hadoop 3.3.6
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Use a StatefulSet instead of a Deployment, and add a headless service for the StatefulSet.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5062 from camper42/statefulset.
Closes #4788
a1a7f1b0e [camper42] style: remove redudant Global variable `$`
5286f4ff4 [camper42] fix: set statefulset podManagementPolicy
ed83ae2e8 [camper42] style: move headless service to separate file
97b76ea24 [camper42] use `clusterIP: None` for headless serivce
d2078ffe5 [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
35c7e0f90 [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
8d970d21d [camper42] style: indent
3cf22748f [camper42] [KYUUBI #4788][K8S][HELM] Use StatefulSet instead of Deployment
Lead-authored-by: camper42 <camper.xlii@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
close #5122
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5125 from lsm1/features/kyuubi_5122.
Closes #5122
02d0769cc [senmiaoliu] add hive kdf docs
Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_
In Batch implementation v2, the following query is frequently executed to pick the job.
```
SELECT identifier FROM metadata WHERE state='INITIALIZED' ORDER BY create_time DESC LIMIT 1
```
Creating an index on `create_time` could speed up the query and reduce the pressure on the MySQL server.
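As a rough illustration of the query pattern (not part of this patch), the sketch below uses Python's built-in `sqlite3` module in place of MySQL, with a reduced `metadata` table that keeps only the columns the query touches. The sample rows are made up, and SQLite's planner differs from MySQL's, so the printed plan is only indicative of how an index on `create_time` can serve the `ORDER BY ... LIMIT 1`.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL, and the table is reduced to the
# columns used by the batch-picking query.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metadata("
    "key_id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "identifier VARCHAR(36) NOT NULL, "
    "state VARCHAR(128) NOT NULL, "
    "create_time BIGINT NOT NULL)"
)
# Mirrors `ALTER TABLE metadata ADD INDEX create_time_index(create_time)`.
conn.execute("CREATE INDEX create_time_index ON metadata(create_time)")

# Hypothetical sample rows: (identifier, state, create_time).
conn.executemany(
    "INSERT INTO metadata(identifier, state, create_time) VALUES (?, ?, ?)",
    [
        ("batch-a", "INITIALIZED", 100),
        ("batch-b", "RUNNING", 200),
        ("batch-c", "INITIALIZED", 300),
    ],
)

PICK_SQL = (
    "SELECT identifier FROM metadata "
    "WHERE state='INITIALIZED' ORDER BY create_time DESC LIMIT 1"
)

# The newest INITIALIZED row is the one that gets picked.
picked = conn.execute(PICK_SQL).fetchone()[0]

# List the secondary indexes defined on the table.
index_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='index' AND tbl_name='metadata'"
    )
]

# Print the plan so index usage can be inspected by eye.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + PICK_SQL):
    print(plan_row)
print(picked, index_names)
```

On a real MySQL deployment, running `EXPLAIN` on the same `SELECT` before and after the upgrade script would be the way to confirm the effect.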
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
Test the MySQL upgrade SQL statements:
```
mysql> CREATE TABLE IF NOT EXISTS metadata(
-> key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
-> identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
-> session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
-> real_user varchar(255) NOT NULL COMMENT 'the real user',
-> user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
-> ip_address varchar(128) COMMENT 'the client ip address',
-> kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
-> state varchar(128) NOT NULL COMMENT 'the session state',
-> resource varchar(1024) COMMENT 'the main resource',
-> class_name varchar(1024) COMMENT 'the main class name',
-> request_name varchar(1024) COMMENT 'the request name',
-> request_conf mediumtext COMMENT 'the request config map',
-> request_args mediumtext COMMENT 'the request arguments',
-> create_time BIGINT NOT NULL COMMENT 'the metadata create time',
-> engine_type varchar(32) NOT NULL COMMENT 'the engine type',
-> cluster_manager varchar(128) COMMENT 'the engine cluster manager',
-> engine_open_time bigint COMMENT 'the engine open time',
-> engine_id varchar(128) COMMENT 'the engine application id',
-> engine_name mediumtext COMMENT 'the engine application name',
-> engine_url varchar(1024) COMMENT 'the engine tracking url',
-> engine_state varchar(32) COMMENT 'the engine application state',
-> engine_error mediumtext COMMENT 'the engine application diagnose',
-> end_time bigint COMMENT 'the metadata end time',
-> peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
-> UNIQUE INDEX unique_identifier_index(identifier),
-> INDEX user_name_index(user_name),
-> INDEX engine_type_index(engine_type)
-> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.03 sec)
mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.06 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> ALTER TABLE metadata ADD INDEX create_time_index(create_time);
Query OK, 0 rows affected (0.03 sec)
Records: 0 Duplicates: 0 Warnings: 0
mysql> show create table metadata;
+----------+--------------------------------------------------------------------------------+
| Table | Create Table |
+----------+--------------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
`key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
`identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
`session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
`real_user` varchar(255) NOT NULL COMMENT 'the real user',
`user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
`ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
`kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
`state` varchar(128) NOT NULL COMMENT 'the session state',
`resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
`class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
`request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
`request_conf` mediumtext COMMENT 'the request config map',
`request_args` mediumtext COMMENT 'the request arguments',
`create_time` bigint NOT NULL COMMENT 'the metadata create time',
`engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
`cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
`engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
`engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
`engine_name` mediumtext COMMENT 'the engine application name',
`engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
`engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
`engine_error` mediumtext COMMENT 'the engine application diagnose',
`end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
`peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
PRIMARY KEY (`key_id`),
UNIQUE KEY `unique_identifier_index` (`identifier`),
KEY `user_name_index` (`user_name`),
KEY `engine_type_index` (`engine_type`),
KEY `create_time_index` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+--------------------------------------------------------------------------------+
```
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5131 from pan3793/metastore-create-time-index.
Closes #5131
fc18041f2 [Cheng Pan] ALTER TABLE ADD INDEX
c2261edb2 [Cheng Pan] update upgrade script
4f94be5ca [Cheng Pan] Create index on metastore.create_time
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Otherwise, we cannot see JDK logs such as those from Krb5.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
Closes #5129 from pan3793/beeline-log.
Closes #5129
100094823 [Cheng Pan] KyuubiBeeline should redirect JDK logging
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Packaging the binary distribution artifacts for 1.8.0-rc0 left the following change in `pnpm-lock.yaml`:
```patch
diff --git a/kyuubi-server/web-ui/pnpm-lock.yaml b/kyuubi-server/web-ui/pnpm-lock.yaml
index 8375429..f25c02de7 100644
--- a/kyuubi-server/web-ui/pnpm-lock.yaml
+++ b/kyuubi-server/web-ui/pnpm-lock.yaml
@@ -1,4 +1,4 @@
-lockfileVersion: '6.0'
+lockfileVersion: '6.1'
 settings:
   autoInstallPeers: true
```
The inconsistency may be caused by a mismatch between the pnpm version installed in the local environment and the one defined in `pom.xml`. I'm not sure if there is a version management system for pnpm.
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [x] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes #5569 from pan3793/pnpm-lock.
Closes #5569
8a09870fd [Cheng Pan] Fix pnpm-lock file version
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…abled` to add HTTP auth header
### _Why are the changes needed?_
`kyuubi.engine.security.enabled` controls whether the security mechanism for internal communication is enabled, but the current implementation is not symmetrical: the auth generator ignores the conf and always produces the auth header, while the auth header handler is only activated when the conf is enabled. This causes authentication failure when `kyuubi.engine.security.enabled=false` (the default value).
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No.
Closes #5566 from pan3793/none-auth.
Closes #5566
d42a4c3f4 [Cheng Pan] Revert "Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler"
b544343bc [Cheng Pan] Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler
75c4b7dc3 [Cheng Pan] InternalRestClient respects `kyuubi.engine.security.enabled` to add HTTP auth header
Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
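The asymmetry described in this commit can be modelled with a toy sketch. Everything here is hypothetical: the function names, the `KYUUBI-INTERNAL` scheme name, and the base64 token format are invented for illustration and are not Kyuubi's actual classes or wire format. The only point is that the client and the server must consult the same conf key.

```python
import base64
from typing import Dict

ENGINE_SECURITY_ENABLED = "kyuubi.engine.security.enabled"
INTERNAL_SCHEME = "KYUUBI-INTERNAL"  # made-up scheme name for this sketch


def is_enabled(conf: Dict[str, str]) -> bool:
    # The conf defaults to false, as stated in the commit message.
    return conf.get(ENGINE_SECURITY_ENABLED, "false") == "true"


def build_request_headers(conf: Dict[str, str], secret: str) -> Dict[str, str]:
    """Client side: attach the internal auth header only when the conf is on."""
    headers: Dict[str, str] = {}
    if is_enabled(conf):
        token = base64.b64encode(secret.encode("utf-8")).decode("ascii")
        headers["Authorization"] = INTERNAL_SCHEME + " " + token
    return headers


def server_accepts(conf: Dict[str, str], headers: Dict[str, str], secret: str) -> bool:
    """Server side: the internal handler exists only when the conf is on."""
    auth = headers.get("Authorization")
    if auth is None:
        # Assumption of this toy model: requests without a header are allowed.
        return True
    scheme, _, token = auth.partition(" ")
    if scheme == INTERNAL_SCHEME and is_enabled(conf):
        return base64.b64decode(token).decode("utf-8") == secret
    # Unknown scheme, or the internal handler is not activated.
    return False


disabled = {ENGINE_SECURITY_ENABLED: "false"}
enabled = {ENGINE_SECURITY_ENABLED: "true"}

# Symmetric behaviour: both sides read the same conf.
ok_disabled = server_accepts(disabled, build_request_headers(disabled, "s3cret"), "s3cret")
ok_enabled = server_accepts(enabled, build_request_headers(enabled, "s3cret"), "s3cret")

# Old asymmetric behaviour: the header is always produced, but the handler is off.
old_behaviour = server_accepts(disabled, build_request_headers(enabled, "s3cret"), "s3cret")
print(ok_disabled, ok_enabled, old_behaviour)
```

The last call reproduces the failure mode in miniature: a header the server has no active handler for is rejected.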
### _Why are the changes needed?_
1. This PR fixes the precision loss issue in `xx_gmt_offset`. Note that because the `xx_gmt_offset` values are integral, no information was actually lost; only the decimal scale was dropped.
```
trino:tiny> select cc_gmt_offset from call_center ;
cc_gmt_offset
---------------
-5.00
-5.00
```
Before this PR:
```scala
scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
+-------------+
|cc_gmt_offset|
+-------------+
| -5|
| -5|
+-------------+
```
After this PR:
```scala
scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
+-------------+
|cc_gmt_offset|
+-------------+
| -5.00|
| -5.00|
+-------------+
```
2. This PR accelerates the generation of the TPC-DS dataset by optimizing the way Rows are generated.
Before this PR, the process converted **Trino TableRow** into **String Row** and then into **Spark InternalRow**.
After this PR, we have streamlined the process by directly converting **Trino TableRow** into **Spark InternalRow**, eliminating unnecessary toString operations. This change significantly improves the speed of TPC-DS dataset generation.
```scala
spark.table("tpcds.sf1000.catalog_sales").foreach(r => ())
```
Task Duration before this PR:

Task Duration after this PR:

### _How was this patch tested?_
- New UT `tpcds.tiny count and checksum`
- Compare checksum values before and after this PR on the 1TB dataset
| table_name | count | checksum |
|------------------------|-----------------|---------------------------|
| call_center | 42 | 95607401475 |
| catalog_page | 30000 | 64470199469085 |
| catalog_returns | 143996756 | 309202327050775220 |
| catalog_sales | 1439980416 | 3092267266923848000 |
| customer | 12000000 | 25769069905636795 |
| customer_address | 6000000 | 12889423380880973 |
| customer_demographics | 1920800 | 4124183189708148 |
| date_dim | 73049 | 156926081012862 |
| household_demographics | 7200 | 15494873325812 |
| income_band | 20 | 41180951007 |
| inventory | 783000000 | 1681487454682584456 |
| item | 300000 | 643000708260945 |
| promotion | 1500 | 3270935493709 |
| reason | 65 | 118806664977 |
| ship_mode | 20 | 52349078860 |
| store | 1002 | 2096408105720 |
| store_returns | 287999764 | 618451374856897114 |
| store_sales | 2879987999 | 6184670571185100839 |
| time_dim | 86400 | 186045071019485 |
| warehouse | 20 | 31374161844 |
| web_page | 3000 | 6502456139647 |
| web_returns | 71997522 | 154614570845312413 |
| web_sales | 720000376 | 1546188452223821591 |
| web_site | 54 | 107485781738 |
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes #5562 from cfmcgrady/tpcds-perf.
Closes #5550
a789b9e70 [Fu Chen] maxPartitionBytes=384m
659e20912 [Fu Chen] style
916f6d276 [Fu Chen] unnecessary change
75981af8b [Fu Chen] tpcds perf
Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Usually, we can use `spark.sql.shuffle.partitions` to configure the number of shuffle partitions (or `spark.sql.adaptive.coalescePartitions.initialPartitionNum` for AQE). However, it seems difficult to find a universal value for all SQL jobs. Although Spark AQE can dynamically merge and split partitions based on partition size, inappropriate shuffle partitions may still cause some problems:
+ When there are too few shuffle partitions, the join skew optimization threshold is large and the skew partitions will not be split.
+ When using RemoteShuffleService, an inappropriate number of shuffle partitions may result in too large partitions or too many partitions, which will lead to high pressure on the shuffle server.
So I want to provide an optimization rule to dynamically adjust the number of partitions based on the size of the input data.
Calculate the number of partitions based on input data size:
```
targetShufflePartitions = sum(scanSize|shuffleReadSize) / advisoryPartitionSizeInBytes
```
then replace the number of partitions for all `ShuffleExchangeExec` nodes.
### _How was this patch tested?_
- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes #5489 from wForget/dynamic_shuffle_partitions.
Closes #5489
5a2bb6c25 [wforget] only takes effect when aqe is enabled
038b7bb45 [wforget] moved behind InsertShuffleNodeBeforeJoin
7ca87d8e8 [wforget] comment
d65047fda [wforget] sum scanSizes
e4d8f33af [wforget] comments
4f0f25d8e [wforget] configurable
f77d1d648 [wforget] code style
0bf572f27 [wforget] use partition stats
8d251c3fd [wforget] Adjust shuffle partitions dynamically
Authored-by: wforget <643348094@qq.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
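The `targetShufflePartitions` formula from this commit can be worked through with a small sketch. It is written in Python for brevity and is not the Scala rule added by the patch. It reads `sum(scanSize|shuffleReadSize)` as the total of all scan sizes and shuffle read sizes feeding the stage. Rounding up and clamping to at least one partition are assumptions, since the description does not say how the division is rounded.

```python
from typing import Iterable

MIB = 1024 * 1024


def target_shuffle_partitions(
    scan_sizes: Iterable[int],
    shuffle_read_sizes: Iterable[int],
    advisory_partition_size_in_bytes: int,
) -> int:
    """Estimate a shuffle partition count from input sizes given in bytes."""
    if advisory_partition_size_in_bytes <= 0:
        raise ValueError("advisory partition size must be positive")
    total_input = sum(scan_sizes) + sum(shuffle_read_sizes)
    # Assumption: ceiling division, computed on integers to avoid float rounding.
    partitions = -(-total_input // advisory_partition_size_in_bytes)
    # Assumption: never return fewer than one partition.
    return max(1, partitions)


# 640 MiB of scanned input with a 64 MiB advisory size gives 10 partitions.
print(target_shuffle_partitions([640 * MIB], [], 64 * MIB))
```

With a 64 MiB advisory size, 100 MiB of scan input plus 29 MiB of shuffle read rounds up to 3 partitions under these assumptions.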
…e been verified
### _Why are the changes needed?_
To close #5503
For SQL such as the lateral join in the test `[KYUUBI #5503][AUTHZ] Check plan auth checked should not set tag to all child nodes`, Authz first verifies the subquery in `lateral` and then verifies the whole plan. If there is a view, the `PermanentViewMarker` will have been removed by Spark's optimizer by the time the whole plan is verified, so both source tables `table1` and `table2` are verified.
So I think we need to do 3 things:
1. Mark all nodes under a `PermanentViewMarker` as checked, and mark all children of a subquery as checked.
2. `isAuthChecked` should only check the first level of the plan, to avoid skipping the check of the whole plan in the demo test.
3. In `buildQuery`, if the current node has the tag, we just skip it.
Without this PR, the SQL in the test checks both `table1` and `table2`.
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes #5563 from AngersZhuuuu/KYUUBI-5503-FOLLOWUP.
Closes #5503
c1a427f58 [Angerszhuuuu] Update Authorization.scala
d6b2899db [Angerszhuuuu] update
633bc91e0 [Angerszhuuuu] Update Authorization.scala
7a006b136 [Angerszhuuuu] [KYUUBI #5503][FOLLOWUP][AUTHZ] Authz should skip inner plan that have been verified
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
To close #5575
Fix wrong code in test case of dir command
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes #5577 from AngersZhuuuu/KYUUBI-5576.
Closes #5576
60e2cb817 [Angerszhuuuu] [KYUUBI #5576][Bug] Fix wrong code in test case of dir command
Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
As the title says, this makes the Web UI cleaner. The Contact Us page and the Overview page will be refactored later, so they remain for now.
### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible
- [ ] Add screenshots for manual tests if appropriate
- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request
### _Was this patch authored or co-authored using generative AI tooling?_
No
Closes #5574 from zwangsheng/KYUUBI#5573.
Closes #5573
462f9f662 [zwangsheng] fix comments
d32101055 [zwangsheng] [KYUUBI #5573][Improvement] Delete parts of the Kyuubi Web UI that are not useful
Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>