
Upstream updates #6

Open
nousot-cloud-guy wants to merge 761 commits into master from upstream-updates

@nousot-cloud-guy commented on Oct 31, 2023

Why are the changes needed?

How was this patch tested?

  • Add some test cases that check the changes thoroughly including negative and positive cases if possible

  • Add screenshots for manual tests if appropriate

  • Run tests locally before making a pull request

pan3793 and others added 30 commits July 20, 2023 17:02
…related issues

### _Why are the changes needed?_

Currently, the `KyuubiOperationWithEngineSecuritySuite` is not valid because:

1. `InternalSecurityAccessor` is a singleton, so only the first initialized instance takes effect; if the test order changes, some tests may fail.
2. `discoveryClient.startSecretNode` calls `PersistentNode#start` under the hood, which is asynchronous, so we should call `waitForInitialCreate` to ensure the node is created before running the test. Based on my analysis, the wait may take up to 30s (compare `mtime` with `ctime` below).
   ```
   [zk: 10.221.106.196:55408(CONNECTED) 2] get /SECRET
   _ENGINE_SECRET_
   cZxid = 0x5
   ctime = Wed Jul 19 23:01:57 CST 2023
   mZxid = 0x7
   mtime = Wed Jul 19 23:02:17 CST 2023
   pZxid = 0x5
   cversion = 0
   dataVersion = 1
   aclVersion = 0
   ephemeralOwner = 0x0
   dataLength = 15
   numChildren = 0
   ```
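The race in point 2 can be illustrated with a self-contained sketch. `AsyncSecretNode` below is a hypothetical stand-in loosely modeled on an asynchronously created ZooKeeper node; it is not Kyuubi's or Curator's code. `start` returns before the node exists, so a caller such as a test must block on `waitForInitialCreate` before reading it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical stand-in for an asynchronously created ZooKeeper node.
 * start() only schedules the creation; the data becomes visible later.
 */
final class AsyncSecretNode {
  private final ExecutorService executor = Executors.newSingleThreadExecutor();
  private final CountDownLatch created = new CountDownLatch(1);
  private final AtomicReference<String> data = new AtomicReference<>();
  private final long createDelayMillis;

  AsyncSecretNode(long createDelayMillis) {
    this.createDelayMillis = createDelayMillis;
  }

  /** Schedules creation in the background and returns immediately. */
  void start(String secret) {
    executor.execute(() -> {
      try {
        Thread.sleep(createDelayMillis); // simulate a slow create round trip
        data.set(secret);
        created.countDown();             // signal that the initial create finished
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
  }

  /** Blocks until the initial create completes; false if the timeout elapses first. */
  boolean waitForInitialCreate(long timeout, TimeUnit unit) throws InterruptedException {
    return created.await(timeout, unit);
  }

  /** Returns the node data, or null if the node has not been created yet. */
  String read() {
    return data.get();
  }

  void close() {
    executor.shutdownNow();
  }
}
```

With Curator itself, the corresponding step would be calling `PersistentNode#waitForInitialCreate(timeout, unit)` after `start()` and failing the setup if it returns `false`.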
### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5072 from pan3793/security.

Closes #5072

69cce2935 [Cheng Pan] fix
2d623555c [Cheng Pan] fix
74eb2cb18 [Cheng Pan] fix
6d8f4ce4e [Cheng Pan] KyuubiOperationWithEngineSecurity

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

To fix the following failure:

```
SparkDeltaOperationSuite:
org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite *** ABORTED ***
  java.lang.RuntimeException: Unable to load a Suite class org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite that was discovered in the runpath: Not Support spark version (4,0)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:80)
  at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  at scala.collection.Iterator.foreach(Iterator.scala:943)
  at scala.collection.Iterator.foreach$(Iterator.scala:943)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
  at scala.collection.IterableLike.foreach(IterableLike.scala:74)
  at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
  at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
  at scala.collection.TraversableLike.map(TraversableLike.scala:286)
  ...
  Cause: java.lang.IllegalArgumentException: Not Support spark version (4,0)
  at org.apache.kyuubi.engine.spark.WithSparkSQLEngine.$init$(WithSparkSQLEngine.scala:42)
  at org.apache.kyuubi.engine.spark.operation.SparkDeltaOperationSuite.<init>(SparkDeltaOperationSuite.scala:25)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
  at java.lang.Class.newInstance(Class.java:442)
  at org.scalatest.tools.DiscoverySuite$.getSuiteInstance(DiscoverySuite.scala:66)
  at org.scalatest.tools.DiscoverySuite.$anonfun$nestedSuites$1(DiscoverySuite.scala:38)
  at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
  ...
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5075 from cfmcgrady/spark-4.0.

Closes #5075

ad38c0d98 [Fu Chen] refine test to adapt Spark 4.0

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… allow it to release temp files

### _Why are the changes needed?_

Fix apache/kyuubi#5065: when killing the Spark startup process, call `destroy` first so that it can release its temp files.
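The idea of calling `destroy` first can be sketched with the standard `java.lang.Process` API. This is a minimal illustration, not the actual `ProcBuilder` change; the helper name `destroyGracefully` and the grace period are assumptions. On Unix-like systems `destroy()` typically sends SIGTERM, which gives `spark-submit` a chance to clean up, whereas `destroyForcibly()` typically sends SIGKILL, which cannot be handled.

```java
import java.util.concurrent.TimeUnit;

final class GracefulKill {
  private GracefulKill() {}

  /**
   * Requests a normal termination first so the process can run its own cleanup,
   * and force-kills it only if it is still alive after the grace period.
   *
   * @return true if the process exited within the grace period without being forced
   */
  static boolean destroyGracefully(Process process, long grace, TimeUnit unit)
      throws InterruptedException {
    process.destroy(); // typically SIGTERM on Unix-like systems
    if (process.waitFor(grace, unit)) {
      return true;
    }
    process.destroyForcibly(); // typically SIGKILL; the process cannot clean up
    process.waitFor();
    return false;
  }
}
```

The fallback keeps the kill bounded in time: a process that ignores the polite request is still terminated once the grace period expires.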

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5066 from ASiegeLion/master.

Closes #5065

08d1ac077 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
bf908f5af [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
9144582f9 [Cheng Pan] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
f1c95e409 [liupeiyue] [KYUUBI-#5065] Call destroy first on killing Spark startup process to allows it release temp files
907123a93 [liupeiyue] [KYUUBI-#5065] Call destroy first on killing Spark startup process to allows it release temp files
f30a9fc39 [liupeiyue] [KYUUBI-#5065] Call destroy first on killing Spark startup process to allows it release temp files
449be44d7 [文艺攻城狮] Update kyuubi-server/src/main/scala/org/apache/kyuubi/engine/ProcBuilder.scala
987ffc7fe [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
995386f98 [文艺攻城狮] Update kyuubi-common/src/main/scala/org/apache/kyuubi/config/KyuubiConf.scala
ad3d11191 [liupeiyue] [KYUUBI-#5065]destroy the spark engine release the submitted temp files

Lead-authored-by: liupeiyue <liupeiyue@yy.com>
Co-authored-by: 文艺攻城狮 <945076608@qq.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Close #5009

When the Kyuubi server log is huge, it is difficult to find the Spark engine log path in it.

This PR passes the path to the Spark conf, so users can find the engine log path in the Spark UI or the Spark History Server.

The submit command looks like:
```shell
XXXX/bin/spark-submit \
  --class org.apache.kyuubi.engine.spark.SparkSQLEngine \
  --conf spark.kyuubi.engine.engineLog.path=XXXX/kyuubi-spark-sql-engine.log.0 \
  --proxy-user kyuubi XXXX/target/kyuubi-spark-sql-engine_2.12-1.8.0-SNAPSHOT.jar
```

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5011 from zwangsheng/KYUUBI_5009.

Closes #5009

36c772209 [zwangsheng] fix compile
1c20f9264 [zwangsheng] retest
70568c758 [zwangsheng] Fix Unit Test
2bc465740 [zwangsheng] try to fix unit test
2197b3503 [zwangsheng] Narrow the scope of access
a44eefc5c [zwangsheng] [KYUUBI #5009]Pass Spark Engine Log Path to Spark COnf

Authored-by: zwangsheng <2213335496@qq.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

This is required by Batch V2, which allows a batch job to be queued in the metastore before a Kyuubi server picks it up for scheduling.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

```
mysql> CREATE TABLE IF NOT EXISTS metadata(
    ->     key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
    ->     identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
    ->     session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
    ->     real_user varchar(255) NOT NULL COMMENT 'the real user',
    ->     user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
    ->     ip_address varchar(128) COMMENT 'the client ip address',
    ->     kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
    ->     state varchar(128) NOT NULL COMMENT 'the session state',
    ->     resource varchar(1024) COMMENT 'the main resource',
    ->     class_name varchar(1024) COMMENT 'the main class name',
    ->     request_name varchar(1024) COMMENT 'the request name',
    ->     request_conf mediumtext COMMENT 'the request config map',
    ->     request_args mediumtext COMMENT 'the request arguments',
    ->     create_time BIGINT NOT NULL COMMENT 'the metadata create time',
    ->     engine_type varchar(32) NOT NULL COMMENT 'the engine type',
    ->     cluster_manager varchar(128) COMMENT 'the engine cluster manager',
    ->     engine_open_time bigint COMMENT 'the engine open time',
    ->     engine_id varchar(128) COMMENT 'the engine application id',
    ->     engine_name mediumtext COMMENT 'the engine application name',
    ->     engine_url varchar(1024) COMMENT 'the engine tracking url',
    ->     engine_state varchar(32) COMMENT 'the engine application state',
    ->     engine_error mediumtext COMMENT 'the engine application diagnose',
    ->     end_time bigint COMMENT 'the metadata end time',
    ->     peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
    ->     UNIQUE INDEX unique_identifier_index(identifier),
    ->     INDEX user_name_index(user_name),
    ->     INDEX engine_type_index(engine_type)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.04 sec)

mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.05 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> SHOW CREATE TABLE metadata;
+----------+---------------------------------------------------------------------------+
| Table    | Create Table |
+----------+---------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
  `key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
  `identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
  `session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
  `real_user` varchar(255) NOT NULL COMMENT 'the real user',
  `user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
  `ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
  `kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
  `state` varchar(128) NOT NULL COMMENT 'the session state',
  `resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
  `class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
  `request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
  `request_conf` mediumtext COMMENT 'the request config map',
  `request_args` mediumtext COMMENT 'the request arguments',
  `create_time` bigint NOT NULL COMMENT 'the metadata create time',
  `engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
  `cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
  `engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
  `engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
  `engine_name` mediumtext COMMENT 'the engine application name',
  `engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
  `engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
  `engine_error` mediumtext COMMENT 'the engine application diagnose',
  `end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
  `peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
  PRIMARY KEY (`key_id`),
  UNIQUE KEY `unique_identifier_index` (`identifier`),
  KEY `user_name_index` (`user_name`),
  KEY `engine_type_index` (`engine_type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+---------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql>
```

The Derby SQL was also tested.

<img width="1330" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/4eef0742-05dd-4bd6-a77e-e9de0238375e">

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5078 from pan3793/nullable.

Closes #5078

0c5dec85d [Cheng Pan] Make kyuubi_instance nullable in metadata table schema

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

This is a pure code refactor extracted from apache/kyuubi#4790 to reduce the diff.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5081 from pan3793/dialect.

Closes #5081

537d62303 [Cheng Pan] Minor refactor JDBCMetadataStore

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…ing bootstrap

### _Why are the changes needed?_
As titled.
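Per its commit title, this change fixes `EmbeddedExecutorFactory` not being thread-safe during bootstrap. The actual diff is not shown here, so as a hedged illustration of one common fix for this class of bug, the sketch below uses double-checked locking on a `volatile` field so that concurrent bootstrap threads share a single instance. `LazyOnce` is a hypothetical name.

```java
import java.util.function.Supplier;

/**
 * Hypothetical lazily initialized holder that is safe to call from several
 * bootstrap threads at once: the factory runs at most once, assuming it
 * never returns null.
 */
final class LazyOnce<T> {
  private final Supplier<T> factory;
  private volatile T value; // volatile makes the published instance visible to all threads

  LazyOnce(Supplier<T> factory) {
    this.factory = factory;
  }

  T get() {
    T result = value;
    if (result == null) {            // fast path: no locking once initialized
      synchronized (this) {
        result = value;
        if (result == null) {        // re-check: another thread may have won the race
          result = factory.get();
          value = result;
        }
      }
    }
    return result;
  }
}
```

Without the `volatile` modifier and the second check, two threads could both see `null` and construct separate instances, which is the kind of race a non-thread-safe factory exposes during startup.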

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5082 from link3280/KYUUBI-5080.

Closes #5080

e8026b89b [Paul Lin] [KYUUBI #4806][FLINK] Improve logs
fd78f3239 [Paul Lin] [KYUUBI #4806][FLINK] Fix gateway NPE
a0a7c4422 [Cheng Pan] Update externals/kyuubi-flink-sql-engine/src/main/java/org/apache/flink/client/deployment/application/executors/EmbeddedExecutorFactory.java
50830d4d4 [Paul Lin] [KYUUBI #5080][FLINK] Fix EmbeddedExecutorFactory not thread-safe during bootstrap

Lead-authored-by: Paul Lin <paullin3280@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
Updated the [Kyuubi on Kubernetes config section](https://kyuubi.readthedocs.io/en/master/deployment/kyuubi_on_kubernetes.html#config) to state "Kyuubi **does** not recommend using this way on Kubernetes".

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5086 from mans2singh/ISSUE-5085.

Closes #5086

5faf0df2e [mans2singh] [KYUUBI # 5085] Update config section based on review comments
df9f62f36 [mans2singh] [KYUUBI # 5085] Update config section of deploy on kubernetes

Authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
… lines in Windows

### _Why are the changes needed?_

Close #5090.
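Per the commit title, `AllKyuubiConfiguration` generated redundant blank lines on Windows. The exact cause is not described here; a common one is mixing `System.lineSeparator()` (`\r\n` on Windows) with code that assumes `\n`, which leaves stray `\r` characters. The hypothetical helper below shows the usual remedy of normalizing line endings to `\n` before comparing or re-writing generated text, so the result does not depend on the platform.

```java
import java.util.Arrays;
import java.util.List;

final class LineEndings {
  private LineEndings() {}

  /** Converts CRLF and lone CR line endings to LF so output is identical on every OS. */
  static String normalize(String text) {
    return text.replace("\r\n", "\n").replace('\r', '\n');
  }

  /** Splits text into lines after normalization, so no stray '\r' survives. */
  static List<String> lines(String text) {
    return Arrays.asList(normalize(text).split("\n", -1));
  }
}
```

Writing generated files with an explicit `\n` rather than the platform separator keeps checked-in docs byte-identical across operating systems.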

### _How was this patch tested?_

After this PR, `AllKyuubiConfiguration` generates a normal settings file on Windows, without redundant blank lines.

- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5091 from wForget/KYUUBI-5090.

Closes #5090

9e974c7f8 [wforget] fix
dc1ebfc08 [wforget] fix
2cbec60f9 [wforget] [KYUUBI-5090] Fix AllKyuubiConfiguration to generate redundant blank lines in Windows
ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…nt version comparison methods

### _Why are the changes needed?_

- Support initializing or comparing a version with the major version only, e.g. "3" is equivalent to "3.0"
- Remove redundant version comparison methods by using semantic versions of Spark, Flink and Kyuubi
- Add a common `toDouble` method
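A minimal Java sketch of the version semantics listed above, assuming versions are compared on major and minor numbers only. `SemVer` is a hypothetical class, not Kyuubi's own implementation. Note the caveat on `toDouble`: `3.10` maps to `3.1`, so ordering checks should go through `compareTo`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SemVer implements Comparable<SemVer> {
  // Major, optionally followed by ".minor"; anything after (patch, -SNAPSHOT, ...) is ignored.
  private static final Pattern PREFIX = Pattern.compile("(\\d+)(?:\\.(\\d+))?");

  final int major;
  final int minor;

  private SemVer(int major, int minor) {
    this.major = major;
    this.minor = minor;
  }

  /** Parses "3", "3.4" or "3.4.1-SNAPSHOT"; a missing minor version defaults to 0. */
  static SemVer parse(String version) {
    Matcher m = PREFIX.matcher(version.trim());
    if (!m.lookingAt()) {
      throw new IllegalArgumentException("Illegal version string: " + version);
    }
    int minor = m.group(2) == null ? 0 : Integer.parseInt(m.group(2));
    return new SemVer(Integer.parseInt(m.group(1)), minor);
  }

  @Override
  public int compareTo(SemVer other) {
    int byMajor = Integer.compare(major, other.major);
    return byMajor != 0 ? byMajor : Integer.compare(minor, other.minor);
  }

  boolean isAtLeast(String other) {
    return compareTo(parse(other)) >= 0;
  }

  /** For display or coarse checks only: 3.10 becomes 3.1, which is less than 3.9. */
  double toDouble() {
    return Double.parseDouble(major + "." + minor);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SemVer)) return false;
    SemVer other = (SemVer) o;
    return major == other.major && minor == other.minor;
  }

  @Override
  public int hashCode() {
    return 31 * major + minor;
  }
}
```

Comparing integers field by field avoids the `3.10` vs `3.9` trap that a purely numeric `toDouble` comparison would fall into.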

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5039 from bowenliang123/improve-semanticversion.

Closes #5039

b6868264f [liangbowen] nit
d39646b7d [liangbowen] SPARK_ENGINE_RUNTIME_VERSION
9148caad0 [liangbowen] use semantic versions
ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: liangbowen <liangbowen@gf.com.cn>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5094 from dev-lpq/add_python_doc.

Closes #5094

c7d50d75a [pengqli] upgrade Python-JayDeBeApi doc
41f96fc1b [pengqli] upgrade Python-JayDeBeApi doc
dd0f91bd6 [pengqli] upgrade Python-JayDeBeApi doc
ae1b7bc63 [pengqli] upgrade Python-JayDeBeApi doc
189d7c835 [pengqli] upgrade Python-JayDeBeApi doc
2e1e7b418 [pengqli] upgrade Python-JayDeBeApi doc
362a43296 [pengqli] add Python-JayDeBeApi doc

Authored-by: pengqli <pengqli@cisco.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

- Remove the stray single quote in the message format, which causes argument `{0}` to go unused
- `A single quote itself must be represented by doubled single quotes '' throughout a String.` https://docs.oracle.com/javase/8/docs/api/java/text/MessageFormat.html
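The pitfall is easy to reproduce with `java.text.MessageFormat` directly. The message text below is made up for illustration; it is not the message changed in this PR.

```java
import java.text.MessageFormat;

final class QuoteDemo {
  private QuoteDemo() {}

  /**
   * Buggy pattern: the lone quote opens a quoted section that runs to the end of
   * the pattern, so "{0}" is emitted literally and the argument is never used.
   */
  static String broken(String dataType) {
    return MessageFormat.format("Don't support {0}", dataType);
  }

  /** Fixed pattern: '' is MessageFormat's escape for a literal single quote. */
  static String fixed(String dataType) {
    return MessageFormat.format("Don''t support {0}", dataType);
  }
}
```

Here `broken("MAP")` returns `Dont support {0}`, while `fixed("MAP")` returns `Don't support MAP`. Removing the quote, as this PR does, avoids the issue when the apostrophe is not needed; doubling it is the alternative when it is.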

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5100 from bowenliang123/datatype-msg.

Closes #5100

8135ff146 [liangbowen] fix

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

- Remove two unused string builders in `KyuubiQueryResultSet` and `KyuubiArrowQueryResultSet`; they only ever have separators appended and are never read afterwards

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5101 from bowenliang123/unused-sb.

Closes #5101

ccb6fb77d [liangbowen] remove never queried StringBuilders

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

Close #5099.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5103 from lsm1/features/kyuubi_5099.

Closes #5099

84a1ecad0 [senmiaoliu] fix doc

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…e timeout

### _Why are the changes needed?_
Follow-up of #5065: gracefully close the engine process when launching the engine times out.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5097 from ASiegeLion/master.

Closes #5065

d50a388d6 [Cheng Pan] followup
80861dd71 [liupeiyue] [KYUUBI #5065][FOLLOWUP] Graceful close the process when launch engine timeout

Lead-authored-by: liupeiyue <liupeiyue@yy.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

As reported in #4825, a large number of engine builder processes may cause high machine load on the Kyuubi server, so I'd like to add a config to limit engine creation concurrency.
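One straightforward way to cap concurrent engine launches is a counting semaphore around the launch call. The sketch below is an assumption about the approach, not the code in this PR; the class `EngineLaunchLimiter` and its timeout handling are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical limiter capping how many engine launches run at the same time. */
final class EngineLaunchLimiter {
  private final Semaphore permits;

  EngineLaunchLimiter(int maxConcurrentLaunches) {
    // Fair mode hands out permits in arrival order, so waiting requests are served in turn.
    this.permits = new Semaphore(maxConcurrentLaunches, true);
  }

  /** Runs the launch once a permit is free, or fails after waiting for the timeout. */
  <T> T launch(Callable<T> launchEngine, long timeout, TimeUnit unit) throws Exception {
    if (!permits.tryAcquire(timeout, unit)) {
      throw new TimeoutException("Timed out waiting for an engine launch slot");
    }
    try {
      return launchEngine.call();
    } finally {
      permits.release(); // always give the slot back, even if the launch fails
    }
  }

  int available() {
    return permits.availablePermits();
  }
}
```

Bounding the wait with a timeout means an overloaded server rejects new launch requests instead of queueing them indefinitely.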

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5089 from wForget/engine_builder_limit.

Closes #5089

77507005d [wforget] comment
774a8599b [wforget] comments
373640fc0 [wforget] Limit maximum engine creation concurrency of kyuubi server
ecc3b4af6 [mans2singh] [KYUUBI #5086] [KYUUBI # 5085] Update config section of deploy on kubernetes

Lead-authored-by: wforget <643348094@qq.com>
Co-authored-by: mans2singh <mans2singh@yahoo.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Close #5076.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5102 from lsm1/features/kyuubi_5076.

Closes #5076

ce7cfe678 [senmiaoliu] kdf support engine url

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
As titled.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5107 from link3280/engine_fatal_log.

Closes #5106

db45392d1 [Paul Lin] [KYUUBI #5106][Flink] Improve logs for fatal errors

Authored-by: Paul Lin <paullin3280@gmail.com>
Signed-off-by: Paul Lin <paullin3280@gmail.com>
### _Why are the changes needed?_

#### How is it done today?

The current procedure of the Batch Job API, called V1:

##### CREATE batch job procedure in Batch V1

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Create Batch Job
Server ->> Server : Create Batch Operator
Server ->> Metastore : Persist Job metadata (PENDING)
Server ->> Server : Put Batch Operator into Execution thread pool
Server ->> Client : Batch Job Info
Server ->> RM : Submit Application (in Execution thread pool)
loop Application Check
    Server ->> RM : Query Application Status
    Server ->> Metastore : Update Batch Status
end
```

##### GET batch job info procedure in Batch V1

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Query Batch Job Info
alt KyuubiInstance matched
    Server ->> Client : Batch Job Info
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end
```

<!--
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Fetch Batch Job logs
alt KyuubiInstance matched
    Server ->> Client : Batch Job logs
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end

Client ->> Server : Close Batch Job
alt KyuubiInstance matched
    Server ->> RM : Close the Application
    Server ->> Metastore : Update Batch Status
    Server ->> Client : Closed Batch Job Info
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end
```
-->

#### What is new in your approach?

This PR proposes a new way of batch job submission, called V2:

##### CREATE batch job procedure in Batch V2

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Create Batch Job
Server ->> Metastore : Persist Job metadata (INITIALIZED)
Server ->> Client : Batch Job Info

loop Forever in dedicated thread pool
    Server ->> Metastore : Pick up and lock INITIALIZED job
    Server ->> RM : Submit Application
    Server ->> RM : Query Application Status
    Server ->> Metastore : Update Batch Status
end
```
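The "pick up and lock" step can be sketched as an atomic compare-and-set on the job record, so that two servers polling concurrently never claim the same job. The in-memory store below is a hypothetical stand-in for the metastore, and the `PENDING` state after claiming is an assumption. With a JDBC metastore, a comparable claim could be a conditional `UPDATE ... WHERE state = 'INITIALIZED' AND kyuubi_instance IS NULL` that succeeds only if exactly one row is affected; that mapping is also an assumption, not necessarily how Kyuubi locks jobs.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Immutable row of a hypothetical metadata table. */
final class BatchRow {
  final String id;
  final String state;
  final String kyuubiInstance; // null until a server picks the job up

  BatchRow(String id, String state, String kyuubiInstance) {
    this.id = id;
    this.state = state;
    this.kyuubiInstance = kyuubiInstance;
  }
}

/** In-memory stand-in for the metastore, used to illustrate the V2 flow. */
final class InMemoryBatchStore {
  private final ConcurrentHashMap<String, BatchRow> rows = new ConcurrentHashMap<>();

  /** CREATE: persist the job as INITIALIZED with no owner and return right away. */
  void create(String id) {
    rows.put(id, new BatchRow(id, "INITIALIZED", null));
  }

  /** One iteration of the picker loop: claim at most one unowned INITIALIZED job. */
  Optional<String> pickUp(String instance) {
    for (BatchRow row : rows.values()) {
      if ("INITIALIZED".equals(row.state) && row.kyuubiInstance == null) {
        BatchRow claimed = new BatchRow(row.id, "PENDING", instance);
        // Atomic compare-and-set: fails if another server already replaced this row.
        // BatchRow does not override equals, so the comparison is by identity.
        if (rows.replace(row.id, row, claimed)) {
          return Optional.of(row.id);
        }
      }
    }
    return Optional.empty();
  }

  BatchRow get(String id) {
    return rows.get(id);
  }
}
```

Because the CREATE path only inserts a row, any server with spare capacity can call `pickUp`, which is what lets the load spread beyond the server that received the request.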

##### GET batch job info procedure in Batch V2

```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Query Batch Job Info
Server ->> Metastore : Query Batch Job Info
Server ->> Client : Batch Job Info
```

<!--
```mermaid
sequenceDiagram
participant Client
participant Server
participant Metastore
participant RM

Client ->> Server : Fetch Batch Job logs
alt KyuubiInstance matched
    Server ->> Client : Batch Job logs
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end

Client ->> Server : Close Batch Job
alt KyuubiInstance matched
    Server ->> RM : Close the Application
    Server ->> Metastore : Update Batch Status
    Server ->> Client : Closed Batch Job Info
else
    Server ->> Server : Forward Request to expected KyuubiInstance
end
```
-->

#### What are the limits of current practice, and why do you think it will be successful?

Pros:

1. The CREATE request becomes lightweight and returns faster. In V1, we have struggled with whether the response should wait for the engine to be submitted to RM, and how to report the status of a not-yet-submitted job to the client; in V2, the CREATE request simply inserts a new record into the metastore and returns with the INITIALIZED state.
2. In common practice, the Kyuubi server cluster is deployed behind a load balancer, which does not know the real load of each Kyuubi server. Suppose it uses a Random/RoundRobin/IPHash policy to forward requests: the existing Batch V1 implementation may leave some Kyuubi servers under high load while others stay lightly loaded, because the Kyuubi server that receives a request always performs the batch submission itself. In V2, each Kyuubi server can easily assess its own load, e.g. by CPU/memory usage or the number of active batch sessions, and then decide whether to pick up new batch jobs. Besides, when all Kyuubi servers are overloaded, V1 cannot benefit immediately even if the admin scales up the cluster, because accepted jobs stay on the servers that received them.
3. In V1, the metrics of each Kyuubi server are largely independent; in V2, it's easy to expose global batch job metrics when shared storage is used as the metastore backend, e.g. how many batches are queued in the metastore and how many batches each Kyuubi server manages, either by querying the metastore backend directly or through the metrics exposed by each Kyuubi server.

Cons:

1. V1 lets the Kyuubi server tolerate a long metastore outage, whereas V2 strictly depends on metastore availability. We can overcome this regression by moving the existing forwarding logic and async retry logic into the implementation of `Metastore`.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/develop_tools/testing.html#running-tests) locally before making a pull request

Closes #4790 from pan3793/batch-v2.

Closes #4790

860698ad6 [Cheng Pan] BATCH_IMPL_VERSION
b9c68aa2f [Cheng Pan] kyuubi.batch.impl.version
17e4f199a [Cheng Pan] submitter.threads=100
7c0bdb0c1 [Cheng Pan] Initial implement Batch v2

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

I'd like to update the LDAP doc to guide users in setting up LDAP authentication in Kyuubi.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

<img width="1395" alt="image" src="https://github.com/apache/kyuubi/assets/26535726/6925a8e3-dfaf-48ad-a442-bb635fe75830">

- [ ] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5083 from zhaohehuhu/Improvement-0721.

Closes #5083

8c0e149dd [Cheng Pan] polish
22f8d3aa6 [Cheng Pan] nit
822fa66b3 [hezhao2] sync
78ae12345 [hezhao2] further explanation for LDAP filters
7ebc61acf [Cheng Pan] Update docs/security/ldap.md
bb06810f7 [Cheng Pan] Update docs/security/ldap.md
8d19fdf31 [Cheng Pan] Update docs/security/ldap.md
c2fa2806e [Cheng Pan] Update docs/security/ldap.md
2acbb87db [hezhao2] update LDAP doc
22027e1f2 [hezhao2] update LDAP doc

Lead-authored-by: hezhao2 <hezhao2@cisco.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Co-authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…ootstrap

### _Why are the changes needed?_
As titled.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run tests](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before making a pull request

Closes #5109 from link3280/bootstrap_file_not_found.

Closes #5108

318199fa2 [Paul Lin] [KYUUBI #5108][Flink] Fix iFileNotFoundException during Flink engine bootstrap

Authored-by: Paul Lin <paullin3280@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Fix #3920

https://github.com/apache/kyuubi/actions/runs/5711863703/job/15474230690?pr=4790

```
DockerizedZkServiceDiscoverySuite:
- distribute lock *** FAILED ***
  Expected exception org.apache.kyuubi.KyuubiSQLException to be thrown, but no exception was thrown (DiscoveryClientTests.scala:147)
```

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5112 from pan3793/test-lock.

Closes #3920

d980f87dc [Cheng Pan] Fix flaky test - distribute lock

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

- remove duplicated assignment for the same variable in adjacent lines in `FastHiveDecimalImpl`
- replace redundant `putAll` with collection initialization in `BatchRestApi`
- use `try-with-resources` statement with the reader and avoid declaring two variables in the same line of code in `KyuubiCommands`
- fix `warning: Tag 'return:' is not recognised` compilation warning in `KyuubiGetSqlClassification:L53`
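The first three bullets are routine Java clean-ups. Below is a minimal sketch of two of the idioms involved:

- copying a map through the `HashMap` copy constructor instead of calling `putAll`;
- reading through `try-with-resources`.

The class and method names are hypothetical. This is not the actual `BatchRestApi` or `KyuubiCommands` code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

public class CleanupSketch {
    // Instead of creating an empty HashMap and then calling putAll(conf),
    // pass the source map straight to the copy constructor.
    static Map<String, String> copyConf(Map<String, String> conf) {
        return new HashMap<>(conf);
    }

    // try-with-resources closes the reader even if readLine() throws.
    static int countLines(String text) {
        int count = 0;
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            while (reader.readLine() != null) {
                count++;
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to handle IOException.
            throw new UncheckedIOException(e);
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(copyConf(Map.of("key", "value")));
        System.out.println(countLines("a\nb\nc"));
    }
}
```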

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5117 from bowenliang123/fastsignum.

Closes #5117

595b5747d [liangbowen] simplify
be530fac4 [liangbowen] fix warning: Tag '@return:' is not recognised compilation warning in KyuubiGetSqlClassification:L53
249706905 [liangbowen] use try-with-resources in KyuubiCommands
a54a97fdd [liangbowen] remove redundant addAll call to collection initialization
cc76d5d0f [liangbowen] remove repeated assignment

Authored-by: liangbowen <liangbowen@gf.com.cn>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

It was planned but has been delayed; remove this dummy module to save CI time and avoid confusing users and release managers.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5113 from pan3793/remove-kudu.

Closes #5113

ff8fd2e6a [Cheng Pan] Remove Spark Kudu connector

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close #4940

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5110 from lsm1/features/kyuubi_4940.

Closes #4940

6c0a9a37f [senmiaoliu] add kdf for hive engine

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

https://hadoop.apache.org/release/3.3.6.html

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5116 from pan3793/hadoop-3.3.6.

Closes #5116

c3717e7fb [Cheng Pan] Bump Hadoop 3.3.6

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Use a StatefulSet instead of a Deployment, and add a headless service for the StatefulSet.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

![image](https://github.com/apache/kyuubi/assets/3177898/0991c287-cf1a-40f1-8e50-3934bd2886ca)
![image](https://github.com/apache/kyuubi/assets/3177898/9a5d11a5-2ac9-468e-bfcb-9a070f54c6b4)

Closes #5062 from camper42/statefulset.

Closes #4788

a1a7f1b0e [camper42] style: remove redundant Global variable `$`
5286f4ff4 [camper42] fix: set statefulset podManagementPolicy
ed83ae2e8 [camper42] style: move headless service to separate file
97b76ea24 [camper42] use `clusterIP: None` for headless service
d2078ffe5 [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
35c7e0f90 [Cheng Pan] Update charts/kyuubi/templates/kyuubi-statefulset.yaml
8d970d21d [camper42] style: indent
3cf22748f [camper42] [KYUUBI #4788][K8S][HELM] Use StatefulSet instead of Deployment

Lead-authored-by: camper42 <camper.xlii@gmail.com>
Co-authored-by: Cheng Pan <pan3793@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

close #5122

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5125 from lsm1/features/kyuubi_5122.

Closes #5122

02d0769cc [senmiaoliu] add hive kdf docs

Authored-by: senmiaoliu <senmiaoliu@trip.com>
Signed-off-by: liangbowen <liangbowen@gf.com.cn>
### _Why are the changes needed?_

In Batch implementation v2, the following query is frequently executed to pick the next job:
```sql
SELECT identifier FROM metadata WHERE state='INITIALIZED' ORDER BY create_time DESC LIMIT 1
```
Creating an index on `create_time` could speed up the query and reduce the pressure on the MySQL server.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

Tested the MySQL upgrade SQL statements:

```
mysql> CREATE TABLE IF NOT EXISTS metadata(
    ->     key_id bigint PRIMARY KEY AUTO_INCREMENT COMMENT 'the auto increment key id',
    ->     identifier varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
    ->     session_type varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
    ->     real_user varchar(255) NOT NULL COMMENT 'the real user',
    ->     user_name varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
    ->     ip_address varchar(128) COMMENT 'the client ip address',
    ->     kyuubi_instance varchar(1024) NOT NULL COMMENT 'the kyuubi instance that creates this',
    ->     state varchar(128) NOT NULL COMMENT 'the session state',
    ->     resource varchar(1024) COMMENT 'the main resource',
    ->     class_name varchar(1024) COMMENT 'the main class name',
    ->     request_name varchar(1024) COMMENT 'the request name',
    ->     request_conf mediumtext COMMENT 'the request config map',
    ->     request_args mediumtext COMMENT 'the request arguments',
    ->     create_time BIGINT NOT NULL COMMENT 'the metadata create time',
    ->     engine_type varchar(32) NOT NULL COMMENT 'the engine type',
    ->     cluster_manager varchar(128) COMMENT 'the engine cluster manager',
    ->     engine_open_time bigint COMMENT 'the engine open time',
    ->     engine_id varchar(128) COMMENT 'the engine application id',
    ->     engine_name mediumtext COMMENT 'the engine application name',
    ->     engine_url varchar(1024) COMMENT 'the engine tracking url',
    ->     engine_state varchar(32) COMMENT 'the engine application state',
    ->     engine_error mediumtext COMMENT 'the engine application diagnose',
    ->     end_time bigint COMMENT 'the metadata end time',
    ->     peer_instance_closed boolean default '0' COMMENT 'closed by peer kyuubi instance',
    ->     UNIQUE INDEX unique_identifier_index(identifier),
    ->     INDEX user_name_index(user_name),
    ->     INDEX engine_type_index(engine_type)
    -> ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
Query OK, 0 rows affected (0.03 sec)

mysql> ALTER TABLE metadata MODIFY kyuubi_instance varchar(1024) COMMENT 'the kyuubi instance that creates this';
Query OK, 0 rows affected (0.06 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> ALTER TABLE metadata ADD INDEX create_time_index(create_time);
Query OK, 0 rows affected (0.03 sec)
Records: 0  Duplicates: 0  Warnings: 0

mysql> show create table metadata;
+----------+--------------------------------------------------------------------------------+
| Table    | Create Table                                                                   |
+----------+--------------------------------------------------------------------------------+
| metadata | CREATE TABLE `metadata` (
  `key_id` bigint NOT NULL AUTO_INCREMENT COMMENT 'the auto increment key id',
  `identifier` varchar(36) NOT NULL COMMENT 'the identifier id, which is an UUID',
  `session_type` varchar(32) NOT NULL COMMENT 'the session type, SQL or BATCH',
  `real_user` varchar(255) NOT NULL COMMENT 'the real user',
  `user_name` varchar(255) NOT NULL COMMENT 'the user name, might be a proxy user',
  `ip_address` varchar(128) DEFAULT NULL COMMENT 'the client ip address',
  `kyuubi_instance` varchar(1024) DEFAULT NULL COMMENT 'the kyuubi instance that creates this',
  `state` varchar(128) NOT NULL COMMENT 'the session state',
  `resource` varchar(1024) DEFAULT NULL COMMENT 'the main resource',
  `class_name` varchar(1024) DEFAULT NULL COMMENT 'the main class name',
  `request_name` varchar(1024) DEFAULT NULL COMMENT 'the request name',
  `request_conf` mediumtext COMMENT 'the request config map',
  `request_args` mediumtext COMMENT 'the request arguments',
  `create_time` bigint NOT NULL COMMENT 'the metadata create time',
  `engine_type` varchar(32) NOT NULL COMMENT 'the engine type',
  `cluster_manager` varchar(128) DEFAULT NULL COMMENT 'the engine cluster manager',
  `engine_open_time` bigint DEFAULT NULL COMMENT 'the engine open time',
  `engine_id` varchar(128) DEFAULT NULL COMMENT 'the engine application id',
  `engine_name` mediumtext COMMENT 'the engine application name',
  `engine_url` varchar(1024) DEFAULT NULL COMMENT 'the engine tracking url',
  `engine_state` varchar(32) DEFAULT NULL COMMENT 'the engine application state',
  `engine_error` mediumtext COMMENT 'the engine application diagnose',
  `end_time` bigint DEFAULT NULL COMMENT 'the metadata end time',
  `peer_instance_closed` tinyint(1) DEFAULT '0' COMMENT 'closed by peer kyuubi instance',
  PRIMARY KEY (`key_id`),
  UNIQUE KEY `unique_identifier_index` (`identifier`),
  KEY `user_name_index` (`user_name`),
  KEY `engine_type_index` (`engine_type`),
  KEY `create_time_index` (`create_time`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+----------+--------------------------------------------------------------------------------+
```

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5131 from pan3793/metastore-create-time-index.

Closes #5131

fc18041f2 [Cheng Pan] ALTER TABLE ADD INDEX
c2261edb2 [Cheng Pan] update upgrade script
4f94be5ca [Cheng Pan] Create index on metastore.create_time

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Otherwise, JDK logs such as those from Krb5 are not visible.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

Closes #5129 from pan3793/beeline-log.

Closes #5129

100094823 [Cheng Pan] KyuubiBeeline should redirect JDK logging

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
pan3793 and others added 8 commits October 31, 2023 12:09
### _Why are the changes needed?_

After packaging the binary distribution artifacts during 1.8.0-rc0, the following change appeared:

```patch
diff --git a/kyuubi-server/web-ui/pnpm-lock.yaml b/kyuubi-server/web-ui/pnpm-lock.yaml
index 8375429..f25c02de7 100644
--- a/kyuubi-server/web-ui/pnpm-lock.yaml
+++ b/kyuubi-server/web-ui/pnpm-lock.yaml
@@ -1,4 +1,4 @@
-lockfileVersion: '6.0'
+lockfileVersion: '6.1'

 settings:
   autoInstallPeers: true
```

The inconsistency may be caused by a mismatch between the pnpm version installed in the local environment and the one defined in `pom.xml`. I'm not sure whether there is a version management system for pnpm.

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [x] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes #5569 from pan3793/pnpm-lock.

Closes #5569

8a09870fd [Cheng Pan] Fix pnpm-lock file version

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
…abled` to add HTTP auth header

### _Why are the changes needed?_

`kyuubi.engine.security.enabled` controls whether the security mechanism is enabled for internal communication, but the current implementation is asymmetric. The auth generator ignores the conf and always produces the auth header, while the auth header handler is only activated when the conf is enabled. This causes authentication failures when `kyuubi.engine.security.enabled=false` (the default value).
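The fix amounts to making the client and the server read the same flag. Below is a minimal sketch of that idea, assuming:

- a hypothetical `Bearer` token header;
- a simplified server that rejects an internal auth header for which it has no active handler.

The class name, header scheme, and rejection behavior are illustrative only. They are not Kyuubi's actual `InternalRestClient` or authentication handlers.

```java
import java.util.HashMap;
import java.util.Map;

public class SymmetricAuthSketch {
    // Hypothetical header name and scheme, used only for illustration.
    static final String AUTH_HEADER = "Authorization";

    // Stands in for kyuubi.engine.security.enabled; client and server read the same flag.
    private final boolean securityEnabled;
    private final String token;

    SymmetricAuthSketch(boolean securityEnabled, String token) {
        this.securityEnabled = securityEnabled;
        this.token = token;
    }

    // Client side: add the header only when internal security is enabled.
    Map<String, String> clientHeaders() {
        Map<String, String> headers = new HashMap<>();
        if (securityEnabled) {
            headers.put(AUTH_HEADER, "Bearer " + token);
        }
        return headers;
    }

    // Server side (simplified): the handler validates the token only when enabled.
    // When disabled, a request carrying the internal header is treated as a failure,
    // which models the asymmetric behavior described above.
    boolean serverAccepts(Map<String, String> headers) {
        if (securityEnabled) {
            return ("Bearer " + token).equals(headers.get(AUTH_HEADER));
        }
        return !headers.containsKey(AUTH_HEADER);
    }
}
```

Calling `serverAccepts` on a disabled instance with the headers produced by an enabled instance returns `false`. That is the mismatch the change removes by having the client respect the flag.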

### _How was this patch tested?_
- [ ] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [x] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No.

Closes #5566 from pan3793/none-auth.

Closes #5566

d42a4c3f4 [Cheng Pan] Revert "Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler"
b544343bc [Cheng Pan] Extract AnonymousAuthenticationHandler from BasicAuthenticationHandler
75c4b7dc3 [Cheng Pan] InternalRestClient respects `kyuubi.engine.security.enabled` to add HTTP auth header

Authored-by: Cheng Pan <chengpan@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

1. This PR fixes the precision loss issue in `xx_gmt_offset`. Note that since the `xx_gmt_offset` values are whole numbers, no numeric value was actually lost; only the decimal scale was dropped from the output.

```
trino:tiny> select cc_gmt_offset from call_center ;
 cc_gmt_offset
---------------
         -5.00
         -5.00
```

Before this PR:

```scala
scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
+-------------+
|cc_gmt_offset|
+-------------+
|           -5|
|           -5|
+-------------+
```

After this PR:
```scala
scala> spark.sql("select cc_gmt_offset from tpcds.tiny.call_center").show
+-------------+
|cc_gmt_offset|
+-------------+
|        -5.00|
|        -5.00|
+-------------+
```
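To illustrate the note that no numeric value was lost: in Java's `BigDecimal`, `-5` and `-5.00` compare as equal but carry different scales. Below is a minimal sketch, assuming the column is declared with scale 2, as the `-5.00` Trino output suggests. The helper name is hypothetical, and this is not the connector's conversion code.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class DecimalScaleSketch {
    // Reattach the declared scale of a decimal(p, s) column, e.g. s = 2.
    // RoundingMode.UNNECESSARY throws if rounding would be required,
    // so no digits can be silently dropped.
    static BigDecimal withDeclaredScale(BigDecimal value, int scale) {
        return value.setScale(scale, RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        BigDecimal shown = new BigDecimal("-5");       // scale 0, as in the old output
        BigDecimal expected = new BigDecimal("-5.00"); // scale 2, as Trino shows it
        System.out.println(shown.compareTo(expected) == 0); // true: same numeric value
        System.out.println(shown.equals(expected));         // false: scales differ
        System.out.println(withDeclaredScale(shown, 2));    // -5.00
    }
}
```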

2. This PR accelerates the generation of the TPC-DS dataset by optimizing the way Rows are generated.

Before this PR, the process converted a **Trino TableRow** into a **String Row** and then into a **Spark InternalRow**.

After this PR, we have streamlined the process by directly converting **Trino TableRow** into **Spark InternalRow**, eliminating unnecessary toString operations. This change significantly improves the speed of TPC-DS dataset generation.

```scala
spark.table("tpcds.sf1000.catalog_sales").foreach(r => ())
```

Task Duration before this PR:

![Screenshot 2023-10-30 4.04.12 PM](https://github.com/apache/kyuubi/assets/8537877/69bd9938-2886-4044-99b8-79ed20d4791c)

Task Duration after this PR:

![Screenshot 2023-10-30 4.02.08 PM](https://github.com/apache/kyuubi/assets/8537877/ddfe01a9-081c-41b5-b82c-a0934dd8686c)

### _How was this patch tested?_

- New UT `tpcds.tiny count and checksum`
- Compare checksum values before and after this PR on the 1TB dataset

| table_name             | count           | checksum                  |
|------------------------|-----------------|---------------------------|
| call_center            | 42              | 95607401475               |
| catalog_page           | 30000           | 64470199469085            |
| catalog_returns        | 143996756       | 309202327050775220        |
| catalog_sales          | 1439980416      | 3092267266923848000       |
| customer               | 12000000        | 25769069905636795         |
| customer_address       | 6000000         | 12889423380880973         |
| customer_demographics  | 1920800         | 4124183189708148          |
| date_dim               | 73049           | 156926081012862           |
| household_demographics | 7200            | 15494873325812            |
| income_band            | 20              | 41180951007               |
| inventory              | 783000000       | 1681487454682584456       |
| item                   | 300000          | 643000708260945           |
| promotion              | 1500            | 3270935493709             |
| reason                 | 65              | 118806664977              |
| ship_mode              | 20              | 52349078860               |
| store                  | 1002            | 2096408105720             |
| store_returns          | 287999764       | 618451374856897114        |
| store_sales            | 2879987999      | 6184670571185100839       |
| time_dim               | 86400           | 186045071019485           |
| warehouse              | 20              | 31374161844               |
| web_page               | 3000            | 6502456139647             |
| web_returns            | 71997522        | 154614570845312413        |
| web_sales              | 720000376       | 1546188452223821591       |
| web_site               | 54              | 107485781738              |

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes #5562 from cfmcgrady/tpcds-perf.

Closes #5550

a789b9e70 [Fu Chen] maxPartitionBytes=384m
659e20912 [Fu Chen] style
916f6d276 [Fu Chen] unnecessary change
75981af8b [Fu Chen] tpcds perf

Authored-by: Fu Chen <cfmcgrady@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

Usually, we can use `spark.sql.shuffle.partitions` to configure the number of shuffle partitions (or `spark.sql.adaptive.coalescePartitions.initialPartitionNum` for AQE). However, it seems difficult to find a universal value for all SQL jobs.

Although Spark AQE can dynamically merge and split partitions based on partition size, inappropriate shuffle partitions may still cause some problems:

+ When there are too few shuffle partitions, the skew join optimization threshold becomes large, so skewed partitions will not be split.
+ When using RemoteShuffleService, an inappropriate number of shuffle partitions may produce partitions that are too large or too numerous, putting high pressure on the shuffle server.

So I want to provide an optimization rule to dynamically adjust the number of partitions based on the size of the input data.

Calculate the number of partitions based on input data size:

```
targetShufflePartitions = sum(scanSize|shuffleReadSize) / advisoryPartitionSizeInBytes
```

then replace the number of partitions for all `ShuffleExchangeExec` nodes.
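The formula above can be sketched as follows. Rounding up, the lower bound of 1, and the `maxPartitions` cap are assumptions added for this sketch, and the method name is hypothetical. It is not the rule's actual implementation.

```java
import java.util.List;

public class ShufflePartitionSketch {
    // Mirrors: targetShufflePartitions = sum(scanSize|shuffleReadSize) / advisoryPartitionSizeInBytes.
    // Ceiling division and clamping to [1, maxPartitions] are assumptions of this sketch.
    static int targetShufflePartitions(List<Long> inputSizes,
                                       long advisoryPartitionSizeInBytes,
                                       int maxPartitions) {
        long totalBytes = 0L;
        for (long size : inputSizes) {
            totalBytes += size;
        }
        // Round up so that a small remainder still gets its own partition.
        long target = (totalBytes + advisoryPartitionSizeInBytes - 1) / advisoryPartitionSizeInBytes;
        return (int) Math.max(1L, Math.min(target, (long) maxPartitions));
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        long gib = 1024L * mib;
        // 10 GiB scanned plus 2 GiB of shuffle read, with a 64 MiB advisory size.
        System.out.println(targetShufflePartitions(List.of(10 * gib, 2 * gib), 64 * mib, 10_000)); // 192
    }
}
```

With 10 GiB scanned plus 2 GiB of shuffle read and a 64 MiB advisory size (Spark's default for `spark.sql.adaptive.advisoryPartitionSizeInBytes`), the sketch yields 192 partitions.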

### _How was this patch tested?_
- [X] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes #5489 from wForget/dynamic_shuffle_partitions.

Closes #5489

5a2bb6c25 [wforget] only takes effect when aqe is enabled
038b7bb45 [wforget] moved behind InsertShuffleNodeBeforeJoin
7ca87d8e8 [wforget] comment
d65047fda [wforget] sum scanSizes
e4d8f33af [wforget] comments
4f0f25d8e [wforget] configurable
f77d1d648 [wforget] code style
0bf572f27 [wforget] use partition stats
8d251c3fd [wforget] Adjust shuffle partitions dynamically

Authored-by: wforget <643348094@qq.com>
Signed-off-by: ulyssesyou <ulyssesyou@apache.org>
…e been verified

### _Why are the changes needed?_
To close #5503.
For SQL such as the lateral join in the test `[KYUUBI #5503][AUTHZ] Check plan auth checked should not set tag to all child nodes`, Authz first verifies the subquery in `lateral` and then verifies the whole plan. If there is a view, the `PermanentViewMarker` has already been removed by Spark's optimizer by the time the whole plan is verified.
As a result, both source tables `table1` and `table2` are verified.
So I think we need to do three things:

1. Mark all descendant nodes of `PermanentViewMarker` as checked, and likewise mark all of `Subquery`'s child nodes as checked.
2. `isAuthChecked` should only check the first level of the plan, to avoid skipping the check of the whole plan in the demo test.
3. In `buildQuery`, if the current node has the tag, just skip it.
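The three steps can be sketched on a toy plan tree. The `Node` class, its boolean `authChecked` field, and the method names are hypothetical stand-ins; Spark plans would carry a `TreeNodeTag` instead. This sketches the traversal logic only, not the code in `Authorization.scala`.

```java
import java.util.ArrayList;
import java.util.List;

public class AuthCheckedSketch {
    // Hypothetical plan node: table is non-null only for table relations.
    static final class Node {
        final String table;
        final List<Node> children;
        boolean authChecked;

        Node(String table, Node... children) {
            this.table = table;
            this.children = List.of(children);
        }
    }

    // Step 1: mark a verified node and every node beneath it as checked.
    static void markAllChecked(Node node) {
        node.authChecked = true;
        for (Node child : node.children) {
            markAllChecked(child);
        }
    }

    // Step 2: look only at the top-level node, not at any descendant.
    static boolean isAuthChecked(Node plan) {
        return plan.authChecked;
    }

    // Step 3: while collecting tables to check, skip subtrees already verified.
    static List<String> tablesToCheck(Node plan) {
        List<String> out = new ArrayList<>();
        collect(plan, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.authChecked) {
            return;
        }
        if (node.table != null) {
            out.add(node.table);
        }
        for (Node child : node.children) {
            collect(child, out);
        }
    }
}
```

Consider a root with two children: a verified subtree over `table1` and an unverified `table2` relation. For that tree, `isAuthChecked(root)` stays `false` and `tablesToCheck(root)` returns only `table2`.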

Without this PR, the SQL in the test checks both `table1` and `table2`.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_
No

Closes #5563 from AngersZhuuuu/KYUUBI-5503-FOLLOWUP.

Closes #5503

c1a427f58 [Angerszhuuuu] Update Authorization.scala
d6b2899db [Angerszhuuuu] update
633bc91e0 [Angerszhuuuu] Update Authorization.scala
7a006b136 [Angerszhuuuu] [KYUUBI #5503][FOLLOWUP][AUTHZ] Authz should skip inner plan that have been verified

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_
To close #5575
Fix wrong code in the test case of the dir command.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

### _Was this patch authored or co-authored using generative AI tooling?_
No

Closes #5577 from AngersZhuuuu/KYUUBI-5576.

Closes #5576

60e2cb817 [Angerszhuuuu] [KYUUBI #5576][Bug] Fix wrong code in test case of dir command

Authored-by: Angerszhuuuu <angers.zhu@gmail.com>
Signed-off-by: Cheng Pan <chengpan@apache.org>
### _Why are the changes needed?_

As titled, and to make the Web UI cleaner.

The Contact Us and Overview pages will be refactored later, so they are kept for now.

### _How was this patch tested?_
- [x] Add some test cases that check the changes thoroughly including negative and positive cases if possible

- [ ] Add screenshots for manual tests if appropriate

- [ ] [Run test](https://kyuubi.readthedocs.io/en/master/contributing/code/testing.html#running-tests) locally before make a pull request

![Screenshot 2023-10-31 15.30.37](https://github.com/apache/kyuubi/assets/52876270/443feaf5-2d9a-4683-9214-6b7f5b5769cd)

### _Was this patch authored or co-authored using generative AI tooling?_

No

Closes #5574 from zwangsheng/KYUUBI#5573.

Closes #5573

462f9f662 [zwangsheng] fix comments
d32101055 [zwangsheng] [KYUUBI #5573][Improvement] Delete parts of the Kyuubi Web UI that are not useful

Authored-by: zwangsheng <binjieyang@apache.org>
Signed-off-by: Cheng Pan <chengpan@apache.org>