[Bug] When using insert overwrite, a StackOverflowError exception is thrown #5331
Comments
Spark SQL conf: set spark.sql.optimizer.dynamicPartitionPruning.enabled=false;
If I do not set spark.sql.optimizer.dynamicPartitionPruning.enabled to false, the query fails with Error in query: unresolved operator 'Filter dynamicpruning#950'.
If I do set it to false, a java.lang.StackOverflowError is thrown instead.
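For reference, a minimal sketch of toggling that flag for a single Spark SQL CLI session; the only property used is the one named in this report, and the SET syntax is standard Spark SQL:

-- Inspect the current value of the dynamic partition pruning flag
SET spark.sql.optimizer.dynamicPartitionPruning.enabled;
-- Disable it for this session only (the workaround attempted above)
SET spark.sql.optimizer.dynamicPartitionPruning.enabled=false;
-- Restore the default afterwards
SET spark.sql.optimizer.dynamicPartitionPruning.enabled=true;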
Can you try the latest 1.0.1?
Search before asking
Paimon version
Paimon 0.9.0
Compute Engine
Spark 3.3.0
Hive 3.1.2
Minimal reproduce step
For example, a table with two partition columns:
CREATE TABLE IF NOT EXISTS paimon.${dbname}.${tablename}(
column1 string COMMENT 'column1',
column2 string COMMENT 'column2',
column3 string COMMENT 'column3',
column4 string COMMENT 'column4',
p_col_1 string COMMENT 'p_col_1',
p_col_2 string COMMENT 'p_col_2'
) PARTITIONED BY (p_col_1,p_col_2)
COMMENT ''
TBLPROPERTIES (
'primary-key' = 'p_col_1,p_col_2,column1,column2,column3',
'bucket-key' = 'column1,column2,column3',
'bucket' = '1',
'file.format' = 'parquet',
'deletion-vectors.enabled' = 'true',
'metastore.partitioned-table' = 'true',
'scan.mode' = 'compacted-full',
'compaction.optimization-interval' = '3600000',
'target-file-size' = '256mb',
'sink.parallelism' = '20',
'num-sorted-run.stop-trigger' = '2147483647',
'sort-spill-threshold' = '10',
'merge-engine' = 'deduplicate'
);
insert overwrite table paimon.${dbname}.${tablename} partition(p_col_1,p_col_2)
select
t1.column1
,t1.column2
,t1.column3
,t2.column4
,t1.cdate
,t1.p_col_2
from (
select cdate,p_col_2,column1,column2,column3
from hive_db.hive_table1
where cdate between '${start_day}' and '${end_day}'
)t1 join(
select cdate,p_col_2,column1,column2,column3,column4,column5
from hive_db.hive_table2
where cdate between '${start_day}' and '${end_day}'
)t2 on
t1.cdate = t2.cdate and t1.p_col_2 = t2.p_col_2 and t1.column1 = t2.column1 and t1.column2 = t2.column2
and t1.column3 = t2.column3;
hive_db.hive_table1: unique key is (cdate, p_col_2, column1, column2, column3)
hive_db.hive_table2: unique key is (cdate, p_col_2, column1, column2, column3, column4)
The estimated number of partitions written is around 4500.
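As a sanity check, the ~4500 figure can be verified with a count of distinct target partition values over the same date range; this is only a sketch that reuses the placeholder table and column names from the statement above:

-- count of distinct (cdate, p_col_2) combinations in the source date range
select count(*) as partition_count
from (
  select distinct cdate, p_col_2
  from hive_db.hive_table1
  where cdate between '${start_day}' and '${end_day}'
) t;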
What doesn't meet your expectations?
During the final write stage, the job fails with the following error:
java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.StackOverflowError
at org.apache.paimon.spark.commands.PaimonSparkWriter.commit(PaimonSparkWriter.scala:285)
at org.apache.paimon.spark.commands.WriteIntoPaimonTable.run(WriteIntoPaimonTable.scala:64)
at org.apache.paimon.spark.commands.PaimonDynamicPartitionOverwriteCommand.run(PaimonDynamicPartitionOverwriteCommand.scala:69)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:98)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$6(SQLExecution.scala:109)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:169)
at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:95)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:98)
at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:94)
at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:584)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:560)
at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:94)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:81)
at org.apache.spark.sql.execution.QueryExecution.commandExecuted(QueryExecution.scala:79)
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:220)
at org.apache.spark.sql.Dataset$.$anonfun$ofRows$2(Dataset.scala:100)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.Dataset$.ofRows(Dataset.scala:97)
at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:622)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:779)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:617)
at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:651)
at org.apache.spark.sql.hive.thriftserver.SparkSQLDriver.run(SparkSQLDriver.scala:67)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processCmd(SparkSQLCLIDriver.scala:384)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1(SparkSQLCLIDriver.scala:504)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.$anonfun$processLine$1$adapted(SparkSQLCLIDriver.scala:498)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.processLine(SparkSQLCLIDriver.scala:498)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:336)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:207)
at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Anything else?
No response
Are you willing to submit a PR?