Am I using the newest version of the library?
Is there an existing issue for this?
Current Behavior
Description:
I'm encountering an issue when writing timestamp_ntz columns from a PySpark DataFrame to an Excel file. The write fails with the following error:
25/08/01 10:45:06 WARN task-result-getter-1 TaskSetManager: Lost task 0.0 in stage 8.0 (TID 17) (10.255.0.124 executor 1): java.lang.RuntimeException: Unsupported type: timestamp_ntz
    at com.crealytics.spark.excel.v2.ExcelGenerator.makeConverter(ExcelGenerator.scala:145)
    at com.crealytics.spark.excel.v2.ExcelGenerator.$anonfun$valueConverters$2(ExcelGenerator.scala:60)
    at scala.collection.immutable.List.map(List.scala:297)
    at com.crealytics.spark.excel.v2.ExcelGenerator.<init>(ExcelGenerator.scala:60)
    at com.crealytics.spark.excel.v2.ExcelOutputWriter.<init>(ExcelOutputWriter.scala:29)
    at com.crealytics.spark.excel.v2.ExcelFileFormat$$anon$1.newInstance(ExcelFileFormat.scala:50)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:215)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:200)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:434)
    at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
    at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:94)
    at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
    at org.apache.spark.scheduler.Task.run(Task.scala:145)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:619)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
    at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:622)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)
- Create a PySpark DataFrame that includes a timestamp_ntz field.
- Attempt to write this DataFrame to an Excel file using the com.crealytics.spark.excel library.
- The error occurs during the write operation.
Expected Behavior
The timestamp_ntz field should be supported for writing to Excel files. Ideally, the library should either:
- Automatically handle the timestamp_ntz type, or
- Allow users to explicitly define how to handle timestamp_ntz fields before writing to Excel.
Steps To Reproduce
See the steps listed under Current Behavior.
Environment
- Spark version: 3.5.1
- Spark-Excel version: spark-excel_2.12-3.5.1_0.20.4.jar
- OS: Mac OS
- Cluster environment: driver 2 CU, executors 8 CU
Anything else?
I think support for the timestamp_ntz type could reasonably be added to the writer.