[SUPPORT] Support for Writing timestamp_ntz Fields from PySpark to Excel #987

@nothing-go-reade

Description

Am I using the newest version of the library?

  • I have made sure that I'm using the latest version of the library.

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

I'm encountering an issue when writing timestamp_ntz columns from a PySpark DataFrame to an Excel file. The write fails with the following error:

    25/08/01 10:45:06 WARN task-result-getter-1 TaskSetManager: Lost task 0.0 in stage 8.0 (TID 17) (10.255.0.124 executor 1): java.lang.RuntimeException: Unsupported type: timestamp_ntz
        at com.crealytics.spark.excel.v2.ExcelGenerator.makeConverter(ExcelGenerator.scala:145)
        at com.crealytics.spark.excel.v2.ExcelGenerator.$anonfun$valueConverters$2(ExcelGenerator.scala:60)
        at scala.collection.immutable.List.map(List.scala:297)
        at com.crealytics.spark.excel.v2.ExcelGenerator.<init>(ExcelGenerator.scala:60)
        at com.crealytics.spark.excel.v2.ExcelOutputWriter.<init>(ExcelOutputWriter.scala:29)
        at com.crealytics.spark.excel.v2.ExcelFileFormat$$anon$1.newInstance(ExcelFileFormat.scala:50)
        at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:215)
        at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:200)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:434)
        at org.apache.spark.sql.execution.datasources.WriteFilesExec.$anonfun$doExecuteWrite$1(WriteFiles.scala:100)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:94)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:145)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:619)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:622)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)

  1. Create a PySpark DataFrame that includes a timestamp_ntz field.
  2. Attempt to write this DataFrame to an Excel file using the com.crealytics.spark.excel library.
  3. The error occurs during the write operation.

Expected Behavior

The timestamp_ntz field should be supported for writing to Excel files. Ideally, the library should either:

  • Automatically handle the timestamp_ntz type, or
  • Allow users to explicitly define how to handle timestamp_ntz fields before writing to Excel.
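Until one of those options exists, a possible workaround is to cast the timestamp_ntz columns to plain timestamp before writing, since Excel cells carry no timezone information anyway. A minimal sketch, assuming the cast is acceptable for the data; ntz_cast_exprs is a hypothetical helper written for this issue, not part of spark-excel:

```python
def ntz_cast_exprs(dtypes):
    """Build selectExpr() strings that cast timestamp_ntz columns to
    plain timestamp and pass every other column through unchanged.
    `dtypes` is the list of (name, type) pairs returned by df.dtypes."""
    return [
        f"CAST(`{name}` AS TIMESTAMP) AS `{name}`"
        if dtype == "timestamp_ntz" else f"`{name}`"
        for name, dtype in dtypes
    ]

# Intended usage on a PySpark DataFrame (requires the spark-excel jar
# on the cluster; path and options are illustrative):
#   df.selectExpr(*ntz_cast_exprs(df.dtypes)) \
#     .write.format("com.crealytics.spark.excel") \
#     .option("header", "true") \
#     .save("/tmp/out.xlsx")
```

The helper only builds SQL expression strings, so it can be unit-tested without a running Spark session.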

Steps To Reproduce

No response

Environment

- Spark version: 3.5.1
- Spark-Excel version: spark-excel_2.12-3.5.1_0.20.4.jar
- OS: macOS
- Cluster environment: driver 2CU, executors 8CU

Anything else?

I think this type could be supported by the library.
