[BUG] Filters on partition columns not taking effect | Spark 3.5.0 | com.crealytics:spark-excel_2.12:3.5.0_0.20.2/3 and 3.5.1_0.20.4 #907

@minnieshi

Description

Am I using the newest version of the library?

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

A filter on a partition column (a column derived from the partition folder name) does not take effect with the following version combinations:

spark-excel_2.12-3.5.0_0.20.2 + Spark 3.5.0
spark-excel_2.12-3.5.0_0.20.3 + Spark 3.5.0
spark-excel_2.12-3.5.1_0.20.4 + Spark 3.5.0
(I did not list 3.5.0_0.20.1 here because it has a different problem, the same packaging error that older versions had:
SparkClassNotFoundException: [DATA_SOURCE_NOT_FOUND] Failed to find the data source: excel
)

image

Spark 3.5 here means Databricks Runtime 15.4.

image

The spark-excel library

image

databricks notebook (scala) filter code:

image

Expected Behavior

DataFrame filters on partition columns should take effect.
P.S. The following version combinations work:
spark-excel_2.12-3.2.4_0.20.4 + Spark 3.3.2
spark-excel_2.12-3.2.2_0.18.5 + Spark 3.3.2

image
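
As a point of comparison, the expected behavior can be sketched with Spark's built-in CSV reader, which discovers `execution_date=...` folders as a partition column and applies filters on it. This is only a baseline sketch, not the spark-excel code path: the scratch directory, the sample rows, and the local session are assumptions made for illustration, and on Databricks the path would need to point at DBFS instead.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, lit}

// Assumption: a local session; getOrCreate() reuses an existing one if present.
val spark = SparkSession.builder()
  .appName("partition-filter-baseline")
  .master("local[*]")
  .getOrCreate()

// Hypothetical scratch location; the "data" subfolder must not exist yet.
val basePath = Files.createTempDirectory("partition_filter_demo").resolve("data").toString

// Write two rows into execution_date=20231218/ and execution_date=20231219/ folders.
spark.createDataFrame(Seq(("a", 20231218), ("b", 20231219)))
  .toDF("value", "execution_date")
  .write
  .partitionBy("execution_date")
  .option("header", "true")
  .csv(basePath)

// Read the folder back; execution_date is recovered from the folder names.
val baseline = spark.read.option("header", "true").csv(basePath)

// Same filter shape as in this report, applied to the partition column.
val dates = baseline
  .where(col("execution_date") === lit(20231218))
  .select("execution_date")
  .distinct
  .collect()
  .map(_.get(0).toString)

println(dates.mkString(", "))
```

With a built-in source, only the matching partition value should remain after the filter. The report is that the same filter applied to a DataFrame loaded with `.format("excel")` still returns the other partition values.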

Steps To Reproduce

See the notebook screenshot.

val df = spark.read
  .format("excel") // for V2 implementation
  .option("dataAddress", "0!A3") // Optional, default: "A1"
  .option("header", "true") // Required
  .option("inferSchema", "true") // Optional, default: false
  .option("treatEmptyValuesAsNulls", "true")
  .load(excelPath)

I also tried filtering with an integer literal:

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.functions._
display(df.where(col("execution_date") === lit(20231218)).select("execution_date").distinct)

The filter did not take effect:
image
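
To help narrow this down, one possible diagnostic is to print the inferred schema and the query plans for the filtered DataFrame, which shows the type of `execution_date` and whether the predicate appears in the scan or as a separate filter step. The helper below is a hypothetical sketch (the name `inspectFilter` is not part of spark-excel or Spark); it relies only on the standard `printSchema` and `explain` methods.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, lit}

// Hypothetical helper: applies an equality filter on `column`, prints the
// schema and plans, and returns how many distinct values survive the filter.
def inspectFilter(df: DataFrame, column: String, value: Int): Long = {
  df.printSchema()               // shows the inferred type of the partition column
  val filtered = df.where(col(column) === lit(value))
  filtered.explain(true)         // prints parsed, analyzed, optimized and physical plans
  filtered.select(column).distinct.count()
}
```

For the DataFrame in this report it could be called as `inspectFilter(df, "execution_date", 20231218)`. A result greater than 1 would confirm that the filter is not being applied.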

Environment

- Spark version:
- Spark-Excel version:
- OS:
- Cluster environment

Anything else?

No response
