[FEA] Support pyspark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator} #64

@viadea

Description

I wish we could support pyspark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator}.

Take the example from https://stackoverflow.com/questions/60772315/how-to-evaluate-a-classifier-with-pyspark-2-4-5 :

from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator

# Create both evaluators
evaluatorMulti = MulticlassClassificationEvaluator(labelCol="target", predictionCol="prediction")
evaluator = BinaryClassificationEvaluator(labelCol="target", rawPredictionCol="prediction", metricName='areaUnderROC')

# Make predictions
predictionAndTarget = model.transform(df).select("target", "prediction")

# Get metrics
acc = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "accuracy"})
f1 = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "f1"})
weightedPrecision = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "weightedPrecision"})
weightedRecall = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "weightedRecall"})
auc = evaluator.evaluate(predictionAndTarget)

It seems these evaluators use RDD APIs under the hood, so they generate lots of unsupported-operator messages, such as:

! <DeserializeToObjectExec> cannot run on GPU because not all expressions can be replaced; GPU does not currently support the operator class org.apache.spark.sql.execution.DeserializeToObjectExec
  ! <CreateExternalRow> createexternalrow(prediction#327, label#322, 1.0#400, newInstance(class org.apache.spark.ml.linalg.VectorUDT).deserialize, StructField(prediction,DoubleType,true), StructField(label,DoubleType,true), StructField(1.0,DoubleType,false), StructField(probability,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
    @Expression <AttributeReference> prediction#327 could run on GPU
    @Expression <AttributeReference> label#322 could run on GPU
    @Expression <AttributeReference> 1.0#400 could run on GPU
    ! <Invoke> newInstance(class org.apache.spark.ml.linalg.VectorUDT).deserialize cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
      ! <NewInstance> newInstance(class org.apache.spark.ml.linalg.VectorUDT) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.NewInstance
      !Expression <AttributeReference> probability#326 cannot run on GPU because expression AttributeReference probability#326 produces an unsupported type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
  !Expression <AttributeReference> obj#406 cannot run on GPU because expression AttributeReference obj#406 produces an unsupported type ObjectType(interface org.apache.spark.sql.Row)
  !Exec <ProjectExec> cannot run on GPU because unsupported data types in input: org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability#326]; not all expressions can be replaced; unsupported data types in output: org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability#326]
    @Expression <AttributeReference> prediction#327 could run on GPU
    @Expression <AttributeReference> label#322 could run on GPU
    @Expression <Alias> 1.0 AS 1.0#400 could run on GPU
      @Expression <Literal> 1.0 could run on GPU
    !Expression <AttributeReference> probability#326 cannot run on GPU because expression AttributeReference probability#326 produces an unsupported type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
    !Exec <FileSourceScanExec> cannot run on GPU because unsupported data types org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability] in read for Parquet; unsupported data types in output: org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability#326]
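
To see at a glance which operator classes force the fallback, messages like the ones above could be tallied with a small script. This is only a sketch: the function name `unsupported_operators` and the regular expression are my own, not part of any plugin, and it assumes the message wording shown in the log above, which may differ between versions. The sample lines are copied from that log.

```python
import re
from collections import Counter

# Assumption: unsupported operators are reported with this exact phrase,
# followed by the fully qualified class name, as in the log above.
OPERATOR_RE = re.compile(r"does not currently support the operator class ([\w.]+)")


def unsupported_operators(log_text):
    """Count operator classes flagged as unsupported in an explain log."""
    counts = Counter()
    for line in log_text.splitlines():
        # Lines starting with "!" mark something that cannot run on the GPU;
        # lines starting with "@" mark something that could.
        if not line.lstrip().startswith("!"):
            continue
        match = OPERATOR_RE.search(line)
        if match:
            # Keep only the short class name, e.g. "Invoke".
            counts[match.group(1).rsplit(".", 1)[-1]] += 1
    return counts


SAMPLE_LOG = """\
! <DeserializeToObjectExec> cannot run on GPU because not all expressions can be replaced; GPU does not currently support the operator class org.apache.spark.sql.execution.DeserializeToObjectExec
    @Expression <AttributeReference> prediction#327 could run on GPU
    ! <Invoke> newInstance(class org.apache.spark.ml.linalg.VectorUDT).deserialize cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
      ! <NewInstance> newInstance(class org.apache.spark.ml.linalg.VectorUDT) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.NewInstance
"""

if __name__ == "__main__":
    for name, count in unsupported_operators(SAMPLE_LOG).items():
        print(f"{name}: {count}")
```

Lines that only report an unsupported data type (such as the VectorUDT column `probability`) do not contain the operator-class phrase, so this sketch skips them; a second pattern would be needed to tally those.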
