Open
Description
I would like us to support pyspark.ml.evaluation.{BinaryClassificationEvaluator, MulticlassClassificationEvaluator} on the GPU.
Take the example from https://stackoverflow.com/questions/60772315/how-to-evaluate-a-classifier-with-pyspark-2-4-5 :
from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator
# Create both evaluators
evaluatorMulti = MulticlassClassificationEvaluator(labelCol="target", predictionCol="prediction")
evaluator = BinaryClassificationEvaluator(labelCol="target", rawPredictionCol="prediction", metricName='areaUnderROC')
# Make predictions
predictionAndTarget = model.transform(df).select("target", "prediction")
# Get metrics
acc = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "accuracy"})
f1 = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "f1"})
weightedPrecision = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "weightedPrecision"})
weightedRecall = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "weightedRecall"})
auc = evaluator.evaluate(predictionAndTarget)
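For reference, here is a minimal pure-Python sketch of what the four multiclass metrics requested above compute from (target, prediction) pairs. It is based on my understanding of the usual definitions (per-label precision, recall and F1, weighted by how often each label occurs in the targets). It is not the Spark implementation, and the function name `multiclass_metrics` is hypothetical.

```python
from collections import Counter


def multiclass_metrics(pairs):
    """Compute accuracy and label-weighted precision/recall/F1.

    `pairs` is an iterable of (target, prediction) tuples.
    Hypothetical helper for illustration; not part of pyspark.
    """
    pairs = list(pairs)
    total = len(pairs)
    if total == 0:
        raise ValueError("need at least one (target, prediction) pair")

    label_counts = Counter(target for target, _ in pairs)      # true occurrences per label
    pred_counts = Counter(pred for _, pred in pairs)            # predicted occurrences per label
    true_pos = Counter(t for t, p in pairs if t == p)           # correct predictions per label

    weighted_precision = 0.0
    weighted_recall = 0.0
    weighted_f1 = 0.0
    for label, count in label_counts.items():
        tp = true_pos[label]
        # Precision is taken as 0 when the label was never predicted.
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

        weight = count / total  # weight each label by its share of the targets
        weighted_precision += weight * precision
        weighted_recall += weight * recall
        weighted_f1 += weight * f1

    return {
        "accuracy": sum(true_pos.values()) / total,
        "weightedPrecision": weighted_precision,
        "weightedRecall": weighted_recall,
        "f1": weighted_f1,
    }
```

All four values reduce to counts grouped by target and prediction, which suggests they could in principle be expressed as ordinary aggregations rather than row-by-row object deserialization.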
It seems these evaluators use RDD APIs, which generate a lot of "cannot run on GPU" messages.
Such as:
! <DeserializeToObjectExec> cannot run on GPU because not all expressions can be replaced; GPU does not currently support the operator class org.apache.spark.sql.execution.DeserializeToObjectExec
! <CreateExternalRow> createexternalrow(prediction#327, label#322, 1.0#400, newInstance(class org.apache.spark.ml.linalg.VectorUDT).deserialize, StructField(prediction,DoubleType,true), StructField(label,DoubleType,true), StructField(1.0,DoubleType,false), StructField(probability,org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7,true)) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.CreateExternalRow
@Expression <AttributeReference> prediction#327 could run on GPU
@Expression <AttributeReference> label#322 could run on GPU
@Expression <AttributeReference> 1.0#400 could run on GPU
! <Invoke> newInstance(class org.apache.spark.ml.linalg.VectorUDT).deserialize cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.Invoke
! <NewInstance> newInstance(class org.apache.spark.ml.linalg.VectorUDT) cannot run on GPU because GPU does not currently support the operator class org.apache.spark.sql.catalyst.expressions.objects.NewInstance
!Expression <AttributeReference> probability#326 cannot run on GPU because expression AttributeReference probability#326 produces an unsupported type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
!Expression <AttributeReference> obj#406 cannot run on GPU because expression AttributeReference obj#406 produces an unsupported type ObjectType(interface org.apache.spark.sql.Row)
!Exec <ProjectExec> cannot run on GPU because unsupported data types in input: org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability#326]; not all expressions can be replaced; unsupported data types in output: org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability#326]
@Expression <AttributeReference> prediction#327 could run on GPU
@Expression <AttributeReference> label#322 could run on GPU
@Expression <Alias> 1.0 AS 1.0#400 could run on GPU
@Expression <Literal> 1.0 could run on GPU
!Expression <AttributeReference> probability#326 cannot run on GPU because expression AttributeReference probability#326 produces an unsupported type org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7
!Exec <FileSourceScanExec> cannot run on GPU because unsupported data types org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability] in read for Parquet; unsupported data types in output: org.apache.spark.ml.linalg.VectorUDT@3bfc3ba7 [probability#326]
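To make a log like this easier to triage, the following small sketch tallies the "cannot run on GPU" lines by operator or expression name. It assumes the line shapes shown above (a leading `!`, an optional `Exec` or `Expression` tag, then the name in angle brackets). The helper name `count_unsupported` is hypothetical, and the pattern may need adjusting for other log formats.

```python
import re
from collections import Counter

# Matches fallback lines such as
#   "!Exec <ProjectExec> cannot run on GPU ..."
#   "! <Invoke> ... cannot run on GPU ..."
# and captures the name inside the angle brackets.
_UNSUPPORTED = re.compile(r"^\s*!\s*(?:Exec|Expression)?\s*<(\w+)>")


def count_unsupported(log_text):
    """Count 'cannot run on GPU' lines per operator/expression name.

    Hypothetical helper for illustration; assumes the log format shown above.
    """
    counts = Counter()
    for line in log_text.splitlines():
        match = _UNSUPPORTED.match(line)
        if match and "cannot run on GPU" in line:
            counts[match.group(1)] += 1
    return counts
```

Lines beginning with `@` (things that could run on the GPU) are skipped, so the result lists only the names that forced a fallback.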