Sigmoid function in PrecisionRecallCurve leads to information loss #1526

🐛 Bug

Hello, first of all, thank you for the awesome library! I am a maintainer of the Anomalib library, and we are using TorchMetrics extensively throughout our code base to evaluate our models.

The most recent version of TorchMetrics introduced some changes to the PrecisionRecallCurve metric, which are causing some problems in one of our components. The problems are related to the re-mapping of the prediction values to the [0,1] range by applying a sigmoid function.

Some context

The goal of the models in our library is to detect anomalous samples in a dataset that contains both normal and anomalous samples. The task is similar to a classical binary classification problem, but instead of generating a class label and a confidence score, our models generate an anomaly score, which quantifies the distance of the sample to the distribution of normal samples seen during training. The range of possible anomaly score values is unbounded and may differ widely between models and/or datasets, which makes it tricky to set a good threshold for mapping the raw anomaly scores to a binary class label (normal vs. anomalous). This is why we apply an adaptive thresholding mechanism as a post-processing step. The adaptive threshold mechanism returns the threshold value that maximizes the F1 score over the validation set.

Our adaptive thresholding class inherits from TorchMetrics' PrecisionRecallCurve class. After TorchMetrics computes the precision and recall values, our class computes the F1 scores for the range of precision and recall values, and finally returns the threshold value that corresponds to the highest observed F1 score.
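
In simplified form, that looks roughly like the sketch below (illustrative only, not the exact Anomalib implementation; the class name and the small epsilon guarding against division by zero are placeholders):

import torch
from torchmetrics import PrecisionRecallCurve


class AdaptiveThreshold(PrecisionRecallCurve):
    """Return the threshold that maximizes the F1 score over the accumulated predictions."""

    def compute(self) -> torch.Tensor:
        precision, recall, thresholds = super().compute()
        f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
        # precision and recall have one element more than thresholds, so the last
        # point (which has no associated threshold) is dropped before the lookup
        return thresholds[torch.argmax(f1_score[:-1])]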

The problem

In the latest version of the PrecisionRecallCurve metric, the update method now re-maps the predictions to the [0, 1] range by applying a sigmoid function. As a result, the thresholds variable returned by compute is no longer in the same domain as the original predictions, and the values are not usable for our purpose of finding the optimal threshold value.

In addition, the sigmoid function squeezes the higher and lower values, which leads to lower resolution at the extremes of the input range, and in some cases even information loss.

To Reproduce

Here's an example to illustrate the problem. Let's say we have a set of binary targets and a set of model predictions in the range [12, 17]. Previously, the PrecisionRecallCurve metric would return the values of precision and recall for the different thresholds that occur naturally in the data.

v0.10.3

>>> from torchmetrics import PrecisionRecallCurve
>>> from torch import Tensor
>>>
>>> targets = Tensor([0, 0, 1, 0, 1, 1]).int()
>>> predictions = Tensor([12, 13, 14, 15, 16, 17])
>>>
>>> metric = PrecisionRecallCurve()
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()
>>>
>>> precision
tensor([0.7500, 0.6667, 1.0000, 1.0000, 1.0000])
>>> recall
tensor([1.0000, 0.6667, 0.6667, 0.3333, 0.0000])
>>> thresholds
tensor([14., 15., 16., 17.])

Given these outputs, it is straightforward to obtain the F1 scores for the different threshold values and pick the threshold that maximizes F1, as sketched below.
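
Continuing the v0.10.3 session above, that step boils down to a few lines (a sketch; the small epsilon is only there to avoid division by zero when precision and recall are both 0):

# continuing the REPL session above
f1_score = (2 * precision * recall) / (precision + recall + 1e-10)
# drop the last precision/recall point, which has no associated threshold
optimal_threshold = thresholds[f1_score[:-1].argmax()]
print(optimal_threshold)  # tensor(14.) -- the threshold with the highest F1 score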

After the recent changes, the predictions are now re-mapped by the sigmoid function. While we can still compute the F1 scores, we can no longer find the value of the threshold that yields the highest F1 score, because the values of the thresholds variable are no longer in the same domain as the original predictions.

v0.11.1

>>> from torchmetrics import PrecisionRecallCurve
>>> from torch import Tensor
>>>
>>> targets = Tensor([0, 0, 1, 0, 1, 1]).int()
>>> predictions = Tensor([12, 13, 14, 15, 16, 17])
>>>
>>> metric = PrecisionRecallCurve(task="binary")
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()
>>>
>>> precision
tensor([0.7500, 0.6667, 1.0000, 1.0000, 1.0000])
>>> recall
tensor([1.0000, 0.6667, 0.6667, 0.3333, 0.0000])
>>> thresholds
tensor([1.0000, 1.0000, 1.0000, 1.0000])

Note that the elements of the thresholds variable all print as 1.0000: the squeezing effect of the sigmoid pushes the threshold candidates so close together that their differences fall below the default print precision.

It gets even worse when we increase the absolute values of the predictions to [22, 27]. The output of the sigmoid now evaluates to exactly 1.0 for all predictions due to floating-point rounding, and the metric is no longer able to compute any meaningful precision and recall values.

v0.11.1

>>> from torchmetrics import PrecisionRecallCurve
>>> from torch import Tensor
>>>
>>> targets = Tensor([0, 0, 1, 0, 1, 1]).int()
>>> predictions = Tensor([22, 23, 24, 25, 26, 27])
>>>
>>> metric = PrecisionRecallCurve(task="binary")
>>> metric.update(predictions, targets)
>>> precision, recall, thresholds = metric.compute()
>>>
>>> precision
tensor([0.5000, 1.0000])
>>> recall
tensor([1., 0.])
>>> thresholds
tensor(1.)
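
For reference, the loss of resolution can be reproduced with torch.sigmoid alone; the snippet below is only an illustration of the float32 behaviour:

import torch

scores_low = torch.tensor([12., 13., 14., 15., 16., 17.])
scores_high = torch.tensor([22., 23., 24., 25., 26., 27.])

# The first set survives the sigmoid as distinct float32 values; they merely
# print identically at the default four-decimal precision.
print(torch.sigmoid(scores_low))                    # shows 1.0000 everywhere
print(torch.sigmoid(scores_low).unique().numel())   # still more than one distinct value

# The second set saturates to exactly 1.0 in float32, so every threshold
# candidate collapses into a single value and the ordering of the scores is lost.
print(torch.sigmoid(scores_high).unique())          # a single value: exactly 1.0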

I guess this change was made to accommodate classical binary classification problems, where the predictions are generally confidence scores in the [0, 1] range, but I feel this is too restrictive for other problem classes. Mathematically, the precision-recall curve depends only on the ordering of the predictions, so there is no reason it cannot be computed from predictions that fall outside of this range.

Expected behavior

The re-mapping of the prediction values to [0,1] by applying a sigmoid function should be optional.

Environment

  • TorchMetrics 0.11.1 (pip)
  • Python 3.8
  • PyTorch 1.13.1
