[Feature] Increase model coverage of the BERT Score metric by adding torchmetrics implementation #332

@achad4

Description

The current underlying implementation of BERT score supports a limited set of transformer models, and FMEval further truncates this list to microsoft/deberta-xlarge-mnli and roberta-large-mnli.

Torchmetrics provides a more generic BERT score implementation. The specific request here is to not limit what transformer models users can configure. The behavior should be as follows:

  • If microsoft/deberta-xlarge-mnli or roberta-large-mnli is specified, use the bert-score implementation to avoid regressions for existing customers.
  • Otherwise, use the torchmetrics implementation of BERT score.
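The dispatch behavior above could be sketched roughly as follows. This is a minimal illustration, not the actual FMEval code: `compute_bert_score`, `uses_native_bert_score`, and `BERT_SCORE_NATIVE_MODELS` are hypothetical names, and the `bert-score` and torchmetrics calls follow those packages' public APIs.

```python
from typing import List

# Models kept on the original bert-score backend to avoid
# regressions for existing customers (hypothetical constant name).
BERT_SCORE_NATIVE_MODELS = {"microsoft/deberta-xlarge-mnli", "roberta-large-mnli"}


def uses_native_bert_score(model_name: str) -> bool:
    """Return True if the model should route to the bert-score package."""
    return model_name in BERT_SCORE_NATIVE_MODELS


def compute_bert_score(
    predictions: List[str], references: List[str], model_name: str
) -> List[float]:
    """Compute BERT score F1 values, dispatching on the model name."""
    if uses_native_bert_score(model_name):
        # Existing path: the bert-score package (imported lazily).
        from bert_score import BERTScorer

        scorer = BERTScorer(model_type=model_name)
        _, _, f1 = scorer.score(predictions, references)
        return f1.tolist()
    # New path: torchmetrics accepts any Hugging Face model id or
    # local path, including customer fine-tuned checkpoints.
    from torchmetrics.text.bert import BERTScore

    metric = BERTScore(model_name_or_path=model_name)
    result = metric(predictions, references)
    return [float(x) for x in result["f1"]]
```

With this routing, existing configurations are untouched, while any other Hugging Face model id (or a local path to a fine-tuned checkpoint) would flow through torchmetrics.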

Use cases for a broader set of models underlying BERT score:

  1. Monolingual BERT models have been shown to outperform multilingual BERT models on certain tasks. https://aclanthology.org/2021.acl-long.243.pdf
  2. Customers may fine-tune their own transformers, which can be downloaded into the container running AWSFMeval and passed to the torchmetrics BERT score implementation.
