Repository for "Uncertainty-Aware Machine Translation Evaluation", accepted to Findings of EMNLP 2021.

This repository presents UA-COMET, an extension of Unbabel's COMET metric.

It contains the code and data to reproduce the experiments in Uncertainty-Aware Machine Translation Evaluation.

Quick Installation

We recommend Python 3.6 to run COMET.

Detailed usage examples and instructions can be found in the Full Documentation.

To develop locally:

git clone https://github.com/deep-spin/UA_COMET.git
cd UA_COMET
pip install -r requirements.txt
pip install -e .

Scoring MT outputs:

Via Bash:

Examples from WMT20:

echo -e "Dem Feuer konnte Einhalt geboten werden\nSchulen und Kindergärten wurden eröffnet." >> src.de
echo -e "The fire could be stopped\nSchools and kindergartens were open" >> hyp.en
echo -e "They were able to control the fire.\nSchools and kindergartens opened" >> ref.en
comet score -s src.de -h hyp.en -r ref.en

You can export your results to a JSON file using the --to_json flag and select another model/metric with --model.

comet score -s src.de -h hyp.en -r ref.en --model wmt-large-hter-estimator --to_json segments.json
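The exact layout of the exported file depends on the model and flags used, so a quick way to see what a run produced is simply to load the file back. A minimal sketch, assuming segments.json is the file written by the command above:

import json

# Load the exported predictions (the structure depends on the model/flags).
with open("segments.json") as f:
    predictions = json.load(f)

# Print a small sample to inspect what the export contains.
print(json.dumps(predictions, indent=2)[:500])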

Via Python:

from comet.models import download_model

# Download (and cache) the pretrained estimator, then load it.
model = download_model("wmt-large-da-estimator-1719")

# Each sample needs a source ("src"), an MT hypothesis ("mt"),
# and a reference ("ref").
data = [
    {
        "src": "Dem Feuer konnte Einhalt geboten werden",
        "mt": "The fire could be stopped",
        "ref": "They were able to control the fire."
    },
    {
        "src": "Schulen und Kindergärten wurden eröffnet.",
        "mt": "Schools and kindergartens were open",
        "ref": "Schools and kindergartens opened"
    }
]

# Score on GPU, showing a progress bar.
model.predict(data, cuda=True, show_progress=True)

Scoring MT outputs with MCD runs

To run COMET with multiple Monte Carlo dropout (MCD) runs:

#!/bin/bash

GPU_N=3                     # GPU index to use

SCORES=/path/to/your/output/folder
DATA=/path/to/your/data/folder

N=100                       # number of MCD (dropout) runs
D=0.1                       # dropout probability
N_REFS=1                    # number of references per segment

SRC=src.txt
MT=mt.txt
REF=ref.txt

MODEL=wmt-large-da-estimator-1719

echo Starting the process...

CUDA_VISIBLE_DEVICES=$GPU_N comet score \
  -s $DATA/sources/$SRC \
  -h $DATA/system-outputs/$MT \
  -r $DATA/references/$REF \
  --to_json $SCORES/filename.json \
  --n_refs $N_REFS \
  --n_dp_runs $N \
  --d_enc $D \
  --d_pool $D \
  --d_ff1 $D \
  --d_ff2 $D \
  --model $MODEL 

This runs the model with the hyperparameters defined above. The main scoring arguments are:

-s: Source segments.
-h: MT outputs.
-r: Reference segments.
--to_json: Creates and exports model predictions to a JSON file.
--n_refs: Number of references used during inference. [default=1]
--n_dp_runs: Number of dropout runs at test time. [default=30]
--d_enc: Dropout value for the encoder. [default=0.1]
--d_pool: Dropout value for the layerwise pooling layer. [default=0.1]
--d_ff1: Dropout value for the 1st feed forward layer. [default=0.1]
--d_ff2: Dropout value for the 2nd feed forward layer. [default=0.1]
--model: Name of the pretrained model OR path to a model checkpoint.

To learn more about the remaining parameters and their default values, take a look at the comet/cli.py file.
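Each dropout run yields a slightly different score for the same segment, and the spread across the N runs is the uncertainty signal that uncertainty-aware evaluation works with. As a rough illustration only (the nested-list layout and the numbers below are invented for this sketch, not this repository's actual output format), aggregating per-segment scores across runs might look like:

import statistics

# Hypothetical per-run scores: N=5 MCD runs over 2 segments
# (illustrative numbers; real scores come from `comet score ... --n_dp_runs N`).
run_scores = [
    [0.62, 0.41],
    [0.58, 0.44],
    [0.65, 0.39],
    [0.60, 0.47],
    [0.63, 0.42],
]

# Transpose so each segment gets its own list of N scores,
# then report the mean (point estimate) and spread (uncertainty).
for i, scores in enumerate(zip(*run_scores)):
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)
    print(f"segment {i}: mean={mean:.3f} +/- {std:.3f}")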

How to Reproduce and Evaluate Experiments

The evaluation sub-folder contains the scripts and data necessary to reproduce the experiments presented in Uncertainty-Aware Machine Translation Evaluation and/or test new model outputs. See the README in that folder for more detailed instructions.

Model Zoo:

The COMET models used for uncertainty-aware MT evaluation experiments are:

  • wmt-large-da-estimator-1719 for the WMT20 dataset (DA/MQM scores)
  • wmt-large-hter-estimator for the QT21 dataset (HTER scores)

Available and compatible models are:

  • wmt-large-da-estimator-1719 (RECOMMENDED): Estimator model built on top of XLM-R (large), trained on DA from WMT17, WMT18 and WMT19.
  • wmt-base-da-estimator-1719: Estimator model built on top of XLM-R (base), trained on DA from WMT17, WMT18 and WMT19.
  • wmt-large-hter-estimator: Estimator model built on top of XLM-R (large), trained to regress on HTER.
  • wmt-base-hter-estimator: Estimator model built on top of XLM-R (base), trained to regress on HTER.
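For example, loading the HTER estimator used in the QT21 experiments mirrors the earlier Python scoring example:

from comet.models import download_model

# Fetches and caches the checkpoint on first use, then loads the model.
model = download_model("wmt-large-hter-estimator")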

Train your own Metric:

Instead of using pretrained models, you can train your own COMET model with the following command:

comet train -f {config_file_path}.yaml

For more information, check COMET's documentation.

Alternatively, it is possible to train a different metric and compare performance using the scripts in the evaluation sub-folder. In this case, ensure the metric output files maintain the same structure as described in evaluation/data/README.md.

Publications

@inproceedings{rei-etal-2020-comet,
    title = "{COMET}: A Neural Framework for {MT} Evaluation",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/2020.emnlp-main.213",
    pages = "2685--2702",
}
@inproceedings{rei-EtAl:2020:WMT,
    title = "Unbabel's Participation in the {WMT}20 Metrics Shared Task",
    author = "Rei, Ricardo  and
      Stewart, Craig  and
      Farinha, Ana C  and
      Lavie, Alon",
    booktitle = "Proceedings of the Fifth Conference on Machine Translation",
    month = nov,
    year = "2020",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    pages = "909--918",
}
@inproceedings{stewart-etal-2020-comet,
    title = "{COMET} - Deploying a New State-of-the-art {MT} Evaluation Metric in Production",
    author = "Stewart, Craig  and
      Rei, Ricardo  and
      Farinha, Catarina  and
      Lavie, Alon",
    booktitle = "Proceedings of the 14th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)",
    month = oct,
    year = "2020",
    address = "Virtual",
    publisher = "Association for Machine Translation in the Americas",
    url = "https://www.aclweb.org/anthology/2020.amta-user.4",
    pages = "78--109",
}