
The NLG results can be reproduced, but not the CE metric results #9

Open
mengweiwang opened this issue Apr 1, 2023 · 3 comments


@mengweiwang

I used the same data format (findings section only) as R2Gen and R2GenCMN (Chen et al.), as followed in this work, but I was unable to obtain the CE metric results reported in the paper.

I used the provided epoch=8-val_chen_cider=0.425092.ckpt checkpoint for the cvt_21_to_distilgpt2 task and also tested the epoch=0-val_chen_cider=0.410965.ckpt checkpoint for the cvt_21_to_distilgpt2_scst task, but neither achieved the CE metric results reported in the paper.
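For context, this is roughly how I evaluate a released checkpoint; a minimal, self-contained sketch assuming the PyTorch Lightning setup that the checkpoint names suggest. ToyModule and the random data are placeholders, not the repository's model or dataset.

```python
# Minimal sketch of evaluating a released .ckpt with PyTorch Lightning.
# ToyModule and the random tensors are hypothetical placeholders; the real
# repository defines its own LightningModule and data pipeline.
import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def test_step(self, batch, batch_idx):
        x, y = batch
        # Log a dummy test metric, analogous to the CE/NLG scores below.
        self.log("test_loss", torch.nn.functional.mse_loss(self.layer(x), y))

model = ToyModule()
# With a real checkpoint one would instead do:
# model = ToyModule.load_from_checkpoint("epoch=8-val_chen_cider=0.425092.ckpt")
loader = DataLoader(TensorDataset(torch.randn(8, 4), torch.randn(8, 1)), batch_size=4)
pl.Trainer(logger=False, enable_checkpointing=False).test(model, dataloaders=loader)
```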

Among the CE metrics, precision_macro reaches the value reported in the paper, but recall_macro and f1_macro do not, and the gap is significant.

When calculating the CE metrics here, only the findings text is considered; do I need to perform any other processing?
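For reference, this is how I understand the CE computation; a minimal sketch (not this repository's code), assuming CheXbert-style binary observation labels, with random placeholder matrices standing in for the labeller output on reference and generated findings:

```python
# Minimal sketch of CE metrics for chest X-ray report generation: a labeller
# such as CheXbert extracts 14 binary observation labels per report, and
# precision/recall/F1 are averaged over classes (macro), pooled over all
# decisions (micro), or averaged per report ("example", sklearn's "samples").
# The label matrices are random placeholders, not real labeller output.
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=(3858, 14))  # labels from reference reports
y_pred = rng.integers(0, 2, size=(3858, 14))  # labels from generated reports

for average in ("macro", "micro", "samples"):
    p, r, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average=average, zero_division=0
    )
    print(f"{average}: precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```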

The results obtained from the cvt_21_to_distilgpt2 task are as follows:
{'test_ce_f1_example': 0.36598095297813416,
'test_ce_f1_macro': 0.2593880891799927,
'test_ce_f1_micro': 0.4408090114593506,
'test_ce_num_examples': 3858.0,
'test_ce_precision_example': 0.4171517491340637,
'test_ce_precision_macro': 0.3600466549396515,
'test_ce_precision_micro': 0.4919118881225586,
'test_ce_recall_example': 0.3665845990180969,
'test_ce_recall_macro': 0.25423887372016907,
'test_ce_recall_micro': 0.3993246555328369,
'test_chen_bleu_1': 0.39292487502098083,
'test_chen_bleu_2': 0.24805393815040588,
'test_chen_bleu_3': 0.17164887487888336,
'test_chen_bleu_4': 0.1269991397857666,
'test_chen_cider': 0.3902686834335327,
'test_chen_meteor': 0.15456412732601166,
'test_chen_num_examples': 3858.0,
'test_chen_rouge': 0.286588191986084}

The results in the paper are as follows:
precision_macro: 0.3597
recall_macro: 0.4122
f1_macro: 0.3842

The results obtained from the cvt_21_to_distilgpt2_scst task are as follows:
{'test_ce_f1_example': 0.36484676599502563,
'test_ce_f1_macro': 0.26361414790153503,
'test_ce_f1_micro': 0.4410783648490906,
'test_ce_num_examples': 3858.0,
'test_ce_precision_example': 0.4175392985343933,
'test_ce_precision_macro': 0.3873042166233063,
'test_ce_precision_micro': 0.49624764919281006,
'test_ce_recall_example': 0.3643813729286194,
'test_ce_recall_macro': 0.2558453679084778,
'test_ce_recall_micro': 0.3969484865665436,
'test_chen_bleu_1': 0.39466917514801025,
'test_chen_bleu_2': 0.248764768242836,
'test_chen_bleu_3': 0.1718045324087143,
'test_chen_bleu_4': 0.1269892156124115,
'test_chen_cider': 0.37993040680885315,
'test_chen_meteor': 0.15499255061149597,
'test_chen_num_examples': 3858.0,
'test_chen_rouge': 0.28760746121406555}

I reproduced the above with only the task parameters in task/mimic_cxr_jpg_chen/jobs.yaml modified.

@anicolson
Member

Hi,

Please see the updated README.md for the labels from Chen et al.
https://github.com/aehrc/cvt2distilgpt2

I will look into the discrepancy with the results.

@mengweiwang
Author


@anicolson

Yes, I am using this dataset, and the precision reaches the level reported in the paper. However, the recall is low and does not reach the reported level.

Also, I have checked and tested the updated source code. The CE metric results did not change much, and there is a bug in the latest source code when running it:

While executing the cvt_21_to_distilgpt2 task, line 281 of transmodal/model.py, which reads if not getattr(self, metric).compute_on_step:, raises an error indicating that the compute_on_step attribute does not exist.
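As a workaround, reading the attribute via getattr with a default avoids the crash; a minimal sketch, assuming the failure comes from a newer torchmetrics version, where compute_on_step was deprecated in v0.8 and later removed (this is my guess, not an official fix):

```python
# Version-safe guard for the line above (an assumption, not the repository's
# fix). On older torchmetrics, compute_on_step defaulted to True; on newer
# versions the attribute no longer exists, so direct access raises
# AttributeError. getattr with the old default keeps the check working.
import torchmetrics

metric = torchmetrics.MeanMetric()

# Old code: `if not metric.compute_on_step:` -> AttributeError on new versions.
if not getattr(metric, "compute_on_step", True):
    print("metric computes only at epoch end")
else:
    print("metric also returns a value on every forward/update step")
```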

@anicolson
Member

Hi, there are some errors in the preprint; the correct results are reported in the updated repository.
