Summarization is the task of producing a shorter version of one or several documents that preserves most of the input's meaning.
For summarization, automatic metrics such as ROUGE and METEOR have serious limitations:
- They only assess content selection and do not account for other quality aspects, such as fluency, grammaticality, coherence, etc.
- To assess content selection, they rely mostly on the lexical overlap, although an abstractive summary could express the same content as a reference without any lexical overlap.
- Given the subjectiveness of summarization and the correspondingly low agreement between annotators, the metrics were designed to be used with multiple reference summaries per input. However, recent datasets such as
pn_summary
provide only a single reference.
Therefore, tracking progress and claiming state-of-the-art based only on these metrics is questionable. Most papers carry out additional manual comparisons of alternative summaries. Unfortunately, such experiments are difficult to compare across papers. If you have an idea on how to do that, feel free to contribute.
There are a few resources for the abstractive/extractive tasks in Persian, while some are not available online, or there are no curators for them. While surfing the academic papers, you might see some of them like Pasokh. Of course, thanks to some researchers' efforts in this field, a dataset called Persian News Summarization (known as pn_summary
) has been prepared for both Persian summarization tasks and made available online.
The Persian News Summary (known as pn_summary) is a well-structured summarization dataset for the Persian language that consists of 93,207 online news articles (from 200,000 crawled news) from 6 different news agencies in 18 different news categories from economy to tourism. Each document (article) includes the long original text as well as a human-generated summary. Models are evaluated with full-length F1-scores of ROUGE-1, ROUGE-2, ROUGE-L, and METEOR (optional).
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
---|---|---|---|---|---|---|
BERT2BERT (ParsBERT) + mT5 (Farahani et al., 2020) | 44.01 | 25.07 | 37.76 | - | Leveraging ParsBERT and Pretrained mT5 for Persian Abstractive Text Summarization | Official |
Pasokh is a summarization dataset covering 6 news categories from 7 news agencies in two forms: Single-Document (SD) and Multi-Document (MD) with 100, 1000 records. Each document covers 5 samples for extractive and abstractive example.
Model | ROUGE-1 | ROUGE-2 | ROUGE-L | METEOR | Paper / Source | Code |
---|---|---|---|---|---|---|
Based on NER (SD) (Khademi, Fakhredanesh, 2020) | 47.20 | 33.40 | - | - | Persian Automatic Text Summarization Based on Named Entity Recognition | - |
Based on NER (SD) (Khademi et al., 2020) | 45.40 | 30.10 | - | - | Conceptual Text Summarizer: A new model in continuous vector space | - |
Feature Extraction (SD) (Rezaei et al., 2019) | 78.00 | 71.00 | 74.00 | - | Features in Extractive Supervised Single-document Summarization: Case of Persian News | Official |
Multi-Feature Extraction (SD) (Kermani, Ghanbari, 2019) | 48.70 | 42.60 | - | - | Extractive Persian Summarizer for News Websites | - |