CSS-NLP Team 26
This project explores the generation of newspaper articles using a variety of Natural Language Processing techniques. We trained and evaluated multiple language models, including traditional n-gram models and fine-tuned large language models like LLaMa2-7B and GPT-Neo.
Natural Language Generation (NLG) has evolved rapidly, with advancements in large-scale language models. This wiki documents our journey in building and evaluating models for generating realistic fake articles.
- Fine-Tuning: Custom task adaptation using pre-trained transformer models.
- LoRA: Efficient low-rank adaptation of LLMs for resource-constrained environments.
- QLoRA: Combining LoRA with model quantization for fine-tuning on commodity hardware.
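To make the LoRA idea concrete, here is a minimal NumPy sketch of the low-rank update it applies: the pretrained weight `W` stays frozen, and only two small matrices `A` and `B` (rank `r`, scaled by `alpha / r`) are trained. All names and dimensions here are illustrative, not taken from our training code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight (e.g., an attention projection), shape d_out x d_in.
d_out, d_in, r, alpha = 8, 8, 2, 16
W = rng.normal(size=(d_out, d_in))

# LoRA adapters: B starts at zero, so training begins exactly at the base model.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen weight plus the scaled low-rank update B @ A."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d_in))
# With B = 0 the adapted layer reproduces the frozen base layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only `A` and `B` are updated, the number of trainable parameters drops from `d_out * d_in` to `r * (d_out + d_in)`, which is what makes LoRA viable on resource-constrained hardware; QLoRA additionally stores `W` in quantized form.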
The dataset consists of 10,700 New York Times front-page articles, providing a comprehensive foundation for training and evaluation.
- n-grams: Classical statistical model based on word frequency.
- LLaMa2-7B: Open-source large language model by Meta.
- Falcon-7B: High-quality model trained on RefinedWeb.
- GPT-Neo-1.3B: Open GPT-3-like model by EleutherAI.
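As a baseline reference, the n-gram approach can be sketched as a tiny bigram generator: count which word follows which in the training text, then sample successors until an end token. This is an illustrative toy, not our actual training pipeline; the corpus and token names are made up.

```python
import random
from collections import defaultdict

def train_bigram(corpus: list[str]) -> dict:
    """Collect successor lists for each token over whitespace-tokenized sentences."""
    model = defaultdict(list)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            model[prev].append(curr)
    return model

def generate(model: dict, max_len: int = 20, seed: int = 0) -> str:
    """Sample a sentence by repeatedly drawing a successor of the current token."""
    rng = random.Random(seed)
    token, out = "<s>", []
    while len(out) < max_len:
        token = rng.choice(model[token])
        if token == "</s>":
            break
        out.append(token)
    return " ".join(out)

corpus = ["the mayor opened the bridge", "the bridge opened today"]
model = train_bigram(corpus)
print(generate(model))
```

Sampling from raw successor lists is equivalent to sampling from the maximum-likelihood bigram distribution; a real system would add smoothing and higher-order n-grams.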
- Automated Metrics: BLEU scores comparing generated text against reference articles.
- Human Evaluation: Turing test-inspired guessing game for assessing realism.
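For automated scoring we relied on BLEU; in practice one would use a library implementation (e.g., NLTK's), but the metric itself can be sketched in a few lines: a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The version below is a simplified single-reference, unsmoothed variant for illustration only.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        if clipped == 0:  # no overlap at this order: score collapses to 0
            return 0.0
        log_prec += math.log(clipped / total) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)

assert bleu("the cat sat", "the cat sat") == 1.0
```

The clipping step caps each candidate n-gram count at its count in the reference, so repeating a reference word does not inflate the score.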
| Model | BLEU Score (%) | Human Accuracy (%) |
|---|---|---|
| n-grams | 9.89 | 100 |
| LLaMa2 | 8.55 | 100 |
| Falcon | 7.65 | 100 |
| GPT-Neo | 9.62 | 100 |
- Context truncation for long articles.
- Human evaluation biases due to the non-double-blind setup.
We demonstrate the potential of fine-tuned LLMs in realistic text generation while highlighting evaluation challenges.
- For more details, see the Model Training page.
- Refer to the Evaluation Details page for comprehensive metrics.
- Visit the GitHub Repository for source code.
- Explore HuggingFace for pretrained models and tools.
Some key findings were based on previous research (see Radford et al., 2018).
- Set up wiki.
- Add project documentation.
- Perform peer review.
- Publish final version.
- Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Paper Link
- Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, et al. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv Link
- Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv Link
- Leo Gao et al. 2020. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv Link
- Guilherme Penedo et al. 2023. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data. arXiv Link
- Philipp Singer. 2023. While quantization can degrade inference accuracy, it can also act as a cheap way of adding regularization. Twitter Link
- Andrew Thompson. 2019. 10,700 articles from the front page of the Times. Dataset Link
- Hugo Touvron et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv Link
- Various Authors. 2023. Human Evaluation is Gold Standard. arXiv Link