
CSS-NLP Team 26

chickenwithitsheadcutoff edited this page Nov 10, 2024 · 6 revisions

Welcome to the CSS-NLP Wiki!

Abstract

This project explores the generation of newspaper articles using a variety of Natural Language Processing techniques. We trained and evaluated multiple language models, including traditional n-gram models and fine-tuned large language models like LLaMa2-7B and GPT-Neo.

Introduction

Natural Language Generation (NLG) has evolved rapidly with the advent of large-scale language models. This wiki documents our process for building and evaluating models that generate realistic fake news articles.

Recent Developments

  • Fine-Tuning: Custom task adaptation using pre-trained transformer models.
  • LoRA: Efficient low-rank adaptation of LLMs for resource-constrained environments.
  • QLoRA: Combining LoRA with model quantization for fine-tuning on commodity hardware.
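The core idea behind LoRA, as introduced by Hu et al. [2], can be illustrated with a minimal NumPy sketch: the frozen pre-trained weight W is left untouched, and only a low-rank product BA (scaled by alpha/r) is trained and added on top. This is an illustrative sketch of the math, not the training code used in this project; all variable names and dimensions here are made up for the example.

```python
import numpy as np

def lora_forward(W, A, B, x, alpha=16, r=4):
    """Forward pass with a LoRA adapter: (W + (alpha/r) * B @ A) @ x.

    W is the frozen pre-trained weight; only A and B are trained.
    """
    return W @ x + (alpha / r) * (B @ (A @ x))

# Toy dimensions (hypothetical, for illustration only).
d_out, d_in, r = 8, 8, 2
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))     # frozen pre-trained weight
A = rng.normal(size=(r, d_in)) * 0.01  # trainable, small random init
B = np.zeros((d_out, r))               # trainable, zero init
x = rng.normal(size=(d_in,))           # input activation

y = lora_forward(W, A, B, x, alpha=16, r=r)
# Because B starts at zero, the adapter initially has no effect:
assert np.allclose(y, W @ x)
```

Initializing B to zero means the adapted model starts out identical to the base model, so fine-tuning begins from the pre-trained behaviour; QLoRA applies the same update on top of a quantized copy of W.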

Data

The dataset consists of 10,700 New York Times front-page articles, providing a comprehensive foundation for training and evaluation.

Models

  • n-grams: Classical statistical model that predicts each word from the frequencies of the preceding n−1 words.
  • LLaMa2-7B: Open-source large language model by Meta.
  • Falcon-7B: High-quality model trained on RefinedWeb.
  • GPT-Neo-1.3B: Open GPT-3-like model by EleutherAI.
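The n-gram baseline can be sketched in a few lines of pure Python: count how often each word follows another in the training corpus, then generate text by repeatedly sampling a successor in proportion to its count. This is a minimal bigram illustration, not our actual training pipeline; the toy corpus below is invented for the example.

```python
import random
from collections import Counter, defaultdict

def train_bigrams(tokens):
    """Count, for each token, how often each successor follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, length=10, seed=0):
    """Sample a sequence, drawing each next word in proportion to its bigram count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = counts.get(out[-1])
        if not successors:
            break  # dead end: the last word never appeared mid-corpus
        words, weights = zip(*successors.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

# Hypothetical toy corpus, standing in for the NYT articles.
corpus = "the mayor said the city council said the budget was late".split()
model = train_bigrams(corpus)
sample = generate(model, "the", length=8)
print(sample)
```

Higher-order n-grams condition on longer histories the same way, at the cost of much sparser counts.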

Evaluation

  • Automated Metrics: BLEU scores comparing generated text against reference articles.
  • Human Evaluation: Turing test-inspired guessing game for assessing realism.
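For reference, sentence-level BLEU can be computed from scratch as the geometric mean of modified n-gram precisions, multiplied by a brevity penalty. This sketch is a simplified single-reference version for illustration; in practice a library implementation (e.g. with smoothing) would be used.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference."""
    cand, ref = candidate.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
        ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        # Modified precision: clip each n-gram's count by its count in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        log_precisions.append(math.log(overlap / total) if overlap else float("-inf"))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(sum(log_precisions) / max_n)

print(bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
```

A perfect match scores 1.0, and any candidate with no 4-gram overlap scores 0.0, which is why scores on open-ended generation (as in our Results below) sit in the single digits.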

Results

| Model   | BLEU Score (%) | Human Accuracy (%) |
|---------|----------------|--------------------|
| n-grams | 9.89           | 100                |
| LLaMa2  | 8.55           | 100                |
| Falcon  | 7.65           | 100                |
| GPT-Neo | 9.62           | 100                |

Limitations

  • Context truncation for long articles.
  • Human evaluation biases due to non-double-blind setup.

Conclusion

We demonstrate the potential of fine-tuned LLMs for realistic text generation, while highlighting the challenges of evaluating that realism reliably.


Footnotes

Some key findings were based on previous research^[See Radford et al., 2018].

Task List

  • Set up wiki.
  • Add project documentation.
  • Perform peer review.
  • Publish final version.

References

  1. Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Paper Link
  2. Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, et al. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv Link
  3. Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv Link
  4. Leo Gao et al. 2020. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv Link
  5. Guilherme Penedo et al. 2023. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data. arXiv Link
  6. Philipp Singer. 2023. While quantization can degrade inference accuracy, it can also act as a cheap way of adding regularization. Twitter Link
  7. Andrew Thompson. 2019. 10,700 articles from the front page of the Times. Dataset Link
  8. Hugo Touvron et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv Link
  9. Various Authors. 2023. Human Evaluation is Gold Standard. arXiv Link