CSS-NLP Team 26
This project explores the generation of newspaper articles using a variety of Natural Language Processing techniques. We trained and evaluated multiple language models, including traditional n-gram models and fine-tuned large language models like LLaMa2-7B and GPT-Neo.
Natural Language Generation (NLG) has evolved rapidly, with advancements in large-scale language models. This wiki documents our journey in building and evaluating models for generating realistic fake articles.
- Fine-Tuning: Custom task adaptation using pre-trained transformer models.
- LoRA: Efficient low-rank adaptation of LLMs for resource-constrained environments.
- QLoRA: Combining LoRA with model quantization for fine-tuning on commodity hardware.
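To make the LoRA idea concrete, here is a minimal NumPy sketch of the low-rank update it applies: the pretrained weight `W` stays frozen, and only two small matrices `A` and `B` (rank `r`, scaled by `alpha / r`) are trained. All names and dimensions here are illustrative, not taken from our training code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen pretrained weight (e.g., an attention projection), shape d_out x d_in.
d_out, d_in, r, alpha = 8, 8, 2, 16
W = rng.normal(size=(d_out, d_in))

# LoRA adapters: B starts at zero, so training begins exactly at the base model.
A = rng.normal(scale=0.01, size=(r, d_in))
B = np.zeros((d_out, r))

def lora_forward(x: np.ndarray) -> np.ndarray:
    """Forward pass: frozen weight plus the scaled low-rank update B @ A."""
    return x @ W.T + (alpha / r) * (x @ A.T @ B.T)

x = rng.normal(size=(1, d_in))
# With B = 0 the adapted layer reproduces the frozen base layer exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

Because only `A` and `B` are updated, the number of trainable parameters drops from `d_out * d_in` to `r * (d_out + d_in)`, which is what makes LoRA viable on resource-constrained hardware; QLoRA additionally stores `W` in quantized form.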
The dataset consists of 10,700 New York Times front-page articles, providing a comprehensive foundation for training and evaluation.
- n-grams: Classical statistical model based on word frequency.
- LLaMa2-7B: Open-source large language model by Meta.
- Falcon-7B: High-quality model trained on RefinedWeb.
- GPT-Neo-1.3B: Open GPT-3-like model by EleutherAI.
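As a baseline reference, the n-gram approach can be sketched as a tiny bigram generator: count which word follows which in the training text, then sample successors until an end token. This is an illustrative toy, not our actual training pipeline; the corpus and token names are made up.

```python
import random
from collections import defaultdict

def train_bigram(corpus: list[str]) -> dict:
    """Collect successor lists for each token over whitespace-tokenized sentences."""
    model = defaultdict(list)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, curr in zip(tokens, tokens[1:]):
            model[prev].append(curr)
    return model

def generate(model: dict, max_len: int = 20, seed: int = 0) -> str:
    """Sample a sentence by repeatedly drawing a successor of the current token."""
    rng = random.Random(seed)
    token, out = "<s>", []
    while len(out) < max_len:
        token = rng.choice(model[token])
        if token == "</s>":
            break
        out.append(token)
    return " ".join(out)

corpus = ["the mayor opened the bridge", "the bridge opened today"]
model = train_bigram(corpus)
print(generate(model))
```

Sampling from raw successor lists is equivalent to sampling from the maximum-likelihood bigram distribution; a real system would add smoothing and higher-order n-grams.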
- Automated Metrics: BLEU scores comparing generated text against reference articles.
- Human Evaluation: Turing test-inspired guessing game for assessing realism.
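For automated scoring we relied on BLEU; in practice one would use a library implementation (e.g., NLTK's), but the metric itself can be sketched in a few lines: a geometric mean of clipped n-gram precisions multiplied by a brevity penalty. The version below is a simplified single-reference, unsmoothed variant for illustration only.

```python
import math
from collections import Counter

def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 2) -> float:
    """Simplified BLEU: geometric mean of clipped n-gram precisions
    times a brevity penalty (single reference, no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_counts, r_counts = ngrams(cand, n), ngrams(ref, n)
        clipped = sum(min(c, r_counts[g]) for g, c in c_counts.items())
        total = max(sum(c_counts.values()), 1)
        if clipped == 0:  # no overlap at this order: score collapses to 0
            return 0.0
        log_prec += math.log(clipped / total) / max_n
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec)

assert bleu("the cat sat", "the cat sat") == 1.0
```

The clipping step caps each candidate n-gram count at its count in the reference, so repeating a reference word does not inflate the score.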
| Model | BLEU Score (%) | Human Accuracy (%) |
|---|---|---|
| n-grams | 9.89 | 100 |
| LLaMa2 | 8.55 | 100 |
| Falcon | 7.65 | 100 |
| GPT-Neo | 9.62 | 100 |
- Context truncation for long articles.
- Human evaluation biases due to the non-double-blind setup.
We demonstrate the potential of fine-tuned LLMs in realistic text generation while highlighting evaluation challenges.
- For more details, see the Model Training page.
- Refer to the Evaluation Details page for comprehensive metrics.
- Visit the GitHub Repository for source code.
- Explore HuggingFace for pretrained models and tools.
Some key findings were based on previous research (see Radford et al., 2018).
- Set up wiki.
- Add project documentation.
- Perform peer review.
- Publish final version.
- Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. 2018. Improving Language Understanding by Generative Pre-Training. Paper Link
- Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, et al. 2021. LoRA: Low-Rank Adaptation of Large Language Models. arXiv Link
- Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient Finetuning of Quantized LLMs. arXiv Link
- Leo Gao et al. 2020. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv Link
- Guilherme Penedo et al. 2023. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data. arXiv Link
- Philipp Singer. 2023. While quantization can degrade inference accuracy, it can also act as a cheap way of adding regularization. Twitter Link
- Andrew Thompson. 2019. 10,700 articles from the front page of the Times. Dataset Link
- Hugo Touvron et al. 2023. LLaMA: Open and Efficient Foundation Language Models. arXiv Link
- Various Authors. 2023. Human Evaluation is Gold Standard. arXiv Link