This project explores generative text models, focusing on Recurrent Neural Networks (RNNs) and Transformer-based language models. It involves implementing and experimenting with architectures and attention mechanisms such as bidirectional RNNs, Transformers, Sliding Window Attention, Rotary Positional Embeddings (RoPE), and Grouped Query Attention (GQA). The goal is to understand how different generative techniques affect language modeling quality and computational efficiency.
├── requirements.txt
├── input.txt
├── chargpt.py
├── mingpt/
│   ├── model.py
│   ├── trainer.py
│   └── utils.py
├── test_model.py
└── README.md
To set up the environment locally, follow these steps:
- Install Python dependencies:
pip install torch einops
- Run the main training script:
python chargpt.py
The RNN language model uses an Elman network whose hidden state is updated as follows:

$$h_t = \mathrm{slide}(W_{hh} h_{t-1} + W_{hx} x_t + b_h)$$

$$y_t = \mathrm{slide}(W_{yh} h_t + b_y)$$

where $\mathrm{slide}(a) = \min(1, \max(0, a))$ ensures values stay within a fixed range. The model processes sequential text data, capturing dependencies over time.
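As a rough illustration of this update rule (a minimal sketch with assumed tensor shapes and names, not the repository's implementation), a single Elman step in PyTorch could look like:

```python
import torch

def slide(a):
    # slide(a) = min(1, max(0, a)): clip activations to [0, 1].
    return torch.clamp(a, 0.0, 1.0)

def elman_step(x_t, h_prev, W_hx, W_hh, b_h, W_yh, b_y):
    """One Elman RNN step (assumed shapes: x_t (d_in,), h_prev (d_h,),
    W_hx (d_h, d_in), W_hh (d_h, d_h), b_h (d_h,), W_yh (d_out, d_h), b_y (d_out,))."""
    h_t = slide(W_hh @ h_prev + W_hx @ x_t + b_h)   # hidden-state update
    y_t = slide(W_yh @ h_t + b_y)                   # per-step output
    return h_t, y_t
```

Unrolling this step over the token sequence gives the model's forward pass.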
- Implemented a simple RNN-based language model.
- Explored bidirectional RNNs and their inability to serve as autoregressive models.
- Compared how different architectures handle sequential context.
The Transformer model is based on scaled dot-product attention:

$$s_{t,j} = \frac{k_j^\top q_t}{\sqrt{d_k}}, \qquad a_t = \mathrm{softmax}(s_t)$$

where $d_k$ is the key dimension and the queries, keys, and values are computed as:

$$q_j = W_q x_j, \qquad k_j = W_k x_j, \qquad v_j = W_v x_j$$
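For reference, a minimal single-head version of this computation (with an optional causal mask; shapes and names here are assumptions, not the course starter code) is:

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, causal=True):
    """q, k, v: (T, d_k) tensors for one head; returns (T, d_k)."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)        # s[t, j] = k_j . q_t / sqrt(d_k)
    if causal:
        T = q.size(0)
        future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(future, float('-inf'))   # block attention to future tokens
    attn = F.softmax(scores, dim=-1)                         # a_t = softmax(s_t)
    return attn @ v                                          # weighted sum of values
```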
We also analyzed alternative attention mechanisms, such as multiplicative and additive attention.
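For comparison, the standard score functions for these two variants are shown below; the projection matrices $W$, $W_1$, $W_2$ and the vector $v_a$ are learned parameters introduced here purely for illustration:

$$s_{t,j} = q_t^\top W k_j \quad \text{(multiplicative)}, \qquad s_{t,j} = v_a^\top \tanh(W_1 q_t + W_2 k_j) \quad \text{(additive)}$$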
- Implemented scaled dot-product attention.
- Explored multiplicative attention and its impact on model expressiveness.
- Analyzed self-attention properties, including conditions for symmetry.
Sliding Window Attention improves efficiency by restricting each query's context to a fixed window of size $w$, rather than attending to the entire sequence (a masking sketch follows the list below).
- Defined causal masks for attention computation.
- Reduced time complexity from $O(N^2)$ to $O(Nw)$.
- Reduced space complexity from $O(N^2)$ to $O(N + w)$.
- Implemented optimized Sliding Window Attention.
- Evaluated computational efficiency against naive matrix multiplication.
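The masking sketch referenced above (assumed shapes; the assignment's optimized version avoids materializing the full $T \times T$ matrix) builds a mask that lets position $t$ attend only to the previous $w$ tokens, itself included:

```python
import torch

def sliding_window_causal_mask(T, w):
    """Boolean (T, T) mask: True where query t may attend to key j,
    i.e. t - w < j <= t (causal and restricted to a window of size w)."""
    idx = torch.arange(T)
    rel = idx[:, None] - idx[None, :]   # rel[t, j] = t - j
    return (rel >= 0) & (rel < w)

# Example: with T=5 and w=2, each row has at most two True entries.
print(sliding_window_causal_mask(5, 2))
```

Applying this mask inside a naive attention implementation still costs $O(N^2)$; the complexity gains above come from computing only the $w$ scores each query actually needs.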
RoPE encodes relative positional information directly into the attention mechanism, replacing absolute position embeddings.
- Implemented RoPE in the `RotaryPositionalEmbeddings` class.
- Modified the `CausalSelfAttention` class to integrate RoPE embeddings.
- Compared text samples generated with and without RoPE.
- Evaluated training loss across different training schedules.
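A compact sketch of the rotation RoPE applies (an assumed standalone function, not necessarily the interface of the `RotaryPositionalEmbeddings` class) pairs up feature channels and rotates each pair by a position-dependent angle:

```python
import torch

def apply_rope(x, base=10000.0):
    """Rotate x of shape (T, d), d even: channel pair (2i, 2i+1) at position t
    is rotated by angle t * theta_i, where theta_i = base ** (-2i / d)."""
    T, d = x.shape
    theta = base ** (-torch.arange(0, d, 2, dtype=torch.float32) / d)        # (d/2,)
    angles = torch.arange(T, dtype=torch.float32)[:, None] * theta[None, :]  # (T, d/2)
    cos, sin = angles.cos(), angles.sin()
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = torch.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out
```

The rotation is applied to queries and keys (not values) before the dot product, so the resulting attention scores depend only on relative positions.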
GQA reduces memory usage by sharing key-value pairs across query groups, balancing efficiency and performance.
- Implemented `GroupedQueryAttention` in `model.py`.
- Modified the attention mechanism to support grouped query heads.
- Measured attention computation time across different numbers of key heads.
- Compared training loss between standard multi-head attention and GQA.
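The idea can be sketched as follows (assumed head layout and shapes; the repository's `GroupedQueryAttention` may organize tensors differently): each group of query heads shares a single key/value head, which is repeated before the usual attention computation:

```python
import math
import torch
import torch.nn.functional as F

def grouped_query_attention(q, k, v):
    """q: (n_q_heads, T, d); k, v: (n_kv_heads, T, d), with n_q_heads % n_kv_heads == 0."""
    n_q, T, d = q.shape
    group = n_q // k.shape[0]
    k = k.repeat_interleave(group, dim=0)   # share each KV head across `group` query heads
    v = v.repeat_interleave(group, dim=0)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d)              # (n_q_heads, T, T)
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(causal, float('-inf'))           # causal mask
    return F.softmax(scores, dim=-1) @ v                         # (n_q_heads, T, d)
```

Fewer key/value heads mean a smaller KV cache and fewer key/value projections, which is where the memory savings come from.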
- Implemented and tested various attention mechanisms.
- Trained models using Shakespeare’s works as a dataset.
- Logged results with Weights & Biases (wandb) for analysis.
- Train the language model:
python chargpt.py --trainer.max_iters=600 --model.rope=True
- Run unit tests for verification:
python test_model.py
- View experiment logs with Weights & Biases.
- RNNs struggle with long-term dependencies; Transformers improve contextual modeling.
- RoPE enhances positional encoding in attention layers.
- Sliding Window Attention and GQA improve efficiency without major performance losses.
This project is part of 10-623 Generative AI at Carnegie Mellon University, with datasets and starter code provided by the course instructors.