q


Homebrew small-scale LLM based on GPT-2

I want to gain practical experience with transformers, particularly by understanding their architecture and real-world applications, with a focus on small-scale LLMs. To achieve this, I decided to create a tiny LLM. First, I plan to study excellent articles and papers to understand the basic concepts and architecture. Next, I will build and improve my own GPT model. My goal is to integrate it into the web applications, games, and iOS apps that interest me.

Currently, I am studying by building an LLM based on OpenAI's GPT-2 model. I used an extremely simple NumPy-based model as a baseline and am experimenting with an implementation using MLX.
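As a rough illustration of the kind of NumPy baseline this starts from, here is a minimal sketch of a single GPT-2-style causal self-attention layer (a simplified, single-head illustration, not the actual q code):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_qkv, w_out):
    # x: (seq_len, d_model); project to queries, keys, values in one matmul
    seq_len, d_model = x.shape
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    # scaled dot-product scores with a causal mask so each position
    # attends only to itself and earlier positions
    scores = q @ k.T / np.sqrt(d_model)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -1e9
    return softmax(scores) @ v @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_qkv = rng.normal(size=(8, 24))
w_out = rng.normal(size=(8, 8))
out = causal_self_attention(x, w_qkv, w_out)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the output at each position is unchanged if the later tokens are removed, which is what makes autoregressive generation possible.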

Install

$ poetry install

Download model parameters

You have to download the OpenAI GPT-2 model parameters before running q:

$ poetry install --extras download
$ poetry run download --model-size 124M

Available models:

  • 124M
  • 355M
  • 774M
  • 1558M
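For reference, the four sizes correspond to the standard GPT-2 configurations. A quick sketch of those hyperparameters (the dictionary below is for illustration; q's own config handling may differ):

```python
# Standard GPT-2 hyperparameters for each released checkpoint size.
GPT2_CONFIGS = {
    "124M":  {"n_layer": 12, "n_head": 12, "n_embd": 768},
    "355M":  {"n_layer": 24, "n_head": 16, "n_embd": 1024},
    "774M":  {"n_layer": 36, "n_head": 20, "n_embd": 1280},
    "1558M": {"n_layer": 48, "n_head": 25, "n_embd": 1600},
}

cfg = GPT2_CONFIGS["124M"]
print(cfg["n_embd"] // cfg["n_head"])  # per-head dimension: 64
```

All four checkpoints share the same per-head dimension of 64; the sizes differ only in depth, width, and head count.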

Run

$ poetry run q "Alan Turing theorized that computers would one day become"
Generating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 42.19it/s]
Generated 41.35 tokens/sec

Alan Turing theorized that computers would one day become the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

Stream output

You can enable streaming output by passing the --stream flag:

$ poetry run q --stream "Alan Turing theorized that computers would one day become"
Alan Turing theorized that computers would one day become the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

Generated 37.19 tokens/sec
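A stream mode like this typically wraps the sampling loop in a generator that yields tokens as they are produced instead of returning the full sequence at the end. A minimal sketch (hypothetical helper names, not q's actual implementation), shown here with a toy stand-in model and greedy decoding:

```python
def generate_stream(model, tokens, max_new_tokens):
    # yield one token at a time instead of returning the full sequence
    for _ in range(max_new_tokens):
        logits = model(tokens)  # logits for every position
        last = logits[-1]
        next_token = max(range(len(last)), key=last.__getitem__)  # greedy argmax
        tokens = tokens + [next_token]
        yield next_token

# toy "model" over a 5-token vocab: always predicts (last token + 1) mod 5
def toy_model(tokens):
    logits = [[0.0] * 5 for _ in tokens]
    logits[-1][(tokens[-1] + 1) % 5] = 1.0
    return logits

out = list(generate_stream(toy_model, [0], 4))
print(out)  # [1, 2, 3, 4]
```

The caller can print each yielded token as it arrives, which is all the --stream flag needs on top of the ordinary generation loop.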

Evaluation

Model          HellaSwag   MMLU
Q (124M)       28.92%      22.92%
Q (355M)       33.31%      22.90%
GPT-2 (124M)   28.92%      22.92%
Qwen2.5-0.5B   40.59%      47.14%

HellaSwag:

  • Measure: Accuracy
  • Shots: 0-shot

MMLU:

  • Measure: Accuracy
  • Shots: 0-shot
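Both benchmarks are multiple-choice: the model is scored by whether the candidate answer it assigns the highest likelihood matches the gold answer. A minimal sketch of that selection rule (the helper names and log-likelihood values below are made up for illustration):

```python
def pick_answer(loglikelihoods):
    # choose the candidate answer with the highest log-likelihood
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])

def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# two toy items, each with 4 candidate answers
items = [[-12.3, -9.1, -15.0, -11.7], [-8.0, -8.5, -7.9, -9.2]]
gold = [1, 2]
preds = [pick_answer(ll) for ll in items]
print(accuracy(preds, gold))  # 1.0
```

0-shot means the model sees only the question and candidates, with no worked examples in the prompt.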

How to evaluate

Evaluation runs lm-evaluation-harness through our evaluation script:

poetry run python -m q.eval --model q --model_args model_size=355M --tasks hellaswag

Benchmark

TPS (Average)

max_length     64      128     256
Q (124M)       80.90   80.79   79.05
GPT-2 (124M)   53.96   51.56   54.76
Qwen2.5-0.5B   21.80   22.33   22.24
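Tokens per second is just generated tokens divided by wall-clock generation time. A minimal sketch of that measurement (hypothetical helper, not q's actual benchmark code, using a fake generator that sleeps to simulate work):

```python
import time

def measure_tps(generate, n_tokens):
    # time one generation call and report tokens per second
    start = time.perf_counter()
    generate(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# stand-in generator that "emits" a token roughly every millisecond
def fake_generate(n):
    for _ in range(n):
        time.sleep(0.001)

tps = measure_tps(fake_generate, 50)
print(tps)  # somewhat under 1000 tokens/sec due to sleep overhead
```

Averaging several runs per max_length, as in the table above, smooths out timer and scheduler noise.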

Peak Memory (Average, MB)

max_length     64        128       256
Q (124M)       777.11    779.03    779.89
GPT-2 (124M)   781.96    974.82    1358.00
Qwen2.5-0.5B   1257.32   1292.65   1284.94
