q


Homebrew small-scale LLM based on GPT-2

I want to gain practical experience with transformers, particularly by understanding their architecture and real-world applications, with a focus on small-scale LLMs. To achieve this, I decided to create a tiny LLM. First, I plan to study excellent articles and papers to understand the basic concepts and architecture. Next, I will build and improve my own GPT model. My goal is to integrate it into the web applications, games, and iOS apps that interest me.

Currently, I am studying by building an LLM based on OpenAI's GPT-2 model. I used an extremely simple NumPy-based model as a baseline and am experimenting with an implementation using MLX.
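As a rough illustration of the kind of NumPy baseline this starts from, here is a minimal sketch of a single GPT-2-style causal self-attention layer (a simplified, single-head illustration, not the actual q code):

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def causal_self_attention(x, w_qkv, w_out):
    # x: (seq_len, d_model); project to queries, keys, values in one matmul
    seq_len, d_model = x.shape
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)
    # scaled dot-product scores with a causal mask so each position
    # attends only to itself and earlier positions
    scores = q @ k.T / np.sqrt(d_model)
    mask = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores[mask] = -1e9
    return softmax(scores) @ v @ w_out

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_qkv = rng.normal(size=(8, 24))
w_out = rng.normal(size=(8, 8))
out = causal_self_attention(x, w_qkv, w_out)
print(out.shape)  # (4, 8)
```

Because of the causal mask, the output at each position is unchanged if the later tokens are removed, which is what makes autoregressive generation possible.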

Install

$ poetry install

Download model parameters

You have to download the OpenAI GPT-2 model parameters before running q:

$ poetry install --extras download
$ poetry run download --model-size 124M

Available models:

  • 124M
  • 355M
  • 774M
  • 1558M
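For reference, the four sizes correspond to the standard GPT-2 configurations. A quick sketch of those hyperparameters (the dictionary below is for illustration; q's own config handling may differ):

```python
# Standard GPT-2 hyperparameters for each released checkpoint size.
GPT2_CONFIGS = {
    "124M":  {"n_layer": 12, "n_head": 12, "n_embd": 768},
    "355M":  {"n_layer": 24, "n_head": 16, "n_embd": 1024},
    "774M":  {"n_layer": 36, "n_head": 20, "n_embd": 1280},
    "1558M": {"n_layer": 48, "n_head": 25, "n_embd": 1600},
}

cfg = GPT2_CONFIGS["124M"]
print(cfg["n_embd"] // cfg["n_head"])  # per-head dimension: 64
```

All four checkpoints share the same per-head dimension of 64; the sizes differ only in depth, width, and head count.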

Run

$ poetry run q "Alan Turing theorized that computers would one day become"
Generating: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 42.19it/s]
Generated 41.35 tokens/sec

Alan Turing theorized that computers would one day become the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

Stream output

You can enable streaming output by passing the --stream flag:

$ poetry run q --stream "Alan Turing theorized that computers would one day become"
Alan Turing theorized that computers would one day become the most powerful machines on the planet.

The computer is a machine that can perform complex calculations, and it can perform these calculations in a way that is very similar to the human brain.

Generated 37.19 tokens/sec
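A stream mode like this typically wraps the sampling loop in a generator that yields tokens as they are produced instead of returning the full sequence at the end. A minimal sketch (hypothetical helper names, not q's actual implementation), shown here with a toy stand-in model and greedy decoding:

```python
def generate_stream(model, tokens, max_new_tokens):
    # yield one token at a time instead of returning the full sequence
    for _ in range(max_new_tokens):
        logits = model(tokens)  # logits for every position
        last = logits[-1]
        next_token = max(range(len(last)), key=last.__getitem__)  # greedy argmax
        tokens = tokens + [next_token]
        yield next_token

# toy "model" over a 5-token vocab: always predicts (last token + 1) mod 5
def toy_model(tokens):
    logits = [[0.0] * 5 for _ in tokens]
    logits[-1][(tokens[-1] + 1) % 5] = 1.0
    return logits

out = list(generate_stream(toy_model, [0], 4))
print(out)  # [1, 2, 3, 4]
```

The caller can print each yielded token as it arrives, which is all the --stream flag needs on top of the ordinary generation loop.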

Evaluation

Model          HellaSwag   MMLU
Q (124M)       28.92%      22.92%
Q (355M)       33.31%      22.90%
GPT-2 (124M)   28.92%      22.92%
Qwen2.5-0.5B   40.59%      47.14%

HellaSwag:

  • Measure: Accuracy
  • Shots: 0-shot

MMLU:

  • Measure: Accuracy
  • Shots: 0-shot
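Both benchmarks are multiple-choice: the model is scored by whether the candidate answer it assigns the highest likelihood matches the gold answer. A minimal sketch of that selection rule (the helper names and log-likelihood values below are made up for illustration):

```python
def pick_answer(loglikelihoods):
    # choose the candidate answer with the highest log-likelihood
    return max(range(len(loglikelihoods)), key=lambda i: loglikelihoods[i])

def accuracy(predictions, gold):
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)

# two toy items, each with 4 candidate answers
items = [[-12.3, -9.1, -15.0, -11.7], [-8.0, -8.5, -7.9, -9.2]]
gold = [1, 2]
preds = [pick_answer(ll) for ll in items]
print(accuracy(preds, gold))  # 1.0
```

0-shot means the model sees only the question and candidates, with no worked examples in the prompt.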

How to evaluate

Evaluation runs lm-evaluation-harness through our evaluation script:

poetry run python -m q.eval --model q --model_args model_size=355M --tasks hellaswag

Benchmark

TPS (Average)

max_length     64      128     256
Q (124M)       80.90   80.79   79.05
GPT-2 (124M)   53.96   51.56   54.76
Qwen2.5-0.5B   21.80   22.33   22.24
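Tokens per second is just generated tokens divided by wall-clock generation time. A minimal sketch of that measurement (hypothetical helper, not q's actual benchmark code, using a fake generator that sleeps to simulate work):

```python
import time

def measure_tps(generate, n_tokens):
    # time one generation call and report tokens per second
    start = time.perf_counter()
    generate(n_tokens)
    elapsed = time.perf_counter() - start
    return n_tokens / elapsed

# stand-in generator that "emits" a token roughly every millisecond
def fake_generate(n):
    for _ in range(n):
        time.sleep(0.001)

tps = measure_tps(fake_generate, 50)
print(tps)  # somewhat under 1000 tokens/sec due to sleep overhead
```

Averaging several runs per max_length, as in the table above, smooths out timer and scheduler noise.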

Peak Memory (Average, MB)

max_length     64        128       256
Q (124M)       777.11    779.03    779.89
GPT-2 (124M)   781.96    974.82    1358.00
Qwen2.5-0.5B   1257.32   1292.65   1284.94
