
Pocket LLM

A small-scale LLM for autocompletion, trained on an open corpus (currently Tiny Shakespeare, 1.1M tokens).
Change max_length in the generate function to choose how many tokens to generate (the context length stays block_length regardless), and toggle nb_iter if you want to train.
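As a rough sketch of the behavior described above: generation appends max_length new tokens, but the context fed to the model is always cropped to the last block_length tokens. The names `generate` and `model_next_token` here are illustrative stand-ins, not the repo's actual API.

```python
# Sketch (assumption, not the repo's code): autoregressive generation
# that crops the context window to block_length on every step.
def generate(tokens, max_length, block_length, model_next_token):
    for _ in range(max_length):
        context = tokens[-block_length:]  # context never exceeds block_length
        tokens.append(model_next_token(context))  # sample/predict next token
    return tokens
```

With a real model, `model_next_token` would be a forward pass followed by sampling from the output distribution.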
Character-level tokenizer:

  • Training loss: 1.33
  • Validation loss: 1.55

Byte-Pair Encoding tokenizer:

  • Training loss: 1.56
  • Validation loss: 3.34
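A character-level tokenizer like the one whose losses are reported above can be sketched in a few lines (names are illustrative, not taken from the repo): build a vocabulary of the distinct characters in the corpus, then map characters to integer ids and back.

```python
# Minimal character-level tokenizer sketch (assumption: a toy corpus
# stands in for the full Tiny Shakespeare text).
text = "Tiny Shakespeare"
chars = sorted(set(text))                      # vocabulary: distinct characters
stoi = {c: i for i, c in enumerate(chars)}     # char -> id
itos = {i: c for c, i in stoi.items()}         # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

A BPE tokenizer instead merges frequent character pairs into multi-character tokens, trading a larger vocabulary for shorter sequences, which is one plausible reason the two tokenizers above reach different losses on the same data.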

To-Do

  • Add device support (move the model and all parameters to CUDA)

  • Add weight decay

  • Add Multi-Latent Attention

  • Gradient Accumulation

  • Gradient clipping

  • RoPE
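One of the to-do items, gradient clipping, can be sketched without any framework: rescale the gradient vector whenever its global norm exceeds a threshold. This is a hedged illustration in plain Python; in the actual training loop the PyTorch equivalent would be torch.nn.utils.clip_grad_norm_.

```python
import math

# Global-norm gradient clipping sketch (gradients are plain floats here;
# in the real model they would be tensors).
def clip_grad_norm(grads, max_norm):
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]  # rescale so the norm equals max_norm
    return grads
```

Gradient accumulation follows the same spirit: sum gradients over several micro-batches (dividing the loss by the number of accumulation steps) and only then take an optimizer step.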


Resources

  • Andrej Karpathy
  • DeepSeek-V3 Technical Report
  • DeepSeek-V3 GitHub

(and many YouTube videos & Medium articles)