
Pocket LLM

A small-scale LLM for autocompletion, trained on an open corpus (currently Tiny Shakespeare, 1.1M tokens).
Change max_length in the generate function to choose how many tokens to generate (the context length stays block_length regardless), and toggle nb_iter if you want to train.
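As a rough sketch of the behavior described above: generation appends max_length new tokens, but the context fed to the model is always cropped to the last block_length tokens. The names `generate` and `model_next_token` here are illustrative stand-ins, not the repo's actual API.

```python
# Sketch (assumption, not the repo's code): autoregressive generation
# that crops the context window to block_length on every step.
def generate(tokens, max_length, block_length, model_next_token):
    for _ in range(max_length):
        context = tokens[-block_length:]  # context never exceeds block_length
        tokens.append(model_next_token(context))  # sample/predict next token
    return tokens
```

With a real model, `model_next_token` would be a forward pass followed by sampling from the output distribution.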
Character-level tokenizer:

  • Training loss: 1.33
  • Validation loss: 1.55

Byte-Pair Encoding tokenizer:

  • Training loss: 1.56
  • Validation loss: 3.34
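A character-level tokenizer like the one whose losses are reported above can be sketched in a few lines (names are illustrative, not taken from the repo): build a vocabulary of the distinct characters in the corpus, then map characters to integer ids and back.

```python
# Minimal character-level tokenizer sketch (assumption: a toy corpus
# stands in for the full Tiny Shakespeare text).
text = "Tiny Shakespeare"
chars = sorted(set(text))                      # vocabulary: distinct characters
stoi = {c: i for i, c in enumerate(chars)}     # char -> id
itos = {i: c for c, i in stoi.items()}         # id -> char

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)
```

A BPE tokenizer instead merges frequent character pairs into multi-character tokens, trading a larger vocabulary for shorter sequences, which is one plausible reason the two tokenizers above reach different losses on the same data.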

To-Do

  • Add device support (move the model and all parameters to CUDA)

  • Add weight decay

  • Add Multi-Latent Attention

  • Gradient Accumulation

  • Gradient clipping

  • RoPE
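One of the to-do items, gradient clipping, can be sketched without any framework: rescale the gradient vector whenever its global norm exceeds a threshold. This is a hedged illustration in plain Python; in the actual training loop the PyTorch equivalent would be torch.nn.utils.clip_grad_norm_.

```python
import math

# Global-norm gradient clipping sketch (gradients are plain floats here;
# in the real model they would be tensors).
def clip_grad_norm(grads, max_norm):
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]  # rescale so the norm equals max_norm
    return grads
```

Gradient accumulation follows the same spirit: sum gradients over several micro-batches (dividing the loss by the number of accumulation steps) and only then take an optimizer step.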


Resources

  • Andrej Karpathy
  • DeepSeek-V3 Technical Report
  • DeepSeek-V3 GitHub

(and many YouTube videos & Medium articles)