
CS224N Assignments Code Solution

Model performance report

Dependency parser

LSTM CH2EN translator

  • The training process seems to hit a plateau: training loss stops decreasing at around 55, with perplexity around 12 (see the sketch after this list for how these two numbers relate)
  • The model eventually achieved a BLEU score of 7
  • More training details are in the notebook
  • The trained model's parameters are here -> parameters
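For reference, a minimal sketch of how the reported loss and perplexity relate, assuming the loss is cross-entropy summed over the target words of a sentence (the function name and the word count are assumptions for illustration, not values from this repo):

```python
import math

def perplexity(total_cross_entropy: float, num_target_words: int) -> float:
    """Perplexity is the exponential of the average per-word cross-entropy."""
    return math.exp(total_cross_entropy / num_target_words)

# Illustrative arithmetic only: a summed loss of 55 over ~22 target words
# gives exp(2.5) ≈ 12, consistent with the loss/perplexity pair reported above.
print(perplexity(55.0, 22))  # ~12.18
```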

miniGPT

  • Fine-tuning on a simple "birth place" question-answering task without pretraining achieved 2.19% accuracy on the dev set, whereas a single line of code that always outputs "London" achieved 5% in comparison (a sketch of this baseline follows the list)
  • Pretraining on collected wiki text that incorporates information about famous people, before fine-tuning, achieved 24% accuracy on the dev set, a considerable improvement
  • Pretraining took about 42 minutes on a T4 GPU
  • After adapting the Perceiver architecture, training speed improved by about 1 second per epoch
  • The model's parameters can be accessed here -> pretraining parameters
  • More training details can be seen in the notebook
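For context, the single-line "London" baseline mentioned above amounts to something like the following sketch (the `dev_labels` input and the helper name are hypothetical, not this repo's actual code):

```python
def london_baseline_accuracy(dev_labels: list[str]) -> float:
    """Predict "London" as the birth place for every dev question and
    measure how often that constant guess happens to be correct."""
    predictions = ["London"] * len(dev_labels)
    correct = sum(pred == gold for pred, gold in zip(predictions, dev_labels))
    return 100.0 * correct / len(dev_labels)

# On the birth-place dev set this constant guess scores about 5%,
# which the fine-tuned-but-not-pretrained model (2.19%) fails to beat.
```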
