ECE590LLM_pj

Used for ECE590 project

Initial idea

Project plan:

- Build the softmax matrix and the approximated matrix using two ideas, and show that they are similar
  - Idea 1: Gaussian kernel approximation. This one is expected to be unstable because it can produce negative values.
  - Idea 2: Positive random features. This one is expected to be more stable.
  - Idea 3 (if we have time): Orthogonal random features. These are expected to be even better.
- Vary the number of random features sampled and compare efficiency and accuracy
- Put the attention approximation in an actual model and compare performance
- Maybe try different kinds of models?
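To make Ideas 1 and 2 concrete, here is a minimal NumPy sketch, not the code in this repository. It assumes that Idea 1 means trigonometric (sin/cos) random features for the Gaussian kernel and that Idea 2 means exponential positive random features. The function names, matrix sizes, and input scaling are all hypothetical choices for illustration.

```python
import numpy as np

def softmax_kernel_matrix(Q, K):
    """Exact unnormalized softmax kernel: A[i, j] = exp(q_i . k_j)."""
    return np.exp(Q @ K.T)

def trig_features(X, W):
    """Idea 1 (assumed form): sin/cos random features; entries can be negative."""
    m = W.shape[0]
    proj = X @ W.T                                            # (n, m)
    scale = np.exp(0.5 * np.sum(X**2, axis=1, keepdims=True))  # exp(|x|^2 / 2)
    feats = np.concatenate([np.cos(proj), np.sin(proj)], axis=1)
    return scale * feats / np.sqrt(m)

def positive_features(X, W):
    """Idea 2 (assumed form): exp(w . x - |x|^2 / 2); entries are always positive."""
    m = W.shape[0]
    proj = X @ W.T                                            # (n, m)
    half_sq_norm = 0.5 * np.sum(X**2, axis=1, keepdims=True)
    return np.exp(proj - half_sq_norm) / np.sqrt(m)

def relative_error(A_hat, A):
    """Relative Frobenius error between approximation and ground truth."""
    return np.linalg.norm(A_hat - A) / np.linalg.norm(A)

# Hypothetical toy sizes; inputs are scaled down so both estimators behave well.
rng = np.random.default_rng(0)
n, d, m = 8, 16, 256
Q = 0.5 * rng.normal(size=(n, d)) / np.sqrt(d)
K = 0.5 * rng.normal(size=(n, d)) / np.sqrt(d)
W = rng.normal(size=(m, d))                                   # w_i ~ N(0, I_d)

A = softmax_kernel_matrix(Q, K)
phi_q_trig, phi_k_trig = trig_features(Q, W), trig_features(K, W)
phi_q_pos, phi_k_pos = positive_features(Q, W), positive_features(K, W)
A_trig = phi_q_trig @ phi_k_trig.T
A_pos = phi_q_pos @ phi_k_pos.T

print("trig features contain negatives:", bool((phi_q_trig < 0).any()))
print("positive-feature matrix min entry:", A_pos.min())
print("relative error (trig):    ", relative_error(A_trig, A))
print("relative error (positive):", relative_error(A_pos, A))
```

Both estimators are unbiased for exp(q . k) when the rows of `W` are standard normal. The sketch uses only array operations, so it should carry over to `jax.numpy` with the random draws replaced by `jax.random` calls.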

Experiments

Experiment 1: Using the JAX implementation, first compare the approximated matrix with the ground-truth matrix, and measure how the loss changes across different kernel settings.
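One way Experiment 1 could be set up is sketched below in NumPy. It is an illustration under assumptions, not the repository's JAX code: the loss is taken to be the relative Frobenius error, the only kernel setting varied is the number of random features, and the helper names, sizes, and trial count are hypothetical.

```python
import numpy as np

def positive_features(X, W):
    """Positive random features: exp(w . x - |x|^2 / 2) / sqrt(m)."""
    m = W.shape[0]
    half_sq_norm = 0.5 * np.sum(X**2, axis=1, keepdims=True)
    return np.exp(X @ W.T - half_sq_norm) / np.sqrt(m)

def mean_relative_error(Q, K, m, rng, trials=20):
    """Average relative Frobenius error over several draws of the m random features."""
    A = np.exp(Q @ K.T)                        # ground-truth softmax kernel matrix
    errs = []
    for _ in range(trials):
        W = rng.normal(size=(m, Q.shape[1]))   # fresh random projections per trial
        A_hat = positive_features(Q, W) @ positive_features(K, W).T
        errs.append(np.linalg.norm(A_hat - A) / np.linalg.norm(A))
    return float(np.mean(errs))

# Hypothetical toy setup: 16 queries and keys of dimension 16.
rng = np.random.default_rng(0)
n, d = 16, 16
Q = 0.5 * rng.normal(size=(n, d)) / np.sqrt(d)
K = 0.5 * rng.normal(size=(n, d)) / np.sqrt(d)

feature_counts = [16, 64, 256, 1024]
errors = {m: mean_relative_error(Q, K, m, rng) for m in feature_counts}
for m, err in errors.items():
    print(f"m={m:5d}  mean relative Frobenius error={err:.4f}")
```

Averaging over several random draws separates the effect of the feature count from the noise of a single draw. Other kernel settings, such as the feature type, can be swept in the same loop.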

Experiment 2: Evaluate on a real dataset, e.g. /google-research/protein_lm/

