VisionTransformer

This is an implementation of a Vision Transformer (ViT), which uses only the encoder part of the Transformer. For clarity, I have added comments at each step where I thought they were necessary. You can experiment with the hyperparameters, and you may want to add a learning rate scheduler that keeps the learning rate high at the start and then lowers it to prevent divergence. You can also increase EPOCHS. I used a free GPU from Colab, so I could not train the model to the end and stopped at 75%, but it was clear that the model had not plateaued and was still converging after 20 epochs.
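As a rough illustration of the learning rate scheduler idea, here is a minimal, framework-independent sketch of a cosine decay schedule. The function name `lr_at_epoch` and the values `base_lr` and `min_lr` are assumptions made for this example and are not taken from the notebook. Cosine decay is only one possible choice, and step or exponential decay would serve the same purpose.

```python
import math


def lr_at_epoch(epoch, total_epochs, base_lr=1e-3, min_lr=1e-5):
    """Cosine decay: start at base_lr and fall smoothly to min_lr.

    base_lr and min_lr are illustrative values, not the notebook's settings.
    """
    if total_epochs <= 1:
        return base_lr
    # Fraction of training completed, clamped to [0, 1].
    progress = min(max(epoch, 0), total_epochs - 1) / (total_epochs - 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Print the learning rate every 5 epochs of a 20-epoch run.
for epoch in range(0, 20, 5):
    print(epoch, lr_at_epoch(epoch, total_epochs=20))
```

Most training frameworks can take a function like this through their own scheduler hooks, so the value can be computed once per epoch and handed to the optimizer.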

Transformer.png shows the complete structure of the Transformer, but we use only the encoder block, replacing the feed-forward network with a multilayer perceptron (MLP).
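To make the encoder-only idea concrete, here is a small dependency-free sketch of one encoder block: self-attention followed by an MLP, each wrapped in a residual connection. It is not the notebook's code. The pre-norm layout, the single attention head, the GELU activation, the absence of biases and learnable normalization parameters, and all names and sizes (`encoder_block`, `dim`, `hidden_dim`, `num_tokens`) are assumptions chosen to keep the example short.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a, b):
    """Element-wise sum, used for the residual connections."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def layer_norm(x, eps=1e-5):
    """Normalize each token to zero mean and unit variance (no learned scale/shift)."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out


def softmax(row):
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def gelu(v):
    """Exact GELU activation (an assumed choice for the MLP)."""
    return 0.5 * v * (1.0 + math.erf(v / math.sqrt(2.0)))


def self_attention(x, wq, wk, wv):
    """Single-head scaled dot-product attention over all tokens."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]
    scores = [[s / scale for s in row] for row in matmul(q, k_t)]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v)


def encoder_block(x, p):
    """Pre-norm encoder block: attention + residual, then MLP + residual."""
    x = add(x, self_attention(layer_norm(x), p["wq"], p["wk"], p["wv"]))
    hidden = [[gelu(v) for v in row] for row in matmul(layer_norm(x), p["w1"])]
    return add(x, matmul(hidden, p["w2"]))


def random_matrix(rows, cols, rng):
    """Small random weights, standing in for trained parameters."""
    return [[rng.gauss(0.0, 0.02) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim, hidden_dim, num_tokens = 8, 16, 5  # illustrative sizes only
params = {
    "wq": random_matrix(dim, dim, rng),
    "wk": random_matrix(dim, dim, rng),
    "wv": random_matrix(dim, dim, rng),
    "w1": random_matrix(dim, hidden_dim, rng),
    "w2": random_matrix(hidden_dim, dim, rng),
}
tokens = random_matrix(num_tokens, dim, rng)  # stand-in for embedded image patches
out = encoder_block(tokens, params)
print(len(out), len(out[0]))  # output keeps the input's (tokens, dim) shape
```

Because the block returns the same shape it receives, several such blocks can be stacked one after another, which is how the encoder is usually deepened.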

These are the accuracies computed at around 20 epochs (Accuracies.png).

These are a few visualized examples from the results.
