Brief analysis of different tokenizers #79
bigwolfeman started this conversation in Show and tell
3 comments · 4 replies
This is an interesting idea. However, those 8M tokens could happen to favor a certain tokenizer, while the full 1B might be better suited to a different tokenizer. This would need more research, but it's a good start. Thank you for the contribution.



Starcoder2 and Phi-3 are strong standouts.

This was conducted on my local setup using the current training script, run to 8M tokens. The run-to-run variance I have been seeing is about 1.5 s.
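
For anyone who wants a quick way to sanity-check the comparison, here is a minimal sketch (not the script used above; the checkpoint names and the sample file are just examples) that measures how many tokens each tokenizer needs for the same text sample. Fewer tokens per byte means an 8M-token budget covers more raw text.

```python
# Minimal tokenizer comparison sketch. Checkpoint names and sample.txt
# are placeholders; swap in whatever you are actually evaluating.
from transformers import AutoTokenizer

SAMPLE = open("sample.txt", encoding="utf-8").read()  # representative corpus slice

CANDIDATES = [
    "bigcode/starcoder2-3b",              # Starcoder2 tokenizer
    "microsoft/Phi-3-mini-4k-instruct",   # Phi-3 tokenizer
    "gpt2",                               # baseline for reference
]

for name in CANDIDATES:
    tok = AutoTokenizer.from_pretrained(name)
    ids = tok(SAMPLE, add_special_tokens=False)["input_ids"]
    # Bytes per token: higher means the tokenizer compresses the text better,
    # so a fixed token budget (e.g. 8M) sees more of the underlying data.
    bytes_per_token = len(SAMPLE.encode("utf-8")) / len(ids)
    print(f"{name}: {len(ids)} tokens, "
          f"{bytes_per_token:.2f} bytes/token, vocab size {tok.vocab_size}")
```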