A question about eos_token_id #25

Open
Harry-Miral opened this issue Feb 28, 2025 · 3 comments

Comments

@Harry-Miral

Harry-Miral commented Feb 28, 2025

Hi, OLMoE Team,
Thank you for your work, which makes it possible for more people to get involved.
In the README I saw that tokenizer.eos_token_id is set to 50279, but in the .yml configuration file I found that eos_token_id is 0. How should I understand this difference? Is it related to the vocab_size of 50280? Thank you!

@Muennighoff
Collaborator

I'm not entirely sure, but I think it makes no difference in practice. Maybe @soldni could chime in here?

For reference here is the tokenizer vocab https://huggingface.co/allenai/OLMoE-1B-7B-0125-Instruct/raw/main/tokenizer.json
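
One way to check which ID a given token string has in a file like this is to look it up in the parsed JSON. The following is only a minimal sketch, assuming the usual Hugging Face `tokenizers` layout (an `added_tokens` list with `id` and `content` fields, plus a `model.vocab` mapping). The helper name `find_token_id` and the sample data are hypothetical and are not taken from the OLMoE file.

```python
import json


def find_token_id(tokenizer_json, token):
    """Return the ID of `token` in a parsed tokenizer.json dict, or None."""
    # Special tokens are usually listed under "added_tokens".
    for entry in tokenizer_json.get("added_tokens", []):
        if entry.get("content") == token:
            return entry.get("id")
    # Fall back to the main vocabulary (token string -> id) when it is a mapping.
    vocab = tokenizer_json.get("model", {}).get("vocab", {})
    if isinstance(vocab, dict):
        return vocab.get(token)
    return None


# Tiny made-up example, NOT the real OLMoE tokenizer contents.
sample = json.loads(
    '{"added_tokens": [{"id": 3, "content": "<eos>"}],'
    ' "model": {"vocab": {"a": 0, "b": 1}}}'
)
print(find_token_id(sample, "<eos>"))
print(find_token_id(sample, "b"))
```

To use it on the real file, download the JSON from the link above, parse it with `json.load`, and pass in the EOS token string that the tokenizer reports.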

@Harry-Miral
Author

Harry-Miral commented Mar 4, 2025

I think I found the problem. Different model versions need different settings in the configuration file, and for version 0924 the .yml configuration file needs to be modified.
The .yml file for version 0924 should set eos_token_id: 50279
I left the same answer in another issue:
https://github.com/allenai/OLMo/issues/757#issuecomment-2695972834
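
A quick sanity check for this kind of mismatch can be sketched as below. This is only an illustration under stated assumptions: the function name `check_eos` is hypothetical, and the three numbers are the ones quoted in this thread (0 from the .yml file, 50279 from the README, and a vocab_size of 50280), passed in by hand rather than read from any real config.

```python
from typing import List


def check_eos(config_eos_id: int, tokenizer_eos_id: int, vocab_size: int) -> List[str]:
    """Return human-readable problems found; an empty list means consistent."""
    problems = []
    # Valid token IDs run from 0 to vocab_size - 1.
    if not 0 <= config_eos_id < vocab_size:
        problems.append(
            f"config eos_token_id {config_eos_id} is outside [0, {vocab_size - 1}]"
        )
    # The training config and the tokenizer should agree on the EOS ID.
    if config_eos_id != tokenizer_eos_id:
        problems.append(
            f"config eos_token_id {config_eos_id} != tokenizer eos_token_id {tokenizer_eos_id}"
        )
    return problems


# Values quoted in this thread, entered by hand.
print(check_eos(0, 50279, 50280))      # original .yml value
print(check_eos(50279, 50279, 50280))  # proposed .yml value
```

With a vocab_size of 50280 the largest valid ID is 50279, so both 0 and 50279 are in range. The range check alone would not catch the mismatch, which is why the sketch also compares the two IDs directly.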

@soldni
Member

soldni commented Mar 4, 2025

hey @Harry-Miral, I think you are right!
