Skip to content

Conversation

@jmercat
Copy link
Collaborator

@jmercat jmercat commented Apr 30, 2024

This allows to change the rotary positional embedding frequency parameter.
This is useful given the more recent approaches:
LLaMA 1&2 used 10000 which is the default value here.
LLaMA 3 uses 500000, Mistral uses 1000000
LLaMA long context extension works usually increase from 10000 to 100000.

@jmercat jmercat requested a review from pouransari April 30, 2024 02:35
@jmercat
Copy link
Collaborator Author

jmercat commented Apr 30, 2024

To test my changes I ran training with a small model and put a breakpoint where the parameter is used in rotary.py. I set a different value both in command line or in the json file and it was correctly passed on. Training was successful and tests passed. (except one in grad_accum and one in tokenize but not due to this PR, just access right issues).

image

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant