Hi,
Thanks a lot for sharing the code with us, interesting work!
I have a question regarding tokenization for GPT-2.
I've seen that you add an EOS token at the end of every sentence in each text example. Here:
def add_eos_tokens(self, text):
    eos_token = " " + self.transformer_tokenizer.eos_token + " "
    sentences = self.sentence_detector.tokenize(text)
    eos_added_text = (
        eos_token.join(sentences) + " " + self.transformer_tokenizer.eos_token
    )
    return eos_added_text
Why do you do this? Wouldn't one at the end of the whole text be sufficient?
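To make the question concrete, here is a minimal standalone sketch (not your actual pipeline) comparing the two options. It assumes the GPT-2 EOS string `<|endoftext|>` is hard-coded rather than read from the tokenizer, and `split_sentences` is a hypothetical, naive regex stand-in for `self.sentence_detector.tokenize`, so real sentence boundaries may differ.

```python
import re

# GPT-2's EOS token string, hard-coded here instead of loading the tokenizer
EOS = "<|endoftext|>"


def split_sentences(text):
    # Hypothetical stand-in for self.sentence_detector.tokenize:
    # naive split on whitespace that follows '.', '!' or '?'
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def add_eos_per_sentence(text):
    # Mirrors add_eos_tokens above: EOS between sentences, plus one at the end
    separator = " " + EOS + " "
    return separator.join(split_sentences(text)) + " " + EOS


def add_eos_once(text):
    # The alternative I am asking about: a single EOS after the whole text
    return text.strip() + " " + EOS


example = "The cat sat. It purred."
print(add_eos_per_sentence(example))  # The cat sat. <|endoftext|> It purred. <|endoftext|>
print(add_eos_once(example))          # The cat sat. It purred. <|endoftext|>
```

With the per-sentence variant, the model sees an EOS token after every sentence instead of only once per example, which is the behaviour I am curious about.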
Thanks a lot for your input :)