Describe the bug
Thanks for making it possible to use custom embeddings. FastText is particularly useful for less-resourced languages that Flair's own embeddings do not support yet.
I have an issue with FastText embeddings (binary .bin files). With FastTextEmbeddings the model trains without any problem, but the trained model cannot be loaded back: when training finishes, the trainer reloads best-model.pt for the final test and fails with
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/gj/dvhshvn52d92wrk8r7cqgx9r0000gp/T/tmpglky2zrb/fasttext.model.vectors_vocab.npy'
To rule out other problems in my code, I converted the FastText embeddings to the Gensim format. With those, the model trains and saves successfully.
I am not sure what the problem with FastText is, but there seems to be a bug either in how the embeddings are saved or in how the path to them is resolved when the model is loaded.
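The "saving or pointing to the correct directory" hypothesis fits a known gensim behavior: SaveLoad.save() can store large NumPy arrays in separate files next to the main file, and load() looks for them by deriving their names from the main file's path. The path in the error message (fasttext.model.vectors_vocab.npy inside a temporary directory) has that shape. Whether Flair actually restores only the main file into that directory is an assumption, not something confirmed here. The toy sketch below uses only the standard library: save_with_sidecars and load_with_sidecars are hypothetical helpers, not gensim code, and pickle stands in for NumPy's .npy format. It shows how moving the main file without its sidecars produces this kind of FileNotFoundError.

```python
import json
import pickle
import shutil
import tempfile
from pathlib import Path

# Toy model of gensim-style saving (NOT gensim's actual code): small attributes
# go into the main file, while large arrays go into sidecar files named
# "<main file>.<attribute>.npy". pickle stands in for NumPy's .npy format.


def save_with_sidecars(fname: Path, small: dict, large: dict) -> None:
    """Write the main file plus one sidecar per large attribute (hypothetical helper)."""
    fname.write_text(json.dumps({"small": small, "sidecars": sorted(large)}))
    for attr, value in large.items():
        Path(f"{fname}.{attr}.npy").write_bytes(pickle.dumps(value))


def load_with_sidecars(fname: Path) -> dict:
    """Read the main file, then look for each sidecar next to it (hypothetical helper)."""
    meta = json.loads(fname.read_text())
    obj = dict(meta["small"])
    for attr in meta["sidecars"]:
        # Sidecar paths are derived from the main file's path, so a main file
        # that was moved without its sidecars cannot be loaded.
        obj[attr] = pickle.loads(Path(f"{fname}.{attr}.npy").read_bytes())
    return obj


with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as tmp_dir:
    original = Path(src_dir) / "fasttext.model"
    save_with_sidecars(original, {"vector_size": 300}, {"vectors_vocab": [0.1, 0.2]})
    print(load_with_sidecars(original))
    # → {'vector_size': 300, 'vectors_vocab': [0.1, 0.2]}

    # Suspected failure mode: only the main file is restored into a new temp dir.
    restored = Path(tmp_dir) / "fasttext.model"
    shutil.copy(original, restored)
    try:
        load_with_sidecars(restored)
    except FileNotFoundError as err:
        print("missing:", err.filename)
```

In the second half, the loader asks for restored's own fasttext.model.vectors_vocab.npy, which was never copied. This mirrors the last line of the traceback.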
To Reproduce
from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import WordEmbeddings, StackedEmbeddings, FlairEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
from flair.embeddings import FastTextEmbeddings

columns = {0: 'text', 1: 'pos'}

# this is the folder in which train, test and dev files reside
data_folder = 'datasets'

# init a corpus using column format, data folder and the names of the train, dev and test files
corpus: Corpus = ColumnCorpus(data_folder, columns,
                              train_file='train.txt',
                              test_file='test.txt',
                              dev_file='dev.txt')
len(corpus.train)
print(corpus.train[1].to_tagged_string('pos'))

label_type = 'pos'
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

embeddings_fasttext = FastTextEmbeddings('/Users/sina/Bucket/Embeddings/cc.ckb.300.bin')

model = SequenceTagger(hidden_size=256,
                       embeddings=embeddings_fasttext,
                       tag_dictionary=label_dict,
                       tag_type=label_type)

trainer = ModelTrainer(model, corpus)
trainer.train('models',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=20)
Expected behavior
I expect the trained model to be saved after training and loaded back for the final test without errors.
Logs and Stack traces
2023-08-05 21:56:32,546 Evaluating as a multi-label problem: False
2023-08-05 21:56:32,561 DEV : loss 2.1664223670959473 - f1-score (micro avg) 0.1777
2023-08-05 21:56:32,563 BAD EPOCHS (no improvement): 0
2023-08-05 21:56:32,563 saving best model
2023-08-05 21:56:39,026 ----------------------------------------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/sina/POS/train_pos.py", line 35, in <module>
trainer.train('models',
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/trainers/trainer.py", line 893, in train
final_score = self.final_test(
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/trainers/trainer.py", line 1015, in final_test
self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/models/sequence_tagger_model.py", line 1035, in load
return cast("SequenceTagger", super().load(model_path=model_path))
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/nn/model.py", line 559, in load
return cast("Classifier", super().load(model_path=model_path))
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/nn/model.py", line 198, in load
model = cls._init_model_with_state_dict(state)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/models/sequence_tagger_model.py", line 617, in _init_model_with_state_dict
return super()._init_model_with_state_dict(
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/nn/model.py", line 86, in _init_model_with_state_dict
embeddings = load_embeddings(embeddings)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/base.py", line 227, in load_embeddings
return cls.load_embedding(params)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/base.py", line 97, in load_embedding
embedding = cls.from_params(params)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/token.py", line 1085, in from_params
return cls(**params, embeddings=str(out_path), use_local=True)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/token.py", line 1040, in __init__
self.precomputed_word_embeddings = FastTextKeyedVectors.load(str(embeddings_path))
File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/models/fasttext.py", line 1001, in load
return super(FastTextKeyedVectors, cls).load(fname_or_handle, **kwargs)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/utils.py", line 487, in load
obj._load_specials(fname, mmap, compress, subname)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/models/fasttext.py", line 1005, in _load_specials
super(FastTextKeyedVectors, self)._load_specials(*args, **kwargs)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 263, in _load_specials
super(KeyedVectors, self)._load_specials(*args, **kwargs)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/utils.py", line 529, in _load_specials
val = np.load(subname(fname, attrib), mmap_mode=mmap)
File "/Users/sina/POS/venv/lib/python3.9/site-packages/numpy/lib/npyio.py", line 427, in load
fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/gj/dvhshvn52d92wrk8r7cqgx9r0000gp/T/tmpglky2zrb/fasttext.model.vectors_vocab.npy'
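The last line of the traceback shows gensim looking, inside a temporary directory, for a file named after the model file plus an attribute suffix (fasttext.model.vectors_vocab.npy). If the cause is that only the main model file ends up in that directory, any code that relocates a gensim-saved model also needs to move those .npy files. A minimal stdlib sketch of that idea follows. copy_with_sidecars is a hypothetical helper, not a Flair or gensim function, and it assumes every sidecar matches the <model file>.<attribute>.npy pattern and sits in the same directory as the model file.

```python
import shutil
import tempfile
from pathlib import Path


def copy_with_sidecars(model_file: Path, dst_dir: Path) -> list[Path]:
    """Copy a gensim-style model file together with its '<name>.<attr>.npy' sidecars.

    Hypothetical helper, not part of Flair or gensim. Assumes the sidecars sit
    in the same directory as the model file.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the destination path; wrap it in Path for uniformity.
    copied = [Path(shutil.copy2(model_file, dst_dir))]
    for sidecar in sorted(model_file.parent.glob(f"{model_file.name}.*.npy")):
        copied.append(Path(shutil.copy2(sidecar, dst_dir)))
    return copied


# Usage with throwaway files that mimic the file names from the traceback.
with tempfile.TemporaryDirectory() as src_dir, tempfile.TemporaryDirectory() as dst_dir:
    src = Path(src_dir)
    (src / "fasttext.model").write_bytes(b"main")
    (src / "fasttext.model.vectors_vocab.npy").write_bytes(b"sidecar")
    (src / "other.model.vectors.npy").write_bytes(b"unrelated")
    copied = copy_with_sidecars(src / "fasttext.model", Path(dst_dir))
    print(sorted(p.name for p in copied))
    # → ['fasttext.model', 'fasttext.model.vectors_vocab.npy']
```

Matching sidecars by filename pattern keeps the helper independent of gensim internals. The trade-off is that it would also pick up stale .npy files left over from an earlier save under the same name.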
Screenshots
No response
Additional Context
No response
Environment
Python 3.9