[Bug]: fasttext embeddings don't work #3291

Closed
sinaahmadi opened this issue Aug 6, 2023 · 1 comment · Fixed by #3293
Labels
bug Something isn't working

Comments

@sinaahmadi

Describe the bug

Thanks for making it possible to use custom embeddings. Using FastText is particularly useful for less-resourced languages that are not supported in your own embeddings yet.

I have an issue working with FastText embeddings (binary files). When using FastText, the model trains without any problem but cannot be saved. I get the error FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/gj/dvhshvn52d92wrk8r7cqgx9r0000gp/T/tmpglky2zrb/fasttext.model.vectors_vocab.npy'.

To make sure that there are no other issues with my code, I converted the FastText embeddings to Gensim and could train and save the model successfully.

I am not sure what the problem with FastText is, but there seems to be a bug either in saving the embeddings or in pointing to the correct directory.
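The Gensim conversion mentioned above is not shown in the report. A minimal sketch of one way it could be done is below, assuming Gensim is installed and using its `load_facebook_vectors` function. The helper names `gensim_path_for` and `convert_fasttext_bin`, and the `.gensim` output suffix, are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def gensim_path_for(bin_path: str) -> Path:
    """Pick an output path next to the .bin file (the naming is an assumption)."""
    return Path(bin_path).with_suffix(".gensim")


def convert_fasttext_bin(bin_path: str) -> Path:
    """Load a Facebook fastText .bin file and re-save it in Gensim's native format."""
    # Imported lazily so the path helper above works without Gensim installed.
    from gensim.models.fasttext import load_facebook_vectors

    vectors = load_facebook_vectors(bin_path)  # returns FastTextKeyedVectors
    out_path = gensim_path_for(bin_path)
    # Gensim may also write large arrays to <out_path>.*.npy files alongside.
    vectors.save(str(out_path))
    return out_path
```

Calling `convert_fasttext_bin('/Users/sina/Bucket/Embeddings/cc.ckb.300.bin')` would write `cc.ckb.300.gensim` in the same folder. Any `.npy` files that Gensim writes next to it need to stay beside the main file for a later load to succeed.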

To Reproduce

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import FastTextEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer

columns = {0: 'text', 1: 'pos'}

# this is the folder in which train, test and dev files reside
data_folder = 'datasets'

# init a corpus using column format, data folder and the names of the train, dev and test files
corpus: Corpus = ColumnCorpus(data_folder, columns,
                              train_file='train.txt',
                              test_file='test.txt',
                              dev_file='dev.txt')

print(len(corpus.train))
print(corpus.train[1].to_tagged_string('pos'))

label_type = 'pos'
label_dict = corpus.make_label_dictionary(label_type=label_type)
print(label_dict)

embeddings_fasttext = FastTextEmbeddings('/Users/sina/Bucket/Embeddings/cc.ckb.300.bin')

model = SequenceTagger(hidden_size=256,
                       embeddings=embeddings_fasttext,
                       tag_dictionary=label_dict,
                       tag_type=label_type)

trainer = ModelTrainer(model, corpus)

trainer.train('models',
              learning_rate=0.1,
              mini_batch_size=32,
              max_epochs=20)

Expected behavior

I expect the model to be saved after training.

Logs and Stack traces

2023-08-05 21:56:32,546 Evaluating as a multi-label problem: False
2023-08-05 21:56:32,561 DEV : loss 2.1664223670959473 - f1-score (micro avg)  0.1777
2023-08-05 21:56:32,563 BAD EPOCHS (no improvement): 0
2023-08-05 21:56:32,563 saving best model
2023-08-05 21:56:39,026 ----------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/sina/POS/train_pos.py", line 35, in <module>
    trainer.train('models',
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/trainers/trainer.py", line 893, in train
    final_score = self.final_test(
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/trainers/trainer.py", line 1015, in final_test
    self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/models/sequence_tagger_model.py", line 1035, in load
    return cast("SequenceTagger", super().load(model_path=model_path))
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/nn/model.py", line 559, in load
    return cast("Classifier", super().load(model_path=model_path))
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/nn/model.py", line 198, in load
    model = cls._init_model_with_state_dict(state)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/models/sequence_tagger_model.py", line 617, in _init_model_with_state_dict
    return super()._init_model_with_state_dict(
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/nn/model.py", line 86, in _init_model_with_state_dict
    embeddings = load_embeddings(embeddings)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/base.py", line 227, in load_embeddings
    return cls.load_embedding(params)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/base.py", line 97, in load_embedding
    embedding = cls.from_params(params)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/token.py", line 1085, in from_params
    return cls(**params, embeddings=str(out_path), use_local=True)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/flair/embeddings/token.py", line 1040, in __init__
    self.precomputed_word_embeddings = FastTextKeyedVectors.load(str(embeddings_path))
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/models/fasttext.py", line 1001, in load
    return super(FastTextKeyedVectors, cls).load(fname_or_handle, **kwargs)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/utils.py", line 487, in load
    obj._load_specials(fname, mmap, compress, subname)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/models/fasttext.py", line 1005, in _load_specials
    super(FastTextKeyedVectors, self)._load_specials(*args, **kwargs)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/models/keyedvectors.py", line 263, in _load_specials
    super(KeyedVectors, self)._load_specials(*args, **kwargs)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/gensim/utils.py", line 529, in _load_specials
    val = np.load(subname(fname, attrib), mmap_mode=mmap)
  File "/Users/sina/POS/venv/lib/python3.9/site-packages/numpy/lib/npyio.py", line 427, in load
    fid = stack.enter_context(open(os_fspath(file), "rb"))
FileNotFoundError: [Errno 2] No such file or directory: '/var/folders/gj/dvhshvn52d92wrk8r7cqgx9r0000gp/T/tmpglky2zrb/fasttext.model.vectors_vocab.npy'
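
The traceback shows the loader looking for a companion file, `fasttext.model.vectors_vocab.npy`, inside a temporary directory. One plausible explanation, which the issue itself does not confirm, is that only the main model file reached that directory while the companion array file stayed behind. The sketch below simulates that failure mode with the standard library only. It is not Flair or Gensim code, and every function name in it is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

# Sidecar suffix taken from the traceback; everything else here is a simulation.
SIDECAR_SUFFIX = ".vectors_vocab.npy"


def save_model(main_path: Path) -> None:
    """Write a main file plus one sidecar array file next to it."""
    main_path.write_bytes(b"pickled object")
    Path(str(main_path) + SIDECAR_SUFFIX).write_bytes(b"array data")


def load_model(main_path: Path) -> bytes:
    """Read the main file, then its sidecar; a missing sidecar raises FileNotFoundError."""
    main_path.read_bytes()
    return Path(str(main_path) + SIDECAR_SUFFIX).read_bytes()


def copy_main_only(src: Path, dst_dir: Path) -> Path:
    """Copy just the main file, as code that keeps only its bytes would."""
    return Path(shutil.copy2(src, dst_dir / src.name))


def copy_with_sidecars(src: Path, dst_dir: Path) -> Path:
    """Copy the main file and every file whose name starts with the main file's name."""
    for f in src.parent.glob(src.name + "*"):
        shutil.copy2(f, dst_dir / f.name)
    return dst_dir / src.name


def demo() -> tuple[bool, bool]:
    """Return (main-only copy loads, copy with sidecars loads)."""
    results = []
    for copy in (copy_main_only, copy_with_sidecars):
        with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
            src = Path(a) / "fasttext.model"
            save_model(src)
            try:
                load_model(copy(src, Path(b)))
                results.append(True)
            except FileNotFoundError:
                results.append(False)
    return tuple(results)
```

In this simulation the main-only copy fails with the same kind of `FileNotFoundError` as in the log, while copying the sidecar along with the main file lets the load succeed.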

Screenshots

No response

Additional Context

No response

Environment

Python 3.9

@helpmefindaname
Collaborator

Hi @sinaahmadi,
thank you for your report. Can you check whether #3293 fixes your problem?
