I'm getting a KeyError while selecting sentences for the training set (error message below):
```
[INFO] 2021-10-20 04:32:44,511 - pipeline - Finished selecting sentences for dev set.
INFO:pipeline:Finished selecting sentences for dev set.
[INFO] 2021-10-20 04:32:44,512 - pipeline - Starting selecting sentences for training set...
INFO:pipeline:Starting selecting sentences for training set...
100%|███████████████████████████████████████████████████████████████████████████████████████████| 145449/145449 [03:46<00:00, 642.38it/s]
Traceback (most recent call last):
  File "src/scripts/athene/pipeline.py", line 196, in <module>
    sentence_retrieval_ensemble(logger, args.mode)
  File "src/scripts/athene/pipeline.py", line 138, in sentence_retrieval_ensemble
    sentence_retrieval_ensemble_entrance(_args)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/ensemble.py", line 265, in entrance
    random_seed=args.random_seed, reserve_embed=args.reserve_embed)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 33, in __init__
    self.data_pipeline()
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 69, in data_pipeline
    self.test_indexes = self.predict_indexes_loader(test_indexes_path, tests)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 439, in predict_indexes_loader
    predicts_indexes = self.predict_data_indexes(predict_data, self.iword_dict)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 400, in predict_data_indexes
    sent_index = self.sent_2_index(sent, word_dict, self.s_max_length)
  File "/mnt/userName/FEVER/FEVER1.0/fever-2018-team-athene/src/athene/retrieval/sentences/data_processing/data.py", line 376, in sent_2_index
    word_indexes.append(word_dict[word.lower()])
KeyError: 'wedgwood'
```
I think the problem comes from the word dictionary, which is generated from train_sample.p.
Since train_sample.p is generated from the negatively sampled training dataset, its vocabulary does not include every word in the training data, so the lookup `word_dict[word.lower()]` in `sent_2_index` fails for words such as 'wedgwood'.
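To illustrate the failure mode: a dictionary built only from a sampled subset of sentences has no entry for words that appear elsewhere, so a plain `word_dict[word.lower()]` lookup raises `KeyError`. This is a minimal sketch with a hypothetical toy corpus and whitespace tokenization; `build_word_dict`, `sent_to_index_strict`, and `sent_to_index_safe` are illustrative names, not functions from the Athene code base.

```python
# Hypothetical toy corpus standing in for the full training data.
full_training_sentences = [
    "the wedgwood company makes pottery",
    "paris is the capital of france",
    "the moon orbits the earth",
]


def build_word_dict(sentences):
    """Map each lowercased token to an integer index, starting at 1."""
    word_dict = {}
    for sent in sentences:
        for word in sent.split():  # assumption: whitespace tokenization
            word_dict.setdefault(word.lower(), len(word_dict) + 1)
    return word_dict


def sent_to_index_strict(sent, word_dict):
    # Same lookup pattern as sent_2_index in the traceback: raises KeyError for unseen words.
    return [word_dict[word.lower()] for word in sent.split()]


def sent_to_index_safe(sent, word_dict, unk_index):
    # Maps every unseen word to one shared "unknown" index instead of raising.
    return [word_dict.get(word.lower(), unk_index) for word in sent.split()]


# Simulate a vocabulary built from a sampled subset that happens to skip the first sentence.
word_dict = build_word_dict(full_training_sentences[1:])
unk_index = len(word_dict) + 1

try:
    sent_to_index_strict(full_training_sentences[0], word_dict)
except KeyError as err:
    print("missing word:", err)  # missing word: 'wedgwood'

print(sent_to_index_safe(full_training_sentences[0], word_dict, unk_index))  # [3, 10, 10, 10, 10]
```

The `.get` fallback avoids the crash but collapses all unseen words onto one index, which only makes sense if the embedding matrix reserves a row for it. Building the dictionary from every split, as the `get_complete_words(words_dict_path, X_train, devs, tests)` call below appears intended to do, keeps distinct indices instead.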
I solved this problem by changing data.py from
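```python
# Load the cached vocabulary if it exists; otherwise build it from train, dev, and test data.
words_dict_path = os.path.join(self.embedding_path, "words_dict.p")
if os.path.exists(words_dict_path):
    # An existing cache is reused as-is, even if it lacks words from the current data.
    with open(words_dict_path, "rb") as f:
        self.word_dict = pickle.load(f)
else:
    self.word_dict = self.get_complete_words(words_dict_path, X_train, devs, tests)
```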
to a version that updates the dictionary every time.
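Rather than rebuilding unconditionally, one option is to reuse the cached pickle only when it covers every word in the current splits and to rebuild it otherwise. This is a minimal sketch, assuming whitespace tokenization and plain lists of sentences per split; `build_word_dict`, `vocab_covers`, and `load_or_rebuild_word_dict` are hypothetical helpers, not the reporter's change or functions from the repository.

```python
import os
import pickle
import tempfile


def build_word_dict(*corpora):
    """Build a lowercased word -> index map from every split passed in."""
    word_dict = {}
    for corpus in corpora:
        for sent in corpus:
            for word in sent.split():  # assumption: whitespace tokenization
                word_dict.setdefault(word.lower(), len(word_dict) + 1)
    return word_dict


def vocab_covers(word_dict, *corpora):
    """Return True if every lowercased token in the corpora has an entry."""
    return all(
        word.lower() in word_dict
        for corpus in corpora
        for sent in corpus
        for word in sent.split()
    )


def load_or_rebuild_word_dict(path, *corpora):
    """Reuse the pickled dictionary only if it covers all corpora; otherwise rebuild and re-save it."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            word_dict = pickle.load(f)
        if vocab_covers(word_dict, *corpora):
            return word_dict  # cache is still valid for these splits
    word_dict = build_word_dict(*corpora)
    with open(path, "wb") as f:
        pickle.dump(word_dict, f)
    return word_dict


# Demo: a stale cache built from training data only is rebuilt once test data adds new words.
train = ["paris is the capital"]
tests = ["the wedgwood company"]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "words_dict.p")
    with open(path, "wb") as f:
        pickle.dump(build_word_dict(train), f)  # stale: no test-set words
    word_dict = load_or_rebuild_word_dict(path, train, tests)
    print("wedgwood" in word_dict)  # True
```

Rebuilding assigns indices by first occurrence, so a rebuilt dictionary may number words differently from the old one; any other artifacts cached under the old indices (for example index files or embedding matrices) would then need to be regenerated as well.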
Does my solution look fine?