[Bug]: Cannot load pre-trained models after fine-tuning (Transformers) #3543

Open
DhruvSondhi opened this issue Sep 3, 2024 · 1 comment · May be fixed by #3544
Labels: bug (Something isn't working)

DhruvSondhi commented Sep 3, 2024

Describe the bug

Hello,

I was trying to fine-tune an mT5 model (google/mt5 series models) on a custom dataset that follows the text format described in your documentation for the column data loader. I have been trying to figure out what is happening, but I think there is a problem in the way the model is saved and loaded. I am sharing the files I changed (they use the base template of this example).
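For context, the train/dev/test files follow the usual two-column layout that the column data loader expects: one token and one NER tag per line, with a blank line between sentences. A tiny illustrative snippet (the tokens here are made up; the tag set is the one shown in the training log below):

The O
Hubble B-Telescope
Space I-Telescope
Telescope I-Telescope
observed O
NGC B-CelestialObject
1275 I-CelestialObject
. O

We O
thank O
NASA B-Organization
for O
support O
. O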

To Reproduce

run_ner.py (I am trying to reproduce results from this repo: https://github.com/MLlab4CS/Astro-mT5/tree/main)

import inspect
import json
import logging
import os
import sys
from dataclasses import dataclass, field

import torch
from transformers import HfArgumentParser

import flair
from flair import set_seed
from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.trainers import ModelTrainer
from flair.data import Corpus
from flair.datasets import ColumnCorpus

logger = logging.getLogger("flair")
logger.setLevel(level="INFO")


@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "The model checkpoint for weights initialization."},
    )
    layers: str = field(default="-1", metadata={"help": "Layers to be fine-tuned."})
    subtoken_pooling: str = field(
        default="first",
        metadata={"help": "Subtoken pooling strategy used for fine-tuning."},
    )
    hidden_size: int = field(default=256, metadata={"help": "Hidden size for NER model."})
    use_crf: bool = field(default=False, metadata={"help": "Whether to use a CRF on top or not."})


@dataclass
class TrainingArguments:
    num_epochs: int = field(default=10, metadata={"help": "The number of training epochs."})
    batch_size: int = field(default=8, metadata={"help": "Batch size used for training."})
    mini_batch_chunk_size: int = field(
        default=1,
        metadata={"help": "If smaller than batch size, batches will be chunked."},
    )
    learning_rate: float = field(default=5e-05, metadata={"help": "Learning rate"})
    seed: int = field(default=42, metadata={"help": "Seed used for reproducible fine-tuning results."})
    device: str = field(default="cuda:0", metadata={"help": "CUDA device string."})
    weight_decay: float = field(default=0.0, metadata={"help": "Weight decay for optimizer."})
    embeddings_storage_mode: str = field(default="none", metadata={"help": "Defines embedding storage method."})


@dataclass
class FlertArguments:
    context_size: int = field(default=0, metadata={"help": "Context size when using FLERT approach."})
    respect_document_boundaries: bool = field(
        default=False,
        metadata={"help": "Whether to respect document boundaries or not when using FLERT."},
    )


@dataclass
class DataArguments:
    dataset_name: str = field(metadata={"help": "Flair NER dataset name."})
    dataset_arguments: str = field(default="", metadata={"help": "Dataset arguments for Flair NER dataset."})
    output_dir: str = field(
        default="resources/taggers/ner",
        metadata={"help": "Defines output directory for final fine-tuned model."},
    )


def get_flair_corpus(data_args):
    ner_task_mapping = {}

    for name, obj in inspect.getmembers(flair.datasets.sequence_labeling):
        if inspect.isclass(obj):
            if name.startswith("NER") or name.startswith("CONLL") or name.startswith("WNUT"):
                ner_task_mapping[name] = obj

    dataset_args = {}
    dataset_name = data_args.dataset_name

    if data_args.dataset_arguments:
        dataset_args = json.loads(data_args.dataset_arguments)

    if dataset_name not in ner_task_mapping:
        raise ValueError(f"Dataset name {dataset_name} is not a valid Flair datasets name!")

    return ner_task_mapping[dataset_name](**dataset_args)


def main():
    parser = HfArgumentParser((ModelArguments, TrainingArguments, FlertArguments, DataArguments))

    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        (
            model_args,
            training_args,
            flert_args,
            data_args,
        ) = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
    else:
        (
            model_args,
            training_args,
            flert_args,
            data_args,
        ) = parser.parse_args_into_dataclasses()

    set_seed(training_args.seed)

    flair.device = training_args.device

    columns = {0: 'tokens', 1: 'ner'}
    corpus: Corpus = ColumnCorpus('some_directory/astrobert_models/Model_3(mT5)', columns,
                                  train_file='train-80.txt',
                                  test_file='test-10.txt',
                                  dev_file='val-10.txt',
                                  )

    logger.info(corpus)

    tag_type: str = "ner"
    tag_dictionary = corpus.make_label_dictionary(tag_type, add_unk=False)
    logger.info(tag_dictionary)

    embeddings = TransformerWordEmbeddings(
        model=model_args.model_name_or_path,
        layers=model_args.layers,
        subtoken_pooling=model_args.subtoken_pooling,
        fine_tune=True,
        allow_long_sentences=True,
        use_context=flert_args.context_size,
        respect_document_boundaries=flert_args.respect_document_boundaries,
    )

    tagger = SequenceTagger(
        hidden_size=model_args.hidden_size,
        embeddings=embeddings,
        tag_dictionary=tag_dictionary,
        tag_type=tag_type,
        use_crf=model_args.use_crf,
        use_rnn=False,
        allow_unk_predictions=True,
        reproject_embeddings=True,
    )

    trainer = ModelTrainer(tagger, corpus)

    trainer.fine_tune(
        data_args.output_dir,
        learning_rate=training_args.learning_rate,
        mini_batch_size=training_args.batch_size,
        mini_batch_chunk_size=training_args.mini_batch_chunk_size,
        max_epochs=training_args.num_epochs,
        embeddings_storage_mode=training_args.embeddings_storage_mode,
        weight_decay=training_args.weight_decay,
        param_selection_mode=False,
        use_final_model_for_eval=False,
        save_final_model=False,
    )

    torch.save(model_args, os.path.join(data_args.output_dir, "model_args.bin"))
    torch.save(training_args, os.path.join(data_args.output_dir, "training_args.bin"))

    # finally, print model card for information
    tagger.print_model_card()


if __name__ == "__main__":
    main()

The referenced repo fine-tunes the google/mt5-large model, but I am using google/mt5-base, which has the same architecture with fewer parameters.
Also, I am running the code on the add-t5-encoder-support branch.

Expected behavior

The expected behaviour is that these parameters:

param_selection_mode=False,
use_final_model_for_eval=False,
save_final_model=False,

should allow me to save the best model and then evaluate it on the test set, but I am unable to do so.
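For reference, this is roughly the follow-up step I expect to be possible after training; a minimal sketch, assuming the output directory passed on the command line below:

from flair.models import SequenceTagger

# load the checkpoint written during "saving best model"
tagger = SequenceTagger.load("./content/mt5-large/best-model.pt")
# (evaluation on the test split would then follow)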

Logs and Stack traces

Command used to invoke the training (fine-tuning):

python3 run_ner.py --dataset_name NER_MASAKHANE --model_name_or_path google/mt5-base --layers -1 --subtoken_pooling first_last --hidden_size 256 --batch_size 4 --learning_rate 5e-05 --num_epochs 5 --use_crf True --output_dir ./content/mt5-large

Stack trace with the training log:


2024-09-02 22:12:56,024 Reading data from some_directory/astrobert_models/Model_3(mT5)
2024-09-02 22:12:56,024 Train: some_directory/astrobert_models/Model_3(mT5)/train-80.txt
2024-09-02 22:12:56,025 Dev: some_directory/astrobert_models/Model_3(mT5)/val-10.txt
2024-09-02 22:12:56,025 Test: some_directory/astrobert_models/Model_3(mT5)/test-10.txt
2024-09-02 22:13:02,297 Corpus: 2028 train + 226 dev + 251 test sentences
2024-09-02 22:13:02,298 Computing label dictionary. Progress:
2028it [00:00, 22607.38it/s]
2024-09-02 22:13:02,408 Dictionary created for label 'ner' with 31 values: Organization (seen 9269 times), Citation (seen 7050 times), Person (seen 4895 times), Grant (seen 4199 times), Wavelength (seen 3773 times), CelestialObject (seen 3035 times), Formula (seen 2860 times), Model (seen 2531 times), Telescope (seen 1929 times), Location (seen 1817 times), Software (seen 1154 times), Observatory (seen 1036 times), Survey (seen 1034 times), Instrument (seen 912 times), CelestialObjectRegion (seen 619 times), ComputingFacility (seen 496 times), Fellowship (seen 495 times), Dataset (seen 448 times), Collaboration (seen 370 times), EntityOfFutureInterest (seen 347 times)
2024-09-02 22:13:02,408 Dictionary with 31 tags: Organization, Citation, Person, Grant, Wavelength, CelestialObject, Formula, Model, Telescope, Location, Software, Observatory, Survey, Instrument, CelestialObjectRegion, ComputingFacility, Fellowship, Dataset, Collaboration, EntityOfFutureInterest, URL, Archive, Database, TextGarbage, Mission, CelestialRegion, Proposal, Identifier, Tag, ObservationalTechniques, Event
/home/bob2/.local/lib/python3.10/site-packages/transformers/convert_slow_tokenizer.py:560: UserWarning: The sentencepiece tokenizer that you are converting to a fast tokenizer uses the byte fallback option which is not implemented in the fast tokenizers. In practice this means that the fast version of the tokenizer can produce unknown tokens whereas the sentencepiece version would have converted these unknown tokens into a sequence of byte tokens matching the original piece of text.
  warnings.warn(
/home/bob2/.local/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
2024-09-02 22:13:08,487 SequenceTagger predicts: Dictionary with 126 tags: <unk>, O, S-Organization, B-Organization, E-Organization, I-Organization, S-Citation, B-Citation, E-Citation, I-Citation, S-Person, B-Person, E-Person, I-Person, S-Grant, B-Grant, E-Grant, I-Grant, S-Wavelength, B-Wavelength, E-Wavelength, I-Wavelength, S-CelestialObject, B-CelestialObject, E-CelestialObject, I-CelestialObject, S-Formula, B-Formula, E-Formula, I-Formula, S-Model, B-Model, E-Model, I-Model, S-Telescope, B-Telescope, E-Telescope, I-Telescope, S-Location, B-Location, E-Location, I-Location, S-Software, B-Software, E-Software, I-Software, S-Observatory, B-Observatory, E-Observatory, I-Observatory
2024-09-02 22:13:09,364 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Model: "SequenceTagger(
  (embeddings): TransformerWordEmbeddings(
    (model): T5EncoderModel(
      (shared): Embedding(250112, 768)
      (encoder): T5Stack(
        (embed_tokens): Embedding(250112, 768)
        (block): ModuleList(
          (0): T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=768, out_features=768, bias=False)
                  (k): Linear(in_features=768, out_features=768, bias=False)
                  (v): Linear(in_features=768, out_features=768, bias=False)
                  (o): Linear(in_features=768, out_features=768, bias=False)
                  (relative_attention_bias): Embedding(32, 12)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=768, out_features=2048, bias=False)
                  (wi_1): Linear(in_features=768, out_features=2048, bias=False)
                  (wo): Linear(in_features=2048, out_features=768, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
          (1-11): 11 x T5Block(
            (layer): ModuleList(
              (0): T5LayerSelfAttention(
                (SelfAttention): T5Attention(
                  (q): Linear(in_features=768, out_features=768, bias=False)
                  (k): Linear(in_features=768, out_features=768, bias=False)
                  (v): Linear(in_features=768, out_features=768, bias=False)
                  (o): Linear(in_features=768, out_features=768, bias=False)
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
              (1): T5LayerFF(
                (DenseReluDense): T5DenseGatedActDense(
                  (wi_0): Linear(in_features=768, out_features=2048, bias=False)
                  (wi_1): Linear(in_features=768, out_features=2048, bias=False)
                  (wo): Linear(in_features=2048, out_features=768, bias=False)
                  (dropout): Dropout(p=0.1, inplace=False)
                  (act): NewGELUActivation()
                )
                (layer_norm): T5LayerNorm()
                (dropout): Dropout(p=0.1, inplace=False)
              )
            )
          )
        )
        (final_layer_norm): T5LayerNorm()
        (dropout): Dropout(p=0.1, inplace=False)
      )
    )
  )
  (word_dropout): WordDropout(p=0.05)
  (locked_dropout): LockedDropout(p=0.5)
  (embedding2nn): Linear(in_features=1536, out_features=1536, bias=True)
  (linear): Linear(in_features=1536, out_features=128, bias=True)
  (loss_function): ViterbiLoss()
  (crf): CRF()
)"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Corpus: "Corpus: 2028 train + 226 dev + 251 test sentences"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Parameters:
2024-09-02 22:13:09,365  - learning_rate: "0.000050"
2024-09-02 22:13:09,365  - mini_batch_size: "4"
2024-09-02 22:13:09,365  - patience: "3"
2024-09-02 22:13:09,365  - anneal_factor: "0.5"
2024-09-02 22:13:09,365  - max_epochs: "5"
2024-09-02 22:13:09,365  - shuffle: "True"
2024-09-02 22:13:09,365  - train_with_dev: "False"
2024-09-02 22:13:09,365  - batch_growth_annealing: "False"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,365 Model training base path: "content/mt5-large"
2024-09-02 22:13:09,365 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,366 Device: cuda:0
2024-09-02 22:13:09,366 ----------------------------------------------------------------------------------------------------
2024-09-02 22:13:09,366 Embeddings storage mode: none
2024-09-02 22:13:09,366 ----------------------------------------------------------------------------------------------------
2024-09-02 22:14:22,599 epoch 1 - iter 50/507 - loss 5.21869016 - samples/sec: 2.73 - lr: 0.000010
2024-09-02 22:15:31,374 epoch 1 - iter 100/507 - loss 4.76969707 - samples/sec: 2.91 - lr: 0.000020
2024-09-02 22:16:44,454 epoch 1 - iter 150/507 - loss 3.84992501 - samples/sec: 2.74 - lr: 0.000030
2024-09-02 22:17:57,165 epoch 1 - iter 200/507 - loss 3.22765532 - samples/sec: 2.75 - lr: 0.000040
2024-09-02 22:19:07,797 epoch 1 - iter 250/507 - loss 2.81055829 - samples/sec: 2.83 - lr: 0.000049
2024-09-02 22:20:24,791 epoch 1 - iter 300/507 - loss 2.47280144 - samples/sec: 2.60 - lr: 0.000049
2024-09-02 22:21:34,641 epoch 1 - iter 350/507 - loss 2.25822920 - samples/sec: 2.86 - lr: 0.000048
2024-09-02 22:22:49,561 epoch 1 - iter 400/507 - loss 2.06685372 - samples/sec: 2.67 - lr: 0.000047
2024-09-02 22:24:04,744 epoch 1 - iter 450/507 - loss 1.91565943 - samples/sec: 2.66 - lr: 0.000046
2024-09-02 22:25:15,756 epoch 1 - iter 500/507 - loss 1.80107189 - samples/sec: 2.82 - lr: 0.000045
2024-09-02 22:25:23,133 ----------------------------------------------------------------------------------------------------
2024-09-02 22:25:23,133 EPOCH 1 done: loss 1.7909 - lr 0.000045
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [00:37<00:00,  1.50it/s]
2024-09-02 22:26:01,071 Evaluating as a multi-label problem: False
2024-09-02 22:26:01,123 DEV : loss 0.40007856488227844 - f1-score (micro avg)  0.4167
2024-09-02 22:26:01,134 BAD EPOCHS (no improvement): 4
2024-09-02 22:26:01,134 saving best model
2024-09-02 22:26:02,344 ----------------------------------------------------------------------------------------------------
2024-09-02 22:27:14,119 epoch 2 - iter 50/507 - loss 0.66224097 - samples/sec: 2.79 - lr: 0.000043
2024-09-02 22:28:26,077 epoch 2 - iter 100/507 - loss 0.66289136 - samples/sec: 2.78 - lr: 0.000042
2024-09-02 22:29:43,508 epoch 2 - iter 150/507 - loss 0.66188128 - samples/sec: 2.58 - lr: 0.000041
2024-09-02 22:30:56,096 epoch 2 - iter 200/507 - loss 0.64561237 - samples/sec: 2.76 - lr: 0.000040
2024-09-02 22:32:07,025 epoch 2 - iter 250/507 - loss 0.63093977 - samples/sec: 2.82 - lr: 0.000039
2024-09-02 22:33:13,665 epoch 2 - iter 300/507 - loss 0.62267017 - samples/sec: 3.00 - lr: 0.000038
2024-09-02 22:34:27,071 epoch 2 - iter 350/507 - loss 0.61492844 - samples/sec: 2.72 - lr: 0.000037
2024-09-02 22:35:41,670 epoch 2 - iter 400/507 - loss 0.60867990 - samples/sec: 2.68 - lr: 0.000036
2024-09-02 22:36:53,006 epoch 2 - iter 450/507 - loss 0.60102799 - samples/sec: 2.80 - lr: 0.000035
2024-09-02 22:38:06,344 epoch 2 - iter 500/507 - loss 0.59238830 - samples/sec: 2.73 - lr: 0.000034
2024-09-02 22:38:15,044 ----------------------------------------------------------------------------------------------------
2024-09-02 22:38:15,045 EPOCH 2 done: loss 0.5919 - lr 0.000034
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:26<00:00,  1.52s/it]
2024-09-02 22:39:41,854 Evaluating as a multi-label problem: False
2024-09-02 22:39:41,895 DEV : loss 0.258797824382782 - f1-score (micro avg)  0.6063
2024-09-02 22:39:41,907 BAD EPOCHS (no improvement): 4
2024-09-02 22:39:41,907 saving best model
2024-09-02 22:39:50,782 ----------------------------------------------------------------------------------------------------
2024-09-02 22:40:52,674 epoch 3 - iter 50/507 - loss 0.53751011 - samples/sec: 3.23 - lr: 0.000032
2024-09-02 22:42:07,553 epoch 3 - iter 100/507 - loss 0.51292905 - samples/sec: 2.67 - lr: 0.000031
2024-09-02 22:43:15,788 epoch 3 - iter 150/507 - loss 0.52074144 - samples/sec: 2.93 - lr: 0.000030
2024-09-02 22:44:29,978 epoch 3 - iter 200/507 - loss 0.50887246 - samples/sec: 2.70 - lr: 0.000029
2024-09-02 22:45:44,776 epoch 3 - iter 250/507 - loss 0.50465450 - samples/sec: 2.67 - lr: 0.000028
2024-09-02 22:46:53,595 epoch 3 - iter 300/507 - loss 0.49652591 - samples/sec: 2.91 - lr: 0.000027
2024-09-02 22:48:03,269 epoch 3 - iter 350/507 - loss 0.49103096 - samples/sec: 2.87 - lr: 0.000026
2024-09-02 22:49:22,787 epoch 3 - iter 400/507 - loss 0.48587132 - samples/sec: 2.52 - lr: 0.000025
2024-09-02 22:50:40,318 epoch 3 - iter 450/507 - loss 0.47988559 - samples/sec: 2.58 - lr: 0.000024
2024-09-02 22:51:53,871 epoch 3 - iter 500/507 - loss 0.47534172 - samples/sec: 2.72 - lr: 0.000022
2024-09-02 22:52:02,896 ----------------------------------------------------------------------------------------------------
2024-09-02 22:52:02,896 EPOCH 3 done: loss 0.4754 - lr 0.000022
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:27<00:00,  1.53s/it]
2024-09-02 22:53:30,026 Evaluating as a multi-label problem: False
2024-09-02 22:53:30,067 DEV : loss 0.22028639912605286 - f1-score (micro avg)  0.6517
2024-09-02 22:53:30,079 BAD EPOCHS (no improvement): 4
2024-09-02 22:53:30,079 saving best model
2024-09-02 22:53:39,030 ----------------------------------------------------------------------------------------------------
2024-09-02 22:54:58,710 epoch 4 - iter 50/507 - loss 0.42972222 - samples/sec: 2.51 - lr: 0.000021
2024-09-02 22:56:09,934 epoch 4 - iter 100/507 - loss 0.42529253 - samples/sec: 2.81 - lr: 0.000020
2024-09-02 22:57:18,254 epoch 4 - iter 150/507 - loss 0.41949796 - samples/sec: 2.93 - lr: 0.000019
2024-09-02 22:58:35,158 epoch 4 - iter 200/507 - loss 0.41590241 - samples/sec: 2.60 - lr: 0.000018
2024-09-02 22:59:42,396 epoch 4 - iter 250/507 - loss 0.42134116 - samples/sec: 2.97 - lr: 0.000017
2024-09-02 23:00:51,994 epoch 4 - iter 300/507 - loss 0.42124508 - samples/sec: 2.87 - lr: 0.000016
2024-09-02 23:02:06,538 epoch 4 - iter 350/507 - loss 0.41991969 - samples/sec: 2.68 - lr: 0.000015
2024-09-02 23:03:16,007 epoch 4 - iter 400/507 - loss 0.41864415 - samples/sec: 2.88 - lr: 0.000014
2024-09-02 23:04:30,849 epoch 4 - iter 450/507 - loss 0.41877229 - samples/sec: 2.67 - lr: 0.000012
2024-09-02 23:05:43,238 epoch 4 - iter 500/507 - loss 0.41600581 - samples/sec: 2.76 - lr: 0.000011
2024-09-02 23:05:52,670 ----------------------------------------------------------------------------------------------------
2024-09-02 23:05:52,670 EPOCH 4 done: loss 0.4157 - lr 0.000011
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:27<00:00,  1.53s/it]
2024-09-02 23:07:20,127 Evaluating as a multi-label problem: False
2024-09-02 23:07:20,169 DEV : loss 0.20156854391098022 - f1-score (micro avg)  0.6764
2024-09-02 23:07:20,181 BAD EPOCHS (no improvement): 4
2024-09-02 23:07:20,181 saving best model
2024-09-02 23:07:29,094 ----------------------------------------------------------------------------------------------------
2024-09-02 23:08:41,206 epoch 5 - iter 50/507 - loss 0.41014725 - samples/sec: 2.77 - lr: 0.000010
2024-09-02 23:09:55,703 epoch 5 - iter 100/507 - loss 0.40355902 - samples/sec: 2.68 - lr: 0.000009
2024-09-02 23:11:06,169 epoch 5 - iter 150/507 - loss 0.40052907 - samples/sec: 2.84 - lr: 0.000008
2024-09-02 23:12:16,356 epoch 5 - iter 200/507 - loss 0.40273058 - samples/sec: 2.85 - lr: 0.000007
2024-09-02 23:13:28,812 epoch 5 - iter 250/507 - loss 0.39995092 - samples/sec: 2.76 - lr: 0.000006
2024-09-02 23:14:41,129 epoch 5 - iter 300/507 - loss 0.39412877 - samples/sec: 2.77 - lr: 0.000005
2024-09-02 23:15:54,505 epoch 5 - iter 350/507 - loss 0.39045605 - samples/sec: 2.73 - lr: 0.000004
2024-09-02 23:17:07,290 epoch 5 - iter 400/507 - loss 0.39085101 - samples/sec: 2.75 - lr: 0.000002
2024-09-02 23:18:20,001 epoch 5 - iter 450/507 - loss 0.38970339 - samples/sec: 2.75 - lr: 0.000001
2024-09-02 23:19:30,506 epoch 5 - iter 500/507 - loss 0.38807320 - samples/sec: 2.84 - lr: 0.000000
2024-09-02 23:19:42,705 ----------------------------------------------------------------------------------------------------
2024-09-02 23:19:42,705 EPOCH 5 done: loss 0.3880 - lr 0.000000
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 57/57 [01:27<00:00,  1.53s/it]
2024-09-02 23:21:09,993 Evaluating as a multi-label problem: False
2024-09-02 23:21:10,034 DEV : loss 0.19652396440505981 - f1-score (micro avg)  0.6858
2024-09-02 23:21:10,046 BAD EPOCHS (no improvement): 4
2024-09-02 23:21:10,046 saving best model
2024-09-02 23:21:20,453 ----------------------------------------------------------------------------------------------------
2024-09-02 23:21:20,454 loading file content/mt5-large/best-model.pt
Traceback (most recent call last):
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/run_ner.py", line 382, in <module>
    main()
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/run_ner.py", line 363, in main
    trainer.fine_tune(
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/trainers/trainer.py", line 919, in fine_tune
    return self.train(
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/trainers/trainer.py", line 836, in train
    final_score = self.final_test(
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/trainers/trainer.py", line 949, in final_test
    self.model.load_state_dict(self.model.load(base_path / "best-model.pt").state_dict())
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/nn/model.py", line 142, in load
    state = torch.load(f, map_location="cpu")
  File "/home/bob2/.local/lib/python3.10/site-packages/torch/serialization.py", line 1025, in load
    return _load(opened_zipfile,
  File "/home/bob2/.local/lib/python3.10/site-packages/torch/serialization.py", line 1446, in _load
    result = unpickler.load()
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/transformer.py", line 1004, in __setstate__
    embedding = self.create_from_state(saved_config=config, **state)
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/token.py", line 62, in create_from_state
    return cls(**state)
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/token.py", line 49, in __init__
    TransformerEmbeddings.__init__(
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/transformer.py", line 810, in __init__
    self.tokenizer = self._tokenizer_from_bytes(tokenizer_data)
  File "/media/bob2/d8c6a01c-a6c1-4ad3-a8d5-a740f2fa4a7a/home/bob2/_dhruv/astrobert_models/Model_3(mT5)/flair/flair/embeddings/transformer.py", line 335, in _tokenizer_from_bytes
    return AutoTokenizer.from_pretrained(temp_dir, add_prefix_space=True)
  File "/home/bob2/.local/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 880, in from_pretrained
    return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
  File "/home/bob2/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2110, in from_pretrained
    return cls._from_pretrained(
  File "/home/bob2/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2336, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/home/bob2/.local/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5_fast.py", line 120, in __init__
    super().__init__(
  File "/home/bob2/.local/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 124, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/bob2/.local/lib/python3.10/site-packages/transformers/models/t5/tokenization_t5.py", line 151, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/bob2/.local/lib/python3.10/site-packages/sentencepiece/__init__.py", line 367, in Load
    return self.LoadFromFile(model_file)
  File "/home/bob2/.local/lib/python3.10/site-packages/sentencepiece/__init__.py", line 171, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
TypeError: not a string

Screenshots

No response

Additional Context

Please let me know if you need any more context, or perhaps a very small dataset to reproduce this output. Thanks in advance for any assistance.

Environment

Versions:

Flair: 0.13.1
PyTorch: 2.3.1+cu121
Transformers: 4.41.2
GPU: True

DhruvSondhi added the bug label on Sep 3, 2024

helpmefindaname (Collaborator) commented:

Can confirm; a minimal reproducible example is:

from flair.embeddings import TransformerWordEmbeddings
from flair.models import SequenceTagger
from flair.data import Dictionary

file_name = "mt5.pt"

embeddings = TransformerWordEmbeddings("google/mt5-small")
tagger = SequenceTagger(embeddings, Dictionary(), "ner")
tagger.save(file_name)
reload_tagger = SequenceTagger.load(file_name)
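The failure can presumably be narrowed down further to (de)serialization of the embeddings alone, since torch.save/torch.load go through the same __getstate__/__setstate__ path seen in the traceback; an untested sketch:

import pickle

from flair.embeddings import TransformerWordEmbeddings

embeddings = TransformerWordEmbeddings("google/mt5-small")
data = pickle.dumps(embeddings)    # serializes the tokenizer to bytes via __getstate__
restored = pickle.loads(data)      # __setstate__ rebuilds the tokenizer; expected to raise the same TypeError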

helpmefindaname self-assigned this on Sep 6, 2024
helpmefindaname linked a pull request on Sep 6, 2024 that will close this issue