How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning? #18009

rileyhun · 2023-07-07T03:48:30Z

rileyhun
Jul 7, 2023

I'm trying to enable activation checkpointing for a T5-3b model to significantly free up GPU memory. However, based on the PTL docs, it's not quite clear how to do the implementation for an pre-trained LLM from Huggingface.

Here is the minimal code to reproduce:

class T5FineTuner(pl.LightningModule):
    """PyTorch Lightning T5 Model class"""

    def __init__(self, hparams, tokenizer, model):
        """initiates a PyTorch Lightning T5 Model"""
        super().__init__()
        self.hparams.update(vars(hparams))
        self.save_hyperparameters(self.hparams)

        self.model = model
        self.tokenizer = tokenizer
        self.outputdir = self.hparams.output_dir
        self.average_training_loss = None
        self.average_validation_loss = None
        self.save_only_last_epoch = self.hparams.save_only_last_epoch
    
    def forward(self, input_ids, attention_mask, decoder_attention_mask, labels=None):
                
        return deepspeed.checkpointing.checkpoint(self._forward, input_ids, attention_mask, decoder_attention_mask, labels)
    
    def _forward(self, input_ids, attention_mask, decoder_attention_mask, labels=None):
                
        output = self.model(
            input_ids,
            attention_mask=attention_mask,
            labels=labels,
            decoder_attention_mask=decoder_attention_mask,
        )

        return output.loss, output.logits

    def training_step(self, batch, batch_size):
        """training step"""
        input_ids = batch["source_text_input_ids"]
        attention_mask = batch["source_text_attention_mask"]
        labels = batch["labels"]
        labels_attention_mask = batch["labels_attention_mask"]

        loss, outputs = self(
            input_ids=input_ids,
            attention_mask=attention_mask,
            decoder_attention_mask=labels_attention_mask,
            labels=labels,
        )

        self.log(
            "train_loss",
            loss,
            prog_bar=True,
            logger=True,
            on_epoch=True,
            on_step=True,
            sync_dist=True,
        )
        return loss

Here is the full code to reproduce - https://github.com/rileyhun/llm_finetuning_metaflow/blob/main/pytorch-deepspeed/src/model_training.py

The error I'm getting is as follows:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Any guidance would be greatly appreciated!

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning? #18009

{{title}}

{{editor}}'s edit

{{editor}}'s edit

Replies: 0 comments

Select a reply

How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning? #18009

rileyhun Jul 7, 2023

Replies: 0 comments

rileyhun
Jul 7, 2023