How to add Deepspeed Activation Checkpointing to LLM for Fine-Tuning in PyTorch Lightning? #18009
rileyhun asked this question in DDP / multi-GPU / multi-node
I'm trying to enable activation checkpointing for a T5-3b model to significantly free up GPU memory. However, based on the PTL docs, it's not clear how to implement this for a pre-trained LLM from Hugging Face.
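For reference, the closest recipe I can find in the Lightning docs wraps individual submodules in DeepSpeed's checkpointing call and turns on the activation-related flags of DeepSpeedStrategy. The sketch below is my paraphrase of that pattern (I'm assuming the partition_activations / cpu_checkpointing arguments of DeepSpeedStrategy; the toy module and device counts are just illustrative), not something I've gotten working for T5:

```python
import deepspeed
import torch.nn as nn
import pytorch_lightning as pl
from pytorch_lightning.strategies import DeepSpeedStrategy


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.block_1 = nn.Sequential(nn.Linear(32, 32), nn.ReLU())
        self.block_2 = nn.Linear(32, 2)

    def forward(self, x):
        # The docs checkpoint a submodule explicitly: activations of block_1 are
        # freed after the forward pass and recomputed during the backward pass.
        x = deepspeed.checkpointing.checkpoint(self.block_1, x)
        return self.block_2(x)


trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    precision=16,
    strategy=DeepSpeedStrategy(
        stage=3,
        partition_activations=True,  # shard checkpointed activations across GPUs
        cpu_checkpointing=True,      # optionally offload them to CPU memory
    ),
)
```

This makes sense when you write the forward pass yourself, but with a pre-trained T5ForConditionalGeneration the transformer blocks live inside the HuggingFace model, so I don't see where the deepspeed.checkpointing.checkpoint calls are supposed to go, or whether model.gradient_checkpointing_enable() is the intended route instead.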
The full code to reproduce is here: https://github.com/rileyhun/llm_finetuning_metaflow/blob/main/pytorch-deepspeed/src/model_training.py
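A stripped-down sketch of the relevant part of that script (class name, batch keys, data handling, and hyperparameters below are placeholders, not the exact code) looks like this:

```python
import pytorch_lightning as pl
import torch
from pytorch_lightning.strategies import DeepSpeedStrategy
from transformers import T5ForConditionalGeneration


class T5FineTuner(pl.LightningModule):
    def __init__(self, model_name: str = "t5-3b", lr: float = 1e-4):
        super().__init__()
        self.save_hyperparameters()
        self.model = T5ForConditionalGeneration.from_pretrained(model_name)
        # Is this where DeepSpeed activation checkpointing should be hooked in,
        # or does it belong in the strategy / a DeepSpeed config instead?

    def training_step(self, batch, batch_idx):
        outputs = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", outputs.loss)
        return outputs.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)


trainer = pl.Trainer(
    accelerator="gpu",
    devices=4,
    precision=16,
    strategy=DeepSpeedStrategy(stage=3, offload_optimizer=True),
)
# trainer.fit(T5FineTuner(), train_dataloaders=train_loader)  # train_loader is built in the real script
```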
The error I'm getting is as follows:
Any guidance would be greatly appreciated!