
model.parameters() returns [Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True)] when using ZeRO-3 #6987

Closed
fanfanffff1 opened this issue Jan 31, 2025 · 1 comment
Labels: bug, training


@fanfanffff1

Describe the bug
Printing model.parameters() inside the transformers Trainer returns Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True) for every layer.
What I am actually trying to do is obtain the full model.parameters() under DeepSpeed ZeRO-3 so that I can maintain an EMA model. Could you suggest a way to fix the issue above, or another way to use an EMA model under ZeRO-3?
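
A minimal sketch of the inspection (hypothetical, not the original script; it assumes `model` is the module already wrapped by the Trainer with a ZeRO-3 config):

```python
# Under ZeRO-3 every parameter is partitioned across ranks, so the
# local tensor each rank holds is an empty placeholder. This loop
# illustrates the symptom being reported.
for name, param in model.named_parameters():
    print(name, param)                       # tensor([], device='cuda:0', ...)
    print(param.shape)                       # torch.Size([0]) -- the placeholder
    print(getattr(param, "ds_shape", None))  # the true, unpartitioned shape
```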

System Info
transformers 4.44.2
accelerate 1.2.1
deepspeed 0.12.2
torch 2.2.2
torchaudio 2.2.2
torchvision 0.17.2

Expected behavior
Expected model.parameters() to return the full (gathered) parameters.

fanfanffff1 added the bug and training labels on Jan 31, 2025
@tjruwase
Contributor

@fanfanffff1, please see the following links for how to access and modify training states in ZeRO-3 (a sketch applying both is included after the links):

  1. https://deepspeed.readthedocs.io/en/latest/zero3.html#debugging
  2. https://deepspeed.readthedocs.io/en/latest/zero3.html#modifying-partitioned-states
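
A hedged sketch putting the two links together. Here, update_ema, ema_params, and decay are illustrative names, not DeepSpeed APIs; deepspeed.zero.GatheredParameters and the safe_* helpers are the documented entry points:

```python
import deepspeed
from deepspeed.utils import safe_get_full_fp32_param

# Read access (debugging): temporarily gather the full parameter.
# GatheredParameters is a collective, so call it on all ranks.
def print_full_params(model):
    for name, param in model.named_parameters():
        with deepspeed.zero.GatheredParameters(param):
            print(name, param.data.shape)  # full shape, not torch.Size([0])

# EMA via the partitioned fp32 training state: keep a full fp32 copy
# of each parameter on CPU and decay it toward the current weights.
def update_ema(ema_params, model, decay=0.999):
    # ema_params: dict mapping name -> full fp32 CPU tensor (illustrative)
    for name, param in model.named_parameters():
        full = safe_get_full_fp32_param(param)  # None for non-DeepSpeed-managed params
        if full is None:
            continue
        ema_params[name].mul_(decay).add_(full.cpu(), alpha=1 - decay)
```

To write values back into the partitioned state (e.g. to load the EMA weights for evaluation), the second link documents safe_set_full_fp32_param, the mirror of the getter used above.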
