
model.parameters() returns [Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True)] when using ZeRO-3 #6987

Closed
fanfanffff1 opened this issue Jan 31, 2025 · 1 comment
Labels: bug, training


@fanfanffff1

Describe the bug
Printing model.parameters() inside the transformers Trainer returns Parameter containing: tensor([], device='cuda:0', dtype=torch.bfloat16, requires_grad=True) for every layer.
What I am actually trying to do is obtain the full model.parameters() under DeepSpeed ZeRO-3 so that I can maintain an EMA model. Could you suggest a way to fix the issue above, or another way to use an EMA model under ZeRO-3?
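
A minimal sketch of the inspection (hypothetical, not the original script; it assumes `model` is the module already wrapped by the Trainer with a ZeRO-3 config):

```python
# Under ZeRO-3 every parameter is partitioned across ranks, so the
# local tensor each rank holds is an empty placeholder. This loop
# illustrates the symptom being reported.
for name, param in model.named_parameters():
    print(name, param)                       # tensor([], device='cuda:0', ...)
    print(param.shape)                       # torch.Size([0]) -- the placeholder
    print(getattr(param, "ds_shape", None))  # the true, unpartitioned shape
```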

System Info
transformers 4.44.2
accelerate 1.2.1
deepspeed 0.12.2
torch 2.2.2
torchaudio 2.2.2
torchvision 0.17.2

Expected behavior
Expected model.parameters() to return the full (gathered) parameters.

fanfanffff1 added the bug and training labels on Jan 31, 2025
@tjruwase
Contributor

@fanfanffff1, please see the following links for how to access and modify training states in ZeRO-3 (a sketch applying both is included after the links):

  1. https://deepspeed.readthedocs.io/en/latest/zero3.html#debugging
  2. https://deepspeed.readthedocs.io/en/latest/zero3.html#modifying-partitioned-states
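
A hedged sketch putting the two links together. Here, update_ema, ema_params, and decay are illustrative names, not DeepSpeed APIs; deepspeed.zero.GatheredParameters and the safe_* helpers are the documented entry points:

```python
import deepspeed
from deepspeed.utils import safe_get_full_fp32_param

# Read access (debugging): temporarily gather the full parameter.
# GatheredParameters is a collective, so call it on all ranks.
def print_full_params(model):
    for name, param in model.named_parameters():
        with deepspeed.zero.GatheredParameters(param):
            print(name, param.data.shape)  # full shape, not torch.Size([0])

# EMA via the partitioned fp32 training state: keep a full fp32 copy
# of each parameter on CPU and decay it toward the current weights.
def update_ema(ema_params, model, decay=0.999):
    # ema_params: dict mapping name -> full fp32 CPU tensor (illustrative)
    for name, param in model.named_parameters():
        full = safe_get_full_fp32_param(param)  # None for non-DeepSpeed-managed params
        if full is None:
            continue
        ema_params[name].mul_(decay).add_(full.cpu(), alpha=1 - decay)
```

To write values back into the partitioned state (e.g. to load the EMA weights for evaluation), the second link documents safe_set_full_fp32_param, the mirror of the getter used above.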
