
attention_adapter.params.grad is None #28

Open
CoverZhao opened this issue Aug 22, 2024 · 6 comments

@CoverZhao

Hello, author! When I run the source file attention_attr.py, I get the following error:
File "/aiarena/gpfs/label-words-are-anchors/attention_attr.py", line 144, in
saliency = attentionermanger.grad(use_abs=True)[i]
File "/aiarena/gpfs/label-words-are-anchors/icl/analysis/attentioner_for_attribution.py", line 104, in grad
grads.append(self.grad_process(attention_adapter.params.grad,*args,**kwargs))
AttributeError: 'NoneType' object has no attribute 'grad'
What could be causing this?
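
For reference, params.grad is only populated when params requires grad and the backward pass actually reaches it. A minimal diagnostic sketch, assuming only the attention_adapter.params name from the traceback above; the helper itself is hypothetical and not part of the repository:

import torch

def check_adapter(attention_adapter):
    # Hypothetical debugging helper: inspect the tensor named in the traceback.
    p = attention_adapter.params
    if p is None:
        print("params was never created during the forward pass")
        return
    print("requires_grad:", p.requires_grad)  # must be True for a grad to accumulate
    print("is_leaf:", p.is_leaf)              # non-leaf tensors drop .grad unless retain_grad() is called
    print("grad is None:", p.grad is None)    # stays None until backward reaches this tensor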

@leanwang326
Collaborator

The code should run without problems on a single GPU with the default settings; the error may come from using pipeline parallelism.

@CoverZhao
Author

The code should run without problems on a single GPU with the default settings; the error may come from using pipeline parallelism.

I am using a single GPU. When I first ran the code directly, it raised this error:
Traceback (most recent call last):
File "/aiarena/gpfs/label-words-are-anchors/attention_attr.py", line 143, in
loss.backward()
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/_tensor.py", line 492, in backward
torch.autograd.backward(
File "/opt/conda/envs/pytorch/lib/python3.10/site-packages/torch/autograd/init.py", line 251, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Then I found that the loss computed with loss = F.cross_entropy(output['logits'], label) has requires_grad set to False.
So I added a line loss.requires_grad = True, and on the next run I got the attention_adapter.params.grad is None error described above.
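
This behaviour can be reproduced in isolation: forcing requires_grad=True on a loss that has no grad_fn only turns the loss into a fresh leaf tensor, so backward() never reaches the model or the adapter params. A minimal sketch, using torch.no_grad() as a stand-in for whatever prevented the graph from being recorded in the real run:

import torch

w = torch.randn(3, requires_grad=True)   # stands in for attention_adapter.params
with torch.no_grad():                    # no graph is recorded here
    loss = (w * 2).sum()

print(loss.grad_fn)                      # None: the graph was never built
loss.requires_grad = True                # the "fix" described above
loss.backward()                          # runs, but only touches the leaf loss itself
print(w.grad)                            # still None, matching the reported error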

@leanwang326
Collaborator

Maybe torch.no_grad/inference_mode is set somewhere. Or try setting params.requires_grad=True at the point where params is multiplied with the attention weights. Or, if you are using flash_attention, that could be the problem (the current code does not support it; the latest flash attention seems to support multiplying by a mask and the backward pass, so you could adapt it yourself if you need it).
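
Two of these suggestions can be written down concretely. A hedged sketch rather than the repository's own code: the checkpoint name is a placeholder, and attn_implementation is a from_pretrained argument available in recent transformers releases (4.36+), which forces the standard attention path instead of flash attention / SDPA:

import torch
from transformers import AutoModelForCausalLM

# If this assert fails where the forward pass runs, a surrounding
# torch.no_grad() / torch.inference_mode() is the culprit.
assert torch.is_grad_enabled()

# Force the eager attention implementation so the attention weights are
# materialized and the adapter's multiplication stays inside autograd.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",                       # placeholder checkpoint, not the one from the paper's scripts
    attn_implementation="eager",
)

# When the adapter's params are created, make them require grad so the
# multiplication with the attention weights is recorded (shape illustrative only).
params = torch.ones(1, model.config.num_attention_heads, 1, 1, requires_grad=True)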

@CoverZhao
Author

Maybe torch.no_grad/inference_mode is set somewhere. Or try setting params.requires_grad=True at the point where params is multiplied with the attention weights. Or, if you are using flash_attention, that could be the problem (the current code does not support it; the latest flash attention seems to support multiplying by a mask and the backward pass, so you could adapt it yourself if you need it).

It was the flash attention problem. I set up a new environment and that solved it, thanks!

@lilhongxy

Could you share the configuration of the new environment? I am running into a similar problem now.

@CoverZhao
Author

Could you share the configuration of the new environment? I am running into a similar problem now.

I followed the configuration in requirements.txt, with a few small changes:
datasets
ipython==8.11.0
matplotlib==3.7.1
numpy
seaborn==0.12.2
tqdm==4.65.0
transformers==4.37.0
Then I installed torch directly from the official website.
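
For anyone reproducing the fix, a small sanity check of the fresh environment; the assumption, following the thread above, is simply that flash-attn must not be picked up:

import importlib.util
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
# The fix in this thread amounts to flash-attn not being installed, so the
# model falls back to the ordinary attention implementation.
print("flash_attn installed:", importlib.util.find_spec("flash_attn") is not None)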
