
Subtle inconsistency found while reproducing GPT in eager mode #198

Open
dangkai4u opened this issue Sep 13, 2021 · 5 comments

Comments

@dangkai4u
Contributor

dangkai4u commented Sep 13, 2021

A subtle inconsistency in the gelu function

>>> import math
>>> import numpy as np
>>> import torch
>>> import oneflow as flow

>>> def pt_gelu(x):
...     """
...     Implementation of the GELU activation function currently in Google BERT repo (identical to OpenAI GPT). Also see
...     the Gaussian Error Linear Units paper: https://arxiv.org/abs/1606.08415
...     """
...     return 0.5 * x * (1.0 + torch.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * torch.pow(x, 3.0))))

>>> gelu = flow.nn.GELU()
>>> x = np.array([-5, 10, 105]).astype(np.float32)

>>> pt_input = torch.from_numpy(x)
>>> of_input = flow.Tensor(x)

>>> print(pt_gelu(pt_input))
tensor([-2.9802e-07,  1.0000e+01,  1.0500e+02])

>>> print(gelu(of_input))
tensor([-1.4901e-06,  1.0000e+01,  1.0500e+02], dtype=oneflow.float32)

>>> print(torch.nn.GELU()(pt_input))
tensor([-1.4333e-06,  1.0000e+01,  1.0500e+02])
@MARD1NO
Contributor

MARD1NO commented Sep 13, 2021

Is this a custom gelu? Our unit tests do compare against the torch version of gelu.

If possible, could you try torch.nn.GELU and check the result?

@dangkai4u
Contributor Author

> Is this a custom gelu? Our unit tests do compare against the torch version of gelu.
>
> If possible, could you try torch.nn.GELU and check the result?

This is the gelu implemented in the transformers library. There are several implementation variants of this activation function, but as far as I can tell, our formula is the same as this one. I just tried torch.nn.GELU, and its result is also different.

@MARD1NO
Contributor

MARD1NO commented Sep 13, 2021

Could you paste your result? On my Mac CPU with PyTorch 1.7.1 I get:

import torch
import numpy as np

x = torch.Tensor(np.array([-5, 10, 105]))
gelu = torch.nn.GELU()
out = gelu(x)
print(out)

tensor([-1.4333e-06,  1.0000e+01,  1.0500e+02])

@dangkai4u
Contributor Author

Sorry for the late reply; GitHub is very slow on our lab network. I get the same result with the same version, and have updated it above. Looking at the PyTorch docs, their implementation apparently does not use the tanh approximation.
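To make the difference concrete: the exact GELU is x·Φ(x) with Φ the Gaussian CDF (computable via erf), while the BERT/GPT repos use a tanh approximation. A minimal pure-Python sketch comparing the two (function names here are my own, for illustration):

```python
import math

def gelu_exact(x):
    # Exact GELU: x * Phi(x), with the Gaussian CDF written via erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation used in the Google BERT / OpenAI GPT repos
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

# The two agree to many digits near zero, but for inputs like -5 the
# relative difference is visible at float32 precision
for v in (-5.0, 0.5, 10.0):
    print(v, gelu_exact(v), gelu_tanh(v))
```

This would explain why pt_gelu (tanh form) and torch.nn.GELU (exact form, by default) disagree in the last few bits for x = -5, while both saturate to the identity for large positive inputs.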

@MARD1NO
Contributor

MARD1NO commented Sep 13, 2021

Right, this really is a fine-grained detail; we probably never noticed this subtle difference in our earlier work.
