AMPLIFY Mega-PR #640
base: main
ff1bf93 to c42eb78
983ae2f to 223e212
223e212 to 1940ee0
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #640      +/-   ##
==========================================
+ Coverage   86.27%   86.37%   +0.10%
==========================================
  Files         118      121       +3
  Lines        7162     7363     +201
==========================================
+ Hits         6179     6360     +181
- Misses        983     1003      +20
ce8233a to 5a849d6
Signed-off-by: Peter St. John <[email protected]>
1542641 to 76fab80
76fab80 to ddbe407
assert_cosine_similarity(
    hf_block_output,
    nemo_block_output,
    attention_mask.cpu(),
    rtol=1e-4,
    atol=1e-4,
    msg=f"Encoder block output {i}",
)
So yeah, I really have no idea what to make of this. The accuracy degrades steadily through each attention block; this test fails after the second block with an error of ~0.01.
I tried using a non-TE attention block, as well as turning off all the fusions, with no luck. Another thing to try would be omitting the mask tokens from the test inputs to see whether that helps figure out where this is coming from.
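One way to narrow this down might be to log the drift per block rather than failing on the first assert. The following is only a rough, framework-free sketch of that idea, not the `assert_cosine_similarity` helper used in this PR: the names `cosine_similarity`, `block_drift` and `first_divergent_block` are hypothetical, block outputs are assumed to be plain nested lists of shape [tokens][hidden], and hidden vectors are assumed to be nonzero.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length, nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def block_drift(ref_block, test_block, mask):
    """Largest (1 - cosine similarity) over tokens whose mask entry is truthy.

    Returns 0.0 when every token is masked out.
    """
    drifts = [
        1.0 - cosine_similarity(ref_tok, test_tok)
        for ref_tok, test_tok, keep in zip(ref_block, test_block, mask)
        if keep
    ]
    return max(drifts, default=0.0)


def first_divergent_block(ref_blocks, test_blocks, mask, tol=1e-4):
    """Index of the first block whose drift exceeds tol, or None if none does."""
    for i, (ref_block, test_block) in enumerate(zip(ref_blocks, test_blocks)):
        if block_drift(ref_block, test_block, mask) > tol:
            return i
    return None
```

Running this once with the full attention mask and once with the mask-token positions also zeroed out in `mask` would show whether the drift is concentrated at those positions or spread across all tokens.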