
Commit

Fix: ScaledDotProductAttention.forward throws AttributeError when executed within `enable_torch_sdp`
dallmann-uniwue committed Sep 30, 2024
1 parent e7e7e9d commit 48f1878
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion in curated_transformers/layers/attention.py

@@ -708,7 +708,7 @@ def forward(
            key=key,
            value=value,
            attn_mask=logit_mask,
-           dropout_p=self.dropout_prob if self.training else 0.0,
+           dropout_p=self.dropout.p if self.training else 0.0,
        )

        # Torch SDP returns NaNs for pieces where every piece is masked out.
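The root cause is that the attention layer stores its dropout as an `nn.Dropout` submodule, which keeps its probability in the attribute `p`; there is no `dropout_prob` attribute on the layer, so reading it raises `AttributeError` the moment the Torch SDP code path runs. A minimal sketch of the pattern (a hypothetical stand-in class, not the library's actual `ScaledDotProductAttention`) looks like this:

```python
import torch
from torch.nn.functional import scaled_dot_product_attention


class SDPAttention(torch.nn.Module):
    """Hypothetical minimal attention module illustrating the fix."""

    def __init__(self, dropout_prob: float = 0.1):
        super().__init__()
        # The probability is held by the nn.Dropout submodule as `.p`.
        self.dropout = torch.nn.Dropout(dropout_prob)

    def forward(self, query, key, value):
        return scaled_dot_product_attention(
            query,
            key,
            value,
            # Fixed access: read `self.dropout.p`, not the non-existent
            # `self.dropout_prob`, and disable dropout outside training.
            dropout_p=self.dropout.p if self.training else 0.0,
        )


attn = SDPAttention().eval()  # eval mode, so dropout_p resolves to 0.0
q = k = v = torch.randn(1, 2, 4, 8)  # (batch, heads, seq, head_dim)
out = attn(q, k, v)
```

Because `nn.Module` only exposes attributes that were actually registered, the pre-fix read of `self.dropout_prob` fails with `AttributeError` exactly as the commit title describes, while `self.dropout.p` returns the configured probability.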
