Commit 4389efd
Remove caching for attention masks (#2117)
We remove the lru_cache for attention masks because, in the get_attention_mask() function, `and_masks(*mask_mods)` returns a new object (with a different object id) on every call. `create_attention_mask` uses all of its parameters as the cache key, so the fresh object id always causes a cache miss.
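The cache-miss behavior can be reproduced with a minimal, self-contained sketch (stand-ins only, not the torchtitan code): `and_masks` below imitates how `torch.nn.attention.flex_attention.and_masks` returns a fresh callable each time, and `create_attention_mask` is a placeholder for the cached factory.

```python
from functools import lru_cache

def and_masks(*mask_mods):
    # Imitates torch.nn.attention.flex_attention.and_masks: it returns a
    # brand-new callable object on every invocation.
    def combined(b, h, q_idx, kv_idx):
        return all(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined

@lru_cache(maxsize=None)
def create_attention_mask(mask_mod, seq_len):
    # lru_cache hashes *all* arguments, including the mask_mod callable,
    # so two distinct callable objects never share a cache entry.
    return object()  # stand-in for the real BlockMask

def causal(b, h, q_idx, kv_idx):
    return q_idx >= kv_idx

create_attention_mask(and_masks(causal), 128)
create_attention_mask(and_masks(causal), 128)  # new object id -> another miss
print(create_attention_mask.cache_info())      # hits=0, misses=2
```

Because every training step builds a new composed mask_mod, the cache never hits and only grows, so removing it is the simpler fix.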
Before the change (llama3 debugmodel_flex_attn):
![Screenshot 2025-12-09 at 1 27 45 PM](https://github.com/user-attachments/assets/e9af2597-9d94-4478-8136-8b9b8c35d9e6)
After the change:
![Screenshot 2025-12-09 at 1 29 56 PM](https://github.com/user-attachments/assets/756a7d09-b47f-434f-8ff6-40098b265a03)
1 parent a632855
1 file changed: +8 −17 lines