Commit 016f64c
[NVBUG: 5612606] Clear GPU cache during large-model layer quantization at export (#497)
## What does this PR do?
**Type of change:** Bug fix
**Overview:** For large models such as Llama 4 Maverick, converting the stacked weights to fp8 during export can run out of GPU memory (OOM). This change clears the GPU cache during layer quantization to avoid that.
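The diff body is not captured below, so the following is only a minimal sketch of the kind of fix the title and overview describe: releasing cached GPU allocations between per-layer fp8 conversions. The `export_layers_to_fp8` helper, the per-`nn.Linear` loop, and the `torch.float8_e4m3fn` target dtype are illustrative assumptions, not the repository's actual export path; `torch.cuda.empty_cache()` and `gc.collect()` are real PyTorch/Python calls.

```python
# Hedged sketch, not the actual diff: clear the GPU cache after each
# layer's weight conversion so freed-but-cached blocks are returned to
# the allocator before the next (large, stacked) weight is processed.
import gc

import torch
import torch.nn as nn


def export_layers_to_fp8(model: nn.Module) -> dict[str, torch.Tensor]:
    """Convert each Linear layer's weight to fp8, trimming GPU cache per layer.

    The helper name, loop, and fp8 dtype are assumptions for illustration.
    """
    fp8_state: dict[str, torch.Tensor] = {}
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear):
            continue
        # Cast on-device, then move the result to CPU so the GPU copy
        # can be freed immediately.
        fp8_state[name] = module.weight.detach().to(torch.float8_e4m3fn).cpu()
        # Drop dangling Python references and return cached CUDA blocks,
        # preventing OOM when the next stacked weight is materialized.
        gc.collect()
        torch.cuda.empty_cache()
    return fp8_state
```

Without the `empty_cache()` call, PyTorch's caching allocator holds freed blocks for reuse, which inflates peak reserved memory over a long per-layer export loop.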
---------
Signed-off-by: Chenjie Luo <[email protected]>
Signed-off-by: mxin <[email protected]>

1 parent: 3ea6921
1 file changed: +3 additions, −0 deletions
Diff summary (file contents not shown): +1 line at new line 40; +2 lines at new lines 767–768.