Commit 7436674
committed
ops: prioritize mem transfer
The async offload streams reason for existence is to transfer from
RAM to GPU. The post processing compute steps are a bonus on the side
stream, but if the compute stream is running a long kernel, it can
stall the side stream, as it wait to type-cast the bias before
transferring the weight. So do a pure xfer of the weight straight up,
then do everything bias, then go back to fix the weight type and do
weight patches.1 parent fcdb4a5 commit 7436674
1 file changed
+11
-7
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
95 | 95 | | |
96 | 96 | | |
97 | 97 | | |
98 | | - | |
99 | 98 | | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
100 | 106 | | |
101 | | - | |
102 | | - | |
| 107 | + | |
103 | 108 | | |
104 | | - | |
| 109 | + | |
105 | 110 | | |
106 | 111 | | |
107 | 112 | | |
108 | 113 | | |
109 | | - | |
110 | | - | |
111 | | - | |
| 114 | + | |
| 115 | + | |
112 | 116 | | |
113 | 117 | | |
114 | 118 | | |
| |||
0 commit comments