Commit 2e39d71
committed
fix: GPTQ performance for large MoE models
- chunked error compensation: process columns in blocks of 8 with
cross-chunk matmul instead of per-column broadcast to all remaining.
reduces memory traffic ~5x for expert tensors
- expert sub-batching: split 256+ experts into batches of 32 for GPTQ.
reduces peak memory ~8x (3GB+ -> ~400MB per projection) while
producing identical results. batch size configurable via admin UI
- budget plan fallback: when target bpw is unreachable due to
sensitivity tier limits, boost non-expert tensors toward 8-bit
using remaining budget. fixes oQ2 on large MoE models stopping
at 2.68 instead of reaching target 2.81 parent 39f1ee2 commit 2e39d71
10 files changed
Lines changed: 231 additions & 59 deletions
File tree
- omlx
- admin
- i18n
- static/js
- templates/dashboard
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
200 | 205 | | |
201 | 206 | | |
202 | 207 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
200 | 205 | | |
201 | 206 | | |
202 | 207 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
200 | 205 | | |
201 | 206 | | |
202 | 207 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
200 | 205 | | |
201 | 206 | | |
202 | 207 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
197 | 197 | | |
198 | 198 | | |
199 | 199 | | |
| 200 | + | |
| 201 | + | |
| 202 | + | |
| 203 | + | |
| 204 | + | |
200 | 205 | | |
201 | 206 | | |
202 | 207 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
78 | 78 | | |
79 | 79 | | |
80 | 80 | | |
| 81 | + | |
81 | 82 | | |
82 | 83 | | |
83 | 84 | | |
| |||
253 | 254 | | |
254 | 255 | | |
255 | 256 | | |
| 257 | + | |
256 | 258 | | |
257 | 259 | | |
258 | 260 | | |
| |||
324 | 326 | | |
325 | 327 | | |
326 | 328 | | |
| 329 | + | |
327 | 330 | | |
328 | 331 | | |
329 | 332 | | |
| |||
485 | 488 | | |
486 | 489 | | |
487 | 490 | | |
| 491 | + | |
488 | 492 | | |
489 | 493 | | |
490 | 494 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
212 | 212 | | |
213 | 213 | | |
214 | 214 | | |
| 215 | + | |
215 | 216 | | |
216 | 217 | | |
217 | 218 | | |
| |||
3660 | 3661 | | |
3661 | 3662 | | |
3662 | 3663 | | |
| 3664 | + | |
3663 | 3665 | | |
3664 | 3666 | | |
3665 | 3667 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
269 | 269 | | |
270 | 270 | | |
271 | 271 | | |
| 272 | + | |
272 | 273 | | |
273 | 274 | | |
274 | 275 | | |
| |||
2685 | 2686 | | |
2686 | 2687 | | |
2687 | 2688 | | |
| 2689 | + | |
2688 | 2690 | | |
2689 | 2691 | | |
2690 | 2692 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1178 | 1178 | | |
1179 | 1179 | | |
1180 | 1180 | | |
| 1181 | + | |
| 1182 | + | |
| 1183 | + | |
| 1184 | + | |
| 1185 | + | |
| 1186 | + | |
| 1187 | + | |
| 1188 | + | |
| 1189 | + | |
| 1190 | + | |
| 1191 | + | |
| 1192 | + | |
| 1193 | + | |
| 1194 | + | |
| 1195 | + | |
1181 | 1196 | | |
1182 | 1197 | | |
1183 | 1198 | | |
| |||
0 commit comments