Follow-up to #48.
When #48 landed (commit f298c6d), GemmArgs was collapsed to single-shape fp32 because:
- Every callsite (proxy_cli, cpu/gpu workers, benches, tests) hardcoded group_size = 1.
- The group_size > 1 branch in CalculateTaskSizes was LOG_FATAL.
- No fp16/bf16/int8 GEMM consumer existed.
The struct now ships with a GemmDtype enum (kFloat32 only) as an explicit extension point and static_assert(sizeof(float) == 4) to make the assumption observable. Reopen / pick this up when a real consumer requires:
- Grouped GEMM: variadic per-group shapes (re-introduce the MAX_GEMM_DIM-indexed layout or use a heap-allocated descriptor array).
- Non-fp32 operands: fp16 / bf16 / int8 paths. These will need cuBLASXt / MKL backend support and probably a dispatch table keyed on GemmDtype.
Until then, the single-shape fp32 surface is the documented contract.
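
For reference, a minimal sketch of what the current surface and its extension point might look like. This is illustrative only: the field names (m, n, k, dtype) and the helpers ElementSize and OperandBytes are hypothetical and are not taken from the actual GemmArgs definition. Only GemmDtype, kFloat32, and the static_assert come from the description above.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Extension point: only kFloat32 exists today. New dtypes would be added here.
enum class GemmDtype : std::uint8_t { kFloat32 = 0 };

// Makes the "fp32 is 4 bytes" assumption observable at compile time.
static_assert(sizeof(float) == 4, "GemmArgs assumes 32-bit float operands");

// Hypothetical single-shape layout; the real struct's fields may differ.
struct GemmArgs {
  std::size_t m = 0;  // rows of A and C
  std::size_t n = 0;  // columns of B and C
  std::size_t k = 0;  // columns of A, rows of B
  GemmDtype dtype = GemmDtype::kFloat32;
};

// Hypothetical helper: bytes per element, dispatched on GemmDtype.
// A future fp16/bf16/int8 path would add cases here (or a lookup table).
inline std::size_t ElementSize(GemmDtype dtype) {
  switch (dtype) {
    case GemmDtype::kFloat32:
      return sizeof(float);
  }
  throw std::invalid_argument("unsupported GemmDtype");
}

// Hypothetical helper: total bytes for A (m x k), B (k x n), and C (m x n).
inline std::size_t OperandBytes(const GemmArgs& args) {
  const std::size_t elems =
      args.m * args.k + args.k * args.n + args.m * args.n;
  return elems * ElementSize(args.dtype);
}
```

Keeping the dtype switch in one helper means a later non-fp32 path would touch a single dispatch site rather than every callsite, which is the intent behind keeping GemmDtype as an explicit field.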