Grouped-GEMM and non-fp32 redesign for GemmArgs #56

@drunkcoding

Follow-up to #48.

When #48 landed (commit f298c6d), GemmArgs was collapsed to single-shape fp32 because:

  • Every callsite (proxy_cli, cpu/gpu workers, benches, tests) hardcoded group_size = 1.
  • The group_size > 1 branch in CalculateTaskSizes was just a LOG_FATAL.
  • No fp16/bf16/int8 GEMM consumer existed.

The struct now ships with a GemmDtype enum (kFloat32 only) as an explicit extension point, plus a static_assert(sizeof(float) == 4) that makes the 4-byte float assumption visible at compile time. Reopen or pick this up when a real consumer requires either of the following:

  1. Grouped GEMM: variadic per-group shapes (re-introduce the MAX_GEMM_DIM-indexed layout or use a heap-allocated descriptor array).
  2. Non-fp32 operands: fp16 / bf16 / int8 paths. These will need cuBLASXt / MKL backend support and probably a dispatch table keyed on GemmDtype.
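
One possible shape for both items, as a rough sketch only. Everything beyond GemmDtype and kFloat32 is hypothetical and not taken from the repository: GemmShape, GroupedGemmArgs, the extra enum values, RunFloat32, Dispatch and TotalFlops are illustrative names. The descriptor array here is a std::vector (the heap-allocated option from item 1), and the dispatch table is a plain array indexed by the enum value, with nullptr marking dtypes that have no backend yet.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical extended enum: only kFloat32 exists in the current struct.
enum class GemmDtype : std::uint8_t {
  kFloat32 = 0,
  kFloat16,
  kBFloat16,
  kInt8,
  kCount,  // sentinel, used to size the dispatch table
};

// Per-group shape: C (m x n) = A (m x k) * B (k x n).
struct GemmShape {
  std::uint32_t m;
  std::uint32_t n;
  std::uint32_t k;
};

// Hypothetical grouped descriptor: one dtype, variadic per-group shapes.
struct GroupedGemmArgs {
  GemmDtype dtype = GemmDtype::kFloat32;
  std::vector<GemmShape> groups;  // heap-allocated descriptor array
};

// A backend entry point returns true when it accepted the work.
using GemmKernel = bool (*)(const GroupedGemmArgs&);

// Placeholder fp32 backend. A real one would issue one GEMM per group
// through the chosen BLAS library; this stub only checks for non-empty input.
inline bool RunFloat32(const GroupedGemmArgs& args) {
  return !args.groups.empty();
}

constexpr std::size_t kNumDtypes = static_cast<std::size_t>(GemmDtype::kCount);

// Dispatch table keyed on GemmDtype. nullptr means "no backend yet".
const std::array<GemmKernel, kNumDtypes> kDispatch = {{
    RunFloat32,  // kFloat32
    nullptr,     // kFloat16
    nullptr,     // kBFloat16
    nullptr,     // kInt8
}};

// Returns false for unsupported dtypes instead of aborting the process.
inline bool Dispatch(const GroupedGemmArgs& args) {
  const auto idx = static_cast<std::size_t>(args.dtype);
  if (idx >= kNumDtypes || kDispatch[idx] == nullptr) {
    return false;
  }
  return kDispatch[idx](args);
}

// Total multiply-add work across all groups, counted as 2 * m * n * k each.
inline std::uint64_t TotalFlops(const GroupedGemmArgs& args) {
  std::uint64_t total = 0;
  for (const GemmShape& g : args.groups) {
    total += 2ULL * g.m * g.n * g.k;
  }
  return total;
}
```

With this layout, adding a dtype means adding an enum value and filling one table slot. Callers that pass an unsupported dtype get a false return from Dispatch rather than a fatal log, which is a design choice of this sketch and not something the issue prescribes.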

Until then, the single-shape fp32 surface is the documented contract.
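
For reference, the single-shape fp32 contract might look roughly like the sketch below. Only GemmArgs, GemmDtype, kFloat32 and the static_assert come from this issue. The field names (m, n, k, dtype) and the helpers ElementSize and OperandBytes are assumptions made for illustration, not the repository's actual layout.

```cpp
#include <cstddef>
#include <cstdint>

// Extension point: kFloat32 is the only value today.
enum class GemmDtype : std::uint8_t {
  kFloat32 = 0,
};

// Makes the 4-byte float assumption fail loudly at compile time.
static_assert(sizeof(float) == 4, "GemmArgs assumes a 4-byte float");

// Hypothetical single-shape layout: C (m x n) = A (m x k) * B (k x n).
struct GemmArgs {
  std::uint32_t m = 0;
  std::uint32_t n = 0;
  std::uint32_t k = 0;
  GemmDtype dtype = GemmDtype::kFloat32;
};

// Bytes per element for a dtype; returns 0 for values the switch does not know.
constexpr std::size_t ElementSize(GemmDtype dtype) {
  switch (dtype) {
    case GemmDtype::kFloat32:
      return sizeof(float);
  }
  return 0;
}

// Combined size in bytes of the A, B and C operands for one GEMM.
constexpr std::size_t OperandBytes(const GemmArgs& args) {
  return ElementSize(args.dtype) *
         (static_cast<std::size_t>(args.m) * args.k +
          static_cast<std::size_t>(args.k) * args.n +
          static_cast<std::size_t>(args.m) * args.n);
}
```

A 2 x 3 x 4 problem (m = 2, n = 3, k = 4) has 8 + 12 + 6 = 26 elements across the three operands, so OperandBytes reports 104 bytes under the fp32 assumption.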
