Can PyTorch linear layers be optimized using grouped GEMM?

As I understand it, grouped GEMM optimizes a set of matrix multiplications that have different sizes. Can all linear layers in PyTorch be optimized using grouped GEMM?
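
To make the question concrete, here is a minimal sketch of what I understand grouped GEMM to compute: several independent matrix products whose shapes differ from one problem to the next. The helper `grouped_gemm_reference` is a hypothetical name of my own, not a PyTorch API, and the shapes are made up for illustration. It only loops over ordinary matmuls to show the semantics; a real grouped GEMM kernel would presumably compute all of them in a single launch.

```python
import torch

torch.manual_seed(0)

# Three independent problems with different (rows, in_features, out_features).
# These shapes are illustrative assumptions, not taken from any real model.
shapes = [(4, 8, 16), (7, 8, 32), (2, 16, 8)]
layers = [torch.nn.Linear(k, n, bias=False) for _, k, n in shapes]
inputs = [torch.randn(m, k) for m, k, _ in shapes]


def grouped_gemm_reference(xs, ws):
    """Hypothetical reference: one x @ W^T per problem, computed in a loop.

    This shows only what a grouped GEMM is expected to return; it is not
    faster than calling each layer separately.
    """
    return [x @ w.t() for x, w in zip(xs, ws)]


with torch.no_grad():
    outs = grouped_gemm_reference(inputs, [layer.weight for layer in layers])
    expected = [layer(x) for layer, x in zip(layers, inputs)]
```

In this sketch the three layers are independent of each other, so their products could in principle be grouped. What I am unsure about is whether the same applies to linear layers in general.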