Unclear what kind of support would be possible for those hybrid models. We can still give access to layers and mlps, we could just throw an error if people tried to check the attention of one of the mamba layer and detect which layers are attention vs mamba based