Hi authors, can we use `--mtp-num-layers 1 --enable-mtp-training` to train glm-4.7-flash in the RL stage?