Similar to zai-org/ChatGLM3#1321, the VisionReward model is incompatible with newer versions of transformers: any version newer than 4.43.0 triggers

```
AttributeError: 'CogVLMVideoForCausalLM' object has no attribute '_extract_past_from_model_output'
```
For ChatGLM3 this was handled in https://huggingface.co/THUDM/chatglm3-6b/commit/67d005d386a01d4825649743f41e90f83edd6094 and subsequent commits. Could the same be done here, so that VisionReward is usable with modern transformers (say, 4.52.1) as well?
Thank you!