| HunyuanImage3 | Unified AR + diffusion | Text → Image | ✅ |
| -- | -- | -- | -- |
| Bagel | Unified AR + diffusion | Text → Image | ✅ |

Thanks for the great work!
Can this RL framework also support these two models on Image+Text → Text, i.e. image understanding tasks?