Python 3.10.16 (conda-forge)
PyTorch 2.7.0+cu128
CUDA 12.8 + NVIDIA Driver 550+
Forge commit: dfdcbab6
Launch args: --cuda-malloc --cuda-stream --pin-shared-memory --vae-in-fp32
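
To check that a local setup matches the versions listed above, something like the following could be run from the same Python environment. This is only a minimal sketch: `describe_environment` is a hypothetical helper name, not part of Forge, and it reports only what the interpreter and PyTorch expose about themselves. It does not read the Forge commit or the launch arguments.

```python
import platform


def describe_environment():
    """Collect the interpreter version and, if installed, PyTorch/CUDA details."""
    info = {"python": platform.python_version()}
    try:
        import torch  # optional: may not be installed in every environment
    except ImportError:
        info["torch"] = None
        return info
    info["torch"] = torch.__version__          # e.g. a "+cu128" suffix marks a CUDA 12.8 build
    info["cuda_build"] = torch.version.cuda    # CUDA version PyTorch was built against (None on CPU builds)
    info["cuda_available"] = torch.cuda.is_available()  # whether a usable GPU and driver were found
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` comes back false while `cuda_build` is set, the installed driver is a likely place to look first, since the build itself was compiled with CUDA support.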