Skip to content

v0.6.1

Compare
Choose a tag to compare
@github-actions github-actions released this 11 Sep 21:44
· 790 commits to main since this release
3fd2b0d

Highlights

Model Support

  • Added support for Pixtral (mistralai/Pixtral-12B-2409). (#8377, #8168)
  • Added support for Llava-Next-Video (#7559), Qwen-VL (#8029), Qwen2-VL (#7905)
  • Multi-input support for LLaVA (#8238), InternVL2 models (#8201)

Performance Enhancements

  • Memory optimization for awq_gemm and awq_dequantize, 2x throughput (#8248)

Production Engine

  • Support load and unload LoRA in api server (#6566)
  • Add progress reporting to batch runner (#8060)
  • Add support for NVIDIA ModelOpt static scaling checkpoints. (#6112)

Others

  • Update the docker image to use Python 3.12 for small performance bump. (#8133)
  • Added CODE_OF_CONDUCT.md (#8161)

What's Changed

New Contributors

Full Changelog: v0.6.0...v0.6.1