Open
Miles
- MXFP8
- NVFP4
SGLang
- MXFP8
- Add mxfp8 support for online quantization, Triton dense linear, and CUTLASS MoE sgl-project/sglang#17449
- [RL] Support per-layer mixed FP8/BF16 serving for FP8 checkpoints sgl-project/sglang#18742
- Add support for FlashInfer mxfp8 sgl-project/sglang#18945
- Expand deep_gemm entrypoint to support more FP8 recipes. sgl-project/sglang#17294
- NVFP4
TransformerEngine
- MXFP8 & NVFP4
- Add `NVTE_BACKWARD_MODE=default|unquant|dequant` NVIDIA/TransformerEngine#2644
  - For high-precision wgrad & dgrad
  - Add selection for dequant or original high-precision input
- Add
FlashInfer