Add support for TurboQuant kv cache, with custom fused kernel for MLA by jondurbin · Pull Request #13 · chutesai/sglang

jondurbin · 2026-03-27T16:02:24Z

Motivation

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to "Format code with pre-commit".
Add unit tests according to "Run and add unit tests".
Update documentation according to "Write documentations".
Provide accuracy and speed benchmark results according to "Test the accuracy" and "Benchmark the speed".
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

Cherry-picked from upstream PR sgl-project#21419. Implements Google's TurboQuant algorithm for KV cache quantization, enabled via --kv-cache-dtype turboquant, achieving up to 4.92x compression with near-zero accuracy loss. Reference: Zandieh et al., "TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate" (arXiv:2504.19874).
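
For readers unfamiliar with the approach, here is a minimal pure-Python sketch of the general idea behind rotation-based KV quantization: normalize a vector, apply a random rotation so its coordinates look roughly Gaussian, then snap each coordinate to a small scalar codebook. It is not the code in this PR and not the fused kernel. Several things are assumptions made for illustration: a randomized Hadamard transform stands in for a dense random rotation, the codebook is the approximate 8-level Lloyd-Max set for a unit Gaussian, any residual correction stage is omitted, and the names `fwht`, `quantize`, `dequantize`, and `compression_ratio` are hypothetical.

```python
import math
import random

# Approximate 8-level (3-bit) Lloyd-Max centroids for a unit Gaussian.
# Assumption for illustration; the PR's actual codebook is not shown in the source.
CENTROIDS = [-2.152, -1.344, -0.756, -0.2451, 0.2451, 0.756, 1.344, 2.152]


def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = list(x)
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in x]


def quantize(vec, signs):
    """Return (norm, codes): one float norm plus one 3-bit code per element."""
    n = len(vec)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    # Random sign flips followed by Hadamard act as a cheap random rotation.
    rotated = fwht([s * v / norm for s, v in zip(signs, vec)])
    # Coordinates of a rotated unit vector have variance about 1/n,
    # so scale by sqrt(n) before matching against unit-Gaussian centroids.
    scale = math.sqrt(n)
    codes = [
        min(range(len(CENTROIDS)), key=lambda k: abs(CENTROIDS[k] - r * scale))
        for r in rotated
    ]
    return norm, codes


def dequantize(norm, codes, signs):
    """Invert quantize(): look up centroids, undo the rotation, restore the norm."""
    n = len(codes)
    rotated = [CENTROIDS[c] / math.sqrt(n) for c in codes]
    # The orthonormal Hadamard transform is its own inverse.
    unrotated = fwht(rotated)
    return [norm * s * v for s, v in zip(signs, unrotated)]


def compression_ratio(code_bits, dim, norm_bits=32, baseline_bits=16):
    """Baseline bits per element divided by quantized bits per element.

    Assumes one norm of norm_bits is stored per dim-element vector.
    """
    return baseline_bits / (code_bits + norm_bits / dim)


if __name__ == "__main__":
    random.seed(0)
    dim = 128
    signs = [random.choice((-1.0, 1.0)) for _ in range(dim)]
    vec = [random.gauss(0.0, 1.0) for _ in range(dim)]
    norm, codes = quantize(vec, signs)
    rec = dequantize(norm, codes, signs)
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(vec, rec))) / norm
    print(f"relative reconstruction error: {err:.3f}")
    print(f"compression vs 16-bit: {compression_ratio(3, dim):.2f}x")
```

Under the stated assumptions (3-bit codes plus one 32-bit norm per 128-element vector, compared with a 16-bit baseline), `compression_ratio(3, 128)` works out to 16 / 3.25, roughly 4.92. The PR description does not say how its 4.92x figure is derived, so treat the match as suggestive rather than confirmed.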

jondurbin added 5 commits March 27, 2026 14:09

WIP: TurboQuant, but also with support for MLA

5945c39

TurboQuant DeepGEMM compatibility.

e84c979

NSA (ds 3.2) fused kernel (in theory).

c52ceba

MHA fused

997eea8

jondurbin force-pushed the turboquant branch from 28afc54 to 997eea8 Compare March 27, 2026 18:31

jondurbin added 11 commits March 27, 2026 16:01

fixes/benchmark scripts

17160bc

prealloc tensor for centroids tensor

b43edd5

ds32 test updates

6d8e31c

Fixes.

ab354de

Fewer kernel launches.

55d666f

async-compress

89c4858

Fixes

4dede91

TQ perf tweaks.

ff30fdf

Fix _use_fused_dequant

beec6a1

Configurable.

ed8370b

Fixes.

0436c6d

jondurbin wants to merge 16 commits into chutes from turboquant
