
feat: add DSA cache and PP support#134

Open
zhjc1124 wants to merge 1 commit into taco-project:feat/layerwise_rebase from zhjc1124:feat/layerwise_rebase

Conversation


@zhjc1124 zhjc1124 commented Apr 2, 2026

Summary

This PR adds DSA (Dynamic Sparse Attention) cache support and Pipeline Parallelism (PP) support to FlexKV.

Changes

New Features

DSA/NSA Indexer Cache Support

  • Added a dataclass to hold indexer-specific cache configuration for DSA/NSA sparse attention models (the original class and field names were inline code spans lost in extraction)
  • Extended storage management to maintain separate indexer storage handles for CPU, SSD, and REMOTE devices
  • Extended the GPU-side interface to accept optional indexer GPU blocks
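
Because the original code spans were dropped during extraction, the exact class and field names are unknown; below is a minimal illustrative sketch of what an indexer cache config dataclass might look like. All names (`IndexerCacheConfig`, `index_head_dim`, `index_n_heads`, `index_block_size`) are assumptions, not FlexKV's actual API:

```python
from dataclasses import dataclass

@dataclass
class IndexerCacheConfig:
    # Field names are illustrative assumptions, not FlexKV's real schema.
    index_head_dim: int      # head dimension used by the sparse-attention indexer
    index_n_heads: int       # number of indexer heads
    index_block_size: int    # tokens per indexer cache block
    dtype_bytes: int = 2     # bytes per element (e.g. 2 for bf16/fp16)

def indexer_block_bytes(cfg: IndexerCacheConfig) -> int:
    """Bytes needed to store one indexer cache block (sketch)."""
    return cfg.index_block_size * cfg.index_n_heads * cfg.index_head_dim * cfg.dtype_bytes
```

A config like this would let the CPU, SSD, and REMOTE storage handles size their indexer pools independently of the regular KV cache.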

Pipeline Parallelism (PP) Support

  • Added a parameter so that each PP rank manages only its own layers instead of the full model layer count (the parameter name was an inline code span lost in extraction)
  • Fixed head-count resolution to use the total number of heads (not per-rank heads) so the KV layout is computed correctly
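
The two PP changes above can be sketched as follows. The function names, the even-split-with-remainder convention, and the TP-based head recovery are assumptions for illustration, not FlexKV's actual implementation:

```python
def layers_on_pp_rank(total_layers: int, pp_size: int, pp_rank: int) -> int:
    # A common convention: split layers evenly across pipeline stages and
    # assign the remainder to the earliest ranks. FlexKV's real partitioning
    # scheme may differ.
    base, rem = divmod(total_layers, pp_size)
    return base + (1 if pp_rank < rem else 0)

def total_kv_heads(per_rank_kv_heads: int, tp_size: int) -> int:
    # The head-count fix: size the KV layout from the TOTAL head count.
    # Under tensor parallelism each rank holds heads // tp_size heads, so
    # using the per-rank value would undersize the cache by a factor of tp_size.
    return per_rank_kv_heads * tp_size
```

With this split, every layer is owned by exactly one PP rank, and the cache manager on each rank allocates blocks only for its own slice of the model.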

@YconquestY YconquestY self-requested a review April 2, 2026 08:57