kvcache framework restructure by Adrenaline-S · Pull Request #1139 · alibaba/rtp-llm

Adrenaline-S · 2026-06-25T03:36:04Z

Background

The existing KVCache framework kept all implementations flat in rtp_llm/cpp/cache/, and CacheConfig used raw std::vector fields (layer_to_group_id, group_types, cache_specs) to manage cache group information. This flat model had no type-safe abstraction for group/layer mapping and made it impossible to declare per-layer cache layouts from the Python side.

Key Changes

1. Directory Modularization (with git mv to preserve history)

Split root-level files into four subdirectories by responsibility:

spec/ — KVCacheSpec type hierarchy (MHA/MLA/Linear/Opaque) + KVCacheSpecDesc Python bridge descriptors
allocator/ — Allocator hierarchy; added HybridKVCacheAllocator base class and HybridPoolKVCacheAllocator (independent per-group block pools)
group/ — Cache group implementations; added SWAKVCacheGroup (sliding-window attention group)
config_creator/ — Config factory; added HybridPoolConfigCreator

2. CacheConfig: GroupBase/LayerBase Compositional Design

Introduced GroupBase (managing spec/policy/tag/layer_ids per group) and LayerBase (managing multi-dimensional layer-to-group mapping) to replace the flat field model. Added fromGroupedSpecs() / fromLayerDescs() construction paths, enabling declarative per-layer cache type configuration (MHA/MLA/Linear/SWA) via KVCacheSpecDesc from the Python side.
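As a rough illustration of this compositional model, the Python sketch below turns per-layer descriptors into groups plus a layer-to-group mapping. It is not the PR's C++ code. The field names spec, policy, tag, and layer_ids come from the description; SpecDesc, group_ids, from_layer_descs, the "default" policy value, and the rule "one group per distinct tag" are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class SpecDesc:
    """Hypothetical stand-in for KVCacheSpecDesc: one cache spec for one layer."""
    cache_type: str  # e.g. "MHA", "MLA", "Linear", "SWA"
    tag: str         # label used to look the group up later


@dataclass
class GroupBase:
    """Per-group data: spec, policy, tag, and the layers the group owns."""
    spec: SpecDesc
    policy: str
    tag: str
    layer_ids: List[int] = field(default_factory=list)


@dataclass
class LayerBase:
    """Per-layer data: the groups this layer belongs to (possibly several)."""
    layer_id: int
    group_ids: List[int] = field(default_factory=list)


@dataclass
class CacheConfig:
    groups: List[GroupBase]
    layers: List[LayerBase]
    tag_to_gid: Dict[str, int]


def from_layer_descs(layer_descs: List[List[SpecDesc]]) -> CacheConfig:
    """Build groups from per-layer descriptors (assumed rule: one group per tag)."""
    groups: List[GroupBase] = []
    layers: List[LayerBase] = []
    tag_to_gid: Dict[str, int] = {}
    for layer_id, descs in enumerate(layer_descs):
        layer = LayerBase(layer_id)
        for desc in descs:
            gid = tag_to_gid.get(desc.tag)
            if gid is None:
                # First time this tag is seen: open a new group for it.
                gid = len(groups)
                tag_to_gid[desc.tag] = gid
                groups.append(GroupBase(spec=desc, policy="default", tag=desc.tag))
            groups[gid].layer_ids.append(layer_id)
            layer.group_ids.append(gid)
        layers.append(layer)
    return CacheConfig(groups, layers, tag_to_gid)
```

For a four-layer model that alternates full and sliding-window attention, calling from_layer_descs([[full], [swa], [full], [swa]]) with full = SpecDesc("MHA", "full") and swa = SpecDesc("SWA", "swa") yields two groups owning layers [0, 2] and [1, 3].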

3. KVCacheAllocator Enhancements

Integrated SharedBlockCache (standalone prefix-tree cache, replacing the BlockPool-embedded BlockCache) and CPSlotMapper (context-parallel sharding support). Added tag- and group_id-dimensioned address conversion interfaces and independent block pool support (use_independent_block_pools).
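To make the independent-pool idea concrete, here is a minimal Python sketch in which each cache group owns its own free list of block ids, and callers address a pool by group id after resolving a tag. The class and method names (BlockPool, PooledAllocator, alloc, free, gid_for_tag) are hypothetical and not taken from the PR; only the notions of independent per-group pools and tag/group_id lookup come from the description.

```python
from typing import Dict, List


class BlockPool:
    """A fixed-size pool of block ids backed by a simple free list."""

    def __init__(self, num_blocks: int) -> None:
        # Stored in reverse so that pop() hands out the lowest id first.
        self._free: List[int] = list(range(num_blocks - 1, -1, -1))

    def alloc(self) -> int:
        if not self._free:
            raise MemoryError("block pool exhausted")
        return self._free.pop()

    def free(self, block_id: int) -> None:
        self._free.append(block_id)

    def free_count(self) -> int:
        return len(self._free)


class PooledAllocator:
    """Hypothetical allocator: one independent BlockPool per cache group."""

    def __init__(self, blocks_per_tag: Dict[str, int]) -> None:
        self.tag_to_gid: Dict[str, int] = {}
        self.pools: List[BlockPool] = []
        for tag, num_blocks in blocks_per_tag.items():
            self.tag_to_gid[tag] = len(self.pools)
            self.pools.append(BlockPool(num_blocks))

    def gid_for_tag(self, tag: str) -> int:
        return self.tag_to_gid[tag]

    def alloc(self, gid: int) -> int:
        # Exhausting one group's pool does not affect the other groups.
        return self.pools[gid].alloc()

    def free(self, gid: int, block_id: int) -> None:
        self.pools[gid].free(block_id)
```

In this sketch block ids are local to each pool, so the same id can be live in two groups at once; whether the real allocator numbers blocks per pool or globally is not stated in the description.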

4. Config Layer Extensions

ConfigModules.h: PrefillCPConfig gains kv_cache_sharded / prefill_cp_size; KVCacheConfig gains enable_gpu_prefix_tree / enable_independent_group_eviction; HybridAttentionConfig gains enable_independent_kv_cache_pools
ModelConfig.h: added kv_cache_spec_descs (LayerKVCacheSpecDescs) for models to populate per-layer cache specs

5. Python-Side Changes

BaseModel._post_build_model_config() auto-populates kv_cache_spec_descs for standard MHA/MLA models
Python bindings: new KVCacheSpecDesc / KVCacheSpecDescExtra types exposed
OpDefs.h: layer_attn_types renamed to layer_group_types to reflect multi-group semantics
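The auto-population step might look roughly like the sketch below, which fills one descriptor per layer for a homogeneous MHA or MLA model. SpecDescSketch, ModelConfigSketch, use_mla, and populate_spec_descs are invented for illustration; the real KVCacheSpecDesc fields and the logic inside _post_build_model_config() are not shown in this description. The guard that leaves an already-populated list untouched is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SpecDescSketch:
    """Hypothetical, reduced stand-in for KVCacheSpecDesc."""
    cache_type: str   # "MHA" or "MLA" in this sketch
    num_kv_heads: int


@dataclass
class ModelConfigSketch:
    """Hypothetical, reduced stand-in for the model config."""
    num_layers: int
    num_kv_heads: int
    use_mla: bool = False
    # Outer list: one entry per layer; inner list: that layer's specs.
    kv_cache_spec_descs: List[List[SpecDescSketch]] = field(default_factory=list)


def populate_spec_descs(config: ModelConfigSketch) -> None:
    """Fill per-layer specs for a standard model unless already provided."""
    if config.kv_cache_spec_descs:
        return  # assumed: a model that declared its own layout keeps it
    cache_type = "MLA" if config.use_mla else "MHA"
    config.kv_cache_spec_descs = [
        [SpecDescSketch(cache_type, config.num_kv_heads)]
        for _ in range(config.num_layers)
    ]
```

A model with a non-uniform layout would populate kv_cache_spec_descs itself before this hook runs, which is why the sketch returns early when the list is non-empty.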

Move implementation files into subdirectories to align with refactor branch:
- allocator/: KVCacheAllocator, HybridTypeKVCacheAllocator, SingleTypeKVCacheAllocator
- config_creator/: CacheConfigCreator, HybridConfigCreator, SingleConfigCreator, MemoryEvaluationHelper
- group/: KVCacheGroup, FullKVCacheGroup, LinearKVCacheGroup
- spec/: CacheGroupType, KVCacheSpec, KVCacheSpecBase, MHAKVCacheSpec, MLAKVCacheSpec, LinearKVCacheSpec

Content is unchanged; include paths and BUILD files will be fixed in the next commit.

…v_refactor

Phase 1 (a88a7da): restructure kvcache directory layout (rename only)
- Move files into allocator/, config_creator/, group/, spec/ subdirs
- Content unchanged; ensures git log --follow can trace history

Phase 2: sync kvcache content from refactor-kvcache branch (3e04315)
- Add spec/ subdirectory: KVCacheSpec hierarchy (MHA/MLA/Linear/Opaque), KVCacheSpecDesc/KVCacheSpecDescTypes for Python-C++ config bridge
- Add allocator/: KVCacheAllocator, HybridKVCacheAllocator, HybridPoolKVCacheAllocator, HybridTypeKVCacheAllocator, SingleTypeKVCacheAllocator
- Add config_creator/: CacheConfigCreator, HybridConfigCreator, HybridPoolConfigCreator, SingleConfigCreator, MemoryEvaluationHelper
- Add group/: KVCacheGroup, FullKVCacheGroup, LinearKVCacheGroup, SWAKVCacheGroup
- Replace CacheConfig with GroupBase/LayerBase design
- Update BlockPool, SharedBlockCache, KVCacheResource/BatchKVCacheResource
- Add CPSlotMapper, KVCacheTransferPlanner, SharedBlockCache
- Update metrics, model_rpc, normal_engine, pybind, Python models

Excludes DSV4/deepseek_v4-specific content per scope constraint.

Phase 3: sync kvcache framework optimizations from feat/dsv4_on_dev_refactor (df5290b -> 92dcc53), excluding all DeepSeek V4 model-specific content.
- GroupBase/LayerBase changed from class to plain struct; CacheConfig stores vector<GroupBase>/vector<LayerBase> + tag_to_gid map
- KVCacheSpecDesc: local_head_num_kv->num_kv_heads, local_num_k/v_heads->num_k/v_heads; SpecBuilder refactored with per-type factories and TP-aware head count derivation
- SpecBuildContext gains attn_tp_size/kernel_tokens_per_block/cp fields
- LayerKVCacheSpecDescs: map<int64_t,...> -> vector<vector<...>>
- CacheGroupPolicy: group_type field removed (derived from spec lifecycle)
- primaryLayerGroupIdsSnapshot() removed; initGroups() signature simplified
- kv_cache_layer_to_group param chain removed from NormalEngine, NormalExecutor, MtpExecutor, PyWrappedModel, CudaGraphRunner
- group_order fields removed; mergeMTPModule rewritten with tag-based group alignment preserving target group-index namespace
- BlockPoolConfigHelper: simplify MTP spec lookup to specForGroup(0)

Excludes: deepseek_v4.py, DSV4CacheTest.cc, DSV4-specific helpers

Adrenaline-S requested review from xinfei-shi and xinfeishi June 25, 2026 03:36

Adrenaline-S force-pushed the refactor/kvcache-framework-restructure branch 2 times, most recently from 8750d41 to 713c00c Compare June 29, 2026 02:07

Adrenaline-S force-pushed the refactor/kvcache-framework-restructure branch from 713c00c to 8e3adcd Compare June 29, 2026 03:02

Adrenaline-S wants to merge 2 commits into alibaba:main from Adrenaline-S:refactor/kvcache-framework-restructure
