Parent: #1610
Scope
Extract the first piece of GQA kernel selection policy while keeping the intended layering and manifest visibility explicit.
Preferred layering:
OP class -> local selector/helper -> self.kernel_map[key] -> concrete Kernel class -> TileLang implementation/factory/callable
Required design constraints
Do this:
Op -> selector/helper -> concrete Kernel -> lower-level TL implementation/factory/callable
Do not do this:
logical Kernel -> existing Kernel -> TL implementation
The OP-to-kernel relationship should remain visible through default_kernel_map and manifest source.kernel_map where applicable. Shared dispatch logic can live in a local pure helper near the GQA OP code, such as _select_gqa_prefill_kernel_key(...); it does not need to become a concrete Kernel subclass or a new module in the first patch.
Manifest constraint
This issue must not add new public OPs or change tileops/manifest/attention.yaml.
Current manifest entries have fixed ordered signature.inputs. Later OP aggregation should keep that constraint: each aggregated OP must define one explicit, fixed input list in both code and manifest. No optional tensor inputs and no dynamic arity.
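One way the fixed-input rule could be checked in code is sketched below. It assumes signature.inputs can be read as an ordered list of input names, which may not match the real manifest schema; `check_fixed_inputs`, `forward_ok`, and `forward_bad` are hypothetical names for illustration only.

```python
import inspect


def check_fixed_inputs(fn, manifest_inputs):
    """Raise TypeError unless fn takes exactly manifest_inputs, in order,
    with no defaults and no *args/**kwargs."""
    params = list(inspect.signature(fn).parameters.values())
    for p in params:
        if p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            raise TypeError(f"dynamic arity is not allowed: {p}")
        if p.default is not inspect.Parameter.empty:
            raise TypeError(f"optional input is not allowed: {p.name}")
    names = [p.name for p in params]
    if names != list(manifest_inputs):
        raise TypeError(f"code inputs {names} != manifest inputs {list(manifest_inputs)}")


# Hypothetical forward functions; the input names are placeholders.
def forward_ok(q, k, v):
    return None


def forward_bad(q, k, v, k_cache=None):  # optional tensor input: rejected
    return None
```

Calling `check_fixed_inputs(forward_ok, ["q", "k", "v"])` passes silently, while the same call on `forward_bad` raises because `k_cache` has a default.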
Non-goals
- Do not add new public OPs.
- Do not rename existing public OPs.
- Do not change tileops/manifest/attention.yaml.
- Do not add logical Kernel subclasses.
- Do not remove existing concrete kernel classes.
- Do not hide all concrete kernels behind a single mega logical kernel that weakens manifest/kernel_map visibility.
- Do not split the large GQA files yet unless the selector extraction naturally requires a small local helper.
Expected behavior
- Dense/no-cache prefill paths can share selection logic without changing the public OP contract.
- Contiguous KV-cache prefill paths can share selection logic without changing the public OP contract.
- Paged KV-cache prefill paths can share selection logic without changing the public OP contract.
- RoPE / append-style paths remain OP-level orchestration of multiple concrete kernels.
- Dispatch tests should verify selector behavior without compiling GPU kernels.
Validation
- Unit tests for selector behavior using concrete kernel-map keys or lightweight fake maps.
- Import checks for the touched GQA OP path.
- Focused GQA smoke tests where practical.
- No manifest validation changes are expected because this issue does not add public OPs.