[Refactor] Extract GQA prefill kernel selection logic #1611

Description

@superAngGao

Parent: #1610

Scope

Extract the first piece of GQA kernel selection policy while keeping the intended layering and manifest visibility explicit.

Preferred layering:

OP class -> local selector/helper -> self.kernel_map[key] -> concrete Kernel class -> TileLang implementation/factory/callable

Required design constraints

Do this:

OP -> selector/helper -> concrete Kernel -> lower-level TL implementation/factory/callable

Do not do this:

logical Kernel -> existing Kernel -> TL implementation

The OP-to-kernel relationship should remain visible through default_kernel_map and manifest source.kernel_map where applicable. Shared dispatch logic can live in a local pure helper near the GQA OP code, such as _select_gqa_prefill_kernel_key(...); it does not need to become a concrete Kernel subclass or a new module in the first patch.

Manifest constraint

This issue must not add new public OPs or change tileops/manifest/attention.yaml.

Current manifest entries have a fixed, ordered signature.inputs list. Later OP aggregation should keep that constraint: each aggregated OP must define one explicit, fixed input list in both code and manifest, with no optional tensor inputs and no dynamic arity.
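One way to enforce the fixed-input constraint on the code side is a signature check, sketched below. It assumes the manifest's signature.inputs can be read as an ordered list of names and that the OP entry point is a plain callable. check_fixed_inputs and the example functions are hypothetical and not part of the existing validation.

```python
import inspect


def check_fixed_inputs(forward, manifest_inputs):
    """Raise ValueError unless `forward` takes exactly the manifest inputs, in order."""
    params = list(inspect.signature(forward).parameters.values())
    for p in params:
        # *args / **kwargs would allow dynamic arity.
        if p.kind in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD):
            raise ValueError(f"dynamic arity is not allowed: {p.name}")
        # A default value would make the input optional.
        if p.default is not inspect.Parameter.empty:
            raise ValueError(f"optional input is not allowed: {p.name}")
    names = [p.name for p in params]
    if names != list(manifest_inputs):
        raise ValueError(f"code inputs {names} != manifest inputs {list(manifest_inputs)}")


# Hypothetical entry points, for illustration only.
def fixed_forward(q, k, v):
    return q


def optional_forward(q, k, v, mask=None):
    return q


check_fixed_inputs(fixed_forward, ["q", "k", "v"])  # passes silently
```

A call such as check_fixed_inputs(optional_forward, ["q", "k", "v", "mask"]) raises ValueError, because mask has a default.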

Non-goals

  • Do not add new public OPs.
  • Do not rename existing public OPs.
  • Do not change tileops/manifest/attention.yaml.
  • Do not add logical Kernel subclasses.
  • Do not remove existing concrete kernel classes.
  • Do not hide all concrete kernels behind a single mega logical kernel that weakens manifest/kernel_map visibility.
  • Do not split the large GQA files yet unless the selector extraction naturally requires a small local helper.

Expected behavior

  • Dense/no-cache prefill paths can share selection logic without changing the public OP contract.
  • Contiguous KV-cache prefill paths can share selection logic without changing the public OP contract.
  • Paged KV-cache prefill paths can share selection logic without changing the public OP contract.
  • RoPE / append-style paths remain OP-level orchestration of multiple concrete kernels.
  • Dispatch tests should verify selector behavior without compiling GPU kernels.
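For contrast with the single-key selection above, OP-level orchestration of several concrete kernels can be sketched as follows. The step keys, their order, and the class name are assumptions made for illustration, and the kernels are plain callables standing in for concrete Kernel instances.

```python
class PrefillWithRopeAppendSketch:
    """Hypothetical OP that sequences several concrete kernels itself."""

    # Hypothetical step keys and order; the real OPs may differ.
    STEPS = ("rope", "append_kv", "attention")

    def __init__(self, kernel_map):
        missing = [key for key in self.STEPS if key not in kernel_map]
        if missing:
            raise KeyError(f"kernel_map is missing keys: {missing}")
        self.kernel_map = dict(kernel_map)

    def forward(self, state):
        # The OP owns the sequencing; each kernel stays a separate map entry.
        for key in self.STEPS:
            state = self.kernel_map[key](state)
        return state


# Fake kernels that record the order in which they run.
fake_map = {
    key: (lambda trace, key=key: trace + [key])
    for key in PrefillWithRopeAppendSketch.STEPS
}
call_order = PrefillWithRopeAppendSketch(fake_map).forward([])
```

Keeping the sequence in the OP means each step still appears as its own kernel-map entry instead of being folded into a single logical kernel.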

Validation

  • Unit tests for selector behavior using concrete kernel-map keys or lightweight fake maps.
  • Import checks for the touched GQA OP path.
  • Focused GQA smoke tests where practical.
  • No manifest validation changes are expected because this issue does not add public OPs.
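A selector unit test along these lines could use a lightweight fake map whose values fail if anything tries to build them, which shows that dispatch never reaches kernel construction. select_key is a hypothetical stand-in for the real helper, and the key names and boolean arguments are assumptions.

```python
import unittest


class _MustNotBuild:
    """Fake kernel-map value that fails loudly if dispatch tries to build it."""

    def __init__(self, key):
        self.key = key

    def __call__(self, *args, **kwargs):
        raise AssertionError(f"kernel {self.key!r} was built during dispatch")


def select_key(has_kv_cache, paged):
    """Hypothetical stand-in for the selector under test."""
    if not has_kv_cache:
        if paged:
            raise ValueError("paged layout requires a KV cache")
        return "dense"
    return "paged" if paged else "contiguous"


FAKE_MAP = {key: _MustNotBuild(key) for key in ("dense", "contiguous", "paged")}


class SelectorDispatchTest(unittest.TestCase):
    def test_each_path_maps_to_a_known_key(self):
        cases = [
            ((False, False), "dense"),
            ((True, False), "contiguous"),
            ((True, True), "paged"),
        ]
        for args, expected in cases:
            with self.subTest(args=args):
                key = select_key(*args)
                self.assertEqual(key, expected)
                self.assertIn(key, FAKE_MAP)  # lookup only, nothing is built

    def test_invalid_combination_is_rejected(self):
        with self.assertRaises(ValueError):
            select_key(False, True)
```

Saved in a test module, this runs with python -m unittest and needs neither a GPU nor TileLang.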

Metadata

Labels

refactor: Code restructuring without behavior change
