Fully AI Powered Kernel Development

lcy-seso edited this page Mar 4, 2026 · 1 revision

Status: Draft — under active discussion

Each design area below will become a tracked issue once the architecture stabilizes.

Context

TileOPs is a 2-layer operator library (L1 Kernel → L2 Op) built on TileLang. Today, AI assists development through issue → PR workflows. The next step is enabling AI to autonomously deliver new kernels end-to-end — from spec to tested, benchmarked, merged code.

Current State → Target State

| | Current | Target | Design Area |
|---|---|---|---|
| Kernel authoring | Human writes kernel, AI assists | AI writes kernel from spec | 1. Spec & Knowledge |
| TileLang knowledge | AI copies from existing examples | Structured patterns, synced with upstream | 1. Spec & Knowledge |
| Quality | Unit tests + manual benchmarks + human review | Automated correctness + perf regression + AI review | 2. Validation |
| Delivery | Skill-guided, manual triggering | Spec → implement → validate → PR, orchestrated | 3. Delivery |

Design Areas

1. Kernel Spec & TileLang Knowledge

Problem: AI needs two things to generate a kernel — a clear spec (what to build) and TileLang expertise (how to build it). Both are currently ad hoc.

Spec: structured issue template with algorithm, I/O tensors, dtypes, hardware scope, and a PyTorch reference implementation.
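
To make the spec fields concrete, here is a minimal sketch of how a filled-in issue template could be represented and checked for completeness before any kernel is generated. The class names (`TensorSpec`, `KernelSpec`), the `problems()` method, and the example GEMM values are all hypothetical and are not part of TileOPs; the reference implementation is held as source text so the sketch runs without PyTorch installed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TensorSpec:
    name: str
    shape: tuple[str, ...]   # symbolic dimensions, e.g. ("M", "K")
    dtypes: tuple[str, ...]  # dtypes the kernel must support


@dataclass
class KernelSpec:
    name: str
    algorithm: str
    inputs: list[TensorSpec]
    outputs: list[TensorSpec]
    hardware: list[str]
    reference: str  # PyTorch reference implementation, kept as source text

    def problems(self) -> list[str]:
        """Return human-readable gaps; an empty list means the spec is complete."""
        issues = []
        if not self.algorithm.strip():
            issues.append("algorithm description is empty")
        if not self.inputs:
            issues.append("no input tensors declared")
        if not self.outputs:
            issues.append("no output tensors declared")
        for tensor in self.inputs + self.outputs:
            if not tensor.dtypes:
                issues.append(f"tensor {tensor.name!r} lists no dtypes")
        if not self.hardware:
            issues.append("hardware scope is empty")
        if "def " not in self.reference:
            issues.append("reference implementation is missing")
        return issues


# Hypothetical example spec; the values are illustrative only.
gemm = KernelSpec(
    name="gemm",
    algorithm="C = A @ B with fp32 accumulation",
    inputs=[
        TensorSpec("A", ("M", "K"), ("float16", "bfloat16")),
        TensorSpec("B", ("K", "N"), ("float16", "bfloat16")),
    ],
    outputs=[TensorSpec("C", ("M", "N"), ("float16", "bfloat16"))],
    hardware=["sm90"],
    reference="def ref(a, b):\n    return a @ b\n",
)
print(gemm.problems())  # → []
```

A check like this could run as the first gate of the pipeline, so an incomplete issue is rejected before any implementation work starts.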

Knowledge: a curated pattern catalog (tiling, pipelining, shared memory, warp specialization, Hopper WGMMA/TMA) plus anti-patterns. Lives in skill files or CLAUDE.md, kept in sync with tilelang upstream.

Related: #139 TileLang eager mode, #215 tilelang-puzzles documentation

2. Validation Pipeline

Problem: Current testing compares against a single reference with fixed tolerances and persists no performance data. That is not enough to trust AI-generated kernels.

Correctness: multi-reference baselines, edge case coverage, shape fuzzing, formalized tolerance per dtype.
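
The correctness items can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the tolerance values are placeholders rather than agreed numbers, the edge dimensions are arbitrary picks around common tile sizes, and the function names (`allclose`, `fuzz_shapes`, `check_against_references`) are hypothetical. Flat lists of floats stand in for tensors so the sketch runs without a GPU.

```python
import random

# Placeholder (rtol, atol) pairs per dtype; real values would need to be agreed on.
TOLERANCES = {
    "float32": (1e-5, 1e-6),
    "float16": (1e-3, 1e-3),
    "bfloat16": (1.6e-2, 1e-2),
}

# Dimensions that tend to expose boundary handling bugs around tile sizes.
EDGE_DIMS = [1, 7, 63, 64, 65, 127, 128, 129]


def allclose(actual, expected, dtype):
    """Element-wise |a - e| <= atol + rtol * |e| using the dtype's tolerance."""
    rtol, atol = TOLERANCES[dtype]
    return len(actual) == len(expected) and all(
        abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected)
    )


def fuzz_shapes(count, rank=2, max_dim=1024, seed=0):
    """Generate reproducible shapes mixing edge dimensions with random ones."""
    rng = random.Random(seed)
    return [
        tuple(
            rng.choice(EDGE_DIMS) if rng.random() < 0.5 else rng.randint(1, max_dim)
            for _ in range(rank)
        )
        for _ in range(count)
    ]


def check_against_references(actual, references, dtype):
    """Return the names of the references that disagree with `actual`."""
    return [
        name
        for name, expected in references.items()
        if not allclose(actual, expected, dtype)
    ]


refs = {"pytorch_fp32": [1.0, 2.0, 3.0], "numpy_fp64": [1.0, 2.0, 3.0]}
out = [1.0005, 2.0, 3.0]
print(check_against_references(out, refs, "float16"))  # → []
print(check_against_references(out, refs, "float32"))  # → ['pytorch_fp32', 'numpy_fp64']
```

Seeding the shape generator keeps a failing shape reproducible, which matters when an agent has to retry a fix against the same input.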

Performance: baseline numbers stored in repo, CI regression detection, autotuning by default, comparison against external implementations (cuBLAS, FlashAttention).
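
A regression check against stored baselines could look like the sketch below. The JSON layout, the case names, the 5% threshold, and the function name `find_regressions` are assumptions, and the latencies are made-up illustrative values, not measurements.

```python
import json


def find_regressions(baseline, current, max_slowdown=0.05):
    """Return {case: (baseline_ms, current_ms)} for cases slower than allowed."""
    regressions = {}
    for case, base_ms in baseline.items():
        cur_ms = current.get(case)
        if cur_ms is None:
            continue  # case was not measured in this run
        if cur_ms > base_ms * (1.0 + max_slowdown):
            regressions[case] = (base_ms, cur_ms)
    return regressions


# In CI the baseline would be read from a file committed to the repo;
# an inline JSON string keeps the sketch self-contained.
baseline = json.loads(
    '{"gemm_4096x4096x4096_fp16": 1.20, "gemm_1024x1024x1024_fp16": 0.050}'
)
current = {"gemm_4096x4096x4096_fp16": 1.22, "gemm_1024x1024x1024_fp16": 0.060}

print(find_regressions(baseline, current))
# → {'gemm_1024x1024x1024_fp16': (0.05, 0.06)}
```

A relative threshold is used here because absolute latencies vary widely across shapes; the right threshold depends on how noisy the benchmark machines are.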

Architecture: structural checks (Kernel → Op → Test → Bench pattern, naming, exports) encoded as CI checks or skill constraints.
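
One structural check, that every kernel ships with its Op, test, and benchmark, can be sketched as a small script suitable for CI. The directory layout in `REQUIRED` is hypothetical and may not match the actual TileOPs tree; the function name `missing_artifacts` is likewise an assumption.

```python
import tempfile
from pathlib import Path

# Hypothetical file layout for the Kernel → Op → Test → Bench pattern.
REQUIRED = {
    "kernel": "kernels/{name}.py",
    "op": "ops/{name}.py",
    "test": "tests/test_{name}.py",
    "bench": "benchmarks/bench_{name}.py",
}


def missing_artifacts(repo_root, name):
    """Return the roles whose expected file is absent for operator `name`."""
    root = Path(repo_root)
    return [
        role
        for role, pattern in REQUIRED.items()
        if not (root / pattern.format(name=name)).is_file()
    ]


# Demonstration on a throwaway directory containing only the kernel and the op.
with tempfile.TemporaryDirectory() as tmp:
    for rel in ("kernels/gemm.py", "ops/gemm.py"):
        path = Path(tmp) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    result = missing_artifacts(tmp, "gemm")

print(result)  # → ['test', 'bench']
```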

3. Delivery Pipeline

Problem: Individual skills exist (issue, commit, PR, review), but there is no orchestrated end-to-end flow.

Pipeline: spec validation → implementation (L1 + L2 + tests + benchmarks) → validation (correctness + perf + architecture) → AI review → human approval.
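
As a rough illustration of the single-agent-with-phases option, the stages can be run in order with a bounded retry per phase. The phase handlers, the `PhaseResult` type, and the retry limit are hypothetical; how failures are actually handled is still an open decision for this design area, so this shows one possible policy and nothing more.

```python
from dataclasses import dataclass
from typing import Callable

PHASES = [
    "spec_validation",
    "implementation",
    "validation",
    "ai_review",
    "human_approval",
]


@dataclass
class PhaseResult:
    ok: bool
    detail: str = ""


def run_pipeline(handlers: dict[str, Callable[[], PhaseResult]], max_attempts: int = 2):
    """Run phases in order; stop at the first phase that fails every attempt."""
    log = []
    for phase in PHASES:
        for attempt in range(1, max_attempts + 1):
            result = handlers[phase]()
            log.append((phase, attempt, result.ok))
            if result.ok:
                break
        else:
            return False, log  # retries exhausted for this phase
    return True, log


# Demo: every phase passes except validation, which fails once and then passes.
attempts = {"validation": 0}


def flaky_validation() -> PhaseResult:
    attempts["validation"] += 1
    return PhaseResult(attempts["validation"] > 1, "perf regression on first try")


handlers = {phase: (lambda: PhaseResult(True)) for phase in PHASES}
handlers["validation"] = flaky_validation

ok, log = run_pipeline(handlers)
print(ok, len(log))  # → True 6
```

Keeping a per-attempt log makes it possible to surface to the human approver which phases needed retries and why.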

Key decisions: single agent with phases vs. multi-agent; where humans intervene; how failures are handled and retried.

Dependencies

┌──────────────────────────────┐
│ 1. Spec & Knowledge          │  ← start here
└──────────────┬───────────────┘
               ↓
┌──────────────────────────────┐
│ 2. Validation Pipeline       │  ← build quality gates
└──────────────┬───────────────┘
               ↓
┌──────────────────────────────┐
│ 3. Delivery Pipeline         │  ← wire it all together
└──────────────────────────────┘