Fully AI Powered Kernel Development

Status: Draft — under active discussion

Each design area below will become a tracked issue once the architecture stabilizes.

Context

TileOPs is a two-layer operator library (L1 Kernel → L2 Op) built on TileLang. Today, AI assists development through issue → PR workflows. The next step is to let AI deliver new kernels autonomously and end to end: from spec to tested, benchmarked, merged code.

Current State → Target State

Aspect	Current	Target	Design Area
Kernel authoring	Human writes kernel, AI assists	AI writes kernel from spec	1. Spec & Knowledge
TileLang knowledge	AI copies from existing examples	Structured patterns, synced with upstream	1. Spec & Knowledge
Quality	Unit tests + manual benchmarks + human review	Automated correctness + perf regression + AI review	2. Validation
Delivery	Skill-guided, manual triggering	Spec → implement → validate → PR, orchestrated	3. Delivery

Design Areas

1. Kernel Spec & TileLang Knowledge

Problem: AI needs two things to generate a kernel — a clear spec (what to build) and TileLang expertise (how to build it). Both are currently ad hoc.

Spec: structured issue template with algorithm, I/O tensors, dtypes, hardware scope, and a PyTorch reference implementation.
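As a rough illustration of what a structured spec could look like once parsed from the issue template, the sketch below models the listed fields as dataclasses and reports which ones are missing. All names here (`KernelSpec`, `TensorSpec`, `SUPPORTED_DTYPES`, the `sm90` label) are hypothetical and not part of TileOPs.

```python
from dataclasses import dataclass

# Hypothetical set of dtype names; the real supported list is not decided here.
SUPPORTED_DTYPES = {"float16", "bfloat16", "float32"}


@dataclass
class TensorSpec:
    name: str
    shape: tuple  # symbolic dimensions, e.g. ("M", "K")
    dtype: str


@dataclass
class KernelSpec:
    name: str
    algorithm: str   # short prose description of the computation
    inputs: list     # list of TensorSpec
    outputs: list    # list of TensorSpec
    hardware: list   # target architectures; the labels are illustrative
    reference: str   # PyTorch reference implementation, kept as source text

    def problems(self):
        """Return human-readable problems; an empty list means the spec is complete."""
        found = []
        if not self.algorithm.strip():
            found.append("algorithm description is empty")
        if not self.inputs:
            found.append("no input tensors declared")
        if not self.outputs:
            found.append("no output tensors declared")
        for tensor in self.inputs + self.outputs:
            if tensor.dtype not in SUPPORTED_DTYPES:
                found.append(f"tensor {tensor.name!r} has unsupported dtype {tensor.dtype!r}")
        if not self.hardware:
            found.append("hardware scope is empty")
        if not self.reference.strip():
            found.append("PyTorch reference implementation is missing")
        return found


spec = KernelSpec(
    name="gemm",
    algorithm="C = A @ B",
    inputs=[TensorSpec("A", ("M", "K"), "float16"), TensorSpec("B", ("K", "N"), "float16")],
    outputs=[TensorSpec("C", ("M", "N"), "float16")],
    hardware=["sm90"],
    reference="def ref(a, b):\n    return a @ b",
)
print(spec.problems())  # an empty list for this complete spec
```

A check like this could run before any implementation work starts, so that an incomplete spec is rejected early rather than guessed at.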

Knowledge: a curated pattern catalog (tiling, pipelining, shared memory, warp specialization, Hopper WGMMA/TMA) plus anti-patterns. Lives in skill files or CLAUDE.md, kept in sync with TileLang upstream.

Related: #139 TileLang eager mode, #215 tilelang-puzzles documentation

2. Validation Pipeline

Problem: Current testing compares against a single reference with fixed tolerances and persists no performance data. That is not enough to trust AI-generated kernels.

Correctness: multi-reference baselines, edge case coverage, shape fuzzing, formalized tolerance per dtype.
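One way to picture "formalized tolerance per dtype" combined with multi-reference baselines is a lookup table of tolerances plus a check against every reference. The sketch below uses plain Python lists to stay self-contained; the tolerance values are placeholders rather than project policy, and the function names are hypothetical.

```python
# Placeholder tolerances per dtype as (rtol, atol). Illustrative values only;
# the real numbers would need to be agreed on per operator family.
TOLERANCES = {
    "float32": (1e-5, 1e-6),
    "float16": (1e-3, 1e-4),
    "bfloat16": (2e-2, 1e-3),
}


def within_tolerance(actual, expected, dtype):
    """Elementwise check: |actual - expected| <= atol + rtol * |expected|."""
    rtol, atol = TOLERANCES[dtype]
    if len(actual) != len(expected):
        return False
    return all(abs(a - e) <= atol + rtol * abs(e) for a, e in zip(actual, expected))


def disagreeing_references(actual, references, dtype):
    """Return the names of the reference outputs that `actual` does not match."""
    return [
        name
        for name, expected in references.items()
        if not within_tolerance(actual, expected, dtype)
    ]


kernel_output = [1.0, 2.0]
references = {
    "pytorch_eager": [1.0005, 2.0],  # hypothetical reference names
    "float64_oracle": [1.0, 2.0],
}
print(disagreeing_references(kernel_output, references, "float16"))  # []
print(disagreeing_references(kernel_output, references, "float32"))  # ['pytorch_eager']
```

The same output passes under the float16 budget and fails under float32, which is the point of keying the tolerance on dtype instead of hard-coding one value per test.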

Performance: baseline numbers stored in repo, CI regression detection, autotuning by default, comparison against external implementations (cuBLAS, FlashAttention).
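A minimal sketch of the regression check, assuming baselines are stored as a JSON mapping from benchmark key to latency in milliseconds. The key format, the 10% slowdown threshold, and the function name are assumptions made for illustration.

```python
import json


def find_regressions(baseline, measured, max_slowdown=1.10):
    """Return {benchmark key: slowdown ratio} for entries slower than allowed.

    `baseline` and `measured` map a benchmark key to latency in milliseconds.
    Baseline latencies are assumed to be positive. Keys missing from
    `measured` are skipped here and would need separate reporting.
    """
    regressions = {}
    for key, base_ms in baseline.items():
        if key not in measured:
            continue
        ratio = measured[key] / base_ms
        if ratio > max_slowdown:
            regressions[key] = ratio
    return regressions


# In CI the baseline would be read from a file tracked in the repo; it is
# inlined here so the example runs on its own.
baseline = json.loads('{"gemm_fp16_4096": 1.20, "softmax_fp16_8192": 0.35}')
measured = {"gemm_fp16_4096": 1.25, "softmax_fp16_8192": 0.42}

for key, ratio in find_regressions(baseline, measured).items():
    print(f"{key}: {ratio:.2f}x slower than baseline")
```

A ratio threshold rather than an absolute one keeps the check meaningful across kernels whose latencies differ by orders of magnitude, though noisy benchmarks may still need repeated runs before failing a build.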

Architecture: structural checks (Kernel → Op → Test → Bench pattern, naming, exports) encoded as CI checks or skill constraints.
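The structural checks could be as simple as confirming that each operator ships all four artifacts and follows a naming rule. The sketch below assumes a directory layout (`kernels/`, `ops/`, `tests/`, `benchmarks/`) and a snake_case convention purely for illustration; the actual TileOPs layout and naming rules may differ.

```python
import re

# Hypothetical layout: role -> path template. Not the real TileOPs tree.
EXPECTED_LAYOUT = {
    "kernel": "kernels/{op}.py",
    "op": "ops/{op}.py",
    "test": "tests/test_{op}.py",
    "bench": "benchmarks/bench_{op}.py",
}


def structural_problems(op_name, repo_files):
    """Return a list of structural problems for one operator.

    `repo_files` is a set of repository-relative paths, for example the
    output of `git ls-files` split into lines.
    """
    problems = []
    if not re.fullmatch(r"[a-z][a-z0-9_]*", op_name):
        problems.append(f"op name {op_name!r} is not snake_case")
    for role, template in EXPECTED_LAYOUT.items():
        path = template.format(op=op_name)
        if path not in repo_files:
            problems.append(f"missing {role} file: {path}")
    return problems


files = {"kernels/rms_norm.py", "ops/rms_norm.py", "tests/test_rms_norm.py"}
print(structural_problems("rms_norm", files))
# ['missing bench file: benchmarks/bench_rms_norm.py']
```

Because the function only needs a set of paths, it can run as a CI check or be called by an agent before it opens a PR.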

3. Delivery Pipeline

Problem: Individual skills exist (issue, commit, PR, review), but there is no orchestrated end-to-end flow.

Pipeline: spec validation → implementation (L1 + L2 + tests + benchmarks) → validation (correctness + perf + architecture) → AI review → human approval.
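The stage order above can be sketched as a sequential runner that stops at the first failing stage, so a kernel never reaches review without passing validation. This is a sketch under stated assumptions: the stage functions are stubs, the `context` dictionary is a hypothetical way to pass state between stages, and the final human approval step is left outside the automated loop.

```python
def run_pipeline(stages, context):
    """Run (name, step) pairs in order and stop at the first failure.

    Each step takes the shared `context` dict and returns True on success.
    Returns (names of completed stages, name of the failed stage or None).
    """
    completed = []
    for name, step in stages:
        if not step(context):
            return completed, name
        completed.append(name)
    return completed, None


# Stub stages standing in for real work; each only reads or writes `context`.
def validate_spec(ctx):
    return bool(ctx.get("spec"))


def implement(ctx):
    ctx["artifacts"] = ["kernel", "op", "tests", "benchmarks"]
    return True


def validate(ctx):
    return ctx.get("tests_pass", False)


def ai_review(ctx):
    return True


stages = [
    ("spec_validation", validate_spec),
    ("implementation", implement),
    ("validation", validate),
    ("ai_review", ai_review),
]

print(run_pipeline(stages, {"spec": "gemm", "tests_pass": False}))
# (['spec_validation', 'implementation'], 'validation')
```

Returning the failed stage name gives whatever handles failures, a retry loop or a human, a precise place to resume from.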

Key decisions: single agent with phases vs. multi-agent; where a human intervenes; how failures are handled and retried.

Dependencies

┌──────────────────────────────┐
│ 1. Spec & Knowledge          │  ← start here
└──────────────┬───────────────┘
               ↓
┌──────────────────────────────┐
│ 2. Validation Pipeline       │  ← build quality gates
└──────────────┬───────────────┘
               ↓
┌──────────────────────────────┐
│ 3. Delivery Pipeline         │  ← wire it all together
└──────────────────────────────┘
