Fully AI Powered Kernel Development
Status: Draft — under active discussion
Each design area below will become a tracked issue once the architecture stabilizes.
TileOPs is a 2-layer operator library (L1 Kernel → L2 Op) built on TileLang. Today, AI assists development through issue → PR workflows. The next step is enabling AI to autonomously deliver new kernels end-to-end — from spec to tested, benchmarked, merged code.
| | Current | Target | Design Area |
|---|---|---|---|
| Kernel authoring | Human writes kernel, AI assists | AI writes kernel from spec | 1. Spec & Knowledge |
| TileLang knowledge | AI copies from existing examples | Structured patterns, synced with upstream | 1. Spec & Knowledge |
| Quality | Unit tests + manual benchmarks + human review | Automated correctness + perf regression + AI review | 2. Validation |
| Delivery | Skill-guided, manual triggering | Spec → implement → validate → PR, orchestrated | 3. Delivery |
Problem: AI needs two things to generate a kernel — a clear spec (what to build) and TileLang expertise (how to build it). Both are currently ad hoc.
Spec: structured issue template with algorithm, I/O tensors, dtypes, hardware scope, and a PyTorch reference implementation.
Knowledge: a curated pattern catalog (tiling, pipelining, shared memory, warp specialization, Hopper WGMMA/TMA) plus anti-patterns. It lives in skill files or CLAUDE.md and is kept in sync with TileLang upstream.
Related: #139 TileLang eager mode, #215 tilelang-puzzles documentation
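The spec fields listed above (algorithm, I/O tensors, dtypes, hardware scope, PyTorch reference) could be captured in a machine-checkable form so that an incomplete spec is rejected before any kernel is generated. The following is a minimal sketch only: `TensorSpec`, `KernelSpec`, `validate_spec`, and every field name are hypothetical and do not reflect an existing TileOPs schema.

```python
from dataclasses import dataclass

# Hypothetical schema for illustration; not an existing TileOPs API.
@dataclass(frozen=True)
class TensorSpec:
    name: str
    shape: tuple[str, ...]   # symbolic dimensions, e.g. ("M", "K")
    dtypes: tuple[str, ...]  # accepted dtype names, e.g. ("float16",)


@dataclass(frozen=True)
class KernelSpec:
    op_name: str
    algorithm: str                    # short prose description of the math
    inputs: tuple[TensorSpec, ...]
    outputs: tuple[TensorSpec, ...]
    hardware: tuple[str, ...]         # target architectures, free-form labels
    reference: str                    # source text of the PyTorch reference


def validate_spec(spec: KernelSpec) -> list[str]:
    """Return a list of problems; an empty list means the spec is complete."""
    problems = []
    if not spec.op_name.strip():
        problems.append("missing op name")
    if not spec.algorithm.strip():
        problems.append("missing algorithm description")
    if not spec.inputs:
        problems.append("no input tensors")
    if not spec.outputs:
        problems.append("no output tensors")
    if not spec.hardware:
        problems.append("no hardware scope")
    if not spec.reference.strip():
        problems.append("missing PyTorch reference implementation")
    for tensor in spec.inputs + spec.outputs:
        if not tensor.dtypes:
            problems.append(f"tensor '{tensor.name}' lists no dtypes")
    return problems


# Example spec for a plain matrix multiply (values are illustrative).
example = KernelSpec(
    op_name="gemm",
    algorithm="C = A @ B",
    inputs=(
        TensorSpec("A", ("M", "K"), ("float16", "bfloat16")),
        TensorSpec("B", ("K", "N"), ("float16", "bfloat16")),
    ),
    outputs=(TensorSpec("C", ("M", "N"), ("float16", "bfloat16")),),
    hardware=("sm90",),
    reference="def ref(a, b):\n    return a @ b\n",
)
print(validate_spec(example))
```

A check like this could run as the first gate on a new issue, so the agent only starts implementation once every required field is present.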
Problem: Current testing uses a single reference with fixed tolerances and persists no performance data. That is not enough to trust AI-generated kernels.
Correctness: multi-reference baselines, edge case coverage, shape fuzzing, formalized tolerance per dtype.
Performance: baseline numbers stored in repo, CI regression detection, autotuning by default, comparison against external implementations (cuBLAS, FlashAttention).
Architecture: structural checks (Kernel → Op → Test → Bench pattern, naming, exports) encoded as CI checks or skill constraints.
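To make the correctness and performance items above concrete, here is a rough sketch of three pieces: a per-dtype tolerance table, a seeded shape fuzzer, and a regression check against stored baseline latencies. The tolerance values, the 10% slowdown threshold, the benchmark keys, and all function names are assumptions chosen for illustration, not project decisions. The scalar comparison uses `math.isclose`; a real test would more likely compare whole tensors, for example with `torch.testing.assert_close`.

```python
import json
import math
import random

# Assumed tolerances for illustration; real values need to be agreed per dtype.
TOLERANCES = {
    "float32": {"rtol": 1e-5, "atol": 1e-6},
    "float16": {"rtol": 1e-3, "atol": 1e-3},
    "bfloat16": {"rtol": 1.6e-2, "atol": 1e-2},
}


def is_close(actual: float, expected: float, dtype: str) -> bool:
    """Scalar comparison using the tolerance registered for `dtype`."""
    tol = TOLERANCES[dtype]
    return math.isclose(actual, expected, rel_tol=tol["rtol"], abs_tol=tol["atol"])


def fuzz_shapes(seed: int, count: int, max_dim: int = 4096) -> list[tuple[int, int, int]]:
    """Generate reproducible random (M, N, K) shapes, including odd sizes."""
    rng = random.Random(seed)  # seeded so a failing shape can be replayed
    return [tuple(rng.randint(1, max_dim) for _ in range(3)) for _ in range(count)]


def find_regressions(baseline: dict, measured: dict, max_slowdown: float = 1.10) -> list[str]:
    """Return benchmark keys whose latency exceeds baseline * max_slowdown."""
    return sorted(
        key
        for key, base_ms in baseline.items()
        if key in measured and measured[key] > base_ms * max_slowdown
    )


# Baseline latencies (ms) as they might be stored in a JSON file in the repo.
baseline = json.loads('{"gemm_4096_fp16": 1.20, "softmax_8192_fp16": 0.50}')
measured = {"gemm_4096_fp16": 1.25, "softmax_8192_fp16": 0.60}
print(find_regressions(baseline, measured))
```

Seeding the fuzzer keeps CI failures reproducible, and keeping the baseline in a plain JSON file means a performance change shows up as a reviewable diff.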
Problem: Individual skills exist (issue, commit, PR, review), but there is no orchestrated end-to-end flow.
Pipeline: spec validation → implementation (L1 + L2 + tests + benchmarks) → validation (correctness + perf + architecture) → AI review → human approval.
Key decisions: single agent with phases vs. multi-agent; where a human intervenes; how failures are handled and retried.
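One way to picture the single-agent-with-phases option is a small driver that runs the phases in order, retries a failing phase a bounded number of times, and hands off to a human either for approval or when retries are exhausted. This is a sketch under assumptions: the phase names mirror the pipeline above, but `PhaseResult`, `run_pipeline`, the retry budget, and the status strings are hypothetical, and the phase callables stand in for real agent work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PhaseResult:
    ok: bool
    detail: str = ""


# Phase order follows the pipeline described above; human approval is the
# terminal state rather than an automated phase.
PHASES = ["spec_validation", "implementation", "validation", "ai_review"]


def run_pipeline(
    phases: dict[str, Callable[[], PhaseResult]],
    max_retries: int = 2,
) -> tuple[str, list[str]]:
    """Run phases in order; retry a failing phase, then escalate to a human."""
    log: list[str] = []
    for name in PHASES:
        for attempt in range(1, max_retries + 2):  # first try + retries
            result = phases[name]()
            log.append(f"{name}#{attempt}:{'ok' if result.ok else 'fail'}")
            if result.ok:
                break
        else:
            # Retry budget exhausted for this phase.
            return "escalate_to_human", log
    return "awaiting_human_approval", log


# Usage: an implementation phase that fails once, then succeeds.
attempts = {"n": 0}

def flaky_implementation() -> PhaseResult:
    attempts["n"] += 1
    return PhaseResult(ok=attempts["n"] > 1, detail="compile error on first try")

def always_ok() -> PhaseResult:
    return PhaseResult(ok=True)

status, log = run_pipeline({
    "spec_validation": always_ok,
    "implementation": flaky_implementation,
    "validation": always_ok,
    "ai_review": always_ok,
})
print(status, log)
```

This version retries only the phase that failed. An alternative worth weighing is sending a validation failure back to the implementation phase, since a failed correctness or performance check usually means the kernel itself has to change.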
┌──────────────────────────────┐
│ 1. Spec & Knowledge          │ ← start here
└──────────────┬───────────────┘
               ↓
┌──────────────────────────────┐
│ 2. Validation Pipeline       │ ← build quality gates
└──────────────┬───────────────┘
               ↓
┌──────────────────────────────┐
│ 3. Delivery Pipeline         │ ← wire it all together
└──────────────────────────────┘
Weekly Reports
Maintain Dashboard
AI Knowledge Base