
Commit d07a41f

[Refactor] Phase Out LLVM Dependency by Making it Optional (#247)
* remove llvm build
* [Refactor] Update kernel compilation and profiling in examples
  - Replaced `tilelang.lower` with `tilelang.compile` in multiple example scripts to streamline kernel compilation.
  - Updated profiling calls to utilize the new `get_profiler` method, enhancing performance measurement consistency.
  - Adjusted assertions and benchmarking methods to align with the new profiling structure across various examples, ensuring correctness and clarity in performance evaluations.
* lint fix
* License Update
* [Refactor] Improve code formatting and documentation in CUDA header and HIP runtime files
  - Adjusted formatting in `cuda.h` for better readability, including alignment of comments and struct fields.
  - Cleaned up whitespace and improved comment clarity in `rt_mod_hip.cc` to enhance code maintainability.
* [Refactor] Enhance formatting and clarity in CUDA header and HIP runtime files
  - Improved comment alignment and readability in `cuda.h`.
  - Cleaned up whitespace and formatting in `rt_mod_hip.cc` to enhance maintainability.
* lint fix
* lint fix
* lint fix
* lint fix
* fix
* License update
* [Enhancement] Update JITKernel to use artifact for kernel source
  - Assigned the generated artifact to `self.artifact` for better management.
  - Updated kernel source references to use `artifact.kernel_source` for consistency in execution backend handling.
* lint fix
* Add @tilelang.testing.requires_llvm decorator to vectorization tests
* Enhance setup.py and env.py for library management
  - Added functionality to remove original files after copying in CMakeBuild.
  - Updated TVM_LIBRARY_PATH in env.py to include the PyPI build library path for better integration.
* Refactor TVM_LIBRARY_PATH assignment for improved readability in env.py
* Refactor CMakeBuild file handling in setup.py
  - Added a check to ensure the target library directory exists before copying .so files.
  - Improved the logic for creating the target directory and copying files to enhance robustness.
* bugfix
* Rename BuildTLDebug to BuildTileLangCUDAWithoutCompile and update registration. Add @tilelang.testing.requires_llvm decorator to multiple tests for LLVM requirement.
* lint fix
* Enhance TileLang code generation by adding support for device code generation without compilation. Updated `host_codegen` and `device_codegen` functions to include new transformations and registration for `tilelang_hip_without_compile`. Refactored JIT kernel adapters to accommodate host and device modules, improving overall integration and flexibility.
* lint fix
* Add support for C target in device code generation
  - Updated `device_codegen_without_compile` to include handling for the C target by registering the `tilelang_cpp` function.
* [Enhancement] Implement auto-clear cache feature based on environment variable
  - Added TILELANG_CLEAR_CACHE environment variable to control cache clearing.
  - Updated CI workflow to set TILELANG_CLEAR_CACHE during testing.
  - Modified cache initialization to clear cache if TILELANG_CLEAR_CACHE is set to true.
* [Refactor] Update kernel invocation and import paths in tests and cache
  - Changed kernel invocation in `test_tilelang_kernel_dequantize_gemm.py` to return the result.
  - Updated import statements in `test_tilelang_kernel_int4_gemm_mma.py` to use `bitblas` instead of `tilelang`.
  - Refactored paths for artifact and parameters in `kernel_cache.py` for better maintainability.
* [Refactor] Clean up whitespace and improve code formatting in kernel_cache.py
  - Removed unnecessary blank lines and adjusted spacing for better readability in the KernelCache class.
  - Enhanced overall code formatting to align with project standards.
* [Enhancement] Add bfloat16 test case and improve kernel caching logic
  - Introduced a new test case for bfloat16 matrix multiplication in `test_tilelang_kernel_gemm_mma_intrinsic.py`.
  - Updated `KernelCache` to handle multiple kernel source files and improve error handling during saving and loading.
  - Refactored `JITKernel` to support instantiation from a database, enhancing flexibility in kernel management.
  - Adjusted `CtypesKernelAdapter` and `CythonKernelAdapter` to utilize the new kernel loading mechanism from the database.
  - Improved code formatting and readability across several files.
* lint fix
* Update bfloat16 matrix multiplication test case to use larger dimensions for improved coverage
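The `KernelCache` changes above (multiple kernel source files per entry, error handling during save and load) follow a pattern that can be sketched in plain Python. This is a hypothetical illustration only, not tilelang's actual `KernelCache` API; the names `save_sources` and `load_sources` are made up for the example:

```python
import hashlib
import os


def save_sources(cache_root: str, key: str, sources: dict) -> str:
    """Persist one or more kernel source files under a hashed cache entry.

    `sources` maps filename -> source text. Errors are swallowed so a
    broken cache directory never fails the build (caching is best-effort).
    """
    entry = os.path.join(cache_root, hashlib.sha256(key.encode()).hexdigest()[:16])
    try:
        os.makedirs(entry, exist_ok=True)
        for name, text in sources.items():
            with open(os.path.join(entry, name), "w") as f:
                f.write(text)
    except OSError:
        pass  # best-effort: a failed save is treated as a cache miss later
    return entry


def load_sources(cache_root: str, key: str):
    """Return the cached sources for `key` as a dict, or None on a miss."""
    entry = os.path.join(cache_root, hashlib.sha256(key.encode()).hexdigest()[:16])
    if not os.path.isdir(entry):
        return None
    out = {}
    for name in os.listdir(entry):
        with open(os.path.join(entry, name)) as f:
            out[name] = f.read()
    return out
```

Keying entries by a hash of the kernel configuration lets one directory hold several artifacts (e.g. device source plus host wrapper), which matches the commit's move from a single kernel source file to multiple per entry.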
1 parent 6972a3a commit d07a41f

File tree

231 files changed: +999 / −750 lines


.github/workflows/ci.yml

Lines changed: 1 addition & 0 deletions

```diff
@@ -67,4 +67,5 @@ jobs:
       run: |
         source tilelang_ci/bin/activate
         cd testing/python
+        export TILELANG_CLEAR_CACHE=1
         python -m pytest
```
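The behavior this CI change enables, clearing the kernel cache whenever `TILELANG_CLEAR_CACHE` is set, can be sketched in plain Python. The helper below is a hypothetical illustration of the pattern, not tilelang's actual cache-initialization code (`maybe_clear_cache` and its `cache_dir` argument are made-up names):

```python
import os
import shutil


def maybe_clear_cache(cache_dir: str) -> bool:
    """Remove and recreate the kernel cache directory if the
    TILELANG_CLEAR_CACHE environment variable is truthy.

    The env var acts as an opt-in switch so CI runs (which export
    TILELANG_CLEAR_CACHE=1 before pytest) always start from a cold cache.
    """
    if os.environ.get("TILELANG_CLEAR_CACHE", "0").lower() in ("1", "true", "on"):
        shutil.rmtree(cache_dir, ignore_errors=True)
        os.makedirs(cache_dir, exist_ok=True)
        return True
    return False
```

Gating the wipe on an environment variable keeps local development fast (warm cache by default) while guaranteeing that CI never tests against stale compiled kernels.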

3rdparty/tvm

Submodule tvm updated from 2654ce8 to c1c2a08

CMakeLists.txt

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright(c) Microsoft Corporation.
+# Copyright(c) Tile-AI Corporation.
 # Licensed under the MIT License.
 # Learn a lot from the MLC-LLM Project
 # https://github.com/mlc-ai/mlc-llm/blob/main/CMakeLists.txt
```

README.md

Lines changed: 1 addition & 1 deletion

````diff
@@ -87,7 +87,7 @@ Or install locally:
 sudo apt-get update
 sudo apt-get install -y python3-setuptools gcc libtinfo-dev zlib1g-dev build-essential cmake libedit-dev libxml2-dev
 
-pip install .  # with -e option if you want to install in editable mode
+pip install -e . -v  # remove -e option if you don't want to install in editable mode, -v for verbose output
 ```
 
 ### Method 2: Build from Source
````

benchmark/blocksparse_attention/benchmark_library_dense_fmha.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) Microsoft Corporation.
+# Copyright (c) Tile-AI Corporation.
 # Licensed under the MIT License.
 # ruff: noqa
 import torch
```

benchmark/blocksparse_attention/benchmark_tilelang_block_sparse_fmha.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) Microsoft Corporation.
+# Copyright (c) Tile-AI Corporation.
 # Licensed under the MIT License.
 # ruff: noqa
 import math
```

benchmark/blocksparse_attention/benchmark_torch_block_sparse_fmha.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) Microsoft Corporation.
+# Copyright (c) Tile-AI Corporation.
 # Licensed under the MIT License.
 # ruff: noqa
 import math
```

benchmark/blocksparse_attention/benchmark_triton_block_sparse_fmha.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) Microsoft Corporation.
+# Copyright (c) Tile-AI Corporation.
 # Licensed under the MIT License.
 # ruff: noqa
 import math
```

benchmark/matmul/benchmark_matmul_intrinsic.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) Microsoft Corporation.
+# Copyright (c) Tile-AI Corporation.
 # Licensed under the MIT License.
 
 import argparse
```

examples/blocksparse_attention/block_sparse_attn_triton.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -1,4 +1,4 @@
-# Copyright (c) Microsoft Corporation.
+# Copyright (c) Tile-AI Corporation.
 # Licensed under the MIT License.
 # ruff: noqa: E712
 import math
```

examples/convolution/example_convolution.py

Lines changed: 7 additions & 6 deletions

```diff
@@ -1,6 +1,7 @@
+# Copyright (c) Tile-AI Corporation.
+# Licensed under the MIT License.
 import torch
 import tilelang
-from tilelang import Profiler
 from tilelang.autotuner import *
 import tilelang.language as T
 import itertools
@@ -145,14 +146,14 @@ def ref_program(A, B, stride, padding, dilation):
         N, C, H, W, F, K, S, D, P, tune=args.tune)(
             block_M=256, block_N=128, block_K=64, num_stages=4, threads=256)
     ref_program = partial(ref_program, stride=S, padding=P, dilation=D)
-    mod, params = tilelang.lower(program)
-    mod = Profiler(mod, params, [2], tilelang.TensorSupplyType.Normal)
-    mod.assert_allclose(ref_program, rtol=0.01, atol=0.01)
+    kernel = tilelang.compile(program, out_idx=[2])
+    profiler = kernel.get_profiler(tilelang.TensorSupplyType.Normal)
+    profiler.assert_allclose(ref_program, rtol=0.01, atol=0.01)
     print("All checks pass.")
-    latency = mod.do_bench(ref_program, warmup=500)
+    latency = profiler.do_bench(ref_program, warmup=500)
     print("Ref: {:.2f} ms".format(latency))
     print("Ref: {:.2f} TFlops".format(total_flops / latency * 1e-9))
-    latency = mod.do_bench(mod.func, warmup=500)
+    latency = profiler.do_bench(warmup=500)
     print("Tile-lang: {:.2f} ms".format(latency))
     print("Tile-lang: {:.2f} TFlops".format(total_flops / latency * 1e-9))
 else:
```

examples/deepseek_deepgemm/example_deepgemm_fp8_2xAcc.py

Lines changed: 5 additions & 6 deletions

```diff
@@ -148,8 +148,8 @@ def calc_diff(x, y):
 
 def assert_tl_gemm_correctness(M, N, K, in_dtype, out_dtype, accum_dtype):
     gemm = tl_gemm(M, N, K, in_dtype, out_dtype, accum_dtype)
-    mod, params = TL.lower(gemm)
-    src_code = mod.imported_modules[0].get_source()
+    kernel = TL.compile(gemm, out_idx=[])
+    src_code = kernel.get_kernel_source()
 
     # src_code is the generated cuda source
     assert src_code is not None
@@ -165,16 +165,15 @@ def assert_tl_gemm_correctness(M, N, K, in_dtype, out_dtype, accum_dtype):
 
     C = torch.zeros(M, N, device="cuda", dtype=out_dtype)
 
-    mod = TL.Profiler(mod, params, [], TL.TensorSupplyType.Integer)
-
-    mod(A_fp8, B_fp8, C, A_scale, B_scale)
+    kernel(A_fp8, B_fp8, C, A_scale, B_scale)
     # Get Reference Result
     ref_c = ref_deepgemm_fp8(A_fp8, B_fp8, A_scale, B_scale, out_dtype)
     diff = calc_diff(C, ref_c)
     print(f"diff: {diff}")
     assert diff < 1e-3
 
-    latency = mod.do_bench(mod.func, warmup=25)
+    profiler = kernel.get_profiler()
+    latency = profiler.do_bench(warmup=25)
     # Ensure that the latency is not None
     assert latency is not None
     print(f"latency: {latency} ms")
```
