A production-oriented kernel / operator tracing framework for large language models. It supports HuggingFace Transformers and vLLM, runs tracers in isolated subprocesses, and is designed for batch model evaluation at scale.
## Features

- Multiple tracing backends
  - torch.profiler
  - torch dispatch / dynamo / FX / Inductor
  - vLLM built-in profiler
- Unified tracer abstraction (`Tracer` class + registry)
- Subprocess isolation (OOM-safe, crash-safe)
- Concurrent, de-duplicated model downloads
- Model skip-list support
- Intelligent bypass when a trace already exists
- Built-in support for HF mirrors & offline mode
- Works with both HF eager execution and vLLM engines
## Installation

```bash
pip install -e .
```

With vLLM support:

```bash
pip install -e .[vllm]
```

Requirements:

- Python ≥ 3.10
- PyTorch ≥ 2.2
- transformers ≥ 4.40.0
- vllm ≥ 0.11.0 (optional, only needed when tracing vLLM)
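A quick pre-flight check can confirm the environment meets these minimums before launching a long batch run. This is a minimal sketch using only the standard library; the version floors come from the list above, and the helper names are illustrative, not part of the project's API.

```python
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the requirements above (vllm is optional, so omitted).
MIN_VERSIONS = {"torch": (2, 2), "transformers": (4, 40, 0)}

def parse_version(v: str) -> tuple:
    """Parse a dotted version string into a comparable int tuple.

    Local suffixes such as '+cu121' are ignored: only leading digits
    of each dot-separated piece are kept.
    """
    parts = []
    for piece in v.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)

def check_requirements() -> list[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for pkg, minimum in MIN_VERSIONS.items():
        try:
            installed = parse_version(version(pkg))
        except PackageNotFoundError:
            problems.append(f"{pkg} is not installed")
            continue
        if installed < minimum:
            floor = ".".join(map(str, minimum))
            problems.append(f"{pkg} {version(pkg)} is older than required {floor}")
    return problems
```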
## Usage

List available tracers:

```bash
python main.py --list-tracers
```

Trace a single model with all applicable tracers:

```bash
# HuggingFace
python main.py \
    --framework huggingface \
    --model meta-llama/Llama-2-7b-hf \
    --output-dir outputs/llama2

# vLLM
python main.py \
    --framework vllm \
    --model meta-llama/Llama-2-7b-hf \
    --output-dir outputs/llama2
```

Run a specific tracer:

```bash
# HuggingFace
python main.py \
    --framework huggingface \
    --model meta-llama/Llama-2-7b-hf \
    --tracers torch_profiler \
    --output-dir outputs/llama2

# vLLM
python main.py \
    --framework vllm \
    --model meta-llama/Llama-2-7b-hf \
    --tracers vllm_profiler \
    --output-dir outputs/llama2
```

Select the execution mode:

```bash
# HuggingFace, training mode
python main.py \
    --framework huggingface \
    --model meta-llama/Llama-2-7b-hf \
    --mode train \
    --output-dir outputs/llama2

# vLLM, eval mode
python main.py \
    --framework vllm \
    --model meta-llama/Llama-2-7b-hf \
    --mode eval \
    --output-dir outputs/llama2
```

Invoke a worker directly, bypassing the orchestrator:

```bash
# HuggingFace
python core/worker.py \
    --framework huggingface \
    --model-path /path/to/model \
    --tracer torch_profiler \
    --mode eval \
    --output-dir /path/to/output

# vLLM
python core/worker.py \
    --framework vllm \
    --model-path /path/to/model \
    --tracer vllm_profiler \
    --mode eval \
    --output-dir /path/to/output
```

## Batch tracing

Trace a list of models, skipping any that appear in a skip list:

```bash
# HuggingFace
python main.py \
    --framework huggingface \
    --model-list config/models.txt \
    --skip-model-list config/skip.txt \
    --output-dir outputs

# vLLM
python main.py \
    --framework vllm \
    --model-list config/models.txt \
    --skip-model-list config/skip.txt \
    --output-dir outputs
```

Batch tracing with a specific tracer:

```bash
# HuggingFace
python main.py \
    --framework huggingface \
    --model-list config/models.txt \
    --tracers torch_profiler \
    --skip-model-list config/skip.txt \
    --output-dir outputs

# vLLM
python main.py \
    --framework vllm \
    --model-list config/models.txt \
    --tracers vllm_profiler \
    --skip-model-list config/skip.txt \
    --output-dir outputs
```

Batch tracing with an explicit mode:

```bash
# HuggingFace, training mode
python main.py \
    --framework huggingface \
    --model-list config/models.txt \
    --mode train \
    --skip-model-list config/skip.txt \
    --output-dir outputs

# vLLM, eval mode
python main.py \
    --framework vllm \
    --model-list config/models.txt \
    --mode eval \
    --skip-model-list config/skip.txt \
    --output-dir outputs
```

## Forcing a re-trace

`--force` re-runs tracing even when a completed trace already exists:

```bash
# HuggingFace
python main.py \
    --framework huggingface \
    --model meta-llama/Llama-2-7b-hf \
    --force

# vLLM
python main.py \
    --framework vllm \
    --model meta-llama/Llama-2-7b-hf \
    --force
```

Force with a specific tracer:

```bash
# HuggingFace
python main.py \
    --framework huggingface \
    --tracers torch_profiler \
    --model meta-llama/Llama-2-7b-hf \
    --force

# vLLM
python main.py \
    --framework vllm \
    --tracers vllm_profiler \
    --model meta-llama/Llama-2-7b-hf \
    --force
```

Force with an explicit mode:

```bash
# HuggingFace, training mode
python main.py \
    --framework huggingface \
    --mode train \
    --model meta-llama/Llama-2-7b-hf \
    --force

# vLLM, eval mode
python main.py \
    --framework vllm \
    --mode eval \
    --model meta-llama/Llama-2-7b-hf \
    --force
```

## Why subprocess isolation?

- torch.profiler and dynamo are stateful
- vLLM may hang or OOM
- Subprocess isolation guarantees:
  - a clean CUDA state per run
  - safe timeouts
  - tolerance of partial failures
## Execution models

### HuggingFace

- Execution unit: `nn.Module`
- Input: tokenized tensors
- Tracing scope:
  - aten ops
  - FX graphs
  - torch.profiler events
  - torch.inductor IR
  - torch.dispatch
- Failure mode:
  - missing shards → exception (fast fail)
### vLLM

- Execution unit: `LLMEngine`
- Input: prompt strings
- Tracing scope:
  - CUDA kernels inside the engine
  - attention / KV-cache behavior
- Failure mode:
  - partial cache → hang or silent stall

Because a partially downloaded cache can stall vLLM silently, this project enforces strict model cache validation before tracing.
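Such a validation pass can be sketched as follows. The function name `cache_is_complete` is hypothetical, but the `model.safetensors.index.json` / `weight_map` layout it checks is the standard HuggingFace sharded-checkpoint format:

```python
import json
from pathlib import Path

def cache_is_complete(model_dir: str) -> bool:
    """Pre-flight check: every shard listed in the safetensors index
    must exist on disk before the engine is allowed to load the model."""
    root = Path(model_dir)
    index = root / "model.safetensors.index.json"
    if not index.exists():
        # Single-file checkpoint: accept if the lone weights file is present.
        return (root / "model.safetensors").exists()
    # The index maps tensor names to shard filenames; collect the unique shards.
    weight_map = json.loads(index.read_text())["weight_map"]
    shards = set(weight_map.values())
    return all((root / shard).exists() for shard in shards)
```

Failing this check up front turns vLLM's hang-or-stall failure mode into a fast, explicit skip.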
## Output layout

```
outputs/
  llama2/
    torch_profiler/
      torch_profile.json
      torch_kernel_profile.json
      ...
    print_fx/
      print_fx.json
```

Each tracer owns one output directory, and success is marked by a `_TRACE_STATUS.json` file in that directory, which is used for bypass and resumability.
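The bypass logic can be sketched like this. Only the `_TRACE_STATUS.json` file name comes from above; the JSON fields and helper names are assumptions for illustration:

```python
import json
import time
from pathlib import Path

STATUS_FILE = "_TRACE_STATUS.json"

def mark_success(tracer_dir: str) -> None:
    """Write the success marker after a tracer finishes cleanly."""
    path = Path(tracer_dir) / STATUS_FILE
    path.write_text(json.dumps({"status": "success", "finished_at": time.time()}))

def should_bypass(tracer_dir: str) -> bool:
    """Skip a tracer whose previous run already succeeded (resumability)."""
    path = Path(tracer_dir) / STATUS_FILE
    if not path.exists():
        return False
    try:
        return json.loads(path.read_text()).get("status") == "success"
    except (json.JSONDecodeError, OSError):
        # A corrupt or unreadable marker means the run did not finish cleanly.
        return False
```

Because the marker is written only after the tracer completes, a crash or timeout leaves no marker behind and the tracer is retried on the next run; `--force` would simply ignore the marker.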
## FAQ

**Q: Why does each tracer run in its own subprocess?**

A: Torch dynamo and the profiler conflict with each other and leak global state.

**Q: Is it safe to trace many models in batch?**

A: Yes. Download failures are detected early and the affected models are skipped safely.

**Q: Why is the model cache validated before tracing?**

A: vLLM may hang if the model type or weight shards are incomplete.

**Q: How do I add a custom tracer?**

A: Subclass `Tracer` and register it in `tracers/registry.py`:

```python
class MyTracer(Tracer):
    name = "my_tracer"
    exclusive_group = None

    def run(self, ctx, output_dir):
        ...  # write trace artifacts into output_dir
```
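The registry is most likely just a name-to-class mapping plus a registration hook. A sketch of how that shape could look; the `TRACER_REGISTRY` dict and `register` decorator are assumptions about `tracers/registry.py`, not the project's confirmed API (and the real `MyTracer` would subclass `Tracer`):

```python
# Hypothetical shape of tracers/registry.py: map tracer names to classes.
TRACER_REGISTRY: dict[str, type] = {}

def register(cls: type) -> type:
    """Decorator: add a tracer class to the registry under its declared name."""
    TRACER_REGISTRY[cls.name] = cls
    return cls

@register
class MyTracer:
    name = "my_tracer"
    exclusive_group = None  # tracers sharing a group would never run together

    def run(self, ctx, output_dir):
        ...  # write trace artifacts into output_dir
```

The orchestrator can then resolve `--tracers my_tracer` with a single dictionary lookup.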
## License

MIT