doc(kernels): update kernels integration documentation #42277
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
LysandreJik left a comment
It's a good start! I'm inviting @MekkCyber and @danieldk to also contribute here wrt the overall integration
Small additional comment: imo we should try to have a good, coherent doc here, and cross link it everywhere else where it makes sense (guides/docs like performance, optimization, etc)
> ## Supported Operations
>
> The `kernels` library provides optimized implementations for:
Theoretically we'll go in a direction where most layers can be replaced by kernels, so this list is bound to be outdated quite quickly
cc @MekkCyber and @danieldk, IMO spending some time on this doc to really showcase the kernels integration in transformers would be really worthwhile
> ## Important Notes
>
> - **No Unkernelization**: Once kernels are enabled, they cannot be disabled during the session
It should be possible to support this. In vanilla kernels, you can kernelize with an empty mapping and the model will use the original implementations again.
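A rough sketch of what that could look like, assuming the `use_kernel_mapping` context manager from the `kernels` package (the exact keyword arguments here are an assumption on my part):

```python
from kernels import kernelize, use_kernel_mapping

# Sketch only: re-kernelizing while an empty mapping is active leaves no
# layers mapped to Hub kernels, restoring the original PyTorch forwards.
# `inherit_mapping=False` is assumed so the empty mapping fully replaces
# the registered one instead of extending it.
with use_kernel_mapping({}, inherit_mapping=False):
    model = kernelize(model)
```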
> - **No Unkernelization**: Once kernels are enabled, they cannot be disabled during the session
> - **Lazy Loading**: Kernels are downloaded and cached only when needed
> - **Backward Compatibility**: Models work identically with or without kernels enabled
> - **Hardware Requirements**: CUDA kernels require compatible NVIDIA GPUs; ROCm requires AMD GPUs; XPU requires Intel GPUs
Maybe worth mentioning that some kernels require specific compute capabilities (e.g. Hopper/Blackwell)?
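For reference, readers can check what their GPU reports with plain PyTorch (illustrative snippet, not part of the PR; Hopper reports compute capability 9.0, Blackwell 10.0 or higher):

```python
import torch

# Some Hub kernels only ship binaries for newer architectures
# (e.g. Hopper = compute capability 9.0, Blackwell = 10.0+).
major, minor = torch.cuda.get_device_capability()
print(f"Compute capability: {major}.{minor}")
```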
> - **Kernels Library**: [github.com/huggingface/kernels](https://github.com/huggingface/kernels)
> - **Community Kernels**: [huggingface.co/kernels-community](https://huggingface.co/kernels-community)
> - **API Reference**: See `KernelConfig` documentation for advanced configuration options
Maybe also worth linking out to https://huggingface.co/docs/kernels/index and perhaps kernel-builder?
MekkCyber left a comment
Thanks a lot @mfuntowicz ! This was very much needed
> **Popular Kernel Repositories:**
> - [`kernels-community/flash-attn`](https://huggingface.co/kernels-community/flash-attn2) - Flash attention implementations
we can add vllm-flash-attn3 too I think
> - **kernels** package: `pip install kernels`
> - **Recommended**: `kernels>=0.10.2` for XPU support
nit: might be understood incorrectly as meaning the specified version is for XPU support only
Suggested change:

> - **kernels** package: `pip install kernels`
> - **Recommended**: Use `kernels>=0.10.2` to ensure support for all backends.
> ### Built-in Kernels
>
> Transformers includes built-in CUDA kernels for specific models:
>
> - **Falcon Mamba**: Selective scan operations with layer normalization fusion
>   - Located in: `transformers.kernels.falcon_mamba`
This will be removed here: #41664
> - **Hardware Requirements**: CUDA kernels require compatible NVIDIA GPUs; ROCm requires AMD GPUs; XPU requires Intel GPUs
Suggested change:

> - **Hardware Requirements**: CUDA kernels require compatible NVIDIA GPUs; ROCm requires AMD GPUs; XPU requires Intel GPUs; and Metal kernels require Apple Silicon devices
> model = AutoModelForCausalLM.from_pretrained(
>     "meta-llama/Llama-3.2-1B-Instruct",
>     attn_implementation="kernels-community/flash-attn",
>     device_map="cuda"
> )
It was renamed recently
Suggested change:

> model = AutoModelForCausalLM.from_pretrained(
>     "meta-llama/Llama-3.2-1B-Instruct",
>     attn_implementation="kernels-community/flash-attn2",
>     device_map="cuda"
> )
> 3. **Replaces forward methods**: Swaps standard PyTorch operations with optimized kernels
> 4. **Maintains compatibility**: Ensures identical outputs while improving performance
not really sure about the 4th point, since with kernels comes some non-determinism
Co-authored-by: Mohamed Mekkouri <[email protected]>
stevhliu left a comment
Thanks for kicking this off, I think we can polish this a bit further!
- the Key Benefits and Supported Operations list interrupts the narrative flow and gets in the way of the learning path. I think this may be better at the end in a Reference section or even a link to where users can find all the supported ops (may not scale well when the list grows)
- it'd be nice to integrate Key Benefits into the intro paragraph of the doc to emphasize its benefits, instead of a list
- it would also make more sense to move the Requirements higher up so users know upfront
- I think it'd flow better if we string the Advanced Features together with the content in the Quick Start. Currently it feels a bit disjointed and doesn't really build off of or connect to what comes before
- may be useful to include an example that shows you how to find out which kernels are loaded
- maybe organize it like this:

# Kernels
intro paragraph about what it is and the key benefits
requirements and installation
## Enabling kernels
different ways of specifying kernels in `from_pretrained`
`use_kernels=True`
`attn_implementation` for attention kernels
`KernelConfig` for device-specific kernels
## Mode configuration
switching between inference, training, and torch.compile kernels
## Automatic kernel replacement
explanation about how it works
> ### Basic Usage
>
> Enable kernels when loading any model:
Clarify that this means the optimized kernels are automatically pulled in and used
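Something like the following could make that explicit (a sketch reusing the model id quoted elsewhere in this PR and the `use_kernels=True` flag mentioned later in this review):

```python
from transformers import AutoModelForCausalLM

# With use_kernels=True, matching optimized kernels are downloaded from the
# Hub on first use, cached, and swapped in automatically for supported layers.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    use_kernels=True,
    device_map="cuda",
)
```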
> ### Kernel Sources
>
> Kernels are distributed through Hugging Face Hub repositories in the format:
It'd be nice to add some examples showing these different formats.
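For instance, the repositories already referenced in this thread follow the `<org>/<repo>` pattern; a short illustration (repo names taken from the comments above, not an exhaustive list):

```python
from transformers import AutoModelForCausalLM

# Kernel repos are plain Hub repository ids, e.g.:
#   "kernels-community/flash-attn2"       - FlashAttention-2 builds
#   "kernels-community/vllm-flash-attn3"  - FlashAttention-3 builds (per the review above)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    attn_implementation="kernels-community/flash-attn2",
    device_map="cuda",
)
```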
> Kernels support different operational modes:
>
> - **Mode.INFERENCE**: Optimized for inference workloads (batch size optimization, reduced memory)
An example showing how to toggle these modes on/off would also be useful. Or is this done automatically when you do model.eval() and model.train()?
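A hedged sketch of explicit mode selection with the `kernels` package (assuming `kernelize` and the `Mode` flags behave as in the kernels docs; whether this also happens implicitly on `model.train()`/`model.eval()` is exactly the open question):

```python
from kernels import kernelize, Mode

# Re-kernelize the already-loaded model for a specific workload.
model = kernelize(model, mode=Mode.INFERENCE)                      # inference-only kernels
model = kernelize(model, mode=Mode.TRAINING | Mode.TORCH_COMPILE)  # training + torch.compile-safe kernels
```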
> Specify different kernel implementations per device:
>
> ```python
> kernel_config = KernelConfig(
> ```
It'd be nice to show what you do with the kernel_config once it's defined, for more context
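Agreed; a sketch of what that could look like, assuming the config is passed to `from_pretrained` (the `kernel_config` keyword name is my assumption based on the outline proposed above):

```python
from transformers import AutoModelForCausalLM

# Hand the per-device kernel selection to from_pretrained so the chosen
# kernels are applied when the model is loaded.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.2-1B-Instruct",
    use_kernels=True,
    kernel_config=kernel_config,  # defined in the quoted snippet above
    device_map="cuda",
)
```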
Add some more content to the kernels integration documentation in Transformers.