
Conversation

@3outeille
Contributor

Fixing the bug huggingface#6

TODO: this change also needs to be applied in transformers v5. That requires waiting for v5 to be a bit more stable before switching the torchtitan transformers modeling backend to v5 (for now, it relies on 4.57.1).

Issue

[rank3]:/fsx/ferdinandmom/ferdinand-hf/huggingface/torchtitan/env_torchtitan_official/lib/python3.12/site-packages/torch/_inductor/compile_fx.py:321: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')` for better performance.
[rank3]:  warnings.warn(
[rank3]:[rank3]:W1209 10:08:01.582000 2539247 torch/_dynamo/convert_frame.py:1564] [0/8] torch._dynamo hit config.recompile_limit (8)
[rank3]:[rank3]:W1209 10:08:01.582000 2539247 torch/_dynamo/convert_frame.py:1564] [0/8]    function: 'forward' (/fsx/ferdinandmom/ferdinand-hf/huggingface/torchtitan/env_torchtitan_official/lib/python3.12/site-packages/torch/distributed/algorithms/_checkpoint/checkpoint_wrapper.py:145)
[rank3]:[rank3]:W1209 10:08:01.582000 2539247 torch/_dynamo/convert_frame.py:1564] [0/8]    last reason: 0/7: ___dict_contains(148, self._modules['_checkpoint_wrapped_module']._modules['self_attn']._forward_pre_hooks_with_kwargs)  # if hook_id in self._forward_pre_hooks_with_kwargs:  # nn/modules/module.py:1815 in inner
[rank3]:[rank3]:W1209 10:08:01.582000 2539247 torch/_dynamo/convert_frame.py:1564] [0/8] To log all recompilation reasons, use TORCH_LOGS="recompiles".
[rank3]:[rank3]:W1209 10:08:01.582000 2539247 torch/_dynamo/convert_frame.py:1564] [0/8] To diagnose recompilation issues, see https://pytorch.org/docs/main/compile/programming_model.recompilation.html
[rank3]:[rank3]: Traceback (most recent call last):

Fix

  • Apply the current PR changes and, in transformers' modeling_llama.py, pass hidden_states positionally instead of as a keyword argument:
        hidden_states, _ = self.self_attn(
    -       hidden_states=hidden_states,
    +       hidden_states,
            attention_mask=attention_mask,
            position_ids=position_ids,
            past_key_values=past_key_values,
            use_cache=use_cache,
            cache_position=cache_position,
            position_embeddings=position_embeddings,
            **kwargs,
        )
  • ./tooling_dev/debug_local.sh debugperf_large --compile

Explanation

  • When torch.compile traces your model, it creates a compiled graph along with guards. Guards are conditions that must hold for that graph to be reused; if a guard fails, torch.compile recompiles.
  • In modeling_llama.py, self.self_attn(hidden_states=hidden_states, ...) is called with hidden_states passed as a keyword argument.
  • In torchtitan, applying TP registers a forward pre-hook via register_forward_pre_hook. However, depending on whether kwargs are used, a different function is called (cf https://github.com/pytorch/pytorch/blob/main/torch/distributed/tensor/parallel/style.py#L576).
    • In our case, it calls module.register_forward_pre_hook(lambda _, inputs, kwargs: some_fn(inputs, kwargs), with_kwargs=True)
  • Calling this function is problematic because the module's forward then goes through the if hook_id in self._forward_pre_hooks_with_kwargs: check (cf https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/module.py#L1808).
    • This means that using kwargs results in a different hook_id per layer, hence the guard failure ___dict_contains(148, self._modules['_checkpoint_wrapped_module']._modules['self_attn']._forward_pre_hooks_with_kwargs)
  • When we don't use kwargs, self._forward_pre_hooks_with_kwargs is always empty (cf https://github.com/pytorch/pytorch/blob/main/torch/nn/modules/module.py#L1679C13-L1679C48), so the if check is never taken; each attention layer then produces the same guard set, thus no recompile (see the sketch below).
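
A minimal sketch of the mechanism, assuming plain nn.Linear modules in place of the attention layers and no-op hook bodies (_forward_pre_hooks_with_kwargs is a private nn.Module attribute, inspected here purely for illustration):

    import torch.nn as nn

    layers = [nn.Linear(8, 8) for _ in range(3)]

    for layer in layers:
        # Roughly the registration the TP style performs when kwargs are
        # involved (cf. the style.py link above): a pre-hook with with_kwargs=True.
        layer.register_forward_pre_hook(
            lambda mod, args, kwargs: (args, kwargs), with_kwargs=True
        )

    for layer in layers:
        # Every registration gets a globally unique hook id, so each layer ends
        # up with a different key in _forward_pre_hooks_with_kwargs, which is
        # exactly what the ___dict_contains(148, ...) guard keys on.
        print(dict(layer._forward_pre_hooks_with_kwargs))

    # With a positional-only hook, _forward_pre_hooks_with_kwargs stays empty,
    # the `if hook_id in self._forward_pre_hooks_with_kwargs:` branch is never
    # taken, and the guard set is identical across layers.
    plain = nn.Linear(8, 8)
    plain.register_forward_pre_hook(lambda mod, args: None)
    print(dict(plain._forward_pre_hooks_with_kwargs))  # {}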



flavors = {
"debugperf": HFTransformerModelArgs(
Contributor

What's the difference between debugperf / debugperf_large and debugmodel? Can we just keep one of them?

Contributor

Do we need to ship this folder to fix the issue? It adds about 2k LoC of complexity.

Contributor

+1, I think we could remove these test scripts to keep the code simple

@wwwjn (Contributor) left a comment

Thanks for finding this! To check my understanding, the bug is:

calling the function with kwargs results in a new object id for the hook -> causing recompiles

Is this correct?



flavors = {
"debugperf": HFTransformerModelArgs(
Contributor

Should we remove these 2 test models?

Contributor

+1, I think we could remove these test scripts to keep the code simple



llama3_args = {
"debugperf": TransformerModelArgs(
Contributor

Same here, could we remove these 2 models?

class HFTransformers:
model: str = ""
"""HuggingFace model ID (e.g., 'Qwen/Qwen3-4B-Instruct-2507')"""
tie_word_embeddings: bool = False
Contributor

Putting tie_word_embeddings into the job config is a little confusing, and it seems unrelated to this error?

IIUC this field is decided by the model architecture, not by each training run. So previously we put Qwen3's weight tying config into model_args:

enable_weight_tying: bool = False
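
For illustration, a minimal sketch of that placement; the class and field names come from the snippet and comment above, and treating both config classes as dataclasses is an assumption:

    from dataclasses import dataclass

    @dataclass
    class HFTransformerModelArgs:
        # Architecture-level choice, mirroring the existing Qwen3-style field,
        # so it lives with the model args rather than in the per-run job config.
        enable_weight_tying: bool = False

    @dataclass
    class HFTransformers:
        # The job config keeps only run-level choices, e.g. which HF model to load.
        model: str = ""
        """HuggingFace model ID (e.g., 'Qwen/Qwen3-4B-Instruct-2507')"""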
