Fix torch.compile recompilation issue with HF modeling + TP
#2130
Changes from all commits: 4f36924 · 5a63932 · 6b4400b · 08fb8c7 · 66998c2 · d78b0e6 · 5f9ae25
```diff
@@ -23,6 +23,26 @@
 flavors = {
+    "debugperf": HFTransformerModelArgs(
+        titan_dense_args=TitanDenseModelArgs(
+            dim=256,
+            n_layers=6,
+            n_heads=16,
+            n_kv_heads=16,
+            vocab_size=2048,
+            rope_theta=500000,
+        ),
+    ),
+    "debugperf_large": HFTransformerModelArgs(
+        titan_dense_args=TitanDenseModelArgs(
+            dim=1024,
+            n_layers=12,
+            n_heads=16,
+            n_kv_heads=16,
+            vocab_size=32000,
+            rope_theta=500000,
+        ),
+    ),
     "debugmodel": HFTransformerModelArgs(
         titan_dense_args=TitanDenseModelArgs(
             dim=256,
```

Contributor (review comment on `"debugperf"`): what's the difference between debugperf / debugperf_large and debugmodel? Can we just keep one of them?
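To illustrate the gap between the two new flavors, here is a rough back-of-the-envelope comparison of just their embedding-matrix sizes (`vocab_size × dim`). This is illustrative arithmetic only, not torchtitan's actual parameter accounting, and `embed_params` is a hypothetical helper.

```python
# Rough per-flavor size comparison for the debug flavors added above.
# Only one (vocab_size x dim) embedding matrix is counted per flavor;
# this is an illustration, not torchtitan's real parameter count.

def embed_params(vocab_size: int, dim: int) -> int:
    """Parameters in a single (vocab_size x dim) embedding matrix."""
    return vocab_size * dim

debugperf = embed_params(vocab_size=2048, dim=256)          # small flavor
debugperf_large = embed_params(vocab_size=32000, dim=1024)  # ~62x larger

print(debugperf, debugperf_large)
```

So even before counting the transformer layers (6 vs 12), the large flavor's embedding alone is roughly 62× the small one's, which may be why both sizes exist.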
|
||
```diff
@@ -11,6 +11,8 @@
 class HFTransformers:
     model: str = ""
     """HuggingFace model ID (e.g., 'Qwen/Qwen3-4B-Instruct-2507')"""
+    tie_word_embeddings: bool = False
+    """Whether to tie input embeddings and output projection weights (default: True for HF models)"""


 @dataclass
```

Contributor (review comment on `tie_word_embeddings`): IIUC this field is decided by the model architecture, not by each training run. So previously we put Qwen3's weight-tying config into model_args:
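For context on what the new `tie_word_embeddings` flag controls, here is a minimal PyTorch sketch (assuming a toy `TinyLM` module, not torchtitan's real model) of weight tying: the output projection reuses the same `Parameter` object as the input embedding, so only one copy of the `vocab_size × dim` matrix exists and gradients from both uses accumulate into it.

```python
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Toy model (hypothetical, for illustration only) showing what
    tie_word_embeddings=True means in practice."""

    def __init__(self, vocab_size: int = 2048, dim: int = 256,
                 tie_word_embeddings: bool = False):
        super().__init__()
        self.embed_tokens = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)
        if tie_word_embeddings:
            # Share the same Parameter object: one matrix, one set of grads.
            self.lm_head.weight = self.embed_tokens.weight

tied = TinyLM(tie_word_embeddings=True)
untied = TinyLM(tie_word_embeddings=False)

# parameters() deduplicates shared Parameters, so the tied model
# reports half the parameter count of the untied one.
print(sum(p.numel() for p in tied.parameters()))
print(sum(p.numel() for p in untied.parameters()))
```

Because tying is an architectural property of the checkpoint (the HF config's `tie_word_embeddings`), the reviewer's point is that it belongs with the model args rather than per-run training config.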
Should we remove these 2 test models?