
Bad throughput with GLU #110

Open · Muennighoff opened this issue May 17, 2024 · 1 comment

Comments

@Muennighoff

I'm training models with the specs below, but I'm seeing a major throughput drop when switching to GLU. Do you know why? Any ideas for what I could investigate? Thanks a lot! cc @mvpatel2000 @tgale96

active params: 1,011,613,696 (for glu: 1,280,049,152)
total params: 4,769,710,080 (for glu: 6,917,193,728)
8 H100s, 1 node
FSDP SHARD_GRAD_OP
mlp_impl=grouped
n_experts=8
k=1
micro_bs=1
global_bs=512
no megablocks expert/weight parallelism

With mlp_type=mlp & activation_fn=gelu I get 17000 tokens per second per device.

With mlp_type=glu & activation_fn=silu I get 1000 tokens per second per device.

A small drop is expected since GLU adds slightly more parameters, but probably not one this large? Switching away from grouped or trying the memory-optimized MLP did not help. 🤔
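For reference, a GLU feed-forward adds only one extra projection per expert compared to a plain MLP, which is consistent with the ~27% increase in active parameters above but should not explain a 17x slowdown. A minimal PyTorch sketch of the two shapes being compared (illustrative only, not the megablocks grouped implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ExpertMLP(nn.Module):
    """Standard expert MLP: two projections with gelu in between."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)
        self.w2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.gelu(self.w1(x)))


class ExpertGLU(nn.Module):
    """GLU variant (SwiGLU-style): a third projection gates the hidden state."""

    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.w1 = nn.Linear(d_model, d_ff, bias=False)
        self.v = nn.Linear(d_model, d_ff, bias=False)  # the extra matmul vs. the MLP
        self.w2 = nn.Linear(d_ff, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.v(x))
```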

@mvpatel2000
Contributor

@Muennighoff what is your memory usage at? I would guess your memory allocator is thrashing -- this is a common problem close to the memory limit when using dropless MoEs, and it leads to a steep degradation in performance (as opposed to an OOM).

To verify this, if you are using Composer, you can add the MemoryMonitor callback and watch the alloc_retries count. If it spikes, that's bad. If you have your own training library, you can use https://pytorch.org/docs/stable/generated/torch.cuda.memory_stats.html#torch-cuda-memory-stats and look at num_alloc_retries.
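A minimal sketch of that check in plain PyTorch (the stat keys come from torch.cuda.memory_stats(); the helper name is just for illustration, hook it into whatever per-step logging you already have):

```python
import torch


def log_allocator_health(step: int) -> None:
    # num_alloc_retries counts how often the caching allocator had to flush its
    # cached blocks and retry cudaMalloc; if it keeps climbing during training,
    # the allocator is thrashing rather than cleanly OOMing.
    stats = torch.cuda.memory_stats()
    print(
        f"step {step}: "
        f"num_alloc_retries={stats['num_alloc_retries']} "
        f"allocated={stats['allocated_bytes.all.current'] / 2**30:.1f} GiB "
        f"reserved={stats['reserved_bytes.all.current'] / 2**30:.1f} GiB"
    )
```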

You can also verify this by cutting your model size roughly in half on the same GPU count (say, reduce n_layers).
