[Benchmark] Add all reduce benchmark #393
base: main
stack-info: PR: #393, branch: joydddd/stack/21
8a962c5 to 8c301ef
Use custom cpp
Benchmarking results for allreduce on 8x devices (time_us).
The performance gap between Helion and Kraken now exists only for shapes >= 512k, where the optimal config uses a persistent kernel with partial SMs + pre-log.
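For readers unfamiliar with how a per-shape `time_us` number of this kind is typically produced, here is a minimal sketch. It is not the benchmark added in this PR and does not use Helion or Kraken: `simulated_all_reduce` and `benchmark_us` are hypothetical helpers, the reduction runs on plain Python lists on the CPU, and the shapes are illustrative only. It only shows the usual pattern of warmup runs followed by a median over timed iterations.

```python
import statistics
import time


def simulated_all_reduce(buffers):
    """Sum-reduce equal-length per-rank buffers and hand every rank a copy.

    Hypothetical CPU stand-in for a real all-reduce; not the PR's kernel.
    """
    reduced = [sum(values) for values in zip(*buffers)]
    return [list(reduced) for _ in buffers]


def benchmark_us(fn, *args, warmup=3, iters=10):
    """Return the median wall-clock time of fn(*args) in microseconds."""
    # Warmup runs are discarded so one-time costs do not skew the samples.
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1e6)
    return statistics.median(samples)


if __name__ == "__main__":
    world_size = 8  # assumption: mirrors the 8 devices mentioned above
    for numel in (1024, 4096, 16384):  # illustrative shapes only
        buffers = [[float(rank)] * numel for rank in range(world_size)]
        elapsed = benchmark_us(simulated_all_reduce, buffers)
        print(f"numel={numel:>6}  time_us={elapsed:.1f}")
```

On real GPUs the timed region would also need device synchronization before reading the clock, which this CPU-only sketch does not model.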
8c301ef to 331d20a
331d20a to bcdadde
bcdadde to 8c18c05
8c18c05 to 1defd16
5330da4 to 95ae805
95ae805 to f8d3763
f8d3763 to 4d1ff3b
08b4196 to 19105c5
4d1ff3b to 80dd2ea
80dd2ea to 616a327
616a327 to 2a5733b
2a5733b to ec22ee1
3df55a1 to 6651ba5
ec22ee1 to 644b641
644b641 to b0040f7
b0040f7 to fc8be32
@yf225 Will, leaving this distributed benchmark PR to you~
Stacked PRs:
[Benchmark] Add all reduce benchmark