Public Benchmark Summaries

This file keeps public benchmark summaries and points to complete machine-specific result tables when a release has a larger validation set.

B200-SXM

The B200 release validation is the current public formal benchmark set. It covers dense Llama3 TP/PP cases, DeepSeek MoE EP/PP cases, and long-context CP cases on the public B200 system configuration.

For the complete B200 release table, see B200 release summary.

| area | representative coverage | cases | timing error range | memory error range |
| --- | --- | ---: | --- | --- |
| Dense Llama3 | Llama3-70B / Llama3-405B, TP and PP, mbc=4/8/32 | 24 | -11.37% to -0.18% | -0.49% to -0.17% |
| DeepSeek MoE | DeepSeek-V2 / DeepSeek-V3, EP and EP+PP, mbc=4/8/32 | 12 | -13.54% to +0.06% | -0.81% to -0.50% |
| Long-context CP | Llama3-70B / Llama3-405B, CP4/CP8, 32K and 128K sequence | 7 | -9.27% to -5.05% | -1.38% to -0.06% |

The B200 summary also records the exact system config, run notes, and per-case real/perf timing and memory values.
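
This page does not state how the error percentages are derived from the per-case real/perf values. As a minimal sketch, assuming the error is the signed relative difference of the perf value against the real value, the calculation could look like the following. The function name `signed_error_pct` and the sample numbers are hypothetical and are not taken from the release table.

```python
def signed_error_pct(real: float, perf: float) -> float:
    """Signed relative error of a perf value against a real value, in percent.

    Assumption (not confirmed by the benchmark summary):
        error = (perf - real) / real
    Under this convention a negative result means the perf value is
    lower than the real measurement.
    """
    if real == 0:
        raise ValueError("real value must be non-zero")
    return (perf - real) / real * 100.0


# Hypothetical step times in milliseconds, for illustration only.
real_step_ms = 1000.0
perf_step_ms = 950.0
print(f"{signed_error_pct(real_step_ms, perf_step_ms):+.2f}%")
```

Under this assumed convention, the mostly negative ranges in the table would mean the perf values tend to fall slightly below the real ones.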

A100-PCIe

A100 benchmark summary

| model | mbc | parallelism | memory | Tflops |
| --- | ---: | --- | ---: | ---: |
| llama3 70b | 4 | tp1pp2 | -0.80% | -1.46% |
| llama3 70b | 4 | tp2 | -0.32% | -3.09% |
| llama3 70b | 4 | tp4 | -0.39% | -0.59% |
| llama3 70b | 4 | tp8 | -0.50% | -1.86% |
| llama3 70b | 8 | tp1pp2 | -2.85% | -0.10% |
| llama3 70b | 8 | tp2 | -1.65% | -2.08% |
| llama3 70b | 8 | tp4 | -1.44% | -1.28% |
| llama3 70b | 8 | tp8 | -1.25% | -1.73% |
| llama3 70b | 32 | tp1pp2 | -2.85% | 2.12% |
| llama3 70b | 32 | tp2 | -1.64% | 0.03% |
| llama3 70b | 32 | tp4 | -1.43% | -1.71% |
| llama3 70b | 32 | tp8 | -1.26% | -1.47% |

| model | mbc | parallelism | memory | Tflops |
| --- | ---: | --- | ---: | ---: |
| ds 236b | 4 | ep8 | -1.45% | -3.62% |
| ds 236b | 4 | ep4pp2 | -4.82% | -0.66% |
| ds 236b | 8 | ep8 | 0.22% | -2.64% |
| ds 236b | 8 | ep4pp2 | -4.70% | -1.47% |
| ds 236b | 32 | ep8 | -1.45% | -1.82% |
| ds 236b | 32 | ep4pp2 | -4.82% | -0.82% |

| model | mbc | parallelism | memory | Tflops |
| --- | ---: | --- | ---: | ---: |
| llama3 8b | 4 | tp1pp2 | -1.10% | 0.04% |
| llama3 8b | 4 | tp2 | -0.63% | -2.35% |
| llama3 8b | 4 | tp4 | -0.63% | 1.96% |
| llama3 8b | 4 | tp8 | -0.63% | -0.50% |
| llama3 8b | 8 | tp1pp2 | -1.10% | 1.53% |
| llama3 8b | 8 | tp2 | -0.63% | -0.61% |
| llama3 8b | 8 | tp4 | -0.63% | 1.35% |
| llama3 8b | 8 | tp8 | -0.63% | -0.56% |
| llama3 8b | 32 | tp1pp2 | -1.10% | 3.76% |
| llama3 8b | 32 | tp2 | -0.63% | -0.31% |
| llama3 8b | 32 | tp4 | -0.63% | 2.08% |
| llama3 8b | 32 | tp8 | -0.63% | -0.75% |
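
The per-case A100 rows can be condensed into the min-to-max range form used in the B200 table. The following is a minimal sketch of that reduction. The helper name `error_range` is hypothetical, and the values are copied from the llama3 70b rows above.

```python
def error_range(errors_pct):
    """Format the min-to-max span of a list of signed percentage errors."""
    lo, hi = min(errors_pct), max(errors_pct)
    return f"{lo:+.2f}% to {hi:+.2f}%"


# Values copied from the llama3 70b rows of the A100 table (mbc = 4, 8, 32).
llama3_70b_memory = [-0.80, -0.32, -0.39, -0.50,
                     -2.85, -1.65, -1.44, -1.25,
                     -2.85, -1.64, -1.43, -1.26]
llama3_70b_tflops = [-1.46, -3.09, -0.59, -1.86,
                     -0.10, -2.08, -1.28, -1.73,
                     2.12, 0.03, -1.71, -1.47]

print("memory:", error_range(llama3_70b_memory))  # memory: -2.85% to -0.32%
print("Tflops:", error_range(llama3_70b_tflops))  # Tflops: -3.09% to +2.12%
```

The same helper can be applied to the ds 236b and llama3 8b rows to get one summary line per model.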