Hi, thank you for the great work. I have a question about tp-overlap.
The function below makes a buffer of size `args.seq_length * args.micro_batch_size`. Does this support the thd format?
```python
def _initialize_tp_communicators():
    """ initializing the communicators with user buffers for high-performance tensor-model-parallel
        communication overlap """
    try:
        import yaml
        import transformer_engine
        from transformer_engine.pytorch import module as te_module
    except ImportError:
        raise RuntimeError("Tensor Parallel Communication/GEMM Overlap optimization needs 'yaml' and "
                           "'transformer_engine' packages")

    args = get_args()

    if args.tp_comm_overlap_cfg is not None:
        with open(args.tp_comm_overlap_cfg, "r") as stream:
            ub_cfgs = yaml.safe_load(stream)
    else:
        ub_cfgs = {}

    input_shape = [(args.seq_length * args.micro_batch_size) // args.context_parallel_size, args.hidden_size]

    # We create a MPI process group, which is needed to bootstrap the pipelined
    # tensor-model-parallel communication overlap
    torch.distributed.new_group(backend='nccl')

    te_module.base.initialize_ub(shape=input_shape, tp_size=args.tensor_model_parallel_size,
                                 use_fp8=(args.fp8 is not None), ub_cfgs=ub_cfgs)
```
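For concreteness, here is the buffer shape that `input_shape` works out to, a minimal sketch using hypothetical values in place of the real `args` (the numbers are just examples, not defaults):

```python
# Hypothetical example values; the real ones come from get_args().
seq_length = 4096
micro_batch_size = 2
context_parallel_size = 1
hidden_size = 4096

# Same arithmetic as in _initialize_tp_communicators above:
# the userbuffer is registered with a fixed [tokens, hidden] 2-D shape.
input_shape = [(seq_length * micro_batch_size) // context_parallel_size, hidden_size]
print(input_shape)  # [8192, 4096]
```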
Following up on this question: I have found that after the TP/SP MLP layer, the output shape is exactly `[seqlen, args.hidden_size]`. So how does this work for the qkv_proj output of `hidden_dim * 3 / tp_size` and the MLP intermediate of `hidden_dim * 2 / tp_size`?
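To spell out the shapes I am asking about, here is a sketch under my own assumptions (a fused QKV projection without GQA, a gated MLP whose fc1 output is twice the hidden size, and hypothetical sizes):

```python
# Hypothetical sizes, only to illustrate the shapes referred to above.
hidden_dim = 4096
ffn_factor = 2          # assuming a gated MLP (e.g. SwiGLU) doubles the fc1 output width
tp_size = 8
tokens = 8192           # (seq_length * micro_batch_size) // context_parallel_size

# Per-TP-rank output widths of the column-parallel GEMMs:
qkv_out_per_rank = (hidden_dim * 3) // tp_size          # fused QKV projection, no GQA assumed
fc1_out_per_rank = (hidden_dim * ffn_factor) // tp_size  # MLP fc1 / gated projection

# Per-rank GEMM output shapes, which differ from the [tokens, hidden_dim]
# userbuffer shape registered in _initialize_tp_communicators:
print([tokens, qkv_out_per_rank])  # e.g. [8192, 1536]
print([tokens, fc1_out_per_rank])  # e.g. [8192, 1024]
```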
This discussion was converted from issue #1238 on October 23, 2024 21:02.