Commit 3184729

Add NCCL exports
1 parent 273d177 commit 3184729

5 files changed, 11 additions and 3 deletions

project/lit_autoencoder.py

Lines changed: 1 addition & 1 deletion
@@ -127,7 +127,7 @@ def cli_main():
     # plugins
     # ------------
     trainer.plugins = [
-        'deepspeed_stage_2'
+        # 'deepspeed_stage_2'
     ]

     # ------------

project/lit_image_classifier.py

Lines changed: 1 addition & 1 deletion
@@ -193,7 +193,7 @@ def cli_main():
     # plugins
     # ------------
     trainer.plugins = [
-        'deepspeed_stage_2'
+        # 'deepspeed_stage_2'
     ]

     # ------------

project/lit_mnist.py

Lines changed: 1 addition & 1 deletion
@@ -184,7 +184,7 @@ def cli_main():
     # plugins
     # ------------
     trainer.plugins = [
-        'deepspeed_stage_2'
+        # 'deepspeed_stage_2'
     ]

     # ------------
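
The three hunks above disable DeepSpeed by commenting out the plugin string in each script. As a hedged alternative, the same switch could be driven by a command-line flag so the source does not need editing per run. The sketch below is illustrative only: the `--deepspeed` flag and the `build_plugins` and `parse_args` helpers are hypothetical names and are not part of this commit.

```python
import argparse


def build_plugins(use_deepspeed: bool) -> list:
    """Return the list that would be assigned to trainer.plugins.

    Hypothetical helper: an empty list mirrors the commented-out state
    introduced by this commit.
    """
    return ['deepspeed_stage_2'] if use_deepspeed else []


def parse_args(argv=None) -> argparse.Namespace:
    """Parse a hypothetical --deepspeed switch (off by default)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        '--deepspeed',
        action='store_true',
        help="enable the 'deepspeed_stage_2' plugin",
    )
    return parser.parse_args(argv)


if __name__ == '__main__':
    args = parse_args()
    # In the real scripts this list would be assigned to trainer.plugins.
    print(build_plugins(args.deepspeed))
```

With the flag omitted, the plugin list is empty, which matches the behavior after this commit.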

train.bash

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,10 @@ export https_proxy=https://proxy.ccs.ornl.gov:3128/
 export no_proxy='localhost,127.0.0.0/8,.ccs.ornl.gov,.ncrc.gov'
 export LC_ALL=en_US.utf8

+# Set NCCL settings for multi-node DDP
+export NCCL_NSOCKS_PERTHREAD=4
+export NCCL_SOCKET_NTHREADS=2
+
 # Run training script
 cd "$PROJDIR"/project || exit

train_nvme.bash

Lines changed: 4 additions & 0 deletions
@@ -29,6 +29,10 @@ export https_proxy=https://proxy.ccs.ornl.gov:3128/
 export no_proxy='localhost,127.0.0.0/8,.ccs.ornl.gov,.ncrc.gov'
 export LC_ALL=en_US.utf8

+# Set NCCL settings for multi-node DDP
+export NCCL_NSOCKS_PERTHREAD=4
+export NCCL_SOCKET_NTHREADS=2
+
 # Run training script
 cd "$PROJDIR"/project || exit
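
Both shell scripts export `NCCL_NSOCKS_PERTHREAD` and `NCCL_SOCKET_NTHREADS` before training starts, so the Python process inherits them. A minimal sketch of a sanity check the training script could run at startup is shown below. The helper names are hypothetical, and the limit of 64 on the product of the two values is taken from my reading of the NCCL environment-variable documentation, so treat it as an assumption to verify against the NCCL version in use.

```python
import os

# The two variables exported in train.bash and train_nvme.bash.
NCCL_SOCKET_VARS = ('NCCL_NSOCKS_PERTHREAD', 'NCCL_SOCKET_NTHREADS')

# Assumption: NCCL documents that the product of the two values
# must not exceed 64. Verify for the installed NCCL version.
MAX_SOCKET_PRODUCT = 64


def nccl_socket_settings(env=None) -> dict:
    """Read both variables as ints; unset variables map to None."""
    env = os.environ if env is None else env
    settings = {}
    for name in NCCL_SOCKET_VARS:
        raw = env.get(name)
        settings[name] = int(raw) if raw is not None else None
    return settings


def check_socket_product(settings: dict):
    """Return sockets-per-thread times threads, or None if either is unset.

    Raises ValueError when the product exceeds the assumed limit.
    """
    values = [settings[name] for name in NCCL_SOCKET_VARS]
    if any(value is None for value in values):
        return None
    product = values[0] * values[1]
    if product > MAX_SOCKET_PRODUCT:
        raise ValueError(
            f'{NCCL_SOCKET_VARS[0]} * {NCCL_SOCKET_VARS[1]} = {product} '
            f'exceeds {MAX_SOCKET_PRODUCT}'
        )
    return product


if __name__ == '__main__':
    print(check_socket_product(nccl_socket_settings()))
```

With the values from this commit (4 sockets per thread, 2 threads), the product is 8, well under the assumed limit.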
