-j vs --cpu/--gpu in ddp  #737

@godfrey-cw

📚 Documentation

Link

https://pytorch.org/torchx/latest/components/distributed.html

What does it currently say?

It is not clear whether the --cpu and --gpu arguments are overridden by the -j argument, although in my testing (launching a job, then running top, etc.) it seems they are.

What should it say?

Both the docs and the --help output for dist.ddp could be clearer on this front. More generally, I am wondering whether there is a torchx equivalent of torchrun --standalone --nnodes=1 --nproc_per_node=auto ....

Why?

Clearly I wouldn't want --gpu=0 with -j 1x2, right? As such, the defaults listed in the docs and in the --help output are a little confusing.
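
To make the comparison with torchrun concrete, here is a rough sketch of how I read the -j value, assuming it has the form NNODESxNPROC_PER_NODE (for example 1x2), optionally with a MIN:MAX node range before the x. The helpers parse_j and torchrun_args are hypothetical names written for this illustration. They are not part of torchx, and the sketch does not model --cpu or --gpu at all, which is exactly the part that is unclear to me.

```python
from typing import List, Tuple


def parse_j(j: str) -> Tuple[str, int]:
    """Split a -j value such as "1x2" into (nnodes, nproc_per_node).

    Assumption: the value looks like NNODESxNPROC_PER_NODE, where NNODES
    may also be a MIN:MAX range. nnodes is kept as a string so that a
    range passes through unchanged.
    """
    nnodes, sep, nproc = j.partition("x")
    if not sep or not nnodes or not nproc:
        raise ValueError(f"expected NNODESxNPROC_PER_NODE, got {j!r}")
    return nnodes, int(nproc)


def torchrun_args(j: str, script: str) -> List[str]:
    """Build the torchrun command line that I would expect -j to correspond to."""
    nnodes, nproc = parse_j(j)
    args = ["torchrun"]
    if nnodes == "1":
        # A single node needs no external rendezvous, so use standalone mode.
        args.append("--standalone")
    args += [f"--nnodes={nnodes}", f"--nproc_per_node={nproc}", script]
    return args


if __name__ == "__main__":
    print(" ".join(torchrun_args("1x2", "train.py")))
```

If this reading is right, -j 1x2 asks for two worker processes on one node, which is why a default of --gpu=0 next to it is surprising.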
