
Commit bd1c36d

Kiuk Chung authored and facebook-github-bot committed
(torchx/docs) add deprecation docstring to torch_dist_role and create_torch_dist_role, added docs for configfile under experimental, fixed a few docstring errors, rearranged component toctree to be grouped more logically (#266)
Summary:
Pull Request resolved: #266

1. Add a deprecation docstring to `torch_dist_role` and `create_torch_dist_role`. We cannot add a `warnings.warn(DeprecationWarning)` because `dist.ddp` unfortunately uses `torch_dist_role`, so raising it via `warnings` would print the warning every time `dist.ddp` is loaded.
2. Add docs for the `.torchxconfig` file under experimental.
3. Fix a few docstring errors.
4. Rearrange the component toctree to be grouped more logically.
5. Remove `components/base.rst` (since it is deprecated).

Reviewed By: d4l3k

Differential Revision: D31697032

fbshipit-source-id: 25462ec452c38a43a54ee3d74861f13cb41cf554
1 parent d277d8c commit bd1c36d

File tree

11 files changed

+209 -53 lines changed

docs/source/beta.rst

Lines changed: 0 additions & 2 deletions
This file was deleted.

docs/source/components/base.rst

Lines changed: 0 additions & 9 deletions
This file was deleted.

docs/source/experimental/runner.config.rst

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+(beta) .torchxconfig file
+-----------------------------
+
+.. automodule:: torchx.runner.config
+.. currentmodule:: torchx.runner.config
+
+Config API Functions
+~~~~~~~~~~~~~~~~~~~~~~
+
+.. autofunction:: apply
+.. autofunction:: load
+.. autofunction:: dump

docs/source/index.rst

Lines changed: 4 additions & 5 deletions
@@ -81,12 +81,11 @@ Components Library

    components/overview
    components/train
-   components/serve
+   components/distributed
    components/interpret
    components/metrics
    components/hpo
-   components/base
-   components/distributed
+   components/serve
    components/utils

 Runtime Library
@@ -123,9 +122,9 @@ Experimental
 ---------------
 .. toctree::
    :maxdepth: 1
-   :caption: Beta Features
+   :caption: Experimental Features

-   beta
+   experimental/runner.config

torchx/components/__init__.py

Lines changed: 76 additions & 21 deletions
@@ -47,18 +47,21 @@
 authoring your own component is as simple as writing a python function with the following
 rules:

-1. The component function must return an ``specs.AppDef`` and the return type must be annotated
-2. All arguments of the component must be type annotated and the type must be one of
+1. The component function must return an ``specs.AppDef`` and the return type must be specified
+2. All arguments of the component must be PEP 484 type annotated and the type must be one of
    #. Primitives: ``int``, ``float``, ``str``, ``bool``
    #. Optional primitives: ``Optional[int]``, ``Optional[float]``, ``Optional[str]``
    #. Maps of primitives: ``Dict[Primitive_key, Primitive_value]``
    #. Lists of primitives: ``List[Primitive_values]``
    #. Optional collections: ``Optional[List]``, ``Optional[Dict]``
    #. VAR_ARG: ``*arg`` (useful when passing through arguments to the entrypoint script)
-3. The function should have well defined docstring in
-   `google format <https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html>`_.
-   This docstring is used by the torchx cli to autogenerate a ``--help`` string which is useful
-   when sharing components with others.
+3. (optional) A docstring in `google format <https://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html>`_
+   (in particular see ``function_with_pep484_type_annotations``). This docstring is purely informative
+   in that torchx cli uses it to autogenerate an informative ``--help`` message, which is
+   useful when sharing components with others. If the component does not have a docstring
+   the ``--help`` option will still work, but the parameters will have a canned description (see below).
+   Note that when running components programmatically via :py:mod:`torchx.runner`, the docstring
+   is not picked up by torchx at all.

 Below is an example component that launches DDP scripts, it is a simplified version of
 the :py:func:`torchx.components.dist.ddp` builtin.
@@ -76,21 +79,6 @@ def ddp(
     nnodes: int = 1,
     nproc_per_node: int = 1,
 ) -> specs.AppDef:
-    \"""
-    DDP simplified.
-
-    Args:
-        image: name of the docker image containing the script + deps
-        script: path of the script in the image
-        script_args: arguments to the script
-        host: machine type (one from named resources)
-        nnodes: number of nodes to launch
-        nproc_per_node: number of scripts to launch per node
-
-    Returns:
-        specs.AppDef: ddp AppDef
-    \"""
-
     return specs.AppDef(
         name=os.path.basename(script),
         roles=[
@@ -115,6 +103,73 @@ def ddp(
         ]
     )

+Assuming the component above is saved in ``example.py``, we can run ``--help``
+on it as:
+
+.. code-block:: shell-session
+
+    $ torchx ./example.py:ddp --help
+    usage: torchx run ...torchx_params... ddp [-h] --image IMAGE --script SCRIPT [--host HOST]
+                                              [--nnodes NNODES] [--nproc_per_node NPROC_PER_NODE]
+                                              ...
+
+    AppDef: ddp. TIP: improve this help string by adding a docstring ...<omitted for brevity>...
+
+    positional arguments:
+      script_args           (required)
+
+    optional arguments:
+      -h, --help            show this help message and exit
+      --image IMAGE         (required)
+      --script SCRIPT       (required)
+      --host HOST           (default: aws_p3.2xlarge)
+      --nnodes NNODES       (default: 1)
+      --nproc_per_node NPROC_PER_NODE
+                            (default: 1)
+
+If we include a docstring as such:
+
+.. code-block:: python
+
+    def ddp(...) -> specs.AppDef:
+        \"""
+        DDP Simplified.
+
+        Args:
+            image: name of the docker image containing the script + deps
+            script: path of the script in the image
+            script_args: arguments to the script
+            host: machine type (one from named resources)
+            nnodes: number of nodes to launch
+            nproc_per_node: number of scripts to launch per node
+
+        \"""
+
+        # ... component body same as above ...
+        pass
+
+Then the ``--help`` message would reflect the function and parameter descriptions
+in the docstring as such:
+
+::
+
+    usage: torchx run ...torchx_params... ddp [-h] --image IMAGE --script SCRIPT [--host HOST]
+                                              [--nnodes NNODES] [--nproc_per_node NPROC_PER_NODE]
+                                              ...
+
+    App spec: DDP simplified.
+
+    positional arguments:
+      script_args           arguments to the script
+
+    optional arguments:
+      -h, --help            show this help message and exit
+      --image IMAGE         name of the docker image containing the script + deps
+      --script SCRIPT       path of the script in the image
+      --host HOST           machine type (one from named resources)
+      --nnodes NNODES       number of nodes to launch
+      --nproc_per_node NPROC_PER_NODE
+                            number of scripts to launch per node


 Validating
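
To make the authoring rules above concrete, here is a minimal sketch of a hand-written component that follows them: typed arguments, an annotated ``specs.AppDef`` return type, and a google-format docstring. This example is illustrative and is not part of the commit diff; the image and entrypoint values are placeholders.

.. code-block:: python

    import torchx.specs as specs


    def echo(msg: str = "hello world", image: str = "ubuntu:latest") -> specs.AppDef:
        """
        Echos a message to stdout.

        Args:
            msg: message to echo
            image: container image to run in
        """
        return specs.AppDef(
            name="echo",
            roles=[
                specs.Role(
                    name="echo",
                    image=image,
                    entrypoint="/bin/echo",
                    args=[msg],
                    num_replicas=1,
                )
            ],
        )

Saved as ``echo.py``, it could then be launched the same way as the other examples in this commit, e.g. ``torchx run -s local_cwd ./echo.py:echo --msg hello``.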

torchx/components/base/__init__.py

Lines changed: 8 additions & 2 deletions
@@ -40,6 +40,12 @@ def torch_dist_role(
     **launch_kwargs: Any,
 ) -> Role:
     """
+    .. warning:: This method is deprecated and will be removed in future versions.
+                 Instead use :py:func:`torchx.components.dist.ddp` as a builtin,
+                 or prefer to use `torch.distributed.run <https://pytorch.org/docs/stable/elastic/run.html>`_
+                 directly by setting your AppDef's ``entrypoint = python`` and
+                 ``args = ["-m", "torch.distributed.run", ...]``.
+
     A ``Role`` for which the user provided ``entrypoint`` is executed with the
     torchelastic agent (in the container). Note that the torchelastic agent
     invokes multiple copies of ``entrypoint``.
@@ -54,8 +60,8 @@ def torch_dist_role(

     ::

-        # nproc_per_node correspond to the ``torch.distributed.launch`` arguments. More
-        # info about available arguments: https://pytorch.org/docs/stable/distributed.html#launch-utility
+        # nproc_per_node correspond to the ``torch.distributed.run`` arguments. More
+        # info about available arguments: https://pytorch.org/docs/stable/elastic/run.html
         trainer = torch_dist_role("trainer",container, entrypoint="trainer.py",.., nproc_per_node=4)

torchx/components/base/roles.py

Lines changed: 6 additions & 0 deletions
@@ -28,6 +28,12 @@ def create_torch_dist_role(
     **launch_kwargs: Any,
 ) -> Role:
     """
+    .. warning:: This method is deprecated and will be removed in future versions.
+                 Instead use :py:func:`torchx.components.dist.ddp` as a builtin,
+                 or prefer to use `torch.distributed.run <https://pytorch.org/docs/stable/elastic/run.html>`_
+                 directly by setting your AppDef's ``entrypoint = python`` and
+                 ``args = ["-m", "torch.distributed.run", ...]``.
+
     A ``Role`` for which the user provided ``entrypoint`` is executed with the
     torchelastic agent (in the container). Note that the torchelastic agent
     invokes multiple copies of ``entrypoint``.
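
Since both deprecation notes above point at the same replacement, here is a rough sketch (not part of the diff) of what the suggested alternative looks like: a role whose entrypoint is ``python`` running ``torch.distributed.run`` directly. The image name, script, and counts are illustrative placeholders.

.. code-block:: python

    import torchx.specs as specs

    # Roughly what torch_dist_role produced: every replica starts
    # torch.distributed.run, which in turn launches nproc_per_node
    # copies of trainer.py on that node.
    trainer = specs.Role(
        name="trainer",
        image="my_trainer_image:latest",
        entrypoint="python",
        args=[
            "-m",
            "torch.distributed.run",
            "--nnodes",
            "2",
            "--nproc_per_node",
            "4",
            "trainer.py",
        ],
        num_replicas=2,
    )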

torchx/runner/api.py

Lines changed: 4 additions & 4 deletions
@@ -145,8 +145,8 @@ def run_component(
         if it dryrun specified.

     Raises:
-        `ComponentValidationException`: if component is invalid.
-        `ComponentNotFoundException`: if the ``component_path`` is failed to resolve.
+        ComponentValidationException: if component is invalid.
+        ComponentNotFoundException: if the ``component_path`` is failed to resolve.
     """
     component_def = get_component(component_name)
     app = from_function(component_def.fn, app_args)
@@ -505,9 +505,9 @@ def _scheduler_app_id(
         is the same as this session.

     Raises:
-        ValueError - if ``check_session=True`` and the session in the app handle
+        ValueError: if ``check_session=True`` and the session in the app handle
             does not match this session's name
-        KeyError - if no such scheduler backend exists
+        KeyError: if no such scheduler backend exists
     """

     scheduler_backend, _, app_id = parse_app_handle(app_handle)
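
The docstring fixes in this commit all follow the same pattern: google-style ``Raises:`` entries use ``ExceptionName: description`` (a plain colon, no backticks or dash) so that sphinxcontrib-napoleon renders them correctly. A short, illustrative example (not from the diff; the function and exceptions are made up for demonstration):

.. code-block:: python

    def fetch(path: str) -> str:
        """
        Fetches the contents at the given path.

        Raises:
            ValueError: if ``path`` is not an absolute path
            FileNotFoundError: if ``path`` does not exist
        """
        ...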

torchx/runner/config.py

Lines changed: 96 additions & 7 deletions
@@ -5,6 +5,95 @@
 # This source code is licensed under the BSD-style license found in the
 # LICENSE file in the root directory of this source tree.

+"""
+You can store the scheduler run configs (:py:class:`torchx.specs.RunConfig`) for your project
+in a ``.torchxconfig`` file. Currently this file is only read
+and honored when running the component from the CLI.
+
+CLI Usage
+~~~~~~~~~~~
+
+#. ``cd`` into the directory where you want the ``.torchxconfig`` file to be dropped.
+   The CLI only picks up ``.torchxconfig`` files from the current-working-directory (CWD)
+   so choose a directory where you typically run ``torchx`` from. Typically this
+   is the root of your project directory.
+
+#. Generate the config file by running
+
+   .. code-block:: shell-session
+
+    $ torchx configure -s <comma,delimited,scheduler,names>
+
+    # -- or for all registered schedulers --
+    $ torchx configure
+
+#. If you specified ``-s local_cwd,kubernetes``, you should see a ``.torchxconfig``
+   file as shown below:
+
+   .. code-block:: shell-session
+
+    $ cat .torchxconfig
+    [local_cwd]
+
+    [kubernetes]
+    queue = #FIXME:(str) Volcano queue to schedule job in
+
+#. ``.torchxconfig`` is in INI format and the section names map to the scheduler names.
+   Each section contains the run configs for the scheduler as ``$key = $value`` pairs.
+   You may find that certain schedulers have empty sections; this means that
+   the scheduler defines sensible defaults for all its run configs, hence no run configs
+   are required at runtime. If you'd like to override a default you can add it here.
+   **TIP:** To see all the run options for a scheduler use ``torchx runopts <scheduler_name>``.
+
+#. The sections with ``FIXME`` placeholders are run configs that are required
+   by the scheduler. Replace these with the values that apply to you.
+
+#. **IMPORTANT:** If you are happy with the scheduler-provided defaults for a particular
+   run config, you **should not** redundantly specify them in ``.torchxconfig`` with the
+   same default value. This is because the scheduler may decide to change the default
+   value at a later date, which would leave you with a stale default.
+
+#. Now you can run your component without having to specify the scheduler run configs
+   each time. Just make sure the directory you are running the ``torchx`` cli from actually
+   has ``.torchxconfig``!
+
+   .. code-block:: shell-session
+
+    $ ls .torchxconfig
+    .torchxconfig
+
+    $ torchx run -s local_cwd ./my_component.py:train
+
+Programmatic Usage
+~~~~~~~~~~~~~~~~~~~
+
+Unlike the cli, the ``.torchxconfig`` file **is not** picked up automatically
+from ``CWD`` if you are programmatically running your component with :py:class:`torchx.runner.Runner`.
+You'll have to manually specify the directory containing ``.torchxconfig``.
+
+Below is an example:
+
+.. doctest:: [runner_config_example]
+
+    from torchx.runner import get_runner
+    from torchx.runner.config import apply
+    import torchx.specs as specs
+
+    def my_component(a: int) -> specs.AppDef:
+        # <... component body omitted for brevity ...>
+        pass
+
+    scheduler = "local_cwd"
+    cfg = specs.RunConfig()
+    cfg.set("log_dir", "/these/take/outmost/precedence")
+
+    apply(scheduler, cfg, dirs=["/home/bob"])  # looks for /home/bob/.torchxconfig
+    get_runner().run(my_component(1), scheduler, cfg)
+
+You may also specify multiple directories (in preceding order), which is useful when
+you want to keep personal config overrides on top of a project-defined default.
+
+"""
 import configparser as configparser
 import logging
 from pathlib import Path
@@ -66,7 +155,7 @@ def dump(
     To only dump required runopts pass ``required_only=True``.

     Each scheduler's runopts are written in the section called
-    ``[default.{scheduler_name}.cfg]``.
+    ``[{scheduler_name}]``.

     For example:

@@ -77,7 +166,7 @@ def dump(
         queue = #FIXME (str)Volcano queue to schedule job in

     Raises:
-        ``ValueError`` - if given a scheduler name that is not known
+        ValueError: if given a scheduler name that is not known
     """

     if schedulers:
@@ -128,7 +217,7 @@ def apply(scheduler: str, cfg: RunConfig, dirs: Optional[List[str]] = None) -> N
     over the ones in the config file and only new configs are added. The same holds
     true for the configs loaded in list order.

-    For instance if ``cfg = {"foo": "bar"}`` and the config file is:
+    For instance if ``cfg={"foo":"bar"}`` and the config file is:

     ::

@@ -137,12 +226,12 @@ def apply(scheduler: str, cfg: RunConfig, dirs: Optional[List[str]] = None) -> N
         foo = baz
         hello = world

-       # dir_2/.torchxconfig
-       [local_cwd]
-       hello = bob
+        # dir_2/.torchxconfig
+        [local_cwd]
+        hello = bob


-    Then after the method call, ``cfg = {"foo": "bar", "hello": "world"}``.
+    Then after the method call, ``cfg={"foo":"bar","hello":"world"}``.
     """

     if not dirs:
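
As a small follow-on to the merge example above (not part of the diff), layering a personal override on top of a project-defined default is just a matter of passing both directories to ``apply``: directories earlier in the list take precedence, and later ones only contribute keys that are not already set. The paths here are illustrative.

.. code-block:: python

    import torchx.specs as specs
    from torchx.runner.config import apply

    cfg = specs.RunConfig()
    # /home/bob/.torchxconfig (personal) wins over /workspace/project/.torchxconfig
    # (project default), and anything already set on cfg wins over both.
    apply("local_cwd", cfg, dirs=["/home/bob", "/workspace/project"])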

torchx/schedulers/api.py

Lines changed: 1 addition & 1 deletion
@@ -248,7 +248,7 @@ def log_iter(
         An ``Iterator`` over log lines of the specified role replica

     Raises:
-        NotImplementedError - if the scheduler does not support log iteration
+        NotImplementedError: if the scheduler does not support log iteration
     """
     raise NotImplementedError(
         f"{self.__class__.__qualname__} does not support application log iteration"

torchx/schedulers/local_scheduler.py

Lines changed: 2 additions & 2 deletions
@@ -161,8 +161,8 @@ def __init__(self, cfg: RunConfig) -> None:
     def fetch(self, image: str) -> str:
         """
         Raises:
-            ValueError - if the image name is not an absolute dir
-                         and if it does not exist or is not a directory
+            ValueError: if the image name is not an absolute dir and if it
+                does not exist or is not a directory

         """
         if not os.path.isdir(image):
