Description
When users launch TorchX jobs, they often want links for viewing the results in external services, most commonly TensorBoard. We want to add a generic way for components to provide these links.
Detailed Proposal
Specs
Update the AppDef to include an external_links field that can be filled in with macros.
https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L341
external_links: Dict[str, str] = field(default_factory=dict)
We'll likely also want a new "apply_app" macro method that applies macro values to a whole AppDef, materializing these links along with the rest of the app.
https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L146
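A minimal sketch of what apply_app could do. It assumes macros keep TorchX's ${name} placeholder form (macros.app_id renders as "${app_id}") and are filled with string.Template.safe_substitute. The Role and AppDef classes below are trimmed, hypothetical stand-ins rather than the real torchx.specs classes, and apply_app is the proposed method, not an existing API:

```python
import copy
from dataclasses import dataclass, field
from string import Template
from typing import Dict, List


# Hypothetical stand-ins for torchx.specs.Role / AppDef, trimmed to the
# fields this sketch needs. external_links is the proposed new field.
@dataclass
class Role:
    name: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class AppDef:
    name: str
    roles: List[Role] = field(default_factory=list)
    external_links: Dict[str, str] = field(default_factory=dict)


def _sub(value: str, values: Dict[str, str]) -> str:
    # safe_substitute leaves unknown ${...} placeholders untouched
    # instead of raising KeyError.
    return Template(value).safe_substitute(values)


def apply_app(app: AppDef, values: Dict[str, str]) -> AppDef:
    """Hypothetical: return a copy of `app` with all macros filled in."""
    app = copy.deepcopy(app)  # leave the caller's AppDef untouched
    for role in app.roles:
        role.args = [_sub(a, values) for a in role.args]
        role.env = {k: _sub(v, values) for k, v in role.env.items()}
    app.external_links = {
        k: _sub(v, values) for k, v in app.external_links.items()
    }
    return app


# Usage with an illustrative app ID.
app = AppDef(
    name="train",
    roles=[Role(name="trainer", args=["--tensorboard", "s3://foo/bar/${app_id}"])],
    external_links={"Tensorboard": "https://tb_service/foo/bar/${app_id}"},
)
materialized = apply_app(app, {"app_id": "train-abc123"})
print(materialized.external_links["Tensorboard"])
```

Returning a deep copy keeps the original AppDef reusable as a template, for example when the same component is launched several times with different app IDs.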
Usage
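from torchx.specs import AppDef, Role, macros


def my_component() -> AppDef:
    return AppDef(
        name="train",
        roles=[
            Role(
                name="blah",
                # image, entrypoint, etc. omitted for brevity
                args=[
                    "--tensorboard", f"s3://foo/bar/{macros.app_id}",
                ],
            ),
        ],
        # Proposed new field: link name -> URL, macro-filled at launch
        external_links={
            "Tensorboard": f"https://tb_service/foo/bar/{macros.app_id}",
        },
    )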
CLI
After the job launches, the CLI should print the materialized links:
External Links:
- Tensorboard: http://...
- Latest checkpoint: s3://.../model.pt
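One possible way to render that output, as a sketch: format_external_links is a hypothetical helper name, and the link values in the usage call are illustrative only.

```python
from typing import Dict


def format_external_links(links: Dict[str, str]) -> str:
    """Hypothetical helper that renders materialized links for the CLI."""
    if not links:
        return ""  # print nothing when a component defines no links
    lines = ["External Links:"]
    # dicts preserve insertion order, so links print in the order
    # the component declared them
    lines += [f"- {name}: {url}" for name, url in links.items()]
    return "\n".join(lines)


output = format_external_links({
    "Tensorboard": "https://tb_service/foo/bar/train-abc123",
    "Latest checkpoint": "s3://foo/bar/train-abc123/model.pt",
})
print(output)
```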
Envs
We may also want to add these links to the job metadata, so that people viewing the job in the native scheduler's UI can easily find and click them.
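As a rough sketch of the metadata idea, the links could be flattened into key/value pairs that a scheduler attaches as labels or annotations. The "torchx/external-link." prefix, the links_to_metadata name, and the key sanitization rule are all assumptions; each scheduler has its own limits on key characters and length, and two link names that sanitize to the same key would collide.

```python
import re
from typing import Dict

# Hypothetical key prefix; not an existing TorchX convention.
LINK_KEY_PREFIX = "torchx/external-link."


def links_to_metadata(links: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical: map link names to scheduler-friendly metadata keys."""
    metadata: Dict[str, str] = {}
    for name, url in links.items():
        # Lowercase, then replace each run of characters outside
        # [a-z0-9._-] with a single '-', trimming leading/trailing '-'.
        key = re.sub(r"[^a-z0-9._-]+", "-", name.lower()).strip("-")
        metadata[LINK_KEY_PREFIX + key] = url
    return metadata


meta = links_to_metadata({
    "Tensorboard": "https://tb_service/foo/bar/train-abc123",
    "Latest checkpoint": "s3://foo/bar/train-abc123/model.pt",
})
print(meta)
```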
Alternatives
A runtime service or library to track experiment metadata. This is hard because we don't currently have a runtime other than torchelastic, and the links still wouldn't show up in the user's CLI.