
job launch hooks for linking to external services such as tensorboard #509

Open
@d4l3k

Description

When users launch TorchX jobs, they often want links for viewing the results on external services such as TensorBoard. We want to add a generic way for components to provide these links.

Detailed Proposal

Specs

Update the AppDef to include an external_links field whose values can contain macros that are filled in at launch time.
https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L341

```python
external_links: Dict[str, str] = field(default_factory=dict)
```

We'll likely want to add a new "apply_app" macro method that applies to a whole AppDef, so these values can be materialized along with the rest of the app.

https://github.com/pytorch/torchx/blob/main/torchx/specs/api.py#L146
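To make the proposed apply_app step concrete, here is a minimal, self-contained sketch. It does not import torchx: Role, AppDef, and Values below are simplified, hypothetical stand-ins (only app_id is modeled as a macro), and apply_app is the proposed method, not an existing API. Substitution uses string.Template.safe_substitute, so placeholders of the form ${app_id} are filled while any unknown placeholders are left untouched.

```python
import copy
from dataclasses import asdict, dataclass, field
from string import Template
from typing import Dict, List


# Simplified, hypothetical stand-ins for torchx.specs.Role / AppDef;
# only the fields needed to illustrate macro substitution are kept.
@dataclass
class Role:
    name: str
    args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)


@dataclass
class AppDef:
    name: str
    roles: List[Role] = field(default_factory=list)
    # Proposed field: display name -> URL template that may contain macros.
    external_links: Dict[str, str] = field(default_factory=dict)


@dataclass
class Values:
    """Hypothetical macro values; ${app_id} is replaced by the launched app's id."""

    app_id: str

    def substitute(self, arg: str) -> str:
        # safe_substitute leaves unknown ${...} placeholders as-is.
        return Template(arg).safe_substitute(**asdict(self))

    def apply(self, role: Role) -> Role:
        # Per-role substitution of args and env values on a copy.
        role = copy.deepcopy(role)
        role.args = [self.substitute(a) for a in role.args]
        role.env = {k: self.substitute(v) for k, v in role.env.items()}
        return role

    def apply_app(self, app: AppDef) -> AppDef:
        # Proposed: materialize macros across the whole AppDef, including
        # external_links, without mutating the caller's object.
        app = copy.deepcopy(app)
        app.roles = [self.apply(r) for r in app.roles]
        app.external_links = {
            name: self.substitute(url) for name, url in app.external_links.items()
        }
        return app


app = AppDef(
    name="train",
    roles=[Role(name="blah", args=["--tensorboard", "s3://foo/bar/${app_id}"])],
    external_links={"Tensorboard": "https://tb_service/foo/bar/${app_id}"},
)
resolved = Values(app_id="train-abc123").apply_app(app)
print(resolved.external_links)
# {'Tensorboard': 'https://tb_service/foo/bar/train-abc123'}
```

Returning a deep copy keeps the component's AppDef reusable across launches, since the macro templates themselves are never overwritten.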

Usage

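```python
from torchx.specs import AppDef, Role, macros


def my_component() -> AppDef:
    return AppDef(
        name="train",
        roles=[
            Role(
                name="blah",
                # image, entrypoint, etc. omitted for brevity
                args=[
                    "--tensorboard", f"s3://foo/bar/{macros.app_id}",
                ],
            ),
        ],
        # Proposed field: link name -> URL, with macros filled in at launch.
        external_links={
            "Tensorboard": f"https://tb_service/foo/bar/{macros.app_id}",
        },
    )
```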

CLI

After the job launches, the CLI should print the materialized links:

```text
External Links:
  - Tensorboard: http://...
  - Latest checkpoint: s3://.../model.pt
```
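The CLI output format above could be produced by a small helper along these lines. This is a sketch only: format_external_links is a hypothetical name, and the links are assumed to already have had their macros substituted before they reach it.

```python
from typing import Dict


def format_external_links(links: Dict[str, str]) -> str:
    """Render materialized links in the proposed CLI format.

    Hypothetical helper. Returns an empty string when there are no links,
    so the CLI can skip the section entirely.
    """
    if not links:
        return ""
    lines = ["External Links:"]
    # Dicts preserve insertion order, so links print in the order the
    # component declared them.
    for name, url in links.items():
        lines.append(f"  - {name}: {url}")
    return "\n".join(lines)


print(format_external_links({"Tensorboard": "https://tb_service/foo/bar/train-abc123"}))
# External Links:
#   - Tensorboard: https://tb_service/foo/bar/train-abc123
```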

Envs

We may also want to add these links to the job metadata, so that people viewing the job in the native scheduler can easily find and click them.

Alternatives

A runtime service/library to track experiment metadata. This is harder, since we don't currently have a runtime other than torchelastic, and the links still wouldn't show up in the user's CLI.

Additional context/links

Metadata

Assignees: No one assigned

Labels: enhancement (New feature or request); module: runner (issues related to the torchx.runner and torchx.scheduler modules)
