
[exploratory] TorchX Dashboard #567

Open
@d4l3k

Description


Add a new torchx dashboard command that launches a local HTTP server where users can view all of their jobs with their statuses and logs, along with integrations for ML-specific extras such as artifacts, TensorBoard, etc.

Motivation/Background

Currently, TorchX can only be used programmatically or via the CLI. It would be nice to also have a UI dashboard that could be used to monitor all of your jobs as well as support deeper integrations such as experiment tracking and metrics.

Right now, users who want a UI have to use their platform-specific one (e.g. the AWS Batch console or the Ray dashboard), and many platforms (e.g. Slurm, Volcano) don't have one at all.

Detailed Proposal

This would be a fairly simple interface built on top of something such as Flask (https://flask.palletsprojects.com/en/2.1.x/quickstart/).

Pages:

  • / - the main page with a list of all of the user's jobs and filters
  • /<scheduler>/<jobid> - an overview of the job, with the job definition and status, plus tabs for logs, artifacts and any other logged URLs
  • /<scheduler>/<jobid>/logs - view the logs
  • /<scheduler>/<jobid>/external/<metadata key> - iframes embedding external services such as TensorBoard
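A rough sketch of how the page layout above could be routed. Flask would normally handle the routing; to keep the sketch dependency-free it uses a plain WSGI app from the Python standard library instead. The in-memory JOBS table, the handler names and the serve() helper are hypothetical placeholders: a real implementation would query the TorchX schedulers for jobs, statuses, logs and metadata.

```python
import html
import re
from typing import Callable, Dict, List, Tuple
from wsgiref.simple_server import make_server

# Hypothetical in-memory job table keyed by (scheduler, job id). A real
# dashboard would query each TorchX scheduler instead.
JOBS: Dict[Tuple[str, str], dict] = {
    ("local_cwd", "trainer-abc123"): {
        "status": "RUNNING",
        "logs": ["epoch 1 loss=0.52", "epoch 2 loss=0.41"],
        "external": {"tensorboard": "http://localhost:6006"},
    },
}


def page(title: str, body: str) -> str:
    """Wrap an HTML fragment in a minimal page."""
    return f"<html><head><title>{html.escape(title)}</title></head><body>{body}</body></html>"


def index() -> str:
    # "/": every job with its status (filters omitted in this sketch).
    rows = "".join(
        f'<li><a href="/{html.escape(s)}/{html.escape(j)}">'
        f"{html.escape(s)}/{html.escape(j)}</a> {html.escape(info['status'])}</li>"
        for (s, j), info in JOBS.items()
    )
    return page("TorchX jobs", f"<ul>{rows}</ul>")


def job(scheduler: str, job_id: str) -> str:
    # "/<scheduler>/<jobid>": status plus links to the logs and external tabs.
    info = JOBS[(scheduler, job_id)]
    base = html.escape(f"/{scheduler}/{job_id}")
    tabs = [f'<a href="{base}/logs">logs</a>'] + [
        f'<a href="{base}/external/{html.escape(key)}">{html.escape(key)}</a>'
        for key in info["external"]
    ]
    status = html.escape(info["status"])
    return page(f"{scheduler}/{job_id}", f"<p>status: {status}</p>" + " | ".join(tabs))


def logs(scheduler: str, job_id: str) -> str:
    # "/<scheduler>/<jobid>/logs": raw log lines.
    lines = JOBS[(scheduler, job_id)]["logs"]
    return page(f"{job_id} logs", "<pre>" + html.escape("\n".join(lines)) + "</pre>")


def external(scheduler: str, job_id: str, key: str) -> str:
    # "/<scheduler>/<jobid>/external/<metadata key>": embed the service in an iframe.
    url = JOBS[(scheduler, job_id)]["external"][key]
    return page(key, f'<iframe src="{html.escape(url)}" width="100%" height="800"></iframe>')


# Each named group is passed to the handler as a keyword argument.
ROUTES = [
    (re.compile(r"^/$"), index),
    (re.compile(r"^/(?P<scheduler>[^/]+)/(?P<job_id>[^/]+)$"), job),
    (re.compile(r"^/(?P<scheduler>[^/]+)/(?P<job_id>[^/]+)/logs$"), logs),
    (re.compile(r"^/(?P<scheduler>[^/]+)/(?P<job_id>[^/]+)/external/(?P<key>[^/]+)$"), external),
]


def app(environ: dict, start_response: Callable) -> List[bytes]:
    """WSGI entry point: dispatch PATH_INFO to the matching page handler."""
    path = environ.get("PATH_INFO") or "/"
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            try:
                body = handler(**match.groupdict())
            except KeyError:  # unknown job or metadata key
                break
            start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
            return [body.encode("utf-8")]
    start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"not found"]


def serve(port: int = 8080) -> None:
    # Roughly what a `torchx dashboard` command could do: serve locally until interrupted.
    with make_server("127.0.0.1", port, app) as httpd:
        httpd.serve_forever()
```

Calling serve() and opening http://127.0.0.1:8080/ would list the jobs. With Flask, each ROUTES entry would instead become a decorated view such as @app.route("/<scheduler>/<jobid>/logs").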

Alternatives

Providing a way to view URLs for external services via the terminal.
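For comparison, the terminal-only alternative could simply print the same metadata. A minimal sketch, assuming a job's external URLs are available as a dict mapping metadata key to URL; format_external_urls and that metadata shape are hypothetical, not an existing TorchX API.

```python
from typing import Dict


def format_external_urls(job_id: str, metadata: Dict[str, str]) -> str:
    """Render a job's logged external-service URLs as aligned terminal text."""
    if not metadata:
        return f"{job_id}: no external URLs logged"
    # Pad keys to the longest one so the URLs line up in a column.
    width = max(len(key) for key in metadata)
    lines = [f"{job_id}:"]
    for key in sorted(metadata):
        lines.append(f"  {key.ljust(width)}  {metadata[key]}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_external_urls("trainer-abc123", {"tensorboard": "http://localhost:6006"}))
```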

Additional context/links

Metadata

Assignees

No one assigned

Labels

RFC (Request for Feedback & Roadmaps), cli (Related to the CLI), enhancement (New feature or request)
