Description
Add a new torchx dashboard command that will launch a local HTTP server that allows users to view all of their jobs with statuses, logs and integration with any ML specific extras such as artifacts, Tensorboard, etc.
Motivation/Background
Currently TorchX can only be used programmatically or via the CLI. It would also be nice to have a UI dashboard that could be used to monitor all of your jobs as well as support deeper integrations such as experiment tracking and metrics.
Right now, users who want a UI have to use their platform-specific one (e.g. the AWS Batch console or the Ray dashboard), and many schedulers don't have one at all (e.g. Slurm, Volcano).
Detailed Proposal
This would be a fairly simple interface built on top of something such as Flask (https://flask.palletsprojects.com/en/2.1.x/quickstart/).
Pages:
- / - the main page with a list of all of the user's jobs and filters
- /<scheduler>/<jobid> - an overview of the job, the job def and the status, with a tab for logs, artifacts and any other URLs that are logged
- /<scheduler>/<jobid>/logs - view the logs
- /<scheduler>/<jobid>/external/<metadata key> - iframes based off of external services such as Tensorboard, etc.
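
To make the page layout above concrete, here is a minimal Flask sketch of the four routes. It is only an illustration under stated assumptions: the in-memory JOBS dict, the helper _get_job, the handler names and the JSON response shapes are all hypothetical stand-ins, and a real dashboard would query the TorchX schedulers for jobs, statuses and logs rather than a hard-coded dict.

```python
from flask import Flask, abort, jsonify
from markupsafe import escape

app = Flask(__name__)

# Hypothetical in-memory stand-in for job data. A real dashboard would
# look this up through TorchX instead of a hard-coded dict.
JOBS = {
    ("slurm", "1234"): {
        "state": "RUNNING",
        "logs": ["epoch 0 done", "epoch 1 done"],
        "external": {"tensorboard": "http://localhost:6006"},
    },
}


def _get_job(scheduler, job_id):
    """Return the job record, or respond with 404 if it is unknown."""
    job = JOBS.get((scheduler, job_id))
    if job is None:
        abort(404)
    return job


@app.route("/")
def index():
    # Main page: list every known job with its scheduler and state.
    return jsonify(
        [
            {"scheduler": s, "job_id": j, "state": job["state"]}
            for (s, j), job in JOBS.items()
        ]
    )


@app.route("/<scheduler>/<job_id>")
def job_overview(scheduler, job_id):
    # Overview page: status plus the keys of any logged external URLs.
    job = _get_job(scheduler, job_id)
    return jsonify({"state": job["state"], "external": sorted(job["external"])})


@app.route("/<scheduler>/<job_id>/logs")
def job_logs(scheduler, job_id):
    # Logs page: plain-text dump of the job's log lines.
    job = _get_job(scheduler, job_id)
    return "\n".join(job["logs"]), 200, {"Content-Type": "text/plain; charset=utf-8"}


@app.route("/<scheduler>/<job_id>/external/<key>")
def job_external(scheduler, job_id, key):
    # External page: embed the service registered under the metadata key.
    url = _get_job(scheduler, job_id)["external"].get(key)
    if url is None:
        abort(404)
    return f'<iframe src="{escape(url)}" width="100%" height="800"></iframe>'
```

Calling app.run() on this object would serve the pages locally, which is roughly what the proposed torchx dashboard command would wrap. Returning JSON keeps the sketch short; the actual pages would more likely render HTML templates.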
Alternatives
Provide a way to view the URLs of external services from the terminal.