Description
Add a builtin component for launching HPO (hyper-parameter optimization) jobs. At a high level, something akin to:
# for grid search
$ torchx run -s kubernetes hpo.grid_search --paramspacefile=~/parameters.json --component dist.ddp
# for bayesian search
$ torchx run -s kubernetes hpo.bayesian ...
In both cases we use the Ax/TorchX integration to run the HPO driver job (see the motivation section below for details).
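To make the shape of the proposal concrete, here is a purely hypothetical sketch of what the companion hpo.grid_search component could look like. The component signature, the torchx.apps.hpo.grid_search driver module, and the params_file convention are all assumptions, not existing TorchX APIs:
# Hypothetical component sketch -- none of this exists in TorchX today.
import torchx.specs as specs


def grid_search(
    params_file: str,
    objective_component: str = "dist.ddp",
    image: str = "ghcr.io/pytorch/torchx:latest",
) -> specs.AppDef:
    """Launches a grid-search HPO driver as a single-role, single-replica app."""
    return specs.AppDef(
        name="hpo-grid-search",
        roles=[
            specs.Role(
                name="hpo-driver",
                image=image,
                entrypoint="python",
                args=[
                    "-m",
                    "torchx.apps.hpo.grid_search",  # hypothetical driver module
                    "--params_file",
                    params_file,
                    "--component",
                    objective_component,
                ],
            )
        ],
    )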
Motivation/Background
TorchX already integrates with Ax, which supports both Bayesian and grid search HPO. Some definitions before we get started:
- Ax: Experiment - (docs) Defines the HPO search space and holds the optimizer state. Vends out the next set of parameters to search based on the observed results (relevant for Bayesian and Bandit optimizations, not so much for grid search).
- Ax: Trials - (docs) A step in an experiment, aka a (training) job that runs with a specific set of hyper-parameters as vended out by the optimizer in the experiment.
- Ax: Runner - (docs) Responsible for launching trials.
Ax/TorchX integration is done at the Runner level. We implemented an ax/TorchXRunner that implements Ax's Runner interface (do not confuse this with the TorchX runner; TorchX itself defines a runner concept). The ax/TorchXRunner runs the Ax Trials using TorchX.
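To illustrate the division of labor, here is a highly simplified, hypothetical sketch of what a TorchX-backed Ax Runner does conceptually. This is not the actual ax/TorchXRunner implementation (result tracking, status polling, const params, etc. are omitted); the class name below is made up:
# Conceptual sketch only -- the real implementation is ax.runners.torchx.TorchXRunner.
from typing import Any, Dict

from ax.core.runner import Runner
from ax.core.trial import Trial
from torchx.components import utils
from torchx.runner import get_runner


class SimplifiedTorchXRunner(Runner):  # hypothetical name
    def run(self, trial: Trial) -> Dict[str, Any]:
        # Build the AppDef for this trial from the hyper-parameters that the
        # Ax optimizer vended out (here: the booth component's x1 and x2).
        params = trial.arm.parameters
        app = utils.booth(x1=params["x1"], x2=params["x2"])

        # Submit the trial as a TorchX job; the returned metadata lets a
        # companion Ax Metric locate this trial's results later.
        app_handle = get_runner().run(app, scheduler="local_cwd")
        return {"app_handle": app_handle}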
The ax/TorchXRunnerTest serves as a full end-to-end example of how everything works. In summary, the test runs a Bayesian HPO to minimize the "booth" function. Note that in practice this function is replaced by your "trainer". The main module that computes the booth function given the parameters x_1 and x_2 as inputs is defined in torchx.apps.utils.booth.
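For reference, the booth function is the standard two-dimensional optimization test function with its global minimum at (1, 3). A minimal sketch of the function itself (the actual booth app additionally writes the result out so that it can be read back as the "booth_eval" metric):
def booth(x1: float, x2: float) -> float:
    # Booth function: global minimum of 0 at (x1, x2) == (1, 3)
    return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2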
The abridged code looks something like this:
from typing import List

from ax.core.experiment import Experiment
from ax.core.objective import Objective
from ax.core.optimization_config import OptimizationConfig
from ax.core.parameter import Parameter, ParameterType, RangeParameter
from ax.core.search_space import SearchSpace
from ax.metrics.torchx import TorchXMetric
from ax.modelbridge.dispatch_utils import choose_generation_strategy
from ax.runners.torchx import TorchXRunner
from ax.service.scheduler import Scheduler, SchedulerOptions
from torchx.components import utils

# the hyper-parameter search space: x1, x2 in [-10, 10]
parameters: List[Parameter] = [
    RangeParameter(
        name="x1",
        lower=-10.0,
        upper=10.0,
        parameter_type=ParameterType.FLOAT,
    ),
    RangeParameter(
        name="x2",
        lower=-10.0,
        upper=10.0,
        parameter_type=ParameterType.FLOAT,
    ),
]

experiment = Experiment(
    name="torchx_booth_sequential_demo",
    search_space=SearchSpace(parameters=parameters),
    optimization_config=OptimizationConfig(
        objective=Objective(
            metric=TorchXMetric(name="booth_eval"),
            minimize=True,
        ),
    ),
    # launches each trial as a TorchX job running the booth component
    runner=TorchXRunner(
        tracker_base="/tmp/",  # base dir where trial results are tracked
        component=utils.booth,
        scheduler="local_cwd",
        cfg={"prepend_cwd": True},
    ),
)

scheduler = Scheduler(
    experiment=experiment,
    generation_strategy=choose_generation_strategy(search_space=experiment.search_space),
    options=SchedulerOptions(),
)

for _ in range(3):
    scheduler.run_n_trials(max_trials=2)
    scheduler.report_results()
Detailed Proposal
The task here is essentially to create pre-packaged applications for the code above. We can define two types of HPO apps by the "strategy" used:
- hpo.grid_search
- hpo.bayesian
Each application will come with a companion "component" (e.g. hpo.grid_search and hpo.bayesian). The applications should be designed to take as input:
- parameter space
- what the objective function is (e.g. trainer)
- torchx cfgs (e.g. scheduler, scheduler runcfg, etc)
- ax experiment configs
The challenge is to "parameterize" the application correctly and in a way that lets the user sanely pass these arguments from the CLI. For complex parameters such as the parameter space, one might consider taking a file in a specific format rather than conjuring up a complex string encoding to pass as CLI input.
For instance, for the 20 x 20 parameter space over x_1 and x_2 in the example above, rather than taking the parameter space as:
$ torchx run hpo.bayesian --parameter_space x_1=-10:10,x_2=-10:10
one can take it as a well-defined Python parameter file:
# params.py
# just defines the parameters using the regular Ax APIs
from typing import List

from ax.core.parameter import Parameter, ParameterType, RangeParameter

parameters: List[Parameter] = [
    RangeParameter(
        name="x1",
        lower=-10.0,
        upper=10.0,
        parameter_type=ParameterType.FLOAT,
    ),
    RangeParameter(
        name="x2",
        lower=-10.0,
        upper=10.0,
        parameter_type=ParameterType.FLOAT,
    ),
]
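A hypothetical sketch of how the HPO driver application could consume such a file follows. The --params_file flag, the load_search_space helper, and the convention that the file defines a top-level parameters list are all assumptions for illustration, not existing TorchX or Ax features:
# Hypothetical loader sketch -- not an existing TorchX/Ax feature.
import runpy
from typing import List

from ax.core.parameter import Parameter
from ax.core.search_space import SearchSpace


def load_search_space(params_file: str) -> SearchSpace:
    # Execute the user-supplied params.py and pick up the `parameters`
    # list that it defines at module level.
    module_globals = runpy.run_path(params_file)
    parameters: List[Parameter] = module_globals["parameters"]
    return SearchSpace(parameters=parameters)


# e.g. inside the hpo.bayesian driver:
#   search_space = load_search_space(args.params_file)
#   experiment = Experiment(..., search_space=search_space, ...)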
Alternatives
Document how users can write their own HPO application and instruct them to run it with torchx run utils.python, since the HPO driver application only needs to run from a single process.
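For example, assuming a user-written driver script (my_hpo_driver.py is a hypothetical name), something like:
$ torchx run -s kubernetes utils.python --script my_hpo_driver.py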
Additional context/links
See hyperlinks above.