Implement an HPO builtin #510

Open

@kiukchung

Description

Add a builtin component for launching HPO (hyper-parameter optimization) jobs. At a high level, something akin to:

# for grid search
$ torchx run -s kubernetes hpo.grid_search --paramspacefile=~/parameters.json --component dist.ddp

# for bayesian search
$ torchx run -s kubernetes hpo.bayesian ...

In both cases we use the Ax/TorchX integration to run the HPO driver job (see the Motivation section below for details).

Motivation/Background

TorchX already integrates with Ax, which supports both Bayesian and grid-search HPO. Some definitions before we get started:

  1. Ax: Experiment - (docs) Defines the HPO search space and holds the optimizer state. Vends out the next set of parameters to search based on the observed results (relevant for Bayesian and Bandit optimizations, not so much for grid search).
  2. Ax: Trials - (docs) A step in an experiment, aka a (training) job that runs with a specific set of hyper-parameters as vended out by the optimizer in the experiment.
  3. Ax: Runner - (docs) Responsible for launching trials.

The Ax/TorchX integration is done at the Runner level. We implemented ax/TorchXRunner, which implements Ax's Runner interface (not to be confused with the TorchX runner; TorchX itself defines a runner concept). The ax/TorchXRunner runs Ax Trials using TorchX.

The ax/TorchXRunnerTest serves as a full end-to-end example of how everything works. In summary, the test runs a Bayesian HPO to minimize the "booth" function; in practice this function is replaced by your "trainer". The main module that computes the booth function given the parameters x_1 and x_2 as inputs is defined in torchx.apps.utils.booth.
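For reference, booth is a standard optimization test function with its global minimum at (1, 3). A minimal sketch of what the trainer computes (illustrative, not the exact torchx.apps.utils.booth module):

    # Booth function: f(x1, x2) = (x1 + 2*x2 - 7)^2 + (2*x1 + x2 - 5)^2
    # global minimum is f(1, 3) = 0, so the HPO should converge toward (1, 3)
    def booth(x1: float, x2: float) -> float:
        return (x1 + 2 * x2 - 7) ** 2 + (2 * x1 + x2 - 5) ** 2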

The abridged code looks something like this:

    import tempfile
    from typing import List

    from ax.core.experiment import Experiment
    from ax.core.objective import Objective
    from ax.core.optimization_config import OptimizationConfig
    from ax.core.parameter import Parameter, ParameterType, RangeParameter
    from ax.core.search_space import SearchSpace
    from ax.modelbridge.dispatch_utils import choose_generation_strategy
    from ax.service.scheduler import Scheduler, SchedulerOptions

    # TorchXMetric and TorchXRunner come from the Ax/TorchX integration
    # referenced above; utils.booth is the builtin booth component
    from torchx.components import utils

    test_dir = tempfile.mkdtemp()  # where TorchXRunner tracks trial results

    parameters: List[Parameter] = [
        RangeParameter(
            name="x1",
            lower=-10.0,
            upper=10.0,
            parameter_type=ParameterType.FLOAT,
        ),
        RangeParameter(
            name="x2",
            lower=-10.0,
            upper=10.0,
            parameter_type=ParameterType.FLOAT,
        ),
    ]

    experiment = Experiment(
        name="torchx_booth_sequential_demo",
        search_space=SearchSpace(parameters=parameters),
        optimization_config=OptimizationConfig(
            objective=Objective(
                metric=TorchXMetric(name="booth_eval"),
                minimize=True,
            ),
        ),
        runner=TorchXRunner(
            tracker_base=test_dir,
            component=utils.booth,
            scheduler="local_cwd",
            cfg={"prepend_cwd": True},
        ),
    )

    scheduler = Scheduler(
        experiment=experiment,
        generation_strategy=choose_generation_strategy(
            search_space=experiment.search_space
        ),
        options=SchedulerOptions(),
    )

    # run 6 trials total, 2 at a time, then report the results
    for _ in range(3):
        scheduler.run_n_trials(max_trials=2)
    scheduler.report_results()

Detailed Proposal

The task here is essentially to create pre-packaged applications for the code above. We can define two types of HPO apps by the "strategy" used:

  1. hpo.grid_search
  2. hpo.bayesian

Each application will come with a companion "component" (e.g. hpo.grid_search and hpo.bayesian). The applications should be designed to take as input (see the sketch after this list):

  1. parameter space
  2. what the objective function is (e.g. trainer)
  3. torchx cfgs (e.g. scheduler, scheduler runcfg, etc)
  4. ax experiment configs
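A hypothetical component signature covering these four inputs might look like the following sketch (every parameter name here is illustrative; none of this is an existing torchx API):

    import torchx.specs as specs

    # hypothetical signature for the hpo.bayesian builtin component;
    # all parameter names below are illustrative
    def bayesian(
        paramspacefile: str,  # 1. parameter space, taken as a file (see below)
        component: str,  # 2. the objective/trainer component (e.g. dist.ddp)
        trial_scheduler: str = "local_cwd",  # 3. torchx cfgs for running trials
        experiment_name: str = "hpo_experiment",  # 4. ax experiment configs
    ) -> specs.AppDef:
        """Launches a single-process HPO driver that runs trials via TorchX."""
        ...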

The challenge is to correctly and sanely "parameterize" the application in such a way that the user can pass these arguments from the CLI. For complex parameters such as the parameter space, one might consider taking a file in a specific format rather than conjuring up a complex string encoding to pass as CLI input.

For instance, for the 20 x 20 search space over x_1 and x_2 in the example above, rather than taking the parameter space as:

$ torchx run hpo.bayesian --parameter_space x_1=-10:10,x_2=-10:10

One can take it as a well-defined Python parameter file:

    # params.py
    # defines the parameter space using the regular Ax APIs
    from typing import List

    from ax.core.parameter import Parameter, ParameterType, RangeParameter

    parameters: List[Parameter] = [
        RangeParameter(
            name="x1",
            lower=-10.0,
            upper=10.0,
            parameter_type=ParameterType.FLOAT,
        ),
        RangeParameter(
            name="x2",
            lower=-10.0,
            upper=10.0,
            parameter_type=ParameterType.FLOAT,
        ),
    ]
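The HPO application could then load the parameter list by importing this file at runtime. A minimal sketch (load_parameters is a hypothetical helper, not an existing torchx API):

    import importlib.util
    from typing import List

    from ax.core.parameter import Parameter

    def load_parameters(path: str) -> List[Parameter]:
        # import the user-supplied params.py and return its `parameters` list
        spec = importlib.util.spec_from_file_location("params", path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return module.parameters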

Alternatives

Document how users can write their own HPO application and instruct them to run it with torchx run utils.python, since the HPO driver application only needs to run from a single process.
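For example (illustrative; my_hpo_driver.py would contain an experiment/scheduler loop like the one in the Motivation section):

$ torchx run -s kubernetes utils.python --script my_hpo_driver.py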

Additional context/links

See hyperlinks above.

Labels

enhancement (New feature or request), module: components (issues related to the torchx.components (builtins) module)
