
AutoML does not pass proper objective to estimator_class when metric is non-default. #1327

Open
Atry opened this issue Jul 31, 2024 · 3 comments
Labels
bug Something isn't working

Comments

@Atry
Contributor

Atry commented Jul 31, 2024

Currently, neither search_space nor get_params in LGBMEstimator passes objective to the params.

FLAML/flaml/automl/model.py

Lines 1266 to 1309 in a68d073

def search_space(cls, data_size, **params):
    upper = max(5, min(32768, int(data_size[0])))  # upper must be larger than lower
    return {
        "n_estimators": {
            "domain": tune.lograndint(lower=4, upper=upper),
            "init_value": 4,
            "low_cost_init_value": 4,
        },
        "num_leaves": {
            "domain": tune.lograndint(lower=4, upper=upper),
            "init_value": 4,
            "low_cost_init_value": 4,
        },
        "min_child_samples": {
            "domain": tune.lograndint(lower=2, upper=2**7 + 1),
            "init_value": 20,
        },
        "learning_rate": {
            "domain": tune.loguniform(lower=1 / 1024, upper=1.0),
            "init_value": 0.1,
        },
        "log_max_bin": {  # log transformed with base 2
            "domain": tune.lograndint(lower=3, upper=11),
            "init_value": 8,
        },
        "colsample_bytree": {
            "domain": tune.uniform(lower=0.01, upper=1.0),
            "init_value": 1.0,
        },
        "reg_alpha": {
            "domain": tune.loguniform(lower=1 / 1024, upper=1024),
            "init_value": 1 / 1024,
        },
        "reg_lambda": {
            "domain": tune.loguniform(lower=1 / 1024, upper=1024),
            "init_value": 1.0,
        },
    }

def config2params(self, config: dict) -> dict:
    params = super().config2params(config)
    if "log_max_bin" in params:
        params["max_bin"] = (1 << params.pop("log_max_bin")) - 1
    return params

As a result, the underlying model always uses its default objective (squared-error regression), even when the metric is mape or another non-default value.

Ideally, objective should either be configurable in the search space or derived from the metric. A possible workaround with the current release is sketched below.
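In the meantime, the objective can be pinned through the custom_hp argument of AutoML.fit (a minimal sketch, assuming a regression task with the built-in lgbm learner and pre-loaded X_train/y_train; I have not verified that a constant entry is forwarded untouched in every FLAML version):

from flaml import AutoML

automl = AutoML()
# Sketch: fix LightGBM's objective to "mape" via a constant search-space entry.
# Entries whose "domain" is a plain value are not tuned; they are merged into the
# sampled config, which config2params then forwards to LGBMRegressor.
custom_hp = {
    "lgbm": {
        "objective": {"domain": "mape"},
    }
}
automl.fit(
    X_train,          # assumed to be defined
    y_train,          # assumed to be defined
    task="regression",
    metric="mape",
    estimator_list=["lgbm"],
    custom_hp=custom_hp,
    time_budget=60,
)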

@thinkall
Collaborator

thinkall commented Aug 7, 2024

Thank you for the feedback, @Atry ! Would you like to raise a PR for this?

@thinkall thinkall added the bug Something isn't working label Aug 7, 2024
@thinkall
Collaborator

Hi @dannycg1996 , what do you think of this issue?

@dannycg1996
Collaborator

Hi @Atry and @thinkall,
This is an interesting issue!

My initial thought is that I like this feature. A few points which came to mind:

  • I prefer deriving the objective from the search space, rather than from the metric:
    • Not all metrics available within FLAML are built into LGBM (such as f1).
    • I'm worried we might end up with inconsistent behaviour, where sometimes we derive the objective from the metric and sometimes we don't.
    • Deriving the objective from the search space gives users explicit control over the objective, e.g. 'The LGBM objective will be l2 for regression tasks, unless this is overridden with a custom search space'.
    • There might be cases where the user wishes the LGBM objective to differ from the objective metric used in the AutoML process.
  • I prefer that we don't allow users to pass custom metrics in as the objective:
    • Custom metrics used for the objective have to be written in FLAML's custom-metric form ([screenshot in original comment]; a sketch of that signature follows this list), which I don't think aligns with the standard scikit-learn implementation of a lot of these metrics.
    • A quick check of the LGBM code suggests that built-in objectives (e.g. MAPE) are implemented in C++, presumably for speed. Even if we can pass Python methods in, we might see some performance issues.
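For context, the custom-metric form I'm referring to looks roughly like this (a sketch following the signature in the FLAML docs; the MAPE body is illustrative only):

def custom_metric(
    X_val, y_val, estimator, labels,
    X_train, y_train,
    weight_val=None, weight_train=None,
    *args,
):
    # FLAML expects (value_to_minimize, dict_of_metrics_to_log) back.
    from sklearn.metrics import mean_absolute_percentage_error
    y_pred = estimator.predict(X_val)
    val_loss = mean_absolute_percentage_error(y_val, y_pred, sample_weight=weight_val)
    return val_loss, {"val_mape": val_loss}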

I'm still unfamiliar with large parts of the FLAML and LGBM codebases, but I think implementing this is possible: if the user dictates a value for the objective within the LGBM search space (e.g. setting it to 'mape'), we can pass that through to the LGBMRegressor. A rough sketch of what I mean is below.
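A rough sketch of the kind of pass-through I have in mind, done today via a user-defined learner rather than a change inside FLAML (MyLGBM, the 'my_lgbm' name and the particular objective choices are illustrative assumptions, not existing FLAML defaults):

from flaml import AutoML, tune
from flaml.automl.model import LGBMEstimator

class MyLGBM(LGBMEstimator):
    @classmethod
    def search_space(cls, data_size, **params):
        space = super().search_space(data_size=data_size, **params)
        # Illustrative: expose LightGBM's objective as part of the search space,
        # so config2params forwards whatever value is chosen to LGBMRegressor.
        space["objective"] = {
            "domain": tune.choice(["regression", "mape", "huber"]),
            "init_value": "regression",
        }
        return space

automl = AutoML()
automl.add_learner(learner_name="my_lgbm", learner_class=MyLGBM)
automl.fit(X_train, y_train, task="regression", metric="mape",  # X_train/y_train assumed defined
           estimator_list=["my_lgbm"], time_budget=60)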

If we do implement something along these lines, then updating documentation is important, and I'd also like to have some sort of benchmark comparison.

Please let me know what your thoughts are @thinkall
