Add RFECV support #1015

@windowshopr

Reviewing this page, most of the feature selectors offered by scikit-learn are covered. It would be great to see Dask support RFECV as well. I'd like to send an RFECV.fit() call to a Dask cluster:

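    import joblib
    from sklearn.feature_selection import RFECV
    from sklearn.model_selection import TimeSeriesSplit

    # model, SCORING_METRIC, X_train and y_train are defined elsewhere in the script.
    # A dask.distributed Client is assumed to be running already.

    # Perform RFECV feature selection
    selector = RFECV(
        model,
        step=0.05,  # Remove 5% of features at each iteration
        min_features_to_select=5,  # Keep at least 5 features
        cv=TimeSeriesSplit(n_splits=3),
        scoring=SCORING_METRIC,
        verbose=0,
        n_jobs=-1,
    )

    # Use Dask to parallelize the feature selection
    with joblib.parallel_backend('dask'):
        selector.fit(X_train, y_train)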

The fit fails with the error 'dict' object has no attribute 'estimator'. Full traceback:

joblib.externals.loky.process_executor._RemoteTraceback:
"""
Traceback (most recent call last):
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\_utils.py", line 72, in __call__
    return self.func(**kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\_dask.py", line 131, in __call__
    results.append(func(*args, **kwargs))
                   ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\utils\parallel.py", line 139, in __call__
    return self.function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\feature_selection\_rfe.py", line 46, in _rfe_single_fit  
    X, params=routed_params.estimator.fit, indices=train
              ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'estimator'
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\pygad\pygad.py", line 1688, in cal_pop_fitness
    fitness = self.fitness_func(self, sol, sol_idx)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "I:\nasty\Python_Projects\Stock_Options_Trading\DailyStockClassifierAndRegressors2025\1_1_class_train_v2_Matts.py", line 337, in fitness_func
    selector.fit(X_train, y_train)
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\utils\validation.py", line 63, in inner_f
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\base.py", line 1389, in wrapper
    return fit_method(estimator, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    scores_features = parallel(
                      ^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\sklearn\utils\parallel.py", line 77, in __call__
    return super().__call__(iterable_with_config)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\parallel.py", line 2007, in __call__
    return output if self.return_generator else list(output)
                                                ^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\parallel.py", line 1650, in _get_outputs
    yield from self._retrieve()
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\parallel.py", line 1754, in _retrieve
    self._raise_error_fast()
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\parallel.py", line 1789, in _raise_error_fast
    error_job.get_result(self.timeout)
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\parallel.py", line 745, in get_result
    return self._return_or_raise()
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\chalu\AppData\Local\Programs\Python\Python311\Lib\site-packages\joblib\parallel.py", line 763, in _return_or_raise
    raise self._result
AttributeError: 'dict' object has no attribute 'estimator'
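
For what it's worth, the failing line reads routed_params.estimator.fit, and scikit-learn passes routed_params around as a Bunch, which is a dict subclass with attribute access. My guess (an assumption, I have not confirmed it in the joblib or dask code) is that the arguments get rebuilt as plain dicts somewhere on the way to the worker. The stand-alone sketch below uses a hypothetical Bunch stand-in and a hypothetical to_plain_dict helper, not the real library code, just to show that such a round trip would produce exactly this message:

```python
class Bunch(dict):
    """Hypothetical stand-in for sklearn.utils.Bunch: a dict with attribute access."""

    def __getattr__(self, key):
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key)


def to_plain_dict(obj):
    """Hypothetical helper: rebuild nested mappings as plain dicts.

    This only mimics the lossy round trip assumed above; it is not
    taken from joblib, dask, or scikit-learn.
    """
    if isinstance(obj, dict):
        return {key: to_plain_dict(value) for key, value in obj.items()}
    return obj


# Attribute access works on the original nested Bunch.
routed_params = Bunch(estimator=Bunch(fit={}))
assert routed_params.estimator.fit == {}

# After the rebuild, the outer object is a plain dict and attribute access fails.
rebuilt = to_plain_dict(routed_params)
try:
    rebuilt.estimator
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # 'dict' object has no attribute 'estimator'
```

If that guess is right, the data itself survives the trip (rebuilt["estimator"]["fit"] still works) and only the attribute-style access is lost, which would explain why the fit runs fine with the default joblib backend.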
