Description
All the following classes use n_neighbors:
ADASYN
OneSidedSelection
NeighbourhoodCleaningRule
NearMiss
AllKNN
RepeatedEditedNearestNeighbours
EditedNearestNeighbours
CondensedNearestNeighbour
Whereas k_neighbors is used by SMOTE and all its variants.
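A quick way to see the mismatch is to list the constructor parameters of each sampler with scikit-learn's get_params (a minimal check, run against the versions reported below):

from imblearn.over_sampling import ADASYN, SMOTE

print(sorted(ADASYN().get_params()))  # includes 'n_neighbors'
print(sorted(SMOTE().get_params()))   # includes 'k_neighbors'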
This inconsistency poses a problem for duck typing and for pipelines:
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from imblearn.pipeline import Pipeline
from imblearn.over_sampling import ADASYN
from imblearn.over_sampling import SMOTE
X, y = ...
smote = SMOTE()
adasyn = ADASYN()
logreg = LogisticRegression()
smote_pipe = Pipeline([('sampler', smote), ('classifier', logreg)])
adasyn_pipe = Pipeline([('sampler', adasyn), ('classifier', logreg)])
params = dict(sampler__n_neighbors=range(3, 6))
smote_grid = GridSearchCV(smote_pipe, params)
adasyn_grid = GridSearchCV(adasyn_pipe, params)
# fails because SMOTE uses k_neighbors instead of n_neighbors,
# so I am forced to make a new params dict
smote_grid.fit(X, y)
# succeeds
adasyn_grid.fit(X, y)
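For reference, a minimal sketch of the workaround, continuing from the snippet above: a second params dict that differs only in the parameter name, used just for the SMOTE pipeline.

# workaround: duplicate the grid with the SMOTE-specific parameter name
smote_params = dict(sampler__k_neighbors=range(3, 6))
smote_grid = GridSearchCV(smote_pipe, smote_params)
smote_grid.fit(X, y)  # succeeds once the key is renamed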
Expected Results
SMOTE would benefit from using n_neighbors to have a consistent API.
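If SMOTE exposed n_neighbors, a single parameter grid could be shared across both pipelines. This is only a hypothetical sketch of the proposed API; it does not work with the current release.

# hypothetical: assumes SMOTE is changed to accept n_neighbors
shared_params = dict(sampler__n_neighbors=range(3, 6))
for pipe in (smote_pipe, adasyn_pipe):
    GridSearchCV(pipe, shared_params).fit(X, y)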
Versions
Darwin-18.7.0-x86_64-i386-64bit
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 14:38:56)
[Clang 4.0.1 (tags/RELEASE_401/final)]
NumPy 1.17.1
SciPy 1.3.1
Scikit-Learn 0.21.3
Imbalanced-Learn 0.5.0