
Development for multioutput regression #803


Merged: 100 commits into automl:development on Jul 3, 2020

Conversation

charlesfu4 (Contributor)

Multioutput regression based on #292, revised according to what mfeurer mentioned. It looks like some parts are still missing; I will try to figure them out.

@charlesfu4 (Contributor Author)

Just figured out there are more parts that need to be revised:

  • metrics.py
  • smbo.py
  • and so on

@charlesfu4 (Contributor Author) commented May 1, 2020

Long time no see. I have revised the parts of the pipeline that caused crashes last time. It now works for me on a multioutput regression problem with 96 outputs.

  • Note that ensemble_memory_limit should be increased.
    For example, when fitting train_X, train_y of shapes (35605, 112) and (35605, 96), ensemble_nbest could only be set to 12 with an ensemble memory limit of 2048 MB (see the sketch after the warning messages below).

  • There are minor issues during the fitting process.
    For example, it sometimes got stuck at the end when SMAC provided a challenger that was no better than the previous one. Not sure what caused this.

The warning messages are shown below:
[WARNING] [2020-05-02 00:38:08,675:smac.intensification.intensification.Intensifier] Challenger was the same as the current incumbent; Skipping challenger
[WARNING] [2020-05-02 00:38:08,675:smac.intensification.intensification.Intensifier] Challenger was the same as the current incumbent; Skipping challenger
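
A minimal sketch of the setup above (synthetic data stands in for the real (35605, 112)/(35605, 96) problem; the time budget is arbitrary, and the other parameter values mirror the numbers quoted above):

```python
# Hedged sketch: parameter names follow the auto-sklearn 0.7.x API.
import numpy as np
import autosklearn.regression

rng = np.random.RandomState(0)
train_X = rng.rand(1000, 112)   # features
train_y = rng.rand(1000, 96)    # 96 regression targets

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=300,
    ensemble_nbest=12,            # larger values exceeded the memory limit here
    ensemble_memory_limit=2048,   # in MB; increase this for many outputs
)
automl.fit(train_X, train_y)
print(automl.predict(train_X).shape)  # (1000, 96)
```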

The sklearn.multioutput wrapper is not yet supported, but regressors that intrinsically support multioutput will be considered for the ensemble combination.

The Travis CI tester does not seem to support my new MULTIOUTPUT_REGRESSION task label.

@charlesfu4 (Contributor Author)

I found out my revision works on the 0.6.0 version but not the development version.
In the development version, it only outputs the Dummy regressor after the same running time. I think the new budget functionality caused this problem. I will figure out how to fix it.

charlesfu4 and others added 11 commits May 4, 2020 11:52

* First version of 070 release notes
* Missed a bugfix
* Vim added unexpected space -- fix
* …ty01: Clip predict values to [0-1] in classification
* …automl#843): Currently the default value of 'score_func' for SelectPercentileRegression is "f_classif", which is an invalid value, and will surely be rejected and will not work
* More robust tmp file naming
* UUID approach
* Initial Commit
* Make worst result a function
* worst possible result in metric
* Fixing the name of the scorers
* Add exceptions to log file, not just stdout
* Removing dummy pred as trys is not needed
@mfeurer closed this Jun 18, 2020
@mfeurer reopened this Jun 18, 2020
@mfeurer (Contributor) left a comment

This looks great, thanks a lot for all your effort!

> Multioutput regression only picks regressors that natively support multioutput. Not sure whether to add sklearn's MultiOutput wrapper function for the others or not; should try it.

I guess this is fine for the beginning.
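
For context, a minimal illustration of the sklearn wrapper in question: it fits one clone of the base estimator per target, so any single-output regressor gains multioutput support at the cost of n_targets fits.

```python
# sklearn.multioutput.MultiOutputRegressor fits one estimator per target.
from sklearn.datasets import make_regression
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR  # SVR has no native multioutput support

X, y = make_regression(n_samples=100, n_features=10, n_targets=3, random_state=0)
wrapped = MultiOutputRegressor(SVR()).fit(X, y)
print(wrapped.predict(X).shape)  # (100, 3)
```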

> Feature preprocessing has not yet implemented the multioutput part, but in my local test it still somehow works. This should be dealt with later on.

I guess it would be good to actually check this now. A simple way to do so is to add a new unit test in test/test_pipeline/test_regression.py following the pattern of test_configurations (or test_multiclass in the respective classification test). That'll randomly sample configurations and fail if they are invalid.
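
A rough sketch of such a test, hedged: the interface is assumed from the existing pipeline tests, and the real test_configurations pattern wraps the fit in extensive error handling.

```python
# Hedged sketch of the suggested test; 20 features / 4 targets as used
# later in this PR. Assumes the 0.7.x SimpleRegressionPipeline interface.
import numpy as np
from autosklearn.pipeline.regression import SimpleRegressionPipeline

def test_multioutput():
    rng = np.random.RandomState(1)
    X, Y = rng.rand(50, 20), rng.rand(50, 4)
    dataset_properties = {'multioutput': True}
    cs = SimpleRegressionPipeline(
        dataset_properties=dataset_properties,
    ).get_hyperparameter_search_space()
    for _ in range(10):                 # randomly sample configurations
        config = cs.sample_configuration()
        pipeline = SimpleRegressionPipeline(
            config=config, dataset_properties=dataset_properties)
        pipeline.fit(X, Y)              # fails here if the config is invalid
        assert pipeline.predict(X).shape == (50, 4)
```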

Also, could it be that you forgot to change the properties of the gradient boosting classifier?

Also, it would probably be good if you test the AutoSklearnRegressor with multilabel output in test/test_automl/test_estimators to ensure that it will continue to work in the future.

BTW: we're currently looking into improving the memory usage of the ensemble part, so it should be able to handle more models with less RAM usage in the future.

@charlesfu4 (Contributor Author)

Thanks for the advice and corrections!

> I guess it would be good to actually check this now. A simple way to do so is to add a new unit test in test/test_pipeline/test_regression.py following the pattern of test_configurations (or test_multiclass in the respective classification test). That'll randomly sample configurations and fail if they are invalid.

Sure, I will do that. Would taking datasets.load_linnerud as the multioutput regression test data be good enough, or would a randomly generated multioutput regression target be better?
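
(For scale: load_linnerud is a real multioutput regression dataset, but a tiny one.)

```python
from sklearn.datasets import load_linnerud

X, y = load_linnerud(return_X_y=True)
print(X.shape, y.shape)  # (20, 3) (20, 3): 20 samples, 3 features, 3 targets
```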

> Also, it would probably be good if you test the AutoSklearnRegressor with multilabel output in test/test_automl/test_estimators to ensure that it will continue to work in the future.

Do you mean multilabel-indicator? From my understanding, shouldn't multioutput regression include only continuous-multioutput, or at most also multiclass-multioutput?
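
(These target types can be distinguished with scikit-learn's own taxonomy, e.g.:)

```python
from sklearn.utils.multiclass import type_of_target

print(type_of_target([[0, 1], [1, 0]]))          # 'multilabel-indicator'
print(type_of_target([[1, 2], [3, 1]]))          # 'multiclass-multioutput'
print(type_of_target([[1.5, 2.0], [0.3, 1.1]]))  # 'continuous-multioutput'
```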

@charlesfu4 (Contributor Author) commented Jun 22, 2020

  • Added the handles_multioutput label:
    To all the models under feature_preprocessing, data_preprocessing, and classification (set to False for all classification models). Whether each feature- or data-preprocessing model supports multioutput was determined from the sklearn documentation. (See the property sketch after this list.)

  • Added a multioutput unit test and fixed bugs:

    • Found a few bugs: pipeline/regression.py includes a duplicate of the function get_available_components, which also appears in pipeline/components/regression/__init__.py.

    • In pipeline/components/regression/__init__.py, the redundant name data_prop was replaced with dataset_properties.

    • Added 'handles_multioutput' to the ThirdPartyComponents should_be_there list in pipeline/components/base.py.

    • Added 'handles_multioutput' to DummyClassifier and DummyPreprocessor in test/test_pipeline/test_classification.py.

    • Added test_multioutput in test/test_pipeline/test_regression.py, with randomly generated data (20 features and 4 targets).

  • Problem with Kernel_PCA:
    I tried to dodge the error several times by changing train_size_maximum, but it's better to solve it on the scikit-learn back end. scikit-learn/scikit-learn#16718 ("BUG Fixes kernel PCA raising 'invalid value encountered in mul…'") solved it, and I think that fix is only available from scikit-learn 0.23 on. test_kernel_pca almost always failed; I think this could be solved after making this version compatible with scikit-learn 0.23.
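
A hedged sketch of the new flag on a component's get_properties (the surrounding keys follow the existing component pattern; the real dictionaries live under autosklearn/pipeline/components/, and the class here is an illustrative stand-in):

```python
from autosklearn.pipeline.constants import DENSE, SPARSE, UNSIGNED_DATA, PREDICTIONS

class SomeRegressor:  # illustrative stand-in for a real component class
    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            'shortname': 'SR',
            'name': 'Some Regressor',
            'handles_regression': True,
            'handles_classification': False,
            'handles_multiclass': False,
            'handles_multilabel': False,
            'handles_multioutput': True,  # the new label; set to False for
                                          # all classification components
            'is_deterministic': True,
            'input': (DENSE, SPARSE, UNSIGNED_DATA),
            'output': (PREDICTIONS,),
        }
```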

@charlesfu4 (Contributor Author)

Hi @mfeurer, would you review my code? Thanks!
Also, I found that if I fed the resampling strategy an sklearn BaseCrossValidator such as a KFold or TimeSeriesSplit object, it always returned the Dummy classifier and Dummy regressor. I also tested this on the 0.7.0 master version without my multioutput regression modification, and the results were the same. I think I can try to look into it and fix it; should I open an issue for it? Thank you!
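
A sketch of the failing call pattern described here (hedged: the arguments are illustrative; per the comment, this setup produced only the Dummy models):

```python
# Passing an sklearn cross-validator object as the resampling strategy.
from sklearn.model_selection import KFold
import autosklearn.regression

automl = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=300,
    resampling_strategy=KFold(n_splits=5, shuffle=True, random_state=1),
)
```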

@mfeurer (Contributor) left a comment

Hey @charlesfu4 this looks really great, thanks for your work!

I have two minor questions, and then we're good to merge this, I believe. We'll do a release today (0.7.1) and will then include your changes in the next release (0.7.2).

Regarding your comments:

  • yes, we're aware of the kernelPCA problem and unfortunately it'll only be resolved once we upgrade scikit-learn to 0.23
  • yes, it would be great if you could have a look into why it fails with the KFold resampling strategy. I don't think you need to look into the time series split as that one is not really working well yet anyway.

@charlesfu4 (Contributor Author)

> Hey @charlesfu4 this looks really great, thanks for your work!
>
> I have two minor questions, and then we're good to merge this, I believe. We'll do a release today (0.7.1) and will then include your changes in the next release (0.7.2).
>
> Regarding your comments:
>
>   • yes, we're aware of the kernelPCA problem and unfortunately it'll only be resolved once we upgrade scikit-learn to 0.23
>   • yes, it would be great if you could have a look into why it fails with the KFold resampling strategy. I don't think you need to look into the time series split as that one is not really working well yet anyway.

It's my pleasure to participate in the development; thank you for the code reviews and advice.

@mfeurer merged commit 9a8ba56 into automl:development Jul 3, 2020
franchuterivera added a commit to franchuterivera/auto-sklearn that referenced this pull request Aug 21, 2020
* PEP8 (automl#718)

* multioutput_regression

* multioutput_regression

* multioutput_regression

* multioutput regression

* multioutput regression

* multioutput regression

* multioutput regression

* multioutput regression

* automl#782 showcase pipeline components iteration

* Fixed flake-8 violations

* multi_output regression v1

* fix y_shape in multioutput regression

* fix xy_data_manager change due to merge

* automl.py missing import

* Release note 070 (automl#842)

* First version of 070 release notes

* Missed a bugfix

* Vim added unexpected space -- fix

* prepare new release (automl#846)

* Clip predict values to [0-1] in classification

* Fix for 3.5 python!

* Sensible default value of 'score_func' for SelectPercentileRegression (automl#843)

Currently default value of 'score_func' for SelectPercentileRegression
is "f_classif", which is an invalid value, and will surely be rejected and
will not work

* More robust tmp file naming (automl#854)

* More robust tmp file naming

* UUID approach

* 771 worst possible result (automl#845)

* Initial Commit

* Make worst result a function

* worst possible result in metric

* Fixing the name of the scorers

* Add exceptions to log file, not just stdout (automl#863)

* Add exceptions to log file, not just stdout

* Removing dummy pred as trys is not needed

* Add prediction with models trained with cross-validation (automl#864)

* add the possibility to predict with cross-validation

* fix unit tests

* test new feature, too

* 715 ml memory (automl#865)

* automl#715 Support for no ml memory limit

* API update

* Docs enhancement (automl#862)

* Improved docs

* Fixed example typos

* Beautify examples

* cleanup examples

* fixed rsa equal

* Move to minmax scaler (automl#866)

* Do not read predictions in memory, only after score (automl#870)

* Do not read predictions in memory, only after score

* Precission support for string/int

* Removal of competition manager (automl#869)

* Removal of competition manager

* Removed additional unused methods/files and moved metrics to estimator

* Fix meta data generation

* Make sure pytest is older newer than 4.6

* Unit tst fixing

* flake8 fixes in examples

* Fix metadata gen metrics

* Fix dataprocessing get params (automl#877)

* Fix dataprocessing get params

* Add clone-test to regression pipeline

* Allow 1-D threshold binary predictions (automl#879)

* fix single output regression not working

* regression need no _enusre_prediction_array_size_prediction_array_sizess


* multioutput after rebased to 0.7.0

* Regressor target y shape index out of range

* Revision for make tester

* Revision: Cancel Multiclass-MultiOuput

* Resolve automl.py metrics(__init__) reg_gb reg_svm

* Fix Flake8 errors

* Fix automl.py flake8

* Preprocess w/ mulitout reg,automl self._n_outputs

* test_estimator.py changed back

* cancel multioutput multiclass for multi reg

* Fix automl self._n_output update placement

* fix flake8

* Kernel pca cancelled mulitout reg

* Kernel PCA test skip python <3.8

* Add test unit for multioutput reg and fix.

* Fix flake8 error

* Kernel PCA multioutput regression

* default kernel to cosine, dodge sklearn=0.22 error

* Kernel PCA should be updated to 0.23

* Kernel PCA uses rbf kernel

* Kernel Pca

* Modify labels in reg, class, perpro in examples

* Kernel PCA

* Add missing supports to mincoal and truncateSVD

Co-authored-by: Matthias Feurer <[email protected]>
Co-authored-by: chico <[email protected]>
Co-authored-by: Francisco Rivera Valverde <[email protected]>
Co-authored-by: Xiaodong DENG <[email protected]>