Development for multioutput regression #803
Conversation
Just figured out there are more parts that need to be revised.
Long time no see. I have revised some parts of the pipelines that caused crashes last time. It now works for me on a multioutput regression problem with 96 outputs.
The warning message shown is: "Not yet supporting the sklearn.multioutput wrapper. But those regressors which intrinsically support multioutput will be considered for the ensemble combination." Also, the Travis CI tester does not seem to support my new MULTIOUTPUT_REGRESSION task label.
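For context on that warning: scikit-learn distinguishes regressors that natively accept a 2-D y (e.g. RandomForestRegressor) from single-output estimators that need the sklearn.multioutput.MultiOutputRegressor wrapper, which fits one clone per target. A minimal sketch of the difference (the data shapes here are illustrative, not the 96-output problem):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.multioutput import MultiOutputRegressor
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X, y = rng.rand(50, 5), rng.rand(50, 3)  # 3 regression targets

# Natively multioutput: accepts the 2-D y directly
native = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# SVR is single-output only, so the wrapper fits one SVR per target column
wrapped = MultiOutputRegressor(SVR()).fit(X, y)

print(native.predict(X).shape)   # (50, 3)
print(wrapped.predict(X).shape)  # (50, 3)
```

Skipping the wrapper and only picking natively multioutput regressors, as this PR does, avoids the per-target fitting cost at the price of a smaller model space.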
I found that my revision works on version 0.6.0 but not on the development version.
Commits merged from upstream during development:
- Release note 070: first version of 070 release notes; missed a bugfix; Vim added unexpected space -- fix
- …ty01 Clip predict values to [0-1] in classification
- Sensible default value of 'score_func' for SelectPercentileRegression (…automl#843): the current default "f_classif" is invalid and will surely be rejected
- More robust tmp file naming: UUID approach
- Worst possible result: initial commit; make worst result a function; worst possible result in metric; fixing the names of the scorers
- Add exceptions to log file, not just stdout; removing dummy pred as trys is not needed
This looks great, thanks a lot for all your effort!
Multioutput regression only picks regressors that natively support multioutput. I'm not sure whether we should add sklearn's multioutput wrapper for the others or not; I should try it.
I guess this is fine for the beginning
Feature preprocessing has not yet implemented the multioutput part, but in my local tests it still somehow works. This should be dealt with later on.
I guess it would be good to check this now. A simple way to do so is to add a new unit test in test/test_pipeline/test_regression.py following the pattern of test_configurations (or test_multiclass in the respective classification test). That'll randomly sample configurations and fail if they are invalid.
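The suggested test pattern (randomly sample configurations, fail if any sampled pipeline crashes on multioutput data) can be sketched with plain scikit-learn standing in for auto-sklearn's regression pipeline; the estimator list and parameter ranges below are illustrative, not auto-sklearn's actual configuration space:

```python
import random
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor

rnd = random.Random(0)
rng = np.random.RandomState(0)
X, y = rng.rand(40, 4), rng.rand(40, 2)  # 2-D regression target

for _ in range(5):
    # Randomly sampled "configuration" -- the real test samples from the
    # pipeline's hyperparameter search space instead.
    model_cls = rnd.choice([RandomForestRegressor, ExtraTreesRegressor])
    params = {"n_estimators": rnd.randint(5, 20),
              "max_depth": rnd.choice([None, 3, 5])}
    model = model_cls(random_state=0, **params).fit(X, y)
    # Fail loudly if the sampled configuration mishandles multioutput
    assert model.predict(X).shape == y.shape
```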
Also, could it be that you forgot to change the properties of the gradient boosting classifier?
Also, it would probably be good if you test the AutoSklearnRegressor with multilabel output in test/test_automl/test_estimators to ensure that it will continue to work in the future.
BTW: we're currently looking into improving the memory usage of the ensemble part, so it should be able to handle more models with less RAM usage in the future.
Thanks for the advice and corrections!
Sure, I will do that. Would taking sklearn.datasets.load_linnerud as the multioutput regression test data be good enough? Or would a randomly generated multioutput regression target be better?
Do you mean multilabel-indicator? From my understanding, shouldn't multioutput regression include only continuous-multioutput, or at most also multiclass-multioutput?
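Both data options from the question above are one-liners in scikit-learn: load_linnerud is a real (if tiny) 3-target dataset, while make_regression generates a target of any width. A quick sketch of the two:

```python
from sklearn.datasets import load_linnerud, make_regression

# Option 1: the Linnerud dataset -- a real, tiny multioutput problem
X_lin, y_lin = load_linnerud(return_X_y=True)
print(X_lin.shape, y_lin.shape)  # (20, 3) (20, 3)

# Option 2: a synthetic problem with an arbitrary number of targets
X_syn, y_syn = make_regression(n_samples=100, n_features=10,
                               n_targets=4, random_state=1)
print(X_syn.shape, y_syn.shape)  # (100, 10) (100, 4)
```

With only 20 samples, Linnerud is realistic but very small; a synthetic target makes it easier to control the number of outputs in the test.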
Hi @mfeurer, would you review my code? Thanks!
Hey @charlesfu4 this looks really great, thanks for your work!
I have two minor questions, and then we're good to merge this, I believe. We'll do a release today (0.7.1) and will then include your changes in the next release (0.7.2).
Regarding your comments:
- yes, we're aware of the kernelPCA problem and unfortunately it'll only be resolved once we upgrade scikit-learn to 0.23
- yes, it would be great if you could have a look into why it fails with the KFold resampling strategy. I don't think you need to look into the time series split as that one is not really working well yet anyway.
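As a sanity check while debugging that failure: plain scikit-learn's KFold handles a 2-D regression target without issue (the splitter only looks at row indices), so the problem is likely in auto-sklearn's own resampling code rather than in sklearn. A minimal baseline sketch (shapes illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.RandomState(0)
X, y = rng.rand(30, 4), rng.rand(30, 2)  # 2-D regression target

# KFold splits by row index only, so a multioutput y is unproblematic here
scores = cross_val_score(
    RandomForestRegressor(n_estimators=10, random_state=0),
    X, y, cv=KFold(n_splits=3), scoring="r2")
print(scores.shape)  # (3,)
```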
Review comments (outdated, now resolved) on:
- autosklearn/pipeline/components/data_preprocessing/minority_coalescense/minority_coalescer.py
- autosklearn/pipeline/components/feature_preprocessing/truncatedSVD.py
It's my pleasure to participate in the development; thank you for the code reviews and advice.
Squashed commit history (deduplicated):
- PEP8 (automl#718)
- multioutput regression (multiple iterations)
- automl#782 showcase pipeline components iteration
- Fixed flake-8 violations
- multi_output regression v1
- fix y_shape in multioutput regression
- fix xy_data_manager change due to merge
- automl.py missing import
- Release note 070 (automl#842): first version of 070 release notes; missed a bugfix; Vim added unexpected space -- fix
- prepare new release (automl#846)
- Clip predict values to [0-1] in classification; fix for 3.5 python
- Sensible default value of 'score_func' for SelectPercentileRegression (automl#843): the previous default "f_classif" is invalid and would be rejected
- More robust tmp file naming (automl#854): UUID approach
- 771 worst possible result (automl#845): make worst result a function; worst possible result in metric; fixing the names of the scorers
- Add exceptions to log file, not just stdout (automl#863); removing dummy pred as trys is not needed
- Add prediction with models trained with cross-validation (automl#864): fix unit tests; test the new feature, too
- 715 ml memory (automl#865): automl#715 support for no ML memory limit; API update
- Docs enhancement (automl#862): improved docs; fixed example typos; beautified and cleaned up examples; fixed rsa equal
- Move to minmax scaler (automl#866)
- Do not read predictions in memory, only after score (automl#870); precision support for string/int
- Removal of competition manager (automl#869): removed additional unused methods/files and moved metrics to estimator; fix meta data generation; make sure pytest is newer than 4.6; unit test fixing; flake8 fixes in examples; fix metadata gen metrics
- Fix dataprocessing get params (automl#877); add clone-test to regression pipeline
- Allow 1-D threshold binary predictions (automl#879)
- fix single output regression not working
- regression needs no _enusre_prediction_array_size_prediction_array_sizess
- multioutput after rebase to 0.7.0
- Regressor target y shape index out of range
- Revision for make tester
- Revision: cancel multiclass-multioutput
- Resolve automl.py metrics(__init__) reg_gb reg_svm
- Fix flake8 errors (automl.py and elsewhere)
- Preprocess with multioutput regression; automl self._n_outputs
- test_estimator.py changed back
- cancel multioutput multiclass for multioutput regression
- Fix automl self._n_output update placement
- Kernel PCA: cancelled for multioutput regression; test skipped for python <3.8; default kernel set to cosine to dodge a sklearn 0.22 error; uses rbf kernel; should be updated once on scikit-learn 0.23
- Add unit test for multioutput regression and fixes
- Modify labels in regression, classification and preprocessing examples
- Add missing multioutput supports to minority coalescer and truncatedSVD

Co-authored-by: Matthias Feurer <[email protected]>
Co-authored-by: chico <[email protected]>
Co-authored-by: Francisco Rivera Valverde <[email protected]>
Co-authored-by: Xiaodong DENG <[email protected]>
Multioutput regression based on the #292 revision, according to what mfeurer mentioned. It looks like there are still some parts missing; I will try to figure them out.