scikit-learn-compatible estimators from Civis Analytics
Installation with pip is recommended:
$ pip install civisml-extensions
For development, a few additional dependencies are needed:
$ pip install -r dev-requirements.txt
This package contains scikit-learn-compatible estimators for stacking (StackedClassifier, StackedRegressor), non-negative linear regression (NonNegativeLinearRegression), preprocessing pandas DataFrames (DataFrameETL), and cross-validated hyperparameter search with Hyperband (HyperbandSearchCV).
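HyperbandSearchCV is named after the Hyperband algorithm. As a rough, non-authoritative illustration of that algorithm (not this package's implementation, whose parameters may differ), the sketch below computes Hyperband's bracket schedule. Given a maximum resource R per configuration (for example, a number of boosting iterations) and a downsampling rate eta, each bracket starts many configurations on a small budget, then repeatedly keeps the best 1/eta of them while multiplying their budget by eta. The helper name `hyperband_schedule` and its arguments are hypothetical.

```python
from typing import List, Tuple


def hyperband_schedule(max_resource: int, eta: int = 3) -> List[List[Tuple[int, float]]]:
    """Return Hyperband's successive-halving rounds for every bracket.

    Each bracket is a list of (n_configs, resource_per_config) rounds.
    Hypothetical helper for illustration only; not part of civismlext.
    """
    if max_resource < 1 or eta < 2:
        raise ValueError("need max_resource >= 1 and eta >= 2")

    # s_max = floor(log_eta(R)), computed with integers to avoid float error.
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        # Initial number of configurations: ceil((s_max + 1) * eta**s / (s + 1)).
        n = ((s_max + 1) * eta ** s + s) // (s + 1)
        # Initial resource given to each configuration in this bracket.
        r = max_resource / eta ** s
        rounds = []
        for i in range(s + 1):
            # Keep the best 1/eta of the configurations; give each eta times more resource.
            rounds.append((n // eta ** i, r * eta ** i))
        brackets.append(rounds)
    return brackets


if __name__ == "__main__":
    for rounds in hyperband_schedule(81, eta=3):
        print(rounds)
```

With R = 81 and eta = 3, the most aggressive bracket starts 81 configurations at resource 1 and narrows them to a single configuration at resource 81, while the most conservative bracket simply runs 5 configurations at the full resource. Running several brackets hedges between exploring many cheap configurations and evaluating a few expensive ones thoroughly.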
Usage of these estimators follows the standard sklearn conventions. Here is an example of using the StackedClassifier:
>>> from sklearn.linear_model import LogisticRegression
>>> from sklearn.ensemble import RandomForestClassifier
>>> from civismlext.stacking import StackedClassifier
>>>
>>> # Define some training data and labels, plus test data
>>> Xtrain, ytrain = <train_features>, <train_labels>
>>> Xtest = <test_features>
>>>
>>> # Note that the final estimator 'metalr' is the meta-estimator
>>> estlist = [('rf', RandomForestClassifier()),
...            ('lr', LogisticRegression()),
...            ('metalr', LogisticRegression())]
>>>
>>> mysm = StackedClassifier(estlist)
>>> # Set some parameters, if you didn't set them at instantiation
>>> mysm.set_params(rf__random_state=7, lr__random_state=8,
...                 metalr__random_state=9, metalr__C=10**7)
>>>
>>> # Fit
>>> mysm.fit(Xtrain, ytrain)
>>>
>>> # Predict class probabilities for the test data
>>> ypred = mysm.predict_proba(Xtest)
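To make the role of the meta-estimator concrete, here is a conceptual sketch of the general stacking technique written with plain scikit-learn. It is an assumption-laden illustration, not StackedClassifier's actual code (its cross-validation scheme and other details may differ): each base model produces out-of-fold class probabilities, the meta-estimator is trained on those probabilities, and the base models are then refit on all of the training data. The helper names `fit_stack` and `predict_stack` are hypothetical, and the synthetic data stands in for `<train_features>` and friends.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split


def fit_stack(base_estimators, meta_estimator, X, y, cv=4):
    """Hypothetical helper: fit base models, then a meta-model on their out-of-fold output."""
    oof_columns = []
    for est in base_estimators:
        # Out-of-fold probabilities: each row is predicted by a clone of `est`
        # that never saw that row during training.
        oof_columns.append(cross_val_predict(est, X, y, cv=cv, method="predict_proba"))
        # Refit the base model on all training rows for use at prediction time.
        est.fit(X, y)
    meta_estimator.fit(np.hstack(oof_columns), y)
    return base_estimators, meta_estimator


def predict_stack(base_estimators, meta_estimator, X):
    """Hypothetical helper: feed base-model probabilities to the meta-model."""
    features = np.hstack([est.predict_proba(X) for est in base_estimators])
    return meta_estimator.predict_proba(features)


# Synthetic data standing in for real training and test sets.
X, y = make_classification(n_samples=300, random_state=0)
Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, random_state=0)

bases = [RandomForestClassifier(random_state=7), LogisticRegression(random_state=8)]
meta = LogisticRegression(random_state=9, C=10**7)
fit_stack(bases, meta, Xtrain, ytrain)
ypred = predict_stack(bases, meta, Xtest)
```

Training the meta-estimator on out-of-fold predictions, rather than on predictions the base models make for their own training rows, keeps it from learning to trust base models that have simply memorized the training data.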
You can learn more about stacking, and see an example use of the StackedRegressor and NonNegativeLinearRegression estimators, in a talk presented at PyData NYC in November 2017.
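The talk's own example is not reproduced here. As a generic illustration of why a non-negative linear model can suit a stacking regressor (an assumption about typical use, not a description of NonNegativeLinearRegression's internals), the sketch below uses `scipy.optimize.nnls` to find non-negative weights that blend the predictions of three hypothetical base regressors, and compares them with unconstrained least squares. The data are synthetic.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(0)
n = 200
y = rng.normal(size=n)

# Hypothetical out-of-fold predictions from three base regressors.
# p0 and p1 are nearly identical, which can make unconstrained weights unstable.
p0 = y + rng.normal(scale=0.5, size=n)
p1 = p0 + rng.normal(scale=0.01, size=n)
p2 = y + rng.normal(scale=0.8, size=n)
P = np.column_stack([p0, p1, p2])

# Unconstrained least squares (no intercept), for comparison.
ols_weights, *_ = np.linalg.lstsq(P, y, rcond=None)

# Non-negative least squares: minimize ||P w - y||_2 subject to w >= 0.
nn_weights, residual_norm = nnls(P, y)

print("OLS weights: ", np.round(ols_weights, 3))
print("NNLS weights:", np.round(nn_weights, 3))
```

Constraining the weights to be non-negative gives up a little training-set fit in exchange for a blend that reads as "how much of each base model to use", which tends to be more stable when base-model predictions are highly correlated.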
See the docstrings of the various estimators for more information.
Please see CONTRIBUTING.md for information about contributing to this project.
License: BSD-3. See LICENSE.md for details.