Releases: mlr-org/mlr
mlr 2.19.1
Bug fixes
- Adjust behavior of "positive" arg for `classif.logreg` (#2846)
- Consistent naming for dummy feature encoding of variables with differing numbers of levels (#2847)
- Remove {nodeHarvest} learners (#2841)
- Remove {rknn} learner (#2842)
- Remove all {DiscriMiner} learners (#2840)
- Remove {extraTrees} learner (#2839)
- Remove deprecated {rrlda} learner
- Resolve some {ggplot2} deprecation warnings
- Fixed `information.gain` filter calculation. Before, `chi.squared` was calculated even though `information.gain` was requested, due to a glitch in the filter naming (#2816, @jokokojote)
- Make `helpLearnerParam()`'s HTML parsing more robust (#2843)
- Add HTML5 support for help pages
mlr 2.19.0
- Add filter `FSelectorRcpp::relief()`. This C++ based implementation of the Relief filter algorithm is much faster than the Java based one from the {FSelector} package (#2804)
- Fix S3 print method for `FilterWrapper` objects
- Make ibrier measure work with survival tasks (#2789)
- Switch to testthat v3 (#2796)
- Enable parallel tests (#2796)
- Replace package PMCMR by PMCMRplus (#2796)
- Remove CoxBoost learner due to CRAN removal
- Warn if `fix.factors.prediction = TRUE` causes the generation of NAs for new factor levels in prediction (@jakob-r, #2794); see the sketch after this list
- Clear error message if the prediction of a wrapped learner does not have the same length as `newdata` (@jakob-r, #2794)
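A minimal sketch of the behaviour described above; the toy data frame, the learner choice and all variable names are invented for illustration:

```r
library(mlr)

# fix.factors.prediction = TRUE re-aligns the factor levels of new data with the
# levels seen during training; unseen levels become NA, which now raises a warning.
train.df <- data.frame(x = factor(c("a", "a", "b", "b", "a", "b")),
                       y = factor(c("yes", "no", "yes", "no", "no", "yes")))
task <- makeClassifTask(data = train.df, target = "y")
lrn  <- makeLearner("classif.rpart", fix.factors.prediction = TRUE)
mod  <- train(lrn, task)

new.df <- data.frame(x = factor(c("a", "c")))  # level "c" was never seen in training
pred   <- predict(mod, newdata = new.df)
```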
mlr 2.18.0
- Many praznik filters are now also able to deal with regression tasks (#2790, @bommert)
- `praznik_MRMR`: Remove handling of survival tasks (#2790, @bommert)
- xgboost: update `objective` default from `reg:linear` (deprecated) to `reg:squarederror`
- Issue a warning if `blocking` was set in the Task but `blocking.cv` was not set within `makeResampleDesc()` (#2788); see the sketch after this list
- Fix order of learners in `generateLearningCurveData()` (#2768)
- `getFeatureImportance()`: Account for feature importance weight of linear xgboost models
- Fix learner note for learner glmnet (the default of param `s` did not match the learner note) (#2747)
- Remove dependency {hrbrthemes} used in `createSpatialResamplingPlots()`. The package caused issues on R-devel; in addition, users should set custom themes themselves.
- Explicitly return value in `getNestedTuneResultsOptPathDf()` (#2754)
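A sketch of the intended usage; the blocking factor below is artificial and classif.rpart is an arbitrary learner choice:

```r
library(mlr)

# A task with a blocking factor: observations of one block must stay together.
task <- makeClassifTask(data = iris, target = "Species",
                        blocking = factor(rep(1:15, each = 10)))

# Without blocking.cv = TRUE, mlr now warns that the blocks are ignored.
rdesc <- makeResampleDesc("CV", iters = 3, blocking.cv = TRUE)
res   <- resample("classif.rpart", task, rdesc)
```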
mlr 2.17.1
Learners - bugfixes
- Remove `regr_slim` learner due to pkg {flare} being orphaned on CRAN
Measures - bugfixes
- Remove measure `clValid::dunn` and its tests (package orphaned) (#2742)
- Bugfix: `tuneThreshold()` now accounts for the direction of the measure. Beforehand, the performance measure was always minimized (#2732).
- Remove adjusted R squared measure (arsq), fixes #2711
Filters - bugfixes
- Fixed an issue which caused the random forest minimal depth filter to only return NA values when using thresholding. NAs should only be returned for features below the given threshold. (@annette987, #2710)
- Fixed a problem which prevented passing filter options via argument `more.args` for simple filters (@annette987, #2709)
Feature selection - bugfixes
- Fix `print.FeatSelResult()` when `bits.to.features` is used in `selectFeatures()` (#2721)
- Return a long data.frame for `getFeatureImportance()` (#2708); see the sketch after this list
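A small sketch of the new return shape; that the values sit in the `$res` slot and the exact column layout are from memory, not from these notes:

```r
library(mlr)

# Train any learner with the "featimp" property and inspect the importance values.
mod <- train("regr.rpart", bh.task)
imp <- getFeatureImportance(mod)
head(imp$res)  # long format: one row per feature instead of one wide row
```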
Misc
mlr 2.17.0
plotting
- The `n.show` argument had no effect in `plotFilterValues()`. Thanks @albersonmiranda. (#2689) See the sketch below.
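A short usage sketch; the anova.test filter is an arbitrary choice here (any available filter method works):

```r
library(mlr)

# Compute filter values for iris and plot only the 2 highest-scoring features.
fv <- generateFilterValuesData(iris.task, method = "anova.test")
plotFilterValues(fv, n.show = 2)
```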
Functional Data
PR: #2638 (@pfistl)
- Added several learners for regression and classification on functional data:
  - classif.classiFunc.(kernel|knn) (knn/kernel using various semi-metrics)
  - (classif|regr).fgam (Functional generalized additive models)
  - (classif|regr).FDboost (Boosted functional generalized additive models)
- Added preprocessing steps for feature extraction from functional data (see the sketch after this list):
  - extractFDAFourier (Fourier transform)
  - extractFDAWavelets (Wavelet features)
  - extractFDAFPCA (Principal components)
  - extractFDATsfeatures (Time-series features from the tsfeatures package)
  - extractFDADTWKernel (Dynamic Time-Warping Kernel)
  - extractFDAMultiResFeatures (Compute features at multiple resolutions)
- Fixed a bug where multiclass to binaryclass reduction techniques did not work with functional data.
- Several other minor bug fixes and code improvements
- Extended and clarified documentation for several fda components.
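A sketch against the bundled fuelsubset.task; that this task carries a functional feature named "UVVIS", and that the result holds the converted task in `$task`, are assumptions from memory:

```r
library(mlr)

# Replace one functional feature by its Fourier coefficients before modelling.
ext <- extractFDAFeatures(fuelsubset.task,
                          feat.methods = list("UVVIS" = extractFDAFourier()))
ext$task  # task in which the selected functional column became scalar features
```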
learners - general
- xgboost: added options 'auto', 'approx' and 'gpu_hist' to param `tree_method` (@albersonmiranda, #2701). See the sketch below.
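A minimal sketch (requires the xgboost package); hyperparameter values other than `tree_method` are arbitrary:

```r
library(mlr)

# Hyperparameters can be set directly in makeLearner(); "approx" is one of the
# newly supported tree_method values (alongside "auto" and "gpu_hist").
lrn <- makeLearner("classif.xgboost", tree_method = "approx", nrounds = 20L)
```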
filters - general
- Allow a custom thresholding function to be passed to `filterFeatures()` and `makeFilterWrapper()` (@annette987, #2686)
- Allow ensemble filters to include multiple base filters of the same type (@annette987, #2688)
filters - bugfixes
- `filterFeatures()`: Arg `thresh` was not working correctly when applied to ensemble filters. (@annette987, #2699)
- Fixed incorrect ranking of ensemble filters. Thanks @annette987 (#2698)
mlr 2.16.0
package infrastructure
- There is now a reference grouping for all functions on the pkgdown site (https://mlr.mlr-org.com/reference/index.html)
- CI testing now only on Circle CI (previously Travis CI)
learners - general
- Fixed a bug in `classif.xgboost` which prevented passing a watchlist for binary tasks. This was caused by a suboptimal internal label inversion approach. Thanks to @001ben for reporting (#32) (@mllg)
- Update `fda.usc` learners to work with package version >= 2.0
- Update `glmnet` learners to upstream package version 3.0.0
- Update `xgboost` learners to upstream version 0.90.2 (@pat-s & @be-marc, #2681)
- Updated ParamSet for learners `classif.gbm` and `regr.gbm`. Specifically, param `shrinkage` now defaults to 0.1 instead of 0.001. Also, more choices for param `distribution` have been added. Internal parallelization by the package is now suppressed (param `n.cores`). (@pat-s, #2651)
- Update parameters for `h2o.deeplearning` learners (@albersonmiranda, #2668)
misc
learners - bugfixes
- `h2o.gbm` learners were not running until `wcol` was passed, due to an internal bug. In addition, this bug caused another issue during prediction where the prediction `data.frame` was formatted as character rather than numeric. Thanks to @nagdevAmruthnath for bringing this up in #2630.
filters - general
- Bugfix: Allow `method = "vh"` for filter `randomForestSRC_var.select` and return an informative error message for unsupported values. Also, argument `conservative` can now be passed. See #2646 and #2639 for more information (@pat-s, #2649)
- Bugfix: With the new praznik v7.0.0 release, filter `praznik_CMIM` no longer returns a result for logical features. See https://gitlab.com/mbq/praznik/issues/19 for more information
mlr 2.15.0
Breaking
- Instead of a wide `data.frame`, filter values are now returned in a long (tidy) `tibble`. This makes it easier to apply post-processing methods (like `group_by()`, etc.) (@pat-s, #2456)
- `benchmark()` does not store the tuning results (`$extract` slot) anymore by default. If you want to keep this slot (e.g. for post-tuning analysis), set `keep.extract = TRUE`. This change originated from the fact that the size of `BenchmarkResult` objects with extensive tuning got very large (~ GB), which can cause memory problems during runtime if multiple `benchmark()` calls are executed on HPCs.
- `benchmark()` does not store the created models (`$models` slot) anymore by default. The reason is the same as for the `$extract` slot above. Storing can be enabled using `models = TRUE`. See the sketch after this list.
functions - general
- `generateFeatureImportanceData()` gains argument `show.info` which shows the name of the current feature being calculated, its index in the queue and the elapsed time for each feature (@pat-s, #26222)
learners - general
- `classif.liquidSVM` and `regr.liquidSVM` have been removed because `liquidSVM` has been removed from CRAN.
- Fixed a bug that caused an incorrect aggregation of probabilities in some cases. The bug had existed for quite some time and was exposed due to the change of `data.table`'s default in `rbindlist()`. See #2578 for more information. (@mllg, #2579)
- `regr.randomForest` gains three new methods to estimate the standard error.
- `regr.gbm` now supports `quantile` distribution (@bthieurmel, #2603)
- `classif.plsdaCaret` now supports multiclass classification (@GegznaV, #2621)
functions - general
- `plotHyperParsEffect()` now supports facet visualization of hyperparam effects for nested cv (@masongallo, #1653)
- Fixed a bug in which `options(on.learner.error)` was not respected in `benchmark()`. This caused `benchmark()` to stop even if it should have continued, including FailureModels in the result (@dagola, #1984)
- `getClassWeightParam()` now also works for Wrapper* Models and ensemble models (@ja-thomas, #891)
- Added `getLearnerNote()` to query the "Note" slot of a learner (@alona-sydorova, #2086)
- `e1071::svm()` now only uses the formula interface if factors are present. This change is supposed to prevent "stack overflow" issues some users encountered when using large datasets. See #1738 for more information. (@mb706, #1740)
learners - new
- Add learner `cluster.MiniBatchKmeans` from package ClusterR (@Prasiddhi, #2554)
filters - general
- Filter `praznik_mrmr` also supports `regr` and `surv` tasks
- `plotFilterValues()` got a bit "smarter" and easier now regarding the ordering of multiple facets. (@pat-s, #2456)
- `filterFeatures()`, `generateFilterValuesData()` and `makeFilterWrapper()` gained new examples. (@pat-s, #2456)
filters - new
- Ensemble filters are now supported. These filters combine multiple single filters to create a final ranking based on certain statistical operations. All new filters are listed in a dedicated section "ensemble filters" in the tutorial. Tuning of simple features is not supported yet because of a missing feature in ParamHelpers. (@pat-s, #2456) See the sketch after this list.
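A loose sketch of the ensemble-filter interface as I recall it from the tutorial; the method name "E-min", the `base.methods` argument and the FSelectorRcpp filter names are assumptions, not taken from these notes:

```r
library(mlr)

# Combine two base filters into one ensemble ranking and keep the 2 best features.
# "E-min" aggregates the per-filter results by taking their minimum (as I understand it).
task.filtered <- filterFeatures(
  iris.task,
  method = "E-min",
  base.methods = c("FSelectorRcpp_information.gain", "FSelectorRcpp_gain.ratio"),
  abs = 2
)
getTaskFeatureNames(task.filtered)
```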
Version 2.14.0
general
- Add option to use fully predefined indices in resampling (`makeResampleDesc(fixed = TRUE)`) (@pat-s, #2412); see the sketch after this list
- `Task` help pages are now split into separate ones, e.g. `RegrTask`, `ClassifTask` (@pat-s, #2564)
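A sketch of the `fixed = TRUE` option; the interpretation in the comments (blocking levels used directly as folds) is my understanding of the feature, and the blocking factor is artificial:

```r
library(mlr)

# Predefine the folds through the task's blocking factor.
task  <- makeClassifTask(data = iris, target = "Species",
                         blocking = factor(rep(1:5, each = 30)))

# With fixed = TRUE the train/test splits follow the blocking levels directly,
# i.e. the resampling indices are fully predefined by the user.
rdesc <- makeResampleDesc("CV", fixed = TRUE)
res   <- resample("classif.rpart", task, rdesc)
```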
functions - new
- `deleteCacheDir()`: Clear the default mlr cache directory (@pat-s, #2463)
- `getCacheDir()`: Return the default mlr cache directory (@pat-s, #2463)
functions - general
- `getResamplingIndices(inner = TRUE)` now correctly returns the inner indices (before, inner indices referred to the subset of the respective outer level train set) (@pat-s, #2413).
filter - general
- Caching is now used when generating filter values. This means that filter values are only computed once for a specific setting and the stored cache is used in subsequent iterations. This change brings a significant speed-up when tuning `fw.perc`, `fw.abs` or `fw.threshold`. It can be triggered with the new `cache` argument in `makeFilterWrapper()` or `filterFeatures()` (@pat-s, #2463). See the sketch after this list.
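A sketch of both entry points for the new `cache` argument; the filter method and percentage are arbitrary choices:

```r
library(mlr)

# Cache filter values so that tuning fw.perc does not recompute them each iteration.
lrn <- makeFilterWrapper("classif.rpart",
                         fw.method = "anova.test",
                         fw.perc = 0.5,
                         cache = TRUE)

# The same argument is available when filtering a task directly.
task.filtered <- filterFeatures(iris.task, method = "anova.test",
                                perc = 0.5, cache = TRUE)
```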
filter - new
- praznik_JMI
- praznik_DISR
- praznik_JMIM
- praznik_MIM
- praznik_NJMIM
- praznik_MRMR
- praznik_CMIM
- FSelectorRcpp_gain.ratio
- FSelectorRcpp_information.gain
- FSelectorRcpp_symuncert
Additionally, filter names have been harmonized using the following scheme: <package>_<filter>. Exceptions are filters included in base R packages. In this case, the package name is omitted.
filter - general
- Added filters `FSelectorRcpp_gain.ratio`, `FSelectorRcpp_information.gain` and `FSelectorRcpp_symmetrical.uncertainty` from package `FSelectorRcpp`. These filters are ~ 100 times faster than the implementation of the `FSelector` pkg. Please note that both implementations do things slightly differently internally and the `FSelectorRcpp` methods should not be seen as direct replacements for the `FSelector` pkg.
- Filter names have been harmonized using the following scheme: <package>_<filter>. (@pat-s, #2533)
  - information.gain -> FSelector_information.gain
  - gain.ratio -> FSelector_gain.ratio
  - symmetrical.uncertainty -> FSelector_symmetrical.uncertainty
  - chi.squared -> FSelector_chi.squared
  - relief -> FSelector_relief
  - oneR -> FSelector_oneR
  - randomForestSRC.rfsrc -> randomForestSRC_importance
  - randomForestSRC.var.select -> randomForestSRC_var.select
  - randomForest.importance -> randomForest_importance
- Fixed a bug related to the loading of namespaces for required filter packages (@pat-s, #2483)
learners - new
- classif.liquidSVM (@PhilippPro, #2428)
- regr.liquidSVM (@PhilippPro, #2428)
learners - general
- regr.h2o.gbm: Various parameters added, `"h2o.use.data.table" = TRUE` is now the default (@j-hartshorn, #2508)
- h2o learners now support getting feature importance (@markusdumke, #2434)
learners - fixes
- In some cases the optimized hyperparameters were not applied in the performance level of a nested CV (@berndbischl, #2479)
featSel - general
- The FeatSelResult object now contains an additional slot `x.bit.names` that stores the optimal bits
- The slot `x` now always contains the real feature names and not the bit.names
- This fixes a bug and makes `makeFeatSelWrapper` usable with custom `bit.names`.
- Fixed a bug due to which `sffs` crashed in some cases (@bmihaljevic, #2486)
Version 2.13
general
- Disabled unit tests for CRAN; we now test on Travis only
- Suppress messages with show.learner.output = FALSE
functions - general
- plotHyperParsEffect: add colors
functions - new
- getResamplingIndices
- createSpatialResamplingPlots
learners - general
- regr.nnet: Removed unneeded params linout, entropy, softmax and censored
- regr.ranger: Add weight handling
learners - removed
- {classif,regr}.blackboost: broke API with new release
Version 2.12
general
- Support for functional data (fda) using matrix columns has been added (see the sketch after this list).
- Relaxed the way wrappers can be nested -- the only explicitly forbidden combination is to wrap a tuning wrapper around another optimization wrapper
- Refactored the resample progress messages to give a better overview and better distinguish between train and test measures
- calculateROCMeasures now returns absolute instead of relative values
- Added support for spatial data by providing spatial partitioning methods "SpCV" and "SpRepCV".
- Added new spatial.task classification task.
- Added new spam.task classification task.
- Classification tasks now store the class distribution in the class.distribution member.
- mlr now predicts NA for data that contains NA and learners that do not support missing values.
- Tasks are now subsetted in the "train" function, and the factor levels (for classification tasks) are based on this subset. This means that the factor level distribution is not necessarily the same as for the entire task, and that the task descriptions of models in resampling reflect the respective subset, while the task description of resample predictions reflects the entire task and not necessarily the task of any individual model.
- Added support for growing and fixed window cross-validation for forecasting through new resample methods "GrowingWindowCV" and "FixedWindowCV".
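A rough sketch of the matrix-column workflow via makeFunctionalData() (listed under "functions - new" below); the `fd.features` interface as a named list of column indices, and all data and variable names here, are assumptions for illustration:

```r
library(mlr)

# 8 numeric columns that conceptually form two curves observed at 4 points each.
df <- data.frame(matrix(rnorm(100 * 8), nrow = 100))
df$target <- rnorm(100)

# Collapse the chosen columns into matrix columns ("functional features").
fdf  <- makeFunctionalData(df, fd.features = list(fd1 = 1:4, fd2 = 5:8))
task <- makeRegrTask(data = fdf, target = "target")
print(task)  # the task now reports 2 functional features instead of 8 numerics
```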
functions - general
- generatePartialDependenceData: now depends on the "mmpf" package; removed parameters "center", "resample", "fmin", "fmax" and "gridsize"; added parameters "uniform" and "n" to configure the grid for the partial dependence plot
- batchmark: allow resample instances and reduction of partial results
- resample, performance: new flag "na.rm" to remove NAs during aggregation
- plotTuneMultiCritResultGGVIS: new parameters "point.info" and "point.trafo" to control interactivity
- calculateConfusionMatrix: new parameter "set" to specify whether the confusion matrix should be computed for "train", "test", or "both" (default)
- PlotBMRSummary: Add parameter "shape"
- plotROCCurves: Add faceting argument
- PreprocWrapperCaret: Add params "ppc.corr", "ppc.zv", "ppc.nzv", "ppc.n.comp", "ppc.cutoff", "ppc.freqCut", "ppc.uniqueCut"
functions - new
- makeClassificationViaRegressionWrapper (see the sketch after this list)
- getPredictionTaskDesc
- helpLearner, helpLearnerParam: open the help for a learner or get a description of its parameters
- makeFunctionalData
- hasFunctionalFeatures
- extractFDAFeatures, reextractFDAFeatures
- extractFDAFourier, extractFDAFPCA, extractFDAMultiResFeatures, extractFDAWavelets
- makeExtractFDAFeatMethod
- makeExtractFDAFeatsWrapper
- getTuneResultOptPath
- makeTuneMultiCritControlMBO: Allows model-based multi-criteria / multi-objective optimization using mlrMBO
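A quick sketch of makeClassificationViaRegressionWrapper using the built-in sonar.task; the summary in the comment (regress on an encoded binary target, derive class predictions from the regression output) is my reading, not wording from these notes:

```r
library(mlr)

# Wrap a regression learner so it can be applied to a binary classification task.
lrn  <- makeClassificationViaRegressionWrapper(makeLearner("regr.rpart"))
mod  <- train(lrn, sonar.task)
pred <- predict(mod, sonar.task)
performance(pred, mmce)
```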
functions - removed
- Removed plotViperCharts
measures - general
- measure "arsq" now has ID "arsq"
- measure "measureMultiLabelF1" was renamed to "measureMultilabelF1" for consistency
measures - new
- measureBER, measureRMSLE, measureF1
- cindex.uno, iauc.uno
learners - general
- unified {classif,regr,surv}.penalized{ridge,lasso,fusedlasso} into {classif,regr,surv}.penalized
- fixed a bug where surv.cforest gave wrong risk predictions (#1833)
- fixed bug where classif.xgboost returned NA predictions with multi:softmax
- classif.lda learner: add 'prior' hyperparameter
- ranger: update hyperpar 'respect.unordered.factors', add 'extratrees' and 'num.random.splits'
- h2o.deeplearning: Rename hyperpar 'MeanSquare' to 'Quadratic'
- h2o*: Add support for "missings"
learners - new
- classif.adaboostm1
- classif.fdaknn
- classif.fdakernel
- classif.fdanp
- classif.fdaglm
- classif.mxff
- regr.fdaFDboost
- regr.mxff
learners - removed
- {classif,regr}.bdk: broke our API, stability issues
- {classif,regr}.xyf: broke our API, stability issues
- classif.hdrda: package removed from CRAN
- surv.penalized: stability issues
aggregations - new
- testgroup.sd
filter - new
- auc
- ranger.permutation, ranger.impurity