Open
Description
Some models retain copies of data frames and similar objects in order to make predictions. This can be convenient but has at least two downsides (described below). This issue proposes that models which store data by default have that information removed from the fitted model in cases where it is not needed. E.g., lm should be called with model = FALSE by default (and the model, x, and y arguments should all be looked into).
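A minimal sketch of what such a default could look like. The wrapper name `lm_lean` is hypothetical and is not an existing twidlr or stats function; the data-first argument order is an assumption made for illustration.

```r
# Hypothetical helper (not part of twidlr): fit a linear model without
# storing the model frame in the result.
lm_lean <- function(data, formula, ...) {
  data <- as.data.frame(data)
  # model = FALSE drops the stored model frame.
  # x and y already default to FALSE in stats::lm, so they are left alone.
  stats::lm(formula = formula, data = data, model = FALSE, ...)
}

fit <- lm_lean(mtcars, mpg ~ .)

# Check by name rather than with `$`, which partially matches list names.
"model" %in% names(fit)
```

Checking `names(fit)` avoids a pitfall: `fit$x` would partially match the `xlevels` component instead of returning `NULL`.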
Downsides to default case of keeping original data.frame
- It creates a memory overhead. E.g., for lm:
object.size(lm(mpg ~ ., mtcars))
#> 45768 bytes
object.size(lm(mpg ~ ., mtcars, model = FALSE))
#> 28152 bytes
Given that twidlr requires a data frame for predict, the stored data can be dropped if the only reason for retaining it is to call predict.
- It is inconsistent between models and thus misleading. For example, lm stores the original data by default, making predict work properly. However, other models do not, and instead point to the original data frame in the global environment. E.g., see examples here. A similar thing can be done when lm is used with model = FALSE.
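The two points above can be illustrated with a short sketch using only base R. The variable names are illustrative. The behavior shown relies on predict re-evaluating the data named in the original call when no model frame is stored, which is my reading of how stats::predict.lm works rather than something stated in this issue.

```r
d <- mtcars
fit_full <- lm(mpg ~ wt, data = d)                 # keeps the model frame
fit_slim <- lm(mpg ~ wt, data = d, model = FALSE)  # drops it

# With an explicit data frame, both fits give the same predictions.
same_with_newdata <- all.equal(predict(fit_full, newdata = d),
                               predict(fit_slim, newdata = d))

# Without newdata, the slim fit has to look `d` up again, so modifying `d`
# after fitting is expected to change its predictions.
before_full <- predict(fit_full)
before_slim <- predict(fit_slim)
d$wt <- d$wt * 2

full_unchanged <- isTRUE(all.equal(before_full, predict(fit_full)))
slim_unchanged <- isTRUE(all.equal(before_slim, predict(fit_slim)))

c(full_unchanged = full_unchanged, slim_unchanged = slim_unchanged)
```

If predict is always given a data frame, as twidlr requires, the slim fit does not depend on whatever `d` currently holds in the calling environment.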