
FeatureImp returns same importance value for every variable with mlr3 BART model #202

SamuelFrederick opened this issue Feb 15, 2023 · 1 comment
I am training several ML models with the mlr3 package and using iml to compute permutation importance for the variables in my data. For BART models, however, the reported importance is exactly the same for every variable. Below is code that reproduces the issue on a toy dataset: even x4_noise, which is completely unrelated to the outcome, gets the same importance as the informative variables.

library(mlr3verse)
mlr3extralearners::install_learners("regr.bart")

# Simulate a toy regression dataset: x1-x3 drive the outcome, x4_noise does not
n <- 100
set.seed(123)
x1 <- rnorm(n, 4, 5)
x2 <- sample(c("a", "b", "c"), size = n, replace = TRUE)
x3 <- sample(letters[1:4], size = n, replace = TRUE)
x4_noise <- rnorm(n, 1, 6)

y <- 3 + 2*x1 + 5*(x2 == "a") - 10*(x2 == "b") + 25*(x2 == "c") +
  4*(x3 == "a") - 4*(x3 == "b") + 5*(x3 == "c") + 10*(x3 == "d") -
  50*(x3 == "d")*(x2 == "b") +
  rnorm(n, 0, 3)

df <- data.frame(x1 = x1, x2 = factor(x2),
                 x3 = factor(x3), x4_noise = x4_noise,
                 y = y)

# Scale and one-hot encode, then fit BART through a GraphLearner
task <- as_task_regr(df, target = "y")
gr <- po("scale") %>>% po("encode") %>>% lrn("regr.bart")
grl <- GraphLearner$new(gr)
grl$train(task)

# Permutation feature importance via iml
model <- iml::Predictor$new(grl, data = df, y = "y")
imp_mod <- iml::FeatureImp$new(model, loss = "rmse",
                               n.repetitions = 50,
                               compare = "ratio")
imp_mod$results

Output:

   feature importance.05 importance importance.95 permutation.error
1       x1      15.61669   15.61669      15.61669          24.96426
2       x2      15.61669   15.61669      15.61669          24.96426
3       x3      15.61669   15.61669      15.61669          24.96426
4 x4_noise      15.61669   15.61669      15.61669          24.96426
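As a sanity check (an editorial addition, not part of the original report), one can verify that the fitted GraphLearner itself produces sensible predictions, which would suggest the problem lies in the importance computation rather than in the model fit:

# Hedged sanity check: if the fit is reasonable, the in-sample error of the
# trained learner should be well below the raw standard deviation of the outcome
grl$predict(task)$score(msr("regr.rmse"))
sd(df$y)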
SamuelFrederick (Author) commented:
One update: this appears to be an issue only when the BART model is used in parallel. I do not get identical importances for all variables when using other models (e.g., ranger, xgboost, GBM) in parallel, nor when using BART with a sequential importance calculation. I have modified the code below so that it reproduces the issue:

library(mlr3verse)
mlr3extralearners::install_learners("regr.bart")

# Same toy data and pipeline as above
n <- 100
set.seed(123)
x1 <- rnorm(n, 4, 5)
x2 <- sample(c("a", "b", "c"), size = n, replace = TRUE)
x3 <- sample(letters[1:4], size = n, replace = TRUE)
x4_noise <- rnorm(n, 1, 6)

y <- 3 + 2*x1 + 5*(x2 == "a") - 10*(x2 == "b") + 25*(x2 == "c") +
  4*(x3 == "a") - 4*(x3 == "b") + 5*(x3 == "c") + 10*(x3 == "d") -
  50*(x3 == "d")*(x2 == "b") +
  rnorm(n, 0, 3)

df <- data.frame(x1 = x1, x2 = factor(x2),
                 x3 = factor(x3), x4_noise = x4_noise,
                 y = y)
task <- as_task_regr(df, target = "y")
gr <- po("scale") %>>% po("encode") %>>% lrn("regr.bart")
grl <- GraphLearner$new(gr)
grl$train(task)

# Enabling a parallel backend is what triggers the identical importances
future::plan("multisession", workers = 2)
model <- iml::Predictor$new(grl, data = df, y = "y")
imp_mod <- iml::FeatureImp$new(model, loss = "rmse",
                               n.repetitions = 50,
                               compare = "ratio")
imp_mod$results
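Since the permutation.error is identical for every feature, the permuted data apparently has no effect on the predictions made inside the parallel workers. A minimal diagnostic sketch (an editorial addition, not from the original thread; df, model, and the multisession plan are taken from the code above) checks this directly:

# Hedged diagnostic: inside a future worker, permute one column and test
# whether the iml Predictor's output changes at all
f <- future::future({
  df_perm <- df
  df_perm$x1 <- sample(df_perm$x1)
  isTRUE(all.equal(model$predict(df), model$predict(df_perm)))
})
future::value(f)  # TRUE would confirm the permutation is ignored in workers

Until the underlying cause is fixed, the observation above that sequential computation works suggests a simple workaround: switch back to a sequential plan before constructing FeatureImp.

# Workaround sketch: run FeatureImp's permutation loop in the main session
future::plan("sequential")
imp_seq <- iml::FeatureImp$new(model, loss = "rmse",
                               n.repetitions = 50,
                               compare = "ratio")
imp_seq$results  # per-feature importances should now differ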
