The main bottleneck is the `$predict()` and `$predict_newdata()` steps, so minimizing how often they are called will be helpful.
- One approach is to create the permuted datasets for e.g. PFI individually, `rbind()` them together while keeping track of which feature is permuted in which chunk of data, and then call `$predict()` only once on the combined data (see the sketch after this list).
- Analogously, this can be done across multiple permutation iterations (`iters_perm`) for PFI and SAGE; the sketch below batches across both features and iterations.
- Since resampling iterations are independent and each has its own trained model, parallelizing across them would be an option as well (see the parallelization sketch below).
- Chunking the data beforehand, so the combined prediction data stays within memory limits rather than growing with the number of features and iterations (a sketch follows below).
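
A minimal sketch of the batched-prediction idea, covering both the per-feature and the per-iteration (`iters_perm`) batching. It assumes an mlr3-style learner exposing `$predict_newdata()` and a plain `data.frame`; `permute_feature()` and `batched_pfi_predictions()` are hypothetical helper names, not part of any existing API.

```r
# Hypothetical helper: return a copy of `data` with one feature's column permuted.
permute_feature <- function(data, feature) {
  data[[feature]] <- sample(data[[feature]])
  data
}

batched_pfi_predictions <- function(learner, data, features, iters_perm = 1L) {
  # One permuted copy per (feature, iteration) combination, with a record
  # of which chunk belongs to which combination.
  grid <- expand.grid(feature = features, iter = seq_len(iters_perm),
                      stringsAsFactors = FALSE)
  chunks <- lapply(seq_len(nrow(grid)), function(i) {
    permute_feature(data, grid$feature[i])
  })
  combined <- do.call(rbind, chunks)

  # A single $predict_newdata() call replaces length(features) * iters_perm calls.
  pred <- learner$predict_newdata(combined)

  # Predictions come back in row order, so chunk i occupies rows
  # ((i - 1) * nrow(data) + 1) : (i * nrow(data)); chunk_id lets the caller
  # split them back into per-(feature, iteration) pieces for scoring.
  chunk_id <- rep(seq_len(nrow(grid)), each = nrow(data))
  list(grid = grid, predictions = pred, chunk_id = chunk_id)
}
```

Building the full grid once trades model-call overhead for memory: all permuted copies are held at the same time, which is what motivates the chunking bullet above.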
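For the resampling-level parallelization, a sketch using the future framework. It assumes `rr` is an mlr3 `ResampleResult` and `compute_pfi()` is a hypothetical per-iteration importance function.

```r
library(future.apply)
future::plan("multisession")  # or "multicore" / a cluster plan

# One importance computation per resampling iteration, run in parallel;
# each iteration uses its own trained model and its own held-out rows.
per_iter_importance <- future_lapply(
  seq_len(rr$resampling$iters),
  function(i) {
    learner  <- rr$learners[[i]]           # model trained in iteration i
    test_ids <- rr$resampling$test_set(i)  # held-out row ids of iteration i
    compute_pfi(learner, rr$task$data(rows = test_ids))  # hypothetical helper
  },
  future.seed = TRUE  # reproducible RNG for the permutations
)
```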
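And a sketch of the pre-chunking idea, capping how many rows go into a single `$predict_newdata()` call; `chunk_size` is an assumed tuning knob, not a recommended value.

```r
# Split the rows into chunks of at most chunk_size and predict per chunk,
# bounding the memory needed for any single prediction call.
predict_in_chunks <- function(learner, data, chunk_size = 10000L) {
  chunk_id <- ceiling(seq_len(nrow(data)) / chunk_size)
  lapply(split(data, chunk_id), function(chunk) {
    learner$predict_newdata(chunk)
  })
}
```

The per-chunk predictions can then be recombined before scoring, e.g. with `c()` on mlr3 `Prediction` objects.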