Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiple inits for multi-path Pathfinder? #907

Open
colinorourke opened this issue Jan 22, 2024 · 7 comments
Open

Multiple inits for multi-path Pathfinder? #907

colinorourke opened this issue Jan 22, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@colinorourke
Copy link

Alluded to in #874, I've noticed that if you try to set initial values to the Pathfinder algorithm, in particular the multi-path variety of this algorithm, cmdstanr currently doesn't allow the paths to begin from different initial positions. The requirement for the Pathfinder algorithm seems to be that the inits are a list containing just a single initial value. While this makes sense for the other variational algorithms, for Pathfinder this means that each L-BFGS starts from the same place, which doesn't seem ideal. Maybe this is due to some underlying restriction of cmdstan itself?

Here's an example (I've simplified some of the output to remove warnings from Stan about gradients & Pareto k values):

# Load necessary packages
library(cmdstanr)
#> This is cmdstanr version 0.7.1
#> - CmdStanR documentation and vignettes: mc-stan.org/cmdstanr
#> - CmdStan path: C:/Users/corour/Documents/.cmdstan/cmdstan-2.34.0
#> - CmdStan version: 2.34.0
library(brms)
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.20.4). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').

# Setup a model. This is just an example from brms.
prior1 = prior(normal(0, 10), class = b) +
  prior(cauchy(0, 2), class = sd)
stan_code = make_stancode(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy,
  family = poisson(),
  prior = prior1
)
stan_data = make_standata(
  count ~ zAge + zBase * Trt + (1 | patient),
  data = epilepsy,
  family = poisson(),
  prior = prior1
)
mod = cmdstan_model(write_stan_file(stan_code))

# Create a function that generates some initial values. It will generate
# different sets of initial values depending on the value of `chain_id`
init_fn = function(chain_id) {
  old_rng_state = get(".Random.seed", envir = globalenv())
  on.exit(assign(".Random.seed", value = old_rng_state, envir = globalenv()))
  new_seed = rngtools::RNGseq(n = chain_id + 1L, seed = 1313L)[[chain_id]]
  assign(".Random.seed", value = new_seed, envir = globalenv())
  b = runif(4L,-2, 2) #vector[Kc]
  Intercept = runif(1L,-2, 2)
  sd_1 = runif(1L, 0, 2) #vector<lower=0>[M_1]
  z_1 = array(runif(1 * 59,-2, 2), dim = c(1, 59)) #array[M_1] vector[N_1]
  list(
    b = b,
    Intercept = Intercept,
    sd_1 = sd_1,
    z_1 = z_1
  )
}

# Create initial values for cmdstanr in the "list-of-lists" format
init_list_4 = lapply(1:4, \(i) init_fn(i))
init_list_1 = list(init_fn(1))

# Try to run Pathfinder with num_paths=4 using the "list-of-lists" style
# of setting initial values. This just produces an error.
mod_w_init_list = mod$pathfinder(
  data = stan_data,
  seed = 123,
  init = init_list_4,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Error: 'init' has the wrong length. See documentation of 'init' argument.

# Run Pathfinder with num_paths=4 but using just a list with a single
# initialization value. This runs but re-uses the same init for each path
mod_w_init_list = mod$pathfinder(
  data = stan_data,
  seed = 123,
  init = init_list_1,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Path [1] :Initial log joint density = -16938.506748 
#> Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.110e+03 -1.110e+03                   
#> Path [1] :Best Iter: [76] ELBO (-785.130279) evaluations: (4851) 
#> Path [2] :Initial log joint density = -16938.506748 
#> Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.135e+03 -1.135e+03                   
#> Path [2] :Best Iter: [77] ELBO (-789.101802) evaluations: (4851) 
#> Path [3] :Initial log joint density = -16938.506748 
#> Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.245e+03 -1.245e+03                   
#> Path [3] :Best Iter: [76] ELBO (-787.813011) evaluations: (4851) 
#> Path [4] :Initial log joint density = -16938.506748 
#> Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.116e+03 -1.116e+03                   
#> Path [4] :Best Iter: [76] ELBO (-788.891651) evaluations: (4851) 
#> Total log probability function evaluations:23304 
#> Finished in  1.8 seconds.

# Run using the function-style of specifying inits. This seems to just repeat
# the single set of inits for for each of the 4 paths.
mod_w_inits = mod$pathfinder(
  data = stan_data,
  seed = 123,
  init = init_fn,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Path [1] :Initial log joint density = -16938.506748
#> Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.110e+03 -1.110e+03                   
#> Path [1] :Best Iter: [76] ELBO (-785.130279) evaluations: (4851) 
#> Path [2] :Initial log joint density = -16938.506748
#> Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.135e+03 -1.135e+03                   
#> Path [2] :Best Iter: [77] ELBO (-789.101802) evaluations: (4851) 
#> Path [3] :Initial log joint density = -16938.506748 
#> Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.245e+03 -1.245e+03                   
#> Path [3] :Best Iter: [76] ELBO (-787.813011) evaluations: (4851) 
#> Path [4] :Initial log joint density = -16938.506748 
#> Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             194      -6.539e+02      7.973e-05   5.342e-02    1.000e+00  1.000e+00      4851 -1.116e+03 -1.116e+03                   
#> Path [4] :Best Iter: [76] ELBO (-788.891651) evaluations: (4851) 
#> Total log probability function evaluations:23304 
#> Finished in  1.8 seconds.

# Run using the default random inits. Different inits are used for each
# Pathfinder path.
mod_wo_inits = mod$pathfinder(
  data = stan_data,
  seed = 123,
  num_paths = 4,
  refresh = 1000,
  history_size = 10
)
#> Path [1] :Initial log joint density = -5218.225137 
#> Path [1] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             399      -6.539e+02      8.264e-05   4.839e-02    9.763e-01  9.763e-01      9976 -1.649e+05 -1.649e+05                   
#> Path [1] :Best Iter: [54] ELBO (-735.976769) evaluations: (9976) 
#> Path [2] :Initial log joint density = -6622.413033 
#> Path [2] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             308      -6.539e+02      8.655e-05   7.312e-02    1.000e+00  1.000e+00      7701 -1.418e+10 -1.418e+10                   
#> Path [2] :Best Iter: [50] ELBO (-745.757922) evaluations: (7701) 
#> Path [3] :Initial log joint density = -12283.643776 
#> Path [3] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             310      -6.539e+02      1.200e-04   1.092e-01    8.474e-01  8.474e-01      7751 -6.558e+04 -6.558e+04                   
#> Path [3] :Best Iter: [58] ELBO (-743.025505) evaluations: (7751) 
#> Path [4] :Initial log joint density = -95375.955585 
#> Path [4] : Iter      log prob        ||dx||      ||grad||     alpha      alpha0      # evals       ELBO    Best ELBO        Notes  
#>             193      -6.539e+02      3.622e-04   8.454e-02    1.000e+00  1.000e+00      4826 -2.231e+03 -2.231e+03                   
#> Path [4] :Best Iter: [82] ELBO (-786.378632) evaluations: (4826) 
#> Total log probability function evaluations:34154 
#> Finished in  2.6 seconds.

Created on 2024-01-22 with reprex v2.0.2

R version info:

 setting  value
 version  R version 4.3.2 (2023-10-31 ucrt)
 os       Windows 10 x64 (build 19045)
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.utf8
 ctype    English_United States.utf8
 tz       America/Los_Angeles
 date     2024-01-22
 rstudio  2023.06.1+524 Mountain Hydrangea (desktop)
 pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
@colinorourke colinorourke added the bug Something isn't working label Jan 22, 2024
@avehtari
Copy link
Contributor

I was also annoyed the Pathfinder expects a list and not list of lists, but I forgot to make an issue. I agree, it should be possible to give a list of lists for multi-pathfinder

@SteveBronder
Copy link
Collaborator

I'm fixing that in the PR to have inits as pathfinder fits

@avehtari
Copy link
Contributor

avehtari commented Mar 7, 2024

Can you also allow list of lists, as for other inference methods?

@SteveBronder
Copy link
Collaborator

Ooof okay this is going to take me a second. So I wrote it originally to use some internals that thinks parallelization is only done at the R level and that each init goes to a separate process started at the R level . I'm going to chat with @jgabry tmrw about how that all works and also what we can do to stop doing the parallelization to happen at the cmdstan level for everything since we have that now

@avehtari
Copy link
Contributor

Has this been fixed by #937 ?

@andrjohns
Copy link
Collaborator

@SteveBronder - just to check something with Pathfinder and inits - when multiple init files are present (one for each path), is it normal for the output to only show the first init file?

For example, with num_paths=4 and four distinct init files, the output shows:

# init = /var/folders/0h/9q9frlm906s6cwft_rp92dw80000gn/T/RtmpToGNxD/init-71038ec7ec4_1.json

@SteveBronder
Copy link
Collaborator

Yes we only print the first one. I do think we should change that so the actual result is more clear. We could also let users pass in a comma separated list of file names which I think would be nicer

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants