#' Sample from the posterior distribution of a Bayesian variable selection model using weighted Tempered Gibbs Sampling
#'
#' Performs Bayesian variable selection in linear regression
#' using discrete spike-and-slab priors. Posterior sampling and calculation
#' of marginal posterior inclusion probabilities (PIPs) for the explanatory
#' variables are done using the weighted Tempered Gibbs Sampling algorithm
#' of Zanella and Roberts (2019).
#'
#' The evaluated linear regression model can be written as
#' \deqn{Y|\beta_\gamma, \gamma, \sigma^2 \sim N(X_\gamma\beta_\gamma, \sigma^2 I_n)}
#' \deqn{\beta_\gamma|\gamma, \sigma^2 \sim N(0, \sigma^2\Sigma_\gamma)}
#' \deqn{p(\sigma^2) \propto 1/\sigma^2}
#' \deqn{\gamma_i|h \sim Bern(h) \textrm{ i.i.d. for } i = 1,...,p}
#' where the posterior probability of interest is \eqn{p(\gamma|Y)}.
#'
#' The prior covariance matrix of the coefficients
#' of the selected regressors is \eqn{\Sigma_\gamma = c(X_\gamma^TX_\gamma)^{-1}}, i.e. the g-prior
#' recommended by Zellner (1986).
#'
#' The Rao-Blackwellised estimators provide a vector of inclusion probabilities for
#' each of the regressors, \eqn{\{p(\gamma_i=1|Y)\}_{i=1}^{p}}.
#'
#' \eqn{h} can be a fixed value or \eqn{h \sim Beta(a,b)}.
#'
#' The sampling algorithm flips one of the p binary values of \eqn{\gamma} by sampling an index \eqn{i}
#' from \eqn{1,...,p} with probability proportional to \eqn{p_i(\gamma)=p(\gamma_i|\gamma_{-i},Y)^{-1}}
#' in the case of Tempered Gibbs Sampling and proportional to
#' \eqn{p_i(\gamma)=(p(\gamma_i=1|\gamma_{-i},Y)+k/p)/p(\gamma_i|\gamma_{-i},Y)}
#' in the case of weighted Tempered Gibbs Sampling. The importance weight of the new state of
#' the Markov chain is proportional to \eqn{(\sum_{i=1}^p p_i(\gamma))^{-1}}.
#'
#' For more information on weighted Tempered Gibbs Sampling, please refer to Zanella and
#' Roberts (2019).
#'
#' @param y a vector of n observations (dependent variable) with dimension (nx1).
#' @param X a matrix of p regressors (independent variables) with dimension (nxp).
#' @param c a real number greater than 0 which serves as the constant of proportionality
#' in the specification of the prior covariance matrix of the coefficients of the
#' selected regressors in the linear regression. The default is \code{NULL}, which
#' yields the constant of proportionality recommended for Zellner's g-prior, i.e. c = n.
#' @param h either a single real number greater than 0 and smaller than 1 or a vector
#' of two real values, both greater than 0. This parameter specifies the prior
#' inclusion probability of the regressors, which is identical for all regressors.
#' In the former case, the prior probability is set to a fixed value. In the latter
#' case, the prior probability follows a Beta distribution with the specified parameters.
#' The default is \code{c(1,1)}, i.e. a uniform Beta prior.
#' @param n_iter a positive integer specifying the number of iterations of the Markov
#' chain. The default is 2000.
#' @param burn_in either an integer greater than 1 or a real number greater than 0 and
#' smaller than 1, specifying the number of burn-in iterations of the Markov chain. In
#' the former case, the number of burn-in iterations is set to the given integer. In the
#' latter case, it is the specified fraction of \code{n_iter}.
#' The default is 0.2.
#' @param k_weight a nonnegative real number which, in the case of \code{weighted = TRUE},
#' controls the tradeoff between exploration and exploitation in the choice of the variable
#' to be flipped at each iteration. A larger \code{k_weight} favours exploration. The default is 0.
#' @param weighted logical, with default \code{TRUE}, indicating whether to perform
#' weighted Tempered Gibbs Sampling if \code{TRUE} or Tempered Gibbs Sampling if
#' \code{FALSE}.
#'
#' @return A list with named objects:
#' \item{PIP }{a vector (px1) containing Rao-Blackwellised estimators of
#' the marginal PIPs for each of the p regressors in \code{X}.}
#' \item{states }{a list containing the elements necessary to reproduce the samples of
#' the Markov chain. These elements are:\cr
#' "start" - the starting value of \eqn{\gamma} after the burn-in period.\cr
#' "sample_weights" - a vector (n_iterx1) of importance weights for \eqn{\gamma} at each step of
#' the Markov chain.\cr
#' "indices_sequence" - a vector (n_iterx1) of indices ranging over \eqn{1,...,p} indicating the
#' element of \eqn{\gamma} flipped at each step of the Markov chain.}
#'
#' @export
#'
#' @references
#' Zanella, G. and Roberts, G. (2019). Scalable importance tempering and Bayesian variable selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 489–517.
#'
#' Zellner, A. (1986). On Assessing Prior Distributions and Bayesian Regression Analysis with g-Prior Distributions. In: Goel, P. and Zellner, A., Eds., Bayesian Inference and Decision Techniques: Essays in Honor of Bruno de Finetti, Elsevier Science Publishers, Inc., New York, 233-243.
#'
#' @seealso \code{\link{createSamples}} for creating the samples of the Markov chain and their weights used to calculate the PIPs.
#'
#' @examples
#' # Posterior inclusion probabilities of car characteristics for describing mileage
#'
#' # load data
#' data(mtcars)
#'
#' # create the X matrix and y vector with zero mean for all variables
#' X <- t(t(mtcars[,-1]) - colMeans(mtcars[,-1]))
#' y <- mtcars$mpg - mean(mtcars$mpg)
#'
#' mtcars.output <- samplingBVS(y, X)
#'
#' names(mtcars.output$PIP) <- names(mtcars[,-1])
#' print(mtcars.output$PIP)
samplingBVS <- function(y,             # vector of observations
                        X,             # matrix of regressors
                        c = NULL,      # constant for the prior covariance matrix
                        h = c(1,1),    # if a vector (2x1), the parameters of a Beta prior
                        n_iter = 2000, # number of effective iterations
                        burn_in = 0.2, # fraction (>0, <1) or number (>1) of burn-in iterations
                        k_weight = 0, weighted = TRUE) { # weighted TGS flag and its parameter

  ### wTGS algorithm for Bayesian variable selection problems

  ## throw errors for invalid parameters
  if (burn_in < 0) stop("burn_in must be a real number between 0 and 1 or an integer larger than 1")
  if (any(h < 0)) stop("h must be a vector of two positive parameters of a Beta distribution or a real number between 0 and 1")
  if (!is.null(c))
    if (c <= 0) stop("c must be larger than 0")
  if (length(h) > 2 || length(h) == 0) {
    stop("h must be a vector of two positive parameters of a Beta distribution or a real number between 0 and 1")
  } else if (length(h) == 1) {
    if (h > 1)
      stop("h must be a vector of two positive parameters of a Beta distribution or a real number between 0 and 1")
  }
  if (n_iter <= 0) stop("n_iter must be a positive integer")
  if (k_weight < 0) stop("k_weight must not be negative")

  ## convert a fractional burn_in into a number of iterations
  if (burn_in < 1) burn_in <- n_iter * burn_in

  ## check dimensions of y and X
  n <- length(y)
  if (nrow(X) != n) stop("y and X must have the same number of observations")
  if (is.null(c)) c <- n
  p <- ncol(X)

  ## split h into Beta parameters (a fixed inclusion probability is passed on with h2 = 0)
  if (length(h) == 2) {
    h1 <- h[1]
    h2 <- h[2]
  } else {
    h1 <- h
    h2 <- 0
  }

  ## run the (weighted) Tempered Gibbs Sampler
  output <- wTGS(as.matrix(X), as.vector(y), n, p, n_iter, burn_in, h1, h2, c, k_weight, weighted)

  return(list(PIP = output[[1]],
              states = list(start = output[[2]],
                            sample_weights = output[[3]],
                            indices_sequence = output[[4]])))
}
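# ---------------------------------------------------------------------------
# Illustrative sketch (NOT part of the package API) of one step of weighted
# Tempered Gibbs Sampling as described in the documentation above. The names
# `wtgs_step` and `cond_prob` are hypothetical: `cond_prob(i, gamma)` is
# assumed to return p(gamma_i = 1 | gamma_{-i}, Y), a quantity the real
# sampler (wTGS) computes internally from the marginal likelihood.

```r
wtgs_step <- function(gamma, cond_prob, k_weight = 0) {
  p <- length(gamma)
  # conditional probability that each gamma_i equals 1 given the rest
  cond1 <- vapply(seq_len(p), function(i) cond_prob(i, gamma), numeric(1))
  # probability of the *current* value of each gamma_i
  cond_current <- ifelse(gamma == 1, cond1, 1 - cond1)
  # flip rates p_i(gamma) = (p(gamma_i = 1 | .) + k/p) / p(gamma_i | .)
  rates <- (cond1 + k_weight / p) / cond_current
  i <- sample.int(p, 1, prob = rates)  # pick the coordinate to flip
  gamma[i] <- 1 - gamma[i]             # flip it
  # importance weight of the new state is proportional to 1 / sum_i p_i(gamma)
  list(gamma = gamma, weight = 1 / sum(rates), flipped = i)
}
```

# With flat conditionals (all 0.5) and k_weight = 0, every flip rate equals 1,
# so each coordinate is equally likely to flip and the weight is 1/p.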
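# ---------------------------------------------------------------------------
# Illustrative sketch (NOT part of the package API) of how Rao-Blackwellised
# PIPs are formed from importance-weighted samples:
#   PIP_i = sum_t w_t * p(gamma_i = 1 | gamma_{-i}^(t), Y) / sum_t w_t.
# `rao_blackwell_pip` is a hypothetical name; `cond_probs` is assumed to be a
# T x p matrix whose (t, i) entry is p(gamma_i = 1 | gamma_{-i}^(t), Y), and
# `weights` the vector of importance weights of the T sampled states.

```r
rao_blackwell_pip <- function(cond_probs, weights) {
  # weights (length T) recycle down the columns of the T x p matrix,
  # so each row t is scaled by w_t before the weighted average is taken
  colSums(cond_probs * weights) / sum(weights)
}
```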