Paper No. 09-25

D Lamnisos, JE Griffin and MFJ Steel

Cross-validation prior choice in Bayesian probit regression with many covariates

Abstract: This paper examines prior choice in probit regression through a predictive cross-validation criterion. In particular, we focus on situations where the number of potential covariates is far larger than the number of observations, such as in gene expression data. Cross-validation avoids the tendency of such models to fit perfectly. We choose the hyperparameter in the ridge prior, c, as the minimizer of the log predictive score. This evaluation requires substantial computational effort, and we investigate computationally cheaper ways of determining c through importance sampling. Various strategies are explored and we find that K-fold importance densities perform best, in combination with either mixing over different values of c or with integrating over c through an auxiliary distribution.

Keywords: Bayesian variable selection, cross-validation, gene expression data, importance sampling, predictive score, ridge prior.
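
Illustration: as a rough sketch of the criterion described in the abstract, the Python code below selects c by minimizing a K-fold cross-validated log predictive score. It is a toy under strong assumptions that are not taken from the paper: a single covariate with coefficient beta ~ N(0, c), no variable selection, and grid quadrature over beta in place of MCMC or importance sampling. The function names (predictive_probs, log_predictive_score, choose_c), the interleaved fold assignment, the grid and the data are all hypothetical.

```python
import math

def probit(z):
    """Standard normal CDF, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def clip(p, eps=1e-12):
    """Keep a probability away from 0 and 1 so that log() stays finite."""
    return min(max(p, eps), 1.0 - eps)

# Assumed quadrature grid for beta: 601 equally spaced points on [-15, 15].
DEFAULT_GRID = [(i - 300) * 0.05 for i in range(601)]

def predictive_probs(train, test, c, grid):
    """P(y = 1 | x, training data, c) for each test point.

    The posterior of beta (prior N(0, c) times probit likelihood) is
    evaluated on the grid and normalised numerically.
    """
    log_post = []
    for b in grid:
        lp = -0.5 * b * b / c  # log prior, up to a constant that cancels
        for x, y in train:
            p = clip(probit(x * b))
            lp += math.log(p if y == 1 else 1.0 - p)
        log_post.append(lp)
    m = max(log_post)  # subtract the maximum before exponentiating
    w = [math.exp(lp - m) for lp in log_post]
    total = sum(w)
    return [sum(wi * probit(x * b) for wi, b in zip(w, grid)) / total
            for x, _ in test]

def log_predictive_score(data, c, n_folds=5, grid=None):
    """Negative average log predictive probability over held-out folds."""
    grid = grid or DEFAULT_GRID
    score = 0.0
    for k in range(n_folds):
        test = data[k::n_folds]  # every n_folds-th point forms fold k
        train = [d for i, d in enumerate(data) if i % n_folds != k]
        probs = predictive_probs(train, test, c, grid)
        for (_, y), p in zip(test, probs):
            p = clip(p)
            score -= math.log(p if y == 1 else 1.0 - p)
    return score / len(data)

def choose_c(data, candidates, n_folds=5):
    """Return the candidate c with the smallest score, plus all scores."""
    scores = {c: log_predictive_score(data, c, n_folds) for c in candidates}
    return min(scores, key=scores.get), scores

if __name__ == "__main__":
    # Made-up (x, y) pairs, for illustration only.
    toy = [(-2.0, 0), (-1.5, 0), (-1.0, 0), (-0.5, 0), (0.5, 1),
           (1.0, 1), (1.5, 1), (2.0, 1), (-0.8, 0), (0.8, 1)]
    best_c, scores = choose_c(toy, [0.1, 1.0, 10.0])
    print(best_c, scores)
```

Each candidate c requires refitting the model once per fold, which is the computational burden the abstract refers to; the importance sampling strategies studied in the paper aim to reduce that cost and are not reproduced in this sketch.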