Programme
GPUs in Computational Statistics
University of Warwick, 25 January, 2012
Organisers: Zoubin Gharamani, Jim Griffin, Anthony Lee, Andrew Liddle, David Wild
All talks will take place in CS1.04/5, Computer Science Building
10:00 
Registration/Coffee 
11:0011:20 
Introduction  Anthony Lee 
11:2512:10 
Christian Robert  Vanilla RaoBlackwellisations of MetropolisHastings algorithms

12:1513:00 
Christophe Andrieu  Exact approximations seeking GPUs

13:0014:00 
Buffet Lunch 
14:0514:50 
Pierre Jacob  Parallel WangLandau for automated density exploration 
14:5515:40 
Chris Holmes  Exploratory analysis of regions of genetic association signal using Bayesian adaptive regression methods on GPUs 
15:4516:15 
Coffee 
16:2017:05 
Chris Barnes  Accelerating inference for dynamical models using GPUs 
17:1017:55 
Michael Stumpf  Statistical analysis of network data and evolution on GPUs 
18:00 
Workshop Ends / Wine and Cheese Reception 
Abstracts
Christian Robert  Vanilla RaoBlackwellisations of MetropolisHastings algorithms
Casella and Robert (1996, Biometrika) presented a general RaoBlackwell principle for acceptreject and MetropolisHastings schemes that leads to significant decreases in the variance of the resulting estimators, but at a high cost in computing and storage. Adopting a completely different perspective, we introduce instead a universal scheme that guarantees variance reductions in all MetropolisHastings based estimators while keeping the computing cost under control. The principle relates to the availability of an unbiased estimator of the acceptance probability. In a second if related part, we consider the implications of the fact that parallel rawpower can be exploited by a generic MetropolisHastings algorithm if the proposed values are independent. In particular, we present improvements to the independent MetropolisHastings algorithm that significantly decrease the variance of any estimator derived from the MCMC output, for a null computing cost since those improvements are based on a fixed number of target density evaluations. Furthermore, those techniques do not jeopardize the Markovian convergence properties of the algorithm, since they are based on the RaoBlackwell principles of Gelfand and Smith (1990), already exploited in Casella and Robert (1996). We illustrate those improvements both on a toy normal example and on a classical probit regression model, but stress the fact that they are applicable in any case where the independent MetropolisHastings is applicable. Extensions to the random walk MetropolisHastings algorithm will also be discussed.
These are joint works with Randal Douc (ParistechTelecom), available as http://arxiv.org/abs/0904.2144
Christophe Andrieu  Exact approximations seeking GPUs
Pierre Jacob  Parallel WangLandau for automated density exploration
The WangLandau algorithm is an adaptive MCMC algorithm which generates a Markov chain designed to move efficiently in the state space, by constantly penalizing alreadyvisited regions. It hence falls into the class of exploratory algorithms, especially when the chosen regions correspond to different levels of density values. We explore the extension of this algorithm to parallel chains, where the chains are still dependent one on the other through a common component. That component essentially represents the history of alreadyvisited regions, computed on all the chains. We show the benefit of using parallel chains even if a single processing unit is available, in terms of stabilization of the schedule used in the adaptation process. Finally we discuss some foreseen benefits of using GPUs when available.
Chris Holmes  Exploratory analysis of regions of genetic association signal using Bayesian adaptive regression methods on GPUs
We show how the use of GPU technologies allow for efficient exploration of disease association signals arising in modern genetic epidemiology. In particular we introduce a scalemixture ``sparsity'' prior on logistic regression model coefficients and show how by sweeping through the prior parameter governing the extent of shrinkage we map through a path of models of increasing complexity. This allows for visual interpretation and quantification of multiple predictor effects. The computational algorithm using SMC samplers would be prohibitively expensive to run on singlethreaded computing, but is ideally suited to GPU implementation.
Chris Barnes  Accelerating inference for dynamical models using GPUs
Mathematical modelling has become ubiquitous in systems and synthetic biology and there are many tools available to model biological systems. These models are usually of high dimension with many unknown parameters. This can be problematic since measuring kinetic parameter values in vivo is difficult and interrogated parameter values from experiments performed in vitro are often not applicable. Therefore, in a growing number of cases, the parameter values must be inferred from data such as time series measurements. Additionally there is often a set of competing models proposed to describe a process and the most appropriate model must be chosen given a set of measurements. Sequential Monte Carlo (SMC) is one approach that can be applied to the problem of inference in dynamical models. Combined with approximate Bayesian computation (ABC) these methods can be used in both deterministic and stochastic systems. Here I will show how SMC methods can take advantage of the highly parallel architecture of Graphics Processing Units (GPUs) and demonstrate their utility in examples from biochemical signaling and the design of synthetic biological systems.
Michael Stumpf  Statistical analysis of network data and evolution on GPUs
Network analysis typically involves as set of repetitive tasks that are particularly amenable to poorman's parallelization. This is therefore an ideal application are for GPU architectures, which help to alleviate the tedium inherent to statistically sound analysis of network data. Here we will illustrate the use of GPUs in a range of applications, which include percolation processes on networks, the evolution of proteinprotein interaction networks, and the fusion of different types of biomedical and disease data in the context of molecular interaction networks. We will pay particular attention to the numerical performance of different routines that are frequently invoked in network analysis problems.