# 2015-2016

### 2015/16 Term 1:

- Week 1 - 9th October - Coincides with the Oxford-Warwick workshop on "Scalable statistical methods for analysis of large and complex data sets".
- Week 2 - 16th October - Daniel Sanz Alonso (Warwick) - "The Intrinsic Dimension of Importance Sampling"
- Abstract: We study importance sampling and particle filters in high dimensions and link the collapse of importance sampling to results about absolute continuity of measures in Hilbert spaces and the notion of effective dimension.

- Week 3 - 23rd October - François-Xavier Briol (Warwick) - "Probabilistic Numerics Approaches to Integration"
- Abstract: This talk will introduce Bayesian Quadrature (BQ), a probabilistic numerical approach to solving integrals in Reproducing Kernel Hilbert Spaces (RKHS). We will discuss some optimality results of the BQ method and show how, in low-dimensional cases, one can obtain BQ methods which converge much faster than Monte Carlo methods to the true solution of the integral.

- Week 4 - 30th October - Paul Jenkins (Warwick) - "Exact Simulation of the Wright-Fisher diffusion"
- Abstract: The Wright-Fisher family of diffusion processes is a class of evolutionary models widely used in population genetics, with applications also in finance and Bayesian statistics. Simulation and inference from these diffusions is therefore of widespread interest. However, simulating a Wright-Fisher diffusion is difficult because there is no known closed-form formula for its transition function. In this article we demonstrate that it is in fact possible to simulate exactly from the scalar Wright-Fisher diffusion with general drift, extending ideas based on retrospective simulation. Our key idea is to exploit an eigenfunction expansion representation of the transition function. This approach also yields methods for exact simulation from several processes related to the Wright-Fisher diffusion: (i) its moment dual, the ancestral process of an infinite-leaf Kingman coalescent tree; (ii) its infinite-dimensional counterpart, the Fleming-Viot process; and (iii) its bridges. Finally, we illustrate our method with an application to an evolutionary model for mutation and diploid selection. We believe our new perspective on diffusion simulation holds promise for other models admitting a transition eigenfunction expansion.

- Week 5 - 6th November - Tom Jin (Warwick) - "Sequential Monte Carlo with Neural Networks"
- Abstract: We will look at using trained neural networks as proposals for sequential Monte Carlo methods, review selected advances in neural networks, and discuss how to use these networks to learn probability densities. We will finish with an overview of recent advances in density estimators and model inversion.

- Week 6 - 13th November - Andrew Duncan (Imperial) - "Variance Reduction for Langevin Samplers"
- Abstract: MCMC provides a powerful and general approach for generating samples from a high-dimensional probability distribution, known up to a normalizing constant. As there are infinitely many Markov processes whose unique invariant measure equals a given target distribution, the natural question is whether a process can be chosen to generate samples of the target distribution as efficiently as possible. In this talk, I will describe two approaches to reducing variance for overdamped Langevin samplers. Firstly, I will describe some recent results on improving the performance of a Langevin sampler by breaking detailed balance, specifically, by introducing an appropriate skew-symmetric drift, characterising the performance in terms of rate of convergence to equilibrium, asymptotic variance and computational cost. In the second part of this talk, I will describe another approach, based on control variates for overdamped Langevin samplers, suited to computing expectations with respect to distributions possessing strong multimodality.

- Week 7 - 20th November - Louis Aslett (Oxford) - "Homomorphically Secure Statistical Algorithms"
- Week 8 - 27th November - Alexander Terenin (University of California, Santa Cruz) - "Asynchronous Distributed Gibbs Sampling"
- Abstract: The Bayesian paradigm has a variety of properties that make it desirable for use in the big data setting. Computation in this setting is best done in parallel, because data is typically too large to be stored on one machine. Unfortunately, this makes standard Markov Chain Monte Carlo methodology used in Bayesian statistics impractical, because computation must be performed sequentially, one step at a time. In this talk, we first provide a broad description of the computational hardware (such as Hadoop clusters), paradigms (such as MapReduce), and frameworks/languages (Apache Spark, Scala) used in the big data setting. Then, we examine Asynchronous Markov Chain Monte Carlo, a lock-free MCMC parallelization scheme, and prove convergence. Finally, we examine Asynchronous Distributed Gibbs sampling, an extension of Asynchronous MCMC to settings where the data is too big to fit on one machine and is instead partitioned on a compute cluster, and discuss some of its properties. ADG sampling is particularly attractive in settings such as Bayesian hierarchical random-effects models or models with robust error terms, where the dimensionality of the problem grows with the size of the data.

- Week 9 - 4th December - Jake Carson (Warwick) - "Unbiased Solutions to PDEs"
- Week 10 - 11th December - Jeremy Heng (Oxford) - "Mass Transport for Bayesian Inference"

### 2015/16 Term 2:

- Week 1 - 15th January - Alex Mijatović (Kings) - "Poisson equation for Metropolis-Hastings chains"
- Abstract: In this talk we will define an approximation scheme for a solution of the Poisson equation of a geometrically ergodic Metropolis-Hastings chain. The approximations give rise to a natural sequence of control variates for the ergodic average of the force function in the Poisson equation. The main result shows that the sequence of the asymptotic variances (in the CLTs for the control-variate estimators) converges to zero. We will apply the algorithm to geometrically and non-geometrically ergodic chains and present numerical evidence for a significant variance reduction in both cases. This is joint work with Jure Vogrinc.

- Week 2 - 22nd January - Murray Pollock (Warwick) - "Exact Simulation for Stochastic Volatility Models"
- Week 3 - 29th January - Omiros Papaspiliopoulos (Pompeu Fabra) - "Gibbs-Langevin Samplers"
- Week 4 - 5th February - James Ridgway (Bristol) - "On the properties of variational approximation of Gibbs measures"
- Abstract: I will define the goal of PAC-Bayesian bounds and show that they lead to tight oracle inequalities. The corresponding optimal distribution of estimators, usually called the Gibbs posterior, is unfortunately intractable. One may sample from it using Markov chain Monte Carlo, but this is often too slow for big datasets and does not have explicit non-asymptotic bounds. I will show, however, that we can obtain oracle inequalities for the variational Bayes approximation. Our main finding is that such a variational approximation often has the same rate of convergence as the original PAC-Bayesian procedure it approximates. The results will be illustrated on a classification example.

- Week 5 - 12th February - Jere Koskela (Warwick) - "Efficient sequential Monte Carlo sampling of rare trajectories in reverse time"
- Abstract: Rare event simulation seeks to estimate probabilities of unlikely but significant events, such as extreme weather, market crashes, or failure rates in communications networks. In complex models the probabilities of such events are often intractable, and naive simulation fails because of the low probability of the event of interest. Sequential Monte Carlo provides a practical method for sampling rare events by biasing probability mass toward the event of interest, though as always the design of good proposal distributions is difficult but crucial. The typical approach for sampling rare trajectories of a stochastic process is an exponential twisting of the forward dynamics, motivated by approximating a large deviation principle. I present an alternative, based on the observation that a forwards-in-time trajectory conditioned to end in a rare state coincides with an unconditioned reverse-time trajectory started from the rare state. This observation has led to very efficient simulation methods in coalescent-based population genetics. I will introduce reverse-time SMC as a generic algorithm, discuss settings in which it is advantageous, and present some novel applications both for coalescents and other stochastic processes.

- Week 6 - 19th February - Valerio Perrone (Warwick) - "Time-Dependent Feature Allocation Models Via Poisson Random Fields"
- Abstract: Feature allocation models are widely used to recover the latent structure underlying observed data. The Indian buffet process (IBP) is a Bayesian nonparametric prior over latent features that allows the number of features to be learnt from the data. If the distribution of the observed data changes over time, however, the IBP needs to be extended to capture this dependency. In this talk, I will present ongoing work on a time-dependent generalisation of the IBP, where the Poisson random field model from population genetics is used as a way of constructing dependent beta processes. As an application of this construction, a probabilistic topic model for collections of time-stamped text documents will be presented.

- Week 7 - 26th February - * Extended session 11am-2pm * - Frank Wood (Oxford) - "Revolutionizing Decision Making, Democratizing Data Science, and Automating Machine Learning via Probabilistic Programming (and One Example Language: Anglican)"
- Abstract: Probabilistic programming aims to enable the next generation of data scientists to easily and efficiently create the kinds of probabilistic models needed to inform decisions and accelerate scientific discovery in the realm of big data and big models. Model creation and the learning of probabilistic models from data are key problems in data science. Probabilistic models are used for forecasting, filling in missing data, outlier detection, cleanup, classification, and scientific understanding of data in every academic field and every industrial sector. While much work in probabilistic modelling has been based on hand-built models and laboriously-derived inference methods, future advances in model-based data science will require the development of much more powerful automated tools than currently exist.
- In the absence of such automated tools, probabilistic models have traditionally co-evolved with methods for performing inference. In both academic and industrial practice, specific modeling assumptions are made not because they are appropriate to the application domain, but because they are required to leverage existing software packages or inference methods. This intertwined nature of modeling and computation leaves much of the promise of probabilistic modeling out of reach for even expert data scientists. The emerging field of probabilistic programming will reduce the technical and cognitive overhead associated with writing and designing novel probabilistic models by both introducing a programming (modelling) language abstraction barrier and automating inference.
- The automation of inference, in particular, will lead to massive productivity gains for data scientists, much akin to how high-level programming languages and advances in compiler technology have transformed software developer productivity. What is more, not only will traditional data science be accelerated, but the number and kind of people who can do data science also will be dramatically increased.
- My talk will touch on all of this, explain how to develop such probabilistic programming languages, highlight some exciting ways such languages are starting to be used, and introduce what I think are some of the most important challenges facing the field as we go forward.

- Week 8 - 4th March - Simon Spencer (Warwick) and Panayiota Touloupou (Warwick) - "Model comparison with missing data using MCMC and importance sampling"
- Abstract: Selecting between competing statistical models is a challenging problem especially when the competing models are non-nested. We offer a simple solution by devising an algorithm which combines MCMC and importance sampling to obtain computationally efficient estimates of the marginal likelihood which can then be used to compare the models. The algorithm is successfully applied to longitudinal epidemic data and shown to outperform existing methods for computing the marginal likelihood.

- Week 9 - 11th March - Dootika Vats (Minnesota) - "Markov Chain Monte Carlo Output Analysis"
- Abstract: Markov chain Monte Carlo (MCMC) is a method of producing a correlated sample in order to estimate expectations with respect to a target distribution. A fundamental question is when should sampling stop so that we have good estimates of the desired quantities? The key to answering these questions lies in assessing the Monte Carlo error through a multivariate Markov chain central limit theorem. However, the multivariate nature of this Monte Carlo error has been ignored in the MCMC literature. I will give conditions for consistently estimating the asymptotic covariance matrix. Based on these theoretical results I present a relative standard deviation fixed volume sequential stopping rule for deciding when to terminate the simulation. This stopping rule is then connected to the notion of effective sample size, giving an intuitive, and theoretically justified approach to implementing the proposed method. The finite sample properties of the proposed method are then demonstrated in examples. The results presented in this talk are based on joint work with James Flegal (UC, Riverside) and Galin Jones (U of Minnesota).

- Week 10 - 18th March - Chris Sherlock (Lancaster) - "Adaptive, delayed-acceptance MCMC for targets with expensive likelihoods"
- Abstract: When conducting Bayesian inference, delayed acceptance (DA) Metropolis-Hastings (MH) algorithms and DA pseudo-marginal MH algorithms can be applied when it is computationally expensive to calculate the true posterior or an unbiased estimate thereof, but a computationally cheap approximation is available. A first accept-reject stage is applied, with the cheap approximation substituted for the true posterior in the MH acceptance ratio. Only for those proposals which pass through the first stage is the computationally expensive true posterior (or unbiased estimate thereof) evaluated, with a second accept-reject stage ensuring that detailed balance is satisfied with respect to the intended true posterior. In some scenarios there is no obvious computationally cheap approximation. A weighted average of previous evaluations of the computationally expensive posterior provides a generic approximation to the posterior. If only the k-nearest neighbours have non-zero weights then evaluation of the approximate posterior can be made computationally cheap provided that the points at which the posterior has been evaluated are stored in a multi-dimensional binary tree, known as a KD-tree. The contents of the KD-tree are potentially updated after every computationally intensive evaluation. The resulting adaptive, delayed-acceptance [pseudo-marginal] Metropolis-Hastings algorithm is justified both theoretically and empirically. Guidance on tuning parameters is provided and the methodology is applied to a discretely observed Markov jump process characterising predator-prey interactions and an ODE system describing the dynamics of an autoregulatory gene network.
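The two-stage accept-reject structure described in the abstract above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the speaker's implementation: `log_target` stands in for the expensive log-posterior (here just a standard normal), `log_cheap` stands in for a fixed cheap approximation (a deliberately too-wide normal, not the adaptive KD-tree approximation from the talk), and the proposal is a symmetric Gaussian random walk.

```python
import math
import random


def log_target(x):
    """Hypothetical 'expensive' log-posterior: standard normal, up to a constant."""
    return -0.5 * x * x


def log_cheap(x):
    """Hypothetical cheap approximation: a normal with standard deviation 1.5."""
    return -0.5 * (x / 1.5) ** 2


def delayed_acceptance_mh(n_iter, step=2.0, x0=0.0, seed=0):
    """Delayed-acceptance Metropolis-Hastings with a symmetric random-walk proposal.

    Returns the chain and the number of expensive evaluations made at proposals.
    """
    rng = random.Random(seed)
    x = x0
    lt_x, lc_x = log_target(x), log_cheap(x)
    samples = []
    expensive_calls = 0
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)
        lc_y = log_cheap(y)
        # Stage 1: screen the proposal using only the cheap approximation.
        # (1 - random()) lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < lc_y - lc_x:
            # Stage 2: evaluate the expensive target and correct for the
            # approximation, so detailed balance holds for the true target.
            lt_y = log_target(y)
            expensive_calls += 1
            if math.log(1.0 - rng.random()) < (lt_y - lt_x) - (lc_y - lc_x):
                x, lt_x, lc_x = y, lt_y, lc_y
        samples.append(x)
    return samples, expensive_calls


if __name__ == "__main__":
    chain, calls = delayed_acceptance_mh(20000, seed=1)
    print("expensive evaluations:", calls, "out of", len(chain), "iterations")
```

Proposals rejected at the first stage never trigger an expensive evaluation, which is where the saving comes from; the second-stage ratio divides out the approximation so that the chain still targets the expensive posterior.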

### 2015/16 Term 3:

- Week 1 - 29th April - Arthur Gretton (UCL, Gatsby) - "Kernel nonparametric tests of homogeneity, independence, and multi-variable interaction"
- Abstract: We propose a kernel approach to nonparametric statistical testing, using representations of probability measures in reproducing kernel Hilbert spaces. We cover three settings: two-sample tests, of whether two samples are drawn from the same distribution or different distributions; independence tests, of whether observations of a pair of random variables are drawn from a distribution for which these are dependent; and three-variable interaction tests, of whether two variables jointly influence a third. The tests benefit from decades of machine learning research on kernels for various domains, and thus apply to distributions on high dimensional vectors, images, strings, graphs, groups, and semigroups, among others. The energy distance and distance covariance statistics are particular instances of these RKHS statistics.

- Week 2 - 6th May - Lewis Rendell (Warwick) - "Temperature Schedule Selection for Sequential Monte Carlo Samplers"
- Abstract: Sequential Monte Carlo samplers form a class of algorithms that apply sequential importance sampling to a sequence of unnormalised probability distributions defined on a common space. A frequent setting employs a sequence of distributions that move smoothly from some tractable distribution through several intermediate distributions to a more complex target, from which direct sampling using traditional MCMC schemes may not be straightforward (as might be the case for multimodal Bayesian posteriors). These intermediate distributions are determined by a sequence of tempering parameters, which must be chosen carefully to ensure efficient exploration of the space and to control the variance of resulting estimators. In this talk I shall discuss some previous approaches to this temperature selection problem, and present ongoing work on a procedure that aims to balance estimator variance and computational cost.

- Week 3 - 13th May - * Extended / Double session 11am-2pm *
- 11am-12pm - Flávio Gonçalves (UFMG) - "Exact Bayesian inference for diffusion driven Cox processes"
- Abstract: In this paper we present a novel inference methodology to perform Bayesian inference for Cox processes where the intensity function is driven by a diffusion process. The novelty of the method lies in the fact that no discretisation error is involved, despite the non-tractability of both the likelihood function and the transition density of the diffusion. The method is based on a Markov chain Monte Carlo algorithm that samples from the joint posterior distribution of the parameters and latent variables of the model. This is joint work with Gareth Roberts and Krzysztof Latuszynski.

- 12.45pm-1.45pm - Andrew Stuart (Warwick) - "The Bayesian Level Set Inversion"
- Abstract: The level set approach has proven widely successful in the study of inverse problems for interfaces, since its systematic development in the 1990s. Recently it has been employed in the context of Bayesian inversion, allowing for the quantification of uncertainty within the reconstruction of interfaces. However the Bayesian approach is very sensitive to the length and amplitude scales in the prior probabilistic model. This talk demonstrates how the scale-sensitivity can be circumvented by means of a hierarchical approach, using a single scalar parameter. Together with careful consideration of the development of algorithms which encode probability measure equivalences as the hierarchical parameter is varied, this leads to well-defined Gibbs based MCMC methods found by alternating Metropolis-Hastings updates of the level set function and the hierarchical parameter. These methods demonstrably outperform non-hierarchical Bayesian level set methods. Joint work with Matt Dunlop and M.A.Iglesias. ArXiv: 1601.03605

- Week 4 - 20th May - Sara Wade (Warwick) - "MCMC inference for a Bayesian nonparametric regression model with normalized weights"
- Abstract: Bayesian nonparametric mixture models are popular tools for flexible density estimation. Under the presence of covariates, this flexible class of models can be extended by allowing the mixing measure to depend on the covariates. We propose an interpretable construction for covariate-dependent random mixing measures based on normalized weights. However, this construction introduces an intractable normalizing constant which poses computational difficulties. An MCMC algorithm is developed to overcome this problem through the introduction of suitable latent variables. Finally, we will introduce an adaptive truncation algorithm (Griffin 2014) based on adaptive Metropolis-Hastings with sequential Monte Carlo for faster, yet approximate, inference.

- Week 5 - 27th May - Mike Pitt (Kings) - "The Correlated Pseudo-Marginal Method"
- Abstract: The pseudo-marginal algorithm is a popular variant of the Metropolis-Hastings scheme which allows us to sample asymptotically from a target probability density π, when we are only able to estimate an unnormalized version of π pointwise unbiasedly. It has found numerous applications in Bayesian statistics as there are many scenarios where the likelihood function is intractable but can be estimated unbiasedly using Monte Carlo samples. Using many samples will typically result in averages computed under this chain with lower asymptotic variances than the corresponding averages that use fewer samples. For a fixed computing time, it has been shown in several recent contributions that an efficient implementation of the pseudo-marginal method requires the variance of the log-likelihood ratio estimator appearing in the acceptance probability of the algorithm to be of order 1, which in turn usually requires scaling the number N of Monte Carlo samples linearly with the number T of data points. We propose a modification of the pseudo-marginal algorithm, termed the correlated pseudo-marginal algorithm, which is based on a novel log-likelihood ratio estimator computed using the difference of two positively correlated log-likelihood estimators. We show that the parameters of this scheme can be selected such that the variance of this estimator is order 1 as N,T→∞ whenever N/T→0. By combining these results with the Bernstein-von Mises theorem, we provide an analysis of the performance of the correlated pseudo-marginal algorithm in the large T regime. In our numerical examples, the efficiency of computations is increased relative to the standard pseudo-marginal algorithm by more than 20 fold for values of T of a few hundreds to more than 100 fold for values of T of around 10,000-20,000. Joint work with George Deligiannidis and Arnaud Doucet.

- Week 6 - 3rd June - Alex Beskos (UCL) - "Multilevel Sequential Monte Carlo Samplers"
- Abstract: Multilevel Monte-Carlo methods provide a powerful computational technique for reducing the computational cost of estimating expectations for a given computational effort. They are particularly relevant for computational problems when approximate distributions are determined via a resolution parameter h, with h=0 giving the theoretical exact distribution (e.g. SDEs or inverse problems with PDEs). The method provides a benefit by coupling samples from successive resolutions, and estimating differences of successive expectations. We develop a methodology that brings Sequential Monte-Carlo (SMC) algorithms within the framework of the Multilevel idea, as SMC provides a natural set-up for coupling samples over different resolutions. We prove that the new algorithm indeed preserves the benefits of the multilevel principle, even if samples at all resolutions are now correlated.
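The phrase "estimating differences of successive expectations" in the abstract above refers to the standard multilevel telescoping identity, sketched here. The notation is introduced only for illustration and is not taken from the talk: resolutions h_0 > h_1 > ... > h_L, with π_{h_l} the approximate distribution at level l and g the function whose expectation is wanted.

```latex
\mathbb{E}_{\pi_{h_L}}[g]
  = \mathbb{E}_{\pi_{h_0}}[g]
  + \sum_{l=1}^{L} \Bigl( \mathbb{E}_{\pi_{h_l}}[g] - \mathbb{E}_{\pi_{h_{l-1}}}[g] \Bigr)
```

Each difference on the right is estimated from samples coupled across levels l and l-1, so that its variance is small at fine resolutions and most of the sampling effort can be spent at the cheap, coarse levels.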

- Week 7 - 10th June - Heiko Strathmann (UCL, Gatsby) - "Kernel methods for adaptive Monte Carlo"
- Abstract: We introduce a general family of kernel-informed Monte Carlo algorithms for sampling from Bayesian posterior distributions. Our focus is on the Big Models regime, where posteriors often exhibit strong nonlinear correlations and evaluation of the target density (and its gradients) is analytically or computationally intractable. To construct efficient samplers for such cases, adaptive methods that learn the target's geometry are necessary. We present how kernel methods can be embedded into the adaptive Monte Carlo paradigm, enabling the construction of rich classes of proposals with attractive convergence and mixing properties. Our ideas are exemplified for three popular sampling techniques: Metropolis-Hastings, Hamiltonian Monte Carlo and Sequential Monte Carlo.

- Week 8 - 17th June - Sean Malory (Lancaster) - "On Approximately Simulating Conditioned Diffusions"
- Abstract: Importance sampling for paths of conditioned diffusions is a challenging problem, and a first step in inference for partially-observed diffusions using pseudo-marginal (or particle) MCMC. For multivariate diffusions one typically has to resort to a finite-dimensional approximation of the problem by forming a partition and approximating the transition density of the diffusion between consecutive points of the partition by, for example, an Euler-Maruyama step. Under such an approximation, one can utilize importance sampling by proposing discrete paths from a sensibly chosen proposal. In this talk I will introduce a new residual-bridge proposal for approximately simulating conditioned diffusions. This proposal is constructed by considering the difference between the true diffusion and a second, approximate diffusion driven by the same Brownian motion. It can be viewed as a natural extension to the recent work on residual-bridge constructs by Whitaker et al. (2015). I will illustrate that such a proposal can often lead to gains in efficiency over the residual-bridge constructs currently proposed in the literature.

- Week 9 - 24th June - Coincides with the i-like workshop being held at Lancaster (22nd-24th June).
- Week 10 - 1st July - Marcin Mider (Warwick) and Andi Wang (Oxford)
- Marcin Mider (Warwick) - "Introduction to rejection sampling on a path space for diffusion bridges"
- Abstract: Simulating diffusions without introducing an approximation error is a challenging problem. Direct simulations from the transition densities are possible only in a limited number of cases. A methodology developed in (Beskos and Roberts 2005) based on the idea of rejection sampling on a path space allows us to substantially broaden the class of diffusions that can be simulated without error. Nonetheless, the methodology is applicable only under certain (sometimes) limiting conditions. Efficient simulation of conditioned diffusions under this scheme turns out to be particularly challenging. In this talk I will introduce the idea of rejection sampling on a path space and discuss some of its limitations. I will also present possible extensions for simulating conditioned diffusions. This presentation is a result of a project done for the OxWaSP Statistical Science PhD programme.

- Andi Wang (Oxford) - "Killed Diffusions for Simulation"
- Abstract: Imagine a particle undergoing Brownian motion in Euclidean space, subject to some state-dependent killing rate. Conditional on extended survival, the conditional distribution of the particle's location at large times will typically settle to a 'quasistationary' distribution. It turns out that by defining the killing rate appropriately, we can target a large class of distributions this way. This novel approach to simulation has already found applications in big data Bayesian inference. In my talk I will introduce this method of simulation, and discuss an extension to general killed diffusions.

- Week 11 - 8th July - Coincides with the Retrospective Monte Carlo workshop (7th-8th July).