2019-2020
2019/20 Term 1:
- Week 1 - 4th October - Felipe Medina Aguayo (Reading) - "Revisiting the balance heuristic for estimating normalising constants"
- Abstract: Multiple importance sampling estimators are widely used for computing intractable constants due to their reliability and robustness. The celebrated balance heuristic estimator belongs to this class of methods and has proved very successful in computer graphics. The basic ingredients for computing the estimator are: a set of proposal distributions, indexed by some discrete label, and a predetermined number of draws from each of these proposals. However, if the number of available proposals is much larger than the number of permitted importance points, one needs to select, possibly at random, which of these distributions will be used. We explore some improvements and variations of the balance heuristic via a novel extended-space representation of the estimator, leading to straightforward annealing schemes for variance reduction purposes. We will also look at the intractable scenario where the proposal density is only available as a joint function with the discrete label, as may be encountered in problems where an ordering is imposed.
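As a point of reference for the classical estimator, here is a minimal sketch of the balance heuristic (deterministic-mixture) estimate of a normalising constant with Gaussian proposals; the target, proposal locations and per-proposal sample sizes are illustrative assumptions, not from the talk.
```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

def f(x):
    # Unnormalised target: a two-component Gaussian mixture with Z = 4.
    return 4.0 * (0.3 * norm.pdf(x, -2.0, 0.5) + 0.7 * norm.pdf(x, 3.0, 1.0))

mus = np.array([-3.0, 0.0, 3.0])   # proposal family q_k = N(mu_k, 1)
n = np.array([200, 200, 200])      # predetermined number of draws per proposal
N = n.sum()

xs = np.concatenate([rng.normal(mu, 1.0, size=nk) for mu, nk in zip(mus, n)])

# Balance heuristic: every draw is weighted by the full mixture of proposals,
# regardless of which proposal actually generated it.
q_mix = sum((nk / N) * norm.pdf(xs, mu, 1.0) for mu, nk in zip(mus, n))
Z_hat = np.mean(f(xs) / q_mix)
print(Z_hat)   # close to the true value 4
```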
- Week 2 - 11th October - cancelled (OxWaSP workshop)
- Week 3 - 18th October - Alice Corbella (Warwick) - "Pseudo Marginal methods in practice: Inferring epidemics from multiple dependent data"
-
The evolution of an epidemic can be represented as a non-linear dynamical system, combining a transmission model, approximating the unobserved infection process, with further stochastic models for the development and detection of symptoms.
Pseudo Marginal methods (Andrieu and Roberts 2009) provide a useful tool for inference in dynamical systems where the likelihood can be estimated by (sequential) Monte Carlo methods.
We formulate a model for Influenza dynamics in England when multiple, dependent data are available. We propose a Monte Carlo estimate of the likelihood and use it in a Pseudo Marginal framework. We also present a misspecified model that does not account for dependence between data sources, leading to direct computation of the likelihood. We compare via simulation the inference in the two cases, showing that disregarding the dependence leads to over-efficient estimates.
Lastly, we present an analysis of real data, demonstrating the usefulness of our model and the inference.
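A minimal sketch of the pseudo-marginal mechanism itself (not the influenza model): a Metropolis-Hastings chain in which the intractable likelihood is replaced by an unbiased Monte Carlo estimate. The toy latent-variable model, flat prior and tuning constants are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(1)

def loglik_hat(theta, y, n_particles=100):
    # Unbiased toy estimator of p(y | theta), standing in for a particle filter:
    # here y | x ~ N(x, 1) with latent x ~ N(theta, 1).
    x = rng.normal(theta, 1.0, size=n_particles)
    w = np.exp(-0.5 * (y - x) ** 2) / np.sqrt(2 * np.pi)
    return np.log(np.mean(w))   # log of an unbiased likelihood estimate

def pseudo_marginal_mh(y, n_iter=5000, step=0.5):
    theta, ll = 0.0, loglik_hat(0.0, y)
    chain = []
    for _ in range(n_iter):
        theta_prop = theta + step * rng.normal()
        ll_prop = loglik_hat(theta_prop, y)
        # Flat prior; crucially, the old estimate is recycled on rejection.
        if np.log(rng.uniform()) < ll_prop - ll:
            theta, ll = theta_prop, ll_prop
        chain.append(theta)
    return np.array(chain)

chain = pseudo_marginal_mh(y=1.2)
```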
-
- Week 4 - 25th October - Lionel Riou-Durand (Warwick) - "Noise contrastive estimation: Asymptotic properties, formal comparison with MC-MLE"
- A statistical model is said to be un-normalised when its likelihood function involves an intractable normalising constant. Two popular methods for parameter inference for these models are MC-MLE (Monte Carlo maximum likelihood estimation), and NCE (noise contrastive estimation); both methods rely on simulating artificial data-points to approximate the normalising constant. While the asymptotics of MC-MLE have been established under general hypotheses (Geyer, 1994), this is not so for NCE. We establish consistency and asymptotic normality of NCE estimators under mild assumptions. We compare NCE and MC-MLE under several asymptotic regimes. In particular, we show that, when m goes to infinity while n is fixed (m and n being respectively the number of artificial data-points, and actual data-points), the two estimators are asymptotically equivalent. Conversely, we prove that, when the artificial data-points are IID, and when n goes to infinity while m/n converges to a positive constant, the asymptotic variance of a NCE estimator is always smaller than the asymptotic variance of the corresponding MC-MLE estimator. We illustrate the variance reduction brought by NCE through a numerical study.
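To make the NCE construction concrete, here is a sketch on a toy one-parameter Gaussian model whose log-normalising constant is treated as a free parameter c: data and noise points are discriminated by a logistic classifier. The data-generating model, noise distribution and sample sizes (n = 500, m = 2500) are illustrative assumptions.
```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(2)

x = rng.normal(1.0, 1.0, size=500)    # n actual data points from N(1, 1)
y = rng.normal(0.0, 2.0, size=2500)   # m artificial points from noise q = N(0, 4)
nu = len(y) / len(x)

def neg_nce(params):
    mu, c = params
    # Unnormalised model exp(-(u - mu)^2 / 2) with log-normaliser parameter c.
    log_model = lambda u: -0.5 * (u - mu) ** 2 + c
    g_x = log_model(x) - norm.logpdf(x, 0.0, 2.0) - np.log(nu)
    g_y = log_model(y) - norm.logpdf(y, 0.0, 2.0) - np.log(nu)
    # Negative NCE objective; log-sigmoids written via logaddexp for stability.
    return np.sum(np.logaddexp(0.0, -g_x)) + np.sum(np.logaddexp(0.0, g_y))

res = minimize(neg_nce, x0=[0.0, 0.0])
mu_hat, c_hat = res.x   # c_hat estimates -log Z = -log sqrt(2*pi) = -0.92
```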
- Week 5 - 1st November - Anthony Mulholland (Bristol) - "Material Map Inversion of Complex and Locally Anisotropic Media for Improved Imaging in Non-Destructive Testing"
-
Imaging defects in austenitic welds presents a significant challenge for the ultrasonic non-destructive testing community. Due to the heating process during their manufacture, a polycrystalline structure develops, where large grains with locally anisotropic properties cause the ultrasonic waves to scatter and refract. When basic imaging algorithms which make straight-ray and constant-wave-speed assumptions are applied to datasets arising from the inspection of these welds, the resulting defect reconstructions are often distorted and difficult to interpret correctly. Knowledge of the underlying material microstructure would allow correction of the expected wave travel times (on which time-domain imaging algorithms are typically based) and thus result in more reliable defect reconstructions. In this talk, an approximation to the underlying, locally anisotropic structure of the weld is constructed from ultrasonic phased array inspection data. A new forward model of wave front propagation in locally anisotropic media, the Anisotropic Multi-Stencil Fast Marching Method, is presented and used within the reversible-jump Markov chain Monte Carlo method to invert for the map of regional crystal orientations. This forward model and estimated map are then used as the basis for an imaging algorithm, and the resulting reconstructions of defects embedded within these polycrystalline materials are used to measure the success of the approach. Receiver operating characteristic curves are then calculated to demonstrate the enhanced probability of detection achieved when the material map is accounted for, and it is shown that the reconstructed material maps can significantly increase the probability of detection of defects in the far field.
-
- Week 6 - 8th November - Kathryn Leeming (Warwick) - "Local white noise testing using wavelets"
-
In this talk we motivate a local white noise test, testing for structure in observations or residuals at a local level, rather than globally. For this test we do not wish to make the assumption of Gaussian noise. Wavelets are shown to be a natural candidate for creating such a test, and an introduction to wavelets for statistical signal processing will be provided. Properties of the wavelet spectrum such as covariance and asymptotic distribution will be shown, enabling the local white noise test to be evaluated. Our local white noise test is compared in simulations to classical white noise tests, and its use is demonstrated on residuals of a time series. (Joint work with Guy Nason).
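The talk's test statistic and distributional theory are specific to the wavelet spectrum; purely to illustrate the "local, non-Gaussian" idea, here is a toy local test built from finest-scale Haar detail coefficients and calibrated by permutation (exchangeability is all the white noise null requires, so no Gaussianity is assumed). The statistic and window size are assumptions, not the authors' construction.
```python
import numpy as np

rng = np.random.default_rng(3)

def haar_detail(x):
    # Finest-scale Haar detail coefficients (scaled adjacent differences).
    x = x[: len(x) // 2 * 2]
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def local_stat(x, window=16):
    # Largest energy of the detail coefficients over sliding windows;
    # large values indicate localised structure.
    d = haar_detail(x) ** 2
    w = window // 2
    csum = np.concatenate([[0.0], np.cumsum(d)])
    return np.max(csum[w:] - csum[:-w])

def local_white_noise_test(x, n_perm=999, window=16):
    # Under iid white noise the observations are exchangeable, so permuting
    # them gives an exact null distribution for the local statistic.
    obs = local_stat(x, window)
    null = np.array([local_stat(rng.permutation(x), window) for _ in range(n_perm)])
    return (1 + np.sum(null >= obs)) / (n_perm + 1)

x = rng.standard_normal(256)
x[100:120] += 2.0 * np.sin(np.arange(20))   # inject local structure
print(local_white_noise_test(x))            # small p-value flags the departure
```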
-
- Week 7 - 15th November - Nikolas Kantas (Imperial) - "Parameter estimation and sensor placement for the stochastic advection diffusion equation"
-
The talk is on performing inference for dissipative SPDE models using observations obtained by a possibly large number of sensors. This setup is relevant in applications such as environmental monitoring, numerical weather prediction, geosciences and more. When the model parameters (of the SPDE and sensors) are known, approximating conditional probability laws given observations (i.e. filtering distributions) is possible for non-linear SPDE models using elaborate and expensive particle methods, or using heuristics such as the EnKF. The question we would like to pose in this talk is how to estimate model parameters and how to place sensors optimally to minimise uncertainty in our filter estimates. We will focus only on the linear Gaussian case, and in particular the stochastic advection-diffusion equation. This is work in progress jointly with Louis Sharrock (Imperial).
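In the linear Gaussian setting the filtering distribution is exact via the Kalman filter, so a natural sensor-placement objective is the stationary filter variance. The sketch below assumes a crude explicit-Euler discretisation of a 1D stochastic advection-diffusion equation with periodic boundaries; grid sizes, noise levels and sensor locations are illustrative.
```python
import numpy as np

# Discretised 1D stochastic advection-diffusion: du = (D u_xx - a u_x) dt + dW.
nx, dx, dt = 50, 0.02, 0.005
D, a = 0.01, 0.5

I = np.eye(nx)
lap = (np.roll(I, 1, axis=1) - 2 * I + np.roll(I, -1, axis=1)) / dx ** 2  # periodic Laplacian
adv = (np.roll(I, -1, axis=1) - np.roll(I, 1, axis=1)) / (2 * dx)         # centred gradient
F = I + dt * (D * lap - a * adv)   # explicit-Euler transition matrix
Q = 0.01 * dt * I                  # state noise from the driving Wiener process

sensors = [5, 20, 35]              # grid indices of the point sensors
H = I[sensors]
R = 0.1 * np.eye(len(sensors))

def filter_uncertainty(F, Q, H, R, n_steps=500):
    # Run the Kalman covariance recursion and return the (near-)stationary
    # total filter variance -- a natural sensor-placement objective.
    P = np.eye(nx)
    for _ in range(n_steps):
        P = F @ P @ F.T + Q                       # predict
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        P = (np.eye(nx) - K @ H) @ P              # update
    return np.trace(P)

print(filter_uncertainty(F, Q, H, R))
```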
-
- Week 8 - 22nd November - Chris Sherlock (Lancaster) - "Fast, exact inference for discretely observed Markov jump processes using finite rate matrices"
-
We consider inference for a discretely observed Markov jump process with an infinite state space, such as arises in biological and environmental systems. When the state space is finite, inference for the resulting continuous-time Markov chain using its rate matrix is straightforward, but in the case of an infinite state space the method of choice is particle MCMC. We provide a new method, the minimal extended statespace algorithm (MESA), for exact Bayesian inference that uses finite-dimensional rate matrices even though the state space is infinite. In contrast to particle MCMC, MESA is most efficient when there is no observation noise, and becomes gradually less efficient as the observation noise increases.
-
- Week 9 - 29th November - cancelled (instead, please attend the Dept Colloquium by Terry Lyons (Oxford) in MS.02, 2-3pm on Tuesday 26th November)
- Week 10 - 6th December - Martina Favero (KTH Stockholm) - "Weak convergence for a sequence of coalescent processes"
-
We show weak convergence for a sequence of Markov chains constituted by the block counting process of the Kingman coalescent with parent-independent mutations and by the process counting the corresponding number of mutations, as the sample size goes to infinity. Time is scaled and convergence of semigroups is proved. The limiting process consists of a deterministic part and of conditionally independent Poisson processes with varying intensity.
We explain how this result may be used to study the asymptotic efficiency of importance sampling algorithms on the coalescent with parent-dependent mutations, and to determine the asymptotic behaviour of the sampling probabilities. This is work in progress with H. Hult (KTH).
-
2019/20 Term 2:
- Week 1 - 10th January - cancelled (Bayes Comp 2020)
- Week 2 - 17th January - Alex Buchholz (Cambridge) - "Contributions to approximate Bayesian Inference"
- The Bayesian paradigm allows one to carefully assess uncertainty in the predicted outcome of machine learning and statistical models. However, computing posterior distributions remains a difficult problem that has led to approximate methods like MCMC sampling or Variational Inference approaches. In this talk I will present two recent contributions to the field of Bayesian computation. First I will introduce an improved learning algorithm for Variational Inference that is based on a variance reduction of stochastic gradients using quasi-Monte Carlo sampling. The improved algorithm has strong guarantees and can readily be implemented in existing frameworks while speeding up computations substantially.
The second part of the talk will be dedicated to Bayesian model choice in large data settings. By using a divide-and-conquer approach that splits an initial dataset into chunks, I achieve huge improvements in computing time that allow Bayesian inference algorithms to run in parallel with restricted communication. The suggested approach is applicable in settings that require privacy preservation. (See https://arxiv.org/abs/1807.01604 and https://arxiv.org/abs/1910.04672)
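To illustrate the mechanism behind the first contribution, here is a minimal sketch comparing the variance of a reparameterised ELBO gradient under iid Monte Carlo and under randomised quasi-Monte Carlo (scrambled Sobol points mapped through the normal quantile function). The toy target and sample sizes are assumptions.
```python
import numpy as np
from scipy.stats import norm, qmc

rng = np.random.default_rng(4)

def grad_mu(mu, sigma, eps):
    # Reparameterisation z = mu + sigma * eps for a N(2, 1) toy posterior:
    # d ELBO / d mu = E[ d/dz log p(z) ] = E[ -(z - 2) ].
    z = mu + sigma * eps
    return np.mean(-(z - 2.0))

n, reps = 64, 200   # n a power of two, as Sobol sequences prefer
mc = [grad_mu(0.0, 1.0, rng.standard_normal(n)) for _ in range(reps)]
qm = [grad_mu(0.0, 1.0,
              norm.ppf(qmc.Sobol(d=1, scramble=True, seed=s).random(n)).ravel())
      for s in range(reps)]
print(np.var(mc), np.var(qm))   # the RQMC gradient variance is far smaller
```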
- Week 3 - 24th January - Sam Livingstone (UCL) - "The Barker proposal: robust, gradient-based MCMC"
-
We consider the issue of robustness of MCMC algorithms with respect to heterogeneity in the target and their sensitivity to tuning. We show that the spectral gap of the Markov chains induced by classical gradient-based MCMC schemes (e.g. Langevin and Hamiltonian Monte Carlo) decays exponentially fast in the degree of mismatch between the scales of the proposal and target distributions, while for the random walk Metropolis (RWM) the decay is polynomial. We propose a novel and simple-to-implement gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. We illustrate with simulation studies how this type of robustness is particularly beneficial in the context of adaptive MCMC, giving examples in which the new scheme gives orders of magnitude improvements in efficiency over state-of-the-art alternatives.
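A sketch of the componentwise Barker proposal on a toy anisotropic Gaussian (the target, step size and dimension are illustrative assumptions): each Gaussian increment has its sign flipped with a gradient-informed probability, followed by the usual Metropolis-Hastings correction, in which the symmetric Gaussian factors cancel.
```python
import numpy as np
from scipy.special import expit   # numerically stable logistic function

rng = np.random.default_rng(5)
scales = np.array([1.0, 100.0])   # badly mismatched scales across coordinates

def log_pi(x):      return -0.5 * np.sum(x ** 2 / scales)
def grad_log_pi(x): return -x / scales

def barker_step(x, sigma=1.0):
    g = grad_log_pi(x)
    z = sigma * rng.standard_normal(x.shape)
    # Flip the sign of each increment with the Barker probability.
    b = np.where(rng.uniform(size=x.shape) < expit(z * g), 1.0, -1.0)
    y = x + b * z
    # MH correction: Gaussian proposal factors cancel, logistic factors do not.
    num = log_pi(y) - np.sum(np.logaddexp(0.0, -(x - y) * grad_log_pi(y)))
    den = log_pi(x) - np.sum(np.logaddexp(0.0, -(y - x) * g))
    return y if np.log(rng.uniform()) < num - den else x

x, samples = np.zeros(2), []
for _ in range(10000):
    x = barker_step(x)
    samples.append(x)
```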
-
- Week 4 - 31st January - Brieuc Lehmann (Oxford) - "Scalable Bayesian sparsity-path analysis with the posterior bootstrap"
-
In classical penalised regression, it is common to perform model estimation across a range of regularisation parameter values, typically with the aim of maximising out-of-sample predictive performance. The analogue in the Bayesian paradigm is to place a sparsity-inducing prior on the regression coefficients and explore a range of prior precisions. This, however, can be computationally challenging due to the need to generate a separate posterior distribution for each precision value. Here, we explore the use of the posterior bootstrap to scalably generate a posterior distribution over sparsity-paths. We develop an embarrassingly parallel method that exploits fast algorithms for computing classical regularisation paths and can thus handle large problems. We demonstrate our method on a sparse logistic regression example using genomic data from the UK Biobank. (Joint work with Chris Holmes, Gil McVean, Edwin Fong & Xilin Jiang)
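A sketch of the embarrassingly parallel structure under a weighted-likelihood-bootstrap reading of the posterior bootstrap: each posterior draw re-fits a classical l1 path with random exponential weights. The toy data, weighting scheme and path grid are illustrative assumptions, not the paper's exact construction.
```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(6)

# Toy sparse logistic data (a stand-in for the genomic application).
n, p = 500, 50
X = rng.standard_normal((n, p))
beta = np.zeros(p); beta[:3] = [2.0, -1.5, 1.0]
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ beta))).astype(int)

Cs = np.logspace(-2, 1, 10)   # inverse-regularisation path

def posterior_bootstrap_draw(seed):
    # One posterior draw of the whole sparsity path: exponential weights
    # (Dirichlet(1,...,1) up to scaling) plus a classical weighted l1 fit.
    w = np.random.default_rng(seed).exponential(size=n)
    path = []
    for C in Cs:
        fit = LogisticRegression(penalty="l1", C=C, solver="liblinear")
        fit.fit(X, y, sample_weight=w)
        path.append(fit.coef_.ravel())
    return np.array(path)   # shape (len(Cs), p)

# Draws are independent, so this loop is trivially parallelisable.
draws = np.stack([posterior_bootstrap_draw(s) for s in range(20)])
```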
-
- Week 5 - 7th February - Ioannis Kosmidis (Warwick) - "Empirical bias-reducing adjustments to estimating functions"
-
Many popular methods for the reduction of estimation bias rely on an approximation of the bias function under the assumption that the model is correct and fully specified. Other bias reduction methods, like the bootstrap, the jackknife and indirect inference require fewer assumptions to operate but are typically computer-intensive, requiring repeated optimizations.
We present a novel framework for reducing estimation bias that:
i) can deliver estimators with smaller bias than reference estimators even for partially specified models, as long as estimation is through unbiased estimating functions;
ii) always results in closed-form bias-reducing penalties to the objective function if estimation is through the maximization of one, like maximum likelihood and maximum composite likelihood;
iii) relies only on the estimating functions and/or the objective and their derivatives, greatly facilitating implementation for general modelling frameworks through numerical or automatic differentiation techniques and standard numerical optimization routines.
The bias-reducing penalized objectives closely relate to information criteria for model selection based on the Kullback-Leibler divergence, establishing, for the first time, a strong link between reduction of estimation bias and model selection. We also discuss the asymptotic efficiency properties of the new estimator, inference and model selection, and present illustrations in well-used, important modelling settings of varying complexity.
-
- Week 6 - 14th February - Dennis Prangle (Newcastle) - "Distilling importance sampling"
-
To be efficient, importance sampling requires the selection of an accurate proposal distribution. This talk describes learning a proposal distribution using optimisation over a flexible family of densities developed in machine learning: normalising flows. Training data is generated by running importance sampling on a tempered version of the target distribution, and this is "distilled" by using it to train the normalising flow. Over many iterations of importance sampling and optimisation, the amount of tempering is slowly reduced until an importance sampling proposal for an accurate target distribution is generated.
The method will be demonstrated for likelihood-free inference on a queueing model, and likelihood-based inference on a discretised SDE model.
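A minimal sketch of the distillation loop, with the normalising flow replaced by a single Gaussian proposal so that the refit step is a closed-form weighted maximum-likelihood update; the target, tempering schedule and sample sizes are assumptions (the actual method trains a flow at each stage).
```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)

def log_target(x):
    # Toy bimodal target.
    return np.logaddexp(norm.logpdf(x, -3.0, 0.5), norm.logpdf(x, 3.0, 0.5))

mu, sig = 0.0, 5.0                       # initial broad proposal
for beta in np.linspace(0.1, 1.0, 20):   # tempering is slowly reduced
    x = rng.normal(mu, sig, size=2000)
    logw = beta * log_target(x) - norm.logpdf(x, mu, sig)
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # "Distil" the weighted sample into the proposal family by weighted MLE
    # (a normalising flow would be fitted to the same weighted sample instead).
    mu = np.sum(w * x)
    sig = np.sqrt(np.sum(w * (x - mu) ** 2)) + 1e-3
print(mu, sig)   # a single Gaussian can only straddle the two modes
```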
-
- Week 7 - 21st February - Alisa Kirichenko (Oxford) - "Function estimation on large graphs"
-
In recent years there has been substantial interest in high-dimensional estimation and prediction problems in the context of relational data. These can in many cases be viewed as high-dimensional or nonparametric regression or classification problems in which the goal is to learn a "smooth" function on a given graph.
We present a mathematical framework that allows one to study the performance of function estimation methods on large graphs and derive the minimax convergence rates within the framework. We then present Bayesian estimation procedures that achieve (asymptotically) optimal regularization. Finally, we study the performance of these procedures on simulated and real-life data.
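A sketch of one standard construction in this area (not necessarily the talk's): a Gaussian prior whose covariance is an inverse power of the graph Laplacian, with the exponent s acting as the smoothness/regularisation parameter. The path graph, hyperparameters and observation model are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(8)

# Path graph on n vertices; Laplacian L = D - A.
n = 200
A = np.zeros((n, n))
i = np.arange(n - 1)
A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Gaussian prior with covariance (L + tau*I)^(-s): larger s gives smoother draws.
tau, s, noise = 0.01, 2, 0.25
K = np.linalg.matrix_power(np.linalg.inv(L + tau * np.eye(n)), s)

# Noisy observations of the function at a random subset of vertices.
obs = np.sort(rng.choice(n, size=50, replace=False))
f_true = np.sin(np.arange(n) / 20.0)
y = f_true[obs] + noise * rng.standard_normal(len(obs))

# Posterior mean over all vertices (standard GP regression formulae).
Koo = K[np.ix_(obs, obs)] + noise ** 2 * np.eye(len(obs))
f_hat = K[:, obs] @ np.linalg.solve(Koo, y)
print(np.max(np.abs(f_hat - f_true)))   # pointwise error of the reconstruction
```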
-
- Week 8 - 28th February - cancelled (instead, please attend the Dept Colloquium by Judith Rousseau (Oxford) in OC0.03, 2-3pm on Tuesday 25th February)
- Week 9 - 6th March - Francesca Crucinio (Warwick) - "Sequential Monte Carlo for Fredholm Integral Equations of the First Kind"
-
We present a novel method for the solution of Fredholm integral equations of the first kind, a set of ill-posed inverse problems which model, among other applications, the reconstruction of images from distorted noisy observations and indirect density estimation. This novel method is based upon a non-standard sequential Monte Carlo (SMC) algorithm which provides a stochastic discretisation of a smoothed expectation maximisation scheme (EMS) usually implemented under the assumption of piecewise constant solutions. The stochastic discretisation provided by SMC does not assume piecewise constant signals and results in smooth approximate solutions. We analyse the theoretical properties of the EMS iteration, showing existence of a fixed point, and of the corresponding SMC algorithms. We compare the novel method with alternatives using a simulation study and present results for realistic systems, including motion deblurring and reconstruction of cross-section images of the brain from positron emission tomography.
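The deterministic scheme that the SMC algorithm discretises can be sketched on a grid: the E and M steps reduce to the Richardson-Lucy update, followed by a smoothing step (the S in EMS). The Gaussian kernel, smoothing bandwidth and test signal below are illustrative assumptions.
```python
import numpy as np

# Discretised Fredholm equation of the first kind: h = K f (a blurring kernel).
n = 100
x = np.linspace(0, 1, n)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.05) ** 2)
K /= K.sum(axis=0, keepdims=True)   # column y-sums are 1: K[y, x] = k(y | x)

f_true = np.exp(-0.5 * ((x - 0.3) / 0.05) ** 2) + 0.5 * np.exp(-0.5 * ((x - 0.7) / 0.1) ** 2)
f_true /= f_true.sum()
h = K @ f_true                      # distorted (noiseless) observation

# Smoothing operator for the S step, rows normalised to preserve mass.
smooth = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.02) ** 2)
smooth /= smooth.sum(axis=1, keepdims=True)

f = np.full(n, 1.0 / n)
for _ in range(200):
    f = f * (K.T @ (h / (K @ f)))   # EM step: the Richardson-Lucy update
    f = smooth @ f                  # S step: smoothing regularises the iterate
print(np.abs(f - f_true).sum())
```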
-
- Week 10 - 13th March - Letizia Angeli (Warwick) - "Interacting Particle Systems Approximations of Feynman-Kac Formulae in continuous time"
-
In this talk, I will present a class of numerical algorithms - based on the evolution of interacting particle systems - for approximating quantities associated with the large deviations of continuous-time pure jump processes. Adapting already established results for sequential Monte Carlo methods, we can study the convergence properties and error bounds of a broader class of particle approximations.
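As an indication of the kind of particle system involved, here is a minimal Feynman-Kac (cloning-type) estimator of the principal eigenvalue lambda(V) = lim (1/t) log E[exp(int V(X_s) ds)] for a continuous-time random walk; the potential, slice length, particle number and first-order weight discretisation are assumptions for illustration.
```python
import numpy as np

rng = np.random.default_rng(10)

def V(x):
    return -0.05 * x ** 2   # tilting potential

n_part, dt, n_slices = 2000, 0.1, 400
x = np.zeros(n_part, dtype=int)   # continuous-time simple random walk on Z
log_lam = 0.0
for _ in range(n_slices):
    # Exact evolution over a slice: Poisson(2*dt) jumps, each +/-1 with prob 1/2.
    jumps = rng.poisson(2 * dt, size=n_part)
    ups = rng.binomial(jumps, 0.5)
    x = x + 2 * ups - jumps
    # Feynman-Kac weights (first-order in dt) and multinomial resampling.
    w = np.exp(V(x) * dt)
    log_lam += np.log(w.mean())
    x = x[rng.choice(n_part, size=n_part, p=w / w.sum())]
print(log_lam / (n_slices * dt))   # estimate of lambda(V)
```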
-
2019/20 Term 3:
- Friday 12th June, 13:00 - Jere Koskela (Warwick) - "Non-reversible MCMC for coalescent trees"
-
Coalescent processes are gold-standard models in population genetics, but typically give rise to intractable posterior distributions. Approaches based on Metropolis-Hastings are known to scale poorly with the number of samples. Design of sophisticated, faster-mixing proposal distributions is complicated by the fact that state spaces consist of both discrete binary tree topologies and continuous branch lengths, which makes it difficult to leverage e.g. gradient information. I will introduce two prototypical coalescent models, and show that a non-reversible zig-zag process targeting the corresponding posteriors can be implemented by embedding the tree topologies into a continuous space. The zig-zag process can outperform a well-tuned Metropolis algorithm by several orders of magnitude in some cases, but also comes with its own pitfalls. I will also discuss how similar ideas can be used to implement zig-zag processes on discrete state spaces more generally.
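For intuition about the continuous part of the construction, here is a one-dimensional zig-zag sampler for a standard Gaussian target, where event times can be drawn exactly by inverting the integrated switching rate; the tree-embedding machinery of the talk is not reproduced, and the target is an assumption.
```python
import numpy as np

rng = np.random.default_rng(11)

def zigzag(n_events=10_000):
    # Zig-zag targeting N(0,1): U(x) = x^2/2, so along x(s) = x + theta*s the
    # switching rate is (theta*x + s)_+ and its integral inverts in closed form.
    t, x, theta = 0.0, 0.0, 1.0
    skeleton = [(t, x, theta)]
    for _ in range(n_events):
        a = theta * x
        e = rng.exponential()
        tau = -a + np.sqrt(a * a + 2 * e) if a >= 0 else -a + np.sqrt(2 * e)
        t += tau
        x += theta * tau
        theta = -theta   # deterministic velocity flip at the event
        skeleton.append((t, x, theta))
    return skeleton

sk = zigzag()
# The invariant law is recovered by averaging along the piecewise-linear
# trajectory, not from the event positions alone.
```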
-
- Friday 26th June, 13:00 - Sebastiano Grazzi (TU Delft) - "The Boomerang sampler"
- In this seminar, I will present the Boomerang sampler, a novel class of piecewise deterministic Monte Carlo methods. The methodology begins by representing the target density as a density e^{−U} with respect to a prescribed Gaussian reference measure, and constructs a continuous trajectory consisting of a piecewise elliptical path. The process moves from one elliptical orbit to another according to a rate function which can be written in terms of U. The dynamics of the sampler and the possibility of adopting exact subsampling make the sampler suitable for Big Data applications and high-dimensional problems. As a challenging non-trivial example, I will explain how we simulate diffusion bridges. The main idea is to expand the diffusion bridge in a truncated Faber-Schauder basis and use a factorised version of the Boomerang sampler to sample the coefficients within the basis.
The talk is based on the upcoming paper "The Boomerang sampler" and is joint work with Joris Bierkens, Kengo Kamatani and Gareth Roberts. The application to diffusion bridges is built on the preliminary work available on arXiv: "A piecewise deterministic Monte Carlo method for diffusion bridges", joint work with Joris Bierkens, Frank van der Meulen and Moritz Schauer.
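A one-dimensional sketch of the sampler's mechanics for a Gaussian target relative to an N(0,1) reference, with reflections simulated by Poisson thinning; the rotation conserves x^2 + v^2, which yields a constant rate bound along each orbit. The target parameters and refreshment rate are illustrative assumptions.
```python
import numpy as np

rng = np.random.default_rng(12)

# Target N(mu, sig2) relative to the N(0,1) reference: pi(x) = e^{-U(x)} phi(x)
# with U(x) = (x - mu)^2 / (2*sig2) - x^2 / 2.
mu, sig2 = 1.0, 0.25
def dU(x):
    return (x - mu) / sig2 - x

def boomerang(T=5000.0, refresh=1.0):
    x, v, t = 0.0, rng.standard_normal(), 0.0
    skeleton = []
    while t < T:
        r = np.hypot(x, v)   # conserved by the elliptical dynamics
        M = r * (r * abs(1 / sig2 - 1) + abs(mu) / sig2) + 1e-12  # rate bound
        tau = rng.exponential(1 / (M + refresh))
        c, s = np.cos(tau), np.sin(tau)   # rotate (x, v) along the ellipse
        x, v = x * c + v * s, -x * s + v * c
        t += tau
        skeleton.append((t, x, v))
        if rng.uniform() < refresh / (M + refresh):
            v = rng.standard_normal()     # refreshment keeps the sampler ergodic
        elif rng.uniform() < max(0.0, v * dU(x)) / M:
            v = -v                        # reflection event, accepted by thinning
    return skeleton
```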
- Friday 17th July, 13:00 - Andi Wang (Bristol) - "Regeneration-enriched Markov processes"
-
In recent years there has been a surge in interest in MCMC samplers based on continuous-time dynamics and nonreversible processes. In my talk I will show that it is possible to build a continuous-time nonreversible sampler that also possesses regenerations – stopping times at which the process ‘starts afresh’, in an independent and identically distributed manner. This will be done by enriching a given Markov process – which need not be invariant with respect to the target – with regenerations at the arrival times of an inhomogeneous Poisson process. Regeneration times form the backbone of traditional Markov chain theory, and impart many practical benefits such as straightforward central limit theorems, conditions for uniform ergodicity and parallelizability. I will also show that it is possible in certain cases to establish a coupling-from-the-past approach to obtain exact draws from the target.
Joint work with Murray Pollock, Gareth Roberts and David Steinsaltz.
https://arxiv.org/abs/1910.05037, to appear in Ann. Appl. Probab.
-