Algorithms & Computationally Intensive Inference seminars
Terms 1-3, Location MB0.08, Fridays 12:15-14:00 (12:15-12:45 is an informal sandwich lunch).
Reminder emails are not sent to participants unless there is a change to the scheduled programme at short notice. If you would like to speak, or you want to be included in any emails, please contact one of the organisers.
Current Organisers: Richard Everitt, Jure Vogrinc
 If you would like to talk, or have ideas for possible speakers, then please email one of the organisers above.
Website URL: www.warwick.ac.uk/compstat
Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB: only approved members can post)
2019/20 Term 1:
 - Week 1 - 4th October - Felipe Medina Aguayo (Reading) - "Revisiting the balance heuristic for estimating normalising constants"
 - Abstract: Multiple importance sampling estimators are widely used for computing intractable constants due to their reliability and robustness. The celebrated balance heuristic estimator belongs to this class of methods and has proved very successful in computer graphics. The basic ingredients for computing the estimator are: a set of proposal distributions, indexed by some discrete label, and a predetermined number of draws from each of these proposals. However, if the number of available proposals is much larger than the number of permitted importance points, one needs to select, possibly at random, which of these distributions will be used. We explore some improvements and variations of the balance heuristic via a novel extended-space representation of the estimator, leading to straightforward annealing schemes for variance reduction purposes. We will also look at the intractable scenario where the proposal density is only available as a joint function with the discrete label, as may be encountered in problems where an ordering is imposed.
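As a toy illustration of the balance heuristic (not code from the talk), the sketch below estimates the normalising constant of an invented two-bump unnormalised density using two Gaussian proposals indexed by a discrete label, weighting every draw by the target over the mixture of the proposals. The target, proposals and sample sizes are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
sqrt2pi = np.sqrt(2.0 * np.pi)

# Unnormalised target: unequal mixture of two Gaussian bumps.
# True normalising constant: Z = (1 + 2) * sqrt(2*pi).
def f_tilde(x):
    return np.exp(-0.5 * (x + 2.0) ** 2) + 2.0 * np.exp(-0.5 * (x - 2.0) ** 2)

Z_true = 3.0 * sqrt2pi

# Two proposals indexed by a discrete label, with a fixed number of draws each.
means = np.array([-2.0, 2.0])
n_per = 5000
x = np.concatenate([rng.normal(m, 1.0, n_per) for m in means])

def phi(x, m):
    return np.exp(-0.5 * (x - m) ** 2) / sqrt2pi

# Balance heuristic: weight each draw by f_tilde over the *mixture* of proposals.
q_mix = 0.5 * phi(x, means[0]) + 0.5 * phi(x, means[1])
Z_hat = np.mean(f_tilde(x) / q_mix)
print(Z_hat, Z_true)
```

Because the weights use the proposal mixture rather than each individual proposal, the estimator stays stable even where one proposal alone would give enormous weights.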
 - Week 2 - 11th October - cancelled (OxWaSP workshop)
 - Week 3 - 18th October - Alice Corbella (Warwick) - "Pseudo-Marginal methods in practice: Inferring epidemics from multiple dependent data"

The evolution of an epidemic can be represented as a nonlinear dynamical system, combining a transmission model, approximating the unobserved infection process, with further stochastic models for the development and detection of symptoms.
Pseudo-Marginal methods (Andrieu and Roberts 2009) provide a useful tool for inference in dynamic systems where the likelihood can be estimated by (sequential) Monte Carlo methods.
We formulate a model for influenza dynamics in England when multiple, dependent data are available. We propose a Monte Carlo estimate of the likelihood and use it in a pseudo-marginal framework. We also present a misspecified model that does not account for dependence between data sources, leading to direct computation of the likelihood. We compare via simulation the inference in the two cases, showing that disregarding the dependence leads to over-efficient estimates.
Lastly, we present an analysis of real data, demonstrating the usefulness of our model and the inference.
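A minimal sketch of the pseudo-marginal idea (invented toy model, not the talk's epidemic model): the intractable likelihood of a latent-variable model is replaced by an unbiased Monte Carlo estimate inside a Metropolis-Hastings chain, and the estimate at the current state is stored and reused.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy latent-variable model: z ~ N(theta, 1), y | z ~ N(z, 1),
# so marginally y | theta ~ N(theta, 2).
y = 1.0
M = 30  # Monte Carlo draws for the likelihood estimate

def loglik_hat(theta):
    # Unbiased Monte Carlo estimate of p(y | theta) by integrating out z.
    z = rng.normal(theta, 1.0, M)
    return np.log(np.mean(np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2 * np.pi)))

def logprior(theta):
    return -0.5 * theta ** 2 / 100.0  # N(0, 10^2) prior

theta, ll = 0.0, loglik_hat(0.0)
chain = []
for _ in range(20000):
    prop = theta + rng.normal(0.0, 1.0)
    ll_prop = loglik_hat(prop)
    # Pseudo-marginal MH: the stored estimate ll stands in for the true likelihood.
    if np.log(rng.uniform()) < ll_prop + logprior(prop) - ll - logprior(theta):
        theta, ll = prop, ll_prop
    chain.append(theta)

post_mean = np.mean(chain[2000:])
print(post_mean)
```

Despite the noisy likelihood estimates, the chain targets the exact posterior (here with mean 100/102 ≈ 0.98), which is the key property of pseudo-marginal MCMC.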

 - Week 4 - 25th October - Lionel Riou-Durand (Warwick) - "Noise contrastive estimation: Asymptotic properties, formal comparison with MCMLE"
 A statistical model is said to be unnormalised when its likelihood function involves an intractable normalising constant. Two popular methods for parameter inference in such models are MCMLE (Monte Carlo maximum likelihood estimation) and NCE (noise contrastive estimation); both methods rely on simulating artificial datapoints to approximate the normalising constant. While the asymptotics of MCMLE have been established under general hypotheses (Geyer, 1994), this is not so for NCE. We establish consistency and asymptotic normality of NCE estimators under mild assumptions. We compare NCE and MCMLE under several asymptotic regimes. In particular, we show that, when m goes to infinity while n is fixed (m and n being, respectively, the number of artificial and actual datapoints), the two estimators are asymptotically equivalent. Conversely, we prove that, when the artificial datapoints are IID and n goes to infinity while m/n converges to a positive constant, the asymptotic variance of an NCE estimator is always smaller than the asymptotic variance of the corresponding MCMLE estimator. We illustrate the variance reduction brought by NCE through a numerical study.
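A small worked sketch of NCE (all model and noise choices are assumptions for the example, not from the talk): a 1D unnormalised Gaussian model whose log-normalising-constant is treated as a free parameter, fitted by logistic classification of data against artificial noise points.

```python
import numpy as np

rng = np.random.default_rng(2)

# Unnormalised model: log p(x; th) = -0.5 x^2 + th[0] * x + th[1],
# where th[1] plays the role of the negative log normalising constant.
# Data truly from N(1, 1), so th_true = (1, -0.5 - 0.5 * log(2 * pi)) ~ (1, -1.419).
n = 5000
x = rng.normal(1.0, 1.0, n)     # actual datapoints
y = rng.normal(0.0, 2.0, n)     # artificial (noise) datapoints, density known
log_pn = lambda u: -0.5 * (u / 2.0) ** 2 - np.log(2.0 * np.sqrt(2 * np.pi))

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

th = np.zeros(2)
for _ in range(1000):
    # G = log model density - log noise density, linear in th here,
    # so the NCE objective is a concave logistic-regression objective.
    Gx = -0.5 * x ** 2 + th[0] * x + th[1] - log_pn(x)
    Gy = -0.5 * y ** 2 + th[0] * y + th[1] - log_pn(y)
    gx = 1.0 - sigmoid(Gx)      # gradient weight on data points
    gy = sigmoid(Gy)            # gradient weight on noise points
    grad = np.array([np.mean(gx * x) - np.mean(gy * y),
                     np.mean(gx) - np.mean(gy)])
    th += 0.3 * grad            # gradient ascent on the NCE objective
print(th)
```

The fitted th[1] recovers the normalising constant without ever computing the intractable integral, which is the point of treating it as an extra parameter.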
 - Week 5 - 1st November - Anthony Mulholland (Bristol) - "Material Map Inversion of Complex and Locally Anisotropic Media for Improved Imaging in Non-Destructive Testing"

Imaging defects in austenitic welds presents a significant challenge for the ultrasonic non-destructive testing community. Due to the heating process during their manufacture, a polycrystalline structure develops, where large grains with locally anisotropic properties cause the ultrasonic waves to scatter and refract. When basic imaging algorithms which make straight-ray and constant-wave-speed assumptions are applied to datasets arising from the inspection of these welds, the resulting defect reconstructions are often distorted and difficult to interpret correctly. Knowledge of the underlying material microstructure would allow correction of the expected wave travel times (on which time-domain imaging algorithms are typically based) and thus result in more reliable defect reconstructions. In this talk, an approximation to the underlying, locally anisotropic structure of the weld is constructed from ultrasonic phased array inspection data. A new forward model of wave front propagation in locally anisotropic media, the Anisotropic Multi-Stencil Fast Marching Method, is presented and used within the reversible-jump Markov chain Monte Carlo method to invert for the map of regional crystal orientations. This forward model and estimated map are then used as the basis for an imaging algorithm, and the resulting reconstructions of defects embedded within these polycrystalline materials are used to measure the success of the approach. Receiver operating characteristic curves are then calculated to demonstrate the enhanced probability of detection achieved when the material map is accounted for, and it is shown that the reconstructed material maps can significantly increase the probability of detection of defects in the far field.
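The fast marching solver itself is beyond a short sketch, but the core idea of a travel-time forward model (first-arrival times through a medium with spatially varying wave speed) can be illustrated with Dijkstra's algorithm on a grid graph. This is a crude, isotropic stand-in for the talk's Anisotropic Multi-Stencil Fast Marching Method; the grid, speeds and source location are invented.

```python
import heapq
import numpy as np

# First-arrival travel times on an 8-connected lattice with spatially
# varying wave speed -- a graph-based stand-in for a fast marching solver.
h = 1.0                                   # grid spacing
speed = np.ones((20, 20))
speed[8:12, :] = 0.5                      # a slow horizontal band (e.g. a weld region)

t = np.full(speed.shape, np.inf)
t[0, 0] = 0.0                             # source, e.g. a transducer element
pq = [(0.0, 0, 0)]
while pq:
    d, i, j = heapq.heappop(pq)
    if d > t[i, j]:
        continue                          # stale queue entry
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            if di == dj == 0:
                continue
            ni, nj = i + di, j + dj
            if 0 <= ni < t.shape[0] and 0 <= nj < t.shape[1]:
                step = h * np.hypot(di, dj)
                # local slowness: average of the two cells' reciprocal speeds
                nd = d + step * 0.5 * (1 / speed[i, j] + 1 / speed[ni, nj])
                if nd < t[ni, nj]:
                    t[ni, nj] = nd
                    heapq.heappush(pq, (nd, ni, nj))
print(t[0, 5], t[19, 19])
```

An inversion scheme of the kind described in the abstract would run such a forward solver repeatedly, comparing predicted to measured arrival times while proposing changes to the speed (orientation) map.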

 - Week 6 - 8th November - Kathryn Leeming (Warwick) - "Local white noise testing using wavelets"

In this talk we motivate a local white noise test, testing for structure in observations or residuals at a local level, rather than globally. For this test we do not wish to make the assumption of Gaussian noise. Wavelets are shown to be a natural candidate for creating such a test, and an introduction to wavelets for statistical signal processing will be provided. Properties of the wavelet spectrum such as covariance and asymptotic distribution will be shown, enabling the local white noise test to be evaluated. Our local white noise test is compared in simulations to classical white noise tests, and its use is demonstrated on residuals of a time series. (Joint work with Guy Nason).
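A much-simplified illustration of the idea (finest-scale Haar coefficients only; the talk's test is more refined and does not assume Gaussianity): under white noise the detail coefficients are uncorrelated with the same variance, so a standardised energy statistic should be approximately standard normal, while serial structure pushes it far from zero. All choices below are assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(3)

# Haar detail coefficients at the finest scale: d_k = (x[2k] - x[2k+1]) / sqrt(2).
def haar_details(x):
    x = np.asarray(x, dtype=float)
    return (x[0::2] - x[1::2]) / np.sqrt(2.0)

def white_noise_stat(x):
    d = haar_details(x)
    sigma2 = np.var(x)
    s = np.sum(d ** 2) / sigma2          # ~ chi^2 with len(d) degrees of freedom
    m = len(d)
    return (s - m) / np.sqrt(2.0 * m)    # standardised: ~ N(0, 1) under white noise

x_white = rng.normal(size=1024)
x_ar = np.empty(1024)                    # AR(1) series: residuals with structure
x_ar[0] = rng.normal()
for i in range(1, 1024):
    x_ar[i] = 0.8 * x_ar[i - 1] + rng.normal()

z_white = white_noise_stat(x_white)
z_ar = white_noise_stat(x_ar)
print(z_white, z_ar)
```

Positive autocorrelation suppresses fine-scale detail energy, so the AR(1) series produces a strongly negative statistic while the white noise series stays near zero.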

 - Week 7 - 15th November - Nikolas Kantas (Imperial) - "Parameter estimation and sensor placement for the stochastic advection-diffusion equation"

The talk is on performing inference for dissipative SPDE models using observations obtained by a possibly large number of sensors. This setup is relevant in applications such as environmental monitoring, numerical weather prediction, geosciences and more. When the model parameters (of the SPDE and sensors) are known, approximating conditional probability laws given observations (i.e. filter distributions) is possible for nonlinear SPDE models using elaborate and expensive particle methods, or using heuristics such as the EnKF. The question we would like to pose in this talk is how to estimate model parameters and how to place sensors optimally to minimise uncertainty in our filter estimates. We will focus only on the linear Gaussian case, and in particular the stochastic advection-diffusion equation. This is work in progress jointly with Louis Sharrock (Imperial).
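In the linear Gaussian case the filter is exactly the Kalman filter. The sketch below (invented grid, parameters and sensor layout) discretises a 1D stochastic advection-diffusion equation with finite differences and assimilates a few point-sensor observations, tracking how the filter covariance shrinks relative to the prior.

```python
import numpy as np

rng = np.random.default_rng(4)

# Explicit finite-difference step for du = (-c u_x + D u_xx) dt + noise
# on a periodic 1D grid, followed by a Kalman update from point sensors.
n, dx, dt, c, D = 40, 1.0, 0.1, 1.0, 0.5
I = np.eye(n)
adv = (np.roll(I, -1, axis=1) - np.roll(I, 1, axis=1)) / (2 * dx)
lap = (np.roll(I, -1, axis=1) - 2 * I + np.roll(I, 1, axis=1)) / dx ** 2
F = I + dt * (-c * adv + D * lap)         # state transition matrix
Q = 0.01 * I                              # process noise covariance
sensors = [5, 15, 25, 35]                 # sensor locations (grid indices)
H = I[sensors, :]                         # observation matrix: point evaluations
R = 0.1 * np.eye(len(sensors))            # sensor noise covariance

m = np.zeros(n)
P = np.eye(n)
P0_trace = np.trace(P)
truth = np.exp(-0.5 * ((np.arange(n) - 20.0) / 3.0) ** 2)
for _ in range(50):
    truth = F @ truth + rng.multivariate_normal(np.zeros(n), Q)
    obs = H @ truth + rng.multivariate_normal(np.zeros(len(sensors)), R)
    # Kalman predict
    m, P = F @ m, F @ P @ F.T + Q
    # Kalman update
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.solve(S, np.eye(len(sensors)))
    m = m + K @ (obs - H @ m)
    P = (np.eye(n) - K @ H) @ P
print(np.trace(P), P0_trace)
```

Sensor placement questions of the kind posed in the talk amount to choosing the rows of H to minimise a functional of the resulting steady-state covariance P.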

 - Week 8 - 22nd November - Chris Sherlock (Lancaster) - "Fast, exact inference for discretely observed Markov jump processes using finite rate matrices"

We consider inference for a discretely observed Markov jump process with an infinite state-space, such as occurs in biological and environmental systems. When the state-space is finite, inference for the resulting continuous-time Markov chain using its rate matrix is straightforward, but in the case of an infinite state-space the method of choice is particle MCMC. We provide a new method, the minimal extended state-space algorithm (MESA), for exact Bayesian inference that uses finite-dimensional rate matrices even though the state-space is infinite. In contrast to particle MCMC, MESA is most efficient when there is no observation noise, and becomes gradually less efficient as the observation noise increases.
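MESA itself is the talk's contribution, but the finite state-space baseline the abstract contrasts with can be sketched directly: with a rate matrix Q, the transition kernel over a time gap t is the matrix exponential expm(Q t), and the likelihood of a discretely observed path is a product of its entries. The 3-state chain and observations below are invented.

```python
import numpy as np
from scipy.linalg import expm

# Rate matrix of a 3-state continuous-time Markov chain
# (off-diagonal rates nonnegative, rows sum to zero).
Q = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.5, 1.0],
              [0.0, 2.0, -2.0]])
t_gap = 0.7
P = expm(Q * t_gap)                       # transition probabilities over t_gap

obs = [0, 1, 1, 2, 1, 0]                  # states observed every t_gap time units
loglik = sum(np.log(P[a, b]) for a, b in zip(obs[:-1], obs[1:]))
print(loglik)
```

For an infinite state-space no such finite P exists, which is what forces either truncation, particle MCMC, or, per the talk, MESA's finite-dimensional rate matrices.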

 - Week 9 - 29th November - cancelled (instead, please attend the Dept Colloquium by Terry Lyons (Oxford) in MS.02, 2-3pm on Tuesday 26th November)
 - Week 10 - 6th December - Martina Favero (KTH Stockholm) - "Weak convergence for a sequence of coalescent processes"

We show weak convergence for a sequence of Markov chains constituted by the block counting process of the Kingman coalescent with parent-independent mutations, and by the process counting the corresponding number of mutations, as the sample size goes to infinity. Time is scaled and convergence of semigroups is proved. The limiting process consists of a deterministic part and of conditionally independent Poisson processes with varying intensity.
We explain how this result may be used to study the asymptotic efficiency of importance sampling algorithms on the coalescent with parent-dependent mutations, and to determine the asymptotic behaviour of the sampling probabilities. This is work in progress with H. Hult (KTH).
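For readers unfamiliar with the object under study, here is a basic simulation of the block counting process with mutations (a standard construction, not the talk's limit theorem): with k lineages, coalescences occur at rate k(k-1)/2 and mutations at rate kθ/2, and each coalescence removes one lineage. The sample size and θ are invented.

```python
import numpy as np

rng = np.random.default_rng(5)

def simulate(n, theta):
    """Simulate the Kingman coalescent block counting process for a
    sample of size n with mutation parameter theta; return the tree
    height and the total number of mutations."""
    k, t, n_mut = n, 0.0, 0
    while k > 1:
        rate_c = k * (k - 1) / 2.0       # coalescence rate
        rate_m = k * theta / 2.0         # total mutation rate
        t += rng.exponential(1.0 / (rate_c + rate_m))
        if rng.uniform() < rate_c / (rate_c + rate_m):
            k -= 1                       # a coalescence: one fewer block
        else:
            n_mut += 1                   # a mutation: block count unchanged
    return t, n_mut

height, n_mut = simulate(50, 2.0)
print(height, n_mut)
```

The talk's result concerns the joint law of these two counting processes, suitably time-scaled, as the sample size n grows.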

2019/20 Term 2:
 - Week 1 - 10th January - cancelled (Bayes Comp 2020)
 - Week 2 - 17th January - Alex Buchholz (Cambridge) - "Contributions to approximate Bayesian Inference"
 The Bayesian paradigm allows us to carefully assess uncertainty in the predicted outcomes of machine learning and statistical models. However, computing posterior distributions remains a difficult problem that has led to approximate methods like MCMC sampling and Variational Inference approaches. In this talk I will present two recent contributions to the field of Bayesian computation. First, I will introduce an improved learning algorithm for Variational Inference that is based on a variance reduction of stochastic gradients using quasi-Monte Carlo sampling. The improved algorithm has strong guarantees and can readily be implemented in existing frameworks while speeding up computations substantially.
The second part of the talk will be dedicated to Bayesian model choice in large-data settings. By using a divide-and-conquer approach that splits an initial dataset into chunks, I achieve huge improvements in computing time that allow Bayesian inference algorithms to run in parallel with restricted communication. The suggested approach is applicable in settings that require privacy preservation. (See https://arxiv.org/abs/1807.01604 and https://arxiv.org/abs/1910.04672)
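The variance-reduction idea behind quasi-Monte Carlo gradients can be seen in a stripped-down setting (a plain expectation rather than a stochastic gradient, and a van der Corput sequence standing in for the sequences used in the paper; all choices are assumptions for the sketch): low-discrepancy points pushed through the Gaussian inverse CDF estimate a Gaussian expectation far more accurately than IID draws.

```python
import random
from statistics import NormalDist, fmean

# Estimate E[z^2] for z ~ N(0, 1) (true value 1) by plain Monte Carlo and by
# a randomly shifted van der Corput (quasi-Monte Carlo) rule.
def van_der_corput(n, base=2):
    out = []
    for i in range(1, n + 1):
        q, denom, x = i, 1.0, 0.0
        while q:
            denom *= base
            q, r = divmod(q, base)
            x += r / denom                # radical-inverse digit expansion
        out.append(x)
    return out

random.seed(0)
nd = NormalDist()
n = 1024
shift = random.random()                   # random shift keeps the estimator unbiased
u_qmc = [(u + shift) % 1.0 for u in van_der_corput(n)]
u_mc = [random.random() for _ in range(n)]

est_qmc = fmean(nd.inv_cdf(u) ** 2 for u in u_qmc)
est_mc = fmean(nd.inv_cdf(u) ** 2 for u in u_mc)
print(est_qmc, est_mc)
```

In the paper's setting the same substitution is applied to the uniform draws inside reparameterised stochastic gradients of the variational objective.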
 - Week 3 - 24th January - Sam Livingstone (UCL) - "The Barker proposal: robust, gradient-based MCMC"

We consider the issue of robustness of MCMC algorithms with respect to heterogeneity in the target and their sensitivity to tuning. We show that the spectral gap of the Markov chains induced by classical gradient-based MCMC schemes (e.g. Langevin and Hamiltonian Monte Carlo) decays exponentially fast in the degree of mismatch between the scales of the proposal and target distributions, while for the random walk Metropolis (RWM) the decay is polynomial. We propose a novel and simple-to-implement gradient-based MCMC algorithm, inspired by the classical Barker accept-reject rule, with improved robustness properties. We illustrate with simulation studies how this type of robustness is particularly beneficial in the context of adaptive MCMC, giving examples in which the new scheme gives orders-of-magnitude improvements in efficiency over state-of-the-art alternatives.
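One common presentation of a Barker-type proposal in one dimension (target and tuning invented for the sketch; consult the talk's paper for the authoritative scheme): draw a Gaussian increment, flip its sign with a logistic probability driven by the gradient, and accept with the matching Metropolis-Hastings ratio.

```python
import numpy as np

rng = np.random.default_rng(6)

def logpi(x):                  # target: N(0, 1)
    return -0.5 * x ** 2

def grad(x):
    return -x

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

s = 1.5                        # proposal scale
x = 0.0
chain = np.empty(20000)
for i in range(chain.size):
    z = rng.normal(0.0, s)
    # Use the gradient to choose the direction of the move.
    b = 1.0 if rng.uniform() < sigmoid(z * grad(x)) else -1.0
    y = x + b * z
    d = y - x
    # q(x -> y) is proportional to phi(d) * sigmoid(d * grad(x));
    # the symmetric phi factor cancels in the MH ratio.
    log_acc = (logpi(y) + np.log(sigmoid(-d * grad(y)))
               - logpi(x) - np.log(sigmoid(d * grad(x))))
    if np.log(rng.uniform()) < log_acc:
        x = y
    chain[i] = x
print(chain.mean(), chain.var())
```

Because the gradient only tilts the direction of an otherwise symmetric increment, a badly scaled gradient degrades the sampler far more gracefully than in Langevin-type schemes, which is the robustness property the talk quantifies.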

 - Week 4 - 31st January - Brieuc Lehmann (Oxford) - "Scalable Bayesian sparsity-path analysis with the posterior bootstrap"

In classical penalised regression, it is common to perform model estimation across a range of regularisation parameter values, typically with the aim of maximising out-of-sample predictive performance. The analogue in the Bayesian paradigm is to place a sparsity-inducing prior on the regression coefficients and explore a range of prior precisions. This, however, can be computationally challenging due to the need to generate a separate posterior distribution for each precision value. Here, we explore the use of the posterior bootstrap to scalably generate a posterior distribution over sparsity-paths. We develop an embarrassingly parallel method that exploits fast algorithms for computing classical regularisation paths and can thus handle large problems. We demonstrate our method on a sparse logistic regression example using genomic data from the UK Biobank. (Joint work with Chris Holmes, Gil McVean, Edwin Fong & Xilin Jiang)
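A minimal sketch of the posterior-bootstrap-over-a-path idea (ridge regression with a closed form stands in for the talk's sparse lasso paths, and all problem sizes are invented): each bootstrap draw assigns Dirichlet weights to the observations and refits the whole regularisation path, yielding a distribution over paths, and the B fits are independent and trivially parallelisable.

```python
import numpy as np

rng = np.random.default_rng(7)

n, p = 200, 5
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, -1.0, 0.0, 0.0, 0.5])
y = X @ beta_true + rng.normal(size=n)

lambdas = np.array([0.1, 1.0, 10.0, 100.0])   # regularisation grid
B = 50                                        # posterior bootstrap draws
paths = np.empty((B, len(lambdas), p))
for b in range(B):
    w = rng.dirichlet(np.ones(n)) * n         # Bayesian-bootstrap weights
    Xw = X * w[:, None]                       # rows of X scaled by their weights
    for j, lam in enumerate(lambdas):
        # Weighted ridge: (X' W X + lam I) beta = X' W y
        paths[b, j] = np.linalg.solve(Xw.T @ X + lam * np.eye(p), Xw.T @ y)

mean_path = paths.mean(axis=0)
print(np.linalg.norm(mean_path, axis=1))      # coefficient norm along the path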

 - Week 5 - 7th February - Ioannis Kosmidis (Warwick) - "Empirical bias-reducing adjustments to estimating functions"

Many popular methods for the reduction of estimation bias rely on an approximation of the bias function under the assumption that the model is correct and fully specified. Other bias reduction methods, like the bootstrap, the jackknife and indirect inference, require fewer assumptions to operate but are typically computer-intensive, requiring repeated optimizations.
We present a novel framework for reducing estimation bias that:
i) can deliver estimators with smaller bias than reference estimators, even for partially specified models, as long as estimation is through unbiased estimating functions;
ii) always results in closed-form bias-reducing penalties to the objective function if estimation is through the maximization of one, like maximum likelihood and maximum composite likelihood;
iii) relies only on the estimating functions and/or the objective and their derivatives, greatly facilitating implementation for general modelling frameworks through numerical or automatic differentiation techniques and standard numerical optimization routines.
The bias-reducing penalized objectives closely relate to information criteria for model selection based on the Kullback-Leibler divergence, establishing, for the first time, a strong link between reduction of estimation bias and model selection. We also discuss the asymptotic efficiency properties of the new estimator, inference and model selection, and present illustrations in well-used, important modelling settings of varying complexity.
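Of the computer-intensive alternatives mentioned above, the jackknife admits a small exact illustration (a textbook example, not the talk's method): applied to the biased plug-in (1/n) variance estimator, the jackknife bias correction recovers the unbiased (1/(n-1)) estimator exactly.

```python
import numpy as np

rng = np.random.default_rng(8)

x = rng.normal(size=30)
n = x.size

theta_hat = np.var(x)                           # biased plug-in estimator (1/n)
# Leave-one-out estimates, one re-fit per observation -- the "repeated
# optimizations" cost the abstract refers to.
loo = np.array([np.var(np.delete(x, i)) for i in range(n)])
# Jackknife bias-corrected estimator: n * theta_hat - (n-1) * mean(loo).
theta_jack = n * theta_hat - (n - 1) * loo.mean()
print(theta_hat, theta_jack, np.var(x, ddof=1))
```

The talk's empirical adjustments aim to deliver comparable bias reduction without this per-observation refitting, via penalties computed from the estimating functions and their derivatives.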

 - Week 6 - 14th February - Dennis Prangle (Newcastle) - "Distilling importance sampling"

To be efficient, importance sampling requires the selection of an accurate proposal distribution. This talk describes learning a proposal distribution using optimisation over a flexible family of densities developed in machine learning: normalising flows. Training data is generated by running importance sampling on a tempered version of the target distribution, and this is "distilled" by using it to train the normalising flow. Over many iterations of importance sampling and optimisation, the amount of tempering is slowly reduced until an importance sampling proposal for an accurate target distribution is generated.
The method will be demonstrated for likelihood-free inference on a queueing model, and likelihood-based inference on a discretised SDE model.
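The distillation loop can be sketched with a Gaussian proposal standing in for the normalising flow (target, tempering schedule and sample sizes all invented for the example): importance-sample a tempered target, refit the proposal family to the weighted draws, and gradually reduce the tempering.

```python
import numpy as np

rng = np.random.default_rng(9)

def log_target(x):                 # unnormalised target: N(3, 0.5^2)
    return -0.5 * ((x - 3.0) / 0.5) ** 2

mu, sd = 0.0, 3.0                  # initial broad proposal
for beta in [0.2, 0.4, 0.6, 0.8, 1.0]:
    x = rng.normal(mu, sd, 4000)
    # Self-normalised importance weights for the tempered target pi^beta.
    logw = beta * log_target(x) - (-0.5 * ((x - mu) / sd) ** 2 - np.log(sd))
    w = np.exp(logw - logw.max())
    w /= w.sum()
    # "Distil": weighted maximum-likelihood fit of the proposal family
    # (here a Gaussian; in the talk, a normalising flow trained on the draws).
    mu = np.sum(w * x)
    sd = np.sqrt(np.sum(w * (x - mu) ** 2))
print(mu, sd)
```

Early iterations only need the proposal to cover a heavily tempered (flattened) target, which keeps the importance weights well behaved while the proposal is still poor.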

 - Week 7 - 21st February - Alisa Kirichenko (Oxford) - "Function estimation on large graphs"

In recent years there has been substantial interest in high-dimensional estimation and prediction problems in the context of relational data. These can in many cases be viewed as high-dimensional or nonparametric regression or classification problems in which the goal is to learn a "smooth" function on a given graph.
We present a mathematical framework that allows us to study the performance of function estimation methods on large graphs, and derive the minimax convergence rates within this framework. We then present Bayesian estimation procedures that achieve (asymptotically) optimal regularization. Finally, we study the performance of these procedures on simulated and real-life data.
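One standard estimator of this kind, shown here as a sketch (the talk's framework is more general; the graph, signal and tuning below are invented): Laplacian-regularised least squares, f_hat = (I + τL)^{-1} y, which can also be read as a posterior mean under a Gaussian prior penalising roughness across edges.

```python
import numpy as np

rng = np.random.default_rng(10)

n = 200
# Laplacian of a path graph on n vertices.
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0

f_true = np.sin(np.linspace(0.0, 3 * np.pi, n))   # a smooth signal on the graph
y = f_true + rng.normal(0.0, 0.5, n)              # noisy vertex observations

tau = 5.0                                         # regularisation strength
f_hat = np.linalg.solve(np.eye(n) + tau * L, y)

mse_raw = np.mean((y - f_true) ** 2)
mse_hat = np.mean((f_hat - f_true) ** 2)
print(mse_raw, mse_hat)
```

The minimax theory referred to in the abstract asks how the best achievable error of such estimators scales with the graph's size and geometry and with the smoothness of the truth.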

 - Week 8 - 28th February - cancelled (instead, please attend the Dept Colloquium by Judith Rousseau (Oxford) in OC0.03, 2-3pm on Tuesday 25th February)
 - Week 9 - 6th March - Francesca Crucinio (Warwick)
 - Week 10 - 13th March - Letizia Angeli (Warwick)
2019/20 Term 3:
 - Week 1 - 24th April
 - Week 2 - 1st May
 - Week 3 - 8th May - cancelled (instead, please attend the Dept Colloquium)
 - Week 4 - 15th May - Badr-Eddine Chérief-Abdellatif (CREST, ENSAE Paris)
 - Week 5 - 22nd May - Helen Ogden (Southampton)
 - Week 6 - 29th May - Tobias Grafke (Warwick)
 - Week 7 - 5th June - Ruth Baker (Oxford)
 - Week 8 - 12th June - Jeremias Knoblauch (Warwick)
 - Week 9 - 19th June
 - Week 10 - 26th June
Previous Years:
Some key phrases:
 - Sampling and inference for diffusions
 - Exact algorithms
 - Intractable likelihood
 - Pseudo-marginal algorithms
 - Particle filters
 - Importance sampling
 - MCMC
 - Adaptive MCMC
 - Perfect simulation
 - Markov chains...
 - Random structures...
 - Randomised algorithms...