Christophe Andrieu (Bristol)

Estimating likelihood ratios in latent variable models and its application in MCMC

(joint work with Arnaud Doucet and Sinan Yildirim)

The probabilistic modelling of observed phenomena sometimes requires the introduction of (unobserved) latent variables, which may or may not be of direct interest. This is, for example, the case when a realisation of a Markov chain is observed in noise and one is interested in inferring its transition matrix from the data. In such models, inferring the parameters of interest (e.g. the transition matrix above) requires one to incorporate the latent variables in the inference procedure, resulting in practical difficulties. The standard approach to inference in such models consists of integrating out the latent variables numerically, most often using Monte Carlo methods. In the toy example above there are as many latent variables as there are observations, making the problem high-dimensional and potentially difficult.

We will show how recent advances in Markov chain Monte Carlo methods, in particular the development of “exact approximations” of the Metropolis-Hastings algorithm (which will be reviewed), can lead to algorithms which scale better than existing solutions.

Yves Atchadé (Michigan)

A Scalable quasi-Bayesian framework for graphical models

Doubly-intractable posterior distributions can be handled either by specialized Markov Chain Monte Carlo algorithms, or by developing a quasi-likelihood approximation of the statistical model that is free of intractable normalizing constants. For high-dimensional problems, the latter approach is more tractable and is the focus of this talk. We discuss how this approach applies to high-dimensional graphical models and present some results on the contraction properties of the resulting quasi-posterior distributions. Computational aspects will also be discussed.

Michael Betancourt (Warwick)

Adiabatic Monte Carlo

By using local information to guide the exploration of a target distribution, Markov Chain Monte Carlo, in particular modern implementations like Hamiltonian Monte Carlo, has been a cornerstone of modern statistical computation. Unfortunately this local information is not generally sufficient to admit computations that require global information, such as estimating expectations with respect to multimodal distributions or marginal likelihoods. When coupled with an interpolation between the target distribution and a simpler auxiliary distribution, however, Markov Chain Monte Carlo can be an important component, for example in simulated annealing, simulated tempering, and their variants. Unfortunately, determining an effective interpolation is a challenging tuning problem that hampers these methods in practice.

In this talk I will show how the same differential geometry from which Hamiltonian Monte Carlo is built can also be used to construct an optimal interpolation dynamically, with no user intervention. I will then present the resulting Adiabatic Monte Carlo algorithm with discussion of its promise and some of the open problems in its general implementation.

Nicolas Chopin (ENSAE)

The Poisson transform for unnormalised statistical models

(joint work with Simon Barthelmé, GIPSA-LAB, Grenoble)

Contrary to standard statistical models, unnormalised statistical models only specify the likelihood function up to a constant. While such models are natural and popular, the lack of normalisation makes inference much more difficult. Here we show that inferring the parameters of an unnormalised model on a space Ω can be mapped onto an equivalent problem of estimating the intensity of a Poisson point process on Ω. The unnormalised statistical model now specifies an intensity function that does not need to be normalised. Effectively, the normalisation constant may now be inferred as just another parameter, at no loss of information. The result can be extended to cover non-IID models, which include for example unnormalised models for sequences of graphs (dynamical graphs), or for sequences of binary vectors. As a consequence, we prove that unnormalised parametric inference in non-IID models can be turned into a semi-parametric estimation problem. Moreover, we show that the noise-contrastive divergence of Gutmann & Hyvärinen (2012) can be understood as an approximation of the Poisson transform, and extended to non-IID settings. We use our results to fit spatial Markov chain models of eye movements, where the Poisson transform allows us to turn a highly non-standard model into vanilla semi-parametric logistic regression.
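
As a rough sketch of the IID case (the notation here is an assumption for illustration, not taken from the abstract): write the unnormalised log-density as f_θ, the n observations as y_1, ..., y_n, and let ν be an extra scalar parameter. Up to an additive constant, the log-likelihood of a Poisson process on Ω with intensity proportional to n exp{f_θ(y) + ν} is the first expression below. Maximising it over ν recovers the normalised log-likelihood of θ, shifted by a constant, which is the sense in which the normalisation constant is inferred as just another parameter.

```latex
\mathcal{M}(\theta,\nu)
  = \sum_{i=1}^{n}\bigl\{ f_\theta(y_i) + \nu \bigr\}
    - n\int_{\Omega}\exp\bigl\{ f_\theta(y) + \nu \bigr\}\,dy ,
\qquad
\hat\nu(\theta) = -\log\int_{\Omega} e^{f_\theta(y)}\,dy ,
\qquad
\mathcal{M}\bigl(\theta,\hat\nu(\theta)\bigr)
  = \sum_{i=1}^{n} f_\theta(y_i)
    - n\log\int_{\Omega} e^{f_\theta(y)}\,dy - n .
```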

Michael Gutmann (Helsinki)

Noise-contrastive estimation and its generalizations

Parametric statistical models are often not properly normalized, that is, they do not integrate to unity. While unnormalized models can, in principle, be normalized by dividing them by their integral, the cost of computing the integral is generally prohibitively large. This is an issue because without normalization, the likelihood function is not available for performing inference.

I present a method called "noise-contrastive estimation" where unnormalized models are estimated by solving a classification problem. I explain some of its properties and applications, and show that it is part of a general estimation framework based on the Bregman divergence.
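
The idea can be illustrated on a toy problem. The following is a minimal sketch, not the speaker's implementation: the model, the noise distribution, the plain gradient-ascent optimiser and all function names are assumptions chosen for illustration. Data are drawn from N(0, 1), the unnormalised model is log p(u) = -0.5 * theta * u^2 + c with c treated as a free parameter, and the noise is N(0, 2^2). Fitting a logistic classifier that separates data from noise should give theta close to 1 and c close to -0.5 * log(2π), the true negative log normaliser.

```python
import numpy as np

def log_noise_pdf(u, scale=2.0):
    """Log density of the N(0, scale^2) noise distribution (assumed choice)."""
    return -0.5 * (u / scale) ** 2 - np.log(scale) - 0.5 * np.log(2 * np.pi)

def nce_fit(x, y, steps=2000, lr=0.1):
    """Fit log p(u) = -0.5 * theta * u^2 + c by noise-contrastive estimation.

    x holds the observed data and y the noise samples (same size here).
    c stands in for the unknown negative log normalising constant.
    """
    theta, c = 0.5, 0.0
    for _ in range(steps):
        # G(u) = log model(u) - log noise(u) is the logit of "u is data".
        g_x = -0.5 * theta * x**2 + c - log_noise_pdf(x)
        g_y = -0.5 * theta * y**2 + c - log_noise_pdf(y)
        s_x = 1.0 / (1.0 + np.exp(-g_x))
        s_y = 1.0 / (1.0 + np.exp(-g_y))
        # Gradient of mean(log s(x)) + mean(log(1 - s(y))) in (theta, c).
        grad_theta = np.mean((1 - s_x) * (-0.5 * x**2)) - np.mean(s_y * (-0.5 * y**2))
        grad_c = np.mean(1 - s_x) - np.mean(s_y)
        theta += lr * grad_theta
        c += lr * grad_c
    return theta, c

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=5000)   # "observed" data
y = rng.normal(0.0, 2.0, size=5000)   # noise samples
theta_hat, c_hat = nce_fit(x, y)
print(theta_hat, c_hat)
```

The objective is concave in (theta, c) because the logit is linear in both parameters, so simple gradient ascent is enough for this toy case; a real application would typically use a more capable optimiser.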

Merrilee Hurn (Bath)

Power posteriors +

One of the approaches available for estimating marginal likelihoods is thermodynamic integration. This talk will consider the method of power posteriors and recent work by various authors to maximise its efficiency and accuracy.
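
For reference, the identity behind the method can be written as follows (a sketch in assumed notation: y is the data, θ the parameters with a proper prior p(θ), and t an inverse temperature). The power posterior tempers the likelihood, and the log marginal likelihood is the integral over t of the expected log-likelihood under it. In practice the integral is approximated on a temperature ladder 0 = t_0 < ... < t_K = 1, here with the trapezoidal rule; the choice of ladder and quadrature rule is where the efficiency and accuracy questions arise.

```latex
p_t(\theta \mid y) \propto p(y \mid \theta)^{t}\, p(\theta), \quad t \in [0,1],
\qquad
\log p(y) = \int_{0}^{1} \mathbb{E}_{\theta \mid y,\, t}\bigl[\log p(y \mid \theta)\bigr]\, dt
\;\approx\; \sum_{k=1}^{K} \frac{t_k - t_{k-1}}{2}
  \Bigl( \mathbb{E}_{t_k}\bigl[\log p(y \mid \theta)\bigr]
       + \mathbb{E}_{t_{k-1}}\bigl[\log p(y \mid \theta)\bigr] \Bigr).
```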

Pierre Jacob (Harvard)

Coupling Particle Systems

In state-space models, the normalizing constant refers to the likelihood at a given parameter value, of which particle filters give unbiased estimators. In many settings, the interest does not lie in the value of the constant itself, but in the comparison of the normalizing constants associated with different parameters. Such a comparison is facilitated by introducing positive correlations between the estimators produced by particle filters. We propose coupled resampling schemes that increase the correlation between two particle systems. The resulting algorithms improve the precision of finite-difference estimators of the score vector, and can be used in correlated pseudo-marginal algorithms. Furthermore, the coupled resampling schemes can be embedded into debiasing algorithms, yielding unbiased estimators of expectations with respect to the smoothing distribution. We will discuss the pros and cons compared to particle MCMC.

Adam Johansen (Warwick)

Some Perspectives on Sequential Monte Carlo and Normalising Constants

I will discuss the use of sequential Monte Carlo (SMC) methods to "estimate" (ratios of) normalising constants. I will begin with an introduction to SMC and its relationship to the approximation of normalising constants and move on to discuss some more recent ideas including some personal perspectives on interesting features of this approach and some open problems.

Anne-Marie Lyne (Institut Curie)

Russian roulette estimates for Bayesian inference of doubly-intractable models

Doubly-intractable posterior distributions arise when the likelihood has an intractable normalising term which is a function of the unknown parameters. This occurs in a range of situations, but is most common when the data are viewed as realisations of a random graph, with the nodes representing random variables, and the edges representing a probabilistic interaction between nodes. It is difficult to carry out Bayesian parameter inference over such models, as the intractability of the normalising term means that standard sampling techniques such as the Metropolis-Hastings (MH) algorithm cannot be used.

We use pseudo-marginal Markov chain Monte Carlo (MCMC) methodology, in which an unbiased estimate of the target is used in place of the exact target in the MH acceptance ratio and, remarkably, the Markov chain still converges to the same invariant distribution. To implement this approach we express the target distribution as an infinite series, which is then truncated in a way that preserves unbiasedness. As the positivity of these estimates cannot be guaranteed, we use the absolute value of the estimate in the MH acceptance ratio and afterwards correct the samples so that expectations with respect to the exact posterior are obtained. The technique is illustrated on a number of problems such as the 2-D Ising model and the Fisher-Bingham distribution.
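
The pseudo-marginal mechanism itself can be sketched in a few lines. This is a toy illustration under stated assumptions, not the Russian roulette scheme of the talk: the target is a standard normal known up to a constant, and the "estimator" is the exact value multiplied by mean-one Gamma noise, so it is unbiased and always positive (no sign correction is needed here). The function names and tuning values are hypothetical.

```python
import math
import random

def noisy_target(theta, rng, shape=10.0):
    """Unbiased, positive estimate of the unnormalised target exp(-theta^2 / 2).

    The exact value is multiplied by Gamma(shape, 1/shape) noise, which has
    mean 1. This stands in for a genuine Monte Carlo estimator.
    """
    return math.exp(-0.5 * theta * theta) * rng.gammavariate(shape, 1.0 / shape)

def pseudo_marginal_mh(n_iter, step=2.0, seed=1):
    """Random-walk Metropolis-Hastings driven by noisy target estimates."""
    rng = random.Random(seed)
    theta = 0.0
    est = noisy_target(theta, rng)
    samples = []
    for _ in range(n_iter):
        prop = theta + rng.gauss(0.0, step)
        est_prop = noisy_target(prop, rng)
        # The estimate at the current state is stored and reused, never
        # refreshed; this is what keeps the exact target invariant.
        if rng.random() < est_prop / est:
            theta, est = prop, est_prop
        samples.append(theta)
    return samples

samples = pseudo_marginal_mh(50000)
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
print(mean, var)
```

Despite the noise in every acceptance ratio, the sample mean and variance should be close to 0 and 1, the moments of the exact target. Noisier estimators leave the invariant distribution unchanged but make the chain stick for longer at states whose estimate happened to be large.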

Jean-Michel Marin (Montpellier)

Hidden Gibbs random fields model selection using Block Likelihood Information Criterion

Performing model selection between Gibbs random fields is a very challenging task. Indeed, because of the Markovian dependence structure, the normalizing constant of the fields cannot be computed using standard analytical or numerical methods. Furthermore, such unobserved fields cannot be integrated out, and the likelihood evaluation is a doubly intractable problem. This is a central obstacle to picking the model that best fits the observed data. We introduce a new approximate version of the Bayesian Information Criterion (BIC). We partition the lattice into contiguous rectangular blocks, and we approximate the probability measure of the hidden Gibbs field by the product of some Gibbs distributions over the blocks. On that basis, we estimate the likelihood and derive the Block Likelihood Information Criterion (BLIC), which answers model choice questions such as the selection of the dependence structure or the number of latent states. We study the performance of BLIC for those questions. In addition, we present a comparison with ABC algorithms to show that the novel criterion offers a better trade-off between time efficiency and reliable results.

Antonietta Mira (Lugano, Switzerland and Como, Italy)

Reduced-Variance Estimation with Intractable Likelihoods

(joint work with N. Friel and C. Oates)

Many popular statistical models for complex phenomena are intractable, in the sense that the likelihood function cannot easily be evaluated. Bayesian estimation in this setting remains challenging, with a lack of computational methodology to fully exploit modern processing capabilities. We introduce novel control variates for intractable likelihoods that can reduce the Monte Carlo variance of Bayesian estimators, in some cases dramatically. We prove that these control variates are well-defined and provide a positive variance reduction. Furthermore we derive optimal tuning parameters that are targeted at optimising this variance reduction. The methodology is highly parallel and offers a route to exploit multi-core processing architectures for Bayesian estimation that complements recent research in this direction. Results presented on the Ising model, exponential random graphs and non-linear stochastic differential equations are consistent with our theoretical findings.

Sumit Mukherjee (Columbia)

Mean field Ising models

(joint work with Anirban Basak, Duke University)

In this talk we consider the asymptotics of the log partition function of an Ising model on a sequence of finite but growing graphs/matrices. We give a sufficient condition for the mean field prediction to the log partition function to be asymptotically tight, which in particular covers all regular graphs with degree going to infinity. We show via several examples that our condition is "almost necessary" as well.

As an application of our result, we derive the asymptotics of the log partition function for approximately regular graphs, and bi-regular bi-partite graphs. We also re-derive analogous results for a sequence of graphs converging in cut metric.

Chris Sherlock (Lancaster)

Pseudo-marginal MCMC using averages of unbiased estimators

(joint work with Alexandre Thiery, National University of Singapore)

We consider pseudo-marginal MCMC where the unbiased estimator of the posterior is constructed using an average of exchangeable unbiased estimators, and compare the efficiency of a chain which uses the average of m estimators to that of a chain which uses just one of the estimators. Recent theory has shown that the chain that uses all m estimators mixes better than the chain that uses only one of them. We provide theoretical bounds on the improvement in mixing efficiency obtainable by averaging the m estimators and, motivated by this theory and by simulation studies, we discuss the translation to a choice of m for optimal computational efficiency. Staying with averages, we then consider the recent innovation of correlated pseudo-marginal MCMC.

Panayiota Touloupou (Warwick)

Bayesian model selection for partially observed epidemic models

(joint work with Simon Spencer, Bärbel Finkenstädt Rand, Peter Neal and TJ McKinley)

Bayesian model choice considers the evidence in favour of candidate models, where in this instance each model reflects an epidemiologically important hypothesis. Model selection for epidemic models is challenging due to the need to impute a large amount of missing data, in the form of unobserved infection and recovery times. The incompleteness of the data makes the computation of the marginal likelihood, which is used to measure the evidence in favour of each model, intractable and therefore we need to find an effective way of estimating it.

In this talk, we describe an algorithm which combines MCMC and importance sampling to obtain computationally efficient estimates of the marginal likelihood in the context of epidemiology. We compare the proposed approach with several alternative methods under various simulation setups. The method is used to further our understanding of transmission dynamics of Escherichia coli O157:H7 in cattle.

Roberto Trotta (Imperial)

Estimating Bayes Factors for Cosmological Applications

Bayesian model comparison in cosmology is often used as the statistical framework of choice to select among competing physical models for complex and sophisticated datasets, ranging from measurements of temperature differences in the relic radiation from the Big Bang to data on the location of hundreds of thousands of galaxies in the visible Universe.

In this talk I will review algorithmic solutions to the problem of estimating the Bayesian evidence, necessary for computing the Bayes factor, that have been developed in cosmology. I will focus in particular on nested sampling based techniques, like the MultiNest and PolyChord algorithms, and recent machine learning techniques to accelerate their computation. I will also present a computationally useful shortcut to the determination of the Bayes factor for nested models, namely the Savage-Dickey density ratio.
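
The Savage-Dickey shortcut can be checked on a conjugate toy example. The sketch below is an illustration under assumptions of my own, not a cosmological application: under M1 the data are y_i ~ N(mu, 1) with prior mu ~ N(0, tau2), and the nested model M0 fixes mu = 0. The Bayes factor B01 computed from the two closed-form marginal likelihoods should equal the posterior density of mu at 0 divided by its prior density at 0, both taken under M1.

```python
import math
import random

def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var

def bayes_factor_direct(y, tau2):
    """B01 = p(y | M0) / p(y | M1) from closed-form marginal likelihoods."""
    n, s, ss = len(y), sum(y), sum(v * v for v in y)
    log_m0 = -0.5 * n * math.log(2 * math.pi) - 0.5 * ss
    # Under M1, y is jointly Gaussian with covariance I + tau2 * 1 1^T.
    log_m1 = (-0.5 * n * math.log(2 * math.pi)
              - 0.5 * math.log(1 + n * tau2)
              - 0.5 * (ss - tau2 * s * s / (1 + n * tau2)))
    return math.exp(log_m0 - log_m1)

def bayes_factor_savage_dickey(y, tau2):
    """B01 as posterior density over prior density of mu at 0, under M1."""
    n, s = len(y), sum(y)
    post_var = 1.0 / (n + 1.0 / tau2)
    post_mean = post_var * s
    return math.exp(log_normal_pdf(0.0, post_mean, post_var)
                    - log_normal_pdf(0.0, 0.0, tau2))

rng = random.Random(0)
y = [rng.gauss(0.3, 1.0) for _ in range(30)]
b_direct = bayes_factor_direct(y, tau2=4.0)
b_sddr = bayes_factor_savage_dickey(y, tau2=4.0)
print(b_direct, b_sddr)
```

The two numbers agree up to floating-point error. In realistic problems the marginal likelihoods are not available in closed form, and the appeal of the ratio is that it needs only the marginal posterior of the extra parameter under the larger model, which can be estimated from posterior samples.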