# Algorithms & Computationally Intensive Inference seminars

**Terms 1-3, Location C1.06, Fridays 12:15-14:00 (12:15-12:45 is an informal sandwich lunch). **

*Reminder emails are not sent to participants unless there is a change to the scheduled programme at short notice. If you would like to speak, or you want to be included in any emails, please contact one of the organisers.*

**Current Organisers**: Murray Pollock, Dootika Vats

*If you would like to talk, or have ideas for possible speakers, then please email one of the organisers above.*

**Website URL:** www.warwick.ac.uk/compstat

**Mailing List Sign-Up:** http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar

**Mailing List:** algorithmseminar@listserv.csv.warwick.ac.uk **(NB - only approved members can post)**

**2017/18 Term 1:**

**Week 1 - 6th October**- Axel Finke (UCL) - "On embedded hidden Markov models and particle Markov chain Monte Carlo methods"**Abstract:**The embedded hidden Markov model (EHMM) sampling method is an MCMC technique for state inference in non-linear non-Gaussian state-space models which was proposed in Neal (2003); Neal et al. (2004) and extended in Shestopaloff & Neal (2016). An extension to Bayesian parameter inference was presented in Shestopaloff & Neal (2013). An alternative class of MCMC schemes addressing similar inference problems is provided by particle MCMC (PMCMC) methods (Andrieu et al. 2009; 2010). All these methods rely on the introduction of artificial extended target distributions for multiple state sequences which, by construction, are such that one randomly indexed sequence is distributed according to the posterior of interest. By adapting the framework of PMCMC methods to the EHMM framework, we obtain novel particle filter (PF)-type algorithms for state inference, related to the class of "sequential MCMC" algorithms (e.g. Septier & Peters (2016) and parameter inference schemes. In addition, we show that most of these algorithms can be viewed as particular cases of a general PF and PMCMC framework. We demonstrate that a properly tuned conditional PF with "local" MCMC moves proposed in Shestopaloff & Neal (2016) can outperform the standard conditional PF significantly when applied to high-dimensional state-space models. We also derive theoretical guarantees for the novel (unconditional) PF-type algorithm and discuss why it could serve as an interesting alternative to standard PFs for likelihood estimation. This is joint work with Arnaud Doucet and Adam M. Johansen.

**Week 2 - 13th October**- Wilfrid Kendall (Warwick) - "Dirichlet forms and MCMC"**Abstract:**In this talk I will discuss the use of Dirichlet forms to deliver proofs of optimal scaling results for Markov chain Monte Carlo algorithms (specifically, Metropolis-Hastings random walk samplers) under regularity conditions which are substantially weaker than those required by the original approach (based on the use of infinitesimal generators). The Dirichlet form method has the added advantage of providing an explicit construction of the underlying infinite-dimensional context. In particular, this enables us directly to establish weak convergence to the relevant infinite-dimensional diffusion.

**Reference:**Zanella, G., Bédard, M., & Kendall, W. S. (2016). A Dirichlet Form approach to MCMC Optimal Scaling. To appear in Stochastic Processes and Their Applications. URL: arxiv.org/abs/1606.01528.

**Week 3 - 20th October**- Jure Vogrinc (Imperial) - "Asymptotic variance for Random walk Metropolis chains in high dimensions: logarithmic growth via the Poisson equation"**Abstract:**There are two ways of speeding up MCMC algorithms: (1) construct more complex samplers that use gradient and higher order information about the target and (2) design a control variate to reduce the asymptotic variance. The efficiency of (1) as a function of dimension has been studied extensively. The talk will focus on analogous results for (2), rigorous results linking the growth of the asymptotic variance with dimension. Specifically, for a d-dimensional Random walk Metropolis chain with an IID target I will present a control variate, for which the asymptotic variance of the corresponding estimator is bounded by a multiple of (log d)/d over the spectral gap of the chain. The control variate is constructed using the solution of the Poisson equation for the scaling limit in the seminal paper "Weak convergence and optimal scaling of random walk Metropolis algorithms" of Gelman, Gilks and Roberts. I will present the ideas behind the proof and discuss potential extensions and applications of the result.

**Week 4 - 27th October**- Andrew Duncan (Sussex) -"Measuring Sample Quality with Diffusions"**Abstract:**To improve the efficiency of Monte Carlo estimators, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. While a reduction in variance due to more rapid sampling can outweigh the bias introduced, the inexactness creates new challenges for parameter selection. In particular, standard measures of sample quality, such as effective sample size, do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We demonstrate this tool by comparing exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.

**Week 5 - 3rd November**- Mini Talks- Talk 1 - Cyril Chimisov (Warwick) - Internship experience at Google
- Talk 2 - Lewis Rendell (Warwick) - Global consensus Monte Carlo
**Abstract:**For problems involving large data sets, it is often convenient or necessary to distribute the data across multiple processors. Consider a target probability density function expressed as a product of terms, each being the contribution of one such subset of the data, with an additional 'prior' factor. We introduce an instrumental hierarchical model, associating an instrumental variable with each subset; these variables are conditionally independent when conditioned on a top-level parameter. This permits the construction of an MCMC algorithm on an extended state space, yielding an approximation of the target density as a marginal of its invariant distribution. Specifying the conditional distribution of the artificial variables allows the trade-off between computational tractability and fidelity to the original model to be controlled. In contrast to similar such algorithms, this approach requires few distributional assumptions; we illustrate its performance on some simulated examples, and suggest a number of directions for future investigation.

- Talk 3 - Murray Pollock (Warwick) - Contemporaneous MCMC

**Week 6 - 10th November**- Thibaut Lienart (Oxford) - "Expectation propagation for distributed approximated Bayesian inference"**Abstract:**In this talk, I will present the Expectation Propagation algorithm as compared to Variational Bayes and explain how it can be used in a generic distributed inference setting following [1]. In particular I will re-derive the EP updates in a fixed point framework similar to mirror descent algorithms exposing how the geometry of the problem can be used to obtain updates that are more robust to Monte Carlo noise.

The talk will be concluded by a critical discussion of variational methods in Bayesian inference.

**Week 7 - 17th November**- Short Talks- Talk 1 - Arne Gouwy (OxWaSP) - "Addressing continuous state space Stochastic Optimal Control problems with Iterative Auxiliary Particle Filters"
**Abstract:**There is a class of dynamic Stochastic Optimal Control (SOC) problems that can be reformulated as a minimization problem of a Kullback-Leibler divergence, by leveraging Feynman-Kac type transformations. This representation provides an expression for the probability distribution of a potentially optimally controlled system that can be evaluated pointwise up to an intractable normalizing constant. Since it has the form of a reweighted probability distribution of the controlled systems, the problem of estimating the optimal cost-to-go now appears in the form of an estimation problem. This enables the use of stochastic simulation techniques to approximate the optimal cost-to-go. Moreover, a better choice of control corresponds to a Radon-Nikodym derivative with a lower variance. Better proposal distributions to approximate the optimally controlled system with hence correspond to a better approximation for the optimal control. In this talk, we describe that class of SOC problems. Then we outline one way of the duality between those SOC problems and estimation problems. We finally sketch how iterative auxiliary particle filters can be used to tackle the resulting estimation problems, that typically exhibit a high variance. It addresses questions in SOC problems with a continuous state space of moderate dimension. We investigated this approach within a two month mini-project as part of the OxWaSP doctoral training program.

- Talk 2 - Marcin Mider (OxWaSP) - "Parameter Inference for discretely observed diffusions - reducing computational cost with "Blocking" technique"
**Abstract:**Diffusions find applications in many areas of science, including chemistry, genetics or finance. Statistical inference for these processes is complicated due to the intractability of the likelihood function. In this talk we focus on an unbiased method of inference (where the sole source of error is due to MCMC) introduced by Sermaidis et al. 2011. We propose to combine it with the Blocking technique (Shephard and Pitt 1997) and as a result extend the applicability of the Sermaidis et al. 2011 algorithm to sparsely observed diffusions. We provide partial results for proving the rates of reduction in the computational cost.

- Talk 1 - Arne Gouwy (OxWaSP) - "Addressing continuous state space Stochastic Optimal Control problems with Iterative Auxiliary Particle Filters"
**Week 8 - 24th November**- Ashley Ford (Bristol) - "Transform or Bounce - MCMC and HMC on constrained support"**Abstract:**Constraints on the support of a distribution can be dealt with either by transforming or "bouncing" of the constraint, in high dimensions care is needed in the choice. Inference for the infection times in the SIR epidemic model gives such an example where exact inference using HMC is possible although the likelihood is discontinuous and has constraints. Various approaches have been tried for this application, the reasons for poor performance of some approaches will be described. Some comparisons of a succesful approach with that of Xiang and Neal [1] will be given. Some open theoretical questions remain.- [1] Xiang, Fei and Neal, Peter, Efficient MCMC for temporal epidemics via parameter reduction, Computational Statistics & Data Analysis, 2014.

**Week 9 - 1st December**- Richard Wilkinson (Sheffield) - Gaussian process accelerated ABC- Abstract: Approximate Bayesian computation (ABC) is a class of algorithms used for doing Bayesian inference when you do not have access to the likelihood function. A major challenge for ABC methods is dealing with the computational cost that arises from needing to repeatedly run the simulator. In this talk, I will discuss surrogate modelling approaches to speeding up ABC, in which we seek to approximate the behaviour of the simulator, before inverting it to learn the parameters. The challenges for these types of approaches include: What should our target of approximation be? How should we build the surrogate? What design is best used for the simulator evaluation? And is a fully-probabilistic Bayes still the most sensible choice of inferential approach?

**Week 10 - 8th December**- Yvo Pokern (UCL) - "Gibbs Flow for Approximate Transport with Applications to Bayesian Computation"- Abstract: Given a prior and posterior distribution on , any measurable function T such that follows if X follows is called a transport map from to . If such a map were available analytically, one could simply map samples from the prior to samples from the posterior. Although it is usually impossible to obtain an explicit transport map for complex target distributions, it is shown here how to build a tractable approximation of a novel transport map. This is achieved by moving samples from using an ordinary differential equation whose drift is a function of the full conditional densities of the target distribution. Even when this ordinary differential equation is time-discretized and the full conditional distributions are approximated numerically, the resulting distribution of the mapped samples can be evaluated and used as a proposal/importance distribution within Markov chain Monte Carlo and sequential Monte Carlo schemes. It is shown experimentally that it can significantly improve performance of state-of-the-art sequential Monte Carlo methods for fixed computational complexity with examples up to dimension .

- Reference: https://arxiv.org/abs/1509.08787

- Abstract: Given a prior and posterior distribution on , any measurable function T such that follows if X follows is called a transport map from to . If such a map were available analytically, one could simply map samples from the prior to samples from the posterior. Although it is usually impossible to obtain an explicit transport map for complex target distributions, it is shown here how to build a tractable approximation of a novel transport map. This is achieved by moving samples from using an ordinary differential equation whose drift is a function of the full conditional densities of the target distribution. Even when this ordinary differential equation is time-discretized and the full conditional distributions are approximated numerically, the resulting distribution of the mapped samples can be evaluated and used as a proposal/importance distribution within Markov chain Monte Carlo and sequential Monte Carlo schemes. It is shown experimentally that it can significantly improve performance of state-of-the-art sequential Monte Carlo methods for fixed computational complexity with examples up to dimension .

**2017/18 Term 2:**

- Week 1 - 12th January
*(Lunch starts at 12:30 instead)*- Jere Koskela (Warwick) - "Scaling limits of sequential Monte Carlo algorithms"

- Abstract: Sequentual Monte Carlo (SMC) methods constitute a broad class of numerical approximation schemes for non-linear smoothing and filtering, rare event simulation, and many other applications. In brief, an ensemble of weighted particles is used to approximate a sequence of target densities. The ensemble is evolved from one target to the next by first resampling a new generation of particles proportional to the weights of the current ensemble, and then moving and reweighting the new generation in a manner which preserves the target. The resampling step induces a genealogy between particles, which has played a role in estimating mixing rates of SMC algorithms, variance of SMC estimates, as well as the memory cost of storing SMC output. The genealogy also plays a crucial role in the mixing of the particle Gibbs algorithm. I will briefly introduce SMC algorithms, and show that appropriately rescaled SMC genealogies converge to a well-understood process known as the Kingman coalescent under strong but standard assumptions in the infinite particle limit. This facilitates the a priori estimation of genealogical quantities, which I will demonstrate by showing that simulated genealogies match predicted results even for relatively few particles, and when the assumptions that are needed to prove convergence fail.

- Week 2 - 19th January - Kari Heine (Bath) - "Parallelising particle filters with butterfly interactions"
- Abstract: In modern computing systems an increase in the computational power is primarily obtained by increasing the number of parallel processing elements (PE) rather than by increasing the speed of an individual PE. While in many cases such parallel computing systems have enabled the completion of increasingly complex computational tasks, they can only do so if the task in question admits parallel computations. This talk focuses on an important class of algorithms lacking such inherent parallelism, namely the sequential Monte Carlo (SMC) methods, or particle filters. We consider some new parallel particle filter algorithms whose interaction diagram coincides with the butterfly diagram best known from the context of fast Fourier transform. We present some new convergence results and consider the potential of these algorithms based on numerical experiments. Joint work with Nick Whiteley (University of Bristol) and Ali Taylan Cemgil (Bogazici University).

- Week 3 - 26th January - Shahin Tavakoli (Warwick) - Talk Title / Abstract TBC

- Week 4 - 2nd February - Hongsheng Dai (Essex) - Talk Title / Abstract TBC

- Week 5 - 9th February - Massimo Cavallaro (Warwick) - Talk Title / Abstract TBC

- Week 6 - 16th February - David Jennings (Oxford) - Talk Title / Abstract TBC

- Week 7 - 23rd February - Andi Wang (Oxford) - Talk Title / Abstract TBC

- Week 8 - 2nd March - Patrick Rebeschini (Oxford) - Talk Title / Abstract TBC

- Week 9 - 9th March - Brooks Paige (ATI) - Talk Title / Abstract TBC

- Week 10 - 16th March - * Extended / Double session 11:15am-2pm *
- 11:15 - 12:15: Denis Villemonais (École des Mines de Nancy) - Talk Title / Abstract TBC
- 12:45 - 2:00: Louis Aslett (Durham) - Talk Title / Abstract TBC

**2017/18 Term 3:**

- Week 1 - 27th April - Maria Lomeli (Cambridge) - Talk Title / Abstract TBC

- Week 2 - 4th May - Jack Baker (Lancaster) - Talk Title / Abstract TBC

- Week 3 - 11th May - Nick Whiteley (Bristol) - Talk Title / Abstract TBC

- Week 4 - 18th May - Stephen Connor (York) - Talk Title / Abstract TBC

- Week 5 - 25th May - Valerio Perrone (Warwick) - Talk Title / Abstract TBC

- Week 6 - 1st June -
**Available**

- Week 7 - 8th June -
**Available**

- Week 8 - 15th June -
**Available**

- Week 9 - 22nd June -
**Available**

- Week 10 - 29th June -
*Cancelled (co-incides with 2018 ISBA World Meeting, Edinburgh, 24th-29th June 2018)* *Week 11 - 1st-6th July - MCQMC, Rennes, France**Week 12 - 9th-13th July - LMS Invited Lecture Series / CRISM Summer School on Computational Statistics*

**Previous Years:**

**Some key phrases**:

- Sampling and inference for diffusions

- Exact algorithms

- Intractable likelihood

- Pseudo-marginal algorithms

- Particle filters

- Importance sampling

- MCMC

- Adaptive MCMC

- Perfect simulation

- Markov chains...

- Random structures...

- Randomised algorithms...