# Algorithms & Computationally Intensive Inference seminars

**Terms 1-3, Location C1.06, Fridays 12:15-14:00 (12:15-12:45 is an informal sandwich lunch).**

*Reminder emails are not sent to participants unless there is a change to the scheduled programme at short notice. If you would like to speak, or you want to be included in any emails, please contact one of the organisers.*

**Current Organisers**: Murray Pollock, Dootika Vats

*If you would like to talk, or have ideas for possible speakers, then please email one of the organisers above.*

**Website URL:** www.warwick.ac.uk/compstat

**Mailing List Sign-Up:** http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar

**Mailing List:** algorithmseminar@listserv.csv.warwick.ac.uk **(NB - only approved members can post)**

**2017/18 Term 1:**

**Week 1 - 6th October** - Axel Finke (UCL) - "On embedded hidden Markov models and particle Markov chain Monte Carlo methods"

**Abstract:** The embedded hidden Markov model (EHMM) sampling method is an MCMC technique for state inference in non-linear, non-Gaussian state-space models which was proposed in Neal (2003) and Neal et al. (2004) and extended in Shestopaloff & Neal (2016). An extension to Bayesian parameter inference was presented in Shestopaloff & Neal (2013). An alternative class of MCMC schemes addressing similar inference problems is provided by particle MCMC (PMCMC) methods (Andrieu et al. 2009; 2010). All these methods rely on the introduction of artificial extended target distributions for multiple state sequences which, by construction, are such that one randomly indexed sequence is distributed according to the posterior of interest. By adapting the framework of PMCMC methods to the EHMM framework, we obtain novel particle filter (PF)-type algorithms for state inference, related to the class of "sequential MCMC" algorithms (e.g. Septier & Peters (2016)), and parameter inference schemes. In addition, we show that most of these algorithms can be viewed as particular cases of a general PF and PMCMC framework. We demonstrate that a properly tuned conditional PF with "local" MCMC moves proposed in Shestopaloff & Neal (2016) can significantly outperform the standard conditional PF when applied to high-dimensional state-space models. We also derive theoretical guarantees for the novel (unconditional) PF-type algorithm and discuss why it could serve as an interesting alternative to standard PFs for likelihood estimation. This is joint work with Arnaud Doucet and Adam M. Johansen.

**Week 2 - 13th October** - Wilfrid Kendall (Warwick) - "Dirichlet forms and MCMC"

**Abstract:** In this talk I will discuss the use of Dirichlet forms to deliver proofs of optimal scaling results for Markov chain Monte Carlo algorithms (specifically, Metropolis-Hastings random walk samplers) under regularity conditions which are substantially weaker than those required by the original approach (based on the use of infinitesimal generators). The Dirichlet form method has the added advantage of providing an explicit construction of the underlying infinite-dimensional context. In particular, this enables us directly to establish weak convergence to the relevant infinite-dimensional diffusion.

**Reference:** Zanella, G., Bédard, M., & Kendall, W. S. (2016). A Dirichlet Form approach to MCMC Optimal Scaling. To appear in Stochastic Processes and Their Applications. URL: arxiv.org/abs/1606.01528.

**Week 3 - 20th October** - Jure Vogrinc (Imperial) - "Asymptotic variance for Random walk Metropolis chains in high dimensions: logarithmic growth via the Poisson equation"

**Abstract:** There are two ways of speeding up MCMC algorithms: (1) construct more complex samplers that use gradient and higher-order information about the target and (2) design a control variate to reduce the asymptotic variance. The efficiency of (1) as a function of dimension has been studied extensively. The talk will focus on analogous results for (2): rigorous results linking the growth of the asymptotic variance with dimension. Specifically, for a d-dimensional Random walk Metropolis chain with an IID target I will present a control variate for which the asymptotic variance of the corresponding estimator is bounded by a multiple of (log d)/d over the spectral gap of the chain. The control variate is constructed using the solution of the Poisson equation for the scaling limit in the seminal paper "Weak convergence and optimal scaling of random walk Metropolis algorithms" of Gelman, Gilks and Roberts. I will present the ideas behind the proof and discuss potential extensions and applications of the result.
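For context, the random walk Metropolis chain the talk analyses can be sketched in a few lines, using the classical l/sqrt(d) proposal scaling with l ≈ 2.38 for an IID product target. All settings below (target, dimension, step size) are illustrative, not taken from the talk.

```python
import numpy as np

def rwm(log_target, x0, n_iter, scale, seed=1):
    """Random walk Metropolis with isotropic Gaussian proposals."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    lp = log_target(x)
    chain = np.empty((n_iter, len(x)))
    accepts = 0
    for i in range(n_iter):
        prop = x + scale * rng.standard_normal(len(x))
        lp_prop = log_target(prop)
        # Metropolis accept/reject step
        if np.log(rng.random()) < lp_prop - lp:
            x, lp = prop, lp_prop
            accepts += 1
        chain[i] = x
    return chain, accepts / n_iter

# IID standard-normal product target in dimension d, with the
# optimal-scaling step size 2.38 / sqrt(d).
d = 20
chain, acc = rwm(lambda x: -0.5 * x @ x, np.zeros(d), 5000, 2.38 / np.sqrt(d))
```

With this scaling, the acceptance rate settles near the well-known 0.234 value as d grows.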

**Week 4 - 27th October** - Andrew Duncan (Sussex) - "Measuring Sample Quality with Diffusions"

**Abstract:** To improve the efficiency of Monte Carlo estimators, practitioners are turning to biased Markov chain Monte Carlo procedures that trade off asymptotic exactness for computational speed. While a reduction in variance due to more rapid sampling can outweigh the bias introduced, the inexactness creates new challenges for parameter selection. In particular, standard measures of sample quality, such as effective sample size, do not account for asymptotic bias. To address these challenges, we introduce a new computable quality measure based on Stein's method that quantifies the maximum discrepancy between sample and target expectations over a large class of test functions. We demonstrate this tool by comparing exact, biased, and deterministic sample sequences and illustrate applications to hyperparameter selection, convergence rate assessment, and quantifying bias-variance tradeoffs in posterior inference.
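A toy version of such a Stein-based quality measure is the kernelized Stein discrepancy. The sketch below (a simplified variant for a one-dimensional standard normal target with an RBF kernel; the kernel, bandwidth, and target are illustrative and not the diffusion-based measure of the talk) shows how a biased sample is flagged without ever normalising the target:

```python
import numpy as np

def ksd_gauss(xs, h=1.0):
    """V-statistic estimate of a kernelized Stein discrepancy between a
    1-D sample and a standard normal target (score s(x) = -x), using an
    RBF kernel k(x, y) = exp(-(x - y)^2 / (2 h^2))."""
    xs = np.asarray(xs, dtype=float)
    d = xs[:, None] - xs[None, :]            # pairwise differences x - y
    k = np.exp(-d**2 / (2 * h**2))           # kernel matrix
    s = -xs                                  # score of N(0, 1)
    kx = -d / h**2 * k                       # d/dx k(x, y)
    ky = d / h**2 * k                        # d/dy k(x, y)
    kxy = (1 / h**2 - d**2 / h**4) * k       # d^2/(dx dy) k(x, y)
    # Stein kernel: u(x,y) = s(x)s(y)k + s(x)ky + s(y)kx + kxy
    u = s[:, None] * s[None, :] * k + s[:, None] * ky + s[None, :] * kx + kxy
    return u.mean()

rng = np.random.default_rng(0)
good = ksd_gauss(rng.standard_normal(500))        # sample from the target
bad = ksd_gauss(rng.standard_normal(500) + 1.5)   # biased (shifted) sample
```

The discrepancy for the shifted sample is markedly larger, even though both samples have the same internal variability.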

**Week 5 - 3rd November** - Mini Talks

- Talk 1 - Cyril Chimisov (Warwick) - Internship experience at Google
- Talk 2 - Lewis Rendell (Warwick) - Global consensus Monte Carlo
**Abstract:** For problems involving large data sets, it is often convenient or necessary to distribute the data across multiple processors. Consider a target probability density function expressed as a product of terms, each being the contribution of one such subset of the data, with an additional 'prior' factor. We introduce an instrumental hierarchical model, associating an instrumental variable with each subset; these variables are conditionally independent when conditioned on a top-level parameter. This permits the construction of an MCMC algorithm on an extended state space, yielding an approximation of the target density as a marginal of its invariant distribution. Specifying the conditional distribution of the artificial variables allows the trade-off between computational tractability and fidelity to the original model to be controlled. In contrast to similar such algorithms, this approach requires few distributional assumptions; we illustrate its performance on some simulated examples, and suggest a number of directions for future investigation.
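For context, the simpler "embarrassingly parallel" combination strategy that such extended-state-space approaches aim to improve on is the consensus Monte Carlo rule: each worker samples its own subposterior, and draws are merged by a precision-weighted average. The sketch below uses purely illustrative Gaussian subposteriors (all means and variances are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 10_000

# Hypothetical subposterior draws from four workers: Gaussians with
# different means and variances, standing in for MCMC output on each shard.
draws = [rng.normal(m, s, n_draws)
         for m, s in [(0.2, 1.0), (-0.1, 1.2), (0.3, 0.9), (0.0, 1.1)]]

# Consensus combination: weight each worker's draws by its empirical precision.
weights = np.array([1 / np.var(d) for d in draws])
combined = sum(w * d for w, d in zip(weights, draws)) / weights.sum()
```

For Gaussian subposteriors this weighted average is exact (the product of the Gaussians has precisely the precision-weighted mean and summed precision); for non-Gaussian targets it is only an approximation, which motivates the fewer-assumptions approach described above.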

- Talk 3 - Murray Pollock (Warwick) - Contemporaneous MCMC

**Week 6 - 10th November** - Thibaut Lienart (Oxford) - "Expectation propagation for distributed approximated Bayesian inference"

**Abstract:** In this talk, I will present the Expectation Propagation (EP) algorithm, compare it to Variational Bayes, and explain how it can be used in a generic distributed inference setting following [1]. In particular, I will re-derive the EP updates in a fixed-point framework similar to mirror descent algorithms, exposing how the geometry of the problem can be used to obtain updates that are more robust to Monte Carlo noise.

The talk will be concluded by a critical discussion of variational methods in Bayesian inference.

**Week 7 - 17th November** - Short Talks

- Talk 1 - Arne Gouwy (OxWaSP) - "Addressing continuous state space Stochastic Optimal Control problems with Iterative Auxiliary Particle Filters"
**Abstract:** There is a class of dynamic Stochastic Optimal Control (SOC) problems that can be reformulated, by leveraging Feynman-Kac-type transformations, as the minimisation of a Kullback-Leibler divergence. This representation provides an expression for the probability distribution of an optimally controlled system that can be evaluated pointwise up to an intractable normalising constant. Since it has the form of a reweighted probability distribution of the controlled system, estimating the optimal cost-to-go now appears as an estimation problem, which enables the use of stochastic simulation techniques. Moreover, a better choice of control corresponds to a Radon-Nikodym derivative with lower variance, so better proposal distributions for approximating the optimally controlled system correspond to better approximations of the optimal control. In this talk, we describe this class of SOC problems and outline one form of the duality between such SOC problems and estimation problems. We finally sketch how iterative auxiliary particle filters can be used to tackle the resulting estimation problems, which typically exhibit high variance. This addresses SOC problems with a continuous state space of moderate dimension. We investigated this approach within a two-month mini-project as part of the OxWaSP doctoral training programme.

- Talk 2 - Marcin Mider (OxWaSP) - "Parameter inference for discretely observed diffusions - reducing computational cost with the 'Blocking' technique"
**Abstract:** Diffusions find applications in many areas of science, including chemistry, genetics and finance. Statistical inference for these processes is complicated by the intractability of the likelihood function. In this talk we focus on an unbiased method of inference (where the sole source of error is due to MCMC) introduced by Sermaidis et al. (2011). We propose to combine it with the blocking technique (Shephard and Pitt 1997), thereby extending the applicability of the Sermaidis et al. (2011) algorithm to sparsely observed diffusions. We provide partial results towards proving the rates of reduction in the computational cost.

**Week 8 - 24th November** - Ashley Ford (Bristol) - "Transform or Bounce - MCMC and HMC on constrained support"

**Abstract:** Constraints on the support of a distribution can be dealt with either by transforming or by "bouncing" off the constraint; in high dimensions, care is needed in the choice. Inference for the infection times in the SIR epidemic model gives such an example, where exact inference using HMC is possible although the likelihood is discontinuous and has constraints. Various approaches have been tried for this application; the reasons for the poor performance of some of them will be described, and some comparisons of a successful approach with that of Xiang and Neal [1] will be given. Some open theoretical questions remain.

- [1] Xiang, Fei and Neal, Peter, "Efficient MCMC for temporal epidemics via parameter reduction", Computational Statistics & Data Analysis, 2014.

**Week 9 - 1st December** - Richard Wilkinson (Sheffield) - "Gaussian process accelerated ABC"

**Abstract:** Approximate Bayesian computation (ABC) is a class of algorithms used for doing Bayesian inference when you do not have access to the likelihood function. A major challenge for ABC methods is dealing with the computational cost that arises from needing to repeatedly run the simulator. In this talk, I will discuss surrogate modelling approaches to speeding up ABC, in which we seek to approximate the behaviour of the simulator, before inverting it to learn the parameters. The challenges for these types of approaches include: what should our target of approximation be? How should we build the surrogate? What design is best used for the simulator evaluation? And is a fully-probabilistic Bayes still the most sensible choice of inferential approach?
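The computational cost that surrogate approaches aim to remove is visible in even the most basic rejection-ABC scheme, which calls the simulator once per proposed parameter. The toy setup below (Normal data with an unknown mean, sample-mean summary statistic, uniform prior; all choices illustrative) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Observed" data and its summary statistic (the sample mean).
observed = rng.normal(2.0, 1.0, size=100)
s_obs = observed.mean()

def simulator(theta, rng):
    """Toy simulator: 100 draws from Normal(theta, 1), summarised by the mean."""
    return rng.normal(theta, 1.0, size=100).mean()

# Rejection ABC: keep prior draws whose simulated summary lands within
# tolerance epsilon of the observed summary. Every proposal costs one
# simulator run, which is exactly what surrogate models try to avoid.
accepted = []
for _ in range(20_000):
    theta = rng.uniform(-5, 5)                      # draw from the prior
    if abs(simulator(theta, rng) - s_obs) < 0.1:    # tolerance epsilon = 0.1
        accepted.append(theta)

posterior_mean = np.mean(accepted)
```

Only a small fraction of the 20,000 simulator runs are kept; a surrogate of the simulator's behaviour lets most of that budget be reallocated.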

**Week 10 - 8th December** - Yvo Pokern (UCL) - "Gibbs Flow for Approximate Transport with Applications to Bayesian Computation"

**Abstract:** Given a prior distribution μ and a posterior distribution π on ℝ^d, any measurable function T such that T(X) follows π if X follows μ is called a transport map from μ to π. If such a map were available analytically, one could simply map samples from the prior to samples from the posterior. Although it is usually impossible to obtain an explicit transport map for complex target distributions, it is shown here how to build a tractable approximation of a novel transport map. This is achieved by moving samples from μ using an ordinary differential equation whose drift is a function of the full conditional densities of the target distribution. Even when this ordinary differential equation is time-discretized and the full conditional distributions are approximated numerically, the resulting distribution of the mapped samples can be evaluated and used as a proposal/importance distribution within Markov chain Monte Carlo and sequential Monte Carlo schemes. It is shown experimentally that it can significantly improve performance of state-of-the-art sequential Monte Carlo methods for fixed computational complexity, with examples up to high dimension.

- Reference: https://arxiv.org/abs/1509.08787


**2017/18 Term 2:**

- Week 1 - 12th January *(Lunch starts at 12:30 instead)* - Jere Koskela (Warwick) - "Scaling limits of sequential Monte Carlo algorithms"

- Abstract: Sequential Monte Carlo (SMC) methods constitute a broad class of numerical approximation schemes for non-linear smoothing and filtering, rare event simulation, and many other applications. In brief, an ensemble of weighted particles is used to approximate a sequence of target densities. The ensemble is evolved from one target to the next by first resampling a new generation of particles proportional to the weights of the current ensemble, and then moving and reweighting the new generation in a manner which preserves the target. The resampling step induces a genealogy between particles, which has played a role in estimating mixing rates of SMC algorithms, variance of SMC estimates, as well as the memory cost of storing SMC output. The genealogy also plays a crucial role in the mixing of the particle Gibbs algorithm. I will briefly introduce SMC algorithms, and show that appropriately rescaled SMC genealogies converge to a well-understood process known as the Kingman coalescent under strong but standard assumptions in the infinite particle limit. This facilitates the a priori estimation of genealogical quantities, which I will demonstrate by showing that simulated genealogies match predicted results even for relatively few particles, and when the assumptions that are needed to prove convergence fail.
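The genealogy induced by resampling can be recorded explicitly with a few extra lines in a bootstrap particle filter. The sketch below (a linear-Gaussian state-space model with made-up parameters, chosen only so the weights are easy to write down) stores the ancestor index chosen at each resampling step and then traces one lineage backwards:

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 50, 200                 # time steps, particles

# Simulate data from x_t = 0.9 x_{t-1} + v_t,  y_t = x_t + 0.5 w_t.
x_true = np.zeros(T)
for t in range(1, T):
    x_true[t] = 0.9 * x_true[t - 1] + rng.standard_normal()
y = x_true + 0.5 * rng.standard_normal(T)

particles = rng.standard_normal(N)
ancestors = np.zeros((T, N), dtype=int)
for t in range(T):
    # Weight by the observation density, resample, then propagate.
    w = np.exp(-0.5 * (y[t] - particles) ** 2 / 0.25)
    a = rng.choice(N, size=N, p=w / w.sum())        # multinomial resampling
    ancestors[t] = a                                # record the genealogy
    particles = 0.9 * particles[a] + rng.standard_normal(N)

# Trace one surviving particle's lineage back through the resampling steps.
lineage = [0]
for t in range(T - 1, 0, -1):
    lineage.append(ancestors[t][lineage[-1]])
```

It is precisely the tree formed by these ancestor indices whose rescaled limit the talk identifies as the Kingman coalescent.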

- Week 2 - 19th January - Kari Heine (Bath) - "Parallelising particle filters with butterfly interactions"
- Abstract: In modern computing systems an increase in the computational power is primarily obtained by increasing the number of parallel processing elements (PE) rather than by increasing the speed of an individual PE. While in many cases such parallel computing systems have enabled the completion of increasingly complex computational tasks, they can only do so if the task in question admits parallel computations. This talk focuses on an important class of algorithms lacking such inherent parallelism, namely the sequential Monte Carlo (SMC) methods, or particle filters. We consider some new parallel particle filter algorithms whose interaction diagram coincides with the butterfly diagram best known from the context of fast Fourier transform. We present some new convergence results and consider the potential of these algorithms based on numerical experiments. Joint work with Nick Whiteley (University of Bristol) and Ali Taylan Cemgil (Bogazici University).

- Week 3 - 26th January - Shahin Tavakoli (Warwick) - "A Spatial Modeling Approach for Linguistic Object Data: Analysing dialect sound variations across Great Britain"
- Abstract: Dialect variation is of considerable interest in linguistics and other social sciences. However, traditionally it has been studied using proxies (transcriptions) rather than acoustic recordings directly. We introduce novel statistical techniques to analyse geolocalised speech recordings and to explore the spatial variation of pronunciations continuously over the region of interest, as opposed to traditional isoglosses, which provide a discrete partition of the region. Data of this type require an explicit modeling of the variation in the mean and the covariance. Usual Euclidean metrics are not appropriate, and we therefore introduce the concept of d-covariance, which allows consistent estimation both in space and at individual locations. We then propose spatial smoothing for these objects which accounts for the possibly non-convex geometry of the domain of interest. We apply the proposed method to data from the spoken part of the British National Corpus, deposited at the British Library, London, and we produce maps of the dialect variation over Great Britain. In addition, the methods allow for acoustic reconstruction across the domain of interest, allowing researchers to listen to the statistical analysis. This is joint work with Davide Pigoli and John Aston (Cambridge), and John Coleman (Oxford).

- Week 4 - 2nd February - Hongsheng Dai (Essex) - "Recent advances in Bayesian computation -- Monte Carlo fusion"
- Abstract: The talk will present an exact Monte Carlo fusion simulation method via path-space diffusion sampling. The new method is a simple rejection sampler. It can be used both in situations where analyses are distributed naturally as a consequence of a given application (for instance, distributed privacy, expert elicitation or multi-view learning), and artificially through design to aid inference (Bayesian statistics). The approach is also a substantial extension of existing consensus Monte Carlo approaches. The fusion rejection sampler may be limited in certain applications by a low acceptance probability; a forward sequential Monte Carlo method will then be presented to solve more practical problems. This forward algorithm offers exciting possibilities for application to other areas.

- Week 5 - 9th February - Massimo Cavallaro (Warwick) - "The "cloning" approach to large deviations"
- Abstract: Non-equilibrium physics is mainly concerned with the study of time-extensive observables, also called currents, which measure the violation of detailed balance. While the observed average currents are doomed to converge to their typical values as the time approaches infinity, there are many situations in which the behaviour that matters is the elusive atypical one. We consider the problem of probing large deviations of such currents and discuss a particle method, which in the Physics literature is referred to as the "cloning" method. More specifically, we focus on its implementation for continuous-time non-Markovian processes. The method is demonstrated on selected models and applications.

- Week 6 - 16th February - David Jennings (Oxford) - "The Quantum Bernoulli Factory"
- Abstract: The von Neumann coin trick solves the following: given access to a coin of unknown bias p, can we simulate a perfectly unbiased coin-flip? The Bernoulli Factory generalises this problem even further: given access to p-coins of unknown bias, when can we simulate an f(p)-coin? Here I will discuss how the Bernoulli Factory problem can be extended into a quantum-mechanical form, and will show that quantum theory allows for a provably larger class of functions than is possible with traditional means. Notably, I will show how the probability amplification function f(p) = 2p can be realised exactly with quantum coins, despite being impossible with classical coins. I will discuss the strengths and weaknesses of the framework, and discuss how these results could be used in the development of classically intractable simulations with existing quantum technologies. No prior knowledge of quantum mechanics will be assumed.

- Week 7 - 23rd February - Andi Wang (Oxford) - "Quasistationary Monte Carlo Methods"
- Abstract: The recent Scalable Langevin Exact (ScaLE) Algorithm is a promising Monte Carlo algorithm for performing exact Bayesian inference on large data sets. It is built on the theory of killed Markov processes conditioned on survival - the theory of quasistationarity. In my talk I will introduce quasistationarity and discuss conditions for the quasistationary distribution of a killed diffusion to coincide with a target density of interest. I will finally discuss an alternative stochastic simulation approach, dubbed ReScaLE, and describe steps towards proving its convergence.

- Week 8 - 2nd March - Patrick Rebeschini (Oxford) - Cancelled due to bad weather.
- Week 9 - 9th March - Brooks Paige (ATI) - "Amortizing the computational costs of Bayesian inference with neural networks as proposal distributions"
- Abstract: In this talk, I will describe how deep neural network models for approximating conditional probability distributions can be leveraged to automate the design of model-specific proposal distributions for use in importance sampling, sequential Monte Carlo, and Markov chain Monte Carlo. The central idea is to learn these proposals for a given probabilistic generative model offline, prior to performing inference on any particular dataset. The learned proposals can then be reused as desired, allowing sampling-based inference to be performed quickly and efficiently in the same probabilistic model, but for new data — that is, for new settings of the observed random variables — once we have incurred the up-front cost of learning the proposals. The procedure as a whole takes a specified model as its input and generates an artifact that can be leveraged for accelerating future inference tasks. I will additionally describe a new in-progress application to single-cell RNA sequencing data.

- Week 10 - 16th March - *Extended / double session, 11:15am-2:00pm*
- 11:15 - 12:15: Louis Aslett (Durham) - "Contemporaneous MCMC"
- Abstract: Some population-based methods are amenable to highly parallel computation, but these methods typically don't help with the choice of transition kernel for the underlying MCMC scheme. We present a framework which allows development of adaptive methods for population schemes in a way that avoids the usual containment condition and diminishing adaptation requirements, and which allows fully asynchronous implementation across multiple GPUs. These methods therefore achieve iterative performance which scales nearly linearly but incorporates adaptation. Joint work with Murray Pollock and Gareth Roberts.

- 12:45 - 2:00: Denis Villemonais (École des Mines de Nancy) - "Non-failable approximation method for conditioned distributions"
- Abstract: We present a new approximation method for conditional distributions, which is non-failable and, under mild conditions, which converges uniformly in time. Our results are illustrated by their application to a neutron transport model. This is a joint work with William Oçafrain.


**2017/18 Term 3:**

- Week 1 - 27th April - Matteo Fasiolo (Bristol) - "Fast calibrated additive quantile regression"
- Abstract: Generalized Additive Models (GAMs) are an extension of Generalized Linear Models (GLMs) which allow the inclusion of random effects, complex non-linear effects (built using spline bases) and, more recently, response distributions outside the exponential family. In this talk I will describe a computationally efficient Bayesian framework for fitting quantile GAMs, which are based on the pinball loss rather than on a probabilistic response distribution. The new hierarchical framework selects both the smoothing parameter and the so-called "learning rate" automatically and efficiently, and provides posterior credible intervals that are approximately calibrated in a frequentist sense. I will illustrate the new methods in the context of electricity demand forecasting, where they provide predictive performance competitive with that of gradient boosting (GB), but at a fraction of GB's computational cost.
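The pinball (check) loss that underlies quantile regression is simple to state: for quantile level tau it penalises under- and over-prediction asymmetrically, and its expectation is minimised by the tau-quantile of the response. A quick numerical check (with an illustrative standard-normal response):

```python
import numpy as np

def pinball(y, mu, tau):
    """Mean pinball loss of constant prediction mu at quantile level tau."""
    u = y - mu
    return np.mean(np.maximum(tau * u, (tau - 1) * u))

rng = np.random.default_rng(0)
y = rng.standard_normal(100_000)

# Scan candidate constant predictions: the minimiser of the tau = 0.9
# pinball loss sits near the 0.9-quantile of N(0, 1), about 1.2816.
grid = np.linspace(-3, 3, 601)
losses = [pinball(y, m, tau=0.9) for m in grid]
best = grid[np.argmin(losses)]
```

Fitting a quantile GAM replaces the constant mu with a sum of smooth functions, but the loss being minimised is exactly this one.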

- Week 2 - 4th May - Jack Baker (Lancaster) - "Stochastic Sampling from the Probability Simplex"
- Abstract: Traditional Markov chain Monte Carlo algorithms tend to perform prohibitively slowly on large datasets. Many modern models rely on sampling from simplex spaces, for example when modelling discrete probability distributions. Patterson and Teh (2013) developed a sampler for simplex spaces which can be applied in a large data setting. They apply this method to latent Dirichlet allocation, a popular document clustering method, and achieve state-of-the-art performance. Though this algorithm achieved promising results, we find it can perform poorly for components on the simplex that are close to zero. We revisit this problem and develop an alternative method to sample from simplex spaces that has no discretisation error and achieves a large improvement in performance, especially when there are many components close to zero. The simple form of the method also means we can theoretically quantify the performance of the sampler more easily than the sampler developed by Patterson and Teh (2013).
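A useful reference point (not the talk's algorithm) is the exact Gamma representation of the Dirichlet distribution on the simplex: if X_i ~ Gamma(alpha_i, 1) independently, then X / sum(X) ~ Dirichlet(alpha). The sparse regime the abstract highlights corresponds to small alpha_i, which pushes components towards zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Mostly near-zero components (small alpha), the regime where discretised
# simplex samplers are reported to struggle; values are illustrative.
alpha = np.array([0.1, 0.1, 0.1, 5.0])

# Normalised independent Gammas give exact Dirichlet draws on the simplex.
x = rng.gamma(alpha, 1.0, size=(10_000, 4))
simplex = x / x.sum(axis=1, keepdims=True)

mean = simplex.mean(axis=0)      # approximately alpha / alpha.sum()
```

Stochastic-gradient samplers must reproduce this kind of boundary-concentrated behaviour from minibatches, which is where discretisation error bites.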

- Week 3 - 11th May - Valerio Perrone - "Scalable Hyperparameter Transfer Learning" *(in C0.08)*
- Abstract: Bayesian optimization (BO) is a model-based approach for gradient-free black-box function optimization, such as hyperparameter optimization. Typically, BO is powered by a conventional Gaussian process, whose algorithmic complexity is cubic in the number of evaluations. Hence, Gaussian process based BO cannot leverage large amounts of past or related function evaluations, for example, to warm-start related BO runs. We propose a multi-task adaptive Bayesian linear regression model, whose complexity is linear in the number of function evaluations: one Bayesian linear regression model is associated to each black-box function optimization problem (or task), while transfer learning is achieved by coupling the models through a shared deep neural net. Experimental results show that the neural net learns a representation suitable for transfer learning between the black-box optimization problems and that BO runs can be accelerated when the target black-box function (e.g., validation loss) is learned together with other related signals (e.g., training loss).
- arXiv: https://arxiv.org/pdf/1712.02902.pdf
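The linear-complexity building block here is Bayesian linear regression over a fixed feature map: the posterior over weights needs only a d × d matrix inverse, where d is the feature dimension, regardless of how many evaluations have accumulated. A minimal sketch (fixed quadratic features standing in for the shared neural-net representation; all settings illustrative):

```python
import numpy as np

def blr_posterior(Phi, y, noise_var=0.1, prior_var=1.0):
    """Posterior mean and covariance of w for y = Phi @ w + Gaussian noise,
    under an isotropic Gaussian prior on w."""
    d = Phi.shape[1]
    prec = Phi.T @ Phi / noise_var + np.eye(d) / prior_var   # posterior precision
    cov = np.linalg.inv(prec)                                # d x d inverse only
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 1))
Phi = np.hstack([np.ones_like(X), X, X**2])   # fixed basis: 1, x, x^2

# Noisy evaluations of a quadratic "black-box" objective.
y = (0.5 - 2.0 * X[:, 0] + 1.5 * X[:, 0] ** 2) + 0.1 * rng.standard_normal(50)
w_mean, w_cov = blr_posterior(Phi, y, noise_var=0.01)
```

In the multi-task model described above, each task gets its own such weight posterior, while the feature map Phi is produced by a neural net shared across tasks.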

- Week 4 - 18th May - *Cancelled (coincides with departmental meeting).*
- Week 5 - 25th May - Nigel Burroughs (Warwick) - "Fitting mechanical models to spatial time series biological data"
- Abstract: Biological imaging data, time-series 3D data in particular, poses significant challenges to interpretation. Mechanically inspired biophysical models can be a valuable tool in that enterprise, but the fitting of such models (often stochastic differential equations) to imaging-derived data can be very challenging from a computational statistics perspective. Here I will present the Bayesian analysis of chromosome oscillations during cell division through model-dependent trajectory annotation and force inference, which has led to a new (data-driven) model to explain how chromosomes use force sensing to switch direction during their oscillation. The fitting of this force-sensing model demonstrates that chromosome trajectory data contains information not only on mechanical processes, but also on the inherent signalling pathways that control switching behaviour.

- Week 6 - 1st June - Jon Cockayne (Warwick) - Talk Title / Abstract TBC

- Week 7 - 8th June - François Caron (Oxford) - Talk Title / Abstract TBC

- Week 8 - 15th June - Helen Ogden (Southampton) - Talk Title / Abstract TBC

- Week 9 - 22nd June - *Cancelled (coincides with i-like Workshop in Newcastle, 20th-22nd June 2018)*

- Week 10 - 29th June - *Cancelled (coincides with 2018 ISBA World Meeting, Edinburgh, 24th-29th June 2018)*

- Week 11 - 1st-6th July - *MCQMC, Rennes, France*

- Week 12 - 9th-13th July - *LMS Invited Lecture Series / CRISM Summer School on Computational Statistics*

**Previous Years:**

**Some key phrases**:

- Sampling and inference for diffusions

- Exact algorithms

- Intractable likelihood

- Pseudo-marginal algorithms

- Particle filters

- Importance sampling

- MCMC

- Adaptive MCMC

- Perfect simulation

- Markov chains...

- Random structures...

- Randomised algorithms...