# Algorithms & Computationally Intensive Inference seminars

**Terms 1-3, Location MB2.23, Fridays 12:15-14:00 (12:15-12:45 is an informal sandwich lunch). **

*Reminder emails are not sent to participants unless there is a change to the scheduled programme at short notice. If you would like to speak, or you want to be included in any emails, please contact one of the organisers.*

**Current Organisers**: Dootika Vats,

*If you would like to talk, or have ideas for possible speakers, then please email one of the organisers above.*

**Website URL:** www.warwick.ac.uk/compstat

**Mailing List Sign-Up:** http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar

**Mailing List:** algorithmseminar@listserv.csv.warwick.ac.uk **(NB - only approved members can post)**

**2018/19 Term 1:**

- Week 1 - 5th October - Cancelled
- Week 2 - 12th October - Summary of research short talks:
- Week 3 - 19th October - Summary of research short talks:

- Week 4 - 26th October - Giulio Morina (Warwick) - "From the Bernoulli Factory to a Dice Enterprise via Perfect Sampling of Markov Chains"
- Abstract: Given a p-coin that lands heads with unknown probability p, we wish to construct an algorithm that produces an f(p)-coin for a given function f:(0, 1)→(0, 1). This problem is commonly known as the Bernoulli

Factory and generic ways to design a practical algorithm for a given function f exist only in a few special cases. We present a constructive way to build an efficient Bernoulli Factory when f(p) is a rational function. Moreover, we extend the original problem to a more general setting where we have access to an m-sided

die and we wish to roll a v-sided one, where the probability of rolling each face is a fixed function of the original probabilties. We achieve this by perfectly simulating from the stationary distribution of a

certain class of Markov chains.

- Abstract: Given a p-coin that lands heads with unknown probability p, we wish to construct an algorithm that produces an f(p)-coin for a given function f:(0, 1)→(0, 1). This problem is commonly known as the Bernoulli
- Week 5 - 2nd November - Cancelled. CoSInES Launch Day.

- Week 6 - 9th November - Iker Perez (Nottingham) - "Novel approaches to efficiently augment a Markov Jump process for exact Bayesian Inference"
- Abstract: In this talk we will discuss foundational statistical challenges associated with families of Markovian jump models, which often find applications in domains such as genetics, epidemiology, mathematical biology or queueing theory. We will first review Markov jump processes, and by means of common accessible examples, illustrate the computational impediments posed by real-world application scenarios to inverse uncertainty quantification tasks. Then, we will give an overview of the recent advances linked to structured jump systems. Our work is concerned with building on uniformization procedures and we propose a novel efficient auxiliary-variable algorithm for data augmentation, which yields computationally tractable distributions suited for exact (in Monte Carlo sense) Bayesian inference in often large, infinite or multivariate population systems. We demonstrate the capabilities of the presented methods by drawing Bayesian inference for partially observed stochastic epidemics and show that it overcomes the limitations of existing vanilla approaches. This is joint work (in progress) with Theo Kypraios.

- Week 7 - 16th November - Ritabrata Dutta (Warwick) - "Well-Tempered Hamiltonian Monte Carlo on Active-Space"
- Abstract: When the gradient of the log-target distribution is available, Hamiltonian Monte

Carlo (HMC) has been proved to be an efficient simulation algorithm. However,

HMC performs poorly when the target is high-dimensional and has multiple isolated

modes. To alleviate these problems we propose to perform HMC on a locally and

continuously tempered target distribution. This tempering is based on an efficient

approach to simulate molecular dynamics in high-dimensional space, known as well-

tempered meta-dynamics. The tempering we suggest is performed locally and only

along the directions of the maximum changes in the target which we identify as

the active space of the target. The active space is the span of the eigenfunctions

corresponding to the dominant eigenvalues of the expected Hessian matrix of the

log-target. To capture the state dependent non-linearity of the target, we iteratively

estimate the active space from the most recent batch of samples obtained from the

simulation. Finally, we suggest a re-weighting scheme based on path-sampling to

provide importance weights for the samples drawn from the continuously-tempered

distribution. We illustrate the performance of this scheme for target distributions

with complex geometry and multiple modes on high-dimensional spaces in comparison

with traditional HMC with No-U-Turn-Sampler.

- Abstract: When the gradient of the log-target distribution is available, Hamiltonian Monte
- Week 8 - 23rd November - Stephen Connor (York) - "Omnithermal Perfect Simulation for Multi-server Queues"
- Abstract: The last few years have seen much progress in the area of perfect simulation algorithms for multi-server queueing systems, allowing us to sample from the exact equilibrium distribution of the Kiefer-Wolfowitz workload vector. This talk will describe an “omnithermal" variant of one such algorithm for M/G/c queues, which permits simultaneous sampling from the equilibrium distributions for a range of c (the number of servers) at relatively little additional cost.

- Week 9 - 30th November - Emilia Pompe (Oxford) - "Adaptive MCMC for Multimodal Distributions"
- Abstract: We propose a new Monte Carlo method for sampling from multimodal distributions (Jumping Adaptive Multimodal Sampler). The idea of this technique is based on splitting the task into two: finding the modes of the target distribution and sampling, given the knowledge of the locations of the modes. The sampling algorithm is based on steps of two types: local ones, preserving the mode, and jumps to a region associated with a different mode. Besides, the method learns the optimal parameters while it runs, without requiring user intervention. The main properties of our algorithm will be discussed and its performance will be illustrated with several examples of multimodal target distributions. Some ergodic results that we proved for this method will also be presented. This is joint work with Chris Holmes and Krys Latuszynski.

- Week 10 - 7th December - Flávio Gonçalves (UFMG) - "Exact Bayesian inference for Level-Set Cox Processes"
- Abstract: We consider a Level-Set spatial Cox process which assumes the intensity function to be piece-wise constant. The Level-Set approach considers the levels of a latent Gaussian Process to define the IF contours in a continuous and flexible way. This is an infinite dimensional model for which existing inference solutions rely on discrete approximations. This introduces a significant bias to the estimation procedure and, often, model decharacterisation. Attempts to mitigate the approximation problems inevitably lead to impractical computational costs. We propose a novel pseudo-marginal MCMC algorithm that has the exact posterior distribution as the target. The likelihood function estimator for the pseudo-marginal algorithm is devised through a Poisson estimator in which a non-centred parametrisation plays an important roll.

**2018/19 Term 2:**

- Week 1 - 11th January- Neil Chada (NUS) - "Title: Posterior convergence analysis of $\alpha$-stable processes: applications in Bayesian inversion"
- Abstract: This talk is concerned with the theoretical understanding of $\alpha$-stable sheets ${X}$. Our motivation for this is in the context of Bayesian inverse problems, where we consider the treatment of these processes as prior forms for parameter estimation. We derive various convergence results of these processes. In doing so we use a number of variants which these sheets can take, such as a stochastic integral representation, but also random series expansions through Poisson processes. Our convergence analysis will rely on the fact of whether ${X}$ omits $L^p$ sample paths, and if so how regular the paths are.

- Week 2 - 18th January - James Flegal (UC Riverside)
**(in room MB2.22)**- "Weighted batch means estimators in Markov chain Monte Carlo"- Abstract: We propose a family of weighted batch means variance estimators, which are computationally efficient and can be conveniently applied in practice. The focus is on Markov chain Monte Carlo simulations and estimation of the asymptotic covariance matrix in the Markov chain central limit theorem, where conditions ensuring strong consistency are provided. Finite sample performance is evaluated through auto-regressive, Bayesian spatial-temporal, and Bayesian logistic regression examples, where the new estimators show significant computational gains with a minor sacrifice in variance compared with existing methods.

- Week 3 - 25th January - Joint Session (Short Talks)
- Jure Vogrinc (Warwick) - “Skipping MCMC: A new approach to sampling tail events”
- Abstract: We will focus on the model problem of sampling from a given target distribution conditioned on the samples being outside of a given “common” set. This is of potential interest in a wide range of applications where a rare or atypical event needs to be understood. Standard MCMC methods may struggle in this setting as the resulting conditional target can easily be multi-modal and the MCMC method may struggle to cross the common set. I will present a modification, called the Skipping Random walk Metropolis, of a “parent” RWM, that instead of automatically rejecting the proposals into the common set tries to skip across it. It can be shown that under very mild conditions the Skipping RWM is actually just a RWM with a different proposal density and sufficient conditions for ergodicity and CLT are inherited from the parent RWM. Per step, Skipping RWM mixes at least as fast as the parent RWM and is also asymptotically at least as efficient. I will show some toy examples and discuss method’s applicability and the connections to other MCMC methods.

- Georgios Vasdekis (Warwick) - "A generalisation of the ZigZag process"
- Abstract: Piecewise Deterministic Markov Processes have recently drawn the attention of MCMC community. The first reason for that is that one can simulate exactly the entire path of such a process. The second is that these processes are non-reversible, which sometimes leads to quicker mixing. In 2016, in Bierkens and Roberts, one of these Processes, the ZigZag process was introduced. This process moves linearly in space in specific directions for a random period of time and then it changes direction. However, the original directions were only allowed to be of the form {-1,+1}^d. In this talk I will explain how one can extend this process to more directions. I will give some examples where such a generalisation could be useful, but I will, also, give some heuristic reasons why one should not introduce too many directions. I will, then, present results involving the ergodicity of this process, which extends to geometric ergodicity when the target distribution has light tails. Time permitting, I will sketch why one cannot have geometric ergodicity in the case of heavy tail target.

- Jure Vogrinc (Warwick) - “Skipping MCMC: A new approach to sampling tail events”
- Week 4 - 1st February - Mateusz Majka (Warwick) - "Sampling from invariant measures of SDEs: Non-asymptotic analysis via coupling"
- Abstract: We consider the problem of approximate sampling from invariant measures of Langevin stochastic differential equations. We first discuss a general method of analysing convergence of Euler discretisations of such equations in the Wasserstein distances, based on the coupling technique. In particular, we develop a way to control the distance between an Euler scheme and its perturbed version. We show how to apply our perturbation result to study the Unadjusted Langevin Algorithm (ULA) for approximate sampling from non-log-concave measures. We also consider its counterpart based on the Stochastic Gradient Langevin Dynamics (SGLD). Finally, we discuss how our techniques apply to the Multi-level Monte Carlo (MLMC) method for Euler schemes. The talk is based on joint work with Aleksandar Mijatovic (Warwick) and Lukasz Szpruch (Edinburgh).

- Week 5 - 8th February - Ioannis Kosmidis (Warwick) - "Towards the finite-sample variance bounds for unbiased estimators"
- Abstract: The inverse of the Fisher information matrix in a likelihood problem is i) the variance-covariance matrix of the asymptotic distribution of the maximum likelihood (ML) estimator ii) the dominant term in the expansion of the finite-sample variance of the ML estimator, and iii) the "lowest" achievable variance-covariance that an unbiased estimator can achieve, where "lowest" here indicates that its difference from the variance of any unbiased estimator is a positive definite matrix. These three characterizations and the asymptotic unbiasedness of the ML estimator are key justifications for the wide-spread use of the latter in statistical practice. For example, standard regression software typically reports the ML estimates alongside with estimated standard errors coming from the inversion of the Fisher information matrix at the estimates. Nevertheless, the use of that pair of estimates and estimated standard errors for inference implicitly assumes, amongst other things, that the information about the parameters in the sample is large enough for the estimator to be almost unbiased and its variance to be well-approximated by the inverse of the Fisher information matrix. In this talk, we present results from work-in-progress that aims to bridge that finite-sample gap between estimates and the estimated variance-covariance matrix. Specifically, we introduce a novel family of estimators that not only have the same limiting optimality properties as the ML estimator (consistency and asymptotic normality, unbiasedness and efficiency), but also have finite sample variance that is asymptotically closer to the inverse of the Fisher information matrix than the variance of the ML estimator is. We illustrate the properties of the proposed estimators in some well-used inferential settings.

- Week 6 - 15th February - Matthew Ludkin (Lancaster) - Title:"Hug 'N' Hop: Explicit, non-reversible, contour-hugging MCMC"
- Abstract: Both the Bouncy Particle Sampler (BPS) and the Discrete Bouncy Particle Sampler (DBPS) are non-reversible Markov chain Monte Carlo algorithms whose action can be visualised in terms of a particle moving with a fixed-magnitude velocity. Both algorithms include an occasional step where the particle `bounces' off a hyperplane which is tangent to the gradient of the target density, making the BPS rejection-free and allowing the DBPS to propose relatively large jumps whilst maintaining a high acceptance rate. Analogously to the concatenation of leapfrog steps in HMC, we describe an algorithm which omits the straight-line movement

of the BPS and DBPS and, instead, at each iteration concatenates several discrete `bounces' to provide a proposal which is on almost the same target contour as the starting point, producing a large proposed move

with a high acceptance probability. Combined with a separate kernel designed for moving between contours, an explicit bouncing scheme which takes account of the local Hessian at each bounce point ensures that the

proposal respects the local geometry of the target, and leads to an efficient, skew-reversible MCMC algorithm.

- Abstract: Both the Bouncy Particle Sampler (BPS) and the Discrete Bouncy Particle Sampler (DBPS) are non-reversible Markov chain Monte Carlo algorithms whose action can be visualised in terms of a particle moving with a fixed-magnitude velocity. Both algorithms include an occasional step where the particle `bounces' off a hyperplane which is tangent to the gradient of the target density, making the BPS rejection-free and allowing the DBPS to propose relatively large jumps whilst maintaining a high acceptance rate. Analogously to the concatenation of leapfrog steps in HMC, we describe an algorithm which omits the straight-line movement
- Week 7 - 22nd February - Patrick Rebeschini (Oxford) - "On the Interplay between Statistics, Computation and Communication in Decentralised Learning."
- Abstract: Motivated by bandwidth limitations, privacy concerns and network instability, a large body of literature has been investigating the performance of decentralised optimisation methods to fit statistical models on datasets stored across multiple machines where there is no central server to coordinate computation. In this literature, data is typically treated as deterministic and one is interested in controlling the optimisation error. In this talk, we take a statistical point of view and assume that data come from the same unknown probability distribution. We consider the decentralised version of two of the most well-studied algorithmic paradigms for serial statistical optimisation in offline and online learning: gradient descent and upper confidence bound (UCB). For decentralised gradient descent in non-parametric regression, we investigate the choice of step-size, stopping time, and communication topology that allows to recover optimal statistical rates with respect to the entire data on the network. We show that the choice of the communication graph can be considered as a regulariser, and that more statistics allows for less computation and less communication. (based on joint work with D. Richards)

For decentralised UCB in multi-armed bandit problems, we show that network-dependent delayed actions are key to obtain improved regret bounds as a function of the graph topology. (based on joint work D. Martínez-Rubio and V. Kanade)

- Abstract: Motivated by bandwidth limitations, privacy concerns and network instability, a large body of literature has been investigating the performance of decentralised optimisation methods to fit statistical models on datasets stored across multiple machines where there is no central server to coordinate computation. In this literature, data is typically treated as deterministic and one is interested in controlling the optimisation error. In this talk, we take a statistical point of view and assume that data come from the same unknown probability distribution. We consider the decentralised version of two of the most well-studied algorithmic paradigms for serial statistical optimisation in offline and online learning: gradient descent and upper confidence bound (UCB). For decentralised gradient descent in non-parametric regression, we investigate the choice of step-size, stopping time, and communication topology that allows to recover optimal statistical rates with respect to the entire data on the network. We show that the choice of the communication graph can be considered as a regulariser, and that more statistics allows for less computation and less communication. (based on joint work with D. Richards)
- Week 8 - 1st March - Susana Gomes (Warwick) - Title / Abstract TBA

- Week 9 - 8th March - Joint Session (Short Talks)
- Reece Mears (Warwick) - Title / Abstract TBA
- Ryan Chan (Warwick) - Title / Abstract TBA

- Week 10 - 15th March - Jake Carson (Warwick) - Title / Abstract TBA

**2018/19 Term 3:**

- Week 1 - 26th April - Toni Karvonen (Aalto University) - Title / Abstract TBA

- Week 2 - 3rd May - Leah South (Lancaster) - Title / Abstract TBA

- Week 3 - 10th May - Dootika Vats (Warwick) - Title / Abstract TBA

- Week 4 - 17th May - Jonathan Harrison (Warwick) - Title / Abstract TBA
- Week 5 - 24th May -
**Available** - Week 6 - 31st May -
**Available** - Week 7 - 7th June -
**Available** - Week 8 - 14th June -
**Available** - Week 9 - 21st June -
**Available** - Week 10 - 28th June -
**Available**

**Previous Years:**

**Some key phrases**:

- Sampling and inference for diffusions

- Exact algorithms

- Intractable likelihood

- Pseudo-marginal algorithms

- Particle filters

- Importance sampling

- MCMC

- Adaptive MCMC

- Perfect simulation

- Markov chains...

- Random structures...

- Randomised algorithms...