Algorithms & Computationally Intensive Inference seminars
The seminars will take place on Fridays 1 pm UK time in room MB0.08 in a hybrid format.
20222023 Organisers: Austin BrownLink opens in a new window, Alice CorbellaLink opens in a new window and Krzysztof ŁatuszyńskiLink opens in a new window
If you would like to speak, or you want to be included in any emails, please contact one of the organisers.
Website URL: www.warwick.ac.uk/compstat
Mailing List SignUp: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB  only approved members can post)
Term 3:
Date  Speaker  Title  F2F  Slides  Video 
28/04  Flávio GonçalvesLink opens in a new window 
Exact Bayesian inference for levelset Cox processes with piecewise constant intensity function 
Y  
This work proposes a new methodology to perform Bayesian inference for a class of multidimensional Cox processes in which the intensity function is piecewise constant. Poisson processes with piecewise constant intensity functions are believed to be suitable to model a variety of point process phenomena and, given its simpler structure, are expected to provide more precise inference when compared to processes with nonparametric and continuously varying intensity functions. The partition of the space domain is flexibly determined by a levelset function of a latent Gaussian process. Despite the intractability of the likelihood function and the infinite dimensionality of the parameter space, inference is performed exactly, in the sense that no space discretization approximation is used and MCMC error is the only source of inaccuracy. That is achieved by using retrospective sampling techniques and devising a pseudomarginal infinitedimensional MCMC algorithm that converges to the exact target posterior distribution. Computational efficiency is favored by considering a nearest neighbor Gaussian process, allowing for the analysis of large datasets. An extension to consider spatiotemporal models is also proposed. The efficiency of the proposed methodology is investigated in simulated examples and its applicability is illustrated in the analysis of some real point process datasets. 

05/05  Renato Paes LemeLink opens in a new window 
Multiparameter Bernoulli Factories 
Y  
We consider the problem of computing with many coins of unknown bias. We are given samples access to n coins with unknown biases p1,…,pn and are asked to sample from a coin with bias f(p1,…,pn) for a given function f : [0, 1]n → [0, 1] . We give a complete characterization of the functions f for which this is possible. As a consequence, we show how to extend various combinatorial sampling procedures (most notably, the classic Sampford Sampling for ksubsets) to the boundary of the hypercube. 

12/05  Anna KorbaLink opens in a new window 
Sampling with Mollified Interaction Energy Descent 
Y  
Sampling from a target measure whose density is only known up to a normalization constant is a fundamental problem in computational statistics and machine learning. In this talk, I will present a new optimizationbased method for sampling called mollified interaction energy descent (MIED). MIED minimizes a new class of energies on probability measures called mollified interaction energies (MIEs). These energies rely on mollifier functions—smooth approximations of the Dirac delta originated from PDE theory. We show that as the mollifier approaches the Dirac delta, the MIE converges to the chisquare divergence with respect to the target measure and the gradient flow of the MIE agrees with that of the chisquare divergence. Optimizing this energy with proper discretization yields a practical firstorder particlebased algorithm for sampling in both unconstrained and constrained domains. We show experimentally that for unconstrained sampling problems our algorithm performs on par with existing particlebased algorithms like Stein Variational Gradient Descent (SVGD), while for constrained sampling problems our method readily incorporates constrained optimization techniques to handle more flexible constraints with strong performance compared to alternatives.  
19/05  Ye HeLink opens in a new window 
Highdimensional scaling limits and fluctuations of online leastsquares SGD 
Y  
We derive highdimensional scaling limits and fluctuations for the online leastsquares Stochastic Gradient Descent (SGD) algorithm by taking the properties of the data generating model explicitly into consideration. Our approach treats the SGD iterates as an interacting particle system, where the expected interaction is characterized by the covariance structure of the input. Assuming smoothness conditions on moments of order up to eight orders, and without explicitly assuming Gaussianity, we establish the highdimensional scaling limits and fluctuations in the form of infinitedimensional Ordinary Differential Equations (ODEs) or Stochastic Differential Equations (SDEs). Our results reveal a precise threestep phase transition of the iterates; it goes from being ballistic, to diffusive, and finally to purely random behavior, as the noise variance goes from low, to moderate and finally to veryhigh noise setting. In the lownoise setting, we further characterize the precise fluctuations of the (scaled) iterates as infinitedimensional SDEs. We also show the existence and uniqueness of solutions to the derived limiting ODEs and SDEs. Our results have several applications, including characterization of the limiting meansquare estimation or prediction errors and their fluctuations which can be obtained by analytically or numerically solving the limiting equations.  
26/05  Daniel PaulinLink opens in a new window 
Contraction rate estimates of kinetic Langevin dynamics integrators 
Y  
We provide a framework to prove convergence rates for discretizations of kinetic Langevin dynamics for strongly logconcave densities. Our approach provides convergence rates of O(m/M), with explicit stepsize restrictions which are of the same order as the stability threshold for Gaussian targets and are valid for a large interval of the friction parameter. We apply this methodology to various integration methods which are popular in the molecular dynamics and machine learning communities. Finally we study underdamped Langevin schemes that converge to overdamped dynamics in the high friction limit and which have step size restrictions that are independent of the friction parameter; we show that this property is not generic. This is joint work with Peter Whalley and Ben Leimkuhler. 

02/06  Marina RiabizLink opens in a new window  Y  
09/05  George DiligiannidisLink opens in a new window  Y  
16/05  Michael FaulknerLink opens in a new window  Y  
23/05  Jeffrey Rosenthal Link opens in a new window  Y  
30/05  No SeminarLink opens in a new window  Y  
Term 2:
Date  Speaker  Title  F2F  Slides  Video 
13/01  Austin BrownLink opens in a new window  Lower bounds on the rate of convergence for acceptrejectbased Markov chains  Y  
To avoid poor empirical performance in MetropolisHastings and other acceptrejectbased algorithms practitioners often tune these algorithms by trial and error. Lower bounds on the geometric convergence rate are developed in both total variation and Wasserstein distances in order to identify how the simulations will fail so these settings can be avoided, providing guidance on tuning. The theory is applied in several settings. For example, if the target density concentrates with a parameter n (e.g. posterior concentration), it is demonstrated that the convergence rate of a MetropolisHastings chain can tend to 1 exponentially fast if the tuning parameters do not depend carefully on n. This is demonstrated in Bayesian logistic regression with Zellner's gprior and flat prior Bayesian logistic regression.  
20/01  Jeremie HoussineauLink opens in a new window  Trading model selection for robustness with possibility theory  Y  
Robustness is a desirable property of inference in applications where the amount of data is too large to be cleaned by hand and/or where the inference is online. In this talk, I will explain how the evidence, or marginal likelihood, can be used to obtain robust Bayesian inference in the context of possibility theory. This robustness replaces the ability to do model selection in general, and we will discuss the underlying reasons for this difference in meaning of the evidence. We will consider the limitations of the proposed approach and how to compensate for these by using tricks that are specific to possibility theory. For better or worse, I will mostly use the board for this talk.  
27/01  Sebastiano Grazzi Link opens in a new window 
PDMP samplers with boundary conditions 
Y  
In this talk, I will formally introduce piecewise deterministic Markov processes (PDMPs) endowed with "sticky floors", "soft/hard walls" and "teleportation portals" which can be used for Monte Carlo simulation and allow to target efficiently a rich class of measures arising in Bayesian inference. 

03/02  Raiha Browning Link opens in a new window 
Discretetime Hawkes process and their applications 
Y  
A Hawkes process is a temporal, selfexciting point process, such that events in the process may trigger future events. Hawkes processes have been studied widely due to their applicability to many realworld phenomena, such as seismic activity and financial markets. The key feature of these processes is the function that controls how previous events in the process impact the current intensity of events, referred to as the triggering kernel. The most common approach for modelling the triggering kernel is to assume it takes the form of simple parametric functions, such as exponential or power law decay. However, this is often too restrictive to capture more complex patterns of excitation. Furthermore, Hawkes processes were introduced as a continuoustime point process. This does not align with common data collection strategies since data is often not available at a continuous time resolution. Instead, event data is often aggregated and provided in the form of event counts over regular, discrete time intervals whereby the exact timing of these events is unknown. This talk develops and applies new statistical methods for modelling multivariate, discretetime Hawkes processes using flexible Bayesian representations of the triggering kernel. Spatiotemporal variants of discretetime Hawkes processes are also considered. These models are applied to several substantive case studies; to describe the spread of the novel coronavirus COVID19, and to monitor the risk of conflict events (riots, protests etc.), in South Asia. 

10/02  Max HirdLink opens in a new window 
Preconditioning for MCMC 
Y  
Linear transformation of the state variable (linear preconditioning) is a common technique that often drastically improves the practical performance of a Markov chain Monte Carlo algorithm. Despite this, however, quantifying the benefits of linear preconditioning is not wellstudied theoretically, and rigorous guidelines for choosing preconditioners are not always readily available. Mixing time bounds for various samplers (HMC, MALA, Unadjusted HMC, Unadjusted Langevin) have been produced in recent works for the class of strongly logconcave and Lipschitz target distributions and depend strongly on a quantity known as the condition number. We study linear preconditioning for this class of distributions, and under appropriate assumptions we provide bounds on the condition number after using a given linear preconditioner. Nonlinear transformations are less common but potentially far more powerful, and are often known by other names, such as Riemannian Manifold methods and the Mirror Langevin. We propose nonlinear preconditioning as a unifying framework in which to conceive of, and compare these techniques. 

17/02  Tamas PappLink opens in a new window 
Dimensional scaling of coupled MCMC algorithms

Y  
Couplings can be used to remove the bias of MCMC estimators. However, it can be challenging to design practically implementable couplings which scale well to high dimensions. I will address this question for the Random Walk Metropolis by connecting coupled MCMC sampling with the optimal scaling literature. In particular, I will introduce a coupling which is optimal in an asymptotic sense and overcomes the shortcomings of previous proposals. Our main technical tools will be highdimensional ODE limits for Gaussian targets. This talk is based on joint work with Chris Sherlock (arxiv:2211.12585Link opens in a new window). 

24/02  Andi WangLink opens in a new window  Explicit convergence bounds for Metropolis Markov chains  Y  
Metropolis Markov chains, such as Random Walk Metropolis, are a class of fundamental MCMC algorithms used to perform Bayesian inference. In practice, running and tuning MCMC algorithms can be a delicate task, and theoretical properties regarding the convergence rates of the chain in question can be useful for providing guidance for tuning algorithms, deciding burnin periods, and giving bounds on the asymptotic variance. In this talk I will present the first explicit bounds for the spectral gap of Random Walk Metropolis, for any value of the proposal variance. When this variance is appropriately scaled, the bounds recover the correct $d^{1}$ scaling with dimension for suitably regular invariant distributions. I will outline the techniques used to obtain these bounds, which rely on notions such as isoperimetry and conductance. Joint work with Christophe Andrieu, Anthony Lee and Sam Power. 

3/03  Luke KellyLink opens in a new window  Lagged couplings for Markov chain Monte Carlo phylogenetic inference  Y  
Phylogenetic inference attempts to reconstruct the ancestry of a set of observed taxa and is an intractable statistical problem on a complex, highdimensional space. The likelihood function is an integral over unobserved evolutionary events on a tree and is frequently multimodal. MCMC methods are the primary tool for Bayesian phylogenetic inference, but constructing sampling schemes to efficiently explore the associated posterior distributions and diagnosing their performance are difficult tasks.
Couplings have recently been used to construct unbiased MCMC estimators and estimate useful bounds on the convergence of chains to their stationary distribution. We describe a procedure to couple a pair of Markov chains targeting a posterior distribution over a space of phylogenetic tree topologies, branch lengths, scalar parameters and latent variables such that the chains meet exactly after a random, finite number of iterations and remain coupled. We use the meeting times to diagnose convergence in total variation distance jointly across all components of the model on trees with up to 200 leaves.


10/03  Qian QinLink opens in a new window  Spectral telescope: Convergence rate bounds for randomscan Gibbs samplers based on a hierarchical structure  Y  
Randomscan Gibbs samplers possess a natural hierarchical structure. The structure connects Gibbs samplers targeting higher dimensional distributions to those targeting lower dimensional ones. This leads to a quasitelescoping property of their spectral gaps. Based on this property, we derive three new bounds on the spectral gaps and convergence rates of Gibbs samplers on general domains. The three bounds relate a chain’s spectral gap to, respectively, the correlation structure of the target distribution, a class of random walk chains, and a collection of influence matrices. Notably, one of our results generalizes the technique of spectral independence, which has received considerable attention for its success on finite domains, to general state spaces. We illustrate our methods through a sampler targeting the uniform distribution on a corner of an ncube.

Term 1:
Date  Speaker  Title  F2F  Slides  Video 
07/10  Francesca CrucinioLink opens in a new window  Optimal scaling of proximal MCMC  Y  
Proximal MCMC is a recently proposed class of MCMC methods which uses proximity maps instead of gradients to build proposal mechanisms which can be employed for both differentiable and nondifferentiable targets. These methods have been shown to be stable for a wide class of targets, making them a valuable alternative to Metropolisadjusted Langevin algorithms (MALA); and have found wide application in imaging contexts. The wider stability properties are obtained by building the MoreauYoshida envelope for the target of interest, which depends on a parameter . In this work we investigate the optimal scaling problem for proximal MCMC, show that MALA is a special case of this class of algorithms and provide practical guidelines for the selection of the scale parameter and the parameter . 

14/10 
Richard EverittLink opens in a new window Paul JenkinsLink opens in a new window 
Short talks  
21/10  Juan KuntzNussioLink opens in a new window  Scalable particlebased alternatives to EM  Y  
(Neal and Hinton, 1998) recast the problem tackled by EM as the minimization of a free energy functional F on an infinitedimensional space and EM itself as coordinate descent applied to F. Here, we explore alternative ways to optimize the functional. In particular, we identify various gradient flows associated with F and show that their limits coincide with F's stationary points. By discretizing the flows, we obtain three practical particlebased algorithms for maximum likelihood estimation in broad classes of latent variable models. The novel algorithms scale well to highdimensional settings and outperform existing stateoftheart methods in experiments.  
28/10 
Jeremie HoussineauLink opens in a new window Giovanni MontanaLink opens in a new window 
Short talks  
04/11  Aki NishimuraLink opens in a new window  Priorpreconditioned conjugate gradient method for accelerated Gibbs sampling in "large n & large p" Bayesian sparse regression  Y  
In a modern observational study based on healthcare databases, the number of observations and of predictors typically range in the order of 10^5 ~ 10^6 and of 10^4 ~ 10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian methods based on shrinkage priors. In the "large n & large p" setting, however, posterior computation encounters a major bottleneck at repeated sampling from a highdimensional Gaussian distribution, whose precision matrix $\Phi$ is expensive to compute and factorize. In this talk, I will present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector $b$ such that the solution to the linear system $\Phi \beta = b$ has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through matrixvector multiplications by $\Phi$; this involves no explicit factorization or calculation of $\Phi$ itself. Rapid convergence of CG in this context is guaranteed by the theory of priorpreconditioning we develop. We apply our algorithm to a clinically relevant largescale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anticoagulants. Our algorithm demonstrates an order of magnitude speedup in posterior inference, in our case cutting the computation time from two weeks to less than a day. 

11/11 
Jon ForsterLink opens in a new window Adam JohansenLink opens in a new window Martyn PlummerLink opens in a new window 
Short talks  
18/11  Lucas Catillo Martì  Using Random Generation to reverseengineer the brain’s sampling algorithm  Y  
Over the past two decades, a wave of Bayesian explanations has swept through cognitive science, explaining behaviour in domains from intuitive physics and causal learning, to perception, motor control and language. Yet people produce stunningly incorrect answers in response to even the simplest questions about probabilities. How can a supposedly Bayesian brain paradoxically reason so poorly with probabilities? We propose that the brain approximates Bayesian inference through local sampling, approximating the solutions to inferential problems in similar ways to how MCMC algorithms operate. In a recent experiment, we explicitly asked people to generate a sequence of random samples, and found that human sequences are different to iid sequences in very predictable ways. We compared human performance to that of multiple MCMC algorithms, and found that people are most similar to an MCMC algorithm with autocorrelated proposals. 

25/11  Lyudmila GrigoryevaLink opens in a new window  Reservoir kernels and Volterra series.  Y  
A universal kernel whose sections approximate any causal and timeinvariant filter in the fading memory category with inputs and outputs in a finitedimensional Euclidean space is constructed. This kernel is built using the reservoir functional associated with a statespace representation of the Volterra series expansion available for any analytic fading memory filter. We call it the Volterra reservoir kernel. Even though the statespace representation and the corresponding reservoir feature map are defined on an infinite dimensional tensor algebra space, the kernel map is characterized by explicit recursions that are readily computable for specific datasets when employed in estimation problems using the representer theorem. We showcase the application of the reservoir kernel in two empirical exercises, namely, MackeyGlass time series forecasting and bitcoin price prediction. We compare the performance of the reservoir kernel to the sigkernel in Salvi et al. and other kernel benchmarks therein. 

02/12  Frank van der MeulenLink opens in a new window  Backward Filtering Forward Guiding  Y  
Consider a stochastic process evolving on a directed acyclic graph. Observations are at the leaf vertices and interest lies in smoothing and parameter estimation. This concerns joint work with Moritz Schauer (Chalmers University of Technology  University of Gothenburg). The BFFGalgorithm for diffusions is discussed in the paperrMider, Schauer & Van der Meulen (2021), while generalisations to graphical models and other stochastic processes are treated in Van derMeulen & Schauer (2022). An accessible introduction to that paper is given in Van der Meulen (2022). M. Mider, M. Schauer and F.H. van der Meulen (2021), Continuousdiscrete smoothing of diffusions, Electronic Journal of Statistics 15, 42954342. F.H. van der Meulen and M. Schauer (2021), Automatic Backward Filtering Forward Guiding for Markov processes and graphical models, arXiv:2010.03509 (submitted). F.H. van der Meulen (2022), Introduction to Automatic Backward Filtering Forward Guiding, arXiv:2203.04155. 

09/12  Ritabrata DuttaLink opens in a new window and Pablo Ramses Alonso MartinLink opens in a new window  Efficient Bayesian model averaging for Neural Networks via Gibbs Boomerang  Y  
We explore the strength of Markov chain Monte Carlo (MCMC) methods based on piecewise deterministic Markov processes (PDMP) for the uncertainty quantification of neural network based models. Our sampling machinery builds upon a preconditioned PDMP sampler called Boomerang sampler, which achieves superior mixing due to nonreversibility and can sample exactly while only using the subsamples of the training data, to deal with the highdimensionality of parameter space and big data bottleneck. To solve the identifiability problem we propose a framework based on Bayesian model averaging using globallocal shrinkage priors, more specifically horseshoe priors. We illustrate that in this setting most variational and standard MCMC techniques fail due to the exploding behavior of the gradients of the posterior distribution. Borrowing ideas from the literature of Gibbs PDMPs, we propose Gibbs Boomerang sampler, which does not suffer from the exploding nature of the gradient and further removes the need of estimation of the reference measure to precondition. The only approximation in our method comes while sampling event times from the associated nonhomogeneous Poisson process for intractable likelihoods. 
Previous Years:
2021/2022Link opens in a new window