Algorithms & Computationally Intensive Inference seminars
The seminars take place on Fridays at 1 pm (UK time) in room MB0.08.
2023-2024 Organisers: Shenggang Hu and Andi Wang.
If you would like to speak, or you want to be included in any emails, please contact one of the organisers.
Website URL: www.warwick.ac.uk/compstat
Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar
Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB: only approved members can post)
Term 1:
Date | Speaker | Title
06/10 | Artavazd Maranjyan (KAUST) | GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity
Note: Hybrid 
Abstract:
We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication. While methods of this type have been studied for about a decade, the empirically observed acceleration properties of local training eluded all attempts at theoretical understanding. In a recent breakthrough, Mishchenko et al. (ICML 2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their method ProxSkip requires all clients to take the same number of local training steps in each communication round. Inspired by a common-sense intuition, we start our investigation by conjecturing that clients with 'less important' data should be able to get away with fewer local training steps without this impacting the overall communication complexity of the method. It turns out that this intuition is correct: we managed to redesign the original ProxSkip method to achieve this. In particular, we prove that our modified method, for which we coin the name GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and by considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirms our theoretical claims.
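The probabilistic communication-skipping mechanism that ProxSkip introduces (and GradSkip refines) can be sketched on a toy federated problem. The quadratic objectives, step size, communication probability and client count below are all illustrative choices, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy federated problem: client i minimises f_i(x) = 0.5 * a_i * (x - b_i)^2,
# so the global minimiser of sum_i f_i is sum(a*b)/sum(a).
a = np.array([1.0, 2.0, 4.0, 5.0])   # per-client curvatures
b = np.array([-1.0, 0.5, 1.0, 2.0])  # per-client optima
x_star = np.sum(a * b) / np.sum(a)

def proxskip_sketch(rounds=5000, gamma=0.1, p=0.2):
    """ProxSkip-style local training: every iteration each client takes a
    control-variate gradient step; communication (averaging) happens only
    with probability p, yet the iterates still converge to x_star."""
    x = np.zeros_like(b)             # client iterates
    h = np.zeros_like(b)             # control variates (correct client drift)
    for _ in range(rounds):
        x_hat = x - gamma * (a * (x - b) - h)
        if rng.random() < p:         # rare communication round
            x_bar = x_hat.mean()     # server averages and broadcasts
            h = h + (p / gamma) * (x_bar - x_hat)
            x = np.full_like(b, x_bar)
        else:
            x = x_hat                # skip communication, keep training locally
    return x.mean()

estimate = proxskip_sketch()
```

The control variates h are what prevent the clients from drifting to their own local optima between the infrequent communication rounds; with them, the iterates converge to the global minimiser x_star even though most rounds skip communication.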

13/10 | Robin Evans (Oxford) | Parameterizing and Simulating from Causal Models
Note: Hybrid 
Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is nonparametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way. We introduce the frugal parameterization, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and to fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and the effect of treatment on the treated, as well as other common quantities of interest. I will also discuss some other applications of the frugal parameterization, including survival analysis, parameterizing nested Markov models, and 'Many Data': combining randomized and observational datasets in a single parametric model. This is joint work with Vanessa Didelez.
Reference: Evans, R.J. and Didelez, V. (2023). Parameterizing and Simulating from Causal Models (with discussion). J. Roy. Statist. Soc., Ser. B.
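As a toy illustration of the discrete case mentioned in the abstract, one can build a joint model from three frugal ingredients, a causal margin P(Y=1 | do(X=x)), a covariate margin P(Z=1), and a Y-Z odds ratio, using Plackett's 2x2 formula; all numerical values below are invented for illustration and this is only a sketch of the idea, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(1)

def joint_from_odds_ratio(p_y, p_z, theta):
    """Cell P(Y=1, Z=1) of a 2x2 table with margins p_y, p_z and
    odds ratio theta (Plackett's formula)."""
    if theta == 1.0:
        return p_y * p_z
    s = 1.0 + (p_y + p_z) * (theta - 1.0)
    disc = s * s - 4.0 * theta * (theta - 1.0) * p_y * p_z
    return (s - np.sqrt(disc)) / (2.0 * (theta - 1.0))

# Frugal ingredients (illustrative values):
p_do = {0: 0.30, 1: 0.60}   # causal margin P(Y=1 | do(X=x))
p_z = 0.40                  # covariate margin P(Z=1)
theta = 3.0                 # Y-Z odds ratio within each do(x) world

def p_y_given_z(x, z):
    """P(Y=1 | Z=z, do(X=x)) implied by the three ingredients."""
    p11 = joint_from_odds_ratio(p_do[x], p_z, theta)
    return p11 / p_z if z == 1 else (p_do[x] - p11) / (1.0 - p_z)

# Simulate with randomised treatment; the Y margin within each arm
# should then reproduce the specified causal margin.
n = 200_000
z = (rng.random(n) < p_z).astype(int)
x = (rng.random(n) < 0.5).astype(int)
probs = np.array([[p_y_given_z(xx, zz) for zz in (0, 1)] for xx in (0, 1)])
y = (rng.random(n) < probs[x, z]).astype(int)
```

Because the causal margin sits directly in the parameterization, simulating from the model with any desired treatment effect is immediate, which is the point the abstract makes.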

20/10 | Julian Hofstadler (Universität Passau) | Error Bounds for Adaptive MCMC within Doubly Intractable Distributions
Note: Hybrid 
We study the problem of sampling from doubly intractable distributions, where, in addition to an unknown normalisation constant, we


27/10 | Michael Faulkner (Bristol) | Phase transitions, metastability and critical slowing down in statistical physics
Note: In-person
Sampling algorithms are commonplace in statistics and machine learning – in particular, in Bayesian computation – and have been used for decades to enable inference, prediction and model comparison in many different settings. They are also widely used in statistical physics, where many popular sampling algorithms first originated [1, 2]. At a high level, the goals within each discipline are the same – to sample from and approximate statistical expectations with respect to some probability distribution – but the motivations, nomenclature and methods of explanation differ significantly. This has led to challenges in communicating between the fields, and indeed the fundamental goals of one field are often misunderstood in the other. In this talk, we elucidate statistical physics for the statistician, emphasising that probability models are studied as functions of thermodynamic hyperparameters such as the temperature. This is particularly useful for characterising phase transitions, i.e., boundaries in thermodynamic hyperparameter space between distinct thermodynamic phases.
We then move on to sampling algorithms, with a particular focus on the behaviour of the Metropolis algorithm [1] when simulating the 2D Ising and 2D XY models of magnetism. Metropolis dynamics are metastable in the low-temperature phase of each model, mixing between states of equal probability density on a timescale that diverges with system size (proportional to the dimensionality of parameter space). Moreover, the Metropolis algorithm also suffers from the closely related phenomenon of critical slowing down at phase transitions. These strongly correlated dynamics are characterised by asymptotically long integrated autocorrelation times, due to a flattening of the target density that essentially results from the system trying to exist simultaneously in both thermodynamic phases. Indeed, these key aspects of statistical physics have led to innovations in sampling algorithms that inform the Bayesian world. In particular, we present the Swendsen–Wang [3], Wolff [4] and event-chain Monte Carlo [5-7] algorithms. The first two simulate the 2D Ising model and were developed in response to the metastability and critical slowing down of the Metropolis algorithm. They circumvent both phenomena to mix with low autocorrelation, independently of system size. We then show that event-chain Monte Carlo similarly circumvents the low-temperature Metropolis metastability of the 2D XY model [7] and discuss its potential utility in bypassing a hypothesised critical slowing down at the phase transition. This talk is based on a recent review paper on the subject [8].
[1] Metropolis et al., J. Chem. Phys. 21 1087 (1953)
[2] Alder & Wainwright, J. Chem. Phys. 27 1208 (1957)
[3] Swendsen & Wang, Phys. Rev. Lett. 58 86 (1987)
[4] Wolff, Phys. Rev. Lett. 62 361 (1989)
[5] Bernard, Krauth & Wilson, Phys. Rev. E 80 056704 (2009)
[6] Michel, Mayer & Krauth, EPL (Europhys. Lett.) 112 20003 (2015)
[7] Faulkner, arXiv:2209.03699 (2022)
[8] Faulkner & Livingstone, arXiv:2208.04751 (2022)
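To make the metastability concrete, here is a minimal single-spin-flip Metropolis sampler for the 2D Ising model with periodic boundaries (lattice size, temperature and run length are illustrative). Started in the all-up state at low temperature, the chain stays magnetised rather than mixing to the equally probable all-down mode:

```python
import numpy as np

rng = np.random.default_rng(2)

def metropolis_sweep(spins, beta):
    """One sweep (L*L attempted single-spin flips) of Metropolis dynamics
    for the 2D Ising model with periodic boundary conditions."""
    L = spins.shape[0]
    for _ in range(L * L):
        i, j = rng.integers(L, size=2)
        nb = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j]
              + spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
        dE = 2.0 * spins[i, j] * nb        # energy change of flipping (i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            spins[i, j] *= -1

L, beta = 16, 1.0                  # beta well above beta_c ~ 0.4407
spins = np.ones((L, L), dtype=int)
for _ in range(200):
    metropolis_sweep(spins, beta)
magnetisation = abs(spins.mean())  # stays near 1: stuck in the "up" mode
```

The cluster algorithms discussed in the talk (Swendsen–Wang, Wolff) replace these local flips with large collective moves precisely to escape this single-mode trapping.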


03/11 | Marie-Therese Wolfram (Warwick) | Ensemble Inference Methods for Models with Noisy and Expensive Likelihoods (joint work with O.R. Dunbar, A. Duncan, A.M. Stuart)
Note: Hybrid 
The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical, and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations.
In this talk we focus on interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large-scale parametric structure of interest. Multiscale analysis is then used to analyze the behavior of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large-scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favorably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, while ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favorably.
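A rough, derivative-free sketch of the ensemble Kalman idea on a two-dimensional linear toy problem; the small high-frequency sin term stands in for the rapid fluctuations discussed above, and the forward map, ensemble size and iteration count are all assumptions for illustration (this is basic perturbed-observation ensemble Kalman inversion, not the speakers' algorithms):

```python
import numpy as np

rng = np.random.default_rng(3)

# Forward map: smooth linear part plus rapid small-scale fluctuations.
# Derivatives of G are useless here, but its values are still informative.
A = np.array([[1.0, 0.5], [0.0, 2.0]])
def G(u):
    return A @ u + 0.01 * np.sin(100.0 * u)

u_true = np.array([1.0, -0.5])
gamma = 0.05                                  # observation noise std
y = A @ u_true + gamma * rng.normal(size=2)

# Basic EKI: derivative-free Kalman-type update of an ensemble.
J = 200
U = rng.normal(size=(J, 2))                   # prior ensemble
for _ in range(30):
    Gu = np.array([G(u) for u in U])
    du, dg = U - U.mean(0), Gu - Gu.mean(0)
    Cug = du.T @ dg / J                       # cross-covariance C^{uG}
    Cgg = dg.T @ dg / J                       # covariance C^{GG}
    K = Cug @ np.linalg.inv(Cgg + gamma**2 * np.eye(2))
    Y = y + gamma * rng.normal(size=(J, 2))   # perturbed observations
    U = U + (Y - Gu) @ K.T
u_hat = U.mean(0)
```

Because the update uses only ensemble covariances of forward evaluations, never gradients, the rapid fluctuations average out, which is the robustness property the abstract attributes to ensemble Kalman methods.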


10/11 | Joe Benton (Oxford) | From Denoising Diffusions to Denoising Markov Models
Note: In-person
Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. In this talk, we will introduce denoising Markov models (DMMs), a unifying framework generalising diffusion models to a wide class of spaces. We will show how to derive suitable training objectives for DMMs and discover that DMMs lead to a novel and principled generalization of score matching to arbitrary state spaces.
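The score-matching objective underlying diffusion models can be illustrated in one dimension, where the noised marginal is Gaussian and the optimal linear score is known exactly (the noising level and linear score model below are illustrative simplifications, not the DMM framework itself):

```python
import numpy as np

rng = np.random.default_rng(4)

# Data distribution: standard Gaussian, so everything is checkable in closed form.
n = 100_000
x0 = rng.normal(size=n)

# Forward noising at one fixed time: x_t = a*x0 + b*eps, with a^2 + b^2 = 1.
a, b = np.sqrt(0.5), np.sqrt(0.5)
eps = rng.normal(size=n)
xt = a * x0 + b * eps

# Denoising score matching with a linear score model s(x) = c*x:
# minimise E[(s(x_t) + eps/b)^2] over c -- an ordinary least-squares problem.
c = -np.sum(xt * (eps / b)) / np.sum(xt * xt)

# The marginal of x_t is N(0, a^2 + b^2) = N(0, 1), whose score is -x,
# so the fitted coefficient c should be close to -1.
```

The DMM framework in the talk generalises exactly this kind of objective beyond Euclidean Gaussian noising to other state spaces and Markov noising processes.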
17/11 | Woojung Kim (Warwick) | Latent variable approaches for multimorbidity analysis
Note: In-person at MB0.02, 12.45-13.30
Multimorbidity refers to the acquisition of multiple long-term chronic health conditions by a single person. This is becoming an increasing public health issue with aging populations, and gaining insight into patterns of multimorbidity is essential for managing increased health-system burdens. In this talk, we focus on latent variable approaches for the analysis of multimorbidity. We introduce a novel discrete latent variable approach for multimorbidity acquisition within a Bayesian framework of latent feature allocation, which allows an individual's morbidity profile to be driven by multiple latent factors and allows the modelling of age-dependent multimorbidity trajectories. We demonstrate the utility of our model in applications to both simulated data and disease event data from patient electronic health records. In each setting, we show that our model can reconstruct clinically meaningful latent multimorbidity patterns and their age-dependent prevalence trajectories.
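As a generic sketch of the latent feature allocation idea (not the speaker's model), one can generate morbidity profiles in which each individual carries a binary feature vector and each active feature switches on conditions through a noisy-OR likelihood; prevalences, weights and dimensions below are invented:

```python
import numpy as np

rng = np.random.default_rng(9)

# n individuals, K latent features, D observed conditions.
n, K, D = 500, 3, 8
pi = np.array([0.5, 0.3, 0.2])   # feature prevalences (assumed)
W = rng.random((K, D)) * 0.8     # P(feature k activates condition d) (assumed)
leak = 0.01                      # baseline condition rate

# Each individual's binary feature allocation z, drawn independently here;
# a richer model would make these age-dependent, as in the talk.
Z = (rng.random((n, K)) < pi).astype(int)

# Noisy-OR: condition d is present unless every active feature fails to
# switch it on (and the leak also fails).
p_active = 1.0 - (1.0 - leak) * np.prod(1.0 - Z[:, :, None] * W[None, :, :], axis=1)
Y = (rng.random((n, D)) < p_active).astype(int)
```

The key property, visible even in this sketch, is that one individual's condition profile Y can be driven by several latent features at once, unlike clustering approaches that assign each person a single latent class.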

24/11 | Filippo Ascolani (Università Bocconi) | Dimension-free mixing times of coordinate-wise samplers for Bayesian hierarchical models
Note: In-person
Coordinate-wise MCMC schemes (e.g. Gibbs and Metropolis-within-Gibbs) are popular algorithms to sample from posterior distributions arising from Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes, combining the Bayesian asymptotics approach with a novel notion of conditional conductance.
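For a concrete instance of the kind of sampler being analysed, here is a Gibbs sampler for a toy two-level Gaussian hierarchy, where both full conditionals are available in closed form and the output can be checked against the exact posterior (all model settings are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-level Gaussian hierarchy: mu ~ N(0, s2), theta_i | mu ~ N(mu, 1),
# y_i | theta_i ~ N(theta_i, 1). Both Gibbs conditionals are Gaussian.
s2, n = 10.0, 20
mu_true = 1.5
y = mu_true + rng.normal(size=n) * np.sqrt(2.0)  # marginal var of y_i | mu is 2

iters, burn = 5000, 500
mu, mu_draws = 0.0, []
for t in range(iters):
    # theta_i | mu, y_i  ~  N((mu + y_i)/2, 1/2)
    thetas = (mu + y) / 2.0 + rng.normal(size=n) * np.sqrt(0.5)
    # mu | theta  ~  N(sum(theta)/(n + 1/s2), 1/(n + 1/s2))
    prec = n + 1.0 / s2
    mu = thetas.sum() / prec + rng.normal() / np.sqrt(prec)
    if t >= burn:
        mu_draws.append(mu)

# Closed-form posterior mean of mu for comparison
post_prec = n / 2.0 + 1.0 / s2
post_mean = (y.sum() / 2.0) / post_prec
```

The talk's results concern how the mixing time of exactly this alternating scheme behaves as the number of local parameters grows, showing it can remain bounded (dimension-free) under random data-generating assumptions.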
01/12 | Tengyao Wang (LSE) | Sparse change detection in high-dimensional linear regression
Note: In-person
We introduce a new methodology, 'charcoal', for estimating the location of sparse changes in high-dimensional linear regression coefficients, without assuming that those coefficients are individually sparse. The procedure works by constructing different sketches (projections) of the design matrix at each time point, where consecutive projection matrices differ in sign in exactly one column. The sequence of sketched design matrices is then compared against a single sketched response vector to form a sequence of test statistics whose behaviour shows a surprising link to the well-known CUSUM statistics of univariate changepoint analysis. The procedure is computationally attractive, and strong theoretical guarantees are derived for its estimation accuracy. Simulations confirm that our methods perform well across a wide range of settings, and a real-world application to a large single-cell RNA sequencing dataset showcases its practical relevance.
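The univariate CUSUM statistic that the sketched test statistics are linked to can be written down in a few lines; the series length, change location and jump size below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(6)

def cusum_stats(x):
    """CUSUM statistics T_t = sqrt(t(n-t)/n) * |mean(x[:t]) - mean(x[t:])|
    for every candidate split point t = 1, ..., n-1."""
    n = len(x)
    t = np.arange(1, n)
    left = np.cumsum(x)[:-1] / t              # running means of x[:t]
    right = (x.sum() - np.cumsum(x)[:-1]) / (n - t)
    return np.sqrt(t * (n - t) / n) * np.abs(left - right)

# Univariate series with a single mean change at time tau
n, tau = 200, 120
x = rng.normal(size=n)
x[tau:] += 1.5
stats = cusum_stats(x)
tau_hat = int(np.argmax(stats)) + 1   # argmax of CUSUM estimates the changepoint
```

The charcoal methodology reduces the high-dimensional regression problem to statistics whose behaviour mirrors this classical one-dimensional picture.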
08/12 
1. Ensemble Kalman inversion approximate Bayesian computation
2. Network inference in a stochastic multi-population neural mass model via approximate Bayesian computation


Note: In-person
1.
Approximate Bayesian computation (ABC) is the most popular approach to inferring parameters in the case where the data model is specified in the form of a simulator. It is not possible to directly implement standard Monte Carlo methods for inference in such a model, since the likelihood is not available to evaluate pointwise. The main idea of ABC is to perform inference on an alternative model with an approximate likelihood, sometimes known as the ABC likelihood. The ABC likelihood is chosen such that an unbiased estimator of it is easy to construct from simulations from the data model, allowing the use of pseudo-marginal Monte Carlo algorithms for inference under the approximate model. The central challenge of ABC is then to trade off the bias introduced by approximating the model against the variance introduced by estimating the ABC likelihood. Stabilising the variance of the ABC likelihood requires a computational cost that is exponential in the dimension of the data, thus the most common approach to reducing the variance is to perform inference conditional on summary statistics.
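The unbiased ABC-likelihood estimator described above can be sketched with a Gaussian kernel and a toy Gaussian simulator, for which the ABC likelihood is available in closed form as a check (kernel, bandwidth and simulator are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)

def abc_likelihood_estimate(theta, y_obs, simulate, m=100, eps=0.5):
    """Unbiased Monte Carlo estimate of the Gaussian-kernel ABC likelihood
    (1/m) * sum_j N(y_obs; x_j, eps^2), with x_j drawn from the simulator."""
    x = simulate(theta, m)
    k = np.exp(-0.5 * ((y_obs - x) / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))
    return k.mean()

# Toy simulator: y | theta ~ N(theta, 1). The ABC likelihood is then exactly
# N(y_obs; theta, 1 + eps^2), so the estimator can be sanity-checked.
simulate = lambda theta, m: theta + rng.normal(size=m)
y_obs, eps = 0.3, 0.5
est = abc_likelihood_estimate(0.0, y_obs, simulate, m=50_000, eps=eps)
exact = np.exp(-0.5 * y_obs**2 / (1.0 + eps**2)) / np.sqrt(2.0 * np.pi * (1.0 + eps**2))
```

Plugging such an estimator into a Metropolis-Hastings acceptance ratio gives the pseudo-marginal sampler mentioned above; shrinking eps reduces the bias but inflates the estimator's variance, which is the trade-off the paragraph describes.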
In this talk we introduce a new approach to estimating the ABC likelihood: using ensemble Kalman inversion (EnKI). Ensemble Kalman algorithms are Monte Carlo approximations of Bayesian inference for linear/Gaussian models. These methods are often applied outside of the linear/Gaussian setting, being used, for example, as an alternative to particle filtering for inference in nonlinear state space models. Loosely speaking, EnKI can be used as an alternative to an SMC sampler on a sequence of annealed likelihoods. We see that EnKI has some appealing properties when used to estimate the ABC likelihood. It circumvents the exponential scaling with dimension of standard ABC, and does not require the reparameterisation imposed by the rare-event SMC approach of Prangle et al. (2018). It is able to achieve this with no additional simulations from the data model, and thus is likely to bring the most benefit in cases where this model is very expensive to simulate.


2. The aim of this article is to infer the connectivity structures of brain regions before and during epileptic seizure. Our contributions are fourfold. First, we propose a 6N-dimensional stochastic differential equation for modelling the activity of N coupled populations of neurons in the brain. This model further develops the (single population) stochastic Jansen and Rit neural mass model, which describes human electroencephalography (EEG) rhythms, in particular signals with epileptic activity. Second, we construct a reliable and efficient numerical scheme for the model simulation, extending a splitting procedure proposed for one neural population. Third, we propose an adapted Sequential Monte Carlo Approximate Bayesian Computation algorithm for simulation-based inference of both the relevant real-valued model parameters as well as the {0,1}-valued network parameters, the latter describing the coupling directions among the N modelled neural populations. Fourth, after illustrating and validating the proposed statistical approach on different types of simulated data, we apply it to a set of multi-channel EEG data recorded before and during an epileptic seizure. The real data experiments suggest, for example, a larger activation in each neural population and a stronger connectivity on the left brain hemisphere during seizure.
Reference: S. Ditlevsen, M. Tamborrino, I. Tubikanec, preprint, arXiv:2306.15787
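Splitting schemes of the kind mentioned for the simulation step typically compose an exactly solvable linear flow with the remaining nonlinear drift. Below is a sketch of just the exact linear (Ornstein-Uhlenbeck) piece, with illustrative parameters and no claim to match the authors' scheme:

```python
import numpy as np

rng = np.random.default_rng(8)

# Exact one-step update for the linear SDE dX = -lam*X dt + sig dW over a
# step of size h. A splitting scheme would alternate this exact linear flow
# with a step of the nonlinear drift; only the linear piece is shown here.
lam, sig, h = 2.0, 1.0, 0.1

def ou_step(x):
    decay = np.exp(-lam * h)
    sd = sig * np.sqrt((1.0 - decay**2) / (2.0 * lam))  # exact transition std
    return decay * x + sd * rng.normal(size=np.shape(x))

# Run many independent chains long enough to reach stationarity; the exact
# scheme reproduces the stationary variance sig^2/(2*lam) = 0.25 at any h.
x = np.zeros(10_000)
for _ in range(500):
    x = ou_step(x)
var_hat = x.var()
```

Because the linear part is integrated exactly, such schemes remain stable and accurate for step sizes where a naive Euler-Maruyama discretisation degrades, which is why they are attractive inside simulation-based inference loops like SMC-ABC.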
