
Algorithms & Computationally Intensive Inference seminars

The seminars will take place on Fridays 1 pm UK time in room MB0.08.

2023-2024 Organisers: Shenggang Hu and Andi Wang.

If you would like to speak, or you want to be included in any emails, please contact one of the organisers.

Website URL: www.warwick.ac.uk/compstat

Mailing List Sign-Up: http://mailman1.csv.warwick.ac.uk/mailman/listinfo/algorithmseminar

Mailing List: algorithmseminar@listserv.csv.warwick.ac.uk (NB - only approved members can post)

Term 1:
Date Speaker Title
06/10 Artavazd Maranjyan (KAUST) GradSkip: Communication-Accelerated Local Gradient Methods with Better Computational Complexity

Note:

Hybrid

Slides

Abstract:

We study a class of distributed optimization algorithms that aim to alleviate high communication costs by allowing the clients to perform multiple local gradient-type training steps prior to communication. While methods of this type have been studied for about a decade, the empirically observed acceleration properties of local training eluded all attempts at theoretical understanding. In a recent breakthrough, Mishchenko et al. (ICML 2022) proved that local training, when properly executed, leads to provable communication acceleration, and this holds in the strongly convex regime without relying on any data similarity assumptions. However, their method ProxSkip requires all clients to take the same number of local training steps in each communication round. Inspired by a common-sense intuition, we start our investigation by conjecturing that clients with "less important" data should be able to get away with fewer local training steps without this impacting the overall communication complexity of the method. It turns out that this intuition is correct: we managed to redesign the original ProxSkip method to achieve this. In particular, we prove that our modified method, for which we coin the name GradSkip, converges linearly under the same assumptions and has the same accelerated communication complexity, while the number of local gradient steps can be reduced relative to a local condition number. We further generalize our method by extending the randomness of probabilistic alternations to arbitrary unbiased compression operators and considering a generic proximable regularizer. This generalization, which we call GradSkip+, recovers several related methods in the literature as special cases. Finally, we present an empirical study on carefully designed toy problems that confirms our theoretical claims.
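
For readers less familiar with the local-training pattern, the following is a minimal Python sketch of clients taking local gradient steps and communicating only at random times. It illustrates the general idea only: the control-variate corrections that give ProxSkip/GradSkip their proven acceleration are omitted, and all problem data are made up.

import numpy as np

# Local gradient steps with probabilistic communication (illustrative sketch,
# not the exact ProxSkip/GradSkip algorithm). Each client i minimises a local
# quadratic f_i(x) = 0.5 * ||A_i x - b_i||^2; the global objective is their average.
rng = np.random.default_rng(0)
n_clients, dim = 4, 10
A = [rng.normal(size=(30, dim)) for _ in range(n_clients)]
b = [rng.normal(size=30) for _ in range(n_clients)]

def grad(i, x):
    return A[i].T @ (A[i] @ x - b[i]) / len(b[i])

x = [np.zeros(dim) for _ in range(n_clients)]    # one local iterate per client
step, p_comm = 0.05, 0.2                         # communicate on average every 1/p_comm steps
for t in range(3000):
    for i in range(n_clients):
        x[i] = x[i] - step * grad(i, x[i])       # local gradient step on each client
    if rng.random() < p_comm:                    # communication round decided by a coin flip
        avg = sum(x) / n_clients
        x = [avg.copy() for _ in range(n_clients)]

x_hat = sum(x) / n_clients
print("approximate minimiser norm:", np.linalg.norm(x_hat))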

13/10 Robin Evans (Oxford) Parameterizing and Simulating from Causal Models

Note:

Hybrid

Slides

Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified. In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way.

We introduce the frugal parameterization, which places the causal effect of interest at its centre, and then builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and effect of treatment on the treated, as well as other common quantities of interest.

I will also discuss some other applications of the frugal parameterization, including to survival analysis, parameterizing nested Markov models, and ‘Many Data’: combining randomized and observational datasets in a single parametric model. This is joint work with Vanessa Didelez.

 

Reference

Evans, R.J. and Didelez, V. Parameterizing and Simulating from Causal Models (with discussion), J. Roy. Statist. Soc. Ser. B, 2023.
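
As a rough illustration of the simulation recipe described above, the sketch below specifies the causal margin p(Y | do(X)) directly and couples Y to a confounder through a Gaussian copula. The model structure, names and parameter values are invented for illustration and are not taken from the paper.

import numpy as np
from scipy.stats import norm

# Frugal-style simulation sketch: confounder Z, treatment X | Z, a directly
# specified causal margin Y | do(X=x) ~ N(2x, 1), and a Gaussian copula
# linking Y's rank to Z given X. All choices here are illustrative.
rng = np.random.default_rng(1)

def simulate(n, intervene=None, rho=0.4):
    Z = rng.normal(size=n)                               # confounder
    if intervene is None:
        X = rng.binomial(1, 1 / (1 + np.exp(-Z)))        # observational treatment assignment
    else:
        X = np.full(n, intervene)                        # do(X = x)
    eps = rng.normal(size=n)
    U_y = norm.cdf(rho * Z + np.sqrt(1 - rho**2) * eps)  # copula: Y's rank correlated with Z
    Y = norm.ppf(U_y, loc=2 * X, scale=1.0)              # invert the causal margin's CDF
    return X, Y

X, Y = simulate(100_000)
print("naive observational contrast:", Y[X == 1].mean() - Y[X == 0].mean())  # biased by confounding
_, Y1 = simulate(100_000, intervene=1)
_, Y0 = simulate(100_000, intervene=0)
print("interventional contrast:", Y1.mean() - Y0.mean())                     # ~ 2.0 by construction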

20/10 Julian Hofstadler (Universität Passau) Error Bounds for Adaptive MCMC within Doubly Intractable Distributions

Note: 

Hybrid

Slides

We study the problem of sampling from doubly intractable distributions, where, in addition to an unknown normalisation constant, we have to deal with an intractable factor in the non-normalised density. Distributions (or densities) of this form occur, e.g., in biophysics or statistical mechanics. Not being able to evaluate the non-normalised density is problematic for classical MCMC algorithms, and different approaches have been suggested to overcome this issue. We consider an adaptive MCMC approach and show error bounds for the corresponding Monte Carlo estimator for numerical integration.


The talk is based on ongoing work with B. Eltzner, B. de Groot, M. Habeck and D. Rudolf.
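
As a point of reference for what adaptive MCMC means in general, here is a minimal sketch of an adaptive random-walk Metropolis sampler that tunes its proposal scale with diminishing adaptation; this is only a standard illustration of adaptivity on a toy tractable target, not the specific algorithm or error analysis of the talk.

import numpy as np

# Generic adaptive random-walk Metropolis on a 1-d toy target (illustrative only).
rng = np.random.default_rng(0)

def log_target(x):
    return -0.5 * x**2               # standard normal stand-in for the target density

x, log_p = 0.0, log_target(0.0)
scale, target_acc = 1.0, 0.44        # adapt the proposal scale towards ~44% acceptance
chain = []
for t in range(1, 20001):
    prop = x + scale * rng.normal()
    log_p_prop = log_target(prop)
    accept = np.log(rng.random()) < log_p_prop - log_p
    if accept:
        x, log_p = prop, log_p_prop
    scale *= np.exp((float(accept) - target_acc) / t**0.6)   # diminishing adaptation
    chain.append(x)

print("sample mean and std:", np.mean(chain), np.std(chain))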

27/10 Michael Faulkner (Bristol) Phase transitions, metastability and critical slowing down in statistical physics

Note:

In-person

Slides

Sampling algorithms are commonplace in statistics and machine learning – in particular, in Bayesian computation – and have been used for decades to enable inference, prediction and model comparison in many different settings. They are also widely used in statistical physics, where many popular sampling algorithms first originated [1, 2]. At a high level, the goals within each discipline are the same – to sample from and approximate statistical expectations with respect to some probability distribution – but the motivations, nomenclature and methods of explanation differ significantly. This has led to challenges in communicating between the fields, and indeed the fundamental goals of one field are often misunderstood in the other. In this talk, we elucidate statistical physics for the statistician, emphasising that probability models are studied as functions of thermodynamic hyperparameters such as the temperature. This is particularly useful for characterising phase transitions, i.e., boundaries in thermodynamic hyperparameter space between distinct thermodynamic phases.
We then move on to sampling algorithms, with a particular focus on the behaviour of the Metropolis algorithm [1] when simulating the 2D Ising and 2DXY models of magnetism. Metropolis dynamics are metastable in the low-temperature phase of each model, mixing between states of equal probability density on a timescale that diverges with system size (proportional to the dimensionality of parameter space). Moreover, the Metropolis algorithm also suffers from the closely related phenomenon of critical slowing down at phase transitions. These strongly correlated dynamics are characterised by asymptotically long integrated autocorrelation times, due to a flattening of the target density that essentially results from the system trying to exist simultaneously in both thermodynamic phases. Indeed, these key aspects of statistical physics have led to innovations in sampling algorithms that inform the Bayesian world. In particular, we present the Swendsen-Wang [3], Wolff [4] and event-chain Monte Carlo [5-7] algorithms. The first two simulate the 2D Ising model and were developed in response to the metastability and critical slowing down of the Metropolis algorithm. They circumvent both phenomena to mix with low autocorrelation and independently of system size. We then show that event-chain Monte Carlo similarly circumvents the low-temperature Metropolis metastability of the 2DXY model [7] and discuss its potential utility in bypassing a hypothesised critical slowing down at the phase transition. This talk is based on a recent review paper on the subject [8].
[1] Metropolis et al., J. Chem. Phys. 21 1087 (1953)
[2] Alder & Wainwright, J. Chem. Phys. 27 1208 (1957)
[3] Swendsen & Wang, Phys. Rev. Lett. 58 86 (1987)
[4] Wolff, Phys. Rev. Lett. 62 361 (1989)
[5] Bernard, Krauth & Wilson, Phys. Rev. E 80 056704 (2009)
[6] Michel, Mayer & Krauth, EPL (Europhys. Lett.) 112 20003 (2015)
[7] Faulkner, arXiv:2209.03699 (2022)
[8] Faulkner & Livingstone, arXiv:2209.03699 (2022)
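
For concreteness, here is a minimal single-spin-flip Metropolis sampler for the 2D Ising model, the algorithm whose metastability and critical slowing down are discussed above; the lattice size and inverse temperature are arbitrary illustrative choices, and cluster algorithms such as Swendsen-Wang and Wolff instead flip whole clusters of aligned spins to avoid these problems.

import numpy as np

# Single-spin-flip Metropolis for the 2D Ising model on an L x L periodic lattice.
rng = np.random.default_rng(0)
L, beta = 32, 0.3                  # lattice size and inverse temperature (hyperparameter)
spins = rng.choice([-1, 1], size=(L, L))

def local_field(s, i, j):
    # sum of the four nearest neighbours with periodic boundary conditions
    return s[(i + 1) % L, j] + s[(i - 1) % L, j] + s[i, (j + 1) % L] + s[i, (j - 1) % L]

for sweep in range(200):
    for _ in range(L * L):
        i, j = rng.integers(L), rng.integers(L)
        dE = 2 * spins[i, j] * local_field(spins, i, j)    # energy change of flipping spin (i, j)
        if dE <= 0 or rng.random() < np.exp(-beta * dE):   # Metropolis acceptance rule
            spins[i, j] *= -1

print("magnetisation per spin:", spins.mean())
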
03/11 Marie-Therese Wolfram (Warwick) Ensemble Inference Methods for Models with Noisy and Expensive Likelihoods (joint work with O.R. Dunbar, A. Duncan, A.M. Stuart)

Note: 

Hybrid

Slides

The increasing availability of data presents an opportunity to calibrate unknown parameters which appear in complex models of phenomena in the biomedical, physical, and social sciences. However, model complexity often leads to parameter-to-data maps which are expensive to evaluate and are only available through noisy approximations.
 
In this talk we focus on interacting particle systems for the solution of the resulting inverse problems for parameters. Of particular interest is the case where the available forward model evaluations are subject to rapid fluctuations, in parameter space, superimposed on the smoothly varying large-scale parametric structure of interest. Multiscale analysis is then used to analyze the behavior of interacting particle system algorithms when rapid fluctuations, which we refer to as noise, pollute the large-scale parametric dependence of the parameter-to-data map. Ensemble Kalman methods and Langevin-based methods (the latter use the derivative of the parameter-to-data map) are compared in this light. The ensemble Kalman methods are shown to behave favorably in the presence of noise in the parameter-to-data map, whereas Langevin methods are adversely affected. On the other hand, Langevin methods have the correct equilibrium distribution in the setting of noise-free forward models, while ensemble Kalman methods only provide an uncontrolled approximation, except in the linear case. Therefore a new class of algorithms, ensemble Gaussian process samplers, which combine the benefits of both ensemble Kalman and Langevin methods, are introduced and shown to perform favorably.
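
A rough sketch of one ensemble Kalman inversion loop, of the derivative-free kind discussed above, is given below. The forward map, dimensions and noise levels are toy choices invented for illustration, with a small rapidly fluctuating term added to mimic a noisy parameter-to-data map.

import numpy as np

# Basic ensemble Kalman inversion (EKI) for an inverse problem y = G(u) + noise.
rng = np.random.default_rng(2)
d_u, d_y, J = 2, 3, 50                     # parameter dim, data dim, ensemble size
A = rng.normal(size=(d_y, d_u))

def G(u):
    # smooth large-scale structure plus a small rapidly fluctuating component
    return A @ u + 0.01 * np.sin(100 * u).sum()

u_true = np.array([1.0, -0.5])
Gamma = 0.1 * np.eye(d_y)                  # observational noise covariance
y = G(u_true) + rng.multivariate_normal(np.zeros(d_y), Gamma)

U = rng.normal(size=(J, d_u))              # initial ensemble of parameter candidates
for step in range(30):
    Gu = np.array([G(u) for u in U])
    u_bar, g_bar = U.mean(axis=0), Gu.mean(axis=0)
    C_ug = (U - u_bar).T @ (Gu - g_bar) / J            # parameter-output cross-covariance
    C_gg = (Gu - g_bar).T @ (Gu - g_bar) / J           # output covariance
    K = C_ug @ np.linalg.inv(C_gg + Gamma)             # Kalman-style gain
    Y = y + rng.multivariate_normal(np.zeros(d_y), Gamma, size=J)
    U = U + (Y - Gu) @ K.T                             # nudge each member towards perturbed data

print("ensemble mean estimate:", U.mean(axis=0), "truth:", u_true)
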
10/11 Joe Benton (Oxford) From Denoising Diffusions to Denoising Markov Models

Note:

In-person

Slides

Denoising diffusions are state-of-the-art generative models exhibiting remarkable empirical performance. They work by diffusing the data distribution into a Gaussian distribution and then learning to reverse this noising process to obtain synthetic datapoints. In this talk, we will introduce denoising Markov models (DMMs), a unifying framework generalising diffusion models to a wide class of spaces. We will show how to derive suitable training objectives for DMMs and discover that DMMs lead to a novel and principled generalization of score matching to arbitrary state spaces.
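
As background, the sketch below shows the denoising score matching objective that underlies Euclidean denoising diffusions at a single noise level, using an affine score model so the example stays self-contained; the data, noise level and model are illustrative stand-ins, and DMMs generalise this construction beyond the Gaussian/Euclidean case.

import numpy as np

# Denoising score matching on R^2 with a toy affine score model s(x) = W x + b.
rng = np.random.default_rng(3)
d, n = 2, 5000
data = rng.normal(loc=[3.0, -1.0], scale=1.0, size=(n, d))   # toy data distribution

sigma, lr = 0.5, 1e-2                       # one noise level of the schedule, learning rate
W, b = np.zeros((d, d)), np.zeros(d)
for it in range(5000):
    x0 = data[rng.integers(n, size=64)]
    eps = rng.normal(size=x0.shape)
    xt = x0 + sigma * eps                   # forward noising step
    target = -eps / sigma                   # score of N(x0, sigma^2 I) evaluated at xt
    resid = (xt @ W.T + b) - target
    W -= lr * 2 * resid.T @ xt / len(x0)    # gradient step on the mean squared error
    b -= lr * 2 * resid.mean(axis=0)

# the learned score should point towards the data mean and roughly vanish there
print("score at data mean:", np.array([3.0, -1.0]) @ W.T + b)
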
17/11 Woojung Kim (Warwick) Latent variable approaches for multimorbidity analysis

Note:

In-person

at MB0.02

12.45-13.30

Slides

Multimorbidity refers to the acquisition of multiple long-term chronic health conditions in a single person. This is becoming an increasing public health issue with aging populations, and gaining insights into patterns of multimorbidity is essential for managing increased health system burdens.

In this talk, we focus on latent variable approaches for the analysis of multimorbidity. We introduce a novel discrete latent variable approach for multimorbidity acquisition within a Bayesian framework of latent feature allocation, which allows an individual's morbidity profile to be driven by multiple latent factors and allows the modelling of age-dependent multimorbidity trajectories. We demonstrate the utility of our model in applications to both simulated data and disease event data from patient electronic health records. In each setting, we show our model can reconstruct clinically meaningful latent multimorbidity patterns and their age-dependent prevalence trajectories.
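
To make the modelling idea concrete, below is a heavily simplified generative sketch of latent feature allocation for multimorbidity, with a fixed number of binary factors and a noisy-OR link; the dimensions, link function and priors are invented for illustration, and the age-dependent trajectories of the actual model are omitted.

import numpy as np

# Toy latent feature allocation model: each person carries K binary latent
# factors, and each factor switches on a set of diseases via a noisy-OR link.
rng = np.random.default_rng(4)
N, K, D = 1000, 3, 8                       # people, latent factors, diseases

pi = rng.beta(1, 3, size=K)                # prevalence of each latent factor
Z = rng.binomial(1, pi, size=(N, K))       # latent feature allocation matrix
theta = rng.beta(2, 5, size=(K, D))        # P(factor k triggers disease d)
background = 0.01                          # baseline disease risk

# noisy-OR: disease d stays absent only if neither the background nor any active factor triggers it
p_absent = (1 - background) * np.prod(np.where(Z[:, :, None] == 1, 1 - theta, 1.0), axis=1)
Y = rng.binomial(1, 1 - p_absent)          # observed morbidity profiles (N x D)

print("empirical disease prevalences:", Y.mean(axis=0).round(3))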

24/11 Filippo Ascolani (Università Bocconi) Dimension-free mixing times of coordinate-wise samplers for Bayesian hierarchical models

Note:

In-person

Slides

Coordinate-wise MCMC schemes (e.g. Gibbs and Metropolis-within-Gibbs) are popular algorithms to sample from posterior distributions arising from Bayesian hierarchical models. We introduce a novel technique to analyse the asymptotic behaviour of their mixing times, based on tools from Bayesian asymptotics. We apply our methodology to high-dimensional hierarchical models, obtaining dimension-free convergence results for Gibbs under random data-generating assumptions, for a broad class of two-level models with generic likelihood function. Specific examples with Gaussian, binomial and categorical likelihoods are discussed. We then extend the results to Metropolis-within-Gibbs schemes combining the Bayesian asymptotics approach with a novel notion of conditional conductance.
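
As a concrete example of the coordinate-wise schemes being analysed, here is a Gibbs sampler for a standard two-level Gaussian hierarchical model; this is a textbook toy example, not a model taken from the talk.

import numpy as np

# Gibbs sampler for y_{ij} | theta_i ~ N(theta_i, 1), theta_i | mu ~ N(mu, 1), mu ~ N(0, 100).
rng = np.random.default_rng(5)
I, J = 50, 20                                  # groups and observations per group
mu_true = 2.0
theta_true = rng.normal(mu_true, 1.0, size=I)
y = rng.normal(theta_true[:, None], 1.0, size=(I, J))

mu, theta = 0.0, np.zeros(I)
samples = []
for it in range(2000):
    # theta_i | mu, y ~ N((J * ybar_i + mu) / (J + 1), 1 / (J + 1))
    post_mean = (J * y.mean(axis=1) + mu) / (J + 1)
    theta = rng.normal(post_mean, np.sqrt(1.0 / (J + 1)))
    # mu | theta ~ N(s * sum(theta), s), with s = 1 / (I + 1/100)
    s = 1.0 / (I + 1.0 / 100)
    mu = rng.normal(s * theta.sum(), np.sqrt(s))
    samples.append(mu)

print("posterior mean of mu:", np.mean(samples[500:]))
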
01/12 Tengyao Wang (LSE) Sparse change detection in high-dimensional linear regression

Note:

In-person

We introduce a new methodology, 'charcoal', for estimating the location of sparse changes in high-dimensional linear regression coefficients, without assuming that those coefficients are individually sparse. The procedure works by constructing different sketches (projections) of the design matrix at each time point, where consecutive projection matrices differ in sign in exactly one column. The sequence of sketched design matrices is then compared against a single sketched response vector to form a sequence of test statistics whose behaviour shows a surprising link to the well-known CUSUM statistics of univariate changepoint analysis. The procedure is computationally attractive, and strong theoretical guarantees are derived for its estimation accuracy. Simulations confirm that our methods perform well across a wide range of settings, and a real-world application to a large single-cell RNA sequencing dataset showcases its practical relevance.
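
For reference, the classical univariate CUSUM statistic to which the abstract alludes can be computed as below; this is only the textbook statistic on a toy mean-shift sequence, not the full 'charcoal' sketching procedure.

import numpy as np

# Classical CUSUM statistic for a single change in mean of a univariate sequence.
rng = np.random.default_rng(6)
n, tau = 400, 250
x = np.concatenate([rng.normal(0, 1, tau), rng.normal(0.8, 1, n - tau)])  # mean shift at tau

def cusum(x):
    n = len(x)
    t = np.arange(1, n)
    left = np.cumsum(x)[:-1]                      # partial sums x_1 + ... + x_t
    total = x.sum()
    # sqrt(t(n - t) / n) * |mean before t - mean after t|
    return np.sqrt(t * (n - t) / n) * np.abs(left / t - (total - left) / (n - t))

T = cusum(x)
print("estimated changepoint:", T.argmax() + 1, "(true:", tau, ")")
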
08/12

Richard Everitt (Warwick)

Massimiliano Tamborrino (Warwick)

1. Ensemble Kalman inversion approximate Bayesian computation
2. Network inference in a stochastic multi-population neural mass model via approximate Bayesian computation

Note:

In-person

1.
Approximate Bayesian computation (ABC) is the most popular approach to inferring parameters in the case where the data model is specified in the form of a simulator. It is not possible to directly implement standard Monte Carlo methods for inference in such a model, due to the likelihood not being available to evaluate pointwise. The main idea of ABC is to perform inference on an alternative model with an approximate likelihood, sometimes known as the ABC likelihood. The ABC likelihood is chosen such that an unbiased estimator of it is easy to construct from simulations from the data model, allowing the use of pseudo-marginal Monte Carlo algorithms for inference under the approximate model. The central challenge of ABC is then to trade off the bias introduced by approximating the model against the variance introduced by estimating the ABC likelihood. Stabilising the variance of the ABC likelihood requires a computational cost that is exponential in the dimension of the data; thus the most common approach to reducing variance is to perform inference conditional on summary statistics.
In this talk we introduce a new approach to estimating the ABC likelihood: using ensemble Kalman inversion (EnKI). Ensemble Kalman algorithms are Monte Carlo approximations of Bayesian inference for linear/Gaussian models. These methods are often applied outside of the linear/Gaussian setting, being used, for example, as an alternative to particle filtering for inference in non-linear state space models. Loosely speaking, EnKI can be used as an alternative to an SMC sampler on a sequence of annealed likelihoods. We see that EnKI has some appealing properties when used to estimate the ABC likelihood. It circumvents the exponential scaling with dimension of standard ABC, and does not require the reparameterisation imposed by the rare-event SMC approach of Prangle et al. (2018). It is able to achieve this with no additional simulations from the data model, so it is likely to bring the most benefit in cases where this model is very expensive to simulate.
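
To fix ideas, here is a minimal pseudo-marginal ABC-MCMC sketch in which the ABC likelihood is estimated unbiasedly from simulator draws and plugged into a Metropolis step; the toy Gaussian simulator, summary statistic and kernel bandwidth are invented for illustration, and EnKI itself is not shown.

import numpy as np

# Pseudo-marginal ABC-MCMC on a toy model: data ~ N(theta, 1), with the
# simulator standing in for a model whose likelihood we pretend is unavailable.
rng = np.random.default_rng(7)
y_obs = rng.normal(1.5, 1.0, size=20)
s_obs = y_obs.mean()                               # summary statistic

def abc_likelihood_hat(theta, eps=0.2, m=50):
    # unbiased estimate of the (kernel-smoothed) ABC likelihood:
    # average of Gaussian kernels centred at simulated summaries
    sims = rng.normal(theta, 1.0, size=(m, 20)).mean(axis=1)
    return np.mean(np.exp(-0.5 * ((sims - s_obs) / eps) ** 2))

theta, like = 0.0, abc_likelihood_hat(0.0)
chain = []
for it in range(5000):
    prop = theta + 0.3 * rng.normal()
    like_prop = abc_likelihood_hat(prop)
    # flat prior, symmetric proposal: accept with the pseudo-marginal ratio,
    # re-using the stored estimate at the current state
    if rng.random() < like_prop / max(like, 1e-300):
        theta, like = prop, like_prop
    chain.append(theta)

print("ABC posterior mean estimate:", np.mean(chain[1000:]))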

2.

The aim of this article is to infer the connectivity structures of brain regions before and during epileptic seizure. Our contributions are fourfold. First, we propose a 6N-dimensional stochastic differential equation for modelling the activity of N coupled populations of neurons in the brain. This model further develops the (single population) stochastic Jansen and Rit neural mass model, which describes human electroencephalography (EEG) rhythms, in particular signals with epileptic activity. Second, we construct a reliable and efficient numerical scheme for the model simulation, extending a splitting procedure proposed for one neural population. Third, we propose an adapted Sequential Monte Carlo Approximate Bayesian Computation algorithm for simulation-based inference of both the relevant real-valued model parameters as well as the {0,1}-valued network parameters, the latter describing the coupling directions among the N modelled neural populations. Fourth, after illustrating and validating the proposed statistical approach on different types of simulated data, we apply it to a set of multi-channel EEG data recorded before and during an epileptic seizure. The real data experiments suggest, for example, a larger activation in each neural population and a stronger connectivity on the left brain hemisphere during seizure.
Reference: S. Ditlevsen, M. Tamborrino, I. Tubikanec, Preprint at arXiv:2306.15787
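
As a much-simplified illustration of coupling neural populations through binary network parameters, the sketch below simulates a small network of one-dimensional Ornstein-Uhlenbeck processes with Euler-Maruyama; it is not the 6N-dimensional Jansen-Rit-type model, and plain Euler-Maruyama replaces the splitting scheme of the paper.

import numpy as np

# Toy coupled-population SDE: N populations, each a 1-d OU process, coupled
# through a binary adjacency matrix rho (the "network parameters").
rng = np.random.default_rng(8)
N, T, dt = 3, 5.0, 1e-3
rho = np.array([[0, 1, 0],                 # who drives whom
                [0, 0, 1],
                [1, 0, 0]])
kappa, sigma, c = 2.0, 0.5, 1.5            # mean reversion, noise level, coupling strength

steps = int(T / dt)
x = np.zeros(N)
path = np.empty((steps, N))
for k in range(steps):
    drift = -kappa * x + c * rho @ np.tanh(x)               # coupling enters through rho
    x = x + drift * dt + sigma * np.sqrt(dt) * rng.normal(size=N)
    path[k] = x

print("empirical std per population:", path.std(axis=0).round(3))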